Monday, April 28, 2014

Sitecore: Set Up New Lucene Index and Search Result Class

Background

Lucene is great when you don't want to implement Solr.  Sure, it is not as powerful as Solr and its faceting support is more limited, but for simple searches that just return results, Lucene does the job well.  Both rely on the same concept: build indexes, then query those indexes for results.


Requirements

Out of the box, Sitecore defines indexes for general items like the content node, the media library, and templates across all three databases.  You could query the content node of the master database to get anything you want, but you don't have to.  A better way is to create smaller indexes that are tailored to your specific needs.


Set Up New Index

The general master database index is defined here:

Sitecore.ContentSearch.Lucene.Index.Master.config

The smaller indexes are broken down and defined here:

Sitecore.ContentSearch.Lucene.Indexes.Sharded.Master.config

If you want to define a new smaller index for the master database, this is where it should be defined.

Add a new index definition like this:



Here you will see that a custom crawler has been used.  You can refer to an older posting for more details about setting up custom crawlers:

http://mrstevenzhao.blogspot.com/2014/04/sitecore-custom-item-crawler.html

At this point, if you go to the Sitecore Control Panel and navigate to the section for rebuilding indexes, you will see the new index listed.



Set Up New Search Result Class

Sitecore has a BuiltinFields class that contains the index field names of the general fields.  We can use those index fields to create a base class that only contains the fields we care about.  Let's create one right now:




Please remember that all search result classes have to inherit from the SearchResultItem class, which already defines many more fields and methods for you.

Now we can build more specific classes on top of the base result item class we just created.  Let's create one:



Keep in mind that Fields is just a static mapping class that stores key-value pairs for index field names.  Since these two new fields are not standard fields, we have to define these two new fields in the configuration file, Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config.  Add this to the <fieldNames> section:



Now Lucene knows about these two new fields and which index field names to store their values under.  Rebuild the index and the values for these new fields will be available to query against.




Summary

It is quite easy and beneficial to add new indexes.  If you add custom fields to your templates that need to be indexed, you would have to create custom indexes and search result classes to handle the new fields.  You also isolate your queries from the main master database index, which speeds up both the queries and the index rebuilds.

Friday, April 11, 2014

Sitecore: Custom Item Crawler

Background

An Item Crawler crawls the child nodes within a given parent node and determines if any of the items are fit to be added to a Lucene index.  Let's start by looking at a configuration file that defines all the indexes in the master database:

Sitecore.ContentSearch.Lucene.Indexes.Sharded.Master.config

All the indexes defined in this file use the standard crawler:



This is the built-in crawler for general purposes.  It includes methods like "IsExcludedFromIndex" to determine if an item should be added to the index or not.


Requirements

Now let's say you have a "/sitecore/content" node that has lots and lots of child nodes, in the hundreds.  This is a likely scenario for single-instance, multi-site solutions for companies that have hundreds of sub-brands.  If we index the "/sitecore/content" node, the standard crawler would index all the nodes and generate a huge index containing a lot of items that we don't care about.  What if we only want to index items that are based on a specific template?  We would have to build a custom crawler to do that.  The resulting crawler would be something like this:



Step-by-Step

Let's create a new class that inherits from the standard crawler:



If we inspect the code for the standard crawler, everything is fine except for the "IsExcludedFromIndex" method.  We can override this method to do what we want and to exclude items based on certain templates here.
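To make the idea concrete, here is a rough sketch of the exclusion logic in plain Python.  It is not Sitecore code and does not use the Sitecore API: the Item class, the template IDs, and standard_is_excluded are all made-up stand-ins.  It only shows the shape of the override, which keeps the standard check and then adds a template filter on top.

```python
from dataclasses import dataclass

# Hypothetical template identifiers; real Sitecore template IDs are GUIDs.
ALLOWED_TEMPLATE_IDS = {"product-page", "article-page"}


@dataclass(frozen=True)
class Item:
    """Stand-in for a content item (assumption, not a Sitecore type)."""
    path: str
    template_id: str


def standard_is_excluded(item: Item) -> bool:
    """Stand-in for the standard crawler's check; excludes nothing here."""
    return False


def is_excluded_from_index(item: Item) -> bool:
    """Exclude an item if the standard check excludes it,
    or if its template is not one we want in the index."""
    if standard_is_excluded(item):
        return True
    return item.template_id not in ALLOWED_TEMPLATE_IDS


def crawl(items):
    """Return only the items that would be added to the index."""
    return [item for item in items if not is_excluded_from_index(item)]
```

With a list containing one "product-page" item and one "folder" item, crawl would keep only the first one.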




Summary

This again illustrates how flexible Sitecore is.  If you don't like the out-of-the-box behavior, modify it to suit your needs.  Modify the config file a little, create a new class that inherits from an existing class and you are good to go.

Wednesday, April 2, 2014

Sitecore: Multi-site Setup w/o Updating Configuration Files


Background of Multi-site Setup

It is pretty easy to set up Sitecore for a single-instance, multi-site environment.  Sitecore is robust enough to handle hundreds or even thousands of websites within the same content tree.  All you have to do is specify a different node as the homepage of each website.  By default, the Home node is the main website and this is defined in the web.config <sites> section.



The format of site config entries is straightforward.  Give it a name, a path, and the start item (the homepage item).  To add a new site, all you have to do is add a new <site> entry and save the file.  Also, if your website is running in Live Mode (reads from the master database instead of web), you would have to add a new entry to the LiveMode.config file as well.



This is the traditional out-of-the-box way to add a new site.  It is fine for a few sites but what if we have hundreds?  Technically, we could repeat this process, but the app pool has to be recycled each time a new site is added because config changes require an app pool restart.  That means slow response times during the restart, even if it only lasts a few seconds.

So how do we handle this in an elegant manner?  It's not that the traditional way is difficult but it requires developer intervention.  There are downloadable modules from the marketplace but there isn't a lean and simple solution.


SiteResolver Process (HttpRequest Pipeline)

To understand what we have to do, we have to understand what Sitecore does behind the scenes when it encounters a multi-site setup.  Inspect Sitecore.Pipelines.HttpRequest.SiteResolver in a decompiler.  You should see the following steps in a nutshell:

1) Sitecore grabs and stores all the <site> definitions
2) The URL parameter "sc_site" is checked.  If a website is defined with that name, we have found a matching site!
3) If "sc_site" is not available, then we check the hostname (domain name).  If a website is defined with that hostname, we have found a matching site!

These steps assume that the sites are all defined in the config files.  What if they are not?  Sitecore tries to match website names and hostnames from top to bottom in the list of definitions and stops at the first match.  If a definition doesn't match, it moves on to the next one until it hits the last one, which is, AND SHOULD BE, the default "website".  This ensures that at least the default website will load if nothing else matches.
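The matching order described above can be sketched roughly like this in plain Python.  This is only a model of the behavior, not the decompiled Sitecore code: SiteDefinition and resolve_site are made-up names, the "*" catch-all hostname is a convention of this sketch, and falling through to the hostname check when "sc_site" does not match is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SiteDefinition:
    """Stand-in for one <site> entry (assumption, not a Sitecore type)."""
    name: str
    host_name: str  # "*" acts as a catch-all in this sketch


def resolve_site(sites: List[SiteDefinition],
                 sc_site: Optional[str],
                 request_host: str) -> Optional[SiteDefinition]:
    # Step 2: the "sc_site" URL parameter is checked first.
    if sc_site:
        for site in sites:
            if site.name.lower() == sc_site.lower():
                return site
    # Step 3: otherwise match on hostname, top to bottom, first match wins.
    for site in sites:
        if site.host_name == "*" or site.host_name.lower() == request_host.lower():
            return site
    return None
```

Because the list is walked top to bottom, the catch-all default "website" only works as a safety net if it is the last entry.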

Knowing all of this, let's go ahead and build our own custom site resolver process to replace the default one. 


Build CustomSiteResolver Process

Create a new class that inherits from the original site resolver.


All we have to do is override the Process method and make sure that it functions almost the same way as the original method:



The original Process performs the following:

1) Find a matching site based on the URL parameter
2) If not found, match on hostname
3) Find site and update the start item so we know where the homepage is

The new Process should be:

1) Find matching site based on URL parameter or hostname
2) If nothing matches or if the only website that matches is the default website, then we take another approach
3) Parse the <site> definitions manually and look for the default website definition.  We will base all dynamic websites on the default website definition.
4) Check the URL parameter for the site name.  If available, iterate through all the child nodes in "/sitecore/content" and look for the item where the URL parameter value matches the value of the field "Site Name".  If there is a match, we have found the start item for this website.
5) If the site name does not exist or does not return a matching site, then the hostname is checked.  Iterate through all the child nodes in "/sitecore/content" and look for the item where the hostname matches the value of the field "Site Hostname".  If there is a match, we have found the start item for this website.
6) If no matches are found, we use the default website.
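Here is a rough sketch of the six steps above in plain Python, just to show the flow of decisions.  None of this is Sitecore API: ContentItem, find_start_item, and resolve_start_path are made-up names, and the configured match is passed in as a simple (site name, start path) tuple.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT_SITE = "website"


@dataclass(frozen=True)
class ContentItem:
    """Stand-in for a child of /sitecore/content (assumption)."""
    path: str
    site_name: str = ""      # value of the "Site Name" field
    site_hostname: str = ""  # value of the "Site Hostname" field


def find_start_item(children: List[ContentItem],
                    sc_site: Optional[str],
                    host: str) -> Optional[ContentItem]:
    # Step 4: try the URL parameter against the "Site Name" field.
    if sc_site:
        for child in children:
            if child.site_name and child.site_name.lower() == sc_site.lower():
                return child
    # Step 5: fall back to the hostname against "Site Hostname".
    for child in children:
        if child.site_hostname and child.site_hostname.lower() == host.lower():
            return child
    return None


def resolve_start_path(configured_match: Optional[Tuple[str, str]],
                       children: List[ContentItem],
                       sc_site: Optional[str],
                       host: str,
                       default_start_path: str) -> str:
    # Steps 1-2: a configured site other than the default wins outright.
    if configured_match is not None and configured_match[0] != DEFAULT_SITE:
        return configured_match[1]
    # Steps 3-5: otherwise look for a dynamic site in the content tree.
    item = find_start_item(children, sc_site, host)
    # Step 6: no match means we stay on the default website.
    return item.path if item else default_start_path
```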


Step-by-Step Implementation

It is much simpler when explained with pictures, so here goes.  First, make sure you modify the template of each website's start item, either by template inheritance or simply adding new fields, to have these two fields:




Modify your custom site resolver to check for sites:



Check the site config file and get the definition for the default website as a single Xml node.
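As a rough illustration of pulling one site definition out of the config as a single node, here is a small Python sketch using the standard library XML parser.  The XML fragment is a simplified, made-up example (the attribute values are assumptions), and get_site_node is a hypothetical helper, not part of Sitecore.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Simplified, hypothetical <sites> fragment for illustration only.
SITES_XML = """
<sites>
  <site name="shell" virtualFolder="/sitecore/shell" />
  <site name="website" virtualFolder="/" rootPath="/sitecore/content"
        startItem="/home" database="web" />
</sites>
"""


def get_site_node(xml_text: str, site_name: str = "website") -> Optional[ET.Element]:
    """Return the <site> element with the given name, or None if absent."""
    root = ET.fromstring(xml_text)
    for node in root.iter("site"):
        if node.get("name") == site_name:
            return node
    return None
```

The returned element carries all the attributes of the default definition, which is what the later steps reuse for every dynamic site.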




Get a list of all the child node items in the content tree



Perform the match on the item fields and return a site context (GetSite method)


If a match is found, use the default website definition but change the start item to match the item just found via the field search.  Also, set the context site to be the site that was just defined.  It is very important to set the context site so that Page Editor mode will be preserved and works as expected.
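The "reuse the default definition, swap the start item" step could look roughly like this in plain Python.  The dictionary of attributes and the derive_site helper are assumptions made for the sketch; in the real processor you would be working with Sitecore's own site objects.

```python
def derive_site(default_attrs: dict,
                site_name: str,
                start_item_path: str,
                root_path: str = "/sitecore/content") -> dict:
    """Copy the default site's attributes and point the copy at a new
    start item.  The original dictionary is left untouched."""
    if not start_item_path.startswith(root_path + "/"):
        raise ValueError("start item must live under " + root_path)
    derived = dict(default_attrs)   # shallow copy, default stays intact
    derived["name"] = site_name
    derived["rootPath"] = root_path
    # In this sketch the start item is stored relative to the root path.
    derived["startItem"] = start_item_path[len(root_path):]
    return derived
```

Copying instead of mutating matters here: the default website definition has to stay as it is so it can keep serving as the fallback and as the base for the next dynamic site.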

Now, replace the original processor in the pipeline with the definition of the new custom processor:




Possible Enhancements

A very obvious enhancement would be to Solr-ize (or Lucene-ize) the child nodes of the content tree.  That way, we can perform quick searches on any of those items in the "Site Name" and "Site Hostname" fields and obtain a matching item quickly.
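To show why an index helps, here is a tiny Python sketch where two in-memory dictionaries stand in for the Lucene or Solr index.  That substitution is an assumption made purely for illustration, and build_site_lookup and lookup_start_path are made-up names.  The point is that the content tree is walked once up front, and each request afterwards is a direct lookup instead of a loop over every child node.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_site_lookup(children: Iterable[Tuple[str, str, str]]):
    """children: (item path, "Site Name" value, "Site Hostname" value).
    Returns two dictionaries keyed by lower-cased name and hostname."""
    by_name: Dict[str, str] = {}
    by_host: Dict[str, str] = {}
    for path, site_name, site_hostname in children:
        if site_name:
            by_name.setdefault(site_name.lower(), path)   # first item wins
        if site_hostname:
            by_host.setdefault(site_hostname.lower(), path)
    return by_name, by_host


def lookup_start_path(by_name: Dict[str, str],
                      by_host: Dict[str, str],
                      sc_site: Optional[str],
                      host: str) -> Optional[str]:
    """Site name first, then hostname, mirroring the resolver's order."""
    if sc_site and sc_site.lower() in by_name:
        return by_name[sc_site.lower()]
    return by_host.get(host.lower())
```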


Summary

I am not sure exactly how the other modules work, but it is safe to say they all work similarly to the method above.  Those modules might be more robust, but the method I just described is definitely leaner and doesn't require a set of "global" items to be created to store site definitions.  We use a workaround where we take the default website definition and just change its start path.  This also assumes that no specific port numbers are used.  Again, the goal is to have site definitions checked dynamically from a list of website home nodes so developers do not need to update the config files constantly.