Sitecore Search Index Builder

License: GPL v3

If you've dealt with older Sitecore projects that use large search indexes, then you've almost certainly hit the issue of "My search index rebuild takes so long that the IIS process recycles before it finishes"...

This tool tries to help with that by managing the indexing operation from outside the ASP.NET website process. If something causes the web app to recycle, this tool will detect the error, back off, and then retry and continue the process. You can also stop the process and restart it later if necessary.
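To illustrate the back-off-and-retry idea, here is a minimal Python sketch. It is not the tool's actual code: the function name `call_with_backoff`, the use of `ConnectionError` to stand in for "the web app recycled", and the doubling of the pause between attempts are all assumptions made for the example. The tool itself may well use a fixed pause.

```python
import time


def call_with_backoff(operation, retries=5, pause_ms=1000, sleep=time.sleep):
    """Call operation(); on a connection failure, wait and try again.

    Assumption: the pause doubles after each failed attempt.
    `sleep` is injectable so the behaviour can be checked without waiting.
    """
    delay = pause_ms / 1000.0
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries:
                # Out of attempts: let the caller record a permanent error.
                raise
            sleep(delay)
            delay *= 2
```

The point of running this loop outside the website process is that a recycle only costs one failed request and a pause, rather than the whole rebuild.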

It will also try to manage errors raised by the Sitecore indexing process, but this is limited by the data Sitecore returns from an indexing job. As far as I can tell, most internal failures return a message that still looks like success - even if, say, a computed field threw an exception. So you will need to check your crawler log to find any errors that Sitecore did not report.

This hasn't been exhaustively tested, as it was something I hacked together to help with a work problem. It has been tried against both Solr and Lucene indexes with Sitecore v7.1, v7.2 and v9.0, and in theory it should work with v7.0 and up.

Grab a release and then make use of the options it provides...

Step 1: Deploying the endpoint

The first step in running the tool is to deploy the special endpoint it uses into your Sitecore application. The tool can do this with the Deploy verb:

SearchIndexBuilder.exe deploy -w <your website folder> [-o] [-t <token>]

The parameters are:

You must complete this step before proceeding.

Step 2: Setting up some config

To run an indexing job, the tool relies on a JSON file which specifies job configuration and settings. You can write this file manually if you want to, but the tool will generate it for you using the Setup verb.

SearchIndexBuilder.exe setup -u <url of the endpoint> -d <database> -t <token> [-q <query for items>] [-c <config file name>] [-o]

The parameters are:

The config file will include all the Sitecore indexes defined on your site by default. If you only want to build certain indexes, use a text editor to remove the unwanted ones from the JSON data. Just remember not to break the format of the file.
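After hand-editing the file, a quick parse check can catch a stray comma or missing bracket before you start a long job. The sketch below uses only Python's standard `json` module; the function name `check_config` is made up for this example, and it only checks that the text is valid JSON, not that it matches what the tool expects.

```python
import json


def check_config(text):
    """Return (True, None) if text parses as JSON, else (False, where-it-broke)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
```

You could feed it the contents of your edited file, for example `check_config(open("config.json", encoding="utf-8").read())`, where `config.json` is whatever name you gave the file.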

You can use the -z parameter from the global options below to change the file format at this point. The compressed formats are useful for large config files when you don't have a lot of disk space to play with, as the files tend to compress by 50-75%. However, you cannot easily edit files in these formats. If you want to make changes before running processing, use the convert verb instead.

Step 2.5: Converting the format of a config file

If you want to convert a config file from one format to another, you can make use of the convert verb.

SearchIndexBuilder.exe convert -s <config file> -t <config file> -f <format>

The parameters are:

The code will try to determine the format of the source file using its extension, or you can override this using the -z global option. The target file format is set by the -f parameter.
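Extension-based detection with an explicit override can be pictured roughly as follows. This is a Python sketch only: the extensions `.json`, `.gz` and `.zip`, and the format names `text`, `gzip` and `zip`, are assumptions for illustration and may not match the names the tool really uses.

```python
from pathlib import Path

# Hypothetical mapping from file extension to format name.
EXTENSION_FORMATS = {".json": "text", ".gz": "gzip", ".zip": "zip"}


def guess_format(filename, override=None):
    """Pick a format from the file extension unless an override is given."""
    if override is not None:
        return override
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"cannot infer format from {filename!r}") from None
```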

This option exists to save disk space on constrained systems, as a zip/GZip stream will reduce a config file by as much as 75% in some cases. But it does this at the expense of performance, as it takes longer to read and write these files due to the processing for compression.
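If you want a feel for the size trade-off, the following Python sketch compresses a made-up list of item IDs with the standard `gzip` module and reports the saving. The data shape (a list of objects with an `id` field) is invented for the example and is not the tool's real config schema, so the ratio you see will differ from a real config file.

```python
import gzip
import json
import uuid

# Invented stand-in for a large config: 10,000 GUID-style item IDs.
items = [{"id": str(uuid.UUID(int=i))} for i in range(10000)]

raw = json.dumps(items).encode("utf-8")
packed = gzip.compress(raw)

saving = 1 - len(packed) / len(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes, saved {saving:.0%}")
```

The extra CPU time spent in `gzip.compress` and `gzip.decompress` is the performance cost mentioned above.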

Step 3: Running an index build

To start the process of re-indexing, you use the index verb. This will take a config file created by the previous step, and process each of the content items it specifies. Using the endpoint you've deployed, the tool will ask Sitecore to reindex each of the items, using each of the indexes you have specified.

SearchIndexBuilder.exe index [-c <config file>] [-o <output Every X items>] [-r <retries in case of error>] [-p <ms to pause for>] [-t <seconds>]

The parameters are:

You can stop the tool safely with Ctrl-C. It will finish its current operation, and then end. The current state (specifically what items are left to process, and what errors have been recorded - both transient and permanent) will be written to disk in the config file. The previous state of the config file will be preserved in a backup file named with the format backup-<date>-<time>-<config>.json so that you can revert to this previous state if necessary.
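The "finish the current item, then stop and keep the state" behaviour can be sketched like this in Python. Everything here is illustrative rather than the tool's real implementation: the function names, the shape of the returned state, and the `%Y%m%d` / `%H%M%S` date and time formats in the backup name are all assumptions.

```python
from datetime import datetime
from pathlib import Path


def run_until_stopped(items, process, stop_requested):
    """Process items one at a time; only check for a stop between items."""
    remaining = list(items)
    processed, errors = [], []
    while remaining and not stop_requested():
        item = remaining.pop(0)
        try:
            process(item)
            processed.append(item)
        except Exception as err:
            # Keep the failure so it can be retried later.
            errors.append((item, str(err)))
    return {"remaining": remaining, "processed": processed, "errors": errors}


def backup_name(config_path, now):
    """Build a name of the form backup-<date>-<time>-<config>.json."""
    stem = Path(config_path).stem
    return f"backup-{now:%Y%m%d}-{now:%H%M%S}-{stem}.json"
```

Because the stop flag is only checked between items, an item is never left half-processed, and whatever is in `remaining` is what a later run would pick up.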

To try and help with situations where the tool fails unexpectedly, it will also write (and overwrite) a file named RuntimeBackup-<config>.json each time the tool outputs statistics as part of the -outputEvery option. This is the current state of the job configuration. It will also pay attention to remaining disk space - and if it gets down to less than 1.25 times the size of the last backup written, the indexing job will be cancelled in order to prevent data loss due to running out of disk space.
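The disk-space rule above comes down to one comparison. Here is a small Python sketch of it; the function names are made up, and `shutil.disk_usage` is simply the standard-library way to read free space in Python, not what the tool itself calls.

```python
import shutil


def enough_disk_space(free_bytes, last_backup_bytes, factor=1.25):
    """True while free space is at least `factor` times the last backup size."""
    return free_bytes >= factor * last_backup_bytes


def free_space(path="."):
    """Free bytes on the volume that holds `path`."""
    return shutil.disk_usage(path).free
```

For example, with a last backup of 400 MB, the job would carry on while at least 500 MB is free and would be cancelled once free space drops below that.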

The updated config is also saved to disk when the tool finishes normally - giving a record of items which caused problems.

You can use -z from the global options below to specify the file format to use. However, the code will try to work out the correct format automatically, based on the file extension you specify.

Step 4: Retrying errored items

If you have a config file with errors recorded in it, and you want to re-process those items, you can use the retry verb to generate a new config file from the processed one. It will clear the processed items, elapsed time and attempts count data, and add any errors into the items list. You can then re-run the index verb.
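As a rough picture of what the retry transformation does to the data, here is a Python sketch. The field names (`Items`, `Errors`, `Processed`, `ElapsedTime`, `Attempts`) and the shape of an error entry are hypothetical, chosen only to make the example readable, so check a real config file for the actual names. Emptying the error list after moving its items is also an assumption.

```python
def make_retry_config(config):
    """Return a new config with errored items queued up for another run."""
    retry = dict(config)
    # Put every errored item back on the list of items to process.
    retry["Items"] = list(config.get("Items", [])) + [
        error["Item"] for error in config.get("Errors", [])
    ]
    retry["Errors"] = []      # assumption: errors are cleared once re-queued
    retry["Processed"] = []   # clear the processed items
    retry["ElapsedTime"] = 0  # clear the elapsed time
    retry["Attempts"] = 0     # clear the attempts count
    return retry
```

The source config is left untouched, which matches the idea of generating a new file from the processed one.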

SearchIndexBuilder.exe retry [-s <source config file>] [-t <target config file>] [-o]

The parameters are:

Step 5: Removing the endpoint

Once you're finished, you should remove the endpoint file from the target website. You can do that by just deleting the file, but the tool can do this for you with the Remove verb.

SearchIndexBuilder.exe remove -w <your website folder>

The parameters are:

Global parameters

The system also supports some global parameters, which will affect all of the verbs: