Sitecore has a great feature — Switch On Rebuild — that supports search index swapping once the rebuilding of the index completes.
In this article, I want to share my experience of implementing this feature in practice.
I worked on a project based on Sitecore 9.3 Managed Cloud and SolrCloud, where we built a custom search index to crawl products every 12 hours. The index was huge (more than 3 million products at the time of this writing), and a full rebuild took 8–10 hours.
In Sitecore, there is a mechanism for switching between active and rebuild indexes in order to avoid downtime in search functionality on the website. We, therefore, configured the Switch On Rebuild feature for this custom index so that the rebuilding process doesn’t affect the active index used on the live site.
As for continuous delivery, we used the blue-green deployment approach so as to have zero production downtime.
After each deployment, we triggered a full rebuild of our custom index and occasionally found that the rebuild ran against the active index instead of the rebuild one.
For instance, before deployment, we have the following alias statuses in SolrCloud:
- active collection — product_index_rebuild (in SolrCloud it maps through aliases)
- rebuild collection — product_index (in SolrCloud it maps through aliases)
The main alias (the active index) therefore points to the product_index_rebuild collection. (In SolrCloud, a search core is called a collection.)
This means that the next rebuild should run against product_index. In practice, we saw the opposite: the rebuild ran against the active collection, the live site broke, and search data was unavailable.
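To check which collection each alias points to at a given moment, you can ask SolrCloud directly through the LISTALIASES action of the Collections API. Below is a minimal Python sketch. The alias names product_indexMainAlias and product_indexRebuildAlias are assumptions for illustration (check your index configuration for the real ones), and the sample response is hand-written, not taken from the project.

```python
import json
from urllib.request import urlopen


def fetch_aliases(solr_base_url):
    """Call the Collections API LISTALIASES action and return {alias: collection}."""
    url = solr_base_url.rstrip("/") + "/admin/collections?action=LISTALIASES&wt=json"
    with urlopen(url) as response:
        payload = json.load(response)
    return payload.get("aliases", {})


def describe_switch_state(aliases, main_alias, rebuild_alias):
    """Return (active_collection, rebuild_collection) for the two aliases."""
    return aliases.get(main_alias), aliases.get(rebuild_alias)


# Hand-written sample instead of a live call; alias names are assumptions.
sample_aliases = {
    "product_indexMainAlias": "product_index_rebuild",
    "product_indexRebuildAlias": "product_index",
}
active, rebuild = describe_switch_state(
    sample_aliases, "product_indexMainAlias", "product_indexRebuildAlias"
)
print("active:", active, "| next rebuild target:", rebuild)
```

Against a real cluster, you would pass the result of fetch_aliases with your own Solr base URL (for example, one ending in /solr) instead of the sample dictionary.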
What Is the Cause of the Problem?
Sitecore has a property store where it keeps key facts about search indexes, such as when the index was last updated. If the index uses the Switch On Rebuild functionality, the store also records which collection is Active and which is Rebuild. This happens inside the SwapAfterRebuild method (Class: SwitchOnRebuildSolrCloudSearchIndex, Assembly: Sitecore.ContentSearch.SolrProvider), which calls PreserveAliasesCollections.
All the data is located in the Properties table (core database):
As you can see, this table has three columns. Key and Value columns are of main interest to us.
The Key column comprises the following parts:
- CORE — just the core prefix (core aka collection in SolrCloud)
- PRODUCT_INDEX — the name of the index
- RD50***B1F27 — server Machine Name. The machine where the CM instance is hosted as PaaS
- MC-EDD***501-CM__6554 – WEBSITE_IIS_SITE_NAME
- SOLR_ACTIVE_COLLECTION — the active collection in SOLR
- REBUILD_COLLECTION — the collection that will be rebuilt next
The property store is implemented by the IndexDatabasePropertyStore type (Assembly: Sitecore.ContentSearch). Below you can see the implementation of the Set method with its dependencies:
In the Properties table, you can find more than one pair of Active/Rebuild records because Azure can change the Machine Name for any reason.
How do you validate the current Machine Name and WEBSITE_IIS_SITE_NAME? In the Azure portal, navigate to the CM App Service and open Advanced Tools (aka Kudu):
Click the Environment tab, where you can see all the major metadata about this App Service, including the Machine Name and WEBSITE_IIS_SITE_NAME:
And what happens when we run the blue-green deployment?
When the deployment completes, a new deployment slot becomes available. This slot is another App Service that has its own Machine Name and WEBSITE_IIS_SITE_NAME; for instance:
Sitecore does not synchronize the Active/Rebuild collection records in the Properties table between slots. What does that mean for us? It means we could end up in a situation where the two slots disagree about which collection is active and which one is the rebuild collection.
Let’s assume the current Machine Name is A. But we also have machine B that will be swapped when the blue-green deployment occurs. Machine B has Active Collection — product_index_rebuild, Rebuild Collection — product_index. When we trigger the deployment, machine A looks like this:
- Active Collection — product_index
- Rebuild Collection — product_index_rebuild
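Consequently, product_index is the active collection on the live site. After the deployment, machine B becomes the live machine, and its record says that the rebuild collection is product_index. When we trigger a rebuild, it therefore runs against product_index, the very collection that serves the live site, instead of product_index_rebuild.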
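The scenario can be written down as a tiny model. This is only an illustration of the reasoning, not Sitecore's implementation: each machine is reduced to its own pair of records, and the dictionary layout and helper names are made up for the example.

```python
# Shared SolrCloud state: the collection that currently serves the live site.
live_collection = "product_index"

# Per-machine records, as each machine would read them from the Properties table.
machine_a = {"active": "product_index", "rebuild": "product_index_rebuild"}
machine_b = {"active": "product_index_rebuild", "rebuild": "product_index"}  # stale


def next_rebuild_target(slot_record):
    """The collection a rebuild would write to, according to this machine's record."""
    return slot_record["rebuild"]


def rebuild_is_safe(slot_record, live):
    """A rebuild is safe only if it does not write to the live collection."""
    return next_rebuild_target(slot_record) != live


for name, record in (("A", machine_a), ("B", machine_b)):
    status = "safe" if rebuild_is_safe(record, live_collection) else "hits the live collection"
    print(f"machine {name}: rebuild targets {next_rebuild_target(record)} ({status})")
```

In this model, machine A would rebuild the standby collection, while machine B, working from its stale record, would rebuild the collection that is serving traffic.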
This issue can be resolved as follows:
- After the blue-green deployment, run a SQL script to compare the Active/Rebuild collections between deployment slots:
- If the records are out of sync, update them for the active deployment slot (machine). It’s also possible to delete these two records.
You can automate these steps by writing a script and executing it as part of the continuous delivery pipeline.
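The compare-and-update logic of such a script could look like the Python sketch below. It works on (Key, Value) rows already read from the Properties table and only plans the changes. The key layout, the underscore separators, and the MACHINE-A/SITE-A names are assumptions for illustration; adjust the two suffix constants to match the keys you actually see in your table.

```python
# Key suffixes as listed in the article; adjust them to your real keys.
ACTIVE_SUFFIX = "SOLR_ACTIVE_COLLECTION"
REBUILD_SUFFIX = "REBUILD_COLLECTION"


def group_by_slot(rows):
    """Group (key, value) rows by the part of the key that precedes the suffix."""
    slots = {}
    for key, value in rows:
        for role, suffix in (("active", ACTIVE_SUFFIX), ("rebuild", REBUILD_SUFFIX)):
            if key.upper().endswith(suffix):
                prefix = key[: len(key) - len(suffix)]
                slots.setdefault(prefix, {})[role] = (key, value)
                break
    return slots


def plan_sync(slots, previous_prefix, current_prefix):
    """Return {key: value} updates that make the current slot match the previous one."""
    previous, current = slots[previous_prefix], slots[current_prefix]
    updates = {}
    for role in ("active", "rebuild"):
        current_key, current_value = current[role]
        _, previous_value = previous[role]
        if current_value != previous_value:
            updates[current_key] = previous_value
    return updates


# Hypothetical rows: machine A was live before the swap, machine B is live now.
rows = [
    ("CORE_PRODUCT_INDEX_MACHINE-A_SITE-A_SOLR_ACTIVE_COLLECTION", "product_index"),
    ("CORE_PRODUCT_INDEX_MACHINE-A_SITE-A_REBUILD_COLLECTION", "product_index_rebuild"),
    ("CORE_PRODUCT_INDEX_MACHINE-B_SITE-B_SOLR_ACTIVE_COLLECTION", "product_index_rebuild"),
    ("CORE_PRODUCT_INDEX_MACHINE-B_SITE-B_REBUILD_COLLECTION", "product_index"),
]
slots = group_by_slot(rows)
updates = plan_sync(
    slots,
    previous_prefix="CORE_PRODUCT_INDEX_MACHINE-A_SITE-A_",
    current_prefix="CORE_PRODUCT_INDEX_MACHINE-B_SITE-B_",
)
for key, value in updates.items():
    print(f"set {key} -> {value}")
```

An empty result means the slots already agree and nothing needs to change. Otherwise, each entry corresponds to one update of the Value column for the given Key in the Properties table.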
Note: the ContentSearch.Solr.EnforceAliasCreation setting (used by the SwitchOnRebuildSolrCloudSearchIndex class) should be False. If this setting is True, Sitecore creates the aliases every time the CM instance restarts.
I hope you will find my experience useful. That’s it for today.