Sitecore Quicktip: Managing Excessive Interactions in the Collection Database
Sitecore XP is a complex system designed for personalization, analytics, and a whole range of marketing capabilities. However, it can become unhealthy if not properly managed. One of the primary data points Sitecore leverages in xConnect/xDB is the interaction. A single contact can generate an excessive number of interactions, and while this is usually seen with bot-like or synthetic activity, it's imperative to check the data within the Shard databases to determine 1) whether you have contacts with excessive interactions and 2) what to do if you find them.
Why Does This Matter?
Taken straight from the Sitecore documentation, excessive interactions cause significant harm to both the xConnect roles and the backend Collection database (aka, the Shards). To make matters worse, because several databases and roles interact with each other, this bottleneck can cause a host of issues, to the point of destabilizing an environment:
A contact with an excessive number of interactions can cause high resource consumption on the xConnect instance and SQL databases. This can result in HTTP Server errors because the requests take a long time to execute… Contacts with several thousands of interactions must be checked further since they might cause harm to the system
https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0417184
How Do I Find Excessive Interactions?
Get your SQL query chops ready and fire up SSMS, because there are a few key queries to run against the Shard databases to surface if and where excessive interactions are occurring.
Step 1 is to locate the data. This is achieved by getting the list of contacts first and then, for contacts with a large number of interactions, drilling into the interaction data. Per the Sitecore documentation, you need to run the following against each Shard, as Sitecore and the Shard Map Manager distribute contacts/interactions across the Shards (thus Sharding):
SELECT TOP (100) ContactId, COUNT(ContactId) as Count
FROM [xdb_collection].[Interactions]
GROUP BY ContactId
ORDER BY Count DESC
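Because the query runs once per Shard, the per-Shard results have to be combined before the worst offenders across the whole Collection database are visible. Below is a minimal Python sketch of that merge, assuming each result set has been exported as (ContactId, Count) pairs. The function name `merge_shard_results` and the sample IDs are hypothetical and not part of Sitecore.

```python
from collections import Counter


def merge_shard_results(shard_results, top_n=100):
    """Combine per-Shard (contact_id, count) rows into one ranked list.

    shard_results: iterable of result sets, one per Shard, each an
    iterable of (contact_id, interaction_count) pairs.
    """
    totals = Counter()
    for rows in shard_results:
        for contact_id, count in rows:
            # Normalise GUID casing; counts are summed in case the same
            # ID appears in more than one export.
            totals[contact_id.lower()] += count
    # Highest counts first; ties broken by contact ID for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical exports from two Shards.
shard0 = [
    ("AAAA0000-0000-0000-0000-000000000001", 12500),
    ("aaaa0000-0000-0000-0000-000000000002", 40),
]
shard1 = [("bbbb0000-0000-0000-0000-000000000003", 3100)]

ranked = merge_shard_results([shard0, shard1], top_n=2)
print(ranked)
```

The ranked list can then feed the per-contact query, starting with the contacts at the top.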
Once you have a consolidated list of contacts, you can query the interaction data for any contact that shows a large number of interactions. This is done per contact, so the query would look something like the following:
SELECT TOP (100) *
FROM [xdb_collection].[Interactions]
WHERE ContactId = '6ffc58de-6c56-0000-0000-05d6639738da'
Both of these examples are taken directly from the Sitecore documentation at: https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0417184
Analyzing the Data
Once you have the interaction data, the next step is to analyze it. This is where you would evaluate the UserAgent to ensure it's not a bot (and configure robot detection management if the activity is excessive). You will also want to evaluate the Events to check the pages viewed and any subsequent API calls.
It's also critical to understand whether the contact is a known individual or anonymous. This is where investigating items such as the Contact Identifier, Facets, and IP Addresses comes into play.
The methods to analyze the data, including queries, can be found at: https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0417184
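As a rough illustration of the UserAgent check, the Python sketch below estimates what share of a contact's interactions look automated. It assumes the UserAgent values have been exported from the interaction rows. The marker list and function names are hypothetical, and simple substring matching is only a heuristic; it does not replace Sitecore's robot detection.

```python
# Hypothetical substrings that often appear in automated user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "headless", "python-requests", "curl")


def looks_like_bot(user_agent):
    """Return True when the user agent is empty or contains a known marker."""
    if not user_agent or not user_agent.strip():
        return True  # a missing user agent is treated as suspicious here
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def bot_share(user_agents):
    """Fraction of interactions whose user agent looks automated."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(looks_like_bot(ua) for ua in agents) / len(agents)


# Hypothetical UserAgent values for one contact.
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "curl/8.4.0",
    "",
]
print(bot_share(sample))
```

A contact whose share is close to 1.0 is a strong candidate for removal, while a low share suggests looking at the Events and identifiers before deciding.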
What to Do About Excessive Interactions
Excessive interactions are likely not adding value and are directly hurting performance. Of note, marketing teams depend on “information”, whereas interactions are just “data”. Irrelevant or bad data obscures the information teams need to make proper decisions. Sitecore is direct on this point:
We recommend you to remove contacts with an excessive number of interactions. They are often robots, which do not provide value in the reports and cause extra load on the system.
https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0417184
There are two strong options for removing unwanted interactions: the API route or the ADM tool. In either case, it is critical to test this in a lower environment and to be aware that removing interactions can be a time-consuming and performance-impacting activity. For example, after ADM removes interactions, it also triggers an xDB index rebuild, so you may want to consider upscaling the instance running ADM, the Shards (especially as they are I/O intensive), and Solr while you perform removal operations.
Here is the link to the API method (Note: available in Sitecore 9.2+ only): https://doc.sitecore.com/xp/en/developers/93/sitecore-experience-platform/deleting-contacts-and-interactions-from-the-xdb.html
Here is the link to the ADM tool (Note: the ADM module is provided as is and Sitecore Support does not cover issues): https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0692337
Preventing Excessive Interactions
Instead of being in constant cleanup mode, there are a few options to try to prevent excessive interactions:
- Implement a Web Application Firewall (WAF) and block bots
- Use Robot detection to exclude robot user agents: https://doc.sitecore.com/xp/en/developers/93/sitecore-experience-platform/configure-robot-detection-functionality.html
- Remove API tracking from API Controllers by adding a call to the Tracker.Current.CurrentPage.Cancel() method in the controller(s) in question
- Be aware of when penetration testing activities occur and quickly remove synthetic interactions using one of the methods in the “What to Do About Excessive Interactions” section
- Routinely monitor your analytics profile and check contacts/interactions to determine if any excessive interactions are creeping in
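Routine monitoring can be sketched as comparing the per-contact counts from two monitoring runs and flagging contacts that are both large and growing quickly. The Python below is a minimal sketch, assuming the counts come from the per-Shard query shown earlier. The function name, thresholds, and contact IDs are illustrative assumptions, not Sitecore recommendations.

```python
def fast_growing_contacts(previous, current, min_total=1000, min_growth=500):
    """Return contacts whose interaction count is high and rising quickly.

    previous/current: dicts mapping contact_id -> interaction count from
    two monitoring runs. Both thresholds are illustrative assumptions.
    """
    flagged = []
    for contact_id, total in current.items():
        # Contacts missing from the earlier run are treated as starting at 0.
        growth = total - previous.get(contact_id, 0)
        if total >= min_total and growth >= min_growth:
            flagged.append((contact_id, total, growth))
    # Largest growth first; ties broken by contact ID for a stable order.
    return sorted(flagged, key=lambda row: (-row[2], row[0]))


# Hypothetical counts from two consecutive monitoring runs.
last_week = {"contact-a": 900, "contact-b": 4000, "contact-c": 10}
this_week = {"contact-a": 2600, "contact-b": 4050, "contact-c": 15, "contact-d": 1200}

flagged = fast_growing_contacts(last_week, this_week)
print(flagged)
```

Contacts that are large but stable (like the hypothetical `contact-b`) are not flagged here; they would still surface in the top-contacts query and can be reviewed separately.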