The Scalability Engineers DBA team was engaged with one of our customers who had a peculiar problem with regards to their user’s experiencing slowness in highly critical OLTP application quite frequently.

The client environment has Always ON set up with the below configurations.
• 1 Primary
• 3 Secondaries

One of the secondary replica is a local secondary with always on configured with synchronous mode and remaining are remote secondary with asynchronous mode as the configuration.

While investigating the issue, DBA team analysed and found that there was a lot of blocking on the primary replica where application workload was running.

Team found lot of blocking in the highly OLTP database with the wait type as pagelatch_EX, pagelatch_SH etc i.e. high latching. The blocking occurred between background system process and application process. The DBA team initially had to dig deep to get an understanding about this blocking, after a lot of analytics done by the teams we were able to isolate the root cause of the problem.

The DBA team found out that the ghost record cleanup process was blocked by read workload of secondary replica , which eventually slowed down the application queries and created a huge blocking chain. After careful consideration the team proceeded and disabled the readable property of secondary replicas resulting in issue got fixed and application performance came back to normal.

As we know the read operations are mapped to snapshot isolation transaction level, the cleanup of ghost records on the primary replica can be blocked by transactions on one or more secondary replicas. The ghost record cleanup task will automatically clean up the ghost records for disk-based tables on the primary replica when they are no longer needed by any secondary replica. This is like what is done when you run transaction(s) on the primary replica. In the extreme case on the secondary database, you will need to kill a long running read-query that is blocking the ghost cleanup.