Situation: High CPU Utilization, High Response time, slow queries (Readable Secondaries)

Situation: High CPU Utilization, High Response time, slow queries (Readable Secondaries)

Task: Identify the issue at hands and resolve.


Action:
Started investigating the issue. Identified the cause of high cpu utilization. Particular query that used to go over a particular queue table was taking a lot longer and was inducing a very high number of logical reads. It was really confounding to see that the table had just about a handful of rows, however the number of pages in the table ran into hundreds of thousands. The number of logical reads was very high.

There were long running queries that were found to be running on the readable secondaries. Due to the readable secondaries, Snapshot isolation gets activated under the covers.

As a part of snapshot isolation, there's a 14 byte pointer that gets added to each row and at the same time, all the row versions prior to the start of the query are preserved within the tempdb version store to ensure that if the query needs to access the row version at the start time of the transaction then it should be able to do so. This by design does not let the ghost record cleanup thread to work and essentially while the rows were getting deleted in the queue table, but they were not actually being deleted from the data pages allocated to the table. The size of the table continued to grow inspite of the rows being marked as ghosted.

The team was asked to disable the readable secondaries and thereafter the clustered index on the affected queue table was rebuilt.
The issue was fixed and later on we had a call with MS and MS acknowledged this to be the way things were designed. According to them this was not a flaw but a side effect of using the secondaries for reads.


Result:
The issue was fixed and the client appreciated the effort.