Spinlock Detection and Correction

The first and most basic question is “What is a spinlock?”

The following is from The August 1995 issue of the Sybase Technical News

(Vol 4, Number 3): That particular article used the term ‘SQL Server’ to refer to Sybase ASE.

Spinlock Defined

The lowest level of synchronization in a multiprocessor SQL Server is the spinlock. SQL Server uses the spinlock to synchronize access to the critical data used to control semaphores, page locks, and other thread management tasks.

NOTE: Spinlocks and semaphores are only used in multiprocessor servers, since otherwise a process is not allowed to schedule out (or yield) while holding one.

In general, a spinlock works like this:

1. A thread or engine obtains a spinlock before attempting to change data.

2. While that thread or engine has the spinlock, if another thread or engine attempts to obtain the same spinlock, that thread “spins” on the lock until the lock is released.

3. Once it is safe for others to access or change the data, the original thread releases the spinlock.

A spinlock might perform code protection or data protection. For example, SQL Server always uses one particular spinlock for modifying an object. A spinlock protecting code protects only a very small fragment of the code, allowing it to be executed only in a single-threaded fashion. Spinlocks are the most basic form of locking protocol, usually requiring hardware support, an atomic instruction across the bus.

Spinlocks are used in favor of more complex semaphores or page style locks to control access to frequently changed data structures because they are so basic; it doesn’t take long to acquire a spinlock.

 

This case study was performed using SAP Sybase ASE version 15.7 running on IBM AIX version 7. At the beginning of the study, the server was configured in threaded mode with 26 threads. Threads in 15.7 are roughly synonymous with online engines in prior ASE versions. There were several data caches. One for the temporary databases, another contained most system tables, yet another for the various user database logs, and a very large default data cache for everything else.

In this case the customer was experiencing times when all 26 threads were running at 100% utilization. The duration of these spikes could be 30 minutes or longer. The users of the system were seeing very long response times. Long, to the “Nothing is getting done” point.

One rule of thumb is “if response time is slow and CPU utilization is below 90% get faster CPUs, if above 90%, add CPUs” or threads. That seemed simple enough. Since there were plenty of cores on this hardware, add some threads. Oops, that didn’t help. If anything, the problem got worse. Perhaps some more detailed analysis was needed.

It was observed that during the times of 100% CPU utilization the physical I/O dropped to nearly zero. There was the clue that we needed. This combination of symptoms almost always indicates the problem is spinlock contention.

Monitoring spinlocks consists of four major components. These components are spins, waits, grabs and the contention itself (waits/grabs). I have read a number of articles that have suggested the threshold pointing to spinlock contention being a problem was 10%. That is to say a ratio of 10 waits for every 100 grabs. There is a lot of truth in this thumb also. Relative to this rule, the customer appeared to be in pretty good shape. The only serious problem appeared to be in the procedure cache. However, don’t ignore the spins themselves. Remember that it is the spinning that is using the CPU. And there was a whole lot of spinning going on, especially in default data cache and procedure cache.

So here is what was done to correct the problem.

First we added 8 hot caches. We based the objects to be moved from default data cache to the hot caches by monitoring monOpenObjectActivity looking for objects that were performing the highest number of logical reads. Next we split the tempdb cache into caches that contained, at most 2 temporary databases. We also added additional syslogs caches. All the caches were built with 32 cache partitions since the number of threads had been, after these changes, successfully increased from 26 to 30 without adverse effect. Finally, we executed dbcc proc_free whenever CPU utilization exceeded 70% to address procedure cache spinning.

The additional data caches and partitions worked because of two factors. The two primary times the cache spinlocks get grabbed are when:

1) ASE is reading a page from disk into cache.

2) ASE is moving a page in cache to the head of the MRU chain when it is accessed.

 

So, creating more caches will simply increase the number of spinlocks and reduces the possibility of contention. There is one spinlock for each cache partition; which is why increasing the number of partitions can also help (more spinlocks = less chance of contention on a single spinlock).

Of course, monitoring is an ongoing project. To insure that decisions we made were right and that over time they would continue to be right we put in place additional MDA table monitoring. ASE 15.7 contains a table that monitors spinlocks (monSpinlockActivity). Currently the activity being harvested from this table is activity on the procedure cache and the data caches but could be expanded to all the spinlock activity captured in this MDA table. Additional tables monitored are the monOpenObjectActivity in conjunction with monCachedObject. The idea behind monitoring these tables is the following. If spinning is high on a particular cache and two or more tables assigned to that cache are among the highest logical reads during the period that might be an indication that those tables need to be in separate caches.

Many thanks go to Dave Putz of SAP Sybase, Peter Barnett of IBM and his team for their assistance in this case.

The below code is provided as an example only. There is no warranty, either real or implied by the author, Dobler Consulting, SAP Sybase or IBM. The data cache and spinlock activity being collect is entirely dependent on a site’s naming conventions. MDA tables, for the most part contain counters that are incremented from the time the server starts. Most of the counters are integer and can hold a number as large as roughly 2 billion. The important thing is how many of the thing being counted occurs in a certain period of time. In the case of spinlocks, because of their very short duration, the counts get very large very quickly. The code below stores the values of selective counters at the beginning of the sample, sleeps for one minute then captures the counts again in temporary tables. The extraction of the values simply subtracts the beginning counts from the ending counts to show the delta.

/* Syntax to create the tempdb tables for begin and end sample capture /
USE tempdb
go
CREATE TABLE guest.monOpenObjectActivity1
(
DATE datetime NOT NULL,
CacheName varchar(30) NULL,
DBName varchar(30) NULL,
TableName varchar(30) NULL,
ObjectName varchar(30) NULL,
IndexID int NOT NULL,
LogicalReads int NULL,
RowsInserted int NULL,
RowsUpdated int NULL,
RowsDeleted int NULL
)
LOCK DATAPAGES
go
IF OBJECT_ID(‘guest.monOpenObjectActivity1’) IS NOT NULL
PRINT ‘<<< CREATED TABLE guest.monOpenObjectActivity1 >>>’
ELSE
PRINT ‘<<< FAILED CREATING TABLE guest.monOpenObjectActivity1 >>>’
go
CREATE TABLE guest.monOpenObjectActivity2
(
DATE datetime NOT NULL,
CacheName varchar(30) NULL,
DBName varchar(30) NULL,
TableName varchar(30) NULL,
ObjectName varchar(30) NULL,
IndexID int NOT NULL,
LogicalReads int NULL,
RowsInserted int NULL,
RowsUpdated int NULL,
RowsDeleted int NULL
)
LOCK DATAPAGES
go
IF OBJECT_ID(‘guest.monOpenObjectActivity2’) IS NOT NULL
PRINT ‘<<< CREATED TABLE guest.monOpenObjectActivity2 >>>’
ELSE
PRINT ‘<<< FAILED CREATING TABLE guest.monOpenObjectActivity2 >>>’
go
CREATE TABLE guest.monSpinlockActivity1
(
DATE datetime NOT NULL,
SpinlockName varchar(255) NULL,
Waits bigint NOT NULL,
Grabs bigint NOT NULL,
Spins bigint NOT NULL
)
LOCK DATAPAGES
go
IF OBJECT_ID(‘guest.monSpinlockActivity1’) IS NOT NULL
PRINT ‘<<< CREATED TABLE guest.monSpinlockActivity1 >>>’
ELSE
PRINT ‘<<< FAILED CREATING TABLE guest.monSpinlockActivity1 >>>’
go
CREATE TABLE guest.monSpinlockActivity2
(
DATE datetime NOT NULL,
SpinlockName varchar(255) NULL,
Waits bigint NOT NULL,
Grabs bigint NOT NULL,
Spins bigint NOT NULL
)
LOCK DATAPAGES
go
IF OBJECT_ID(‘guest.monSpinlockActivity2’) IS NOT NULL
PRINT ‘<<< CREATED TABLE guest.monSpinlockActivity2 >>>’
ELSE
PRINT ‘<<< FAILED CREATING TABLE guest.monSpinlockActivity2 >>>’
go

/
Syntax to collect the data into the tempdb tables /
set rowcount 0
set nocount on
declare @exec_count int
/ truncate the prior run /
truncate table tempdb.guest.monOpenObjectActivity1
truncate table tempdb.guest.monOpenObjectActivity2
truncate table tempdb.guest.monSpinlockActivity1
truncate table tempdb.guest.monSpinlockActivity2

insert into tempdb.guest.monSpinlockActivity1 select getdate() as DATE, SpinlockName, Waits, Grabs, Spins
from master..monSpinlockActivity
where SpinlockName like ‘%hot%’
or SpinlockName = ‘default data cache’
or SpinlockName like ‘%proccache_spin’

insert into tempdb.guest.monOpenObjectActivity1
select getdate() as DATE, a.CacheName, a.DBName, object_name(a.ObjectID,a.DBID)
, a.ObjectName, a.IndexID, b.LogicalReads, b.RowsInserted, b.RowsUpdated, b.RowsDeleted
from master..monCachedObject a, master..monOpenObjectActivity b
where a.DBName like ‘QUAD%’
and a.DBName = b.DBName
and object_name(a.ObjectID,a.DBID) = object_name(b.ObjectID,b.DBID)
and a.ObjectName = b.ObjectName
and a.IndexID = b.IndexID

waitfor delay ’00:01:00′

insert into tempdb.guest.monSpinlockActivity2 select getdate() as DATE, SpinlockName, Waits, Grabs, Spins
from master..monSpinlockActivity
where SpinlockName like ‘%hot%’
or SpinlockName = ‘default data cache’
or SpinlockName like ‘%proccache_spin’

insert into tempdb.guest.monOpenObjectActivity2
select getdate() as DATE, a.CacheName, a.DBName, object_name(a.ObjectID,a.DBID)
, a.ObjectName, a.IndexID, b.LogicalReads, b.RowsInserted, b.RowsUpdated, b.RowsDeleted
from master..monCachedObject a, master..monOpenObjectActivity b
where a.DBName like ‘QUAD%’
and a.DBName = b.DBName
and object_name(a.ObjectID,a.DBID) = object_name(b.ObjectID,b.DBID)
and a.ObjectName = b.ObjectName
and a.IndexID = b.IndexID

/
Syntax to report on the 1 minute period */
set rowcount 0
set nocount off
select a.DATE, a.SpinlockName, a.Waits – b.Waits as Waits
, a.Grabs- b.Grabs as Grabs
, a.Spins – b.Spins as Spins
from tempdb.guest.monSpinlockActivity1 b, tempdb.guest.monSpinlockActivity2 a
Where a.SpinlockName = b.SpinlockName

select a.DATE, a.CacheName, a.DBName, a.TableName
, a.ObjectName, a.IndexID, a.LogicalReads – b.LogicalReads as LobicalReads
, a.RowsInserted – b.RowsInserted as RowsInserted
, a.RowsUpdated – b.RowsUpdated as RowsUpdated
, a.RowsDeleted – b.RowsDeleted as RowsDeleted
from tempdb.guest.monOpenObjectActivity2 a, tempdb.guest.monOpenObjectActivity1 b
where a.DBName = b.DBName
and a.TableName = b.TableName
and a.ObjectName = b.ObjectName
and a.IndexID = b.IndexID
Order by a.LogicalReads – b.LogicalReads desc

 

Leave a Comment

Your email address will not be published. Required fields are marked *