I recently ran into an interesting situation with SQL Server, specifically SQL Server 2012, involving "Lock Escalation" that usually resulted in deadlock scenarios. My application is responsible for parsing very large binary/text files and inserting the data into SQL tables. My original thought was to handle this large dataset insertion in a single transaction so that data consistency would be achieved. The application has two major components; one of them is a web application that accesses some of the tables used in the bulk insertion I mentioned above for basic verification and validation. When I was running some test cases, I noticed that my application became very unresponsive and eventually timed out once the configured time-out values were reached.
SQL Server provides some system views that maintain information about the locks it holds and about any threads suspended because of those locks. You can query these views in the master database when debugging deadlock cases; a couple of example queries follow the list below.
- sys.dm_os_waiting_tasks - Returns information about the wait queue of tasks that are waiting on some resource - more detailed information can be found at http://technet.microsoft.com/en-us/library/ms188743.aspx
- sys.dm_os_wait_stats - Returns information about all the waits encountered by threads that executed. You can use this aggregated view to diagnose performance issues with SQL Server and also with specific queries and batches. - more information can be found at http://technet.microsoft.com/en-us/library/ms179984.aspx
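As a quick illustration, queries along these lines (the join conditions and filters are my own, not taken from either article) show who is currently blocked and by whom, and which wait types have accumulated the most time since the last restart:

```sql
-- Who is currently waiting, and which session is blocking them?
SELECT wt.session_id,
       wt.blocking_session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM   sys.dm_os_waiting_tasks AS wt
WHERE  wt.blocking_session_id IS NOT NULL;

-- Which wait types have cost the most time overall (aggregated since restart)?
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM   sys.dm_os_wait_stats
ORDER  BY wait_time_ms DESC;
```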
Quick research online indicated that keeping transactions open for lengthy durations could cause such scenarios, even though I had specifically requested row-level locking to prevent my insertion process from locking entire tables. It turns out SQL Server has a mechanism called "Lock Escalation" that can convert a large number of row or page locks into a single table lock when it decides that is cheaper than tracking them all. Detailed information about "Lock Escalation" can be found here.
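For reference, escalation behavior can also be inspected and controlled per table (SQL Server 2008 and later); the table name below is just a hypothetical stand-in, and disabling escalation means the server has to keep many more individual locks around:

```sql
-- Check the current escalation setting for a table (name is hypothetical).
SELECT name, lock_escalation_desc
FROM   sys.tables
WHERE  name = 'ImportedRecords';

-- Escalation can be disabled per table, at the cost of holding far more locks.
ALTER TABLE dbo.ImportedRecords SET (LOCK_ESCALATION = DISABLE);
```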
Based on my research, the first thing I did was to introduce a commit interval. Instead of keeping one transaction open for the whole run, I kept my transactions smaller and committed at certain intervals: whenever my buffer reached 1024 items, I flushed it and committed the transaction, thinking that would help with the situation. It did not help at all. My web application ran into the same issues, with threads suspended and waiting on other database resources to be released. My test case involved inserting over 1.4 million rows in about 30 minutes, so I considered 1024 items a fairly small batch. I kept reducing the interval thinking it would help; I tried 512, 256, and eventually no explicit transaction at all. Nothing helped.
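The batching itself happened in my application code, but the commit-interval idea looks roughly like this in T-SQL terms; the table and column names here are made up purely for illustration:

```sql
-- Sketch of committing in batches instead of holding one long transaction.
-- dbo.StagingRows / dbo.ImportedRecords and their columns are hypothetical.
DECLARE @batchSize int = 1024;
DECLARE @offset    int = 0;
DECLARE @copied    int = 1;

WHILE @copied > 0
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO dbo.ImportedRecords WITH (ROWLOCK) (RecordId, Payload)
    SELECT RecordId, Payload
    FROM   dbo.StagingRows
    ORDER  BY RecordId
    OFFSET @offset ROWS FETCH NEXT @batchSize ROWS ONLY;

    SET @copied = @@ROWCOUNT;

    COMMIT TRANSACTION;          -- release locks after every batch

    SET @offset += @batchSize;
END
```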
I then came across Snapshot Isolation. "SNAPSHOT isolation specifies that data read within a transaction will never reflect changes made by other simultaneous transactions. The transaction uses the data row versions that exist when the transaction begins." Instead of taking locks, Snapshot Isolation uses row versioning, with the row versions stored in tempdb. You have to turn Snapshot Isolation on at the database level in order to use this functionality. I went ahead and turned it on, and tested my application. Since nothing was getting locked, my web application's requests were no longer blocked by SQL Server. I am still verifying whether this is a good configuration change to make and what other effects it may have on my application. I understand there will be some performance cost due to versioning and that tempdb might grow large.
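Turning it on and using it looks something like this; the database name is just a placeholder for my actual database:

```sql
-- Enable row versioning for snapshot transactions at the database level.
-- 'ImportDb' is a placeholder name.
ALTER DATABASE ImportDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- On the reading side (e.g. the web application's queries), opt into snapshot isolation:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- Reads the row versions that existed when the transaction began,
    -- so it does not block on the bulk insert's in-flight changes.
    SELECT COUNT(*) FROM dbo.ImportedRecords;
COMMIT TRANSACTION;
```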
If you have experience with Snapshot Isolation, I'd love to hear about it.