SQL Query Performance - Do you feel dirty? (Dirty Reads)
Ok, so I'm feeling guilty that I was in such a rush to get my blog, and then I failed to posting anything for almost a week. Well, I thought that rather than let another week go by I ought to post something of use. But what to post? It just so happens that I have been working on a document to share some of my experience with optimizing TSQL queries (MS SQL 2000 that is). I know that in the spirit of SQL 2005, it may seem blasphemy to share knowledge about something that was so yesterday as SQL 2000. But, that's what I have. And there you are.
Today's installment will be a tidbit about optimizing queries specifically using locking hints. Depending on the shape and use of the database, I have used this technique to reduce 30-second queries to 2 seconds. Other times, it has made no noticable difference at all. It all depends on the kind of volume and types of transactions hitting the tables being queried.
I'll follow this post up later with more of the same subject. Here you go:
---------------------------------------------------------------------------------------------------
Microsoft SQL has several levels of lock granularity. The server does its best to “pessimistically” control concurrency with locks. This means that the server assumes the worst when it comes to possible concurrency collisions, and protects you from yourself.
Locks can be acquired on rows, pages, keys, ranges of keys, indexes, tables, or databases. SQL Server dynamically determines the appropriate level at which to place locks for each Transact-SQL statement. The level at which locks are acquired can vary for different objects referenced by the same query; for example one table may be very small and have a table lock applied, while another, larger table may have row locks applied. The level at which locks are applied does not have to be specified by users and needs no configuration by administrators. Each instance of SQL Server ensures that locks granted at one level of granularity respect locks granted at another level. [SQL 2000 online help]
While the server assumes the worst when trying to protect us from ourselves, it also delivers less performance while doing so, and in some circumstances may have dire performance consequences.
Remember Optimization Axiom #2: “Locking is not your friend.” (when optimizing queries)
Example 1:
Imagine that your program runs the following summary query to display revenue for 2005:
SELECT DivCode, DeptCode, SUM(Revenue) AS RevenueSubTotal
FROM AccountsJournal
WHERE ClosedDate BETWEEN ‘2005-01-01’ AND ‘2006-01-01’
GROUP BY DivCode, DeptCode
Let’s say that this AccountsJournal table contains 1,500,000 rows of data, and already has an index for DivCode+DeptCode as well as a separate index for ClosedDate.
Any problems with this? Indexes are in order. The query is optimized. But there is a problem. A big one.
This had better not be running on the main production database during business hours. It might run in less than 3 minutes, but during that time, the SQL server’s locking traffic cop is deciding the fate of every production user of the AccountsJournal table, and whether or not they’ll be allowed to post their updates in a timely manner. Almost as bad, is the fact that this query will also be forced to wait while other update queries finish running. Do you feel lucky?
Simply put, a dirty read is where a query reads data from the database without lock protection. Without lock protection, you cannot be guaranteed that the data isn’t changing during the time that the query is running.
SQL Server assumes that you need to be protected from concurrency issues even when you’re running a SELECT statement. What you need to decide is whether or not you agree with it’s assessment of the danger. Ask yourself these questions:
- While my SELECT query is running, does it matter whether or not someone else makes a change to a record or two? Will it affect my outcome in a meaningful way?
- Are any of the tables in my query in danger of being affected by a mass-update while the query is in the process of running? If so, would it be bad if my query only saw part of the mass-update?
If the answer is, “No, my query doesn’t mind,” then you need to turn off the locking for your query. This is called a “dirty read”. You’re willing to accept less than a pristine and untouched snapshot of data.
A better question is, “when shouldn’t a dirty read be used?” It is the author’s opinion and experience that the best practice for database developers is to use dirty reads as the rule, rather than the exception. Let the hate mail begin...
In fact, at two telecommunications companies in the author’s past, it was a serious offense for developers to write any query without enabling dirty reads (without strong justification). At In Touch Communications, in 2001, the development staff had no SQL standards. The MS SQL 2000 database that powered their Siebel CRM system was deadlocking often, and the Customer Service and Sales departments would crash in Siebel at least twice per week. In 90% of the cases, the issue turned out to be long-running queries that had been hand written by developers for various purposes. Some were queries for reports. Others were custom scheduled SQL jobs. All of them used the default locking of SQL server.
Once the In Touch Communications I.T. staff migrated all appropriate stored procedures (99%) to use dirty-reads, the deadlocks went away. Siebel stopped crashing. Later, I.T. afforded a separate SQL server to run their reports, which further increased system performance (also a best practice).
It is rare that a SELECT statement is negatively impacted by using a dirty read. The only cases the author has run into have been in stored procedures that run one or more SELECT statements to gather information just prior to a mass update or delete. In these rare cases, a qualified DBA should consider the data dependencies, and decide whether or not dirty reads are safe for that stored procedure. If locking is allowed on a long-running query (15+ seconds), then the DBA must consider how this query will affect the rest of the system.
Best practice rules:
- All SELECT queries should use the WITH (NOLOCK) directive for every table involved in the query.
- If it is not practical to use the WITH (NOLOCK) directive on every query, then upon opening a database connection, enable dirty reads as the default behavior for the entire database connection using:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Example 2:
To change our previous revenue query to use dirty reads:
SELECT DivCode, DeptCode, SUM(Revenue) AS RevenueSubTotal
FROM AccountsJournal WITH (NOLOCK)
WHERE ClosedDate BETWEEN ‘2005-01-01’ AND ‘2006-01-01’
GROUP BY DivCode, DeptCode
Example 3:
Here’s an example of WITH (NOLOCK) on a more complex query:
SELECT DISTINCT
P.Handle, SM.SeatDescription
, PS.PreFlopSeen, PS.FlopSeen
, PP.Invested, PP.Won, PP.MonetaryProfit
FROM tblPlayer P WITH (NOLOCK)
INNER JOIN tblPlayerStats PS WITH (NOLOCK)
ON P.PlayerID = PS.PlayerID
INNER JOIN tblPlayerHand PH WITH (NOLOCK)
ON P.PlayerID = PH.PlayerID
INNER JOIN dbo.tfn_Hand_PlayerCount() HPC
ON PH.HandID = HPC.HandID
LEFT JOIN (
SELECT SM.PlayerCount, SM.SeatIndex
, SM.SeatID, ST.SeatDescription
FROM tblSeatMap SM WITH (NOLOCK)
INNER JOIN tblSeat ST WITH (NOLOCK)
ON SM.SeatID = ST.SeatID
WHERE ST.StreetID = 2
) SM
ON SM.PlayerCount = HPC.PlayerCount
AND SM.SeatIndex = PH.PreFlopSeatIndex
LEFT JOIN dbo.tfn_PlayerProfit_Summary() PP
ON P.PlayerID = PP.PlayerID
WHERE NOT SM.SeatID IS NULL
Notice that both User Defined Functions in Example 3 do not have a WITH(NOLOCK) directive. More on that later.
Until next time...
Phil Scott
www.neudesic.com
phil.scott@neudesic.com