Dealing with SQL Performance
It seems all too often that the answer to dealing with SQL is now to use an Object-Relational Mapping (ORM) tool to avoid SQL like the plague when talking to .Net developers. Some of the reasons for
(for .Net developers)
Introduction
I use Entity Framework… what do I care about SQL Performance?
It seems all too often that the answer to dealing with SQL is now to use an Object-Relational Mapping (ORM) tool to avoid SQL like the plague when talking to .Net developers. Some of the reasons for this are that SQL contains black magic or because Entity Framework handles it for you so that you don’t have to. And about a million other reasons to not work within SQL itself.
And to be honest, in a lot of cases the ORMs that are widely used will do an excellent job. But there are going to be cases where that ORM falls on its face when you try to create a query and now you are the unfortunate soul who is tasked with resolving the issue.
So let’s be clear before we get started. ORMs aren’t evil and they really do have a place within .Net applications but sometimes they can prove to be a square peg in a round hole. So this write up is to add another tool to your tool belt.
ORM Example
We are going to use Entity Framework (EF) for our example because it’s a popular ORM out there so why not?
Setup & Background
We want to pull back all of the information from the database for a person that is currently using the application.
We are using AdventureWorks2014 running on a SQL 2014 server and using .Net 4.6.1 with EF 6.1.3 Code First for this test case. The concepts we will be discussing here will apply to other versions of both .Net, SQL, and EF.
.Net Code
The code seen below seems to be popular in a lot of data access layers that are built in order to make querying objects easier for development teams. The concept being that if you request a person object, you get a fully qualified object so you don’t need to worry about having the context available to get some information about the object in repeat trips to the database. At a price of course which we will go over down below.
Person person;
using (var context = new AdventureWorks2014Context())
{
// Write out the query & parameters so we can view everything if the debugger is attached
if (System.Diagnostics.Debugger.IsAttached) context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
// Include all the possible user information we could ever need
person = context.People
.Include(p => p.Employee)
.Include(p => p.BusinessEntity)
.Include(p => p.BusinessEntityContacts)
.Include(p => p.EmailAddresses)
.Include(p => p.Password)
.Include(p => p.Customers)
.Include(p => p.PersonCreditCards)
.Include(p => p.PersonPhones)
.Single(p => p.LastName == lastName && p.FirstName == firstName);
}
Console.WriteLine($"Person: {person.LastName}, {person.FirstName}");
Console.WriteLine("Phone Numbers:");
foreach (var phone in person.PersonPhones)
Console.WriteLine($"....{phone.PhoneNumber}");
So in the code above, we will query using the first and last name of a person and get back all the relevant data we could need about that person.
Generated SQL Query
Here is a general idea of the query that is generated:
SELECT
[UnionAll4].[BusinessEntityID] AS [C1],
[UnionAll4].[BusinessEntityID1] AS [C2],
[UnionAll4].[BusinessEntityID2] AS [C3],
[UnionAll4].[BusinessEntityID3] AS [C4],
[UnionAll4].[BusinessEntityID4] AS [C5],
[UnionAll4].[BusinessEntityID5] AS [C6],
[UnionAll4].[BusinessEntityID6] AS [C7],
[UnionAll4].[BusinessEntityID7] AS [C8],
[UnionAll4].[BusinessEntityID8] AS [C9],
[UnionAll4].[PersonType] AS [C10],
[UnionAll4].[NameStyle] AS [C11],
[UnionAll4].[Title] AS [C12],
[UnionAll4].[FirstName] AS [C13],
...removed for brevity...
FROM (SELECT TOP (2) [Extent33].[BusinessEntityID] AS [BusinessEntityID34], [Extent33].[PersonType] AS [PersonType5], [Extent33].[NameStyle] AS [NameStyle5], [Extent33].[Title] AS [Title5], [Extent33].[FirstName] AS [FirstName5], [Extent33].[MiddleName] AS [MiddleName5], [Extent33].[LastName] AS [LastName5], [Extent33].[Suffix] AS [Suffix5], [Extent33].[EmailPromotion] AS [EmailPromotion5], [Extent33].[AdditionalContactInfo] AS [AdditionalContactInfo5], [Extent33].[Demographics] AS [Demographics5], [Extent33].[rowguid] AS [rowguid17], [Extent33].[ModifiedDate] AS [ModifiedDate17], [Extent34].[BusinessEntityID] AS [BusinessEntityID29], [Extent34].[NationalIDNumber] AS [NationalIDNumber5], [Extent34].[LoginID] AS [LoginID5], [Extent34].[OrganizationLevel] AS [OrganizationLevel5], [Extent34].[JobTitle] AS [JobTitle5], [Extent34].[BirthDate] AS [BirthDate5], [Extent34].[MaritalStatus] AS [MaritalStatus5], [Extent34].[Gender] AS [Gender5], [Extent34].[HireDate] AS [HireDate5], [Extent34].[SalariedFlag] AS [SalariedFlag5], [Extent34].[VacationHours] AS [VacationHours5], [Extent34].[SickLeaveHours] AS [SickLeaveHours5], [Extent34].[CurrentFlag] AS [CurrentFlag5], [Extent34].[rowguid] AS [rowguid18], [Extent34].[ModifiedDate] AS [ModifiedDate18], [Extent35].[BusinessEntityID] AS [BusinessEntityID30], [Extent35].[PasswordHash] AS [PasswordHash5], [Extent35].[PasswordSalt] AS [PasswordSalt5], [Extent35].[rowguid] AS [rowguid20], [Extent35].[ModifiedDate] AS [ModifiedDate20], [Join33].[BusinessEntityID35], [Join33].[rowguid19], [Join33].[ModifiedDate19], [Join33].[BusinessEntityID33], [Join33].[BusinessEntityID31], [Join33].[BusinessEntityID32]
FROM [Person].[Person] AS [Extent33]
LEFT OUTER JOIN [HumanResources].[Employee] AS [Extent34] ON [Extent33].[BusinessEntityID] = [Extent34].[BusinessEntityID]
LEFT OUTER JOIN [Person].[Password] AS [Extent35] ON [Extent33].[BusinessEntityID] = [Extent35].[BusinessEntityID]
LEFT OUTER JOIN (SELECT [Extent36].[BusinessEntityID] AS [BusinessEntityID35], [Extent36].[rowguid] AS [rowguid19], [Extent36].[ModifiedDate] AS [ModifiedDate19], [Join32].[BusinessEntityID33], [Join32].[BusinessEntityID31], [Join32].[BusinessEntityID32]
FROM [Person].[BusinessEntity] AS [Extent36]
INNER JOIN (SELECT [Extent37].[BusinessEntityID] AS [BusinessEntityID33], [Extent38].[BusinessEntityID] AS [BusinessEntityID31], [Extent39].[BusinessEntityID] AS [BusinessEntityID32]
FROM [Person].[Person] AS [Extent37]
LEFT OUTER JOIN [HumanResources].[Employee] AS [Extent38] ON [Extent37].[BusinessEntityID] = [Extent38].[BusinessEntityID]
LEFT OUTER JOIN [Person].[Password] AS [Extent39] ON [Extent37].[BusinessEntityID] = [Extent39].[BusinessEntityID] ) AS [Join32] ON [Extent36].[BusinessEntityID] = [Join32].[BusinessEntityID33] ) AS [Join33] ON [Extent33].[BusinessEntityID] = [Join33].[BusinessEntityID33]
WHERE (N'Griffin' = [Extent33].[LastName]) AND (N'Abigail' = [Extent33].[FirstName]) ) AS [Limit5]
INNER JOIN [Person].[PersonPhone] AS [Extent40] ON [Limit5].[BusinessEntityID34] = [Extent40].[BusinessEntityID]) AS [UnionAll4]
ORDER BY [UnionAll4].[BusinessEntityID] ASC, [UnionAll4].[BusinessEntityID1] ASC, [UnionAll4].[BusinessEntityID2] ASC, [UnionAll4].[BusinessEntityID3] ASC, [UnionAll4].[BusinessEntityID4] ASC, [UnionAll4].[BusinessEntityID5] ASC, [UnionAll4].[BusinessEntityID6] ASC, [UnionAll4].[BusinessEntityID8] ASC, [UnionAll4].[BusinessEntityID10] ASC, [UnionAll4].[C1] ASC
If you wish, you can see the full query (it’s long) your can download it here
So what is the issue with this query?
There are a few things that are wrong with this query (other than it being an unreadable mess thanks to the ORM).
- The physical size of the query is massive for what it is doing.
- The number of records is more than we need.
- The number of tables joined is more than we need.
The Business Scenario & Issue
So as the application has become more popular, the development team has found that the request to pull back the person information is slower than the end user would like.
With a smaller number of users, this wasn’t an issue since the database had plenty of head room available so it would tackle the request and no one had any trouble. But with the addition of more end users, we are now getting to a point where the database is struggling to keep up.
Investigation
So how do we know that this query is the issue?
Before we start performance tuning on the database, it’s very helpful to determine if that really is the bottle neck. If you are able to debug the application locally and reproduce the issue, then that’s fantastic because then you can determine just how much time is being spent executing the query using Intellitrace and other such tools.
Once the database query is identified as a suspect of the performance problems, the next step would be to see what the query is doing. To do this with EF, you can use SQL Profiler which will usually give you the information you require but you will need elevated trace permissions to work with a SQL trace if you wish to use this method. The other issue is that if you have a query that is extremely large (typically anything over 530kb in size) then the profiler GUI will just show “Skipped Records” and you will be stuck. A server side trace is not subject to this GUI limitation so if you run a server side trace then you can get the information using this way.
So if the trace & profiler method isn’t for you or doesn’t work for whatever reason and you are using EF for your ORM then you can use this little bit of code which will output the query to the output window so you can view & re-run it.
if (System.Diagnostics.Debugger.IsAttached)
context.Database.Log = s => System.Diagnostics.Debug.WriteLin
If you put a breakpoint at the point where the EF query runs and you have that debug statement included before you run the query (as shown in the example) then you will see the query output in the Output window.As noted above, there are other tools within Visual Studio that will give you great information around where the pain points in your application are as well such as the Diagnostic Tools window. As you can see in the screenshots, the query that is running to get the person data is taking longer than we would like so that is where we will focus our attention.
Resolution
Once we have identified the culprit for performance woes within a specific part of the application, we are ready to go to the next step and resolve it. In some cases as with this one, there are a few ways to resolve an issue. I will outline them below.
You will want to check the performance of the query or queries using the exeuction plan, SQL Profiler, or another tool of your choosing. You want the overall impact to be as small as possible on the database so that we can speed things up.
Application Updates
With EF, there is likely a way you can restructure the code such that the query that is generated is different & perhaps more performance oriented. In our case we have a few ways we can accomplish this. Two main ways that I typically use would be to limit the data by reducing the object we get back to only the things we need or by extending the context such that the data we need will be lazy loaded as necessary. Each way will have their own pros and cons and depending on the scenario one or the other may be more beneficial for your needs.
Smaller Query
With this method, we have identified that we only need the phone number(s) of a given person so we include that in our original request to the database which still yields a smaller request, only hits the database once, and gives us the data we need. It depends on the circumstance but typically I find that the performance will be better in this method or the method below instead of the large object way of pulling back data.
Person person;
using (var context = new AdventureWorks2014Context())
{
// Write out the query & parameters so we can view everything if the debugger is attached
if (System.Diagnostics.Debugger.IsAttached) context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
person = context.People
.Include(p => p.PersonPhones)
.Single(p => p.LastName == lastName && p.FirstName == firstName);
}
Console.WriteLine($"Person: {person.LastName}, {person.FirstName}");
Console.WriteLine("Phone Numbers:");
foreach (var phone in person.PersonPhones)
Console.WriteLine($"....{phone.PhoneNumber}");
The queries that get generated with this way look like this
exec sp_executesql N'SELECT
[Project2].[BusinessEntityID] AS [BusinessEntityID],
[Project2].[PersonType] AS [PersonType],
[Project2].[NameStyle] AS [NameStyle],
[Project2].[Title] AS [Title],
[Project2].[FirstName] AS [FirstName],
[Project2].[MiddleName] AS [MiddleName],
[Project2].[LastName] AS [LastName],
[Project2].[Suffix] AS [Suffix],
[Project2].[EmailPromotion] AS [EmailPromotion],
[Project2].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Project2].[Demographics] AS [Demographics],
[Project2].[rowguid] AS [rowguid],
[Project2].[ModifiedDate] AS [ModifiedDate],
[Project2].[C1] AS [C1],
[Project2].[BusinessEntityID1] AS [BusinessEntityID1],
[Project2].[PhoneNumber] AS [PhoneNumber],
[Project2].[PhoneNumberTypeID] AS [PhoneNumberTypeID],
[Project2].[ModifiedDate1] AS [ModifiedDate1]
FROM ( SELECT
[Limit1].[BusinessEntityID] AS [BusinessEntityID],
[Limit1].[PersonType] AS [PersonType],
[Limit1].[NameStyle] AS [NameStyle],
[Limit1].[Title] AS [Title],
[Limit1].[FirstName] AS [FirstName],
[Limit1].[MiddleName] AS [MiddleName],
[Limit1].[LastName] AS [LastName],
[Limit1].[Suffix] AS [Suffix],
[Limit1].[EmailPromotion] AS [EmailPromotion],
[Limit1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Limit1].[Demographics] AS [Demographics],
[Limit1].[rowguid] AS [rowguid],
[Limit1].[ModifiedDate] AS [ModifiedDate],
[Extent2].[BusinessEntityID] AS [BusinessEntityID1],
[Extent2].[PhoneNumber] AS [PhoneNumber],
[Extent2].[PhoneNumberTypeID] AS [PhoneNumberTypeID],
[Extent2].[ModifiedDate] AS [ModifiedDate1],
CASE WHEN ([Extent2].[BusinessEntityID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM (SELECT TOP (2)
[Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
WHERE ([Extent1].[LastName] = @p__linq__0) AND ([Extent1].[FirstName] = @p__linq__1) ) AS [Limit1]
LEFT OUTER JOIN [Person].[PersonPhone] AS [Extent2] ON [Limit1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
) AS [Project2]
ORDER BY [Project2].[BusinessEntityID] ASC, [Project2].[C1] ASC',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=N'Griffin',@p__linq__1=N'Abigail'
Multiple Simple Queries
When using this way, we need to leave the context available so that when the application needs the other data, the context is still available for another query to be passed to the database. The queries that get generated using this method are very small and simple however there will be multiple round trips to the database which may not be ideal.
using (var context = new AdventureWorks2014Context())
{
// Write out the query & parameters so we can view everything if the debugger is attached
if (System.Diagnostics.Debugger.IsAttached) context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
// Note that we aren't including any tables in this query
var person = context.People
.Single(p => p.LastName == lastName && p.FirstName == firstName);
// As a result of not including the PersonPhones table, we need to run this part of the logic within the context so that it will not fail since it will make a repeat trip to the database.
Console.WriteLine($"Person: {person.LastName}, {person.FirstName}");
Console.WriteLine("Phone Numbers:");
foreach (var phone in person.PersonPhones)
Console.WriteLine($"....{phone.PhoneNumber}");
}
The queries that get generated with this way look like this
exec sp_executesql N'SELECT TOP (2)
[Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
WHERE ([Extent1].[LastName] = @p__linq__0) AND ([Extent1].[FirstName] = @p__linq__1)',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=N'Griffin',@p__linq__1=N'Abigail'
exec sp_executesql N'SELECT
[Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PhoneNumber] AS [PhoneNumber],
[Extent1].[PhoneNumberTypeID] AS [PhoneNumberTypeID],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[PersonPhone] AS [Extent1]
WHERE [Extent1].[BusinessEntityID] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=3986
Indexes
Let’s assume that the “Multiple Simple Queries” method above was the fastest way to query data. Now you want to take action against the database so that queries like that will be quicker. So now we will take the queries out of the profiler and we will use SQL Server Management Studio so that we can identify bottle necks.
When we run the queries in management studio, we will want to include the Actual Execution Plan so that we can see exactly what it’s doing. And when we do so, we can see that the first query is taking up a majority of the time and there is room for improvement since it’s doing a key lookup whereas the second query is just doing a clustered index seek which is pretty ideal as far as queries are concerned.
The first query has a key lookup so we will want to evaluate existing indexes to see if we can change one of the indexes to be a covering index to pick up those extra fields. In this example there is an existing index called “IX_Person_LastName_FirstName_MiddleName” which we may be able to change around for the needs of this query but for the sake of learning how to create a new index we will ignore it for now. Now we look at the columns we are using to find the record which are FirstName and LastName. And then we have the primary key which is BusinessEntityId. Those will likely be the columns we need to index. But then we also need to return back various other information about the record which will likely be included columns. From here we will typically create an index & verify that it gets used. And if not we will modify it to fit the query until it does. we would suggest not using index hints though. Those typically end poorly because what will happen is that the development team will forget about that hint being there and make changes later that suit new business needs and now that hint will either break or become antiquated causing a decrease in performance.
Here is the index that we came up with for this query based on the columns being queried and the columns being returned:
create nonclustered index IX_Person_FirstName_LastName_BusinessEntityId on Person.Person (
FirstName asc,
LastName asc,
BusinessEntityID asc
)
include (
PersonType,
NameStyle,
Title,
MiddleName,
Suffix,
EmailPromotion,
AdditionalContactInfo,
Demographics,
rowguid,
ModifiedDate
)
And that index does in fact yield the result we are looking for be because now both queries being executed are doing index seeks as you can see in the screenshot. At this point, you will want to go back to the application and re-run the test scenarios to review the impact you have made to the performance of the application. If possible, it would be ideal to push load against the application so that you can evaluate if the bottleneck is in fact resolved or not. If you are not able to do that locally then you will want to just verify that the application is running quicker than before.
To help with this, SQL Profiler can give you total duration metrics for SQL queries. And Intellitrace and Diagnostic Tools in Visual Studio can give you overall run time metrics to use before deploying your code. If you are having trouble trying to determine what to create for your index, you could use the Database Tuning Advisor tool but any recommendations that it gives should be taken with a grain of salt because it doesn’t look to see if existing schema can be modified to fit the needs of the new query so you could end up creating indexes that are extremely close to existing indexes which is not ideal.
When to stop?
When doing any kind of performance optimization, it’s always important to know what your goal is. Typically, it’s unlikely you will have a documented SLA for the one particular area of the application you are working with. That said, you will likely need to determine what is “good enough” to complete the task of resolving the slow down.
There is no definitive answer here and your mileage will vary. The reason for that is because your application & user expectations will differ drastically depending on the need and other factors. For example, if we are talking about a web application which will have many end users, we will want to make database traffic as quick as reasonably possible so that it will scale out without having users wait for a heavily loaded database to return results. But on the other hand, if we are talking about a background job that runs once a week then making the database call only take 5 milliseconds to run isn’t quite so important anymore.
My general rule of thumb that I try to strive for in this area is to cut the run time down by 90% if at all possible. In this example, we were able to cut down the run time significantly so we were able to meet the need. That said, if we needed to take it a step further the next step(s) I would take would be to start reviewing the statistics of the data we are querying and then we could take further action. After that, then we could start to take more drastic steps like restructing portions of the schema or using a custom stored procedure in place of EF or a host of other avenues we could explore.
Using SQL Server Profiler we can see our changes made the impact we were looking for and we got an overall performance gain of over 90%. So at this point we are done with our optimizations.
Conclusion
That is a very brief introduction into how I go about working on SQL performance issues as a .Net developer. In the examples I was very specific with the use of an ORM but the concepts apply elsewhere. The resolution was also reasonably simple and things can get more complex from here. The general flow that you run through would still be similar regardless of the scenario. We will explore more SQL related topics in future blog posts as well.
All of the code used for this example can be downloaded from GitHub.
The JBS Quick Launch Lab
Free Qualified Assessment
Quantify what it will take to implement your next big idea!
Our assessment session will deliver tangible timelines, costs, high-level requirements, and recommend architectures that will work best. Let JBS prove to you and your team why over 24 years of experience matters.