
Optimising Server-Side Paging - Part II

Introduction

In part I of this series, we looked at an optimisation that is useful when paging through a wide data set.

This part examines four methods commonly employed to return the total number of rows available, at the same time as returning the single page of rows needed for display.

Each tested method works correctly; the focus of this article is to identify the performance characteristics of each method, and explore the reasons for those differences.

Sample Data

This part uses a single table containing one million rows of meteorological data collected at an imaginary weather station.

The code to create a test database, load the sample data, and run the full test suite is included in the Resources section at the end of this article.

Tested Methods

The Count Over Method

The first tested method uses the OVER clause extension to the COUNT aggregate. The syntax needed to count all the rows in the query (not just those on the requested page) is very simple:
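The original sample was rendered as an image; a minimal sketch of the pattern, using a hypothetical dbo.WeatherReading table and reading_id key in place of the article's test objects, might look like this:

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 25, @PageNumber = 1;

    WITH Paged AS
    (
        SELECT  reading_id,
                ROW_NUMBER() OVER (ORDER BY reading_id) AS rn,
                -- The empty OVER () clause counts every qualifying row,
                -- not just the rows on the requested page.
                COUNT(*) OVER () AS total_rows
        FROM    dbo.WeatherReading
    )
    SELECT  reading_id, total_rows
    FROM    Paged
    WHERE   rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND     @PageNumber * @PageSize
    ORDER BY rn;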

The Double Row Number Method

The second method uses two ROW_NUMBER functions to determine the total number of rows, using a technique described in this SQL Server Central article by Robert Cary.

The basic idea is to number the rows in the whole set twice: once in ascending order, and once in descending order. It turns out that the sum of these two numbers (in every row) equals the count of rows in the whole set, plus one. This neat trick can be accomplished with the following code:
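That sample was also an image; a sketch of the idea, with the same hypothetical names as above, follows. Since rn_asc + rn_desc is always one more than the total row count, subtracting one recovers the count:

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 25, @PageNumber = 1;

    WITH Paged AS
    (
        SELECT  reading_id,
                ROW_NUMBER() OVER (ORDER BY reading_id ASC)  AS rn_asc,
                ROW_NUMBER() OVER (ORDER BY reading_id DESC) AS rn_desc
        FROM    dbo.WeatherReading
    )
    SELECT  reading_id,
            rn_asc + rn_desc - 1 AS total_rows  -- count = sum - 1
    FROM    Paged
    WHERE   rn_asc BETWEEN (@PageNumber - 1) * @PageSize + 1
                   AND     @PageNumber * @PageSize
    ORDER BY rn_asc;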

The Sub-query Method

The third idea is to use a simple COUNT sub-query, which duplicates the conditions in the main paging query.
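A sketch of this method, again with hypothetical names (here TOP can be used, since the count is computed separately from the paging query):

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 25, @PageNumber = 1;

    WITH Paged AS
    (
        SELECT  TOP (@PageNumber * @PageSize)
                reading_id,
                ROW_NUMBER() OVER (ORDER BY reading_id) AS rn
        FROM    dbo.WeatherReading
        ORDER BY reading_id
    )
    SELECT  reading_id,
            -- The scalar sub-query repeats the paging query's conditions.
            (SELECT COUNT(*) FROM dbo.WeatherReading) AS total_rows
    FROM    Paged
    WHERE   rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND     @PageNumber * @PageSize
    ORDER BY rn;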

The Indexed View Method

The last of the four methods uses an indexed view that contains an aggregated record count per day. The total record count is calculated by summing daily record count subtotals.
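The view definition was rendered as an image; a sketch of its general shape, with hypothetical names, follows. An indexed view must be schema-bound and, when it contains GROUP BY, must include COUNT_BIG(*):

    CREATE VIEW dbo.DailyReadingCount
    WITH SCHEMABINDING
    AS
    SELECT  reading_date,                  -- hypothetical date column
            COUNT_BIG(*) AS day_count      -- COUNT_BIG is required here
    FROM    dbo.WeatherReading
    GROUP BY reading_date;
    GO
    -- The unique clustered index materialises the view.
    CREATE UNIQUE CLUSTERED INDEX cuq_DailyReadingCount
        ON dbo.DailyReadingCount (reading_date);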

Using the view to compute the record count:
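With the hypothetical names used above, the count can be computed like this (the NOEXPAND hint makes editions other than Enterprise read the indexed view directly):

    SELECT  SUM(day_count) AS total_rows
    FROM    dbo.DailyReadingCount WITH (NOEXPAND);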

Test Results

Each test uses the same basic paging mechanism described in part I of this series, with a small section of code added to count the overall total number of rows. The test query includes all one million test rows in the paged data set.

The tests were run on a single-core machine running SQL Server 2008 Developer Edition, version 10.0.2766. The code has also been tested on SQL Server 2005 Developer Edition, version 9.0.4285.

All system caches were cleared before each test run, and the SQL Server read-ahead mechanism was disabled.

Each test uses the same million-row data set, with 25 rows per page. Three tests were run using each method, to return data from the first, last, and middle pages of the set.

The data concerning physical reads, logical reads, CPU time, and elapsed time were obtained from the sys.dm_exec_query_stats dynamic management view, and validated against Profiler output. Buffer pool usage was determined from sys.dm_os_buffer_descriptors. Memory grants were obtained from actual execution plans generated on separate runs.
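For example, figures like these can be pulled from the DMV after a run; this is a sketch, and the '/* PagingTest */' comment tag used to locate the query is invented here:

    -- Read I/O and timing figures for a tagged test query.
    SELECT  qs.execution_count,
            qs.total_physical_reads,
            qs.total_logical_reads,
            qs.total_worker_time,
            qs.total_elapsed_time
    FROM    sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE   st.text LIKE '%PagingTest%';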

For each performance category in the summary tables below, the best results are shown in green, and the worst in orange.

First page

Middle page

Last page

Analysis

Count Over

This method performs a very large number of logical reads, and requires a memory grant of almost 46MB for sorting. A look at the relevant part of the execution plan for this method reveals the causes:

COUNT(*) OVER() is implemented using a special kind of sub-expression spool, known as a Segment Spool. The idea is to break the input up into groups (the Segment iterator), write each group to a worktable (the Spool), count the rows using a Stream Aggregate, and then join the count back onto the original rows as a new column.

The high number of logical reads incurred by this method is caused by the joins and by replaying the spooled rows twice: once to compute the row count, and then again to join the count back onto each row. The logical writes are caused by writing the rows to the spool.

The large memory grant is requested by the highlighted Sort operator. In current versions of SQL Server, the optimiser introduces this sort to guarantee the order of rows presented to a TOP operator later in the plan (not shown for space reasons). The required sort order is the same as that provided by the initial Index Seek - perhaps future optimisers will be able to take advantage of that and avoid this expensive sort altogether.

The million-row sort also contributes to the high CPU utilisation of this method.

Double Row Number

This method is the slowest overall, with high CPU usage, a large memory grant, and the largest number of physical reads.

Although the initial Index Seek provides rows in the correct order for the first row numbering operation, an explicit sort is required for the second.

Another explicit sort (the Top N Sort) is required to select the keys for the single page requested. Ironically, this sort puts the rows back in the original order provided by the Index Seek.

The two sorts both have to process one million rows, though the memory granted for the first sort can be reused by the second.

Sub-Query

The sub-query method produces a nice simple plan, and performs very well:

The top row of the plan performs the count sub-query. Since the query is guaranteed to produce a single row, it can be joined directly to the Index Seek that provides the keys for the page of data to return.

The lower Index Seek provides page keys in sorted order, so for page one, it only needs to return the first 25 keys. The biggest cost in this plan is counting the million rows in the Stream Aggregate.

Indexed View

This is the best-performing solution overall:

This plan is very similar to that produced by the sub-query method, but instead of counting one million rows, the top row of the plan is able to sum the partial counts stored in the indexed view - so only 695 rows flow through the Stream Aggregate (rather than one million).

This dramatic reduction in row count pays dividends across all the performance categories. In particular, it reduces the number of data and index pages which must be read into the data cache from disk.

Conclusion

The count over and double row number methods are not really suited to large data sets, due to the cost of the sorts and spools.

The sub-query method is much more efficient, and is limited only by the costs associated with counting the qualifying rows.

The indexed view method improves further on the sub-query method, by maintaining useful partial aggregates. This is similar to the idea of keeping counts in a separate table using a system of triggers.
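For comparison, a hypothetical sketch of that trigger-based alternative (all object names are invented here) might look like this:

    -- Seed a one-row counts table, then keep it current with a trigger.
    CREATE TABLE dbo.ReadingRowCount (total_rows bigint NOT NULL);
    INSERT dbo.ReadingRowCount (total_rows)
    SELECT COUNT_BIG(*) FROM dbo.WeatherReading;
    GO
    CREATE TRIGGER trg_WeatherReading_RowCount
    ON dbo.WeatherReading
    AFTER INSERT, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Adjust the stored total by the number of rows added and removed.
        UPDATE dbo.ReadingRowCount
        SET    total_rows = total_rows
                          + (SELECT COUNT_BIG(*) FROM inserted)
                          - (SELECT COUNT_BIG(*) FROM deleted);
    END;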

Resources:

Optimising Server-Side Paging Part II.sql

By Paul White, 2010/05/24


Source: http://www.sqlservercentral.com/articles/paging/70120/


Optimising Server-Side Paging - Part I

By Paul White, 2010/04/26


Introduction

It is common to need to access rows from a table or result set one page at a time, perhaps for display purposes. This short series of articles explores a number of optimisations that can be applied to the basic technique of using the ROW_NUMBER ranking function to identify the rows to return for a requested page.

This series does not include an introduction to the core method, since there are a number of good articles already on SQL Server Central which do this. An excellent explanation can be found in this article by regular columnist Jacob Sebastian.

This first article takes a close look at an optimisation that is useful when paging through a wide data set. The next parts in the series will cover optimisations for returning the total number of pages available, and implementing custom ordering.

For layout reasons, the code samples in this article are rendered as images. A full test script with additional annotations can be found in the Resources section at the end of this article.

Paging through a wide data set

A wide data set is one which has a large average row size. Paging through a set like this presents some special challenges. To illustrate, this article uses a table that might form part of a design for an on-line discussion forum (like the ones here on this site).

The rows of the example table are potentially quite wide, since the body text of a posted message can use up to 2500 Unicode characters.

Sample Data

The following code creates the test table and populates it with 10,000 random rows.
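The creation script was rendered as an image; a sketch of the table's general shape follows. Only post_id and the 2,500-character Unicode body are described in the article text - the other columns are invented for illustration:

    CREATE TABLE dbo.Post
    (
        post_id   integer IDENTITY NOT NULL
                      CONSTRAINT PK_Post PRIMARY KEY CLUSTERED,
        thread_id integer        NOT NULL,  -- invented column
        title     nvarchar(100)  NOT NULL,  -- invented column
        body      nvarchar(2500) NOT NULL   -- up to 2,500 Unicode characters
    );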

The task is to return data from this table as a paged data set. The client will supply the page number required, and the number of rows per page.

Initial solution

One solution to this problem uses a common table expression (CTE) to number the rows, and an expression in the WHERE clause to return just the rows that correspond to the single page of data required. A TOP expression is used to allow SQL Server to stop looking through the table as soon as it has found the rows needed.
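A sketch of this initial solution, using the hypothetical dbo.Post table above:

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 50, @PageNumber = 1;

    WITH Numbered AS
    (
        -- TOP lets SQL Server stop scanning once the page has been found.
        SELECT  TOP (@PageNumber * @PageSize)
                post_id, title, body,
                ROW_NUMBER() OVER (ORDER BY post_id) AS rn
        FROM    dbo.Post
        ORDER BY post_id
    )
    SELECT  post_id, title, body
    FROM    Numbered
    WHERE   rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND     @PageNumber * @PageSize
    ORDER BY rn;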

Running the above code on our test data produces an execution plan like this:

This is a simple and efficient-looking plan, but scanning the clustered index of a wide table can quickly become expensive. The following graph shows the number of physical reads incurred using this method, when reading a specified page number from our test table, with 50 rows per page.

Why scanning the clustered index can be inefficient

The cost of scanning the clustered index rises quickly due to the high average size of the data rows. This might seem counter-intuitive, since the clustered index key is very narrow - just 4 bytes for the integer post_id column.

While that is certainly true, the key just defines the logical order of data pages - the leaf level of the clustered index contains the entire data row, by definition. When SQL Server scans a clustered index in an ordered fashion, it follows a linked list of page ids. Since the linked list is found at the leaf level, entries are separated by the full width of the data row. This makes the clustered index the least-dense index possible.

Alternative Methods

Using a non-clustered index

One alternative to scanning the clustered index, is to create a covering non-clustered index. In this case, however, a covering index is not practical since it would essentially duplicate the clustered index.

We might instead consider creating a non-clustered index just on the post_id column, with a plan to use that index to quickly find the page required, and then look up the rest of the row data. The index is created using this statement:
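Something along these lines (the index name is invented):

    CREATE NONCLUSTERED INDEX ix_Post_post_id
        ON dbo.Post (post_id);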

This new index is very narrow (just 4 bytes per row at the leaf, plus index overhead), which ought to make finding the requested page very much more efficient. Sadly, running the query again with a forced index produces the following execution plan:

This plan is very similar to that generated for the clustered index, except now the query performs a bookmark lookup for every row, in order to fetch the columns not included in the index. This plan is less efficient than scanning the clustered index, so it seems that this is a step backward.

The problem is that SQL Server is fetching the off-index columns for every row it examines - not just the rows that will be eventually returned. The Key Lookup operator fetches the extra columns before the Sequence Project operator assigns a row number, and before the Filter has a chance to restrict the rows based on the assigned row number.

A much better execution plan would scan the non-clustered index, assign row numbers, filter the records, and only then use a Key Lookup to fetch the off-index columns. Since we know we will only be returning a maximum of 50 rows, it seems likely that the cost of 50 lookups would be more than offset by scanning the much narrower non-clustered index to begin with.

Unfortunately, SQL Server does not yet include logic to spot this sort of optimisation, and always places a Key Lookup just after the Index Seek or Scan it is associated with.

The 'Key Seek' Method

We can work around this limitation by rewriting the query as three logically separate steps:

  1. Start a partial scan of the non-clustered index in post_id order, assigning row numbers as we go.
  2. Filter those row numbers down to just the single page of rows we want.
  3. Use the 50 primary keys from step 2 to look up the off-index columns for the final output.

The following code implements these three steps in a single statement:
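A sketch of the statement, with the hypothetical names used earlier (the article's exact code is in the Resources script):

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 50, @PageNumber = 1;

    WITH Keys AS
    (
        -- Steps 1 and 2: number the narrow index keys and stop early.
        SELECT  TOP (@PageNumber * @PageSize)
                post_id,
                ROW_NUMBER() OVER (ORDER BY post_id) AS rn
        FROM    dbo.Post WITH (INDEX(ix_Post_post_id))
        ORDER BY post_id
    )
    SELECT  p.post_id, p.title, p.body   -- step 3: fetch the wide columns
    FROM    Keys AS k
    JOIN    dbo.Post AS p ON p.post_id = k.post_id
    WHERE   k.rn BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND     @PageNumber * @PageSize
    ORDER BY k.rn;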

The query plan for this implementation is:

Notice that the Sequence Project and Filter operators now precede the Key Lookup in the plan, meaning that a maximum of 50 lookups will be performed. Because this method performs an index seek on filtered keys, it is referred to as the Key Seek method.

Testing and Results

Test environment

The tests were run on a single-core machine running SQL Server 2008 Developer Edition, version 10.0.2757. The code has also been tested on SQL Server 2005 Developer Edition, version 9.0.4285.

System caches, including the buffer pool (data cache) were fully cleared before each run, and the SQL Server read-ahead mechanism was disabled.

Clearing the buffer pool ensures that each data page required comes from disk, and disabling read-ahead ensures repeatable results on different editions of SQL Server. Early testing found that enabling read-ahead tended to disadvantage the clustered index scanning method, since SQL Server frequently read more pages than were ultimately required, resulting in an artificially high number of reads.

The purpose of clearing the cache and disabling read-ahead is to produce clear and consistent test results. Do not run the scripts included in this article on anything other than a personal or dedicated test server.
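On a dedicated test server, a reset along these lines could be run before each test; this is a hypothetical sketch, not the article's exact script (trace flag 652 disables read-ahead):

    CHECKPOINT;                 -- flush dirty pages before clearing the cache
    DBCC DROPCLEANBUFFERS;      -- empty the buffer pool (data cache)
    DBCC FREEPROCCACHE;         -- clear cached query plans
    DBCC TRACEON (652, -1);     -- disable read-ahead instance-wide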

All tests were run on the same 10,000 row data set, using 50 rows per page, and requesting page numbers 1, 10, 50, 100, and 200 (the last page).

The data concerning physical reads, logical reads, CPU time, and elapsed time were obtained from the sys.dm_exec_query_stats dynamic management view, and validated against Profiler output and from runs with SET STATISTICS IO, TIME ON. Buffer pool usage was determined from sys.dm_os_buffer_descriptors.

The full test script is included in the Resources section at the end of this article.

Physical reads

Although the query plan for the Key Seek method looks a little more complex than the clustered index scan version, it is surprisingly efficient, with consistent performance across the whole range of tested page numbers. The following graph shows the physical reads incurred by the Key Seek method, in comparison to the clustered index scan.

The Key Seek method ranges from 6 to 10 physical reads, whereas the clustered scan method uses between 5 and 694 physical reads.

Confirmation of the small number of data pages needed by the Key Seek method can be found later in this article, in the analysis of Buffer Pool memory usage. That analysis also shows that each physical read was able to load multiple 8KB data pages.

Logical reads

Logical reads are the sum of read operations on pages already in the data cache, plus any physical reads from external storage. Operations on a cached page are very fast - many thousands of times faster than a physical read - but we must nevertheless account for them. The following graph shows a comparison of logical reads for the two methods:

Again, the Key Seek method is very consistent, though it does perform more logical reads than the clustered index scan for the first few pages. Since we already know how many physical reads were required, it is clear that very nearly all of the logical reads are reads from data cache.

CPU and total elapsed time

Total worker time is generally slightly higher for the Key Seek method, due to the extra logical operations performed. As in the other tests, the difference is most pronounced for low numbered pages.

The tests for elapsed time show the Key Seek ahead in all cases - emphasising the dominant contribution of physical I/O.

Memory Usage

This test measures the number of 8KB index and data pages brought into the buffer pool by each method.

The clustered index scan consumes very slightly less memory only for the very first page. After the first page, the Key Seek method has a smaller buffer pool requirement.

This result confirms the observations from the physical read test, showing that the Key Seek method touches a consistently small number of data pages, and that many data pages can be fetched with a single physical read.

Conclusion

The Key Seek method provides a fast and efficient way to implement server-side paging when the source row data is relatively wide. The method gives predictable and consistent performance regardless of the page number requested, while making efficient use of system resources.

Very high-volume applications that frequently access the first few pages of a result set might consider combining the two methods presented here - using a clustered index seek for the first few pages, and switching to the Key Seek method for all other requests.

A comprehensive test script containing all the code samples from this article, together with additional comments and annotations can be found below.

Resources:

Optimising Server-Side Paging Part I.sql


Source: http://www.sqlservercentral.com/articles/paging/69892/

Server Side Paging With SQL Server 2005

By Jacob Sebastian, 2007/08/29


Introduction

Most web developers have come across the requirement to implement Server Side Paging of data to increase the performance of an application. In the absence of Server Side Paging, the application fetches all the data from the database server and then loads a specific number of records (depending on the current page being viewed by the user). Assume that a table has 10,000 records and the page size used by the application is 50. When the user clicks on Page 2, the application fetches all 10,000 records from the database server and then loads records 51 to 100 into the UI control on the web page. This shows that we are fetching a lot of records which we never actually use. By fetching only the records that we need (in the above example, records 51 to 100) from the database server, we gain better performance at the database server level as well as at the application level.

There are quite a few articles available on the Internet which address this problem from different angles, and several of them make interesting reading.

None of the articles I could find online considered all the requirements I was looking for. I wanted the Server Side Paging code to address the following points:

  • Select the required number of records based on the current page number and the page size. If the page size is 25 records and we are on page 4, then we need to retrieve records 76 to 100.
  • The sort order needs to be handled. The data retrieved for page 4 differs when the sort order changes: sorting by First Name returns a different set of records than sorting by City.
  • Filters need to be applied in the TSQL code. Most of the time, the data is retrieved by a search operation which takes various filter values. For example, an Employee search might take filters like First Name, City or Hire Date. The filters may also be optional: none, one, many or all of them can be specified in the query. If a filter is provided, then the data needs to be filtered on that condition; otherwise, that filter should be ignored.

At this point, I thought of writing my own version of the Server Side Paging TSQL code which takes care of all the points mentioned above.

Sample Code

We will use the Northwind database for this example. The following are the requirements that this example will fulfill:

  1. A web page needs to be created for displaying a list of Employees
  2. User can search by First Name, Title and City
  3. User can enter None, One, Two or All of the filters
  4. We will use LIKE matching while applying the filters
  5. The page should display only 10 records at a time. Paging should be implemented for viewing other records.
  6. When a specific page number is clicked, the data of that page needs to be loaded
  7. User can sort the results by First Name, Title, City, or Hire Date
  8. After sorting the results by a column, when the user clicks on a page number, the paging should happen based on the current sort order.

Here is the stored procedure which satisfies the above requirements:

    CREATE PROCEDURE GetEmployees(
        @LastName VARCHAR(20) = NULL,
        @Title VARCHAR(20) = NULL,
        @City VARCHAR(20) = NULL,
        @PageSize INT = 5,
        @PageNumber INT = 1,
        @SortOrder VARCHAR(20) = 'LastName'
    )
    AS

    SET NOCOUNT ON
    /*
        Let us use a CTE to simplify the code. The below CTE makes the code
        easier to read and understand.
    */
    ;WITH emp AS (
    SELECT
        /*
            Based on the sort order passed into the stored procedure, a Record
            Identifier (sequential number) is generated using the ROW_NUMBER()
            method. The sequential number is generated in the sorted order.
        */
        CASE
            WHEN @SortOrder = 'Title' THEN ROW_NUMBER() OVER (ORDER BY Title)
            WHEN @SortOrder = 'HireDate' THEN ROW_NUMBER() OVER (ORDER BY HireDate)
            WHEN @SortOrder = 'City' THEN ROW_NUMBER() OVER (ORDER BY City)
            -- In all other cases, assume that @SortOrder = 'LastName'
            ELSE ROW_NUMBER() OVER (ORDER BY LastName)
        END AS RecID,
        LastName,
        FirstName,
        Title,
        HireDate,
        City,
        Country,
        PostalCode
    FROM employees
    WHERE
        /*
            Apply the filter. If the filter is specified, then apply the
            filter. If not, ignore the filter.
        */
        (@LastName IS NULL OR LastName LIKE '%' + @LastName + '%')
        AND (@Title IS NULL OR Title LIKE '%' + @Title + '%')
        AND (@City IS NULL OR City LIKE '%' + @City + '%')
    )
    /*
        Select the final query result.
    */
    SELECT
        RecID,
        LastName,
        Title,
        HireDate,
        City
    FROM emp
    /*
        Apply a RANGE filter on the requested SORT ORDER to retrieve the
        records of the current page. If the "Page Number" is 3 and "Page Size"
        is 30 then records 61 to 90 are retrieved.
    */
    WHERE RecID BETWEEN ((@PageNumber - 1) * @PageSize) + 1 AND @PageNumber * @PageSize
    /*
        "RecID" is a value generated by the previous CTE based on the sort
        order specified by the @SortOrder parameter.
    */
    ORDER BY RecID

Let us execute the stored procedure:

    -- Let us retrieve the first page sorted by "Last Name"
    EXECUTE GetEmployees @PageSize = 3, @PageNumber = 1, @SortOrder = 'LastName'

    /*
    OUTPUT:

    RecID  LastName   Title                     HireDate                 City
    ------ ---------- ------------------------- ------------------------ --------
    1      Buchanan   Sales Manager             1993-10-17 00:00:00.000  London
    2      Callahan   Inside Sales Coordinator  1994-03-05 00:00:00.000  Seattle
    3      Davolio    Sales Representative      1992-05-01 00:00:00.000  Seattle
    */

    -- Let us retrieve the second page sorted by "Last Name"
    EXECUTE GetEmployees @PageSize = 3, @PageNumber = 2, @SortOrder = 'LastName'

    /*
    OUTPUT:

    RecID  LastName   Title                     HireDate                 City
    ------ ---------- ------------------------- ------------------------ --------
    4      Dodsworth  Sales Representative      1994-11-15 00:00:00.000  London
    5      Fuller     Vice President, Sales     1992-08-14 00:00:00.000  Tacoma
    6      King       Sales Representative      1994-01-02 00:00:00.000  London
    */

    -- Let us retrieve the third page sorted by "City"
    EXECUTE GetEmployees @PageSize = 3, @PageNumber = 3, @SortOrder = 'City'

    /*
    OUTPUT:

    RecID  LastName   Title                     HireDate                 City
    ------ ---------- ------------------------- ------------------------ --------
    7      Davolio    Sales Representative      1992-05-01 00:00:00.000  Seattle
    8      Callahan   Inside Sales Coordinator  1994-03-05 00:00:00.000  Seattle
    9      Fuller     Vice President, Sales     1992-08-14 00:00:00.000  Tacoma
    */

Conclusions

There are different ways to implement the above functionality. The above code can be re-written in different ways. For example, the ORDER BY clause can take an expression which uses a CASE statement with ROW_NUMBER().  
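For instance, a sketch of that variation might look like the following. This only works cleanly when the sort columns share a type; a datetime column such as HireDate would need converting to a sortable string first:

    DECLARE @SortOrder VARCHAR(20)
    SET @SortOrder = 'City'

    -- One ROW_NUMBER() whose ORDER BY is a CASE expression.
    SELECT  ROW_NUMBER() OVER (ORDER BY
                CASE @SortOrder
                    WHEN 'Title' THEN Title
                    WHEN 'City'  THEN City
                    ELSE LastName
                END) AS RecID,
            LastName, Title, City
    FROM    employees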


Source: http://www.sqlservercentral.com/articles/Advanced+Querying/3181/
