
Optimising Server-Side Paging - Part II

Introduction

In part I of this series, we looked at an optimisation that is useful when paging through a wide data set.

This part examines four methods commonly employed to return the total number of rows available, at the same time as returning the single page of rows needed for display.

Each tested method works correctly; the focus of this article is to identify the performance characteristics of each method, and explore the reasons for those differences.

Sample Data

This part uses a single table containing one million rows of meteorological data collected at an imaginary weather station.

The code to create a test database, load the sample data, and run the full test suite is included in the Resources section at the end of this article.

Tested Methods

The Count Over Method

The first tested method uses the OVER clause extension to the COUNT aggregate. The syntax needed to count all the rows in the query (not just those on the requested page) is very simple:
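The original sample was rendered as an image; a minimal sketch of the pattern, using a hypothetical dbo.WeatherReading table and reading_id key in place of the article's test objects, might look like this:

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 25, @PageNumber = 1;

    WITH Paged AS
    (
        SELECT  reading_id,
                ROW_NUMBER() OVER (ORDER BY reading_id) AS rn,
                -- The empty OVER () clause counts every qualifying row,
                -- not just the rows on the requested page.
                COUNT(*) OVER () AS total_rows
        FROM    dbo.WeatherReading
    )
    SELECT  reading_id, total_rows
    FROM    Paged
    WHERE   rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND     @PageNumber * @PageSize
    ORDER BY rn;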

The Double Row Number Method

The second method uses two ROW_NUMBER functions to determine the total number of rows, using a technique described in this SQL Server Central article by Robert Cary.

The basic idea is to number the rows in the whole set twice: once in ascending order, and once in descending order. It turns out that the sum of these two numbers (in every row) equals the count of rows in the whole set, plus one. This neat trick can be accomplished with the following code:
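That sample was also an image; a sketch of the idea, with the same hypothetical names as above, follows. Since rn_asc + rn_desc is always one more than the total row count, subtracting one recovers the count:

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 25, @PageNumber = 1;

    WITH Paged AS
    (
        SELECT  reading_id,
                ROW_NUMBER() OVER (ORDER BY reading_id ASC)  AS rn_asc,
                ROW_NUMBER() OVER (ORDER BY reading_id DESC) AS rn_desc
        FROM    dbo.WeatherReading
    )
    SELECT  reading_id,
            rn_asc + rn_desc - 1 AS total_rows  -- count = sum - 1
    FROM    Paged
    WHERE   rn_asc BETWEEN (@PageNumber - 1) * @PageSize + 1
                   AND     @PageNumber * @PageSize
    ORDER BY rn_asc;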

The Sub-query Method

The third idea is to use a simple COUNT sub-query, which duplicates the conditions in the main paging query.
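A sketch of this method, again with hypothetical names (here TOP can be used, since the count is computed separately from the paging query):

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 25, @PageNumber = 1;

    WITH Paged AS
    (
        SELECT  TOP (@PageNumber * @PageSize)
                reading_id,
                ROW_NUMBER() OVER (ORDER BY reading_id) AS rn
        FROM    dbo.WeatherReading
        ORDER BY reading_id
    )
    SELECT  reading_id,
            -- The scalar sub-query repeats the paging query's conditions.
            (SELECT COUNT(*) FROM dbo.WeatherReading) AS total_rows
    FROM    Paged
    WHERE   rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND     @PageNumber * @PageSize
    ORDER BY rn;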

The Indexed View Method

The last of the four methods uses an indexed view that contains an aggregated record count per day. The total record count is calculated by summing daily record count subtotals.
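The view definition was rendered as an image; a sketch of its general shape, with hypothetical names, follows. An indexed view must be schema-bound and, when it contains GROUP BY, must include COUNT_BIG(*):

    CREATE VIEW dbo.DailyReadingCount
    WITH SCHEMABINDING
    AS
    SELECT  reading_date,                  -- hypothetical date column
            COUNT_BIG(*) AS day_count      -- COUNT_BIG is required here
    FROM    dbo.WeatherReading
    GROUP BY reading_date;
    GO
    -- The unique clustered index materialises the view.
    CREATE UNIQUE CLUSTERED INDEX cuq_DailyReadingCount
        ON dbo.DailyReadingCount (reading_date);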

Using the view to compute the record count:
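With the hypothetical names used above, the count can be computed like this (the NOEXPAND hint makes editions other than Enterprise read the indexed view directly):

    SELECT  SUM(day_count) AS total_rows
    FROM    dbo.DailyReadingCount WITH (NOEXPAND);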

Test Results

Each test uses the same basic paging mechanism described in part I of this series, with a small section of code added to count the overall total number of rows. The test query includes all one million test rows in the paged data set.

The tests were run on a single-core machine running SQL Server 2008 Developer Edition, version 10.0.2766. The code has also been tested on SQL Server 2005 Developer Edition, version 9.0.4285.

All system caches were cleared before each test run, and the SQL Server read-ahead mechanism was disabled.

Each test uses the same million-row data set, with 25 rows per page. Three tests were run using each method, to return data from the first, last, and middle pages of the set.

The data concerning physical reads, logical reads, CPU time, and elapsed time were obtained from the sys.dm_exec_query_stats dynamic management view, and validated against Profiler output. Buffer pool usage was determined from sys.dm_os_buffer_descriptors. Memory grants were obtained from actual execution plans generated on separate runs.
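For example, figures like these can be pulled from the DMV after a run; this is a sketch, and the '/* PagingTest */' comment tag used to locate the query is invented here:

    -- Read I/O and timing figures for a tagged test query.
    SELECT  qs.execution_count,
            qs.total_physical_reads,
            qs.total_logical_reads,
            qs.total_worker_time,
            qs.total_elapsed_time
    FROM    sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE   st.text LIKE '%PagingTest%';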

For each performance category in the summary tables below, the best results are shown in green, and the worst in orange.

First page

Middle page

Last page

Analysis

Count Over

This method performs a very large number of logical reads, and requires a memory grant of almost 46MB for sorting. A look at the relevant part of the execution plan for this method reveals the causes:

COUNT(*) OVER() is implemented using a special kind of sub-expression spool, known as a Segment Spool. The idea is to break the input up into groups (the Segment iterator), write each group to a worktable (the Spool), count the rows using a Stream Aggregate, and then join the count back onto the original rows as a new column.

The high number of logical reads incurred by this method is caused by the joins and by replaying the spooled rows twice: once to compute the row count, and then again to join the count back onto each row. The logical writes are caused by writing the rows to the spool.

The large memory grant is requested by the highlighted Sort operator. In current versions of SQL Server, the optimiser introduces this sort to guarantee the order of rows presented to a TOP operator later in the plan (not shown for space reasons). The required sort order is the same as that provided by the initial Index Seek - perhaps future optimisers will be able to take advantage of that and avoid this expensive sort altogether.

The million-row sort also contributes to the high CPU utilisation of this method.

Double Row Number

This method is the slowest overall, with high CPU usage, a large memory grant, and the largest number of physical reads.

Although the initial Index Seek provides rows in the correct order for the first row numbering operation, an explicit sort is required for the second.

Another explicit sort (the Top N Sort) is required to select the keys for the single page requested. Ironically, this sort puts the rows back in the original order provided by the Index Seek.

The two sorts both have to process one million rows, though the memory granted for the first sort can be reused by the second.

Sub-Query

The sub-query method produces a nice simple plan, and performs very well:

The top row of the plan performs the count sub-query. Since the query is guaranteed to produce a single row, it can be joined directly to the Index Seek that provides the keys for the page of data to return.

The lower Index Seek provides page keys in sorted order, so for page one, it only needs to return the first 25 keys. The biggest cost in this plan is counting the million rows in the Stream Aggregate.

Indexed View

This is the best-performing solution overall:

This plan is very similar to that produced by the sub-query method, but instead of counting one million rows, the top row of the plan is able to sum the partial counts stored in the indexed view - so only 695 rows flow through the Stream Aggregate (rather than one million).

This dramatic reduction in row count pays dividends across all the performance categories. In particular, it reduces the number of data and index pages which must be read into the data cache from disk.

Conclusion

The count over and double row number methods are not really suited to large data sets, due to the cost of the sorts and spools.

The sub-query method is much more efficient, and is limited only by the costs associated with counting the qualifying rows.

The indexed view method improves further on the sub-query method, by maintaining useful partial aggregates. This is similar to the idea of keeping counts in a separate table using a system of triggers.
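For comparison, a hypothetical sketch of that trigger-based alternative (all object names are invented here) might look like this:

    -- Seed a one-row counts table, then keep it current with a trigger.
    CREATE TABLE dbo.ReadingRowCount (total_rows bigint NOT NULL);
    INSERT dbo.ReadingRowCount (total_rows)
    SELECT COUNT_BIG(*) FROM dbo.WeatherReading;
    GO
    CREATE TRIGGER trg_WeatherReading_RowCount
    ON dbo.WeatherReading
    AFTER INSERT, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Adjust the stored total by the number of rows added and removed.
        UPDATE dbo.ReadingRowCount
        SET    total_rows = total_rows
                          + (SELECT COUNT_BIG(*) FROM inserted)
                          - (SELECT COUNT_BIG(*) FROM deleted);
    END;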

Resources:

Optimising Server-Side Paging Part II.sql

By Paul White, 2010/05/24


Source: http://www.sqlservercentral.com/articles/paging/70120/


Optimising Server-Side Paging - Part I

By Paul White, 2010/04/26


Introduction

It is common to need to access rows from a table or result set one page at a time, perhaps for display purposes. This short series of articles explores a number of optimisations that can be applied to the basic technique of using the ROW_NUMBER ranking function to identify the rows to return for a requested page.

This series does not include an introduction to the core method, since there are a number of good articles already on SQL Server Central which do this. An excellent explanation can be found in this article by regular columnist Jacob Sebastian.

This first article takes a close look at an optimisation that is useful when paging through a wide data set. The next parts in the series will cover optimisations for returning the total number of pages available, and implementing custom ordering.

For layout reasons, the code samples in this article are rendered as images. A full test script with additional annotations can be found in the Resources section at the end of this article.

Paging through a wide data set

A wide data set is one which has a large average row size. Paging through a set like this presents some special challenges. To illustrate, this article uses a table that might form part of a design for an on-line discussion forum (like the ones here on this site).

The rows of the example table are potentially quite wide, since the body text of a posted message can use up to 2500 Unicode characters.

Sample Data

The following code creates the test table and populates it with 10,000 random rows.
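The creation script was rendered as an image; a sketch of the table's general shape follows. Only post_id and the 2,500-character Unicode body are described in the article text - the other columns are invented for illustration:

    CREATE TABLE dbo.Post
    (
        post_id   integer IDENTITY NOT NULL
                      CONSTRAINT PK_Post PRIMARY KEY CLUSTERED,
        thread_id integer        NOT NULL,  -- invented column
        title     nvarchar(100)  NOT NULL,  -- invented column
        body      nvarchar(2500) NOT NULL   -- up to 2,500 Unicode characters
    );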

The task is to return data from this table as a paged data set. The client will supply the page number required, and the number of rows per page.

Initial solution

One solution to this problem uses a common table expression (CTE) to number the rows, and an expression in the WHERE clause to return just the rows that correspond to the single page of data required. A TOP expression is used to allow SQL Server to stop looking through the table as soon as it has found the rows needed.
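A sketch of this initial solution, using the hypothetical dbo.Post table above:

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 50, @PageNumber = 1;

    WITH Numbered AS
    (
        -- TOP lets SQL Server stop scanning once the page has been found.
        SELECT  TOP (@PageNumber * @PageSize)
                post_id, title, body,
                ROW_NUMBER() OVER (ORDER BY post_id) AS rn
        FROM    dbo.Post
        ORDER BY post_id
    )
    SELECT  post_id, title, body
    FROM    Numbered
    WHERE   rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND     @PageNumber * @PageSize
    ORDER BY rn;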

Running the above code on our test data produces an execution plan like this:

This is a simple and efficient-looking plan, but scanning the clustered index of a wide table can quickly become expensive. The following graph shows the number of physical reads incurred using this method, when reading a specified page number from our test table, with 50 rows per page.

Why scanning the clustered index can be inefficient

The cost of scanning the clustered index rises quickly due to the high average size of the data rows. This might seem counter-intuitive, since the clustered index key is very narrow - just 4 bytes for the integer post_id column.

While that is certainly true, the key just defines the logical order of data pages - the leaf level of the clustered index contains the entire data row, by definition. When SQL Server scans a clustered index in an ordered fashion, it follows a linked list of page ids. Since the linked list is found at the leaf level, entries are separated by the full width of the data row. This makes the clustered index the least-dense index possible.

Alternative Methods

Using a non-clustered index

One alternative to scanning the clustered index, is to create a covering non-clustered index. In this case, however, a covering index is not practical since it would essentially duplicate the clustered index.

We might instead consider creating a non-clustered index just on the post_id column, with a plan to use that index to quickly find the page required, and then look up the rest of the row data. The index is created using this statement:
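Something along these lines (the index name is invented):

    CREATE NONCLUSTERED INDEX ix_Post_post_id
        ON dbo.Post (post_id);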

This new index is very narrow (just 4 bytes per row at the leaf, plus index overhead), which ought to make finding the requested page very much more efficient. Sadly, running the query again with a forced index produces the following execution plan:

This plan is very similar to that generated for the clustered index, except now the query performs a bookmark lookup for every row, in order to fetch the columns not included in the index. This plan is less efficient than scanning the clustered index, so it seems that this is a step backward.

The problem is that SQL Server is fetching the off-index columns for every row it examines - not just the rows that will be eventually returned. The Key Lookup operator fetches the extra columns before the Sequence Project operator assigns a row number, and before the Filter has a chance to restrict the rows based on the assigned row number.

A much better execution plan would scan the non-clustered index, assign row numbers, filter the records, and only then use a Key Lookup to fetch the off-index columns. Since we know we will only be returning a maximum of 50 rows, it seems likely that the cost of 50 lookups would be more than offset by scanning the much narrower non-clustered index to begin with.

Unfortunately, SQL Server does not yet include logic to spot this sort of optimisation, and always places a Key Lookup just after the Index Seek or Scan it is associated with.

The 'Key Seek' Method

We can work around this limitation by rewriting the query as three logically separate steps:

  1. Start a partial scan of the non-clustered index in post_id order, assigning row numbers as we go.
  2. Filter those row numbers down to just the single page of rows we want.
  3. Use the 50 primary keys from step 2 to look up the off-index columns for the final output.

The following code implements these three steps in a single statement:
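A sketch of the statement, with the hypothetical names used earlier (the article's exact code is in the Resources script):

    DECLARE @PageSize int, @PageNumber int;
    SELECT  @PageSize = 50, @PageNumber = 1;

    WITH Keys AS
    (
        -- Steps 1 and 2: number the narrow index keys and stop early.
        SELECT  TOP (@PageNumber * @PageSize)
                post_id,
                ROW_NUMBER() OVER (ORDER BY post_id) AS rn
        FROM    dbo.Post WITH (INDEX(ix_Post_post_id))
        ORDER BY post_id
    )
    SELECT  p.post_id, p.title, p.body   -- step 3: fetch the wide columns
    FROM    Keys AS k
    JOIN    dbo.Post AS p ON p.post_id = k.post_id
    WHERE   k.rn BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND     @PageNumber * @PageSize
    ORDER BY k.rn;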

The query plan for this implementation is:

Notice that the Sequence Project and Filter operators now precede the Key Lookup in the plan, meaning that a maximum of 50 lookups will be performed. Because this method performs an index seek on filtered keys, it is referred to as the Key Seek method.

Testing and Results

Test environment

The tests were run on a single-core machine running SQL Server 2008 Developer Edition, version 10.0.2757. The code has also been tested on SQL Server 2005 Developer Edition, version 9.0.4285.

System caches, including the buffer pool (data cache) were fully cleared before each run, and the SQL Server read-ahead mechanism was disabled.

Clearing the buffer pool ensures that each data page required comes from disk, and disabling read-ahead ensures repeatable results on different editions of SQL Server. Early testing found that enabling read-ahead tended to disadvantage the clustered index scanning method, since SQL Server frequently read more pages than were ultimately required, resulting in an artificially high number of reads.

The purpose of clearing the cache and disabling read-ahead is to produce clear and consistent test results. Do not run the scripts included in this article on anything other than a personal or dedicated test server.
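On a dedicated test server, a reset along these lines could be run before each test; this is a hypothetical sketch, not the article's exact script (trace flag 652 disables read-ahead):

    CHECKPOINT;                 -- flush dirty pages before clearing the cache
    DBCC DROPCLEANBUFFERS;      -- empty the buffer pool (data cache)
    DBCC FREEPROCCACHE;         -- clear cached query plans
    DBCC TRACEON (652, -1);     -- disable read-ahead instance-wide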

All tests were run on the same 10,000 row data set, using 50 rows per page, and requesting page numbers 1, 10, 50, 100, and 200 (the last page).

The data concerning physical reads, logical reads, CPU time, and elapsed time were obtained from the sys.dm_exec_query_stats dynamic management view, and validated against Profiler output and from runs with SET STATISTICS IO, TIME ON. Buffer pool usage was determined from sys.dm_os_buffer_descriptors.

The full test script is included in the Resources section at the end of this article.

Physical reads

Although the query plan for the Key Seek method looks a little more complex than the clustered index scan version, it is surprisingly efficient, with consistent performance across the whole range of tested page numbers. The following graph shows the physical reads incurred by the Key Seek method, in comparison to the clustered index scan.

The Key Seek method ranges from 6 to 10 physical reads, whereas the clustered scan method uses between 5 and 694 physical reads.

Confirmation of the small number of data pages needed by the Key Seek method can be found later in this article, in the analysis of Buffer Pool memory usage. That analysis also shows that each physical read was able to load multiple 8KB data pages.

Logical reads

Logical reads are the sum of read operations on pages already in the data cache, plus any physical reads from external storage. Operations on a cached page are very fast - many thousands of times faster than a physical read - but we must nevertheless account for them. The following graph shows a comparison of logical reads for the two methods:

Again, the Key Seek method is very consistent, though it does perform more logical reads than the clustered index scan for the first few pages. Since we already know how many physical reads were required, it is clear that very nearly all of the logical reads are reads from data cache.

CPU and total elapsed time

Total worker time is generally slightly higher for the Key Seek method, due to the extra logical operations performed. As in the other tests, the difference is most pronounced for low numbered pages.

The tests for elapsed time show the Key Seek ahead in all cases - emphasising the dominant contribution of physical I/O.

Memory Usage

This test measures the number of 8KB index and data pages brought into the buffer pool by each method.

The clustered index scan consumes very slightly less memory only for the very first page. After the first page, the Key Seek method has a smaller buffer pool requirement.

This result confirms the observations from the physical read test, showing that the Key Seek method touches a consistently small number of data pages, and that many data pages can be fetched with a single physical read.

Conclusion

The Key Seek method provides a fast and efficient way to implement server-side paging when the source row data is relatively wide. The method gives predictable and consistent performance regardless of the page number requested, while making efficient use of system resources.

Very high-volume applications that frequently access the first few pages of a result set might consider combining the two methods presented here - using a clustered index seek for the first few pages, and switching to the Key Seek method for all other requests.

A comprehensive test script containing all the code samples from this article, together with additional comments and annotations can be found below.

Resources:

Optimising Server-Side Paging Part I.sql


Source: http://www.sqlservercentral.com/articles/paging/69892/

Server Side Paging With SQL Server 2005

By Jacob Sebastian, 2007/08/29


Introduction

Most web developers have come across the requirement to implement Server Side Paging of data to increase the performance of an application. In the absence of Server Side Paging, the application fetches all the data from the database server and then loads a specific number of records (depending on the current page being viewed by the user). Assume that a table has 10,000 records and the page size used by the application is 50. When the user clicks on Page 2, the application fetches all 10,000 records from the database server and then loads records 51 to 100 into the UI control on the web page. This shows that we are fetching a lot of records which we never actually use. By fetching only the records that we need (in the above example, records 51 to 100) from the database server, we gain better performance at the database server level as well as at the application level.

There are quite a few articles available on the Internet which address this problem from different angles, and several of them make interesting reading.

None of the articles I could find online considered all the requirements I was looking for. I wanted the Server Side Paging code to address the following points:

  • Select the required number of records based on the current page number and the page size. If the page size is 25 records and we are on page 4, then we need to retrieve records 76 to 100.
  • The sort order needs to be handled. The data retrieved for page 4 differs when the sort order changes: sorting by First Name returns a different set of records than sorting by City.
  • Filters need to be applied in the TSQL code. Most of the time, the data is retrieved by a search operation which takes various filter values. For example, an Employee search might take filters like First Name, City or Hire Date. The filters may also be optional: none, one, many or all of them can be specified in the query. If a filter is provided, then the data needs to be filtered on that condition; otherwise, that filter should be ignored.

At this point, I thought of writing my own version of the Server Side Paging TSQL code which takes care of all the points mentioned above.

Sample Code

We will use the Northwind database for this example. The following are the requirements that this example will fulfill:

  1. A web page needs to be created for displaying a list of Employees
  2. User can search by First Name, Title and City
  3. User can enter None, One, Two or All of the filters
  4. We will use LIKE matching while applying the filters
  5. The page should display only 10 records at a time. Paging should be implemented for viewing other records.
  6. When a specific page number is clicked, the data of that page needs to be loaded
  7. User can sort the results by First Name, Title, City, or Hire Date
  8. After sorting the results by a column, when the user clicks on a page number, the paging should happen based on the current sort order.

Here is the stored procedure which satisfies the above requirements:

    CREATE PROCEDURE GetEmployees(
        @LastName VARCHAR(20) = NULL,
        @Title VARCHAR(20) = NULL,
        @City VARCHAR(20) = NULL,
        @PageSize INT = 5,
        @PageNumber INT = 1,
        @SortOrder VARCHAR(20) = 'LastName'
    )
    AS

    SET NOCOUNT ON
    /*
        Let us use a CTE to simplify the code. The below CTE makes the code
        easier to read and understand.
    */
    ;WITH emp AS (
    SELECT
        /*
            Based on the sort order passed into the stored procedure, a Record
            Identifier (sequential number) is generated using the ROW_NUMBER()
            method. The sequential number is generated in the sorted order.
        */
        CASE
            WHEN @SortOrder = 'Title' THEN ROW_NUMBER() OVER (ORDER BY Title)
            WHEN @SortOrder = 'HireDate' THEN ROW_NUMBER() OVER (ORDER BY HireDate)
            WHEN @SortOrder = 'City' THEN ROW_NUMBER() OVER (ORDER BY City)
            -- In all other cases, assume that @SortOrder = 'LastName'
            ELSE ROW_NUMBER() OVER (ORDER BY LastName)
        END AS RecID,
        LastName,
        FirstName,
        Title,
        HireDate,
        City,
        Country,
        PostalCode
    FROM employees
    WHERE
        /*
            Apply the filter. If the filter is specified, then apply the
            filter. If not, ignore the filter.
        */
        (@LastName IS NULL OR LastName LIKE '%' + @LastName + '%')
        AND (@Title IS NULL OR Title LIKE '%' + @Title + '%')
        AND (@City IS NULL OR City LIKE '%' + @City + '%')
    )
    /*
        Select the final query result.
    */
    SELECT
        RecID,
        LastName,
        Title,
        HireDate,
        City
    FROM emp
    /*
        Apply a RANGE filter on the requested SORT ORDER to retrieve the
        records of the current page. If the "Page Number" is 3 and "Page Size"
        is 30 then records 61 to 90 are retrieved.
    */
    WHERE RecID BETWEEN ((@PageNumber - 1) * @PageSize) + 1 AND @PageNumber * @PageSize
    /*
        "RecID" is a value generated by the previous CTE based on the sort
        order specified by the @SortOrder parameter.
    */
    ORDER BY RecID

Let us execute the stored procedure:

    -- Let us retrieve the first page sorted by "Last Name"
    EXECUTE GetEmployees @PageSize = 3, @PageNumber = 1, @SortOrder = 'LastName'

    /*
    OUTPUT:

    RecID  LastName   Title                     HireDate                 City
    ------ ---------- ------------------------- ------------------------ --------
    1      Buchanan   Sales Manager             1993-10-17 00:00:00.000  London
    2      Callahan   Inside Sales Coordinator  1994-03-05 00:00:00.000  Seattle
    3      Davolio    Sales Representative      1992-05-01 00:00:00.000  Seattle
    */

    -- Let us retrieve the second page sorted by "Last Name"
    EXECUTE GetEmployees @PageSize = 3, @PageNumber = 2, @SortOrder = 'LastName'

    /*
    OUTPUT:

    RecID  LastName   Title                     HireDate                 City
    ------ ---------- ------------------------- ------------------------ --------
    4      Dodsworth  Sales Representative      1994-11-15 00:00:00.000  London
    5      Fuller     Vice President, Sales     1992-08-14 00:00:00.000  Tacoma
    6      King       Sales Representative      1994-01-02 00:00:00.000  London
    */

    -- Let us retrieve the third page sorted by "City"
    EXECUTE GetEmployees @PageSize = 3, @PageNumber = 3, @SortOrder = 'City'

    /*
    OUTPUT:

    RecID  LastName   Title                     HireDate                 City
    ------ ---------- ------------------------- ------------------------ --------
    7      Davolio    Sales Representative      1992-05-01 00:00:00.000  Seattle
    8      Callahan   Inside Sales Coordinator  1994-03-05 00:00:00.000  Seattle
    9      Fuller     Vice President, Sales     1992-08-14 00:00:00.000  Tacoma
    */

Conclusions

There are different ways to implement the above functionality. The above code can be re-written in different ways. For example, the ORDER BY clause can take an expression which uses a CASE statement with ROW_NUMBER().  
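For instance, a sketch of that variation might look like the following. This only works cleanly when the sort columns share a type; a datetime column such as HireDate would need converting to a sortable string first:

    DECLARE @SortOrder VARCHAR(20)
    SET @SortOrder = 'City'

    -- One ROW_NUMBER() whose ORDER BY is a CASE expression.
    SELECT  ROW_NUMBER() OVER (ORDER BY
                CASE @SortOrder
                    WHEN 'Title' THEN Title
                    WHEN 'City'  THEN City
                    ELSE LastName
                END) AS RecID,
            LastName, Title, City
    FROM    employees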


Source: http://www.sqlservercentral.com/articles/Advanced+Querying/3181/
