블로그 이미지
LifeisSimple

calendar

1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31

Notice

'페이징'에 해당되는 글 2

  1. 2010.05.24 Optimising Server-Side Paging - Part II (펌)
  2. 2010.04.19 페이징쿼리 속도 비교 (펌글)
2010. 5. 24. 17:41 Brain Trainning/DataBase

Optimising Server-Side Paging - Part II

Introduction

In part I of this series, we looked at an optimisation that is useful when paging through a wide data set.

This part examines four methods commonly employed to return the total number of rows available, at the same time as returning the single page of rows needed for display.

Each tested method works correctly; the focus of this article is to identify the performance characteristics of each method, and explore the reasons for those differences.

Sample Data

This part uses a single table containing one million rows of meteorological data collected at an imaginary weather station.

The code to create a test database, load the sample data, and run the full test suite is included in the Resources section at the end of this article.

Tested Methods

The Count Over Method

The first tested method uses the OVER clause extension to the COUNT aggregate. The syntax needed to count all the rows in the query (not just those on the requested page) is very simple:

The Double Row Number Method

The second method uses two ROW_NUMBER functions to determine the total number of rows, using a technique described in this SQL Server Central article by Robert Cary.

The basic idea is to number the rows in the whole set twice: once in ascending order, and once in descending order. It turns out that the sum of these two numbers (in every row) equals the count of rows in the whole set, plus one. This neat trick can be accomplished with the following code:

The Sub-query Method

The third idea is to use a simple COUNT sub-query, which duplicates the conditions in the main paging query.

The Indexed View Method

The last of the four methods uses an indexed view that contains an aggregated record count per day. The total record count is calculated by summing daily record count subtotals.

Using the view to compute the record count:

Test Results

Each test uses the same basic paging mechanism described in part I of this series, with a small section of code added to count the overall total number of rows. The test query includes all one million test rows in the paged data set.

The tests were run on a single-core machine running SQL Server 2008 Developer Edition, version 10.0.2766. The code has also been tested on SQL Server Developer Edition, version 9.0.4285.

All system caches were cleared before each test run, and the SQL Server read-ahead mechanism was disabled.

Each test use the same million-row data set, using 25 rows per page. Three tests were run using each method, to return data from the first, last, and middle pages of the set.

The data concerning physical reads, logical reads, CPU time, and elapsed time were obtained from the sys.dm_exec_query_stats dynamic management view, and validated against Profiler output. Buffer pool usage was determined from sys.dm_os_buffer_descriptors. Memory grants were obtained from actual execution plans generated on separate runs.

For each performance category in the summary tables below, the best results are shown in green, and the worst in orange.

First page

Middle page

Last page

Analysis

Count Over

This method performs a very large number of logical reads, and requires a memory grant of almost 46MB for sorting. A look at the relevant part of the execution plan for this method reveals the causes:

COUNT(*) OVER() is implemented using a special kind of sub-expression spool, known as a Segment Spool. The idea is to break the input up into groups (the Segment iterator), write each group to a worktable (the Spool), count the rows using a Stream Aggregate, and then join the count back onto the original rows as a new column.

The high number of logical reads incurred by this method is caused by the joins and by replaying the spooled rows twice: once to compute the row count, and then again to join the count back onto each row. The logical writes are caused by writing the rows to the spool.

The large memory grant is requested by the highlighted Sort operator. In current versions of SQL Server, the optimiser introduces this sort to guarantee the order of rows presented to a TOP operator later in the plan (not shown for space reasons). The required sort order is the same as that provided by the initial Index Seek - perhaps future optimisers will be able to take advantage of that and avoid this expensive sort altogether.

The million-row sort also contributes to the high CPU utilisation of this method.

Double Row Number

This method is the slowest overall, with high CPU usage, a large memory grant, and the largest number of physical reads.

Although the initial Index Seek provides rows in the correct order for the first row numbering operation, an explicit sort is required for the second.

Another explicit sort (the Top N Sort) is required to select the keys for the single page requested. Ironically, this sort puts the rows back in the original order provided by the Index Seek.

The two sorts both have to process one million rows, though the memory granted for the first sort can be reused by the second.

Sub-Query

The sub-query method produces a nice simple plan, and performs very well:

The top row of the plan performs the count sub-query. Since the query is guaranteed to produce a single row, it can be joined directly to the Index Seek that provides the keys for the page of data to return.

The lower Index Seek provides page keys in sorted order, so for page one, it only needs to return the first 25 keys. The biggest cost in this plan is counting the million rows in the Stream Aggregate.

Indexed View

This is the best-performing solution overall:

This plan is very similar to that produced by the sub-query method, but instead of counting one million rows, the top row of the plan is able to sum the partial counts stored in the indexed view - so only 695 rows flow through the Stream Aggregate (rather than one million).

This dramatic reduction in row count pays dividends across all the performance categories. In particular, it reduces the number of data and index pages which must be read in to the data cache from disk.

Conclusion

The count over and double row number methods are not really suited to large data sets, due to the cost of the sorts and spools.

The sub-query method is much more efficient, and is limited only by the costs associated with counting the qualifying rows.

The indexed view method improves further on the sub-query method, by maintaining useful partial aggregates. This is similar to the idea of keeping counts in a separate table using a system of triggers.

Resources:

Optimising Server-Side Paging Part II.sql

By Paul White, 2010/05/24

Total article views: 948 | Views in the last 30 days: 948 

출처 : http://www.sqlservercentral.com/articles/paging/70120/

'Brain Trainning > DataBase' 카테고리의 다른 글

MSSql 2005 SP 적용  (0) 2010.06.18
DBA 일간 체크리스트  (0) 2010.06.03
Replication Across Non-Trusted Domains (펌)  (0) 2010.05.17
Optimising Server-Side Paging (펌)  (0) 2010.05.04
Server Side Paging with SQL Server 2005 (펌)  (0) 2010.05.04
posted by LifeisSimple
2010. 4. 19. 16:29 Brain Trainning/DataBase
페이징 쿼리 종류별 성능 비교 자료

페이징 쿼리 종류별로 성능을 비교한 좋은 자료가 있어서 상당부분 인용했습니다.

"ex)"에 들어가는 샘플 쿼리는 바투 락커룸 DB에 파일첨부(TB_AttachFile) 테이블을 대상으로 

페이징 쿼리를 작성해 봤습니다.


5번에 표시된 내용이 데이터 건수에 상관없이 실질적으로 가장 빠르지만 우리 시스템에 적용할 수 있는지는

의문이 갑니다.

테이블 별로 인덱스 생성이 필요할 수 있고 중간에 데이터가 삭제되거나 어떤 반응이 생길 때 문제가 될 

소지가 있는지도 조사해야 할 것 같습니다.

참고로 웹젠 빌링에서 사용하던 페이징 쿼리는 4번을 이용했었습니다.

그리고 2번에 샘플 쿼리를 보면 TB_AttachFile 테이블에 FileGuid가 None Clustered Index로 걸려 있어서 

서브쿼리 내에서도 ORDER BY FileGuid DESC를 해줘야합니다.(즉 ORDER BY 를 2번 해야한다는..)

대부분의 기본키들이 None Clustered Index 라 이 점에 대한 이슈도 있습니다.



1. SELECT TOP [불러올 총 게시물수] [출력 필드명] FROM [테이블 명]  

  레코드셋의 AbsolutePage를 이용해서 페이징을 했습니다.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 



2. SELECT TOP [불러올 총 게시물수] [출력 필드명] FROM [테이블 명]
   WHERE [글번호필드] NOT IN (SELECT TOP [제거할 게시물수] [글번호필드] FROM [테이블명])

   예전에 태요 사이트에서 보았던 쿼리 구문입니다.. NOT IN 때문에.. 문제시 되었던 쿼리구문이죠.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid NOT IN 
(
 SELECT TOP 0 FileGuid 
 FROM TB_AttachFile
 ORDER BY FileGuid DESC

ORDER BY FileGuid DESC



3. SELECT TOP [불러올 총 게시물수] [출력 필드명] FROM [테이블 명]
   WHERE [글번호] IN (SELECT TOP [페이지출력 갯수] [글번호] FROM
   (SELECT TOP [불러올 총 게시물수] [글번호] FROM [테이블 명]) AS A ORDER BY [글번호])
   ORDER BY [글번호] DESC

   이 쿼리 구문은 2번의 쿼리 구문의 문제점을 보완한 구문입니다. NOT IN 대신에 IN을 사용 했습니다.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid IN 
(
 SELECT TOP 123 FileGuid -- 총 데이터수 - ( (페이지수 - 1) * 10) // 1페이지 : 123 - 0, 2페이지 : 123 - 10 ...
 FROM 
 (
  SELECT FileGuid 
  FROM TB_AttachFile 
 )AS A 
 ORDER BY FileGuid

ORDER BY FileGuid DESC



4. SELECT TOP [페이지 출력갯수] [출력 필드명] FROM [테이블 명]
   WHERE [글번호] <= (SELECT MIN([글번호])
   FROM (SELECT TOP [제거할 게시물수] + 1 [글번호] FROM [테이블명]) AS A)

   4번째 쿼리 구문은 IN, NOT IN이 아닌 출력할 마지막 글번호 바로 앞이 글번호를 찾아서 처리를 해주는
   쿼리 구문입니다.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid <=
(
 SELECT MIN(FileGuid)
 FROM 
 (
  SELECT TOP 11 FileGuid -- (현재페이지수-1) * 10 + 1  
  FROM TB_AttachFile
  ORDER BY FileGuid DESC
 )AS A 

ORDER BY FileGuid DESC



5. SELECT TOP [페이지 출력갯수] [출력 필드명] FROM [테이블명]
   WHERE [글번호] <= (SELECT MIN([글번호])
   FROM (SELECT TOP [제거할 게시물수] + 1 [글번호]
   FROM [테이블명] WHERE [인덱스] = [시작인덱스번호] AND [인덱스] = [끝인덱스번호]) AS A
   WHERE [인덱스] = [시작인덱스번호] AND [인덱스] = [끝인덱스번호])
   AND [인덱스] = [시작인덱스번호] AND [인덱스] = [끝인덱스번호]

   5번재 쿼리는.. 글에 인덱스(가칭)라는 필드를 하나 더 추가 해서.. 글 기본 2000개마다 (가변적입니다)
   인덱스를 증가 시켰습니다. 즉 2000개를 하나의 묶음으로 만든것입니다.
   그 인덱스를 기준으로 처리를 해주었습니다.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid <=
(
 SELECT MIN(FileGuid)
 FROM 
 (
  SELECT TOP 11 FileGuid -- (현재페이지수-1) * 10 + 1  
  FROM TB_AttachFile
  WHERE IDX_FileGuid > 350  -- 350은 이전 페이지의 끝 게시물 번호. (350보다 큰 10개를 얻음.)
  ORDER BY FileGuid DESC
 )AS A 

ORDER BY FileGuid DESC



결과.

게시물은 100만개를 넣고 테스트를 했습니다
서버정보 : CPU : p4-1.8, RAM : 768Mb, 컴팩 프리자리오 2820AP, 환경 : 윈도우2003 MSSQL2000
처음페이지(1), 마지막 페이지(50000) 처리 시간이 아래와 같습니다.(단위 ms)

 첫페이지 실행    끝페이지 실행
1 :  273                 11476.56
2 :  289                 4406.25
3 :  289                 2695.31
4 :  289                 1218.75
5 :  7.81                23.44

<출처> http://tit99hds.egloos.com/888201

'Brain Trainning > DataBase' 카테고리의 다른 글

SQL Server Error Logs from Multiple Instances Using SSIS (펌)  (0) 2010.04.28
Managing Max Degree of Parallelism (펌)  (0) 2010.04.28
Not In Query  (0) 2010.04.15
Job List 적용  (0) 2010.04.14
Partition Table 의 구현  (0) 2009.11.30
posted by LifeisSimple
prev 1 next