2010. 5. 6. 16:13 Brain Trainning

http://www.maxmind.com/app/geolitecountry


Download it from the site above.

posted by LifeisSimple
2010. 5. 4. 11:08 Brain Trainning/DataBase

Optimising Server-Side Paging - Part I

By Paul White, 2010/04/26

Total article views: 7603 | Views in the last 30 days: 7603

Introduction

It is common to need to access rows from a table or result set one page at a time, perhaps for display purposes. This short series of articles explores a number of optimisations that can be applied to the basic technique of using the ROW_NUMBER ranking function to identify the rows to return for a requested page.

This series does not include an introduction to the core method, since there are a number of good articles already on SQL Server Central which do this. An excellent explanation can be found in this article by regular columnist Jacob Sebastian.

This first article takes a close look at an optimisation that is useful when paging through a wide data set. The next parts in the series will cover optimisations for returning the total number of pages available, and implementing custom ordering.

For layout reasons, the code samples in this article are rendered as images. A full test script with additional annotations can be found in the Resources section at the end of this article.

Paging through a wide data set

A wide data set is one which has a large average row size. Paging through a set like this presents some special challenges. To illustrate, this article uses a table that might form part of a design for an on-line discussion forum (like the ones here on this site).

The rows of the example table are potentially quite wide, since the body text of a posted message can use up to 2500 Unicode characters.

Sample Data

The following code creates the test table and populates it with 10,000 random rows.
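Since the original script is rendered as an image, here is a minimal sketch of what such a test table might look like; the object and column names (dbo.Post, post_id, thread_id, body) are assumptions for illustration, not the author's exact script.

-- Sketch only (assumed names): a wide forum-post table with a 4-byte clustered key
CREATE TABLE dbo.Post
(
    post_id     integer IDENTITY(1,1) NOT NULL,
    thread_id   integer NOT NULL,
    body        nvarchar(2500) NOT NULL,
    CONSTRAINT PK_Post PRIMARY KEY CLUSTERED (post_id)
)

-- 10,000 rows of pseudo-random data
INSERT dbo.Post (thread_id, body)
SELECT TOP (10000)
    ABS(CHECKSUM(NEWID())) % 1000 + 1,
    REPLICATE(NCHAR(65 + ABS(CHECKSUM(NEWID())) % 26), ABS(CHECKSUM(NEWID())) % 2500 + 1)
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2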

The task is to return data from this table as a paged data set. The client will supply the page number required, and the number of rows per page.

Initial solution

One solution to this problem uses a common table expression (CTE) to number the rows, and an expression in the WHERE clause to return just the rows that correspond to the single page of data required. A TOP expression is used to allow SQL Server to stop looking through the table as soon as it has found the rows needed.
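The article's query is shown as an image; the following is a sketch of the kind of statement described, using the assumed table from the earlier sketch and illustrative @PageNumber/@PageSize values.

-- Sketch of the approach described above (assumed table and parameter values)
DECLARE @PageNumber integer, @PageSize integer
SET @PageNumber = 10
SET @PageSize = 50

;WITH Numbered AS
(
    SELECT TOP (@PageNumber * @PageSize)
        post_id, thread_id, body,
        rn = ROW_NUMBER() OVER (ORDER BY post_id)
    FROM dbo.Post
    ORDER BY post_id
)
SELECT post_id, thread_id, body
FROM Numbered
WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1
             AND @PageNumber * @PageSize
ORDER BY post_id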

Running the above code on our test data produces an execution plan like this:

This is a simple and efficient-looking plan, but scanning the clustered index of a wide table can quickly become expensive. The following graph shows the number of physical reads incurred using this method, when reading a specified page number from our test table, with 50 rows per page.

Why scanning the clustered index can be inefficient

The cost of scanning the clustered index rises quickly due to the high average size of the data rows. This might seem counter-intuitive, since the clustered index key is very narrow - just 4 bytes for the integer post_id column.

While that is certainly true, the key just defines the logical order of data pages - the leaf level of the clustered index contains the entire data row, by definition. When SQL Server scans a clustered index in an ordered fashion, it follows a linked list of page ids. Since the linked list is found at the leaf level, entries are separated by the full width of the data row. This makes the clustered index the least-dense index possible.

Alternative Methods

Using a non-clustered index

One alternative to scanning the clustered index is to create a covering non-clustered index. In this case, however, a covering index is not practical since it would essentially duplicate the clustered index.

We might instead consider creating a non-clustered index just on the post_id column, with a plan to use that index to quickly find the page required, and then look up the rest of the row data. The index is created using this statement:
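The original statement is an image; a plausible equivalent, using the assumed table name from the earlier sketch, would be:

CREATE NONCLUSTERED INDEX IX_Post_post_id
ON dbo.Post (post_id)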

This new index is very narrow (just 4 bytes per row at the leaf, plus index overhead), which ought to make finding the requested page very much more efficient. Sadly, running the query again with a forced index produces the following execution plan:

This plan is very similar to that generated for the clustered index, except now the query performs a bookmark lookup for every row, in order to fetch the columns not included in the index. This plan is less efficient than scanning the clustered index, so it seems that this is a step backward.

The problem is that SQL Server is fetching the off-index columns for every row it examines - not just the rows that will be eventually returned. The Key Lookup operator fetches the extra columns before the Sequence Project operator assigns a row number, and before the Filter has a chance to restrict the rows based on the assigned row number.

A much better execution plan would scan the non-clustered index, assign row numbers, filter the records, and only then use a Key Lookup to fetch the off-index columns. Since we know we will only be returning a maximum of 50 rows, it seems likely that the cost of 50 lookups would be more than offset by scanning the much narrower non-clustered index to begin with.

Unfortunately, SQL Server does not yet include logic to spot this sort of optimisation, and always places a Key Lookup just after the Index Seek or Scan it is associated with.

The 'Key Seek' Method

We can work around this limitation by rewriting the query as three logically separate steps:

  1. Start a partial scan of the non-clustered index in post_id order, assigning row numbers as we go.
  2. Filter those row numbers down to just the single page of rows we want.
  3. Use the 50 primary keys from step 2 to look up the off-index columns for the final output.

The following code implements these three steps in a single statement:
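Again, the original code is an image; the sketch below illustrates the three steps against the assumed dbo.Post table, with the step numbers marked in comments.

-- Sketch of the Key Seek method (assumed table and parameter values)
DECLARE @PageNumber integer, @PageSize integer
SET @PageNumber = 100
SET @PageSize = 50

;WITH Keys AS
(
    -- Step 1: partial ordered scan of the narrow non-clustered index, numbering rows
    SELECT TOP (@PageNumber * @PageSize)
        post_id,
        rn = ROW_NUMBER() OVER (ORDER BY post_id)
    FROM dbo.Post
    ORDER BY post_id
),
PageKeys AS
(
    -- Step 2: keep only the keys for the requested page
    SELECT post_id
    FROM Keys
    WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize
)
-- Step 3: at most @PageSize lookups for the off-index columns
SELECT p.post_id, p.thread_id, p.body
FROM PageKeys AS k
JOIN dbo.Post AS p ON p.post_id = k.post_id
ORDER BY p.post_id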

The query plan for this implementation is:

Notice that the Sequence Project and Filter operators now precede the Key Lookup in the plan, meaning that a maximum of 50 lookups will be performed. Because this method performs an index seek on filtered keys, it is referred to as the Key Seek method.

Testing and Results

Test environment

The tests were run on a single-core machine running SQL Server 2008 Developer Edition, version 10.0.2757. The code has also been tested on SQL Server 2005 Developer Edition, version 9.0.4285.

System caches, including the buffer pool (data cache) were fully cleared before each run, and the SQL Server read-ahead mechanism was disabled.

Clearing the buffer pool ensures that each data page required comes from disk, and disabling read-ahead ensures repeatable results on different editions of SQL Server. Early testing found that enabling read-ahead tended to disadvantage the clustered index scanning method, since SQL Server frequently read more pages than were ultimately required, resulting in an artificially high number of reads.

The purpose of clearing the cache and disabling read-ahead is to produce clear and consistent test results. Do not run the scripts included in this article on anything other than a personal or dedicated test server.
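The article does not show the exact commands used; on a dedicated test server, something along these lines would achieve the described setup (the use of trace flag 652 to disable read-ahead is an assumption, not the author's stated method):

-- Sketch of a test-server reset between runs (do not run on a shared server)
CHECKPOINT                 -- flush dirty pages so the next command empties the whole buffer pool
DBCC DROPCLEANBUFFERS      -- clear the data cache
DBCC FREEPROCCACHE         -- clear cached query plans
DBCC TRACEON (652, -1)     -- assumed mechanism: trace flag 652 disables read-ahead (pre-fetch) scans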

All tests were run on the same 10,000 row data set, using 50 rows per page, and requesting page numbers 1, 10, 50, 100, and 200 (the last page).

The data concerning physical reads, logical reads, CPU time, and elapsed time were obtained from the sys.dm_exec_query_stats dynamic management view, and validated against Profiler output and from runs with SET STATISTICS IO, TIME ON. Buffer pool usage was determined from sys.dm_os_buffer_descriptors.
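For illustration, counters of this kind can be pulled with queries such as the following (a sketch, not the article's actual test script):

-- Sketch: the counters reported above, per cached query
SELECT TOP (10)
    qs.execution_count,
    qs.total_physical_reads,
    qs.total_logical_reads,
    qs.total_worker_time,      -- CPU time, microseconds
    qs.total_elapsed_time,     -- elapsed time, microseconds
    st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.last_execution_time DESC

-- Sketch: 8KB pages currently held in the buffer pool for the test database
SELECT COUNT(*) AS pages_in_buffer_pool
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()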

The full test script is included in the Resources section at the end of this article.

Physical reads

Although the query plan for the Key Seek method looks a little more complex than the clustered index scan version, it is surprisingly efficient, with consistent performance across the whole range of tested page numbers. The following graph shows the physical reads incurred by the Key Seek method, in comparison to the clustered index scan.

The Key Seek method ranges from 6 to 10 physical reads, whereas the clustered scan method uses between 5 and 694 physical reads.

Confirmation of the small number of data pages needed by the Key Seek method can be found later in this article, in the analysis of Buffer Pool memory usage. That analysis also shows that each physical read was able to load multiple 8KB data pages.

Logical reads

Logical reads are the sum of read operations on pages already in the data cache, plus any physical reads from external storage. Operations on a cached page are very fast - many thousands of times faster than a physical read - but we must nevertheless account for them. The following graph shows a comparison of logical reads for the two methods:

Again, the Key Seek method is very consistent, though it does perform more logical reads than the clustered index scan for the first few pages. Since we already know how many physical reads were required, it is clear that very nearly all of the logical reads are reads from data cache.

CPU and total elapsed time

Total worker time is generally slightly higher for the Key Seek method, due to the extra logical operations performed. As in the other tests, the difference is most pronounced for low numbered pages.

The tests for elapsed time show the Key Seek ahead in all cases - emphasising the dominant contribution of physical I/O.

Memory Usage

This test measures the number of 8KB index and data pages brought into the buffer pool by each method.

The clustered index scan consumes very slightly less memory only for the very first page. After the first page, the Key Seek method has a smaller buffer pool requirement.

This result confirms the observations from the physical read test, showing that the Key Seek method touches a consistently small number of data pages, and that many data pages can be fetched with a single physical read.

Conclusion

The Key Seek method provides a fast and efficient way to implement server-side paging when the source row data is relatively wide. The method gives predictable and consistent performance regardless of the page number requested, while making efficient use of system resources.

Very high-volume applications that frequently access the first few pages of a result set might consider combining the two methods presented here - using a clustered index seek for the first few pages, and switching to the Key Seek method for all other requests.

A comprehensive test script containing all the code samples from this article, together with additional comments and annotations can be found below.

Resources:

Optimising Server-Side Paging Part I.sql


Source: http://www.sqlservercentral.com/articles/paging/69892/
posted by LifeisSimple
2010. 5. 4. 11:07 Brain Trainning/DataBase

Server Side Paging With SQL Server 2005

By Jacob Sebastian, 2007/08/29

Total article views: 10905 | Views in the last 30 days: 551

Introduction

Most web developers have come across the requirement to implement server-side paging of data to increase the performance of an application. In the absence of server-side paging, the application fetches all the data from the database server and then loads a specific number of records, depending on the current page being viewed by the user. Assume that a table has 10,000 records and the page size used by the application is 50. When the user clicks on page 2, the application fetches all 10,000 records from the database server and then loads records 51 to 100 into the UI control on the web page. This shows that we are fetching a lot of records which we never really use. By fetching only the records that we need (in the above example, records 51 to 100) from the database server, we can gain better performance at the database server level as well as at the application level.

There are quite a few articles available on the Internet which address this problem from different angles. Some of the interesting articles that I could find are the following:

None of the articles I could find online considered all the requirements that I was looking for. I wanted the Server Side Paging code to address the following points:

  • Select the required number of records based on the current page number and the page size. If the page size is 25 records and we are on page 4, then we need to retrieve records 76 to 100.
  • The sort order needs to be handled. The data that we need to retrieve for page 4 will be different when the sort order changes. For example, when the sort order is First Name, a different set of records is returned than when the sort order is City.
  • Filters need to be applied in the TSQL code. Most of the time, the data is retrieved against a search operation which takes various filter values. For example, the Employee search might take filters like First Name, City or Hire Date. The filters may also be optional: none, one, many or all of the filters can be specified in the query. If a filter is provided, then the data needs to be filtered for that condition. Otherwise, that filter should be ignored.

At this point, I thought of writing my own version of the Server Side Paging TSQL code which takes care of all the points mentioned above.

Sample Code

We will use the Northwind database for the purpose of this example. The following are the requirements that this example will fulfill.

  1. A web page needs to be created for displaying a list of Employees
  2. User can search by First Name, Title and City
  3. User can enter None, One, Two or All of the filters
  4. We will use LIKE matching while applying the filters
  5. The page should display only 10 records at a time. Paging should be implemented for viewing other records.
  6. When a specific page number is clicked, the data of that page needs to be loaded
  7. User can sort the results by First Name, Title, City, or Hire Date
  8. After sorting the results by a column, when the user clicks on a page number, the paging should happen based on the current sort order.

Here is the stored procedure which satisfies the above requirements:

CREATE PROCEDURE GetEmployees(
    @LastName VARCHAR(20) = NULL,
    @Title VARCHAR(20) = NULL,
    @City VARCHAR(20) = NULL,
    @PageSize INT = 5,
    @PageNumber INT = 1,
    @SortOrder VARCHAR(20) = 'LastName'
)
AS

SET NOCOUNT ON
/*
    Let us use a CTE to simplify the code. The below CTE makes the code easier
    to read and understand.
*/
;WITH emp AS (
SELECT
    /*
        Based on the sort order passed into the stored procedure, a Record Identifier
        (sequential number) is generated using the ROW_NUMBER() method. The sequential
        number is generated in the sorted order.
    */
    CASE
        WHEN @SortOrder = 'Title' THEN ROW_NUMBER() OVER (ORDER BY Title)
        WHEN @SortOrder = 'HireDate' THEN ROW_NUMBER() OVER (ORDER BY HireDate)
        WHEN @SortOrder = 'City' THEN ROW_NUMBER() OVER (ORDER BY City)
        -- In all other cases, assume that @SortOrder = 'LastName'
        ELSE ROW_NUMBER() OVER (ORDER BY LastName)
    END AS RecID,
    LastName,
    FirstName,
    Title,
    HireDate,
    City,
    Country,
    PostalCode
FROM employees
WHERE
    /*
        Apply the filter. If the filter is specified, then apply the filter.
        If not, ignore the filter.
    */
    (@LastName IS NULL OR LastName LIKE '%' + @LastName + '%')
    AND
    (@Title IS NULL OR Title LIKE '%' + @Title + '%')
    AND
    (@City IS NULL OR City LIKE '%' + @City + '%')
)

/*
    Select the final query result.
*/
SELECT
    RecID,
    LastName,
    Title,
    HireDate,
    City
FROM emp
/*
    Apply a RANGE filter on the requested SORT ORDER to retrieve the records of the
    current page. If the "Page Number" is 3 and "Page Size" is 30 then records 61 to
    90 are retrieved.
*/
WHERE RecID BETWEEN ((@PageNumber - 1) * @PageSize) + 1 AND @PageNumber * @PageSize
/*
    "RecID" is a value generated by the previous CTE based on the sort order specified
    by the @SortOrder parameter.
*/
ORDER BY RecID

Let us execute the stored procedure:

-- Let us retrieve the first page sorted by "Last Name"
EXECUTE GetEmployees @PageSize = 3, @PageNumber = 1, @SortOrder = 'LastName'

/*
OUTPUT:

RecID  LastName   Title                     HireDate                 City
------ ---------- ------------------------- ------------------------ --------
1      Buchanan   Sales Manager             1993-10-17 00:00:00.000  London
2      Callahan   Inside Sales Coordinator  1994-03-05 00:00:00.000  Seattle
3      Davolio    Sales Representative      1992-05-01 00:00:00.000  Seattle
*/

-- Let us retrieve the second page sorted by "Last Name"
EXECUTE GetEmployees @PageSize = 3, @PageNumber = 2, @SortOrder = 'LastName'

/*
OUTPUT:

RecID  LastName   Title                     HireDate                 City
------ ---------- ------------------------- ------------------------ --------
4      Dodsworth  Sales Representative      1994-11-15 00:00:00.000  London
5      Fuller     Vice President, Sales     1992-08-14 00:00:00.000  Tacoma
6      King       Sales Representative      1994-01-02 00:00:00.000  London
*/

-- Let us retrieve the third page sorted by "City"
EXECUTE GetEmployees @PageSize = 3, @PageNumber = 3, @SortOrder = 'City'

/*
OUTPUT:

RecID  LastName   Title                     HireDate                 City
------ ---------- ------------------------- ------------------------ --------
7      Davolio    Sales Representative      1992-05-01 00:00:00.000  Seattle
8      Callahan   Inside Sales Coordinator  1994-03-05 00:00:00.000  Seattle
9      Fuller     Vice President, Sales     1992-08-14 00:00:00.000  Tacoma
*/

Conclusions

There are different ways to implement the above functionality, and the code can be re-written in several ways. For example, the ORDER BY clause can take an expression which uses a CASE statement with ROW_NUMBER().
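For illustration, that alternative could look something like the following sketch, written as a drop-in replacement for the CTE and final SELECT inside the same procedure (it assumes the same @SortOrder, @PageNumber and @PageSize parameters and the same employees table; the optional filters are omitted for brevity):

-- Sketch only: same behaviour, but a single ROW_NUMBER() whose ORDER BY uses
-- CASE expressions (one per sortable column, so each keeps its own data type)
;WITH emp AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY
            CASE WHEN @SortOrder = 'Title'    THEN Title    END,
            CASE WHEN @SortOrder = 'HireDate' THEN HireDate END,
            CASE WHEN @SortOrder = 'City'     THEN City     END,
            LastName) AS RecID,
        LastName, Title, HireDate, City
    FROM employees
)
SELECT RecID, LastName, Title, HireDate, City
FROM emp
WHERE RecID BETWEEN ((@PageNumber - 1) * @PageSize) + 1 AND @PageNumber * @PageSize
ORDER BY RecID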


Source: http://www.sqlservercentral.com/articles/Advanced+Querying/3181/

posted by LifeisSimple
2010. 4. 28. 15:12 Brain Trainning/DataBase

Consolidating SQL Server Error Logs from Multiple Instances Using SSIS
14 April 2010

SQL Server hides a lot of very useful information in its error log files. Unfortunately, the process of hunting through all these logs, file-by-file, server-by-server, can cause a problem. Rodney Landrum offers a solution which will allow you to pull error log records from multiple servers into a central database, for analysis and reporting with T-SQL.

SQL Server hides a wealth of useful information inside its error log files. Aside from containing the details of specific errors that have occurred, as its name implies it must, there are details of successful and unsuccessful login attempts, the output of DBCC commands, and more. You can even raise your own errors or events to the log, for example to indicate when an 'index rebuild' process is complete.

Unfortunately, I use the term "hides" advisedly. Although it puts a lot of useful information in there, getting it out again in a useful, easy-to-digest format is often a long and painful trial, unless of course you're the sort of DBA who cherishes nothing more in life than a few free hours each morning to scroll through hundreds of thousands of SQL Server error log entries, server-by-server, using the Log Viewer application, raptly focused on each and every interesting message.

What would be useful to all other DBAs, I hope, is the means to consolidate the error log entries for all your servers into a central location, and to report on them from there, quickly firing off SQL queries that could tell you instantly, for example:

  • "When was the last time X logged in or accessed the server?"
  • "Have there been any prolific login failure attempts by unknown accounts?"
  • "Have any of these specific errors occurred on any of my servers in the last x days"

The solution I offer here uses an SSIS package to pull error log data from the current error log on any number of remote servers, and load the data into a staging table in a central database. From here, any new log records are pushed into a permanent table using the new SQL 2008 MERGE command. From here the log records can be diced and sliced with T-SQL, to your heart's content.

Shortcomings of the Standard Error Log Reader

Before we dive into our custom "error log consolidation" solution, let's take a brief look at the SQL Server Log Viewer application for SQL Server 2005 and beyond. Figure 1 shows the Log Viewer window, with an applied filter, which searches for messages containing the term "failed", across multiple error logs, on a server instance named SRVSAT (one that is not much used).

Figure 1 – Standard SQL Server 2005 Log Viewer

Notice that, while I'm specifically searching the SQL Server error logs here, I can also choose to view the SQL Server Agent, Database Mail and Windows NT logs. There is also a search function that lets us find specific log entries based on a keyword.

In my filtered search of the SQL Server logs, you can see that there are two entries on this particular server that contain the word "failed"; one is a login failure for sa and the other is a failure in opening a database file. The filter capability is limited, however, in the sense that a filtered search for "Failed" or "Login Failed" will return login failure results, whereas a filtered search on "Failed Logins" will return no results.

Overall, the Log Viewer is a reasonable ad-hoc tool for tracking down errors on a given server. However, the fact that the Log Viewer can only be directed at one server at a time makes it a rather tedious means of tracking down specific errors across multiple servers. Also, its filter and searching capabilities certainly do not match what can be achieved with the power of T-SQL.

Custom Error Log Consolidation

As noted in the Introduction, this solution is built on SSIS. I first began using SSIS back in 2007, when I developed the "DBA Repository" solution, described in full detail in my SQL Server Tacklebox book. It uses an SSIS package to execute a range of useful scripts across all servers under my care, and pull the results back into a central SQL Server database, for reporting and analysis. A fringe benefit is that it's been relatively easy to update and adapt this original solution to accommodate new requirements, such as the error log consolidation I describe here.

My central database, DBA_Rep, is housed on a SQL Server 2008 instance and contains all manner of tables in which to store various bits of useful SQL Server information. In order to support my error log consolidation, I simply needed to add two new tables to this database:

  • StageErrorLog – a temporary holding table for the error log data read from each server.
  • SQL_ErrorLog – the permanent storage table for the error log data

The two tables' schemas are identical except that the permanent table, SQL_ErrorLog, contains an identity column called LogID. The code to create these tables is included as part of the code download bundle for this article, along with the package code, sample queries and so on.
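The exact DDL ships in the code download; a minimal sketch of the two tables, with column types assumed from the sp_readerrorlog output shown later, might look like this:

-- Minimal sketch; column types assumed from the sp_readerrorlog output (LogDate, ProcessInfo, Text)
CREATE TABLE dbo.StageErrorLog
    (
      LogDate DATETIME ,
      ProcessInfo NVARCHAR(50) ,
      [Text] NVARCHAR(MAX) ,
      Server VARCHAR(100)
    )

CREATE TABLE dbo.SQL_ErrorLog
    (
      LogID INT IDENTITY(1,1) NOT NULL ,
      LogDate DATETIME ,
      ProcessInfo NVARCHAR(50) ,
      [Text] NVARCHAR(MAX) ,
      Server VARCHAR(100)
    )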

As I describe the error log solution, I'll focus on those aspects most closely related to gathering and analyzing the log data, rather than on the specific SSIS techniques used, for example, to assign object and string variables in the ForEachLoop Containers within the packages. The latter is covered in detail in one of my previous Simple-Talk articles.

The SSIS Package

The SSIS package that, in a single run, polls, consolidates and merges the error logs from any number of servers is really quite simple. A single screen shot, shown in Figure 2, captures the entire package, comprising just five steps.

Figure 2 – Error Log Reader SSIS Package

Let's examine each step in turn, with most emphasis on the Error Log Reader for-each loop, and the MERGE step.

Get the Server Names into an Object Variable

The first task "Populate ADO Variable For SQL Server Names" populates a SQL_RS object variable with the list of the servers from which we will be extracting error log information. In a later step, our package will iterate through this list and populate a string variable, SRV_Conn. This, in turn, is used to dynamically update the ServerNameproperty of the MultiServer Connection Manager object, which controls the the connection to each SQL Server instance we need to interrogate. There is also a second Connection Manager object, DBA_Rep, which is used to connect to the central repository. This whole step is described in fuller detail in my previous article, Using SSIS to monitor SQL Server Databases.

Figure 3 – Connection Manager Objects

Truncate the Staging table

The second step is straightforward and simply issues a single T-SQL command, TRUNCATE TABLE StageErrorLog, to empty the contents of the StageErrorLog table, which temporarily stores error log data for each server as the package executes.

The Error Log Reader ForEachLoop Container

The "Error Log Reader" ForEachLoop container forms the heart of the package, and performs two important functions:

  1. It sets the connection string for each server using the two variables SQL_RS and SRV_Conn
  2. It executes the objects contained within, namely "Create Temp Table for Error Logs" and "Get Error Logs From All Servers".

Figure 4 shows the SSIS Task Editor for the Create Temp Table for Error Logs task. It uses the MultiServer Connection Manager to create a temporary table on the remote server, in which to store the error log data during package execution.

Figure 4 – Create Temp Table For Error Logs Task Editor

The T-SQL statement that is executed within this task is shown in Listing 1. First, it drops (if it exists) and then creates, in TempDB on the remote server, a temp table, wErrorLog, and then populates that table with the output of the sp_readerrorlog system stored procedure.

The sp_readerrorlog procedure takes a parameter value, in this case "0", which instructs it to read only the current log file. While it is possible to read historic logs, I decided that only reading the current log would improve performance. I run the task daily to minimize the risk of losing log data, and run an additional step once a month that interrogates all of the available logs to catch anything that may have been missed. The three actions that I know will initiate a new error log on SQL Server are running sp_cycle_errorlog, running DBCC ERRORLOG and restarting SQL Server.
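For illustration, the first parameter of sp_readerrorlog simply selects which log to read, and sp_cycle_errorlog is the command that starts a new one:

EXEC sp_readerrorlog 0   -- read the current error log
EXEC sp_readerrorlog 1   -- read the most recent archived log (ERRORLOG.1)
-- EXEC sp_cycle_errorlog -- close the current log and start a new one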

Enumerating through multiple log files
If you choose to interrogate all the log files, rather than just the current one, a script that may prove useful is included in my DBA Script Thumb.

The final step in the Read Error Logs script alters the temp table wErrorLog to add a Server column and then updates the records with the ServerProperty(ServerName) function. This step is critical as it allows us to distinguish from which server the error logs came, once the records are all merged into the final destination table (a step I will explain in short order).

--Read Error Logs

DECLARE @TSQL NVARCHAR(2000)

IF EXISTS ( SELECT  *
            FROM    tempdb.dbo.sysobjects
            WHERE   id = OBJECT_ID(N'[tempdb].[dbo].[wErrorLog]') )
    DROP TABLE [tempdb].[dbo].[wErrorLog] ;

CREATE TABLE [tempdb].[dbo].[wErrorLog]
    (
      LogDate DATETIME ,
      ProcessInfo NVARCHAR(50) ,
      [Text] NVARCHAR(MAX)
    )

INSERT  INTO [tempdb].[dbo].[wErrorLog]
        EXEC sp_readerrorlog 0

ALTER TABLE [tempdb].[dbo].[wErrorLog]
ADD Server VARCHAR(100) NULL ;
GO

UPDATE  [tempdb].[dbo].[wErrorLog]
SET     Server = CONVERT(VARCHAR(100), SERVERPROPERTY('Servername'))

SELECT  *
FROM    [tempdb].[dbo].[wErrorLog]

Listing 1 – The Read Error Logs script, executed by the Create Temp Table for Error Logs task

The output of the sp_readerrorlog stored procedure, executed on a SQL Server 2005 instance, can be seen in Figure 5. Notice that three columns are returned: LogDate, ProcessInfo and Text.

Figure 5 – Output of sp_readerrorlog

When the package executes it will create and populate the wErrorLog temp table on each of the remote servers. The next step is to pull the data from each of the wErrorLog temp tables into the staging table, StageErrorLog, in the DBA_Rep central repository.

So why do we need temp tables on each remote server, as well as a staging table in the central database? The main reason is to provide the next task in this container, the Get Error Logs from All Servers data flow task, with a valid source table on the remote server from which to select. This task pulls all of the error log data from the remote servers into a single staging table in my central SQL Server 2008 database, DBA_Rep. From there, in the final step in the package (discussed shortly) I can use the MERGE command to move any new error log records into the SQL_ErrorLog table. Some of the remote servers were running SQL 2005 instances, so it was not possible to run the MERGE command directly against these SQL Server 2005 sources, as it would fail syntactically.

Figure 6 shows the Get Error Logs From All Servers data flow task with the Server Source (wErrorLog) and the Repository Destination (StageErrorLog).

Figure 6 – Data Flow Task for Error Log

Merging the Error Log Data into the Central Database

After all of the remote servers bear their error log fruit into the StageErrorLog basket, all that remains is to merge that data into its final destination, SQL_ErrorLog, which can then be used for analysis via standard T-SQL queries.

I chose to use the MERGE statement for a number of reasons. I felt that the typical "truncate and refresh solution" is inefficient for large data loads. I could have built an equivalent "UPSERT" solution, without MERGE, by using IF EXISTS constructs to control INSERTs and UPDATEs, as well as DELETEs, but MERGE meant I could avoid all of that work. Finally, I just wanted a good excuse to experiment with a new command, and this was a perfect project for that!

I have not found an equivalent of MERGE in the SSIS Toolbox. There are some third party tools that will provide this functionality, at a cost, but instead I decided simply to execute the MERGE command within an Execute SQL task.

The MERGE command is shown in Listing 2. It compares the rows in the source table, StageErrorLog, with those in the target table, SQL_ErrorLog, based on all four field values (LogDate, ProcessInfo, Text and Server), and INSERTs new (NOT MATCHED) records into SQL_ErrorLog. I know I will never DELETE or UPDATE the records, so new INSERTs are all I am concerned about.

MERGE SQL_ErrorLog AS Target
    USING StageErrorLog AS Source
    ON ( Source.LogDate = Target.LogDate
         AND Source.ProcessInfo = Target.ProcessInfo
         AND Source.Text = Target.Text
         AND Source.Server = Target.Server
       )
    WHEN NOT MATCHED BY TARGET
        THEN
INSERT  (
          LogDate ,
          ProcessInfo ,
          Text ,
          Server
        ) VALUES
        ( Source.LogDate ,
          Source.ProcessInfo ,
          Source.Text ,
          Source.Server
        ) ;

Listing 2 – merging new log records into the SQL_ErrorLog table

Executing and Testing the Package

So that is it, five fairly simple steps. Now all we have to do is execute the package to start consolidating logs for all of the SQL Servers. I would imagine that the list of servers would be limited to critical production environments, where logs must, by mandate, be stored for audit purposes and for problem/resolution analysis. Figure 7 shows the package executing, with each task turning green as it completes successfully.

Figure 7 – Executing the Error Log Reader Package

It is quite easy to determine if the package is working as expected. If you have Audit Logins enabled, you can simply, on a database in your test environment, deliberately generate failed login attempts. You can then rerun the package and make sure that only those new records were inserted in the SQL_ErrorLog table. In my test environment, I have two servers. On the last execution of the package, I had 951 records in the SQL_ErrorLog table, 778 for Server1 and 173 for Server2, which the following query showed.

SELECT  COUNT(*) ,
        Server
FROM    SQL_ErrorLog
GROUP BY Server

After a couple of failed login attempts, using SQL authenticated users "sa" and "bad user", I reran the package and the row counts increased as expected by only a handful of records per server making the new totals 779 and 178 for Server1 and Server2 respectively. I verified that the failed logins were in the SQL_ErrorLog table by executing the following query:

SELECT  Text ,
        COUNT(Text) Number_Of_Attempts
FROM    SQL_ErrorLog
WHERE   Text LIKE '%failed%'
        AND ProcessInfo = 'LOGON'
GROUP BY Text

As Figure 8 shows, "bad user" tried to login and I captured it via the package execution. The important part is that only the new records were inserted, so the MERGE statement worked as anticipated.

Figure 8 – Failed Login Attempts From SQL_ErrorLog table.

Querying the Error Log Data

With the records loaded and the package tested, it is time to get creative with the T-SQL analysis of the data. If you know the specific errors (or log entries I should say) for which you are looking, such as DBCC or memory or deadlocks, then you can simply use a LIKE or EXISTS statement as in the following two queries that scan for "CPU" and "MEMORY". Of course you can do this with one query, but I broke them out for emphasis.

SELECT  LogDate ,
        Text
FROM    SQL_ErrorLog
WHERE   Text LIKE '%memory%'

SELECT  LogDate ,
        Text
FROM    SQL_ErrorLog
WHERE   Text LIKE '%CPU%'

The results are already proving valuable, as you can see from the highlighted rows regarding memory in Figure 9. Apparently I am not using all the memory that I could, and SQL Server 2008 is helping me by telling me what to do to increase the amount of memory SQL Server can utilize. Thanks, Error Log.

Figure 9 – Memory- and CPU-related data in SQL_ErrorLog

Of course, as DBAs, I will leave it to you to be creative in the development of useful queries for your environment. Heck, you can even add a Full Text index if you like. In fact, if you look at the code download, the SQL_ErrorLog DDL adds an identity column called LogID to facilitate the creation of such a Full Text index. That process is an article in itself and one which I will tackle next.

Conclusion (with a few small caveats)

I do not like putting the caveat section so close to the end of the article, but you may have guessed by now that there are a few, and I will just note them quickly and move on to the conclusion before you have a chance to utter "Ah Shucks". First, this solution only pulls error logs from SQL Server 2005 and higher, due to the different way the sp_readerrorlog data is returned in earlier versions. Second, the solution does not pull SQL Agent logs or Mail Logs, like the Log Viewer application does. Finally, the package and the MERGE statement therein require SQL Server 2008 to house the DBA_Rep database. Not too horrible after all, I hope. If I had said it only works on SQL Server 2000, that might have been a problem.

I hope this error log consolidation solution, based on SSIS and making use of the new SQL 2008 MERGE command, proves useful to many DBAs. If it also saves you lots of time and money then great; I did my small part for the economy.




posted by LifeisSimple
2010. 4. 28. 14:53 Brain Trainning/DataBase

Managing Max Degree of Parallelism

By Herve Roggero, 2003/06/23

Total article views: 24782 | Views in the last 30 days: 229

Introduction

In situations where your tuned T-SQL statements are pushing the limits of your CPUs, more processing power may be needed. Deploying database servers on two-, four- or even eight-processor SMP systems is rather straightforward. SQL Server usually scales almost in a linear fashion on up to eight processors.

However, some SQL Server installations may require up to 32 processors. In this kind of environment, configuration parameters that are usually ignored in smaller configurations come into play and can offer significant performance improvements. We will take a look at the Maximum Degree of Parallelism (DOP) and see how and why it may make sense to change its default setting. 

Parallel Queries Performance Limitations

When adding processors to SQL Server, the database engine will evaluate how to best leverage the available processors through internal algorithms. In essence, when receiving a SQL statement to process, SQL Server determines which processors are available, what the overall cost of the query is and executes the query on as many processors as necessary if the cost of the query reaches a configurable threshold. When 4 processors are available on a server, the likelihood of SQL Server using all processors for a complex SELECT statement is pretty high. 

The same holds true in larger environments. For instance on 16 processors, SQL Server will frequently use 12 or more processors to execute complex SELECT statements. This may turn out to be an issue for a couple of reasons. First, using more processors means managing more threads and requires more cache synchronization. System -> Context Switches/Sec is a measure of this effort. The more processors are used for a process, the higher this counter will be. In addition, SQL Server has more coordination to perform since it needs to slice and regroup the work spread over the processors. Since by default SQL Server will use as many processors as it can, upgrading your SQL Server from 8 to 12 processors may actually degrade the overall performance of your database. Although there are no golden rules, it appears that in most cases using more than 8 processors for a SELECT statement can degrade performance (although this may vary greatly by system).

Enforcing a Maximum DOP

The DOP can be set in two ways. The first way is to include the OPTION (MAXDOP n) keyword in your T-SQL statement. For example, the following query will execute with a maximum of 4 processors, regardless of how many processors have been allocated to SQL Server:

SELECT * FROM master..sysprocesses OPTION (MAXDOP 4)

The other approach is to set the maximum DOP at the database instance level, hence limiting the maximum number of CPUs to be used for any given query. To set this option at the system level, run the following command from Query Analyzer:

EXEC sp_configure 'show advanced options', '1'
RECONFIGURE
GO
sp_configure 'max degree of parallelism', 0
RECONFIGURE
GO

Note that this can be set differently for each instance of SQL Server. So if you have multiple SQL Server instances in the same server, it is possible to specify a different Maximum DOP value for each one.

On large SMP systems, setting the maximum DOP to 4 or 8 is not unusual. The default value for this parameter is 0, which allows SQL Server to use all allocated processors. The following test shows the Context Switches/Sec and average response time of a T-SQL statement running off a few million records. The server utilized for this test was loaded with the /PAE boot.ini option, 16 processors and 8GB of RAM. The statement is as follows (the statement itself is of little importance, but notice the OPTION keyword):

Select SUM((UnitPrice - UnitCost) * TotalUnitsSold)
        FROM Salesdb..salesdata (NOLOCK)
        WHERE
        SalesYear = 2000
        GROUP BY UPC
        ORDER BY 1
        OPTION (MAXDOP 2)

This statement was loaded 500 times in a table in a format that Profiler could understand. Then four Profilers were loaded on that same server, each running the content of the same table. So SQL Server was receiving four select statements at once. Note the (NOLOCK) hint that forces SQL Server to read the data without generating any locks.

The results are as follows:

DOP   Context Switches/Sec   Avg Execution Time (sec)
  2           4,280                   12
  4           5,700                   7.5
  8          10,100                   6
 12          11,200                   8.5
 16          13,000                   9

As more processors are added to the query (by using the MAXDOP option), the Context Switches/Sec increases up to 13,000, which is expected behavior. This is really a low number, considering that we are only executing 4 statements at any single point in time.

This graph shows that starting at 12 processors, the execution time degrades. Although it takes 12 seconds to execute this statement on 2 processors, it takes about 6 seconds on eight CPUs. However, we see that setting the DOP to 12 or 16 degrades the overall performance of our query when compared to a DOP of 8.

Leaving the default Maximum Degree of Parallelism value of 0 would yield the same result as the DOP of 16 in our test. Hence, changing the DOP to 8 in our scenario would provide a 30%  performance improvement over a DOP of 0 (or 16).

Enforcing a system-wide Maximum DOP is a good practice since this allows you to control the maximum number of processors SQL Server will use at any given time, regardless of the statement, as long as the MAXDOP is not used in the query (which would override the global Maximum DOP setting).
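For example, applying the finding from the test above system-wide would look like this (the value 8 comes from this particular workload; test your own environment before changing it):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'max degree of parallelism', 8
RECONFIGURE
GO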

Conclusion

SQL Server has many parameters that give you more control on the performance of your databases. Understanding how SQL Server behaves on servers with 8 processors or less gives a strong understanding of the capabilities of SQL Server. However, SQL Server offers specific configuration parameters that may give you extra performance on larger systems.

The Maximum Degree of Parallelism is a key parameter for environments with 8 or more processors, and allows you to gain control on the maximum number of processors used for a query. When deciding which DOP you should use, careful evaluation of your environment is needed. Certain queries may perform better with a DOP of 4, or even 1. Testing your environment with multiple DOPs should give you the answer. In cases where your database environment functions in OLTP and OLAP mode (for live reporting), you may consider setting a default DOP for SQL Server that works best for your OLTP system and use the OPTION keyword for your OLAP T-SQL to use the DOP that works best for these queries.

Finally, SELECT statements are not the only types of statements that can take advantage of the DOP, especially if your action queries use correlated queries (in which a SELECT statement is found inside an UPDATE statement, for example). The Maximum DOP is an advanced setting, and as such it is wise to test it thoroughly before making a decision in your production environment.



posted by LifeisSimple
2010. 4. 19. 16:29 Brain Trainning/DataBase
Performance comparison of paging query types

I found a good write-up comparing the performance of different paging query types, and I have quoted a large part of it here.

The sample queries in the "ex)" sections were written against the file-attachment table (TB_AttachFile) in the Batoo locker-room DB.

The approach shown in #5 is, in practice, the fastest regardless of the number of rows, but I am not sure whether it can be applied to our system.

It may require an extra index per table, and we would also need to check whether it could cause problems when rows are deleted in the middle or other changes occur.

For reference, the paging query used in the Webzen billing system was #4.

Also, looking at the sample query for #2, FileGuid on TB_AttachFile is a non-clustered index, so the subquery also needs ORDER BY FileGuid DESC (in other words, ORDER BY has to be done twice).

Most of our primary keys are non-clustered indexes, so this is an issue as well.



1. SELECT TOP [total number of rows to fetch] [output columns] FROM [table name]

  Paging was then done using the recordset's AbsolutePage property.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 



2. SELECT TOP [total number of rows to fetch] [output columns] FROM [table name]
   WHERE [key column] NOT IN (SELECT TOP [number of rows to skip] [key column] FROM [table name])

   This is a query pattern I once saw on the Taeyo site; it is the one that became an issue because of the NOT IN.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid NOT IN 
(
 SELECT TOP 0 FileGuid 
 FROM TB_AttachFile
 ORDER BY FileGuid DESC

ORDER BY FileGuid DESC



3. SELECT TOP [total number of rows to fetch] [output columns] FROM [table name]
   WHERE [key column] IN (SELECT TOP [page size] [key column] FROM
   (SELECT TOP [total number of rows to fetch] [key column] FROM [table name]) AS A ORDER BY [key column])
   ORDER BY [key column] DESC

   This query addresses the problem with query #2 by using IN instead of NOT IN.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid IN 
(
 SELECT TOP 123 FileGuid -- total rows - ((page number - 1) * 10) // page 1: 123 - 0, page 2: 123 - 10 ...
 FROM 
 (
  SELECT FileGuid 
  FROM TB_AttachFile 
 )AS A 
 ORDER BY FileGuid

ORDER BY FileGuid DESC



4. SELECT TOP [page size] [output columns] FROM [table name]
   WHERE [key column] <= (SELECT MIN([key column])
   FROM (SELECT TOP [number of rows to skip] + 1 [key column] FROM [table name]) AS A)

   The fourth query uses neither IN nor NOT IN; instead it finds the key value just past the rows to skip
   (MIN over the TOP skip+1 keys) and filters against that boundary.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid <=
(
 SELECT MIN(FileGuid)
 FROM 
 (
  SELECT TOP 11 FileGuid -- (current page number - 1) * 10 + 1
  FROM TB_AttachFile
  ORDER BY FileGuid DESC
 )AS A 

ORDER BY FileGuid DESC



5. SELECT TOP [page size] [output columns] FROM [table name]
   WHERE [key column] <= (SELECT MIN([key column])
   FROM (SELECT TOP [number of rows to skip] + 1 [key column]
   FROM [table name] WHERE [bucket index] = [start bucket number] AND [bucket index] = [end bucket number]) AS A
   WHERE [bucket index] = [start bucket number] AND [bucket index] = [end bucket number])
   AND [bucket index] = [start bucket number] AND [bucket index] = [end bucket number]

   Query #5 adds one more column to the table, tentatively called a bucket index, which is incremented every
   2,000 or so rows (the bucket size is adjustable); in other words, every 2,000 rows form one bucket, and
   paging is then done relative to that bucket index.

ex)

SELECT TOP 10 FileGuid, FileName, Capacity, RegDate 
FROM TB_AttachFile 
WHERE FileGuid <=
(
 SELECT MIN(FileGuid)
 FROM 
 (
  SELECT TOP 11 FileGuid -- (current page number - 1) * 10 + 1
  FROM TB_AttachFile
  WHERE IDX_FileGuid > 350  -- 350 is the last post number on the previous page (fetch the rows greater than 350)
  ORDER BY FileGuid DESC
 )AS A 

ORDER BY FileGuid DESC



Results

The test was run with 1,000,000 rows in the table.
Server info: CPU: Pentium 4 1.8 GHz, RAM: 768 MB, Compaq Presario 2820AP; environment: Windows 2003, MSSQL 2000.
Execution times for the first page (1) and the last page (50,000) are shown below (unit: ms).

Query   First page   Last page
  1        273        11476.56
  2        289         4406.25
  3        289         2695.31
  4        289         1218.75
  5          7.81        23.44

Source: http://tit99hds.egloos.com/888201

posted by LifeisSimple
2010. 4. 15. 12:14 Brain Trainning/DataBase
It may vary by situation, but option 3 is the most efficient here...
I haven't had a chance to test it myself, but from now on I'll avoid using NOT IN wherever I can. Heh.


r.hensbergen (1/29/2009)
souLTower (1/29/2009)
Aren't left joins and NOT EXISTS statements more efficient than NOT IN statements? I don't know if I made that up or read it somewhere. I usually use a query like this:

I believe that this can cause the optimizer to search for a match for each record in myTable
-- option 1 NOT IN
select foo, bar, baz from myTable where baz not in
(select baz from myOtherTable)

-- Option 2 NOT EXISTS
select foo, bar, baz from myTable A where not exists
(select baz from myOtherTable B where A.baz = B.baz)

-- Option 3 LEFT JOIN
-- I prefer this one
select foo, bar, baz from myTable A left join myOtherTable B
ON A.baz = B.baz where B.baz IS NULL

Definitely option 3! In both 1 and 2 the inner query relates to the outer query, which usually means bad performance and, at the very least, reduced readability.

posted by LifeisSimple
2010. 4. 14. 21:40 Brain Trainning/DataBase
You can migrate the Jobs with a few steps.

First, restore all of the databases the Jobs depend on,
and create the logins the Jobs use...


Restore the msdb backup taken from the original server onto the new server.

Then,
update dbo.sysjobsteps
 set database_name = 'senseSrvDB'
where database_name = 'SenseSrvDB'

This updates the database each job step runs against. If the database name is the same, no update is needed.

I still haven't figured out how to handle the account each job runs under, though...
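One possible approach for that, untested here and offered only as an assumption, is to reset each migrated job's owner with msdb.dbo.sp_update_job after restoring msdb (the job name and login below are placeholders):

-- Placeholder job name and login; run once per job whose owner needs changing
EXEC msdb.dbo.sp_update_job
    @job_name = N'MyNightlyBackup',
    @owner_login_name = N'sa'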

posted by LifeisSimple