Top 10 SQL Server 2008 Features for ISV
Applications
Author: Burzin Patel
Contributor: Kun Cheng
Reviewers: Sanjay Mishra, Denny Lee, Mike
Ruthruff, Sharon Bjeletich, Mark Souza, Peter Carlin, Hermann Daeubler, Peter
Scharlock, Wanda He
Microsoft® SQL Server® 2008 has hundreds
of new and improved features, many of which are specifically designed for
large-scale independent software vendor (ISV) applications, which need to leverage
the power of the underlying database while keeping their code database
agnostic. This article presents details of the top 10 features that we believe
are most applicable to such applications based on our work with strategic ISV
partners. Along with the description of each feature, the main pain-points the
feature helps resolve and some of the important limitations that need to be
considered are also presented. The features are grouped into two categories:
ones that do not require any application change (features 1-8) and those that
require some application code change (features 9-10). The features are not
prioritized in any particular order.
Data Compression
The disk I/O subsystem is the most common bottleneck for many database
implementations. More disks are needed to reduce the read/write latencies; but
this is expensive, especially on high-performing storage systems. At the same
time, the need for storage space continues to increase due to rapid growth of
the data, and so does the cost of managing databases (backup, restore,
transfer, etc.).
Data compression introduced in SQL Server 2008 provides a resolution to
address all these problems. Using this feature one can selectively compress any
table, table partition, or index, resulting in a smaller on-disk footprint,
smaller memory working-set size, and reduced I/O. Configurations that are
bottlenecked on I/O may also see an increase in performance. In our lab test,
enabling data compression for some ISV applications resulted in a 50-80% saving
in disk space.
SQL Server supports two types of compression: ROW
compression, which compresses the individual
columns of a table, and PAGE
compression, which compresses data pages using
row, prefix, and dictionary compression. The compression results are highly
dependent on the data types and data contained in the database; however, in
general we’ve observed that using ROW compression results in lower overhead on
the application throughput but saves less space. PAGE compression, on the other
hand, has a higher impact on application throughput and processor utilization,
but it results in much larger space savings. PAGE compression is a superset of
ROW compression, implying that an object or partition of an object that is
compressed using PAGE compression also has ROW compression applied to it.
Compressed pages remain compressed in memory until rows/columns on the pages
are accessed.
Both ROW and PAGE compression can be applied to a table or index in an
online mode, that is, without any interruption to application availability.
However, partitions of a partitioned table cannot be compressed or uncompressed
online. In our testing we found that using a hybrid approach where only the
largest few tables were compressed resulted in the best overall performance,
saving significant disk space while having a minimal negative impact on
performance. We also found that compressing the smallest objects first
minimized the need for additional disk space during the compression process.
To determine how compressing an object will affect its size, you can use
the sp_estimate_data_compression_savings system stored procedure.
Database compression is only supported in SQL Server 2008 Enterprise and
Developer editions. It is fully controlled at the database level and does not
require any application change.
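As an illustration, the savings can be estimated and PAGE compression applied with a few Transact-SQL statements (the table name below is hypothetical):

```sql
-- Estimate the space savings before committing to compression
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'SalesOrderDetail',
    @index_id         = NULL,     -- NULL = all indexes
    @partition_number = NULL,     -- NULL = all partitions
    @data_compression = N'PAGE';

-- Rebuild the table with PAGE compression, without taking it offline
ALTER TABLE dbo.SalesOrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```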
Backup Compression
The amount of data stored in databases has grown significantly in the last
decade, resulting in larger database sizes. At the same time, the demands for
applications to be available 24x7 have forced the backup time-windows to
shrink. In order to speed up the backup procedure, database backups are usually
first streamed to fast disk-based storage and moved out to slower media later.
Keeping such large disk-based backups online is expensive, and moving them
around is time consuming.
With SQL Server 2008 backup compression, the backup file is
compressed as it is written out, thereby requiring less storage, less disk I/O,
and less time, and utilizing less network bandwidth for backups that are
written out to a remote server. However, the additional processing results in
higher processor utilization. In a lab test conducted with an ISV workload we
observed a 40% reduction in the backup file size and a 43% reduction in the
backup time. Compression is enabled by specifying the WITH COMPRESSION
clause in the backup command (for more information, see SQL Server Books Online). There is also a global setting (on the Database
Settings page of the Server Properties dialog box) that
compresses all backups taken on the server instance by default;
this eliminates the need to modify existing backup scripts. While the
compression option on the backup command needs to be explicitly specified, the
restore command automatically detects that a backup is compressed and
decompresses it during the restore operation. Overall, backup compression is a
very useful feature that does not require any change to the ISV application.
For more information about tuning backup compression, see the technical note on Tuning the Performance of
Backup Compression in SQL Server 2008.
Note: Creating compressed backups is only supported in SQL Server 2008
Enterprise and Developer editions; however, every SQL Server 2008 edition can
restore a compressed backup.
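For illustration, a compressed backup and the server-wide default can be set as follows (the database name and path are hypothetical):

```sql
-- Compress a single backup explicitly
BACKUP DATABASE AdventureWorks
    TO DISK = N'D:\Backups\AdventureWorks.bak'
    WITH COMPRESSION;

-- Or make compression the default for all backups on this instance
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
```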
Transparent Data Encryption
In most cases, organizations do well at securing their active data via the
use of firewalls, physical security, and tightly controlled access policies.
However, when the physical media such as the backup tape or disk on which the
data resides is compromised, the above security measures are of no use, since a
rogue user can simply restore the database and get full access to the data.
SQL Server 2008 offers a solution to this problem by way of
Transparent Data Encryption (TDE). TDE performs real-time I/O encryption and
decryption of the data and log files using a database encryption key (DEK). The
DEK is a symmetric key secured by using a certificate stored in the master database
of the server, or an asymmetric key protected by an Extensible Key Management
(EKM) module. TDE is designed to protect data ‘at rest’; this means that the
data stored in the .mdf, .ndf, and .ldf files cannot be viewed using a hex
editor or some other such means. However, data that is not at rest, such as the
results of a select statement in SQL Server Management Studio, continues to be
visible to users who have rights to view the table. TDE should not be confused
with column-level encryption, which is a separate feature that allows
encryption of data even when it is not at rest. Encrypting a database is a
one-time process that can be initiated via a Transact-SQL command and is
executed as a background thread. You can monitor the encryption/decryption status
using the sys.dm_database_encryption_keys dynamic management view (DMV).
In a lab test we conducted we were able to encrypt a 100-gigabyte (GB)
database using the AES_128 encryption algorithm in about one hour. While the
overheads of using TDE are largely dictated by the application workload, in
some of the testing we conducted, the overhead was measured to be less than 5%.
One point worth mentioning is that when backup compression is used to compress
an encrypted database, the size of the compressed backup is larger than if the
database were not encrypted; this is because encrypted data does not compress
well.
TDE enables organizations to meet the demands of regulatory compliance and
overall concern for data privacy.
TDE is only supported in the SQL Server 2008 Enterprise and Developer
editions, and it can be enabled without changing an existing application.
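A minimal sketch of enabling TDE, assuming a hypothetical database name and certificate name, follows the pattern below:

```sql
USE master;
GO
-- Certificate that protects the database encryption key
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
GO
USE MyDatabase;
GO
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_128
    ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
ALTER DATABASE MyDatabase SET ENCRYPTION ON;
GO
-- Monitor progress; encryption_state 3 means the scan is complete
SELECT DB_NAME(database_id) AS database_name,
       encryption_state, percent_complete
FROM sys.dm_database_encryption_keys;
```

Remember to back up the certificate and its private key; without them, an encrypted database cannot be restored on another server.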
Data
Collector and Management Data Warehouse
Performance tuning and troubleshooting is a time-consuming task that
usually requires deep SQL Server skills and an understanding of the database
internals. Windows® System Monitor (Perfmon), SQL Server Profiler, and dynamic
management views helped with some of this, but they were often too intrusive or
laborious to use, or the data was too difficult to interpret.
To provide actionable performance insights, SQL Server 2008 delivers
a fully extensible performance data collection and warehouse tool also known as
the Data Collector. The tool includes several out-of-the-box data collection
agents, a centralized data repository for storing performance data called
management data warehouse (MDW), and several precanned reports to present the
captured data. The Data Collector is a scalable tool that can collect and
assimilate data from multiple sources such as dynamic management views,
Perfmon, Transact-SQL queries, etc., using a fully customizable data collection
and assimilation frequency. The Data Collector can be extended to
collect data for any measurable attribute of an application. For example, in
our lab test we wrote a custom Data Collector agent job (40 lines of code) to
measure the processing throughput of the workload.
The diagram below depicts a typical Data Collector report.
The performance data collection and warehouse feature is supported in all
editions of SQL Server 2008.
Lock Escalation
Lock escalation has often caused blocking and sometimes even deadlocking
problems for many ISV applications. Previous versions of SQL Server permitted
controlling lock escalation (trace flags 1211 and 1224), but this was only
possible at an instance-level granularity. While this helped some applications
work around the problem, it caused severe issues for others. Another problem
with the SQL Server 2005 lock escalation algorithm was that locks on
partitioned tables were directly escalated to the table level, rather than the
partition level.
SQL Server 2008 offers a solution for both these issues. A new option has
been introduced to control lock escalation at the table level. Using an ALTER TABLE
option, lock escalation can be disabled entirely, or directed to the partition
level for partitioned tables. Both these enhancements help
improve the scalability and performance without having negative side-effects on
other objects in the instance. Lock escalation is specified at the
database-object level and does not require any application change. It is
supported in all editions of SQL Server 2008.
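For example, the new table-level option is set through ALTER TABLE (the table name is illustrative):

```sql
-- Escalate to the partition level on a partitioned table
ALTER TABLE dbo.SalesOrderDetail SET (LOCK_ESCALATION = AUTO);

-- Or suppress lock escalation for this table entirely
ALTER TABLE dbo.SalesOrderDetail SET (LOCK_ESCALATION = DISABLE);
```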
Plan Freezing
SQL Server 2005 enabled greater query performance stability and
predictability by providing a new feature called plan guides to enable specifying
hints for queries that could not be modified directly in the application (for
more information, see the white paper Forcing Query Plans). While a very
powerful feature, plan guides were often cumbersome to use due to the
sensitivity of the plan guides to the formatting, and only supported SELECT DML
operations when used in conjunction with the USE PLAN query hint.
SQL Server 2008 builds on the plan guides mechanism in two ways: it
expands the support for plan guides to cover all DML statements (INSERT,
UPDATE, DELETE, MERGE), and introduces a new feature, Plan Freezing, that
can be used to directly create a plan guide (freeze) for any query plan that
exists in the SQL Server plan cache, for example:
EXEC sp_create_plan_guide_from_handle
@name = N'MyQueryPlan',
@plan_handle = @plan_handle,
@statement_start_offset = @offset;
A plan guide created by either means has database scope and is stored
in the sys.plan_guides catalog view. Plan guides are only used to influence
the query plan selection process of the optimizer and do not eliminate the need
for the query to be compiled. A new function, sys.fn_validate_plan_guide, has
also been introduced to validate existing SQL Server 2005 plan guides and
ensure their compatibility with SQL Server 2008. Plan freezing is available in
the SQL Server 2008 Standard, Enterprise, and Developer editions.
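The complete freezing workflow, sketched with an illustrative query filter, looks roughly like this:

```sql
-- Find the cached plan for the statement to be frozen
DECLARE @plan_handle varbinary(64), @offset int;

SELECT TOP (1)
       @plan_handle = qs.plan_handle,
       @offset      = qs.statement_start_offset
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%Sales.SalesOrderHeader%';

-- Freeze it as a plan guide
EXEC sp_create_plan_guide_from_handle
    @name = N'MyQueryPlan',
    @plan_handle = @plan_handle,
    @statement_start_offset = @offset;

-- Validate existing guides (e.g., after an upgrade)
SELECT pg.name, v.*
FROM sys.plan_guides AS pg
CROSS APPLY sys.fn_validate_plan_guide(pg.plan_guide_id) AS v;
```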
Optimize
for Ad hoc Workloads Option
Applications that execute many single use ad hoc batches (e.g.,
nonparameterized workloads) can cause the plan cache to grow excessively large
and result in reduced efficiency. SQL Server 2005 offered the Forced
Parameterization database option to address such scenarios, but that sometimes resulted
in adverse side-effects on workloads that had a large skew in the data and had
queries that were very sensitive to the underlying data.
SQL Server 2008 introduces a new option, optimize for ad hoc
workloads, which is used to improve the efficiency of the plan cache. When
this option is set to 1, the SQL Server engine stores a small stub for the
compiled ad hoc plan in the plan cache instead of the entire compiled plan,
when a batch is compiled for the first time. The stub lets the engine
recognize that the ad hoc batch has been compiled before; when the batch is
invoked again, the database engine compiles the batch, removes the compiled
plan stub from the plan cache, and replaces it with the full compiled plan.
This mechanism helps to relieve memory pressure by not allowing the plan
cache to become filled with large compiled plans that are not reused. Unlike
the Forced Parameterization option, optimizing for ad hoc workloads does not
parameterize the query plan and therefore does not result in saving any
processor cycles by way of eliminating compilations. This option does not
require any application change and is available in all editions of SQL Server
2008.
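The option is an instance-wide sp_configure setting:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```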
Resource
Governor
Maintaining a consistent level of service by preventing runaway queries
and guaranteeing resources for mission-critical workloads has been a challenge
for SQL Server. In the past there was no way of guaranteeing a certain amount
of resources to a set of queries and prioritizing the access; all queries had
equal access to all the available resources.
SQL Server 2008 introduces a new feature, Resource Governor, which helps
address this issue by enabling users to differentiate workloads and allocate
resources as they are requested. The Resource Governor limits can easily be
reconfigured in real time with minimal impact on the workloads that are
executing. The allocation of the workload to a resource pool is configurable at
the connection level, and the process is completely transparent to the
application.
The diagram below depicts the resource allocation process. In this
scenario three workload pools (Admin workload, OLTP workload, and Report
workload) are configured, and the OLTP workload pool is assigned a high
priority. In parallel two resource pools (Admin pool and Application pool) are
configured with specific memory and processor (CPU) limits as shown. As final
steps, the Admin workload is assigned to the Admin pool, and the OLTP and
Report workloads are assigned to the Application pool.
Below are some other points you need to consider when using Resource
Governor:
· Since Resource Governor relies on login credentials, host name, or
application name as a resource pool identifier, most ISV applications that use
a single login to connect multiple application users to SQL Server will not be
able to use Resource Governor without reworking the application. This rework
would require the application to utilize one of the resource identifiers from
within the application to help differentiate the workload.
· Database-level object grouping, in which the resource governing is done
based on the database objects being referenced, is not supported.
· Resource Governor only allows resource management within a single SQL
Server instance. For multiple instances, Windows System Resource Manager
should be considered.
· Only processor and memory resources can be configured; I/O resources
cannot be controlled.
· Dynamically switching workloads between resource pools once a connection
is made is not possible.
· Resource Governor is only supported in SQL Server 2008 Enterprise and
Developer editions and can only be used for the SQL Server database engine;
SQL Server Analysis Services (SSAS), SQL Server Integration Services (SSIS),
and SQL Server Reporting Services (SSRS) cannot be controlled.
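The scenario above can be sketched in Transact-SQL roughly as follows (pool, group, and application names are illustrative):

```sql
-- Resource pools with explicit CPU and memory limits
CREATE RESOURCE POOL AdminPool WITH (MAX_CPU_PERCENT = 20);
CREATE RESOURCE POOL AppPool   WITH (MAX_CPU_PERCENT = 80, MAX_MEMORY_PERCENT = 80);

-- Workload groups mapped to the pools; the OLTP group gets high importance
CREATE WORKLOAD GROUP AdminWorkload  WITH (IMPORTANCE = MEDIUM) USING AdminPool;
CREATE WORKLOAD GROUP OLTPWorkload   WITH (IMPORTANCE = HIGH)   USING AppPool;
CREATE WORKLOAD GROUP ReportWorkload WITH (IMPORTANCE = LOW)    USING AppPool;
GO
-- Classifier function (created in master) routes sessions by application name
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE APP_NAME()
               WHEN 'AdminConsole' THEN N'AdminWorkload'
               WHEN 'ReportTool'   THEN N'ReportWorkload'
               ELSE N'OLTPWorkload'
           END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```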
Table-Valued Parameters
Often one of the biggest problems ISVs encountered while developing
applications on earlier versions of SQL Server was the lack of an easy way to
execute a set of UPDATE, DELETE, INSERT operations from a client as a single
batch on the server. Executing the set of statements as singleton operations
resulted in a round trip from the client to the server for each operation and
could result in as much as a 3x slowdown in performance.
SQL Server 2008 introduces the table-valued parameter (TVP) feature, which
helps resolve this problem. Using the new TVP data type, a client application
can pass a potentially unlimited sized array of data directly from the client
to the server in a single-batch operation. TVPs are first-class data types and
are fully supported by the SQL Server tools and SQL Server 2008 client
libraries (SNAC 10 or later). TVPs are read-only, implying that they can only
be used to pass array-type data into SQL Server; they cannot be used to return
array-type data.
The graph below plots the performance of executing a batch of insert
statements using a parameter array (sequence of singleton operations) vs.
executing the same batch using a TVP. For batches of 10 statements or less,
parameter arrays perform better than TVPs. This is due to the one-time overhead
associated with initiating the TVP, which outweighs the benefits of
transferring and executing the inserts as a single batch on the server.
However, for batches larger than 10 statements, TVPs outperform parameter
arrays, because the entire batch is transferred to the server and executed as a
single operation. As can be seen in the graph, for a batch of 250 inserts the
amount of time taken to execute the batch is 2.5 times more when the operations
are performed using a parameter array versus a TVP. The performance benefits
scale almost linearly, and when the size of the batch increases to 2,000 insert
statements, executing the batch using a parameter array takes more than four
times longer than using a TVP.
TVPs can also be used to perform other functions such as passing a large
batch of parameters to a stored procedure. TVPs are supported in all editions
of SQL Server 2008 and require the application to be modified.
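A minimal end-to-end sketch, with illustrative type, procedure, and table names:

```sql
-- Table type describing the shape of the array
CREATE TYPE dbo.OrderItemType AS TABLE
(
    ProductID int NOT NULL,
    Quantity  int NOT NULL
);
GO
-- Procedure that receives the whole batch in one round trip
CREATE PROCEDURE dbo.InsertOrderItems
    @Items dbo.OrderItemType READONLY   -- TVPs must be declared READONLY
AS
    INSERT INTO dbo.OrderItems (ProductID, Quantity)
    SELECT ProductID, Quantity FROM @Items;
GO
-- Transact-SQL caller; ADO.NET and SNAC clients bind TVPs similarly
DECLARE @batch dbo.OrderItemType;
INSERT INTO @batch VALUES (101, 2), (102, 5), (103, 1);
EXEC dbo.InsertOrderItems @Items = @batch;
```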
Filestream
In recent years there has been an increase in the amount of unstructured
data (e-mail messages, documents, images, videos, etc.) created. This
unstructured data is often stored outside the database, separate from its
structured metadata. This separation can cause challenges and complexities in
keeping the data consistent, managing the data, and performing backup/restores.
The new Filestream data type in SQL Server 2008 allows large
unstructured data to be stored as files on the file system. Transact-SQL
statements can be used to read, insert, update and manage the Filestream data,
while Win32® file system interfaces can be used to provide streaming access to
the data. Using the NTFS streaming APIs allows efficient performance of common
file operations while providing all of the rich database services, including
security and backup. In our lab tests we observed the biggest performance
advantage of streaming access when the size of binary large objects (BLOBs) was
greater than 256 kilobytes (KB). The Filestream feature is initially targeted
to objects that do not need to be updated in place, as that is not yet
supported.
Filestream is not automatically enabled when you install or upgrade SQL
Server 2008. You need to enable it by using SQL Server Configuration Manager
and SQL Server Management Studio. Filestream requires a special dedicated
filegroup to be created to store the Filestream (varbinary(max)) data
that has been qualified with the Filestream attribute. This filegroup points to
an NTFS directory on a file system and is created similar to all the other
filegroups. The Filestream feature is supported in all editions of SQL
Server 2008, and it requires the application to be modified to
leverage the Win32 APIs (if required) and to migrate the existing varbinary
data.
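A minimal setup, with hypothetical database, filegroup, and path names, follows this pattern:

```sql
-- Allow Transact-SQL and Win32 streaming access (after the feature is
-- enabled in SQL Server Configuration Manager)
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO
-- Database with a dedicated FILESTREAM filegroup
CREATE DATABASE DocStore
ON PRIMARY (NAME = DocStore_data, FILENAME = 'D:\Data\DocStore.mdf'),
FILEGROUP DocStoreFS CONTAINS FILESTREAM
           (NAME = DocStore_fs, FILENAME = 'D:\Data\DocStoreFS')
LOG ON     (NAME = DocStore_log, FILENAME = 'D:\Data\DocStore.ldf');
GO
-- Filestream tables require a ROWGUIDCOL column
CREATE TABLE DocStore.dbo.Documents
(
    DocID   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    Content varbinary(max) FILESTREAM
);
```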
SQL Server 2008 is a significant release that delivers many new features
and key improvements, many of which have been designed specifically for ISV
workloads and require zero or minimal application change.
This article presented an overview of only the top-10 features that are
most applicable to ISV applications and help resolve key ISV problems that
couldn’t easily be addressed in the past. For more information, including a
full list of features and detailed descriptions, see SQL Server Books Online and the SQL Server
web site.
Published Nov 24 2008, 01:24 PM by Burzin
Source: http://sqlcat.com/top10lists/archive/2008/11/24/top-10-sql-server-2008-features-for-isv-applications-burzin.aspx