http://www.sqlservercentral.com/articles/Transaction+Log/72488/
Printed
2011/02/21 02:36AM
Why is my transaction log full?
It’s an error that’s near-certain to ruin any DBA’s day:
Error: 9002, Severity: 17, State: 2.
The transaction log for database 'VeryImportant' is
full. To find out why space in the log cannot be reused, see the
log_reuse_wait_desc column in sys.databases
“But, but, but...” mutters the DBA, “The log backups are running, so why is
the log full?”
Well, failed or missing log backups are just one of the reasons for a
transaction log to be full. There are several other possible causes.
Where to start?
Start by querying sys.databases and to see what the value of the column
log_reuse_wait_desc is for the database mentioned in the error message.
DECLARE @DatabaseName VARCHAR(50);
SET @DatabaseName = 'VeryImportant'
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = @DatabaseName
The value of the log_reuse_wait_desc column will show the current reason why
log space cannot be reused. It is possible more than one thing is preventing log
reuse. Sys.databases will only show one. Hence it is possible to resolve one
problem, query sys.databases again and see a different log reuse wait
reason.
The possible values for log_reuse_wait_desc are listed in Books Online.
But first, before we get to an explanation of the various reasons, a short piece
of theory.
Transaction log architecture
The transaction log is a circular file that's divided internally into a
number of virtual log files (VLFs). SQL writes log entries sequentially into the
VLFs. When one VLF fills up, SQL moves to the next one. When it reaches the end
of the file, it will (ideally) go back to the beginning and start using the
first VLF again (hence circular).
A VLF can be in one of two states:
- Active
- Inactive
Active
An active VLF is one that contains one or more log records that are needed by
the database engine. The log records may be part of active transactions, they
may be log records that are needed for replication or mirroring, they may be
needed for a backup, they may be associated with changes that haven’t yet been
written to the data file, etc.
Inactive
An inactive VLF is one that does not contain any log records needed by the
database engine for any reason. There are no active transactions, the log
records are not needed for replication or mirroring and all the changes
associated with these transaction log records have been written to the data
file. If the database is in full or bulk-logged recovery the log records in the
VLF has been backed up.
Log reuse
For a VLF to be reused by SQL, it needs to be inactive. If all the VLFs in
the log are active and SQL has filled the one it is writing to, the transaction
log will grow or, if it cannot grow, data modifications will fail with error
9002. (The transaction log for database '<Database name>' is
full. To find out why space in the log cannot be reused, see the
log_reuse_wait_desc column in sys.databases)
The act of marking one or more VLFs as inactive is generally called log
truncation. While this is a commonly used term, it’s a not a very accurate term.
Truncate, according to the Oxford
English dictionary means “shorten (something) by cutting off the top or the
end”, however log truncation doesn’t shorten anything; it just makes the space
available for reuse.
In addition, people often seem to think that truncate means to discard log
records and break the log backup chain (probably from BACKUP LOG ... WITH
TRUNCATE ONLY). It does not mean that.
In the above diagram, the transaction log has 4 VLFs (labelled A, B, C and
D). This is for simplicity; most log files will have more. The first (A) and
second (B) VLFs are inactive, and the third (C) and fourth (D) are active with
SQL currently writing log entries into the "D" VLF. In that diagram, the (1)
marks the log record of the last log backup, (2) marks the log record associated
with the last checkpoint and (3) is the last log record written into the
database.
After a number of data modifications, the log has wrapped around and SQL has
now reused the first VLF and is about to start reusing the second. The third and
fourth VLFs are still active. In this situation, unless something happens to
mark the third and fourth VLFs as inactive the log will grow as soon as SQL has
mostly used the second VLF. In this case, what is preventing the reuse is that
the log backup has not run again.
If a log backup now runs, the third and fourth VLFs (C and D) would be marked
inactive and would be available for SQL to reuse them.
Causes of delayed log truncation
These are the log reuse wait reasons from sys.databases
Active transactions
The active transaction log reuse wait reason means that there is an open
transaction that is keeping the VLFs active.
When encountering this log reuse wait, two commands will help in identifying
the source of the open transaction, DBCC OPENTRAN and sys.dm_exec_sessions
DBCC OPENTRAN lists information about the oldest open transaction in the
database. The main piece of information needed from this, when investigating log
growth caused by active transactions, is the SPID (Server Process ID) that is
running the transaction.
Transaction information for database
'WebForums'.
Oldest active transaction:
SPID (server process ID): 51
UID (user ID) : -1
Name : user_transaction
LSN : (10861:3200:1)
Start time : Jan 14 2011 1:04:26:017AM
SID :
0x0105000000000005150000002e86f8cbc457a001b905c7e95e040000
DBCC execution completed. If DBCC printed error
messages, contact your system administrator.
In this example, the connection with SPID 51 has had a transaction open since
1AM on the 14th Jan (a late worker, clearly). While there’s nothing
in the output of DBCC OPENTRAN that can identify who is running that
transaction, the SPID can be used along with some of the DMVs to get that
information.
SELECT host_name,program_name, original_login_name, st.text
FROM sys.dm_exec_sessions es
INNERJOIN sys.dm_exec_connections ec
ON es.session_id = ec.session_id
CROSSAPPLY sys.dm_exec_sql_text(ec.most_recent_sql_handle) st
WHERE ec.session_id = 51
I’m using original login name, rather than login name in case there’s any
impersonation occurring.
That should be enough information to make an educated decision whether to
kill the connection, or to speak to the person running the query and ask them to
stop, or to wait for it to finish if it’s too important to be rolled back.
Checkpoint
The checkpoint log reuse wait should be a transient condition in most
circumstances. It will be most commonly seen in simple recovery model.
SQL Server automatically runs checkpoints on a regular basis to keep the
number of log records that it would need to process during a database recovery.
It runs checkpoints for a number of other reasons as well. For a full list, see
http://msdn.microsoft.com/en-us/library/ms188748.aspx
If the checkpoint log reuse wait appears frequently or for prolonged periods
occur it may indicate that the IO subsystem performance is inadequate for the
volume of data that needs to be written during the checkpoint or that there are
very large numbers of changed pages that the checkpoint needs to process.
Database backup
The active portions of the log are necessary for database backups, as the
backup has to include at least that much of the log to ensure a consistent
restore. As such, long-running database backups may result in log growth as the
portions of the log necessary for the backup must be retained for the duration
of the backup.
There is little that can be done if this wait type is regularly encountered
and is causing problems, other than optimising the backups themselves
(optimising the IO subsystem or striping the backups over multiple devices),
converting to file/filegroup backups (with appropriate consideration of
backup/restore requirements) or using compressed backups either natively (SQL
2008 Enterprise, SQL 2008 R2 Enterprise and Standard editions) or via a
3rd party tool, or other methods that reduce the time taken to back
the database up.
Replication
Transactional replication uses the transaction log to identify changes that
need to be replicated to subscribers.
Whenever a change is made to a replicated table, the log records associated
with that change are marked as ‘pending replication’. The Log Reader Agent job,
which is one of the components of transactional replication, reads along the log
looking for such transactions and, when it finds one, it places the details of
the change into the distribution database and marks the log entry as
‘replicated’.
If transactions are being marked ‘pending replication’ and the Log Reader is
not running, then those log records will never be marked as ‘replicated’, and
the VLFs that contain those log records will never become inactive, and the log
will grow.
In addition to this, on SQL 2008 Enterprise edition, the Change Data Capture
feature (CDC) uses the transaction log and the Log Reader Agent job in much the
same way as transactional replication. Hence, if the CDC jobs are not running,
the log space cannot be reused, much as if there was transactional replication
with the Log Reader Agent job not running. NB, the log reuse wait description
still shows Replication, not CDC.
The only kind of replication that uses the transaction log is transactional
replication, and thus it is the only type of replication that can prevent log
reuse. However, there was a bug in some versions of SQL 2005 that could result
in a log reuse wait of Replication when the only replication present was
snapshot replication. See http://blogs.msdn.com/b/sqlserverfaq/archive/2009/06/01/size-of-the-transaction-log-increasing-and-cannot-be-truncated-or-shrinked-due-to-snapshot-replication.aspx.
Database Mirroring
In database mirroring, the principal database sends log records to the mirror
database. In synchronous mirroring, this occurs before the transaction is
committed on the principal. In asynchronous mirroring, the log records are
placed into a log send queue when the transaction is committed on the principal.
The log records will then be sent to the mirror at some point afterwards.
If the mirroring is running asynchronous and the rate of transactions exceeds
that which the link between the principal and mirror can handle, then the log on
the principal will grow because the log records cannot be marked inactive until
they have been sent to the mirror.
The same thing will happen in synchronous mirroring if the link between the
principal and mirror drops and the mirroring becomes disconnected or if the
mirroring session is suspended. The log records on the principal remain active
until the mirror reconnects or the session is resumed and the log records are
copied over to it, or until the mirroring is dropped.
Snapshot creation
This should also be a short-lived transient type. During the creation of a
database snapshot, SQL has to read the source database’s log so that it can run
crash-recovery on the snapshot in order to ensure that it is transactionally
consistent. While it is doing this, the portions of the log necessary cannot be
marked as inactive.
DBCC CHECKDB uses hidden database snapshots for its checks and so this wait
may appear when running CheckDB even if there are no user-created snapshots.
Prolonged snapshot creation waits can be a result of very long-running
transactions that need to all be rolled back for the snapshot creation, or an
inefficient reading of the log due to a very large number of VLFs.
Log backup
Log backups are the one that most people know can prevent log reuse. In full
and bulk-logged recovery models, VLFs cannot be marked as inactive until all the
log records in that VLF have been backed up. The log backup wait reason will
never appear in Simple Recovery.
If this log reuse wait reason appears, check that the log backups are
correctly scheduled and are succeeding. Also check that the NO_TRUNCATE option
has not been specified, as that option is only for backing up the log of a
damaged database and will not ‘truncate’ any VLFs (as the option name
implies).
Conclusion
This should cover the main reasons why a database’s transaction log may grow.
Fixes for more complex problems are out of scope of this article, but this
should at least help in knowing where to start looking.
Acknowledgements
Firstly a massive thank you to Paul Randal (blog|twitter) who kindly agreed to
tech-edit this article and corrected a variety of errors small and large.
Without his assistance this article would not have been worth reading.
Also thank you to Robert Davis (blog|twitter) who provided some comments
and corrections around log use in database mirroring.
Finally thanks to Jason Brimhall (CirquedeSQLeil) (blog | twitter), Gus Gwynne (GSquared), Brandie
Tarvin (blog|twitter), Wayne Sheffield and
Stefan Krzywicki for correcting grammar errors, illogical sentences and an
appalling dearth of punctuation.
As they say in published books, any errors that remain are solely mine.
Copyright © 2002-2011 Simple Talk Publishing. All Rights Reserved.
Privacy Policy. Terms of Use. Report Abuse.
Following is my feedback! ^_^
1. Edit server.txt
SERVER01 user01 password01
SERVER02 user02 password02
...
more than 100 computers ....
or
dsquery computer -limit 10000 > computers.txt
to fetch domain computers name and do some modify to fit your need ...
2. command tools
grep.exe
sed.exe
3. Edit DSN (ODBC)
Create a database name: PViewer
Create a DSN name to mapping your PViewer Database : PViewer
3. FetchPerfData.bat
@echo off
REM ----------------------------------------
REM Set environment variable
REM ----------------------------------------
setlocal enableextensions
rem setlocal enabledelayedexpansion
set title=%~n0
REM ----------------------------------------
REM ----------------------------------------
REM Set console title
REM ----------------------------------------
title Generate Performance Counters and System Information
REM ----------------------------------------
REM ----------------------------------------
REM Set path variable
REM ----------------------------------------
path=%PATH%;C:\Program Files\Microsoft Office\Office12\
REM ----------------------------------------
REM ----------------------------------------
REM Set system date variable
REM ----------------------------------------
for /f "tokens=1-5 delims=/ " %%s in ("%date%") do (set CurrDate=%%s%%t%%u)
for /f "tokens=1-5 delims=:." %%d in ("%time%") do (set CurrTime=%%d%%e%%f)
set sysdate=%CurrDate%_%CurrTime%
REM ----------------------------------------
REM ----------------------------------------
REM Clear DNS record
REM ----------------------------------------
ipconfig /flushdns
REM ----------------------------------------
REM ----------------------------------------
REM Create work directory
REM ----------------------------------------
if not exist ".\Work" (mkdir .\Work) else (del .\Work\* /F/Q)
if not exist ".\Counter" (mkdir .\Counter) else (del .\Counter\* /F/Q)
if not exist ".\Output" (mkdir .\Output) else (del .\Output\* /F/Q)
REM ----------------------------------------
REM ----------------------------------------
REM Generating system information header
REM ----------------------------------------
systeminfo /fo csv > .\Work\HTemp.csv
sed -e "1d" -e "3d" .\Work\HTemp.csv > .\Work\STemp.csv
REM ----------------------------------------
REM ----------------------------------------
REM Generate Performance Counters
REM ----------------------------------------
for /f "tokens=1-3 delims= " %%S in (Servers.txt) do (
net use T: \\%%S\c$ /user:%%T %%U
start /wait typeperf -q Memory -s %%S -o .\Counter\%%S_Memory.cf
start /wait typeperf -cf .\Counter\%%S_Memory.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
start /wait typeperf -q PhysicalDisk -s %%S -o .\Counter\%%S_PhysicalDisk.cf
start /wait typeperf -cf .\Counter\%%S_PhysicalDisk.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
start /wait typeperf -q Processor -s %%S -o .\Counter\%%S_Processor.cf
start /wait typeperf -cf .\Counter\%%S_Processor.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
start /wait typeperf -q "Network Interface" -s %%S -o .\Counter\%%S_Network.cf
start /wait typeperf -cf .\Counter\%%S_Network.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
start /wait typeperf -cf .\Counter\%%S_Server.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
start /wait typeperf -q Server -s %%S -o .\Counter\%%S_Server.cf
if "%%S"=="yourSQLServer" (
start /wait typeperf -qx "sqlserver:databases" -s %%S -o .\Counter\%%S_Databases.cf
start /wait typeperf -cf .\Counter\%%S_Databases.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
start /wait typeperf -qx "sqlserver:buffer manager" -s %%S -o .\Counter\%%S_Buffer.cf
start /wait typeperf -cf .\Counter\%%S_Buffer.cf -si 1 -sc 5 -y -f SQL -o SQL:PViewer!%%S
)
systeminfo -s \\%%S -U %%T -P %%U /fo csv /nh >> .\Work\STemp.csv
net use T: /delete
)
REM ----------------------------------------
REM ----------------------------------------
REM Generate System information
REM ----------------------------------------
grep -v "^$" .\Work\STemp.csv > .\Output\systeminfo.csv
REM ----------------------------------------
REM ----------------------------------------
REM Delete Work files
REM ----------------------------------------
del .\Work\HTemp.csv /Q/F
del .\Work\STemp.csv /Q/F
REM ----------------------------------------