RESOURCE_SEMAPHORE WAIT – Performance Troubleshooting

Recently I was involved in troubleshooting a performance issue where all queries on a SQL Server instance were running slow. Queries that usually complete in seconds had been running for more than 15 minutes that day without returning any output.

I followed the approach below to fix this issue.
Step 1: Checked for blocking on the SQL instance, but didn't find any.

select * from sys.sysprocesses where blocked <> 0
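sys.sysprocesses is deprecated, so the same check can also be done with the newer DMVs; a minimal sketch:

-- Requests that are currently blocked, with their blocker and wait details
SELECT session_id,
       blocking_session_id,
       wait_type,
       wait_time,
       command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;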

Step 2: Checked the active sessions on the server to find out whether any session was in a KILLED/ROLLBACK state, but didn't find any. Sessions in a KILLED/ROLLBACK state can be resource intensive depending on the amount of work being rolled back.

sp_who2 active
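Had any session been stuck in KILLED/Rollback, a rough sketch like this would show the rollback progress:

-- Rollback (and other long-running operations) report progress in percent_complete
SELECT session_id,
       command,
       status,
       percent_complete,
       estimated_completion_time
FROM sys.dm_exec_requests
WHERE percent_complete > 0;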

Step 3: Checked the overall wait types on the SQL Server instance and found RESOURCE_SEMAPHORE and page IO related waits.

SELECT TOP 10
        wait_type ,
        wait_time_ms ,
        signal_wait_time_ms ,
        wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,
        100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )
                                    AS percent_total_waits ,
        100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )
                                    AS percent_total_signal_waits ,
        100.0 * ( wait_time_ms - signal_wait_time_ms )
        / SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits
FROM    sys.dm_os_wait_stats
WHERE   wait_time_ms > 0
        AND wait_type NOT IN 
( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',
  'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT', 
  'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',
  'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',
  'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT', 
  'BROKER_TRANSMITTER', 'KSOURCE_WAKEUP',
  'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
  'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',
  'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',
  'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',
  'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',
  'RESOURCE_QUEUE' )
ORDER BY wait_time_ms DESC

Step 4: Analyzed the wait types and found RESOURCE_SEMAPHORE among the top waits. Let's see what the RESOURCE_SEMAPHORE wait type is and what it indicates.

The RESOURCE_SEMAPHORE wait type "occurs when a query memory request cannot be granted immediately due to other concurrent queries or memory pressure. High RESOURCE_SEMAPHORE waits and wait times may indicate an excessive number of concurrent queries, or excessive memory request amounts."

Let's query the DMV to get a better idea of the RESOURCE_SEMAPHORE wait. In the output of the query below, check the grantee_count and waiter_count columns: grantee_count is the number of queries that have their memory granted, and waiter_count is the number of queries waiting in the queue for a memory grant.

SELECT * FROM sys.dm_exec_query_resource_semaphores
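For example, a slightly more focused look at those two columns:

-- Grant and wait counters per resource semaphore
SELECT resource_semaphore_id,
       pool_id,
       target_memory_kb,
       available_memory_kb,
       granted_memory_kb,
       grantee_count,   -- queries that already have their memory grant
       waiter_count     -- queries queued and waiting for a grant
FROM sys.dm_exec_query_resource_semaphores;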

Now let's check the queries waiting for memory using the sys.dm_exec_query_memory_grants DMV. The grant_time and granted_memory_kb columns will be NULL for queries that are still waiting for their requested memory.

SELECT * FROM sys.dm_exec_query_memory_grants  where grant_time is null
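To also see which statements are waiting and how much memory they asked for, an extended sketch:

-- Waiting grants with their statement text
SELECT mg.session_id,
       mg.requested_memory_kb,
       mg.wait_time_ms,
       t.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
WHERE mg.grant_time IS NULL
ORDER BY mg.requested_memory_kb DESC;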

I also checked the "Memory Grants Pending" performance counter to get a better idea. If the Memory Grants Pending value is greater than 0, queries are queued for memory and there is a problem.

SELECT [object_name], cntr_value AS [Memory Grants Pending]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%:Memory Manager%'  -- 'SQLServer:Memory Manager' on a default instance, 'MSSQL$InstanceName:Memory Manager' on a named instance
AND counter_name = 'Memory Grants Pending'

Step 5: To resolve the issue immediately, since this was an active-active cluster and there was memory available on the node to allocate to SQL Server, I increased the memory allocated to SQL Server.
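For reference, the change can be made with sp_configure; the memory value here is just a placeholder for whatever fits the node:

-- Placeholder value: raise max server memory after confirming free memory on the node
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 98304;
RECONFIGURE;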

Long term fix
Find the memory consuming queries and tune them (a sample query follows this list)
Schedule bulk processing jobs during off-business hours
Make sure statistics and indexes are up to date
If the server load increases, plan to add more memory
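As a starting point for finding memory-hungry queries, here is a rough sketch using sys.dm_exec_query_stats; note that the max_grant_kb column is only available on newer SQL Server builds:

-- Top cached statements by maximum memory grant (max_grant_kb needs a newer build)
SELECT TOP 10
       qs.max_grant_kb,
       qs.execution_count,
       SUBSTRING(t.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(t.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
ORDER BY qs.max_grant_kb DESC;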

How to rerun the SQL Agent job step automatically in case of failure?

Take the example of a long running job with multiple steps, where each step depends on the successful execution of the previous step. If any job step fails, the entire job fails and the DBA has to run the job manually to complete it, and the job will run from the beginning.

There could be multiple reasons for the job failure, for example a deadlock, a network issue, or a requested backup file that is not yet available because the backup is still in progress.

You can avoid this situation using the job step advanced options Retry attempts and Retry interval (minutes).

Using the "Retry attempts" option you can set the number of times the job step should be repeated before it is considered to have failed. The "Retry interval (minutes)" option sets the number of minutes between retry attempts.

The default value of both "Retry attempts" and "Retry interval (minutes)" is 0.

For example, if I set "Retry attempts" to 3 and "Retry interval (minutes)" to 15, the job step will be retried up to 3 times on failure, at 15 minute intervals.
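The same settings can be applied with T-SQL through msdb.dbo.sp_update_jobstep; the job name and step id below are placeholders:

-- Retry the step up to 3 times, waiting 15 minutes between attempts (placeholder job/step)
EXEC msdb.dbo.sp_update_jobstep
     @job_name       = N'NightlyLoad',
     @step_id        = 2,
     @retry_attempts = 3,
     @retry_interval = 15;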


What is the role of the job owner in SQL Server? How does it affect jobs?


When SQL Server Agent runs a scheduled job, it checks the permissions of the job owner. If the job owner is a non-sysadmin account, SQL Server Agent logs in using its own credentials and then switches the security context of the job to the non-sysadmin job owner account.

You can see the security context switch in the job history, with a message like the one below.

Message
Executed as user: Jugal

If the job owner is a sysadmin account, SQL Server Agent runs the job under its own security context. For example, if the job owner is sa, which is a SQL Server sysadmin account, the job history shows the service account.

Executed as user: SQL Server Service account

It is recommended not to set individual user accounts as the owner of Agent jobs, both for security reasons and to avoid important job failures when an individual account is deleted, has required permissions revoked due to a role change, or is disabled in Active Directory.

Instead of individual user accounts, a dedicated account with least privileges should be created and used as the owner of SQL Server Agent jobs. A quick way to review job owners and reassign a job is sketched below.
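A sketch to list current job owners and move a job to a dedicated owner; the job and login names used here are placeholders:

-- List jobs with their owner logins
SELECT j.name AS job_name,
       SUSER_SNAME(j.owner_sid) AS job_owner
FROM msdb.dbo.sysjobs AS j
ORDER BY job_owner, job_name;

-- Reassign a job to a dedicated least-privilege owner (placeholder names)
EXEC msdb.dbo.sp_update_job
     @job_name         = N'NightlyLoad',
     @owner_login_name = N'SQLAgentJobOwner';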

Spinlocks – Lightweight Synchronization Primitives

Spinlocks are lightweight synchronization primitives used to protect access to data structures, typically for a very short period of time. When a thread attempting to acquire a spinlock is unable to obtain access, it executes in a loop, periodically checking whether the resource is available, instead of immediately yielding.

After some period of time a thread waiting on a spinlock will yield before it is able to acquire the resource in order to allow other threads running on the same CPU to execute. This is known as a backoff.

When a large number of threads contend for access to a single spinlock, it can lead to performance problems.

You can use the below DMV to get spinlock information; a sketch highlighting the most contended spinlocks follows it.

select * from sys.dm_os_spinlock_stats
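A more focused sketch that surfaces the spinlocks with the most contention since the last restart:

-- Spinlocks ordered by spins, with collision and backoff counts
SELECT TOP 10
       name,
       collisions,
       spins,
       spins_per_collision,
       backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;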

Examples
FGCB_PRP_FILL
OPT_IDX_STATS
BUF_FREE_LIST

Script to get data file size, used space and free space

While troubleshooting a disk space issue, you can use the below script to check the data file size, used space and free space.

select
    DBName,
    name,
    [filename],
    size as 'Size(MB)',
    usedspace as 'UsedSpace(MB)',
    (size - usedspace) as 'AvailableFreeSpace(MB)'
from
(
    SELECT
        db_name(s.database_id) AS DBName,
        s.name AS [Name],
        s.physical_name AS [FileName],
        (s.size * CONVERT(float, 8)) / 1024 AS [Size],
        (CAST(CASE s.type
                  WHEN 2 THEN 0
                  ELSE CAST(FILEPROPERTY(s.name, 'SpaceUsed') AS float) * CONVERT(float, 8)
              END AS float)) / 1024 AS [UsedSpace],
        s.file_id AS [ID]
    FROM sys.filegroups AS g
        INNER JOIN sys.master_files AS s
            ON (s.type = 2 OR s.type = 0)
               AND s.database_id = db_id()
               AND s.drop_lsn IS NULL
               AND s.data_space_id = g.data_space_id
) DBFileSizeInfo


Best Practices for the SQL Server Transaction Log

Background
In SQL Server, the database recovery model decides how transactions are logged in the transaction log file; the transaction log file extension is .LDF. A quick way to check and change the recovery model follows the list below.

Full – Transaction log is fully logged (Can take log backup)
Bulk Logged – Bulk operations are minimally logged (Can take log backup)
Simple – Transaction log will be truncated on checkpoint
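For reference, a quick way to check the current recovery model and switch it (the database name is a placeholder):

-- Check recovery models
SELECT name, recovery_model_desc
FROM sys.databases;

-- Switch a database to FULL (placeholder database name)
ALTER DATABASE [SalesDB] SET RECOVERY FULL;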

Transactions are logged sequentially in the transaction log file, and every record in the transaction log file is uniquely identified by a log sequence number (LSN). The LSN data type is numeric(25,0).

You can follow the below best practices for the transaction log file:

1. Don't create multiple log files: transactions are written to the log file sequentially, so multiple files do not help with data striping across files
2. Keep the transaction log file on the separate drive
3. Identify the RPO and RTO for the database and according to that choose the recovery model and correct log backup strategy
4. RAID 1+0 is highly recommended for the transaction log
5. AUTO SHRINK should always be off on the database
6. Pre-allocate space to the transaction log file; it improves performance. Don't depend on the auto-growth option.
7. Always set the initial size, max size and growth properties of the transaction log file (a sizing sketch appears after the log-reuse query at the end of this section)
8. Always set the auto-growth value as a fixed size, not as a percentage
9. Internal fragmentation of the transaction log file can also lead to performance and database recovery issues. A database should not have an excessive number of Virtual Log Files (VLFs) inside the transaction log; a large number of small VLFs can slow down the recovery process a database goes through on startup or after restoring a backup. Make sure the transaction log initial size and growth are defined well to avoid internal fragmentation (a VLF-count query is included in the same sketch)
10. External fragmentation can be removed with a disk defragmentation utility
11. If the transaction log is full, use the below query to check the cause of the log reuse wait and take a decision accordingly.

DECLARE @DatabaseName sysname = N'YourDatabase';  -- placeholder database name

SELECT name ,
       recovery_model_desc ,
       log_reuse_wait_desc
FROM sys.databases
WHERE name = @DatabaseName
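For points 7 to 9 above, a sketch to count VLFs per database (sys.dm_db_log_info requires SQL Server 2016 SP2 or later) and to pre-size the log file with a fixed growth increment; names and sizes are placeholders:

-- VLF count per database
SELECT d.name,
       COUNT(li.vlf_sequence_number) AS vlf_count
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
GROUP BY d.name
ORDER BY vlf_count DESC;

-- Pre-size the log file and use a fixed growth increment (placeholder names/sizes)
ALTER DATABASE [SalesDB]
MODIFY FILE (NAME = SalesDB_log, SIZE = 8192MB, MAXSIZE = 65536MB, FILEGROWTH = 1024MB);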