home

-- Merge for audit trails
Author Nigel Rivett
Using merge for audit trails

I'm usually very slow to use new t-sql statements but this has proved to be very useful.
I had doubts about its performance but those have proved to be mostly unfounded.
I'm sure that it's possible to get things to work faster but for the simplicity of the command you need to decide if the effort is worth it.

First a discusion of the syntax
This is a simple query to update a target table from a source dataset.
Updating rows that exist and inserting rows that don't.
It also maintains a z_inserted and z_updated column.
For those following Kimball it maintains a type 1 dimension.


merge 	-- this is the destination table
using 	-- source to use in the update - think of it as a from clause
on 		-- join criteria to match source rows to target rows
when matched then update set
	colnamea = .colname1 ,
	colnameb = .colname2 ,
	z_updated = getdate()
when not matched by target then insert
	(
	colnamea ,
	colnameb ,
	z_inserted ,
	z_updated
	)
values
	(
	.colname1 ,
	.colname2 ,
	z_inserted ,
	z_updated
	)

We can turn this into an insert only statement by setting the join criteria to "1=0"
We can turn it into an update only statement by omitting the insert clause.

Let's look at the .
This is something that is often built and looked at in isolation - so lets isolate it.
A CTE is a useful construct for this.
You could of course use a derived tables in the merge but I prefer to separate with a CTE.

;with cte as
(
		-- source to use in the update - think of it as a from clause
)
merge 	-- this is the destination table
using cte	
on 		-- join criteria to match source rows to target rows
when matched then update set
	colnamea = cte.colname1 ,
	colnameb = cte.colname2 ,
	z_updated = getdate()
when not matched by target then insert
	(
	colnamea ,
	colnameb ,
	z_inserted ,
	z_updated
	)
values
	(
	cte.colname1 ,
	cte.colname2 ,
	z_inserted ,
	z_updated
	);

Once we have the basic construct we can do interesting things.

Audit Trail
Requirement
An input batch is received containing USERID, TransactionDatetime, Value.
For instance this could be a received file, daily/weekly extract from an external database.

Create an hourly aggregated table of transaction counts and values for each UserID.
Allow for the reversal of a batch due to replacement data.

We will assume that the batch ID is held in a variable. I would expect this to be a stored procedure with the BatchID passed as a parameter.

The aggregate table will be
UserDateHour
	UserDateHour_ID ,		-- identity pk
	UserID ,
	date ,
	hour ,
	TransactionCount ,
	TransactionValue

To cater for the reversal and investigation we create an audit trail table
UserDateHour_au
	UserDateHour_ID ,
	TransactionCount ,
	TransactionValue ,
	BatchID

Note - the BatchID does not appear on the aggregate table, UserID, date, hour do not appear in the audit trail.


i.e. aggregate by date, hour, userID
You could create a temp table with the aggregated data.
Set a flag on each row to say whether it is an update or insert.
Update the existing data
Insert the new data
Add the inserts and updates to an audit trail table by joining to the aggregate table to get the ID.
Remembering to perform this all in a transaction.
(Yes I know some of those statements can be merged but why ruin a good story)

With the merge statement it becomes a single statement
For this we make use of the output clause
This gives some of the fuctionality of a trigger.
You get a deleted and inserted table like in a trigger to show inserted and updated rows.

the cte becomes
;with cte as
(
select	UserID ,
		date = convert(date,transaction_datetime) ,
		hour = datepart(hh,transaction_datetime) ,
		TransactionCount = count(*) ,
		TransactionValue = sum(value)
from InputBatch
group by 
		UserID ,
		convert(date,transaction_datetime) ,
		datepart(hh,transaction_datetime)
)

And the output clause
output
	coalesce(inserted.UserDateHour_id, deleted.UserDateHour_id) ,
	coalesce(inserted.TransactionCount, 0) - coalesce(deleted.TransactionCount, 0) ,
	coalesce(inserted.TransactionValue, 0) - coalesce(deleted.TransactionValue, 0) ,
	@BatchID
into UserDateHour_au
	(
	UserDateHour_id ,
	TransactionCount ,
	TransactionValue
	BatchID
	)

Note
In the output clause we have available the identity from the insert/update statement executed.
The values in the inserted/deleted tables are after the execution of the query
	hence the new value - the old value.
	The coalesce is to cater for inserts. The coalesce for the inserted value is not needed but just in case someone adds a delete clause.
	
Now, how to reverse a batch so that it can be reprocessed?
Easy as we have the identity value and the BatchID in the audit table

update UserDateHour
set
	TransactionCount = t.TransactionCount - au.TransactionCount ,
	TransactionValue = t.TransactionValue - au.TransactionValue
from UserDateHour_au au
	join UserDateHour t
		on au.UserDateHour_id = t.UserDateHour_id
where au.BatchID = t.BatchID

Of course you have to decide what to do with the audit trail table entry after this.
I usually just leave it. You could delete it or record the reversal (might have issues with the PK though).

The full statement would be

;with cte as
(
select	UserID ,
		date = convert(date,transaction_datetime) ,
		hour = datepart(hh,transaction_datetime) ,
		TransactionCount = count(*) ,
		TransactionValue = sum(value)
from InputBatch
group by 
		UserID ,
		convert(date,transaction_datetime) ,
		datepart(hh,transaction_datetime)
)
merge UserDateHour t
using cte
	on t.UserID = cte.UserID
	and t.date = cte.date
	and t.hour = cte.hour
when matched then set
	TransactionCount = t.TransactionCount + cte.TransactionCount ,
	TransactionValue = t.TransactionValue + cte.TransactionValue
when not matched by target then insert
	(
	UserID ,
	date ,
	hour ,
	TransactionCount ,
	TransactionValue
	)
values
	(
	cte.UserID ,
	cte.date ,
	cte.hour ,
	cte.TransactionCount ,
	cte.TransactionValue
	)
output
	coalesce(inserted.UserDateHour_id, deleted.UserDateHour_id) ,
	coalesce(inserted.TransactionCount, 0) - coalesce(deleted.TransactionCount, 0) ,
	coalesce(inserted.TransactionValue, 0) - coalesce(deleted.TransactionValue, 0) ,
	@BatchID
into UserDateHour_au
	(
	UserDateHour_id ,
	TransactionCount ,
	TransactionValue
	BatchID
	);

In Kimball terms, if this is auditing a faact table, you will notice the separation of metrics and key values.


home