
-- File Processing and Importing
Author: Nigel Rivett
I see a lot of systems which process files from many sources.
The initial implementation is often an SSIS package which processes a static filename in a static folder.
The package opens the file, imports the data, merges it with the production data, then deletes the original file. Any data manipulation happens either before the import or as part of the merge.
This package is then scheduled to run daily at the time the file is expected to be delivered.
There are often many later processes which rely on this data.

This causes a number of issues:
	If the file does not arrive, the package fails and has to be dealt with manually.
	There is no record of the file arriving.
	If there is an issue with the production merge, the file import has to be run again (assuming the file still exists).
	There is no log of the individual processing steps, and the only record of the total time lasts only as long as the entry remains in the agent history.
	Any dependent processes have to be re-run manually if the job is run at a later time.

Solution
Separate out the parts of the processing:
File detection
There is a single task which polls for the file's arrival and records it, with a timestamp, in a control table.
If the file has a static name, rename it to include a timestamp, e.g. MyFileName_yyyymmdd_hhmmss.ext.
The task takes a file mask, folder and batch type as parameters.
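
A minimal sketch of how this might look in T-SQL. All object names here (FileControl, s_FileDetect) are illustrative, not a prescribed design, and it assumes xp_cmdshell is enabled for the directory listing:

	-- Control table recording each detected file
	create table FileControl
	(
		FileControlID	int identity(1,1) primary key,
		BatchType	varchar(50) not null,
		FolderName	varchar(255) not null,
		FileName	varchar(255) not null,
		ArrivalTime	datetime not null default getdate(),
		Status		varchar(20) not null default 'Arrived'	-- Arrived / Batched / Processed
	)
	go

	-- Detection task: list files matching the mask and log any new arrivals
	create procedure s_FileDetect
		@FolderName varchar(255),
		@FileMask varchar(255),
		@BatchType varchar(50)
	as
	begin
		declare @cmd varchar(600)
		declare @Files table (FileName varchar(255))

		-- dir /b returns bare file names, one per output row
		select @cmd = 'dir /b "' + @FolderName + '\' + @FileMask + '"'
		insert @Files (FileName) exec master..xp_cmdshell @cmd

		insert FileControl (BatchType, FolderName, FileName)
		select @BatchType, @FolderName, f.FileName
		from @Files f
		where f.FileName is not null
		and f.FileName <> 'File Not Found'	-- dir's output when nothing matches
		and not exists
			(select * from FileControl c
			 where c.FolderName = @FolderName and c.FileName = f.FileName)
	end
	go

Where the file always arrives with the same name, the rename can be done at the same point; one way to build the timestamped name:

	-- Build MyFileName_yyyymmdd_hhmmss.ext from the current time (path is illustrative)
	declare @NewName varchar(255), @cmd varchar(600)
	select @NewName = 'MyFileName_'
		+ convert(varchar(8), getdate(), 112) + '_'
		+ replace(convert(varchar(8), getdate(), 108), ':', '')
		+ '.ext'
	select @cmd = 'ren "C:\Incoming\MyFileName.ext" "' + @NewName + '"'
	exec master..xp_cmdshell @cmd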

We now have a generic task that can be used to detect file arrival.
The table which holds the file name can then be used to control further processing, and that processing can be repeated.
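
Because the control table drives the processing, a failed merge can be re-run by resetting the entry's status rather than re-delivering the file; against the hypothetical FileControl table above:

	-- Re-queue a file for processing after a failed merge - no re-delivery needed
	update FileControl
	set Status = 'Arrived'
	where FileControlID = 42	-- the entry for the file to re-run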

You can probably see where this is headed. Another task polls this table and creates a batch for each new file.
A further task then processes all of the steps for that batch type.
Dependencies are controlled by the completion of this batch.
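
A sketch of the batch control this implies, again with hypothetical names throughout:

	-- One batch per detected file, plus a log of the steps run against it
	create table Batch
	(
		BatchID		int identity(1,1) primary key,
		BatchType	varchar(50) not null,
		FileControlID	int not null references FileControl (FileControlID),
		Status		varchar(20) not null default 'Created',	-- Created / Running / Complete / Failed
		CreatedTime	datetime not null default getdate(),
		CompletedTime	datetime null
	)
	go

	create table BatchStep
	(
		BatchStepID	int identity(1,1) primary key,
		BatchID		int not null references Batch (BatchID),
		StepName	varchar(100) not null,
		StartTime	datetime null,
		EndTime		datetime null,
		Status		varchar(20) not null default 'Pending'
	)
	go

	-- Batch creation task: polls the control table for newly arrived files
	insert Batch (BatchType, FileControlID)
	select BatchType, FileControlID
	from FileControl
	where Status = 'Arrived'

	update FileControl
	set Status = 'Batched'
	where Status = 'Arrived'
	-- (in a real system these two statements would run in a single transaction)

	-- Dependent processes key off batch completion rather than a scheduled time
	if exists
		(select * from Batch
		 where BatchType = 'MyFeed'
		 and Status = 'Complete'
		 and CompletedTime >= convert(date, getdate()))
		print 'Upstream batch complete - safe to run the dependent processing'

Each step updates its BatchStep row as it starts and finishes, which gives the per-step log that the original scheduled-package approach lacks.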

