One of my latest projects involves bulk inserting data from a flat-file into a database table (SQL Server 2005 Database). The data file is provided by another third-party application and FTP’d over to a server that I have access to via batch script and SFTP2.
I use a stored procedure to execute the bulk insert command. The following is the basic syntax to run the bulk insert.
DECLARE @BulkSQL varchar(2000)
SET @SQL = 'BULK INSERT DestinationTable FROM ''//UNC/Path/To/Server/flatfile.dat'' WITH (ROWTERMINATOR=''\n'', FIELDTERMINATOR = ''|'' , FIRSTROW=2)'
EXEC (@BulkSQL)
A little bit of explanation may be necessary. The flat file format is as follows:
- 1st row is the header row where each column is delimited by a pipe (|)
- The remainer of the rows contain data also delimited by pipes.
NAME|ADDRESS|CITY|STATE|PHONE
John Doe|123 Main St|San Francisco|CA|415-234-2344
Jim Smith|33 Elm St|Oakland|CA|510-542-4562
The bulk insert statement has some attributes that you can customize. The field terminator attribute describes how the fields are delimited. The row terminator defines what character will symbolize the end of row marker. The firstrow attribute states which row to start importing from. Since there is a header row, firstrow should be set to 2 (second row).
When I tested the stored procedure containing the bulk insert command, I received the following error:
Server: Msg 4863, Level 16, State 1, Line 1
Bulk insert data conversion error (truncation) for row 1, column 5 (PHONE).
The problem was that it would work sometimes and sometimes it would fail. I had no clue what the deal was. I would switch between flat-files; some worked and some failed. I realized that the truncation error was happening only on column 5, the last column, which was a fixed size of 12 characters. But all the files had a last column of 12 characters! Argh!
I opened up the flat-file.dat in Notepad++ (a free source code editor) as seen below

Then I noticed the “Show All Characters” icon in the toolbar, which looks like symbol for a paragraph mark (or backwards “P”).

Then lo and behold, I saw the problem characters, the CR (carriage return) and LF(line feed) characters. I read somewhere that
- Windows uses carriage return – line feed.
- UNIX uses line feed.
- Macintosh uses carriage return.
But the bulk insert looks for a newline “\n” character to mark end of rows in the data file. Note: sometimes, people consider newline and linefeed the same thing. So, the file was using CR+LF as row terminator, but the bulk statement was looking for “\n” as it’s row terminator (or LF). So, the bulk insert was considering the CR character as part of column 5, thus, creating a truncation error.
Before I changed any of my code, I wanted to be certain what end-of-row marker a fresh data file coming from the other application uses. What I discovered was that it used only LF (line feed) markers.
So, to convert my current data file from CR+LF to just LF, I just selected Format > Convert to Unix as seen here.

And now, I get

Since I changed to row terminator symbol to use line feed, I had to modify the bulk insert statement to the following:
DECLARE @BulkSQL varchar(2000)
SET @SQL = 'BULK INSERT DestinationTable FROM ''//UNC/Path/To/Server/flatfile.dat'' WITH (ROWTERMINATOR='''+ CHAR(10) +''', FIELDTERMINATOR = ''|'' , FIRSTROW=2)'
EXEC (@BulkSQL)
So, I tested the stored procedure again, and it worked perfectly.
Below is a table defining control characters and their ASCII equivalent.
