I recently wrote a few automated database-populating scripts. Specifically, I am running Microsoft SQL Server in a container in a Kubernetes cluster—okay, it's Red Hat OpenShift, but it's still Kubernetes. It was all fun and games until I started mixing Windows and Linux; I was developing on my Windows machine, but obviously the container is running Linux. That's when I got the gem of an error shown in Figure 1. Well, not so much an error as errant output.
What in the world? Here's the CSV data I used to populate the table:
1,Active 2,Inactive 3,Ordered 4,Billed 5,Shipped
Here's the T-SQL code I used for the same purpose:
BULK INSERT dbo.StatusCodes FROM '/tmp/StatusCodes.csv' WITH (FORMAT='CSV',FIELDTERMINATOR=',',KEEPIDENTITY); GO SELECT * FROM dbo.StatusCodes; GO
What is going on here?
TL;DR: Line endings
It's the line endings. They are the issue.
Specifically, Windows and Linux handle line endings differently. To understand why, we need to go back a ways in history.
ASDFJKL
Ever use a manual typewriter? Okay, okay ... enough of the "That's old!" jokes. Figure 2 illustrates.
The typewriter mechanism that holds the rubber cylinder is called the carriage because it carries the paper. (That rubber cylinder is technically known as a platen, but stay with me as I employ poetic license and use "carriage.")
As you type, the carriage moves to the left. When you reach the edge of the paper, you use the big lever on the far left to return the carriage to the starting position; that is, you perform a carriage return. In addition, as the lever moves, it advances the paper up one line, which is known as a line feed.
When you do both movements, you get "carriage return plus line feed," sometimes abbreviated to CRLF or CR/LF. You can move the carriage without feeding one line, and you can advance one line without moving the carriage. They are two distinct and separate actions, but anyone who has mastered the manual typewriter knows that they are typically done in one, swift, soulful, and athletic motion, akin to desktop gymnastics of the highest order. (Please excuse more poetic license as I romanticize about typing.)
Teletype
Meanwhile, over in the world of automation, the Teletype machine became very popular. This allowed the transmission of text around the world, across telephone lines. But long distance calls were expensive, so minimizing the time and data sent was paramount. So, it was decided that one and only one character would be used for a carriage return and line feed, the so-called new line character. You see it as "\n
" in code. You paid for every byte, back then, so cutting costs was important.
We're talking about 300 baud modems here, folks. Just think about that; 300 bits per second; three hundred. Now, we want gigabits everywhere.
Back to line endings
The reasons don't matter: Windows chose the CR/LF model, while Linux uses the \n
model. So, when you create a file on one system and use it on the other, hilarity ensues. Or, in this case, two hours of debugging ending in madness and me contemplating a new career in woodworking.
Quick fix for Linux and Windows line endings
The quick fix for those incompatible line endings was very simple: I altered my T-SQL to include the ROWTERMINATOR specification, like this:
BULK INSERT dbo.StatusCodes FROM '/tmp/StatusCodes.csv' WITH (FORMAT='CSV',FIELDTERMINATOR=',',ROWTERMINATOR = '\r\n',KEEPIDENTITY); GO SELECT * FROM dbo.StatusCodes; GO
That works when uploading my CSV from my Windows machine. When uploading from my Linux machine, I use the following, where the ROWTERMINATOR is the simple new line character:
BULK INSERT dbo.StatusCodes FROM '/tmp/StatusCodes.csv' WITH (FORMAT='CSV',FIELDTERMINATOR=',',ROWTERMINATOR = '\n',KEEPIDENTITY); GO SELECT * FROM dbo.StatusCodes; GO
Simple, but unless you know about it, you either get weird results or some seemingly unrelated error messages. So, be advised. For example, if I try to use the Windows-specific command (where ROWTERMINATOR is "\r\n
") in my Linux environment, I get the following error:
Msg 4879, Level 16, State 1, Server mssql-1-h2c96, Line 2 Bulk load failed due to invalid column value in CSV data file /tmp/StatusCodes.csv in row 1, column 2. Msg 7399, Level 16, State 1, Server mssql-1-h2c96, Line 2 The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error. Msg 7330, Level 16, State 2, Server mssql-1-h2c96, Line 2 Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)". Id statusCodeDescription ----------- ---------------------
What does it all mean?
The upshot is this: You might see some hiccups and weird behavior when you use a file in both Windows and Linux. Just be aware of it and you'll be fine.
Visit my GitHub repository NetCandyStore for all of the code referenced in this article.
Last updated: February 5, 2024