Import a table from SQL Server into CockroachDB

This is a quick tutorial on exporting data out of SQL Server into CockroachDB. This is meant to be
a learning exercise only and not meant for production deployment. I welcome any feedback to
improve the process further. The fastest way to get started with SQL Server is via available
Docker containers. I’m using the following tutorial to deploy SQL Server on Ubuntu from my Mac. 
My SQL Server-Fu is a bit rusty and I opted for following this tutorial to restore WideWordImporters
sample database into my Docker container. You may also need SQL Server tools installed on your
host and you may find direction for Mac OS and Linux at the following site, users of Windows are quite
familiar with download location for their OS. I also used the following directions to install SQL Server
tools on my Mac but ran into compatibility issues with the drivers in my Docker container. This will be a
debug session for another day.

Run SQL Server in Docker

pull the SQL Server image

docker pull mcr.microsoft.com/mssql/server:2019-GA-ubuntu-16.04

execute the container

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Cockroach!1' \
 -p 1433:1433 --name sql1 \
 -d mcr.microsoft.com/mssql/server:2019-GA-ubuntu-16.04

change password

docker exec -it sql1 /opt/mssql-tools/bin/sqlcmd \
 -S localhost -U SA -P 'Cockroach!1' \
 -Q 'ALTER LOGIN SA WITH PASSWORD="CockroachDB1!"'

Restore a backup file of WideWorldImporters Database in the container

create a backup directory

docker exec -it sql1 mkdir /var/opt/mssql/backup

download WideWorldImporters database

curl -L -o wwi.bak 'https://github.com/Microsoft/sql-server-samples/releases/download/wide-world-importers-v1.0/WideWorldImporters-Full.bak'

copy the backup file into container

docker cp wwi.bak sql1:/var/opt/mssql/backup

list the logical names and paths for the backup

docker exec -it sql1 /opt/mssql-tools/bin/sqlcmd -S localhost \
   -U SA -P 'CockroachDB1!' \
   -Q 'RESTORE FILELISTONLY FROM DISK = "/var/opt/mssql/backup/wwi.bak"' \
   | tr -s ' ' | cut -d ' ' -f 1-2

execute the restore command

docker exec -it sql1 /opt/mssql-tools/bin/sqlcmd \
   -S localhost -U SA -P 'CockroachDB1!' \
   -Q 'RESTORE DATABASE WideWorldImporters FROM DISK = "/var/opt/mssql/backup/wwi.bak" WITH MOVE "WWI_Primary" TO "/var/opt/mssql/data/WideWorldImporters.mdf", MOVE "WWI_UserData" TO "/var/opt/mssql/data/WideWorldImporters_userdata.ndf", MOVE "WWI_Log" TO "/var/opt/mssql/data/WideWorldImporters.ldf", MOVE "WWI_InMemory_Data_1" TO "/var/opt/mssql/data/WideWorldImporters_InMemory_Data_1"'

verify the restored database

docker exec -it sql1 /opt/mssql-tools/bin/sqlcmd \
   -S localhost -U SA -P 'CockroachDB1!' \
   -Q 'SELECT Name FROM sys.Databases'

Access the SQL Server docker container from your host using mssql-cli

mssql-cli is a CLI utility used to connect to SQL Server. It is currently under heavy development and offers a more pleasant experience over sqlcmd, in my opinion. In the typical new Microsoft fashion, the github page for the project can be found here. It is also easier to install than sqlcmd on my Mac.

install mssql-cli tool on your host

sudo pip install mssql-cli

in my case the install fails due to package six being available on my system, so the workaround is

sudo pip install mssql-cli --ignore-installed six

connect to your SQL Server instance directly

mssql-cli -U SA -P 'CockroachDB1!' -S localhost,1433 -d WideWorldImporters

run a sample query

SELECT StockItemID, StockItemName FROM WideWorldImporters.Warehouse.StockItems WHERE StockItemID>=1

redirect output of a query to a file we're going to use for import into CockroachDB

mssql-cli -U SA -P 'CockroachDB1!' -S localhost,1433 -d WideWorldImporters -Q "SELECT * FROM WideWorldImporters.Warehouse.StockItemTransactions" -o StockItemTransactions.csv

Use BCP utility for bulk exporting

bcp is a bulk copy utility that is better served for exporting data out of SQL Server You can try installing sqlcmd and mssqlodbc locally or connect to the container and use bcp that comes bundled with SQL Server. In my experience, setting up bcp with brew on OSX was a challenge, and I opted for executing it inside a container instead.

docker exec -it sql1 bash
/opt/mssql-tools/bin/bcp WideWorldImporters.Warehouse.StockItemTransactions out /var/opt/mssql/backup/StockItemTransactionsBCP.csv -S localhost,1433 -U sa -P 'CockroachDB1!' -c -t',' -r'\n'

copy the file down to your host

docker cp sql1:/var/opt/mssql/backup/StockItemTransactionsBCP.csv .

describe StockItemTransactions table an equivalent of describe tablename in SQL Server is a stored procedure below

exec sp_columns StockItemTransactions

The output is lengthy so it's best to output it to a file

mssql-cli -U SA -P 'CockroachDB1!' -S localhost,1433 -d WideWorldImporters -Q "exec sp_columns StockItemTransactions" -o schema.sql
+--------------------+---------------+-----------------------+-------------------------+-------------+-------------+-------------+----------+---------+---------+------------+-----------+----------------------------------------------+-----------------+--------------------+---------------------+--------------------+---------------+----------------+
| TABLE_QUALIFIER    | TABLE_OWNER   | TABLE_NAME            | COLUMN_NAME             | DATA_TYPE   | TYPE_NAME   | PRECISION   | LENGTH   | SCALE   | RADIX   | NULLABLE   | REMARKS   | COLUMN_DEF                                   | SQL_DATA_TYPE   | SQL_DATETIME_SUB   | CHAR_OCTET_LENGTH   | ORDINAL_POSITION   | IS_NULLABLE   | SS_DATA_TYPE   |
|--------------------+---------------+-----------------------+-------------------------+-------------+-------------+-------------+----------+---------+---------+------------+-----------+----------------------------------------------+-----------------+--------------------+---------------------+--------------------+---------------+----------------|
| WideWorldImporters | Warehouse     | StockItemTransactions | StockItemTransactionID  | 4           | int         | 10          | 4        | 0       | 10      | 0          | NULL      | (NEXT VALUE FOR [Sequences].[TransactionID]) | 4               | NULL               | NULL                | 1                  | NO            | 56             |
| WideWorldImporters | Warehouse     | StockItemTransactions | StockItemID             | 4           | int         | 10          | 4        | 0       | 10      | 0          | NULL      | NULL                                         | 4               | NULL               | NULL                | 2                  | NO            | 56             |
| WideWorldImporters | Warehouse     | StockItemTransactions | TransactionTypeID       | 4           | int         | 10          | 4        | 0       | 10      | 0          | NULL      | NULL                                         | 4               | NULL               | NULL                | 3                  | NO            | 56             |
| WideWorldImporters | Warehouse     | StockItemTransactions | CustomerID              | 4           | int         | 10          | 4        | 0       | 10      | 1          | NULL      | NULL                                         | 4               | NULL               | NULL                | 4                  | YES           | 38             |
| WideWorldImporters | Warehouse     | StockItemTransactions | InvoiceID               | 4           | int         | 10          | 4        | 0       | 10      | 1          | NULL      | NULL                                         | 4               | NULL               | NULL                | 5                  | YES           | 38             |
| WideWorldImporters | Warehouse     | StockItemTransactions | SupplierID              | 4           | int         | 10          | 4        | 0       | 10      | 1          | NULL      | NULL                                         | 4               | NULL               | NULL                | 6                  | YES           | 38             |
| WideWorldImporters | Warehouse     | StockItemTransactions | PurchaseOrderID         | 4           | int         | 10          | 4        | 0       | 10      | 1          | NULL      | NULL                                         | 4               | NULL               | NULL                | 7                  | YES           | 38             |
| WideWorldImporters | Warehouse     | StockItemTransactions | TransactionOccurredWhen | -9          | datetime2   | 27          | 54       | NULL    | NULL    | 0          | NULL      | NULL                                         | -9              | NULL               | NULL                | 8                  | NO            | 0              |
| WideWorldImporters | Warehouse     | StockItemTransactions | Quantity                | 3           | decimal     | 18          | 20       | 3       | 10      | 0          | NULL      | NULL                                         | 3               | NULL               | NULL                | 9                  | NO            | 55             |
| WideWorldImporters | Warehouse     | StockItemTransactions | LastEditedBy            | 4           | int         | 10          | 4        | 0       | 10      | 0          | NULL      | NULL                                         | 4               | NULL               | NULL                | 10                 | NO            | 56             |
| WideWorldImporters | Warehouse     | StockItemTransactions | LastEditedWhen          | -9          | datetime2   | 27          | 54       | NULL    | NULL    | 0          | NULL      | (sysdatetime())                              | -9              | NULL               | NULL                | 11                 | NO            | 0              |
+--------------------+---------------+-----------------------+-------------------------+-------------+-------------+-------------+----------+---------+---------+------------+-----------+----------------------------------------------+-----------------+--------------------+---------------------+--------------------+---------------+----------------+
(11 rows affected)

Import data from SQL Server into CockroachDB

start a single node instance of Cockroach or use your own environment

cockroach start-single-node --insecure --host=localhost --port=26257 --background --external-io-dir $PWD

Note: Cockroach doesn't support schemas currently so we're going to place the table in default schema.

select * from stockitemtransactions where supplierid is not null;

Import the output of bcp from SQL Server into CockroachDB

connect to your instance of CockroachDB SQL Shell

cockroach sql --insecure

import the dataset

root@:26257/defaultdb> CREATE DATABASE IF NOT EXISTS WideWorldImporters;
USE WideWorldImporters;
DROP TABLE IF EXISTS StockItemTransactions;
IMPORT TABLE StockItemTransactions (
        StockItemTransactionID INT8 NOT NULL,
        StockItemID INT8 NOT NULL,
        TransactionTypeID INT8 NOT NULL,
        CustomerID STRING NULL, --INT8
        InvoiceID STRING NULL, --INT8
        SupplierID STRING NULL, --INT8
        PurchaseOrderID STRING NULL, --INT8
        TransactionOccurredWhen TIMESTAMP NOT NULL,
        Quantity DECIMAL NOT NULL,
        LastEditedBy INT8 NOT NULL,
        LastEditedWhen TIMESTAMP NOT NULL
)
CSV DATA ('nodelocal:///StockItemTransactionsBCP.csv');


CREATE DATABASE

Time: 572µs

SET

Time: 196µs

DROP TABLE

Time: 76.272ms

        job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |  bytes
+--------------------+-----------+--------------------+--------+---------------+----------------+----------+
  509222924675055617 | succeeded |                  1 | 236667 |             0 |              0 | 15048500
(1 row)

Time: 443.022ms

One thing you should know is that some features are still maturing in Cockroach and as I highlighted in my tutorial, schemas are not available and data type conversions need more work. I opted using STRING data type for CustomerID, InvoiceID, SupplierID and PurchaseOrderID to complete the tutorial but there should be a better way.

I will be working further on getting closer to a 1:1 conversion. Until then, hope this is a good first start.

Comments

Popular posts from this blog

Vista Vulnerability Report mentions Ubuntu 6.06 LTS

Running CockroachDB with Docker Compose and Minio, Part 2

VirtualBox options to start VM in Normal, Detached and Headless Modes