Tuesday, December 24, 2019

Running CockroachDB with Docker Compose and Minio, Part 2

This is my second post on creating a multi-service architecture with docker-compose. It is meant to be a learning exercise: docker-compose is typically used to set up a local development environment rather than a production-ready setup. I regularly find myself building these environments to reproduce customer bugs. For a production application, refer to your platform vendor's documentation. At some later time, I will cover Kubernetes deployments that can be used as a stepping stone for a real-world application. Until then, let's focus on the task at hand. We're building a microservice architecture with CockroachDB writing changes in real time to an S3 bucket in JSON format. The S3 bucket is served by a service called Minio. It can act like an S3 appliance on premises or serve as a local gateway to your cloud storage. Let's dig in:

Friday, December 20, 2019

Running CockroachDB with Docker Compose, Part 1

Over the next few weeks, I will be publishing a series of tutorials on CockroachDB and various third-party tools to demonstrate how easy it is to integrate CockroachDB into your daily work. Today, we're covering docker-compose and a single-node CockroachDB cluster, as that will be the foundation for the next blog post.

Friday, December 6, 2019

Loading thousands of tables in parallel with Ray into CockroachDB because Why Not?

I came across an interesting scenario working with one of our customers.
They are using a common data integration tool to load hundreds of tables into CockroachDB
simultaneously. They reported that their loads fail intermittently with an unrecognized error.
As a debugging exercise, I set out to write a script to import data from an HTTP endpoint into CRDB in parallel.
Disclosure: I do not claim to be an expert in CRDB, Python or anything else for that matter.
This is an exercise in answering a "why not?" question more than anything educational.
I wrote a Python script to execute an import job and needed to make sure it executes in parallel
to reproduce the concurrency scenario I originally set out to test. I'm new to Python
multiprocessing, and a short Google search returned a few options: the built-in multiprocessing
and asyncio modules, and Ray. The advantage of multiprocessing and asyncio is that they're built-in
Python modules. Since I was rushing through my task, I could not get multiprocessing to work on my tight
schedule, so I checked out Ray. Following a quick start guide, I was able to make it work with little to no fuss.
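
The fan-out pattern described above can be sketched roughly as follows. This is not the original script: it swaps Ray for the standard library's concurrent.futures thread pool so it runs without a pip install. The helper names, the two-column schema and the base URL are all made up for illustration, run_import is a stand-in that never touches a database, and the IMPORT TABLE ... CSV DATA form is assumed to match the CockroachDB version in use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_import_statement(table: str, base_url: str) -> str:
    """Return an IMPORT statement for one table (hypothetical two-column schema)."""
    return (
        f"IMPORT TABLE {table} (id INT PRIMARY KEY, name STRING) "
        f"CSV DATA ('{base_url}/{table}.csv')"
    )


def run_import(statement: str) -> str:
    """Stand-in for executing the statement against the cluster.

    A real script would open a connection with a PostgreSQL driver and
    execute the statement; this placeholder only returns the table name.
    """
    return statement.split()[2]


def import_all(tables, base_url, workers=8):
    """Submit one import per table and sort the outcomes into done/failed."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(run_import, build_import_statement(t, base_url)): t
            for t in tables
        }
        for future in as_completed(futures):
            table = futures[future]
            try:
                future.result()
                done.append(table)
            except Exception:
                # Record the failure and keep going with the other tables.
                failed.append(table)
    return sorted(done), sorted(failed)


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(10)]
    done, failed = import_all(tables, "http://localhost:3000")
    print(f"{len(done)} imported, {len(failed)} failed")
```

Collecting failures per table, rather than stopping at the first exception, is what makes intermittent errors visible when many imports run at once.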

Lessons learned:
1. Loading 1000 tables is not a biggie on a local 3-node cluster. It does require starting CRDB with
--max-sql-memory=.25 per node, but otherwise it was chugging along.
2. Ray is way cool, although it requires a pip install, and I will be looking at it further.
3. Cockroach is awesome, as it's able to keep up with the sheer volume and velocity of the data even
on a single machine. Again, this is not a big data problem, just an exercise in whether it works.
4. Cockroach has room for improvement when doing bulk import. In one of our discussions, I suggested
  adding IF NOT EXISTS syntax to the IMPORT command.

Additionally, the CRDB Admin UI comes in handy when bulk loading, as you can monitor the status of your imports
through the Jobs page.

Main page 

Job details

Failed jobs filter 

Failed job details 

Wednesday, December 4, 2019

Import a table from SQL Server into CockroachDB

This is a quick tutorial on exporting data out of SQL Server into CockroachDB. This is meant to be
a learning exercise only and is not meant for production deployment. I welcome any feedback to
improve the process further. The fastest way to get started with SQL Server is via the available
Docker containers. I’m using the following tutorial to deploy SQL Server on Ubuntu from my Mac.
My SQL Server-fu is a bit rusty, so I opted to follow this tutorial to restore the WideWorldImporters
sample database into my Docker container. You may also need SQL Server tools installed on your
host; you can find directions for macOS and Linux at the following site. Windows users are likely
familiar with the download location for their OS. I also used the following directions to install SQL Server
tools on my Mac but ran into compatibility issues with the drivers in my Docker container. That will be a
debug session for another day.

I will be working further on getting closer to a 1:1 conversion. Until then, I hope this is a good start.

Monday, December 2, 2019

Using CockroachDB IMPORT with local storage

When importing into or exporting from CockroachDB, there are multiple storage options available.
One option that is less well understood is how to do a local import. Based on a conversation
I had with engineering in our brand new community Slack, there are several options available.

1. On a single-node cluster, use the --external-io-dir option to set your import and backup directory.
2. On a multi-node cluster, copy your import datasets to every node's extern directory,
then run an IMPORT.
3. The last option is to spin up a local web server and make your files network-accessible.
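
As a rough illustration of the third option, here is a minimal sketch that serves a directory of CSV files over HTTP using only Python's standard library. The function names, the bind address and the file name are hypothetical, and the resulting URL would still need to be reachable from every node in the cluster before it can be used in an IMPORT statement.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, bind: str = "0.0.0.0", port: int = 0):
    """Serve `directory` over HTTP in a background thread.

    port=0 asks the OS for any free port; the chosen port is available
    afterwards as server.server_address[1].
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((bind, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def file_url(server, host: str, filename: str) -> str:
    """Build the URL that an IMPORT statement could point at."""
    return f"http://{host}:{server.server_address[1]}/{filename}"


if __name__ == "__main__":
    server = serve_directory(".", bind="127.0.0.1")
    print(file_url(server, "127.0.0.1", "users.csv"))  # hypothetical file name
    input("Serving files; press Enter to stop.\n")
    server.shutdown()
```

For a quick one-off, running `python3 -m http.server` from the data directory does the same job without any script.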

Cockroach Labs is working on a more permanent solution when it comes to nodelocal. We
acknowledge the current solution is not perfect and intend to make it less painful.

Demo data generated with Mockaroo.

Friday, November 29, 2019

How to use CockroachDB Community Edition with Django and Multipass on OSX

Over the break, I wrote a short tutorial on using CockroachDB with Django. This is a slight derivative of
that post using Multipass and the community edition of CockroachDB. It means it does not require a license
and is missing RBAC features but for the purposes of the tutorial, all features of CockroachDB will be

Since the original tutorial was based on Ubuntu, I decided to use Multipass as it allows me to spin up
Ubuntu instances quickly on my Mac. Binaries for Multipass are available for Windows and Linux as well.
Granted, the same can be achieved with a virtual machine, a Docker container or a Vagrant environment,
but I chose to give this a spin as I was not familiar with the tool.

Documentation on installing cockroach can be found at the following link. The video below depicts
the process in two steps, but once you're inside the Multipass shell, you can follow the instructions as
advertised and download and uncompress the binary in one step.

For brevity, I will only highlight the new commands. For everything else, the original
tutorial and my previous tutorial can be followed as is.

How to Use CockroachDB with your Django Application

This is a short tutorial on using Django with CockroachDB. It is intended to be a quick ramp-up
on CockroachDB with Django; if you're searching for a proper Django tutorial, this is not it.
At the time of writing, the django-cockroachdb library is available in two versions (2 and 3). This post
focuses on version 2 specifically. This tutorial is based on the Digital Ocean tutorial on using Django with
PostgreSQL. I am going to highlight the steps where this tutorial differs from the original. For everything
else, we will assume the tutorial is followed as is. Since we’re going to need the RBAC features of
CockroachDB, we will require an enterprise license. Feel free to request a trial, or try this tutorial with
the cockroach demo environment, where enterprise features are enabled for 60 minutes, which is plenty of time to
complete this tutorial.
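
To give a sense of where the two tutorials part ways, the database block in a Django project's settings.py might look roughly like the sketch below. The django_cockroachdb engine name comes from the django-cockroachdb package and 26257 is CockroachDB's default SQL port; the database name, user, password and host are placeholders assumed for illustration, not values taken from this post.

```python
# settings.py fragment (sketch): point Django at CockroachDB instead of PostgreSQL.
DATABASES = {
    "default": {
        "ENGINE": "django_cockroachdb",  # backend provided by django-cockroachdb
        "NAME": "myproject",             # placeholder database name
        "USER": "myprojectuser",         # placeholder SQL user
        "PASSWORD": "password",          # placeholder; depends on your cluster setup
        "HOST": "localhost",             # assumes a local node
        "PORT": "26257",                 # CockroachDB's default SQL port
    }
}
```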