Posts

Showing posts from 2019

Running CockroachDB with Docker Compose and Minio, Part 2

This is my second post on creating a multi-service architecture with docker-compose. This is meant to be a learning exercise; docker-compose is typically used to set up a local development environment rather than a production-ready setup. I regularly find myself building these environments to reproduce customer bugs. For a production deployment, refer to your platform vendor's documentation. At some later time, I will cover Kubernetes deployments that can serve as a stepping stone to a real-world application. Until then, let's focus on the task at hand. We're building a microservice architecture with CockroachDB writing changes in real time to an S3 bucket in JSON format. The S3 bucket is served by a service called Minio, which can act as an on-premises S3 appliance or as a local gateway to your cloud storage. Let's dig in:
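To preview the moving part at the heart of this series, here is a minimal sketch of kicking off a changefeed into a Minio-backed bucket from Python. The table, bucket, credentials, and endpoint are all hypothetical placeholders, and the experimental- sink prefix reflects the CockroachDB versions current at the time of writing; treat it as a sketch, not the exact statements from the post.

```python
# A minimal sketch, assuming a CockroachDB node at localhost:26257 and a
# Minio container reachable at minio:9000 on the compose network. The table,
# bucket name, and credentials below are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257,
                        user="root", dbname="defaultdb")
conn.set_session(autocommit=True)

# S3-compatible sink pointed at Minio instead of AWS.
sink = (
    "experimental-s3://cockroach-bucket?"
    "AWS_ACCESS_KEY_ID=minio&AWS_SECRET_ACCESS_KEY=minio123"
    "&AWS_ENDPOINT=http://minio:9000"
)

with conn.cursor() as cur:
    # Changefeeds require rangefeeds to be enabled on the cluster
    # (and an enterprise license for changefeeds that write to a sink).
    cur.execute("SET CLUSTER SETTING kv.rangefeed.enabled = true")
    # Stream every change to the table into the bucket as JSON files.
    cur.execute(f"CREATE CHANGEFEED FOR TABLE office_dogs INTO '{sink}'")
```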

Running CockroachDB with Docker Compose, Part 1

Over the next few weeks, I will be publishing a series of tutorials on CockroachDB and various third-party tools to demonstrate how easy it is to integrate CockroachDB into your daily work. Today, we're covering docker-compose and a single-node CockroachDB cluster, as that will be the foundation for the next blog post.

Loading thousands of tables in parallel with Ray into CockroachDB because Why Not?

I came across an interesting scenario while working with one of our customers. They are using a common data integration tool to load hundreds of tables into CockroachDB simultaneously, and they reported that their loads fail intermittently with an unrecognized error. As a debugging exercise, I set out to write a script that imports data from an HTTP endpoint into CRDB in parallel. Disclosure: I do not claim to be an expert in CRDB, Python, or anything else for that matter; this is an exercise in answering a "why not?" question more than anything educational. I wrote a Python script to execute an import job and needed to make sure it executes in parallel to achieve the concurrency scenario I originally set out to test. I'm new to Python multiprocessing, and a short Google search returned a couple of options: the built-in multiprocessing and asyncio modules, and Ray. The advantage of multiprocessing and asyncio is that they're built-in Python modules. Since I was rushing thr...
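For a flavor of the Ray approach, here is a minimal sketch of fanning IMPORT jobs out across workers. The table names, the HTTP endpoint serving the CSV files, and the connection settings are hypothetical placeholders rather than the exact script from the post.

```python
# A minimal sketch, assuming ray and psycopg2 are installed and a CockroachDB
# node listens on localhost:26257. Table names and the fileserver URL are
# hypothetical placeholders.
import psycopg2
import ray

ray.init()

@ray.remote
def import_table(table):
    # Each Ray worker opens its own connection and runs one IMPORT job.
    conn = psycopg2.connect(host="localhost", port=26257,
                            user="root", dbname="defaultdb")
    conn.set_session(autocommit=True)  # IMPORT runs outside a transaction
    with conn.cursor() as cur:
        cur.execute(
            f"IMPORT TABLE {table} (id INT PRIMARY KEY, payload STRING) "
            f"CSV DATA ('http://fileserver:3000/{table}.csv')"
        )
    conn.close()
    return table

# Fan the jobs out across workers and block until every import finishes.
tables = [f"table_{i}" for i in range(100)]
print(ray.get([import_table.remote(t) for t in tables]))
```

Each worker owns its connection, so there is no shared state to coordinate; Ray simply schedules as many import_table tasks concurrently as it has CPUs.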

Import a table from SQL Server into CockroachDB

This is a quick tutorial on exporting data out of SQL Server and into CockroachDB. It is meant to be a learning exercise only and not meant for production deployment; I welcome any feedback to improve the process further. The fastest way to get started with SQL Server is via the available Docker containers. I'm using the following tutorial to deploy SQL Server on Ubuntu from my Mac. My SQL Server-fu is a bit rusty, so I opted to follow this tutorial to restore the WideWorldImporters sample database into my Docker container. You may also need the SQL Server tools installed on your host; you can find directions for macOS and Linux at the following site, and Windows users are quite familiar with the download location for their OS. I also used the following directions to install the SQL Server tools on my Mac but ran into compatibility issues with the drivers in my Docker container. That will be a debug session for another day. I will be working further on getting closer to a 1:1 ...
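As an illustration of the export half, here is a minimal sketch that dumps a table from SQL Server to CSV with pyodbc. The driver name, credentials, and the Application.Cities table are assumptions based on the WideWorldImporters sample, not the exact commands from the post.

```python
# A minimal sketch, assuming pyodbc and the "ODBC Driver 17 for SQL Server"
# are installed. Server address, credentials, table, and columns are
# hypothetical placeholders.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost,1433;"
    "DATABASE=WideWorldImporters;UID=sa;PWD=YourStrong!Passw0rd"
)

cur = conn.cursor()
cur.execute("SELECT CityID, CityName FROM Application.Cities")

# Write the result set to a plain CSV file. Once served over HTTP,
# the file can be pulled in with CockroachDB's IMPORT statement.
with open("cities.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in cur:
        writer.writerow(row)
```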

Using CockroachDB IMPORT with local storage

When doing imports into or exports out of CockroachDB, there are multiple storage options available. One option that is less understood is how to do a local import. Based on a conversation I had with engineering in our brand new community slack, there are several options available:

1. On a single-node cluster, use the --external-io-dir flag to point at your import and backup directory.
2. On a multi-node cluster, copy your import datasets to every node's extern directory, followed by an IMPORT.
3. As a last option, spin up a local webserver and make your files network-accessible.

Cockroach Labs is working on a more permanent solution when it comes to nodelocal; we acknowledge the current approach is not perfect and intend to make it less painful. Demo data generated with Mockaroo. A sketch of the first option follows below.
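To make the first option concrete, here is a minimal sketch, assuming a single node was started with the --external-io-dir flag and the CSV file already sits in that directory. The table definition, file name, and connection settings are hypothetical placeholders, and the nodelocal URL form matches the CockroachDB versions current at the time of writing.

```python
# A minimal sketch of option 1, assuming the node was started with something
# like: cockroach start --insecure --external-io-dir=/tmp/extern
# The table definition and file name below are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257,
                        user="root", dbname="defaultdb")
conn.set_session(autocommit=True)  # IMPORT cannot run inside a transaction

with conn.cursor() as cur:
    # nodelocal URLs resolve relative to the node's external IO directory.
    cur.execute(
        "IMPORT TABLE employees (id INT PRIMARY KEY, name STRING) "
        "CSV DATA ('nodelocal:///employees.csv')"
    )
```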

How to use CockroachDB Community Edition with Django and Multipass on OSX

Over the break, I wrote a short tutorial on using CockroachDB with Django. This is a slight derivative of that post using Multipass and the community edition of CockroachDB. That means it does not require a license and is missing the RBAC features, but for the purposes of this tutorial, all the CockroachDB features we need will be present. Since the original tutorial was based on Ubuntu, I decided to use Multipass, as it allows me to spin up Ubuntu instances quickly on my Mac; Multipass binaries are available for Windows and Linux as well. Granted, the same can be achieved with a virtual machine, Docker container, or Vagrant environment, but I chose to give this a spin as I was not familiar with the tool. Documentation on installing cockroach can be found at the following link. The video below depicts the process in two steps, but once you're inside the Multipass shell, you can follow the instructions as advertised and download the binary as well as uncompress it in one step. For the purpo...

How to Use CockroachDB with your Django Application

This is a short tutorial on using Django with CockroachDB. It is intended as a quick ramp-up on CockroachDB with Django; in case you're searching for a proper Django tutorial, this is not it. At the time of writing, the django-cockroachdb library is available in two versions (2 and 3); this post focuses on version 2 specifically. The tutorial is based on the DigitalOcean tutorial on using Django with PostgreSQL. I am going to highlight the steps where this tutorial differs from the original; for everything else, we will assume the original is followed as is. Since we're going to need the RBAC features of CockroachDB, we will require an enterprise license. Feel free to request a trial, or try this tutorial with the cockroach demo environment, where enterprise features are enabled for 60 minutes, which is plenty of time to complete this tutorial.
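For reference, the key deviation from the PostgreSQL original is the DATABASES block in settings.py. Here is a minimal sketch in which the database name, user, and password are placeholders carried over from the style of the original tutorial; only the ENGINE and PORT values are specific to CockroachDB.

```python
# A minimal sketch of the settings.py change, assuming django-cockroachdb
# version 2 is installed. Name, user, and password are placeholders for
# your own values.
DATABASES = {
    "default": {
        # Swap PostgreSQL's backend for the CockroachDB one.
        "ENGINE": "django_cockroachdb",
        "NAME": "myproject",
        "USER": "myprojectuser",
        "PASSWORD": "password",
        "HOST": "localhost",
        "PORT": "26257",  # CockroachDB's default SQL port
    }
}
```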