Posts

Showing posts from December, 2019

Running CockroachDB with Docker Compose and Minio, Part 2

This is my second post on creating a multi-service architecture with docker-compose. It is meant as a learning exercise; docker-compose is typically used to set up a local development environment rather than a production-ready setup. I regularly find myself building these environments to reproduce customer bugs. For a production application, refer to your platform vendor's documentation. At some later time, I will cover Kubernetes deployments that can be used as a stepping stone for a real-world application. Until then, let's focus on the task at hand. We're building a microservice architecture with CockroachDB writing changes in real time to an S3 bucket in JSON format. The S3 bucket is served by a service called Minio, which can act as an on-premises S3 appliance or serve as a local gateway to your cloud storage. Let's dig in:

Running CockroachDB with Docker Compose, Part 1

Over the next few weeks, I will be publishing a series of tutorials on CockroachDB and various third-party tools to demonstrate how easy it is to integrate CockroachDB into your daily work. Today, we're covering docker-compose and a single-node CockroachDB cluster, as that will be the foundation for the next blog post.

Loading thousands of tables in parallel with Ray into CockroachDB because Why Not?

I came across an interesting scenario working with one of our customers. They are using a common data integration tool to load hundreds of tables into CockroachDB simultaneously. They reported that their loads fail intermittently due to an unrecognized error. As a debugging exercise, I set out to write a script to import data from an HTTP endpoint into CRDB in parallel. Disclosure: I do not claim to be an expert in CRDB, Python, or anything else for that matter. This is an exercise in answering a "why not?" question more so than anything educational. I wrote a Python script to execute an import job and needed to make sure it executes in parallel to achieve the concurrency scenario I originally set out to reproduce. I'm new to Python multiprocessing, and a short Google search returned a few options: the built-in multiprocessing and asyncio modules, or Ray. The advantage of multiprocessing and asyncio is that they're built-in Python modules. Since I was rushing thr
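To make the concurrency idea in this excerpt concrete, here is a minimal sketch that fans out one job per table. It uses the standard library's concurrent.futures thread pool as a stand-in; it is not the Ray-based script from the post. The names load_table and load_all are hypothetical, and load_table is only a placeholder where a real worker would open its own database connection and run the import.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_table(table_name):
    """Placeholder for one import job (hypothetical).

    A real worker would open its own database connection and run the
    import for table_name; here it only returns a status string.
    """
    return f"{table_name}: ok"


def load_all(table_names, max_workers=8, worker=load_table):
    """Run one worker per table concurrently and collect per-table outcomes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the table it belongs to.
        futures = {pool.submit(worker, name): name for name in table_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure and keep going, so intermittent
                # errors show up per table instead of aborting the run.
                results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(100)]
    outcome = load_all(tables)
    print(len(outcome), "jobs finished")
```

Catching exceptions per future is a deliberate choice here: when the goal is to reproduce intermittent failures, it helps to see which tables failed rather than stopping at the first error.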

Import a table from SQL Server into CockroachDB

This is a quick tutorial on exporting data out of SQL Server into CockroachDB. This is meant to be a learning exercise only and not meant for production deployment. I welcome any feedback to improve the process further. The fastest way to get started with SQL Server is via available Docker containers. I'm using the following tutorial to deploy SQL Server on Ubuntu from my Mac. My SQL Server-Fu is a bit rusty, so I opted to follow this tutorial to restore the WideWorldImporters sample database into my Docker container. You may also need SQL Server tools installed on your host; you can find directions for macOS and Linux at the following site, and Windows users are likely already familiar with the download location for their OS. I also used the following directions to install SQL Server tools on my Mac but ran into compatibility issues with the drivers in my Docker container. That will be a debugging session for another day. I will be working further on getting closer to a 1:1 conv

Using CockroachDB IMPORT with local storage

When importing into or exporting from CockroachDB, there are multiple storage options available. One option that is less understood is how to do a local import. Based on a conversation I had with engineering in our brand new community Slack, there are several options available. 1. On a single-node cluster, use the --external-io-dir option to point at your import and backup directory. 2. On a multi-node cluster, you may copy your import datasets to every node's extern directory and then run an IMPORT. 3. The last option is to spin up a local webserver and make your files network-accessible. Cockroach Labs is working on a more permanent solution when it comes to nodelocal. We acknowledge the current solution is not perfect and intend to make it less painful. Demo data generated with Mockaroo.
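The third option, a local webserver, can be sketched with Python's built-in http.server module. This is a minimal illustration and not necessarily the approach used in the post; the helper name serve_directory and its defaults are assumptions made for the example.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread and return the server.

    port=0 asks the OS for a free port; the chosen port can be read back
    from server.server_address. (Helper name and defaults are illustrative.)
    """
    # Bind the handler to the directory instead of the current working dir.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer((host, port), handler)
    # Daemon thread so the server does not keep the process alive on exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling serve_directory on the folder holding the datasets exposes each file at http://host:port/filename. For a multi-node cluster, the server would need to listen on an address every node can reach rather than 127.0.0.1. Call server.shutdown() and server.server_close() when the import is done.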