Wednesday, June 3, 2020

Three-headed dog meet cockroach, part 2: CockroachDB with Active Directory

Today, I am going to discuss CockroachDB integration with Active Directory. AD is the commercial brother of Kerberos maintained by Microsoft. AD is a de facto authentication standard across large enterprises and our customers expect products calling themselves enterprise to work seamlessly with Active Directory. Hence, here's my write-up with end to end steps to deploy a lab environment to try on your own.

Monday, May 4, 2020

Three-headed dog meet cockroach: CockroachDB with MIT Kerberos

CockroachDB is a cloud native distributed database that works across various cloud, hybrid and on premise environments. The flexibility of deployments demand varying degrees of security protocols. Most of the time, on premise customers won't accept anything less than Kerberos for their system of record authentication mechanisms. In my Hadoop time, that was the bare minimum requirement to play. CockroachDB today supports Kerberos via GSSAPI for authentication. In this post, I'm going to walk you through setting up Kerberos for CockroachDB and provide a sort of cheat sheet, to make this process more seamless. I'm using a single Centos VM provisioned with Vagrant. It servers as my KDC as well as my CockroachDB instance. On to the setup. The following documents may assist in the entire process: CockroachDB GSSAPI, how to install CockroachDB and configuring CockroachDB for secure access. I recorded the entire process with Asciinema, I split the screencast into two parts.

Part 1


Part 2


Monday, April 27, 2020

Secure CockroachDB with Custom Common Name

CockroachDB out of the box comes with ability to generate certificates with cockroach cert command. This command will provision certs for client and nodes. One common gap we get from our customers is the explicit reliance on CN=node and CN=root. In our latest development release, we're introducing ability to map root and node principals to custom CNs. The process bypasses cockroach cert command in favor of openssl utility. It is very well documented and I recorded a live walk-through of the entire process. I am including my openssl configuration files for convenience:

ca.cnf


# OpenSSL CA configuration file
[ ca ]
default_ca = CA_default

[ CA_default ]
default_days = 365
database = index.txt
serial = serial.txt
default_md = sha256
copy_extensions = copy
unique_subject = no

# Used to create the CA certificate.
[ req ]
prompt=no
distinguished_name = distinguished_name
x509_extensions = extensions

[ distinguished_name ]
organizationName = Example Inc
commonName = Example Inc CA

[ extensions ]
keyUsage = critical,digitalSignature,nonRepudiation,keyEncipherment,keyCertSign
basicConstraints = critical,CA:true,pathlen:1

# Common policy for nodes and users.
[ signing_policy ]
organizationName = supplied
commonName = optional

# Used to sign node certificates.
[ signing_node_req ]
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = serverAuth,clientAuth

# Used to sign client certificates.
[ signing_client_req ]
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = clientAuth

node.cnf


# OpenSSL node configuration file
[ req ]
prompt=no
distinguished_name = distinguished_name
req_extensions = extensions

[ distinguished_name ]
organizationName = Example Inc

[ extensions ]
subjectAltName = critical,DNS:localhost,DNS:node.example.com,IP:0.0.0.0

client.cnf for root


[ req ]
prompt=no
distinguished_name = distinguished_name
req_extensions = extensions

[ distinguished_name ]
organizationName = Example Inc 
commonName = john

[ extensions ]
subjectAltName = email:john.smith@example.com,DNS:user.example.com,DNS:root

client.cnf for additional users


[ req ]
prompt=no
distinguished_name = distinguished_name
req_extensions = extensions

[ distinguished_name ]
organizationName = Example Inc 
commonName = user2@example.com

[ extensions ]
subjectAltName = email:user2@example.com,DNS:user2.example.com,DNS:user2

What is insecure may never break: CockroachDB insecure cluster take over

I came across an interesting scenario last week. A customer had asked whether it is possible to secure a previously insecure cluster. The short answer is yes. Now, Cockroach Labs does not recommend running an insecure cluster in production. There are only a few additional steps necessary to secure an instance, so why do it? Convenience, you say. It can hurt you down the line but fret not, this article will demonstrate how to fix this. We are going to follow the standard insecure cluster start up procedure. Once complete, we're going to flip to the documentation for a secure cluster to turn each node on with security enabled. Here's a handy video of the procedure in action: I also included the step by step instructions below:

Friday, April 24, 2020

New tricks in CockroachDB: Sql statement redirection from an external file

CockroachDB is the SQL database for building global, scalable cloud services that survive disasters. It is designed to be a Postgresql compatible database with distributed roots. The Postgresql compatibility is being built from scratch in Go. That said, product is undergoing fast pace of innovation and development. One convenience method of passing sql statements from an external file to CLI has been missing. It is not available in the current GA release and I opened a GitHub issue to implement it. In lieu of available convenience option, this post will discuss other methods of achieving the same.

Scenario


I have a table called population with the following schema:

CREATE TABLE population ( state CHAR(2) NULL, city STRING NULL, population INT8 NULL, FAMILY "primary" (state, city, population, rowid) )

The associated CSV file has the following structure:

IN,Evansville,7415326
TX,Fort Worth,9000748
IN,South Bend,2590270
KS,Shawnee Mission,9444064
GA,Macon,9804704
WA,Seattle,7928277

The associated IMPORT command:

IMPORT TABLE population ( state char(2), city TEXT, population INT8 ) CSV DATA ('https://api.mockaroo.com/api/25164a90?count=1000&key=02f7f490');

The associated IMPORT INTO command:

IMPORT INTO population (state, city, population) CSV DATA ('https://api.mockaroo.com/api/25164a90?count=5000&key=02f7f490');

The big difference between IMPORT and IMPORT INTO is the former will create the table for you and the latter expects the table to exist. That said, you can repeatedly issue IMPORT INTO command.

CockroachDB has ability to import CSV from cloud storage, http(s) endpoint, local filesystem, etc. You can read more about IMPORT command, here and the associated IMPORT INTO command, here.

Method 1a: Unix file redirection


Suppose I want to pass an import statement from a file. With Unix file redirection the following can be done with command below:

cockroach sql -e="$(<import_into_statement.sql)" --url 'postgresql://'

-e is functionally equivalent to --execute. This is pretty convoluted and can be further simplified by the next approach.

Method 1b: Unix file redirection


cockroach sql < import_into_statement.sql --url 'postgresql://'

Method 2: Using psql


Given CockroachDB is Postgresql wire compatible, you can use psql utility to achieve the same. I'm using Mac OSX and psql can be easily installed with:

brew install libpq
brew link --force libpq


Once installed, import from an external file can be easily done with built-in -f and --file= flags:

psql -f import_into_statement.sql 'postgresql://root@localhost:26257/defaultdb?sslcert=certs%2Fclient.root.crt&sslkey=certs%2Fclient.root.key&sslmode=verify-full&sslrootcert=certs%2Fca.crt'

Additional file redirection tricks are illustrated in our docs.

Wednesday, March 18, 2020

Exploring CockroachDB with ipython-sql aka sqlmagic and Jupyter Notebook


Today, I will demonstrate how ipython-sql can be leveraged in querying CockroachDB. This will require a secure instance of CockroachDB
for the reasons I will explain below. Running a secure docker-compose instance of CRDB is beyond the scope of this tutorial. Instead,
I will publish everything you need to get through the tutorial in my repo, including the Jupyter Notebook. You may also use CRDB
docs to stand up a secure instance and change the url in the notebook to follow along.

Wednesday, March 11, 2020

Exploring CockroachDB with Jupyter Notebook and Microsoft PowerShell


Today, we're going to venture out into the world of .Net through a scripting language out of Microsoft called PowerShell.
My familiarity with .Net is quite minimal but I do have an extensive background in PowerShell scripting,
albeit going years back. Pardon me for being a bit rusty. I've always loved PowerShell when I was working on the
Microsoft platform, it allows for interactive and object oriented approach working with databases. Scripting
admin tasks for DBAs on Windows was always a challenge for me until PowerShell came into the picture. I had
to maintain many database servers and PowerShell became my best friend. Today, I will show you how
PowerShell can become your best friend working with CockroachDB!

Note: The title is a bit misleading as you will see this tutorial is more about exploring PowerShell from the console
rather than Jupyter Notebook but I do make my best effort to emphasize what does and does not work today in
Jupyter when it comes to PowerShell and CockroachDB. I've burned many hours trying to find a workaround but I
was not able to make the Postgres driver for .Net work with Jupyter Notebook.

Monday, February 24, 2020

Exploring CockroachDB with Jupyter Notebook and R


Today, we're going to explore CockroachDB from the Data Science perspective again. We will continue to use Jupyter notebook but instead of Python, we're going to use the R language. I was inspired to write this post based on an article written by my colleague. I will build on that article by introducing Jupyter Notebook to the mix.

Tuesday, February 11, 2020

Exploring CockroachDB with Jupyter Notebook and Python

Today, we're going to explore CockroachDB from the Data Science perspective, using a popular exploratory web tool called Jupyter Notebook. I was inspired to write this post based on this article. The article goes over using Jupyter with Oracle, MySql and Postgresql, we're going to do the same with Cockroach! One caveat here is the heavy reliance on ipython-sql library. We're going to use Pandas library as the ipython-sql magic functions are not compatible with Cockroach today. Hopefully you will find it useful.




Thursday, January 16, 2020

Import Hadoop HDFS data into CockroachDB

Today we're going to take a slight detour from docker compose and evaluate ingestion of data from
Hadoop into Cockroach. One word of caution, this is being tested on an unsecured cluster with very
small volume of data. Always test your own set up before taking public articles for face value!
CockroachDB can natively import data from HTTP endpoints, object storage with respective APIs
and local/NFS mounts. The full list of supported schemes can be found here.
It does not support HDFS file scheme and we're left to our wild imagination to find alternatives.
As previously discussed, Hadoop community is working on Hadoop Ozone, a native scalable object
store with S3 API compatibility. For reference, here's my article demonstrating CockroachDB and
Ozone integration. The limitation here is that you need to run Hadoop 3 to get access to it. What if
you're on Hadoop 2? There are several choices I can think of off the top of my head. One approach
is to expose webhdfs and IMPORT using http endpoint. The second option is to leverage
previously discussed Minio to expose HDFS via HTTP or S3. Today, we're going to look at both approaches.


Tuesday, January 7, 2020

CockroachDB CDC using Hadoop Ozone S3 Gateway as cloud storage sink, Part 4

Today, we're going to evaluate Hadoop Ozone object store for CockroachDB object store
sink viability. A bit of caution, this article only explores the art of possible, please use the
ideas in this article at your own risk! Firstly, Hadoop Ozone is a new object store Hadoop
Community is working on. It exposes an S3 API backed by HDFS and can scale to billions of
files on prem!

This article only scratches the surface, for everything there is to learn about Hadoop and
Ozone, navigate to their respective websites.



Friday, January 3, 2020

CockroachDB CDC using Minio as cloud storage sink, Part 3


Today, we’re going to explore CDC capability in CockroachDB Enterprise Edition using Minio object store as sink. To achieve this, we’re going to reuse the compose file from the first two tutorials and finally bring this to a close. Without further ado