Wednesday, March 18, 2020

Exploring CockroachDB with ipython-sql aka sqlmagic and Jupyter Notebook

Today, I will demonstrate how ipython-sql can be used to query CockroachDB. This requires a secure instance of CockroachDB,
for reasons I will explain below. Running a secure docker-compose instance of CRDB is beyond the scope of this tutorial. Instead,
I will publish everything you need to get through the tutorial in my repo, including the Jupyter Notebook. You may also use the CRDB
docs to stand up a secure instance and change the URL in the notebook to follow along.
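
For readers bringing their own secure instance, here is a rough sketch of what a replacement URL might look like. It assumes the notebook connects with a Postgres-style URL and that client certificates follow the naming produced by `cockroach cert` (`ca.crt`, `client.<user>.crt`, `client.<user>.key`). The function name, user, host, and certificate directory are placeholders of mine, not values from the notebook.

```python
from urllib.parse import quote, urlencode


def secure_crdb_url(user, host, database, certs_dir, port=26257, scheme="postgresql"):
    """Build a connection URL that asks the driver to verify the server over TLS.

    The scheme is an assumption; use whatever scheme the notebook already uses.
    """
    params = {
        "sslmode": "verify-full",                      # verify the certificate and the host name
        "sslrootcert": f"{certs_dir}/ca.crt",          # CA certificate
        "sslcert": f"{certs_dir}/client.{user}.crt",   # client certificate
        "sslkey": f"{certs_dir}/client.{user}.key",    # client private key
    }
    # safe="/" keeps the certificate paths readable instead of percent-encoding slashes
    query = urlencode(params, safe="/")
    return f"{scheme}://{quote(user)}@{host}:{port}/{database}?{query}"


# Hypothetical values, for illustration only
print(secure_crdb_url("maxroach", "localhost", "defaultdb", "certs"))
```

The resulting string is what would be pasted in place of the URL in the notebook.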

Wednesday, March 11, 2020

Exploring CockroachDB with Jupyter Notebook and Microsoft PowerShell

Today, we're going to venture out into the world of .NET through a Microsoft scripting language called PowerShell.
My familiarity with .NET is quite minimal, but I do have an extensive background in PowerShell scripting,
albeit going years back. Pardon me for being a bit rusty. I always loved PowerShell when I was working on the
Microsoft platform, as it allows for an interactive and object-oriented approach to working with databases. Scripting
admin tasks for DBAs on Windows was always a challenge for me until PowerShell came into the picture. I had
to maintain many database servers and PowerShell became my best friend. Today, I will show you how
PowerShell can become your best friend when working with CockroachDB!

Note: The title is a bit misleading. As you will see, this tutorial is more about exploring PowerShell from the console
than from Jupyter Notebook, but I make my best effort to emphasize what does and does not work today in
Jupyter when it comes to PowerShell and CockroachDB. I burned many hours trying to find a workaround, but I
was not able to make the Postgres driver for .NET work with Jupyter Notebook.

Monday, February 24, 2020

Exploring CockroachDB with Jupyter Notebook and R

Today, we're going to explore CockroachDB from the Data Science perspective again. We will continue to use Jupyter Notebook, but instead of Python, we're going to use the R language. This post was inspired by an article written by my colleague. I will build on that article by introducing Jupyter Notebook to the mix.

Tuesday, February 11, 2020

Exploring CockroachDB with Jupyter Notebook and Python

Today, we're going to explore CockroachDB from the Data Science perspective, using a popular exploratory web tool called Jupyter Notebook. This post was inspired by this article. The article goes over using Jupyter with Oracle, MySQL and PostgreSQL, and we're going to do the same with Cockroach! One caveat is that the article relies heavily on the ipython-sql library. We're going to use the Pandas library instead, as the ipython-sql magic functions are not compatible with Cockroach today. Hopefully you will find it useful.
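
To give a feel for the Pandas route, here is a minimal sketch. It assumes a local, insecure single-node cluster and the psycopg2 driver; the DSN, the `check_dsn` and `load_frame` helpers, and the sample query are my own placeholders rather than code from the post.

```python
from urllib.parse import urlsplit

# Assumed local, insecure cluster on the default CockroachDB SQL port; adjust to your setup.
CRDB_DSN = "postgresql://root@localhost:26257/defaultdb?sslmode=disable"


def check_dsn(dsn):
    """Fail early if the DSN is not a Postgres-style URL; return (host, port)."""
    parts = urlsplit(dsn)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"expected a postgresql:// URL, got {parts.scheme!r}")
    return parts.hostname, parts.port


def load_frame(query, dsn=CRDB_DSN):
    """Run a query over the Postgres wire protocol and return a pandas DataFrame."""
    # Imported lazily so the helpers above work without these packages installed.
    import pandas as pd
    import psycopg2

    check_dsn(dsn)
    conn = psycopg2.connect(dsn)
    try:
        return pd.read_sql(query, conn)
    finally:
        conn.close()
```

In a notebook cell, something like `load_frame("SHOW DATABASES")` would then return a DataFrame that Jupyter renders as a table.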

Thursday, January 16, 2020

Import Hadoop HDFS data into CockroachDB

Today we're going to take a slight detour from docker compose and evaluate ingestion of data from
Hadoop into Cockroach. One word of caution: this is being tested on an unsecured cluster with a very
small volume of data. Always test your own setup before taking public articles at face value!
CockroachDB can natively import data from HTTP endpoints, object storage with respective APIs,
and local/NFS mounts. The full list of supported schemes can be found here.
It does not support the HDFS file scheme, so we're left to our wild imagination to find alternatives.
As previously discussed, the Hadoop community is working on Hadoop Ozone, a native scalable object
store with S3 API compatibility. For reference, here's my article demonstrating CockroachDB and
Ozone integration. The limitation here is that you need to run Hadoop 3 to get access to it. What if
you're on Hadoop 2? There are several choices I can think of off the top of my head. One approach
is to expose WebHDFS and IMPORT using an HTTP endpoint. The second option is to leverage the
previously discussed Minio to expose HDFS via HTTP or S3. Today, we're going to look at both approaches.
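
The first approach can be sketched roughly as follows. WebHDFS serves file contents at `/webhdfs/v1/<path>?op=OPEN`, so the idea is to hand that URL to an IMPORT statement. The NameNode host, the port (50070 is the Hadoop 2 default for the NameNode web UI), the HDFS path, the table definition, and both helper names are assumptions for illustration. The `IMPORT TABLE ... CSV DATA` form shown was current when this was written; newer CockroachDB releases favor `IMPORT INTO`.

```python
from urllib.parse import quote, urlencode


def webhdfs_open_url(namenode, hdfs_path, port=50070, user=None):
    """Build the WebHDFS URL that streams a file's contents over HTTP."""
    params = {"op": "OPEN"}
    if user:
        params["user.name"] = user  # WebHDFS pseudo-authentication on unsecured clusters
    return f"http://{namenode}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(params)}"


def import_csv_statement(table, columns, url):
    """Compose an IMPORT statement that reads CSV data from an HTTP endpoint."""
    cols = ", ".join(columns)
    return f"IMPORT TABLE {table} ({cols}) CSV DATA ('{url}');"


# Hypothetical host, file, and schema
url = webhdfs_open_url("namenode", "/data/users.csv", user="hdfs")
print(import_csv_statement("users", ["id INT PRIMARY KEY", "name STRING"], url))
```

The printed statement can be pasted into a SQL shell. Whether the import succeeds depends on the CockroachDB nodes being able to reach both the NameNode and the DataNodes that serve the file.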

Tuesday, January 7, 2020

CockroachDB CDC using Hadoop Ozone S3 Gateway as cloud storage sink, Part 4

Today, we're going to evaluate the Hadoop Ozone object store for viability as a CockroachDB
cloud storage sink. A bit of caution: this article only explores the art of the possible, so please use the
ideas in this article at your own risk! Firstly, Hadoop Ozone is a new object store the Hadoop
community is working on. It exposes an S3 API backed by HDFS and can scale to billions of
files on prem!

This article only scratches the surface. For everything there is to learn about Hadoop and
Ozone, navigate to their respective websites.

Friday, January 3, 2020

CockroachDB CDC using Minio as cloud storage sink, Part 3

Today, we’re going to explore the CDC capability in CockroachDB Enterprise Edition using the Minio object store as a sink. To achieve this, we’re going to reuse the compose file from the first two tutorials and finally bring this to a close. Without further ado