Posts

Showing posts from January, 2020

Import Hadoop HDFS data into CockroachDB

Today we're going to take a slight detour from Docker Compose and evaluate ingestion of data from Hadoop into CockroachDB. One word of caution: this is being tested on an unsecured cluster with a very small volume of data. Always test your own setup before taking public articles at face value! CockroachDB can natively import data from HTTP endpoints, object storage with the respective APIs, and local/NFS mounts. The full list of supported schemes can be found here. It does not support the HDFS file scheme, so we're left to our wild imagination to find alternatives. As previously discussed, the Hadoop community is working on Hadoop Ozone, a native, scalable object store with S3 API compatibility. For reference, here's my article demonstrating CockroachDB and Ozone integration. The limitation is that you need to run Hadoop 3 to get access to it. What if you're on Hadoop 2? There are several choices I can think of off the top of my head. One approach is to…
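As a minimal sketch of what that native import looks like, here is an HTTP-backed import in CockroachDB; the table definition, host, and file path are hypothetical placeholders, not values from the article:

    -- Hypothetical example: import a CSV file served over plain HTTP.
    -- 'fileserver' and the CSV layout are illustrative assumptions.
    IMPORT TABLE employees (
        id INT PRIMARY KEY,
        name STRING,
        department STRING
    ) CSV DATA ('http://fileserver:8080/exports/employees.csv');

Any endpoint that can serve the file over a supported scheme works the same way, which is why the missing HDFS scheme is a gap worth working around.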

CockroachDB CDC using Hadoop Ozone S3 Gateway as cloud storage sink, Part 4

Today, we're going to evaluate the Hadoop Ozone object store for viability as a CockroachDB object store sink. A bit of caution: this article only explores the art of the possible; please use the ideas in it at your own risk! Firstly, Hadoop Ozone is a new object store the Hadoop community is working on. It exposes an S3 API backed by HDFS and can scale to billions of files on-prem! This article only scratches the surface; for everything there is to learn about Hadoop and Ozone, navigate to their respective websites.
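Because Ozone speaks S3, the changefeed can target it like any S3-compatible sink. A rough sketch, assuming a bucket named 'cdc-bucket', placeholder credentials, and the Ozone S3 Gateway listening on its default port 9878:

    -- Hypothetical example: point a changefeed at the Ozone S3 Gateway.
    -- Table name, bucket, and credentials are placeholders.
    CREATE CHANGEFEED FOR TABLE orders
        INTO 'experimental-s3://cdc-bucket/orders?AWS_ACCESS_KEY_ID=id&AWS_SECRET_ACCESS_KEY=key&AWS_ENDPOINT=http://ozone-s3g:9878'
        WITH updated;

The AWS_ENDPOINT parameter is what redirects CockroachDB's S3 client away from AWS and toward the on-prem gateway.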

CockroachDB CDC using Minio as cloud storage sink, Part 3

Today, we’re going to explore the CDC capability in CockroachDB Enterprise Edition using the Minio object store as a sink. To achieve this, we’re going to reuse the compose file from the first two tutorials and finally bring this series to a close. Without further ado…
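For orientation, the Minio variant follows the same pattern; a sketch under the assumption of a Minio container named 'minio' on its default port 9000, with placeholder credentials and bucket:

    -- Changefeeds require rangefeeds to be enabled on the cluster.
    SET CLUSTER SETTING kv.rangefeed.enabled = true;

    -- Hypothetical example: stream changes to a Minio bucket.
    -- Bucket name and access keys are placeholders.
    CREATE CHANGEFEED FOR TABLE orders
        INTO 'experimental-s3://cdc-sink/orders?AWS_ACCESS_KEY_ID=minio&AWS_SECRET_ACCESS_KEY=minio123&AWS_ENDPOINT=http://minio:9000'
        WITH updated;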