Quick light-weight sandbox environment with Apache Bigtop

I'm a long-time user of Apache Bigtop. My experience with Hadoop and Bigtop predates Ambari. I started using Bigtop with version 0.3. I remember pulling bigtop.repo file and install Hadoop, Pig and Hive for some quick development. Bigtop makes it convenient and easy. Bigtop has matured since then and there are now multiple ways of deployment. There's still a way to pull repo and install manually but there's better ways now with Vagrant and Docker. I won't rehash how to deploy Bigtop using Docker as it was beautifly described here. Admittedly, I'm running it on Mac and was not able to provision a cluster using Docker. I did not try with non-OSX. This post is about Vagrant. Let's get started:

Install VirtualBox and Vagrant

Download 1.1.0 release

wget http://www.apache.org/dist/bigtop/bigtop-1.1.0/bigtop-1.1.0-project.tar.gz

uncompress the tarball

tar -xvzf bigtop-1.1.0-project.tar.gz

change directory to bigtop-1.1.0/bigtop-deploy/vm/vagrant-puppet-vm

cd bigtop-1.1.0/bigtop-deploy/vm/vagrant-puppet-vm

here you can review the README but to keep it short you can edit the vagrantconfig.yaml for any additional customization like changing VM memory, OS, number of CPUs, components (e.g. hadoop, spark, tez, hama, solr) etc and also number of VMs you'd like to provision. This last part is the killer feature, you can provision a Sandbox with multiple nodes, not a single VM. Same is true with Docker provisioner but I can't confirm that for you. Feel free to read the README in bigtop-1.1.0/bigtop-deploy/vm/vagrant-puppet-docker for that approach.

then you can start provisioning your custom sandbox with

vagrant up

wait 5-10min and then you can use standard Vagrant commands to interact with your custom Sandbox.

vagrant ssh bigtop1

now just create your local user and off you go

sudo -u hdfs hdfs dfs -mkdir /user/vagrant
sudo -u hdfs hdfs dfs -chown -R vagrant:hdfs /user/vagrant

for your convenience, add the bigtop machine(s) to /etc/hosts

Now, you're probably wondering why would I use Bigtop over regular sandbox? Well, Sandbox has been getting pretty resource heavy and has a lot of components. I like to provision a small cluster with just a few components like hadoop, spark, yarn and pig. Bigtop makes this possible and runs easily within a memory strapped VM. One downside is that with the latest release, Spark is at 1.5.0 and Hortonworks Sandbox is at 1.6.0, story is the same with other components. There are version gaps and if you can look past it, you have a quick way to prototype without much fuss! This is by no means meant to steal thunder from an excellent Ambari quick start guide, this is meant to demonstrate yet another approach from a rich ecosystem of Hadoop tools.


Popular posts from this blog

Vista Vulnerability Report mentions Ubuntu 6.06 LTS

Running CockroachDB with Docker Compose and Minio, Part 2

Doing "print screen" on a Mac is a pain in the ass