Showing posts from October, 2015

running HCatalog commands within Pig scripts in Oozie

Apache Pig introduced a new option for executing HCatalog commands inside Pig scripts and the Grunt shell. For details on getting started with it, take a look at my previous post. When you try to execute a Pig script containing an HCatalog command in Oozie, you will still get an error, regardless of whether you changed the file as described in the previous article. The error is below:

ERROR 2997: Encountered IOException. hcat.bin is not defined. Define it to be your hcat script (Usually $HCAT_HOME/bin/hcat
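To make the failure mode concrete, here is a hypothetical diagnostic helper (not part of Pig or Oozie) that mirrors the two checks implied by the error messages in these posts: whether hcat.bin is defined at all, and whether the path it points to exists. It assumes hcat.bin is configured as a key=value line in a Pig properties file whose path you pass in; the function name check_hcat_bin is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of Pig or Oozie).
# Assumption: hcat.bin is set as a "hcat.bin=<path>" line in a Pig
# properties file, whose path is passed as the first argument.
check_hcat_bin() {
  local props_file="$1"
  local hcat_bin
  # Take the last hcat.bin entry (if any), strip the key and trim whitespace.
  hcat_bin=$(grep -E '^[[:space:]]*hcat\.bin[[:space:]]*=' "$props_file" 2>/dev/null \
    | tail -n 1 \
    | sed -e 's/^[^=]*=//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  if [ -z "$hcat_bin" ]; then
    # Corresponds to: "hcat.bin is not defined. Define it to be your hcat script ..."
    echo "hcat.bin is not defined"
    return 1
  fi
  if [ ! -e "$hcat_bin" ]; then
    # Corresponds to: "<path> does not exist. Please check your 'hcat.bin' setting ..."
    echo "$hcat_bin does not exist"
    return 2
  fi
  echo "hcat.bin=$hcat_bin"
}

# Example (the properties file location is an assumption; adjust to your install):
# check_hcat_bin /etc/pig/conf/pig.properties
```

The helper only reads a file and prints what it finds, so it is safe to run on a gateway node; where Pig actually picks up hcat.bin from (properties file, -D option, or the Oozie action configuration) depends on how the job is launched.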

Adding WASB blob as HDFS replacement in Hortonworks HDP 2.3.2

DISCLAIMER: this was tested on HDP 2.3.2 only. There are two blocking JIRAs preventing the use of blob storage as the primary filesystem on HDP 2.3.0. For HBase, you need to use a page blob instead of a block blob.

First things first, install the Azure CLI for Mac or use the Azure portal. The steps below are for the CLI.

    azure login
    # enter username and password when prompted
    azure storage account create storageaccountname --type LRS
    azure storage account keys list storageaccountname
    # note the account keys, you will need them in the next step
    azure storage container create storagecontainername --account-name storageaccountname --account-key accountkeystring
    # just to validate that the container was created
    azure storage blob list storagecontainername --account-name storageaccountname --account-key

Once the previous steps have been completed, go to the Ambari UI and edit core-site.xml. In addition to these properties, you need to replace the fs.defaultFS property with the wasb path.
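The wasb path for fs.defaultFS is built from the container and account names created above. Here is a minimal sketch that assembles it, assuming the default public Azure endpoint (blob.core.windows.net) and the standard hadoop-azure property name for the account key; the helper functions wasb_uri and account_key_property are hypothetical, so compare the output against what your Ambari version expects before saving.

```shell
#!/usr/bin/env bash
# Sketch: derive the values to enter in Ambari for core-site.xml.
# Assumptions: default public-cloud endpoint (blob.core.windows.net) and the
# placeholder account/container names used in the CLI steps.
ACCOUNT="storageaccountname"
CONTAINER="storagecontainername"

# WASB URI form: wasb://<container>@<account>.blob.core.windows.net
wasb_uri() {
  printf 'wasb://%s@%s.blob.core.windows.net\n' "$1" "$2"
}

# Property name hadoop-azure reads the storage account key from.
account_key_property() {
  printf 'fs.azure.account.key.%s.blob.core.windows.net\n' "$1"
}

echo "fs.defaultFS = $(wasb_uri "$CONTAINER" "$ACCOUNT")"
echo "$(account_key_property "$ACCOUNT") = <key from 'azure storage account keys list'>"
```

Running it prints the two name/value pairs to paste into the core-site section in Ambari; the account key itself is deliberately left as a placeholder.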

fix for error "hcat.bin is not defined. Define it to be your hcat script" on HDP 2.3.0 and 2.3.2

If you're running HDP 2.3.0 or 2.3.2 and you're eager to try calling HCatalog commands in your Pig scripts, there is a gotcha you need to be aware of. Apache Pig recently introduced an option for calling HCatalog and Hive commands within Pig. For example, assume we have a file called file.pig, a regular Pig script that contains the following statement:

sql show tables;

This will actually work in the Sandbox and display the existing tables. You can follow it with your typical Pig commands. However, if your vanilla cluster or Sandbox is not modified with the changes below, you will get the following error:

Pig Stack Trace
---------------
ERROR 2997: Encountered IOException. /usr/local/hcat/bin/hcat does not exist. Please check your 'hcat.bin' setting in
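The file.pig example can be reproduced with a short shell sketch. It assumes the pig client is on PATH (the script only runs pig if it finds it) and that hcat.bin already points at a valid hcat script; otherwise you will hit the error shown above.

```shell
#!/usr/bin/env bash
# Sketch: write the file.pig example and run it with the pig client.
# Assumption: pig is on PATH and hcat.bin is configured for this cluster.
cat > file.pig <<'EOF'
-- HCatalog/Hive command executed from within Pig
sql show tables;
-- regular Pig statements can follow here
EOF

if command -v pig >/dev/null 2>&1; then
  pig file.pig
else
  echo "pig client not found on PATH; file.pig was written but not run"
fi
```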