Identify failing machines and missing clients for Oozie workflows
If your Oozie workflow succeeds on most nodes but fails on others, a neat trick for identifying a missing client is to put the following in your shell action script:
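# Print who runs the action, on which host, and from which directory
echo "I AM: $(whoami)"
echo "Running On: $(hostname)"
echo "CWD: $(pwd)"
# An empty value after "=" typically means that client is not on the PATH
echo "Can I see these clients? Hive = $(which hive), Sqoop = $(which sqoop)"
# List the first entries of the Sqoop client directory; ls reports an error if it is missing
ls -l /usr/hdp/current/sqoop-client/ | head -n 5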
In my case, I wanted to know which user was executing the shell action and on which machine. The user turned out to be “yarn”, and the hostname command identified the node the action ran on. From there, it was an easy fix: I installed the missing client on that host and the workflow worked again. The script checks whether the Sqoop and Hive clients are installed and whether the /usr/hdp/current/sqoop-client directory exists. Feel free to add other commands to fit your use case.
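If you want the output to call out a missing client explicitly, the same idea can be wrapped in a small function. This is only a sketch: the function name check_clients is hypothetical, it uses the POSIX command -v builtin rather than which, and it assumes the clients you care about are named hive and sqoop on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named client is on the PATH.
# Returns 0 if all clients were found, 1 if at least one is missing.
check_clients() {
    missing=0
    for client in "$@"; do
        if path=$(command -v "$client" 2>/dev/null); then
            echo "FOUND: $client -> $path"
        else
            echo "MISSING: $client on $(hostname)"
            missing=1
        fi
    done
    return "$missing"
}

# Same context as the original script: user, host, working directory
echo "User: $(whoami), host: $(hostname), cwd: $(pwd)"

# Assumed client names; adjust the list to fit your workflow
check_clients hive sqoop || echo "At least one client is missing on this host"
```

Searching the action's output for MISSING should then point straight at the host that needs the client installed.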
I have an example shell action on my GitHub.