Posts

Process 100 files evenly across 10 nodes

I was recently asked to write a script that, assuming there are 100 equal-sized files, processes them evenly across 10 nodes. I went in a completely wrong direction, suggesting regex, even though my regex-fu is really not that strong. Thinking it over on a different occasion, I came up with a simple script that does just that. Of course, the simplest solution is the best. As usual, your comments are welcome.

    __author__ = 'artem'

    nodes = {}
    files = []

    # Build 100 zero-padded names: file_000 ... file_099
    for filenum in range(100):
        filename = 'file_%03d' % filenum
        files.append(filename)

    # Give each node the first 10 remaining files, then drop them from the list
    for node in range(10):
        subset = files[0:10]
        nodes[node] = subset
        files[0:10] = []

    for node in nodes:
        print(node, nodes[node])

and the output:

    0 ['file_000', 'file_001', 'file_002', 'file_003', 'file_004', 'file_005', 'file_006', 'file_007', 'file_008', 'file_00...
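The script above works because 100 divides evenly by 10. As a hedged generalization (not the author's code), here is a sketch assuming the file count may not be a multiple of the node count; `distribute` is a hypothetical helper name. It hands out contiguous chunks whose sizes differ by at most one file.

```python
def distribute(files, num_nodes):
    """Split files into num_nodes contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(files), num_nodes)
    assignment = {}
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes each take one additional file.
        size = base + (1 if node < extra else 0)
        assignment[node] = files[start:start + size]
        start += size
    return assignment


files = ['file_%03d' % n for n in range(100)]
nodes = distribute(files, 10)
for node, subset in nodes.items():
    # Print the first and last file of each chunk plus the chunk size.
    print(node, subset[0], '...', subset[-1], len(subset))
```

With 100 files and 10 nodes every chunk has 10 files, exactly as in the original script. If contiguous ranges don't matter, a round-robin split with `files[node::num_nodes]` balances the counts just as well.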

Unfinished reading list

One great thing about subscription services like Safari Books Online is the choice of great books available at a moment's notice. What I find happening over and over is that I start reading a book and never finish it. This post is a list of all the books I started reading but have yet to finish. Here it goes. ActiveMQ in Action: an excellent book, based on the first four chapters I read. I'm not giving up; my priorities just shifted. I intend to finish the book soon. I'm up to chapter 5. Hadoop: The Definitive Guide (2nd edition, 3rd edition, 4th edition): the Hadoop bible, a must-read for any self-respecting Hadoop engineer. I made a mistake reading this book first, jumping from the Microsoft platform to Java and Big Data. I intend to pick it up again when the 4th edition is released. This is a really in-depth book and not recommended for first-time users. Once you have spent considerable time with Hadoop, this is the book to turn to. I read the chapters in no pa...

Book review: Learning Chef

I find myself doing repetitive work once in a while, and I've been thinking for some time about how to automate the redundant parts of my job. I am familiar with Puppet, Ansible and SaltStack, but I had never played with Chef or Ruby before. I started reading "Learning Chef" and was surprised to learn that Ruby is the primary language for working with Chef. Another interesting tidbit is that Chef uses Microsoft PowerShell for automation on Windows. On to the review... The book is pretty easy to read and is intentionally beginner-friendly. Right off the bat, the author points out two more advanced books on Chef, which I appreciated. Kudos to the author for following a developer's approach, referencing Stack Overflow and the Ruby language creator's arguments for better programming practices, and for making the case for a scripting language such as Ruby over a compiled language like Java for automation. The 1st chapter is an introduction to automation principles and ho...

How do you handle multiple versions of Python in your environment?

So I decided to start using more Python and less shell for operations, and I realized that the machine I write code on, be it my Arch Linux VM or my Mac, determines how the script behaves. Meaning, the same script I write on one platform may not work on another. This is very frustrating because the code I write on my Mac expects Python 2.7.8, Arch may well have a different version, and there can be significant changes between minor versions of Python. By the same token, the same script will not work on my Red Hat 6 machine because it has Python 2.6. Really frustrating. I then decided to write only Python 3 code, but to my dismay, I have to jump through hoops to install Python 3 on my Red Hat boxes. There is a Software Collections repository, but I would have to maintain my own mirror of it, and I really want to avoid that. So my question to all of you is: how do you handle a situation like mine? Thanks
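The post leaves the question open. One common mitigation, offered only as a hedged sketch and not as the author's answer, is to have each script check the interpreter version at startup and stick to syntax that Python 2.6, 2.7 and 3 all accept. `MIN_VERSION`, `check_python` and `describe_python` are hypothetical names, and the minimum version is an assumption you would adjust.

```python
from __future__ import print_function  # makes print() a function on 2.6/2.7, as on Python 3

import sys

# Hypothetical minimum for this sketch: the oldest interpreter you must support.
MIN_VERSION = (2, 6)


def check_python(min_version=MIN_VERSION):
    """Return True if the running interpreter is at least min_version."""
    return tuple(sys.version_info[:2]) >= tuple(min_version)


def describe_python():
    """Return 'major.minor.micro' for logging; %-formatting also works on 2.6."""
    return "%d.%d.%d" % tuple(sys.version_info[:3])


if __name__ == "__main__":
    if not check_python():
        sys.stderr.write("Python %d.%d+ required, found %s\n"
                         % (MIN_VERSION + (describe_python(),)))
        sys.exit(1)
    print("Running on Python", describe_python())
```

The guard doesn't make incompatible code work; it only turns a confusing failure on the wrong box into a clear message. The `%`-style formatting is deliberate, since `str.format` with automatic `{}` numbering only arrived in 2.7.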

(Update) Workaround for Oozie's limited HDFS command arsenal

DISCLAIMER: I do not offer any warranty for the code provided below; run it at your own risk!!! UPDATE: Turns out I jumped the gun on the whole Python-script-inside-Oozie thing. It is absolutely possible; it's just that in six hours of trial and error I haven't found a solution yet! Oozie at its best: it drains my life's blood and sanity. Either way, the shell script works, and it has been running fine for me for the last week. The Python script works on its own, but within the confines of Oozie it doesn't know where the executable is. If you can get subprocess.Popen to work there, shoot me a comment; I will greatly appreciate it. I have a love-hate relationship with Oozie. Truthfully, it's more hate than love. Consider this scenario: you need to create time-stamped directories, and there are no built-in expression language functions to just do it out of the box. You're forced to come up with all kinds of hacks to get what you want out of Oozie. What I use...
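The scripts themselves are cut off in this excerpt. As a hypothetical sketch only (not the author's script), here is how a Python version might build a time-stamped path and call `hdfs dfs -mkdir -p` through `subprocess.Popen`, resolving the `hdfs` binary explicitly with `shutil.which` (Python 3.3+) or an absolute path you pass in. Whether this fixes the executable-not-found problem inside an Oozie action is untested; the helper names, the timestamp format and the base path are all assumptions.

```python
import datetime
import shutil
import subprocess
import sys


def timestamped_dir(base, now=None, fmt="%Y%m%d_%H%M%S"):
    """Return base/<timestamp>; the timestamp format is a hypothetical choice."""
    now = now or datetime.datetime.now()
    return "%s/%s" % (base.rstrip("/"), now.strftime(fmt))


def mkdir_command(path, hdfs_bin=None):
    """Build the argv list for `hdfs dfs -mkdir -p <path>`."""
    # Prefer an explicit absolute path; otherwise look 'hdfs' up on PATH,
    # falling back to the bare name if shutil.which finds nothing.
    hdfs_bin = hdfs_bin or shutil.which("hdfs") or "hdfs"
    return [hdfs_bin, "dfs", "-mkdir", "-p", path]


def make_hdfs_dir(path, hdfs_bin=None):
    """Run the command and return its exit code (needs a Hadoop client installed)."""
    proc = subprocess.Popen(mkdir_command(path, hdfs_bin),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    if proc.returncode != 0:
        sys.stderr.write(err.decode("utf-8", "replace"))
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical base path; only prints the command instead of running it.
    target = timestamped_dir("/data/landing")
    print(" ".join(mkdir_command(target)))
```

Passing a full path such as `/usr/bin/hdfs` as `hdfs_bin` sidesteps `PATH` lookup entirely, which is the part that reportedly breaks inside Oozie.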

MobaXterm release 7.3 is out with Shellshock bugfix

A new release of my favorite terminal emulator, MobaXterm, is out with much-needed fixes for the latest security vulnerabilities, as well as some new improvements like Cygwin repository support. I literally just found out, so don't expect an in-depth overview; just go get it. If you want to read the release notes and download the latest version, go here.

Where did all my images go?

I was just browsing through my past blog posts and noticed that all of my most recent blog entries are missing images. Did Google have an "oops" moment? I will try to locate the images and repost them, but most are unfortunately lost. Lesson learned: don't use Blogger as my documentation repository... Bummer!