Thursday, February 27, 2014

Efficient uploading of Jar files to your Hadoop cluster

Copying fat Jar files to your Hadoop cluster so you can run jobs against production-sized data sets and find bottlenecks can be painful when you want a quick turnaround while debugging.

Sometimes local mode just doesn't cut it.

A good solution is rsync, whose delta-transfer algorithm sends only the parts of a file that differ from the copy already on the server.

Command:
rsync -avz /your_source_directory/somejob-0.0.1.jar login@servername:/target_directory/somejob-0.0.1.jar

Options:
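-a archive mode (recursive; preserves permissions, timestamps and symlinks)
-v verbose mode
-z compress file data during the transfer (Jar files are already zip-compressed, so the gain may be small)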
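
If you run this upload many times a day, the command above could be wrapped in a small shell function. This is only a minimal sketch: the names upload_jar and DRY_RUN are hypothetical and not part of rsync or Hadoop, and the host and paths are the same placeholders used above. With DRY_RUN=1 it prints the rsync command instead of running it, which is handy for checking the target path first.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync or Hadoop): upload one Jar with rsync.
# Set DRY_RUN=1 to print the command instead of executing it.
upload_jar() {
    if [ "$#" -ne 3 ]; then
        echo "usage: upload_jar JAR LOGIN@HOST TARGET_DIR" >&2
        return 2
    fi

    jar_path="$1"     # local Jar, e.g. /your_source_directory/somejob-0.0.1.jar
    remote="$2"       # placeholder such as login@servername
    target_dir="$3"   # placeholder such as /target_directory

    # Keep the same file name on the remote side so that rsync can
    # compare the new Jar against the previously uploaded copy.
    jar_name=$(basename "$jar_path")

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "rsync -avz $jar_path $remote:$target_dir/$jar_name"
        return 0
    fi

    rsync -avz "$jar_path" "$remote:$target_dir/$jar_name"
}
```

Usage, assuming the function has been sourced into your shell:
upload_jar /your_source_directory/somejob-0.0.1.jar login@servername /target_directory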
