Thursday, February 27, 2014

Efficient uploading of Jar files to your Hadoop cluster

Copying a fat jar up to your Hadoop cluster over and over, to run jobs on production-sized data sets and find bottlenecks, can be painful when you want a quick turnaround whilst debugging.

Sometimes local mode just doesn't cut it.

A good solution is rsync: its delta-transfer algorithm compares the new file with the copy already on the server and sends only the parts that have changed.

Command:
rsync -avz /your_source_directory/somejob-0.0.1.jar login@servername:/target_directory/somejob-0.0.1.jar

Options:
-a archive mode (preserves timestamps, permissions, etc.)
-v verbose mode
-z compress file data during the transfer

Sunday, February 23, 2014

Solution to Eclipse error "Missing artifact jdk.tools:jdk.tools:jar:1.6" caused by the Maven solr-core artifact


In an Eclipse Maven project that references the solr-core artifact (v4.x), Eclipse can report the Maven Dependency Problem "Missing artifact jdk.tools:jdk.tools:jar:1.6".

This also causes a build path problem: The container 'Maven Dependencies' references non existing library 'C:\Users\<userdir>\.m2\repository\jdk\tools\jdk.tools\1.6\jdk.tools-1.6.jar'

Since tools.jar is supplied by the JDK itself, we can exclude it from the solr-core dependency in pom.xml by adding an <exclusions> section like so:

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>4.5.1</version>
    <exclusions>
        <!-- jdk.tools is pulled in transitively; tools.jar comes from the JDK -->
        <exclusion>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
        </exclusion>
    </exclusions>
</dependency>
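To see which dependency drags jdk.tools in, and to confirm that the exclusion has taken effect, you can ask the maven-dependency-plugin for a filtered dependency tree. This is a sketch rather than a fix in its own right: the helper name has_jdk_tools is hypothetical, and it assumes Maven is on your PATH and that you run it from the directory containing pom.xml.

```shell
#!/bin/sh
# has_jdk_tools: hypothetical helper; run it from the directory containing pom.xml.
# Returns 0 if jdk.tools still appears in the resolved dependency tree, 1 if not.
has_jdk_tools() {
  # -Dincludes=jdk.tools filters the printed tree down to the paths that
  # lead to an artifact in the jdk.tools group, so the output also shows
  # which dependency (e.g. solr-core) is dragging it in.
  mvn dependency:tree -Dincludes=jdk.tools | grep -q 'jdk\.tools:jdk\.tools'
}
```

Running has_jdk_tools && echo "jdk.tools still resolved" before and after adding the <exclusions> section should show the change. The raw output of mvn dependency:tree -Dincludes=jdk.tools is also handy on its own for spotting which artifact in the chain to put the exclusion on.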


Does anyone know if there is a better way to resolve this issue?