---
title: Clojure+Leiningenで楽々Hadoop!
tags: []
categories: ["Programming", "Lisp", "Clojure", "Leiningen"]
date: 2010-02-21T18:21:30Z
updated: 2010-03-19T19:54:16Z
---

<p>
Hadoopの簡単なサンプルをClojureで書いてみました。leiningenでビルドするとらくちんでした！
</p>
<h3>動作環境</h3>
<ul>
<li>clojure 1.1.0</li>
<li>hadoop-core 0.20.1</li>
<li><a href="http://github.com/technomancy/leiningen/tarball/1.0.1">leiningen 1.0.1</a><s> (最新の1.1.0だとNG！)</s>→インストール方法は<a href="http://blog.ik.am/entry/view/id/13/title/Leiningen%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB/">こちら</a>。</li>
</ul>
<p>leiningen 1.1.0対応しました(<strong>2010/03/10</strong>)</p>
<p>
サンプルコードはオライリーのHadoop本、例2-6。Java版は<a href="http://github.com/tomwhite/hadoop-book/raw/master/src/main/ch02/java/NewMaxTemperature.java">こちら</a>。
</p>
<h3>project.clj</h3>
<pre class="prettyprint">
(defproject clj-max-temperature "0.1.0-SNAPSHOT" 
  :description "Sample of Hadoop on Clojure" 
  :dependencies [[org.clojure/clojure "1.1.0"]
                 [org.clojure/clojure-contrib "1.1.0"]
                 [org.apache.mahout.hadoop/hadoop-core "0.20.1"]
                 ;; ここから先は全部必要か分からない
                 [commons-cli/commons-cli "1.2"]
                 [commons-codec/commons-codec "1.3"]
                 [commons-el/commons-el "1.0"]
                 [commons-httpclient/commons-httpclient "3.0.1"]
                 [commons-logging/commons-logging "1.0.4"]
                 [commons-logging/commons-logging-api "1.0.4"]
                 [commons-net/commons-net "1.4.1"]]
  :dev-dependencies [[leiningen/lein-swank "1.1.0"]]
  :main clj_max_temperature)
</pre>
<h3>clj_max_temperature.clj</h3>
<pre class="prettyprint">
(ns clj_max_temperature
  (:gen-class)
  (:import (org.apache.hadoop.fs Path)
           (org.apache.hadoop.io IntWritable LongWritable Text)
           (org.apache.hadoop.mapreduce Job Mapper Mapper$Context Reducer Reducer$Context)
           (org.apache.hadoop.mapreduce.lib.input FileInputFormat)
           (org.apache.hadoop.mapreduce.lib.output FileOutputFormat)))

(gen-class 
 :name clj_max_temperature.mapper
 :extends org.apache.hadoop.mapreduce.Mapper
 :prefix "mapper-")

(defn mapper-map [this key value #^Mapper$Context context]
  (let [line (str value)
        year (.substring line 15 19)
        quality (.substring line 92 93)
        air-temperature (Integer/valueOf 
                         (.substring line 
                                     (if (= (.charAt line 87) \+) 88 87)
                                     92))]
    (if (and (not (= air-temperature 9999))
             (.matches quality "[01459]"))
      (.write context (Text. year) (IntWritable. air-temperature)))))

(gen-class
 :name clj_max_temperature.reducer
 :extends org.apache.hadoop.mapreduce.Reducer
 :prefix "reducer-")

(defn reducer-reduce [this key #^Iterable values #^Reducer$Context context]  
  (.write context key (IntWritable. (reduce max (map #(.get %) values)))))

(defn -main [& args]
  (when (not (= (count args) 2))
    (.println System/err "args error!")
    (System/exit -1))
  (let [job (Job.)]
    (FileInputFormat/addInputPath job (Path. (nth args 0)))    
    (FileOutputFormat/setOutputPath job (Path. (nth args 1)))
    (doto job      
      ;; リテラルで現在のクラスの取り方が分からない。。。
      (.setJarByClass (Class/forName "clj_max_temperature"))
      (.setMapperClass (Class/forName "clj_max_temperature.mapper"))
      (.setReducerClass (Class/forName "clj_max_temperature.reducer"))
      (.setOutputKeyClass Text)
      (.setOutputValueClass IntWritable))
    (System/exit (if (.waitForCompletion job true) 0 1))))
</pre>
<p>Reducerの中でmap/reduce使ってるのがいけてるでしょ！</p>
<h3>サンプルコードDL</h3>
<p>プロジェクトは<a href="http://github.com/making/clj-max-temperature">こちら</a></p>
<pre class="prettyprint">
$ git clone git://github.com/making/clj-max-temperature.git
$ cd clj-max-temperature
</pre>
<h3>コンパイル＆実行</h3>
<pre class="prettyprint">
$ lein compile # uberjarだけでcompileも行われるが、コンパイルエラーが発生してもjar作成に進むので開発時は別に実行した方が良い
$ lein uberjar
$ hadoop jar clj-max-temperature-standalone.jar input/sample.txt output
</pre>
<p>
でおｋ。(スタンドアローンモードでのみ確認)<br />
</p>
<p>
スタンドアローンなら、Hadoopをインストールしなくても
</p>
<pre class="prettyprint">
$ java -Xmx1000m -jar clj-max-temperature-standalone.jar input/sample.txt output
</pre>
<p>
だけでもおｋ。簡単な動作確認に良さそう。<br />
</p>
<p>
もっとMapReduceを使いやすくなるマクロをつくりたいね。<br />
<a href="http://github.com/stuartsierra/clojure-hadoop">こんなの</a>ももうあるけど。
</p>
