Hadoop Installation and Setup

Installing Java

[ec2-user@ip-172-31-44-80 ~]$ java -version
java version "1.7.0_171"
OpenJDK Runtime Environment (amzn-2.6.13.0.76.amzn1-x86_64 u171-b01)
OpenJDK 64-Bit Server VM (build 24.171-b01, mixed mode)

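Creating Unix User Accounts

It is good practice to create dedicated Unix user accounts so that the Hadoop processes (HDFS, MapReduce, and YARN) run separately from one another.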

[ec2-user@ip-172-31-44-80 ~]$ sudo su
[root@ip-172-31-44-80 ec2-user]# groupadd hadoop
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop hadoop
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop hdfs
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop mapred
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop yarn
[root@ip-172-31-44-80 ec2-user]# ls -al /home/
total 28
drwxr-xr-x  7 root     root     4096 Apr  7 06:55 .
dr-xr-xr-x 25 root     root     4096 Apr  7 06:42 ..
drwx------  3 ec2-user ec2-user 4096 Apr  7 06:42 ec2-user
drwx------  2 hadoop   hadoop   4096 Apr  7 06:55 hadoop
drwx------  2 hdfs     hadoop   4096 Apr  7 06:48 hdfs
drwx------  2 mapred   hadoop   4096 Apr  7 06:48 mapred
drwx------  2 yarn     hadoop   4096 Apr  7 06:48 yarn
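
The account setup above can also be scripted. The following is a minimal sketch, assuming the same group and user names as in this walkthrough. The run helper and the DRY_RUN variable are hypothetical names introduced here, and by default the script only prints the commands it would run.

```shell
# Sketch: create the hadoop group and the per-service users if they are missing.
# DRY_RUN and run() are illustrative helpers, not part of Hadoop.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them (needs root)

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_hadoop_accounts() {
  local user
  # Create the shared group only when it does not exist yet.
  getent group hadoop >/dev/null 2>&1 || run groupadd hadoop
  for user in hadoop hdfs mapred yarn; do
    # id -u fails for unknown users, so existing accounts are skipped.
    id -u "$user" >/dev/null 2>&1 || run useradd -g hadoop "$user"
  done
}
```

Calling create_hadoop_accounts prints the pending commands; running it as root with DRY_RUN=0 executes them.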

Setting Passwords

[ec2-user@ip-172-31-44-80 local]$ sudo passwd hadoop
Changing password for user hadoop.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd hdfs
Changing password for user hdfs.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd mapred
Changing password for user mapred.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd yarn
Changing password for user yarn.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.

Granting sudo Privileges to the New Users

[ec2-user@ip-172-31-44-80 local]$ sudo visudo
[ec2-user@ip-172-31-44-80 local]$ sudo groupadd sudo
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo hadoop
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo hdfs
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo mapred
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo yarn

When running visudo, add the following lines:

hadoop  ALL=(ALL)       ALL
hdfs    ALL=(ALL)       ALL
mapred  ALL=(ALL)       ALL
yarn    ALL=(ALL)       ALL
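
As an alternative to editing the main sudoers file by hand, the same entries can be generated and syntax-checked before they are installed. This is a sketch under the assumption that the system's sudoers file includes /etc/sudoers.d (check for an includedir line first). The function name sudoers_entries and the file name /etc/sudoers.d/hadoop are made up for illustration.

```shell
# Print one sudoers entry per user name given as an argument.
sudoers_entries() {
  local user
  for user in "$@"; do
    printf '%s\tALL=(ALL)\tALL\n' "$user"
  done
}

# Usage (as root); visudo -cf only checks the syntax of the given file:
#   sudoers_entries hadoop hdfs mapred yarn > /tmp/hadoop-sudoers
#   visudo -cf /tmp/hadoop-sudoers && install -m 0440 /tmp/hadoop-sudoers /etc/sudoers.d/hadoop
```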

Installing Hadoop

[ec2-user@ip-172-31-44-80 ~]$ cd /usr/local
[ec2-user@ip-172-31-44-80 local]$ sudo wget http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
--2018-04-07 07:00:46--  http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
Resolving ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)... 203.178.132.80, 2001:200:0:7c06::9393
Connecting to ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)|203.178.132.80|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244469481 (233M) [application/x-gzip]
Saving to: ‘hadoop-2.8.3.tar.gz.1’

hadoop-2.8.3.tar.gz.1                              100%[==============================================================================================================>] 233.14M  11.0MB/s    in 31s     

2018-04-07 07:01:17 (7.48 MB/s) - ‘hadoop-2.8.3.tar.gz.1’ saved [244469481/244469481]
[ec2-user@ip-172-31-44-80 local]$ sudo tar xzf hadoop-2.8.3.tar.gz
[ec2-user@ip-172-31-44-80 local]$ sudo chown -R hadoop:hadoop hadoop-2.8.3
[ec2-user@ip-172-31-44-80 ~]$ sudo vim /etc/bashrc

Add the following to /etc/bashrc:

export HADOOP_HOME=/usr/local/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
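
If the setup is scripted rather than edited in vim, the two lines can be appended idempotently. The helper below is a sketch: append_once is a hypothetical name, and it assumes the target file is writable by the caller (root for /etc/bashrc).

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root); single quotes keep $PATH and $HADOOP_HOME unexpanded in the file:
#   append_once 'export HADOOP_HOME=/usr/local/hadoop-2.8.3' /etc/bashrc
#   append_once 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' /etc/bashrc
```

Running the same calls a second time leaves the file unchanged, so the script is safe to repeat.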

Loading the Changes and Verifying the Hadoop Installation

[ec2-user@ip-172-31-44-80 ~]$ . ~/.bashrc
[ec2-user@ip-172-31-44-80 local]$ hadoop version
Hadoop 2.8.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65cb1d1436a2
Compiled by jdu on 2017-12-05T03:43Z
Compiled with protoc 2.5.0
From source with checksum 9ff4856d824e983fa510d3f843e3f19d
This command was run using /usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3.jar

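SSH Configuration

The hdfs and yarn users must be able to log in without a password from the machines in the cluster. When generating the SSH key, enter a passphrase such as Test1234 (ssh-keygen rejects passphrases shorter than five characters).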

[ec2-user@ip-172-31-44-80 local]$ su hdfs
Password: 
[hdfs@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hdfs/.ssh/id_rsa.
Your public key has been saved in /home/hdfs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:b0iKwqpe1Vl6ziDjn1KC0M1HSnfKODL1od1AuI/7lAU hdfs@ip-172-31-44-80
The key's randomart image is:
+---[RSA 2048]----+
|      o.         |
|     + = .       |
|  . = @E*.       |
| . + O.*=.       |
|  . ++==So       |
| . .o+o=Bo       |
|  o...=o.oo      |
| ... oo ..       |
|=.    o+         |
+----[SHA256]-----+
[hdfs@ip-172-31-44-80 local]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hdfs@ip-172-31-44-80 local]$ exit
exit
[yarn@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Saving key "/home/yarn/.ssh/id_rsa" failed: passphrase is too short (minimum five characters)
[yarn@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/yarn/.ssh/id_rsa.
Your public key has been saved in /home/yarn/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EgaMuDWE64ANH3u4mE6bXY5H8Sk699ah0x+Trg3FJQo yarn@ip-172-31-44-80
The key's randomart image is:
+---[RSA 2048]----+
| +.o.            |
|+ = ..           |
|.B =  oE   . .   |
|= = .o .. o o    |
|oo o  + S. o     |
|oo.  + + .. .    |
|o + * . +..+     |
| + = + + o+ o    |
|    + o...o+     |
+----[SHA256]-----+
[yarn@ip-172-31-44-80 local]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
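
If key-based login is refused after this step, file permissions are a common cause: with its default StrictModes setting, sshd ignores an authorized_keys file that is writable by group or others. The following sketch tightens the permissions and then tests the login; fix_ssh_perms is a hypothetical helper name.

```shell
# Restrict the .ssh directory and authorized_keys to the owning user.
fix_ssh_perms() {
  local ssh_dir="${1:-$HOME/.ssh}"
  chmod 700 "$ssh_dir"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage, as the hdfs or yarn user:
#   fix_ssh_perms
#   ssh -o BatchMode=yes localhost true && echo "login OK"
# BatchMode=yes makes ssh fail instead of prompting, so the check can be scripted.
# It also fails when the key's passphrase is not loaded in ssh-agent or the
# host key has not been accepted yet.
```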

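Use ssh-agent to hold the keys so that SSH logins can proceed without retyping the passphrase. The agent is only available in the shell session where eval `ssh-agent` was run.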

[ec2-user@ip-172-31-44-80 local]$ su hdfs
Password: 
[hdfs@ip-172-31-44-80 local]$ eval `ssh-agent`
Agent pid 23016
[hdfs@ip-172-31-44-80 local]$ ssh-add ~/.ssh/id_rsa
Enter passphrase for /home/hdfs/.ssh/id_rsa: 
Identity added: /home/hdfs/.ssh/id_rsa (/home/hdfs/.ssh/id_rsa)
[hdfs@ip-172-31-44-80 local]$ exit
exit
[ec2-user@ip-172-31-44-80 local]$ su yarn
Password: 
[yarn@ip-172-31-44-80 local]$ eval `ssh-agent`
Agent pid 23037
[yarn@ip-172-31-44-80 local]$ ssh-add ~/.ssh/id_rsa
Enter passphrase for /home/yarn/.ssh/id_rsa: 
Identity added: /home/yarn/.ssh/id_rsa (/home/yarn/.ssh/id_rsa)

Hadoop Configuration

[hdfs@ip-172-31-44-80 ~]$ cd $HADOOP_HOME/sbin
[hdfs@ip-172-31-44-80 sbin]$ ls
distribute-exclude.sh  hdfs-config.cmd  kms.sh                   slaves.sh      start-balancer.sh  start-secure-dns.sh  stop-all.cmd      stop-dfs.cmd        stop-yarn.cmd   yarn-daemons.sh
hadoop-daemon.sh       hdfs-config.sh   mr-jobhistory-daemon.sh  start-all.cmd  start-dfs.cmd      start-yarn.cmd       stop-all.sh       stop-dfs.sh         stop-yarn.sh
hadoop-daemons.sh      httpfs.sh        refresh-namenodes.sh     start-all.sh   start-dfs.sh       start-yarn.sh        stop-balancer.sh  stop-secure-dns.sh  yarn-daemon.sh

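Formatting the HDFS Filesystem

A new HDFS installation must be formatted before it can be used.
The namenode manages all of the filesystem's metadata, and datanodes can join or leave the cluster dynamically, so datanodes are not involved in the formatting process.
The size of the filesystem is determined by the number of datanodes in the cluster, so there is no need to specify it when formatting.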

[ec2-user@ip-172-31-44-80 ~]$ su hdfs
Password: 
[hdfs@ip-172-31-44-80 ec2-user]$ hdfs namenode -format
18/04/07 08:03:50 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = ip-172-31-44-80.us-west-2.compute.internal/172.31.44.80
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.8.3
STARTUP_MSG:   classpath = /usr/local/hadoop-2.8.3/etc/hadoop:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jsch-0.1.54.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/hadoop-annotations-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/nimbus-jose-jwt-3.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/httpclient-4.5.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/json-smart-1.1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib
/curator-client-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jetty-sslengine-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/hadoop-auth-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/netty-3.6.2.Fi
nal.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/httpcore-4.4.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jcip-annotations-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-nfs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/okio-1.4.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/hadoop-hdfs-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/okhttp-2.4.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib
/jsr305-3.0.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/htrace-core4-4.0.1-incubating.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-native-client-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-native-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-nfs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-client-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/le
veldbjni-all-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/curator-test-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/javassist-3.18.1-GA.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/curator-client-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/json-io-2.5.1.jar:/usr/local/hadoop-2.8.3/sha
re/hadoop/yarn/lib/fst-2.50.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-math-2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/java-util-1.9.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-registry-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-tests-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-api-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/hadoop-annotations-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapre
duce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.8.3.jar:/usr/local/hadoop-2.8.3/contrib/ca
pacity-scheduler/*.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65cb1d1436a2; compiled by 'jdu' on 2017-12-05T03:43Z
STARTUP_MSG:   java = 1.7.0_171
************************************************************/
18/04/07 08:03:50 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/04/07 08:03:50 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-e3ff70d9-024d-4a8c-b199-7a39526f4ee6
18/04/07 08:03:51 INFO namenode.FSEditLog: Edit logging is async:true
18/04/07 08:03:51 INFO namenode.FSNamesystem: KeyProvider: null
18/04/07 08:03:51 INFO namenode.FSNamesystem: fsLock is fair: true
18/04/07 08:03:51 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/04/07 08:03:51 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/04/07 08:03:51 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/04/07 08:03:51 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/04/07 08:03:51 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Apr 07 08:03:51
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map BlocksMap
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^21 = 2097152 entries
18/04/07 08:03:51 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/04/07 08:03:51 INFO blockmanagement.BlockManager: defaultReplication         = 3
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxReplication             = 512
18/04/07 08:03:51 INFO blockmanagement.BlockManager: minReplication             = 1
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
18/04/07 08:03:51 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/04/07 08:03:51 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
18/04/07 08:03:51 INFO namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
18/04/07 08:03:51 INFO namenode.FSNamesystem: supergroup          = supergroup
18/04/07 08:03:51 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/04/07 08:03:51 INFO namenode.FSNamesystem: HA Enabled: false
18/04/07 08:03:51 INFO namenode.FSNamesystem: Append Enabled: true
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map INodeMap
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^20 = 1048576 entries
18/04/07 08:03:51 INFO namenode.FSDirectory: ACLs enabled? false
18/04/07 08:03:51 INFO namenode.FSDirectory: XAttrs enabled? true
18/04/07 08:03:51 INFO namenode.NameNode: Caching file names occurring more than 10 times
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map cachedBlocks
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^18 = 262144 entries
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/04/07 08:03:51 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/04/07 08:03:51 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^15 = 32768 entries
18/04/07 08:03:51 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1797533508-172.31.44.80-1523088231605
18/04/07 08:03:51 INFO common.Storage: Storage directory /tmp/hadoop-hdfs/dfs/name has been successfully formatted.
18/04/07 08:03:51 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/04/07 08:03:51 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/04/07 08:03:51 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/04/07 08:03:51 INFO util.ExitUtil: Exiting with status 0
18/04/07 08:03:51 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-172-31-44-80.us-west-2.compute.internal/172.31.44.80
************************************************************/
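
Formatting is a one-time step: running it again on an existing installation would discard the namenode metadata. The log above shows the storage directory /tmp/hadoop-hdfs/dfs/name, and a formatted directory contains a current/VERSION file, so a script can guard the format command as sketched below. The function names are hypothetical. The -nonInteractive option makes the format command abort instead of prompting when the directory already exists.

```shell
# Return success if the given namenode storage directory has been formatted.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Format only when no formatted storage directory is present.
format_namenode_once() {
  local name_dir="${1:-/tmp/hadoop-hdfs/dfs/name}"   # default taken from the log above
  if namenode_is_formatted "$name_dir"; then
    echo "already formatted: $name_dir"
  else
    hdfs namenode -format -nonInteractive
  fi
}
```

Run format_namenode_once as the hdfs user; pass a different directory if the storage location has been changed from the default.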

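Starting and Stopping the Daemons

Start the HDFS daemons with start-dfs.sh

  • The script starts a namenode on each machine returned by running hdfs getconf -namenodes.
  • It starts a datanode on each machine listed in the slaves file.
  • It starts a secondary namenode on each machine returned by running hdfs getconf -secondaryNameNodes.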
[ec2-user@ip-172-31-44-80 sbin]$ su hdfs
[hdfs@ip-172-31-44-80 sbin]$ sudo mkdir /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ sudo chmod 775 /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ sudo chown -Rf hadoop:hadoop /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ ls -al /usr/local/hadoop-2.8.3
total 160
drwxrwxr-x 10 hadoop hadoop  4096 Apr  7 08:36 .
drwxr-xr-x 13 root   root    4096 Apr  7 07:04 ..
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 bin
drwxr-xr-x  3 hadoop hadoop  4096 Dec  5 04:28 etc
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 include
drwxr-xr-x  3 hadoop hadoop  4096 Dec  5 04:28 lib
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 libexec
-rw-r--r--  1 hadoop hadoop 99253 Dec  5 04:28 LICENSE.txt
drwxrwxr-x  2 hadoop hadoop  4096 Apr  7 08:36 logs
-rw-r--r--  1 hadoop hadoop 15915 Dec  5 04:28 NOTICE.txt
-rw-r--r--  1 hadoop hadoop  1366 Dec  5 04:28 README.txt
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 sbin
drwxr-xr-x  4 hadoop hadoop  4096 Dec  5 04:28 share

Editing core-site.xml

[hdfs@ip-172-31-44-80 ec2-user]$ sudo vim /usr/local/hadoop-2.8.3/etc/hadoop/core-site.xml

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
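
For a scripted setup, the same file can be generated rather than edited by hand. This sketch writes only the single property shown above and overwrites whatever is at the target path. write_core_site is a hypothetical helper, and the path and URI are supplied by the caller.

```shell
# Write a minimal core-site.xml that sets fs.defaultFS.
write_core_site() {
  local path="$1" default_fs="$2"
  cat > "$path" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${default_fs}</value>
  </property>
</configuration>
EOF
}

# Usage (needs write access to the configuration directory):
#   write_core_site "$HADOOP_HOME/etc/hadoop/core-site.xml" hdfs://localhost:8020
#   hdfs getconf -confKey fs.defaultFS   # prints the value Hadoop actually resolves
```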

Running start-dfs.sh

[hdfs@ip-172-31-44-80 ec2-user]$ start-dfs.sh
Starting namenodes on [localhost]
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
localhost: starting namenode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-namenode-ip-172-31-44-80.out
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
localhost: starting datanode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-datanode-ip-172-31-44-80.out
Starting secondary namenodes [0.0.0.0]
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-secondarynamenode-ip-172-31-44-80.out

Getting the Namenode Information

[hdfs@ip-172-31-44-80 sbin]$ hdfs getconf -namenodes
localhost

Getting the Secondary Namenode Information

[hdfs@ip-172-31-44-80 sbin]$ hdfs getconf -secondaryNameNodes
0.0.0.0

The slaves file is shown below.

[hdfs@ip-172-31-44-80 hadoop-2.8.3]$ pwd
/usr/local/hadoop-2.8.3
[hdfs@ip-172-31-44-80 hadoop-2.8.3]$ cat etc/hadoop/slaves 
localhost
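
Because the start scripts work through this file host by host, it can be useful to confirm that every listed host is reachable over SSH before starting the daemons. The following is a sketch, assuming one host name per line with optional # comments; list_slaves is a hypothetical helper.

```shell
# Print the host names in a slaves file, ignoring comments and blank lines.
list_slaves() {
  sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Usage: check SSH reachability of each host without prompting.
#   for host in $(list_slaves "$HADOOP_HOME/etc/hadoop/slaves"); do
#     ssh -o BatchMode=yes "$host" true || echo "cannot reach $host"
#   done
```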

Start the YARN daemons with start-yarn.sh

  • The script starts the resource manager on the local machine.
  • It starts a node manager on each machine listed in the slaves file.
[hdfs@ip-172-31-44-80 sbin]$ su yarn
Password: 
[yarn@ip-172-31-44-80 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.8.3/logs/yarn-yarn-resourcemanager-ip-172-31-44-80.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JKwwVSwxYDwyPu0fSeyRd7+/TEDDw9JZxSQQSMjhCr8.
ECDSA key fingerprint is MD5:68:09:14:01:ae:9f:14:5f:ec:79:bd:f9:c8:93:9e:ce.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Enter passphrase for key '/home/yarn/.ssh/id_rsa': 
localhost: starting nodemanager, logging to /usr/local/hadoop-2.8.3/logs/yarn-yarn-nodemanager-ip-172-31-44-80.out

Start the job history server, the MapReduce daemon

[yarn@ip-172-31-44-80 sbin]$ su mapred
Password: 
[mapred@ip-172-31-44-80 sbin]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.8.3/logs/mapred-mapred-historyserver-ip-172-31-44-80.out
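
With all three start steps done, six daemons should be running in this single-node setup: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and JobHistoryServer. The JDK's jps tool lists running JVMs by main class name, so its output can be checked against that list. This is a sketch: missing_daemons is a hypothetical helper, and it assumes jps is installed (it ships with the JDK rather than the JRE). jps normally shows only the JVMs the calling user may inspect, which is why the usage line runs it through sudo.

```shell
# Read jps output ("<pid> <ClassName>" per line) on stdin and print
# the expected Hadoop daemons that do not appear in it.
missing_daemons() {
  local jps_output daemon
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
    printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon" || echo "$daemon"
  done
}

# Usage:
#   sudo jps | missing_daemons
```

An empty result means every expected daemon was found; otherwise the matching log file under /usr/local/hadoop-2.8.3/logs is the place to look.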

Once the Hadoop cluster is up and running, grant users access to it

[ec2-user@ip-172-31-44-80 ~]$ sudo su
[root@ip-172-31-44-80 ec2-user]# hadoop fs -mkdir -p /user/ec2-user
[root@ip-172-31-44-80 ec2-user]# hadoop fs -chown ec2-user:ec2-user /user/ec2-user/

It is also a good idea to set a space quota on the user's directory. Command: hdfs dfsadmin -setSpaceQuota 1t /user/ec2-user/
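
The 1t argument uses a binary suffix, so it stands for 2^40 bytes. The space quota is charged for every replica of a block, so with the default replication factor of 3 reported in the format log, roughly a third of that amount is available for file data. The arithmetic can be sketched as follows; both function names are hypothetical.

```shell
# Convert a quota string with an optional binary suffix (k, m, g, t) to bytes.
quota_to_bytes() {
  local n="${1%[kmgtKMGT]}" suffix="${1##*[0-9]}"
  case "$suffix" in
    "")  echo "$n" ;;
    k|K) echo $(( n * 1024 )) ;;
    m|M) echo $(( n * 1024 * 1024 )) ;;
    g|G) echo $(( n * 1024 * 1024 * 1024 )) ;;
    t|T) echo $(( n * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   return 1 ;;
  esac
}

# Approximate bytes of file data that fit under a quota at a given replication factor.
usable_bytes() {
  echo $(( $(quota_to_bytes "$1") / $2 ))
}

# Usage:
#   quota_to_bytes 1t
#   usable_bytes 1t 3
#   hadoop fs -count -q /user/ec2-user   # shows the quota and the remaining quota
```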

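Running Benchmarks

Hadoop Configuration

Files that control the configuration of a Hadoop installation

  • hadoop-env.sh
    • Environment variables used by the scripts that run Hadoop
  • mapred-env.sh
    • Environment variables used by the scripts that run MapReduce. Overrides the variables set in hadoop-env.sh.
  • yarn-env.sh
    • Environment variables used by the scripts that run YARN. Overrides the variables set in hadoop-env.sh.
  • core-site.xml
    • Settings common to HDFS, MapReduce, and YARN, such as I/O settings
  • hdfs-site.xml
    • Settings for the HDFS daemons (namenode, secondary namenode, datanodes)
  • mapred-site.xml
    • Settings for the MapReduce daemon (job history server)
  • yarn-site.xml
    • Settings for the YARN daemons (resource manager, web app proxy server, node managers)
  • slaves
    • List of the machines that run a datanode and a node manager
  • hadoop-metrics2.properties
    • Controls how metrics are published in Hadoop
  • log4j.properties
    • Settings for the system log files, the namenode audit log, and the task log of the task JVM process
  • hadoop-policy.xml
    • Access control list settings used when running Hadoop in secure mode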
$ ls /usr/local/hadoop-2.8.3/etc/hadoop
capacity-scheduler.xml  hadoop-env.sh               httpfs-env.sh            kms-env.sh            mapred-env.sh               ssl-server.xml.example
configuration.xsl       hadoop-metrics2.properties  httpfs-log4j.properties  kms-log4j.properties  mapred-queues.xml.template  yarn-env.cmd
container-executor.cfg  hadoop-metrics.properties   httpfs-signature.secret  kms-site.xml          mapred-site.xml.template    yarn-env.sh
core-site.xml           hadoop-policy.xml           httpfs-site.xml          log4j.properties      slaves                      yarn-site.xml
hadoop-env.cmd          hdfs-site.xml               kms-acls.xml             mapred-env.cmd        ssl-client.xml.example
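
In the listing above, mapred-site.xml is present only as mapred-site.xml.template. Without a mapred-site.xml, mapreduce.framework.name keeps its default of local, which would be consistent with the job_local IDs and LocalJobRunner lines in the benchmark logs further down. The following sketch reports which site files are present in a configuration directory; check_conf_dir is a hypothetical helper, and the default path is simply the one used in these notes.

```shell
#!/bin/sh
# check_conf_dir (hypothetical helper): report which of the *-site.xml files
# are present, present only as a template, or absent in a config directory.
check_conf_dir() {
    dir=$1
    for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
        if [ -f "$dir/$f" ]; then
            echo "ok       $f"
        elif [ -f "$dir/$f.template" ]; then
            echo "template $f (copy $f.template to $f to activate it)"
        else
            echo "missing  $f"
        fi
    done
}

# HADOOP_CONF_DIR overrides the path used in these notes.
check_conf_dir "${HADOOP_CONF_DIR:-/usr/local/hadoop-2.8.3/etc/hadoop}"
```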

Benchmarking a Hadoop cluster

The benchmarks are packaged in the tests JAR file.

  • How to list them
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timlineserver performance.
  • How to check the usage of a benchmark
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar \
> TestDFSIO
18/04/15 04:24:01 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
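
Based on the usage line above, a typical TestDFSIO session would write a set of files, read them back, and then clean up. The sketch below only prints such a sequence; dfsio_plan is a hypothetical helper, and the file count and size are arbitrary illustration values, not figures from these notes.

```shell
#!/bin/sh
# Location of the tests JAR, as used in the commands above. Single quotes keep
# $HADOOP_HOME and the glob unexpanded until the printed command is run.
TESTS_JAR='$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar'

# dfsio_plan (hypothetical helper): print a write, read, clean sequence
# for N files of the given size. The commands are printed, not executed.
dfsio_plan() {
    files=$1; size=$2
    for mode in write read; do
        echo "hadoop jar $TESTS_JAR TestDFSIO -$mode -nrFiles $files -size $size"
    done
    echo "hadoop jar $TESTS_JAR TestDFSIO -clean"
}

dfsio_plan 10 1GB
```

Piping the output to a shell (`dfsio_plan 10 1GB | sh`) would run the three steps in order on a node where HADOOP_HOME is set.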

Benchmarking MapReduce with TeraSort

  • How to generate a terabyte of data using 1,000 maps
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
teragen -Dmapreduce.job.maps=1000 10t random-data
  • Run terasort
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
terasort random-data sorted-data

  • sanity check
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
teravalidate sorted-data report

Results of an actual run, using 10m instead of 10t

Data generation

[mapred@ip-172-31-44-80 ec2-user]$ HADOOP_USER_NAME=hdfs JAVA_HOME=/usr/lib/jvm/jre /usr/local/hadoop-2.8.3/bin/hadoop fs -chown mapred:hadoop /
[mapred@ip-172-31-44-80 ec2-user]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen -Dmapreduce.job.maps=1000 10m random-data
18/04/15 04:44:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 04:44:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 04:44:28 INFO terasort.TeraGen: Generating 10000000 using 1
18/04/15 04:44:28 INFO mapreduce.JobSubmitter: number of splits:1
18/04/15 04:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local339502331_0001
18/04/15 04:44:29 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 04:44:29 INFO mapreduce.Job: Running job: job_local339502331_0001
18/04/15 04:44:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 04:44:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 04:44:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 04:44:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 04:44:29 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 04:44:29 INFO mapred.LocalJobRunner: Starting task: attempt_local339502331_0001_m_000000_0
18/04/15 04:44:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 04:44:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 04:44:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 04:44:29 INFO mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@756a69a3
18/04/15 04:44:30 INFO mapreduce.Job: Job job_local339502331_0001 running in uber mode : false
18/04/15 04:44:30 INFO mapreduce.Job:  map 0% reduce 0%
18/04/15 04:44:41 INFO mapred.LocalJobRunner: 
18/04/15 04:44:41 INFO mapred.Task: Task:attempt_local339502331_0001_m_000000_0 is done. And is in the process of committing
18/04/15 04:44:41 INFO mapred.LocalJobRunner: 
18/04/15 04:44:41 INFO mapred.Task: Task attempt_local339502331_0001_m_000000_0 is allowed to commit now
18/04/15 04:44:41 INFO output.FileOutputCommitter: Saved output of task 'attempt_local339502331_0001_m_000000_0' to hdfs://localhost:8020/user/mapred/random-data/_temporary/0/task_local339502331_0001_m_000000
18/04/15 04:44:41 INFO mapred.LocalJobRunner: map
18/04/15 04:44:41 INFO mapred.Task: Task 'attempt_local339502331_0001_m_000000_0' done.
18/04/15 04:44:41 INFO mapred.Task: Final Counters for attempt_local339502331_0001_m_000000_0: Counters: 21
    File System Counters
        FILE: Number of bytes read=302059
        FILE: Number of bytes written=672763
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=0
        HDFS: Number of bytes written=1000000000
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Map-Reduce Framework
        Map input records=10000000
        Map output records=10000000
        Input split bytes=82
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=96
        Total committed heap usage (bytes)=137363456
    org.apache.hadoop.examples.terasort.TeraGen$Counters
        CHECKSUM=21472776955442690
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=1000000000
18/04/15 04:44:41 INFO mapred.LocalJobRunner: Finishing task: attempt_local339502331_0001_m_000000_0
18/04/15 04:44:41 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 04:44:42 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 04:44:42 INFO mapreduce.Job: Job job_local339502331_0001 completed successfully
18/04/15 04:44:42 INFO mapreduce.Job: Counters: 21
    File System Counters
        FILE: Number of bytes read=302059
        FILE: Number of bytes written=672763
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=0
        HDFS: Number of bytes written=1000000000
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Map-Reduce Framework
        Map input records=10000000
        Map output records=10000000
        Input split bytes=82
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=96
        Total committed heap usage (bytes)=137363456
    org.apache.hadoop.examples.terasort.TeraGen$Counters
        CHECKSUM=21472776955442690
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=1000000000
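
The counters above show 10,000,000 output records and 1,000,000,000 bytes written for the argument 10m, that is, 100 bytes per row. The sketch below turns a teragen row-count argument into the expected output size. teragen_bytes is a hypothetical helper, and the decimal suffix multipliers are an assumption: only m is confirmed by this log.

```shell
#!/bin/sh
# teragen_bytes (hypothetical helper): convert a teragen row-count argument
# such as 10m into the number of bytes written, at 100 bytes per row.
# Assumed multipliers: k=10^3, m=10^6, g=10^9, t=10^12 (only m is confirmed
# by the log above).
teragen_bytes() {
    arg=$1
    num=${arg%[kmgtKMGT]}          # strip a trailing suffix letter, if any
    case $arg in
        *[kK]) mult=1000 ;;
        *[mM]) mult=1000000 ;;
        *[gG]) mult=1000000000 ;;
        *[tT]) mult=1000000000000 ;;
        *)     mult=1 ;;
    esac
    echo $((num * mult * 100))
}

teragen_bytes 10m
```

Under the same assumption, a row count of 10g would correspond to 10^12 bytes of generated data.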

Run terasort

[mapred@ip-172-31-44-80 ~]$ hadoop jar \
> $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
> terasort random-data sorted-data
18/04/15 05:01:57 INFO terasort.TeraSort: starting
18/04/15 05:01:58 INFO input.FileInputFormat: Total input files to process : 1
Spent 79ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 81ms
Sampling 8 splits of 8
Making 1 from 100000 sampled records
Computing parititions took 707ms
Spent 790ms computing partitions.
18/04/15 05:01:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 05:01:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 05:01:59 INFO mapreduce.JobSubmitter: number of splits:8
18/04/15 05:01:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local2107661221_0001
18/04/15 05:02:00 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-mapred/mapred/local/1523768520025/_partition.lst <- /home/mapred/_partition.lst
18/04/15 05:02:00 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:8020/user/mapred/sorted-data/_partition.lst as file:/tmp/hadoop-mapred/mapred/local/1523768520025/_partition.lst
18/04/15 05:02:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 05:02:00 INFO mapreduce.Job: Running job: job_local2107661221_0001
18/04/15 05:02:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 05:02:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 05:02:00 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 05:02:00 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:00 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:0+134217728
18/04/15 05:02:00 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:00 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:00 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:00 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:00 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:00 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:01 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:01 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:01 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:01 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:01 INFO mapreduce.Job: Job job_local2107661221_0001 running in uber mode : false
18/04/15 05:02:01 INFO mapreduce.Job:  map 0% reduce 0%
18/04/15 05:02:03 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:03 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:03 INFO mapred.LocalJobRunner: 
18/04/15 05:02:03 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:03 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:03 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
18/04/15 05:02:03 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
18/04/15 05:02:05 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:05 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:05 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586498 bytes
18/04/15 05:02:06 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000000_0 is done. And is in the process of committing
18/04/15 05:02:06 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:06 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000000_0' done.
18/04/15 05:02:06 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000000_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=139889898
        FILE: Number of bytes written=279851724
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=144217800
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=27
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342178
        Map output records=1342178
        Map output bytes=136902156
        Map output materialized bytes=139586518
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684356
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=46
        Total committed heap usage (bytes)=355991552
    File Input Format Counters 
        Bytes Read=134217800
18/04/15 05:02:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:06 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:06 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:06 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:06 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:06 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:134217728+134217728
18/04/15 05:02:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:06 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:07 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:07 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:07 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:07 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:07 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:09 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:09 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:09 INFO mapred.LocalJobRunner: 
18/04/15 05:02:09 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:09 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:09 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:09 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:10 INFO mapreduce.Job:  map 13% reduce 0%
18/04/15 05:02:10 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:10 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:10 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:12 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000001_0 is done. And is in the process of committing
18/04/15 05:02:12 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:12 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000001_0' done.
18/04/15 05:02:12 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000001_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=279477325
        FILE: Number of bytes written=559024590
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=278435500
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=29
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342177
        Map output records=1342177
        Map output bytes=136902054
        Map output materialized bytes=139586414
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684354
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=226
        Total committed heap usage (bytes)=423100416
    File Input Format Counters 
        Bytes Read=134217700
18/04/15 05:02:12 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:12 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:12 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:12 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:12 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:268435456+134217728
18/04/15 05:02:12 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:12 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:12 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:12 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:12 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:12 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:12 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:12 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:12 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:12 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:13 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:14 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:14 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:14 INFO mapred.LocalJobRunner: 
18/04/15 05:02:14 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:14 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:14 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:14 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:15 INFO mapreduce.Job:  map 25% reduce 0%
18/04/15 05:02:16 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:16 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:16 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:17 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000002_0 is done. And is in the process of committing
18/04/15 05:02:17 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:17 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000002_0' done.
18/04/15 05:02:17 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000002_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=419064752
        FILE: Number of bytes written=838197456
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=412653200
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=31
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342177
        Map output records=1342177
        Map output bytes=136902054
        Map output materialized bytes=139586414
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684354
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=83
        Total committed heap usage (bytes)=413663232
    File Input Format Counters 
        Bytes Read=134217700
18/04/15 05:02:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:17 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:17 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:17 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:17 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:17 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:402653184+134217728
18/04/15 05:02:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:17 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:18 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:18 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:18 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:18 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:18 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:19 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:19 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:20 INFO mapred.LocalJobRunner: 
18/04/15 05:02:20 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:20 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:20 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
18/04/15 05:02:20 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
18/04/15 05:02:20 INFO mapreduce.Job:  map 38% reduce 0%
18/04/15 05:02:21 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:21 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:21 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586498 bytes
18/04/15 05:02:22 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000003_0 is done. And is in the process of committing
18/04/15 05:02:22 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:22 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000003_0' done.
18/04/15 05:02:22 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000003_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=558652283
        FILE: Number of bytes written=1117370530
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=546871000
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=33
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342178
        Map output records=1342178
        Map output bytes=136902156
        Map output materialized bytes=139586518
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684356
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=82
        Total committed heap usage (bytes)=430440448
    File Input Format Counters 
        Bytes Read=134217800
18/04/15 05:02:22 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:22 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:22 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:22 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:22 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:805306368+134217728
18/04/15 05:02:22 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:22 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:22 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:22 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:22 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:22 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:23 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:23 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:23 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:23 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:23 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:25 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:25 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:25 INFO mapred.LocalJobRunner: 
18/04/15 05:02:25 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:25 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:25 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:25 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:26 INFO mapreduce.Job:  map 50% reduce 0%
18/04/15 05:02:26 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:26 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:26 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:28 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000004_0 is done. And is in the process of committing
18/04/15 05:02:28 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:28 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000004_0' done.
18/04/15 05:02:28 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000004_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=698239710
        FILE: Number of bytes written=1396543396
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=681088700
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=35
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342177
        Map output records=1342177
        Map output bytes=136902054
        Map output materialized bytes=139586414
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684354
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=147
        Total committed heap usage (bytes)=492306432
    File Input Format Counters 
        Bytes Read=134217700
18/04/15 05:02:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:28 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:28 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:28 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:671088640+134217728
18/04/15 05:02:28 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:28 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:28 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:28 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:28 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:28 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:28 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:28 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:28 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:28 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:28 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:30 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:30 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:30 INFO mapred.LocalJobRunner: 
18/04/15 05:02:30 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:30 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:30 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:30 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:31 INFO mapreduce.Job:  map 63% reduce 0%
18/04/15 05:02:32 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:32 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:32 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:33 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000005_0 is done. And is in the process of committing
18/04/15 05:02:33 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:33 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000005_0' done.
18/04/15 05:02:33 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000005_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=837826625
        FILE: Number of bytes written=1675716262
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=815306400
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=37
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342177
        Map output records=1342177
        Map output bytes=136902054
        Map output materialized bytes=139586414
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684354
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=82
        Total committed heap usage (bytes)=488636416
    File Input Format Counters 
        Bytes Read=134217700
18/04/15 05:02:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:33 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:33 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:536870912+134217728
18/04/15 05:02:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:33 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:34 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:34 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:34 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:34 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:34 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:35 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:35 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:35 INFO mapred.LocalJobRunner: 
18/04/15 05:02:35 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:35 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:35 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:35 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:36 INFO mapreduce.Job:  map 75% reduce 0%
18/04/15 05:02:37 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:37 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:37 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:38 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000006_0 is done. And is in the process of committing
18/04/15 05:02:38 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:38 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000006_0' done.
18/04/15 05:02:38 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000006_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=977413540
        FILE: Number of bytes written=1954889128
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=949524100
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=39
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=1342177
        Map output records=1342177
        Map output bytes=136902054
        Map output materialized bytes=139586414
        Input split bytes=123
        Combine input records=0
        Spilled Records=2684354
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=79
        Total committed heap usage (bytes)=488112128
    File Input Format Counters 
        Bytes Read=134217700
18/04/15 05:02:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:38 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:38 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:38 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:939524096+60475904
18/04/15 05:02:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:38 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:39 INFO mapred.LocalJobRunner: 
18/04/15 05:02:39 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:39 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:39 INFO mapred.MapTask: bufstart = 0; bufend = 61685418; bufvoid = 104857600
18/04/15 05:02:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23795364(95181456); length = 2419033/6553600
18/04/15 05:02:39 INFO mapreduce.Job:  map 88% reduce 0%
18/04/15 05:02:40 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:40 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000007_0 is done. And is in the process of committing
18/04/15 05:02:40 INFO mapred.LocalJobRunner: map
18/04/15 05:02:40 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000007_0' done.
18/04/15 05:02:40 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000007_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=977414035
        FILE: Number of bytes written=2017784102
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1010000000
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=41
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Map-Reduce Framework
        Map input records=604759
        Map output records=604759
        Map output bytes=61685418
        Map output materialized bytes=62894942
        Input split bytes=123
        Combine input records=0
        Spilled Records=604759
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=76
        Total committed heap usage (bytes)=492306432
    File Input Format Counters 
        Bytes Read=60475900
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:40 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_r_000000_0
18/04/15 05:02:40 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:40 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:40 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:40 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6cd08990
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=344614496, maxSingleShuffleLimit=86153624, mergeThreshold=227445584, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/04/15 05:02:40 INFO reduce.EventFetcher: attempt_local2107661221_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000005_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000005_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:40 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000002_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000002_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000001_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000001_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000004_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000004_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000007_0 decomp: 62894938 len: 62894942 to MEMORY
18/04/15 05:02:41 INFO reduce.InMemoryMapOutput: Read 62894938 bytes from map-output for attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 62894938, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->62894938
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000000_0: Shuffling to disk since 139586514 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000000_0 decomp: 139586514 len: 139586518 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586518 bytes from map-output for attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000003_0: Shuffling to disk since 139586514 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000003_0 decomp: 139586514 len: 139586518 to DISK
18/04/15 05:02:42 INFO reduce.OnDiskMapOutput: Read 139586518 bytes from map-output for attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:42 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000006_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:42 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000006_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:43 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:43 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/04/15 05:02:43 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 7 on-disk map-outputs
18/04/15 05:02:43 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:02:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 62894925 bytes
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merged 1 segments, 62894938 bytes to disk to satisfy reduce memory limit
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merging 8 files, 1040000048 bytes from disk
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/04/15 05:02:43 INFO mapred.Merger: Merging 8 sorted segments
18/04/15 05:02:43 INFO mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 1039999912 bytes
18/04/15 05:02:43 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/04/15 05:02:43 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/15 05:02:52 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:53 INFO mapreduce.Job:  map 100% reduce 87%
18/04/15 05:02:57 INFO mapred.Task: Task:attempt_local2107661221_0001_r_000000_0 is done. And is in the process of committing
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:57 INFO mapred.Task: Task attempt_local2107661221_0001_r_000000_0 is allowed to commit now
18/04/15 05:02:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local2107661221_0001_r_000000_0' to hdfs://localhost:8020/user/mapred/sorted-data/_temporary/0/task_local2107661221_0001_r_000000
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:57 INFO mapred.Task: Task 'attempt_local2107661221_0001_r_000000_0' done.
18/04/15 05:02:57 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_r_000000_0: Counters: 29
    File System Counters
        FILE: Number of bytes read=3057414387
        FILE: Number of bytes written=3057784150
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1010000000
        HDFS: Number of bytes written=1000000000
        HDFS: Number of read operations=44
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=10000000
        Reduce shuffle bytes=1040000048
        Reduce input records=10000000
        Reduce output records=10000000
        Spilled Records=10000000
        Shuffled Maps =8
        Failed Shuffles=0
        Merged Map outputs=8
        GC time elapsed (ms)=69
        Total committed heap usage (bytes)=500170752
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters 
        Bytes Written=1000000000
18/04/15 05:02:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_r_000000_0
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce task executor complete.
18/04/15 05:02:58 INFO mapreduce.Job:  map 100% reduce 100%
18/04/15 05:02:58 INFO mapreduce.Job: Job job_local2107661221_0001 completed successfully
18/04/15 05:02:58 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=7945392555
        FILE: Number of bytes written=12897161338
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=5848096700
        HDFS: Number of bytes written=1000000000
        HDFS: Number of read operations=316
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=20
    Map-Reduce Framework
        Map input records=10000000
        Map output records=10000000
        Map output bytes=1020000000
        Map output materialized bytes=1040000048
        Input split bytes=984
        Combine input records=0
        Combine output records=0
        Reduce input groups=10000000
        Reduce shuffle bytes=1040000048
        Reduce input records=10000000
        Reduce output records=10000000
        Spilled Records=29395241
        Shuffled Maps =8
        Failed Shuffles=0
        Merged Map outputs=8
        GC time elapsed (ms)=890
        Total committed heap usage (bytes)=4084727808
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1000000000
    File Output Format Counters 
        Bytes Written=1000000000
18/04/15 05:02:58 INFO terasort.TeraSort: done
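
Several figures in the TeraSort log above can be reproduced with simple arithmetic, which is a quick way to confirm the job behaved as configured. The following is a minimal sketch using only values copied from the log. The variable names are illustrative, and the 80% spill threshold and 25% single-shuffle limit are assumed to be the defaults of mapreduce.map.sort.spill.percent and mapreduce.reduce.shuffle.memory.limit.percent respectively.

```shell
#!/bin/sh
# Values copied from the TeraSort log above.
input_bytes=1000000000    # File Input Format Counters: Bytes Read
block_bytes=134217728     # split length in "Processing split: ...+134217728"
sort_mb=100               # mapreduce.task.io.sort.mb
memory_limit=344614496    # MergerManager: memoryLimit

# Number of input splits: ceiling division of the input size by the split size.
n_splits=$(( (input_bytes + block_bytes - 1) / block_bytes ))
# Length of the final, shorter split.
last_split=$(( input_bytes - (n_splits - 1) * block_bytes ))

# Map-side sort buffer (bufvoid) and its spill threshold ("soft limit").
# Assumption: the spill threshold is 80% of the buffer.
buf_bytes=$(( sort_mb * 1024 * 1024 ))
soft_limit=$(( buf_bytes * 80 / 100 ))

# Reduce-side limit for keeping a single map output in memory.
# Assumption: the limit is 25% of memoryLimit.
max_single_shuffle=$(( memory_limit / 4 ))

echo "splits=$n_splits last_split=$last_split"
echo "bufvoid=$buf_bytes soft_limit=$soft_limit"
echo "maxSingleShuffleLimit=$max_single_shuffle"
```

The results line up with the log: 8 map tasks (m_000000 to m_000007), a last split of 60475904 bytes, bufvoid = 104857600, soft limit at 83886080, and maxSingleShuffleLimit = 86153624. This also explains why the seven full-size map outputs (about 139 MB each) were shuffled to DISK while the smaller output of m_000007 (62894938 bytes) went to MEMORY.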

Sanity check (TeraValidate)

[mapred@ip-172-31-44-80 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teravalidate sorted-data report
18/04/15 05:06:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 05:06:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 05:06:36 INFO input.FileInputFormat: Total input files to process : 1
Spent 37ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
18/04/15 05:06:36 INFO mapreduce.JobSubmitter: number of splits:1
18/04/15 05:06:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1823691639_0001
18/04/15 05:06:36 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 05:06:36 INFO mapreduce.Job: Running job: job_local1823691639_0001
18/04/15 05:06:36 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 05:06:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:36 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 05:06:36 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 05:06:36 INFO mapred.LocalJobRunner: Starting task: attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:06:36 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/sorted-data/part-r-00000:0+1000000000
18/04/15 05:06:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:06:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:06:36 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:06:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:06:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:06:36 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:06:37 INFO mapreduce.Job: Job job_local1823691639_0001 running in uber mode : false
18/04/15 05:06:37 INFO mapreduce.Job:  map 0% reduce 0%
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 
18/04/15 05:06:42 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:06:42 INFO mapred.MapTask: Spilling map output
18/04/15 05:06:42 INFO mapred.MapTask: bufstart = 0; bufend = 82; bufvoid = 104857600
18/04/15 05:06:42 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
18/04/15 05:06:42 INFO mapred.MapTask: Finished spill 0
18/04/15 05:06:42 INFO mapred.Task: Task:attempt_local1823691639_0001_m_000000_0 is done. And is in the process of committing
18/04/15 05:06:42 INFO mapred.LocalJobRunner: map
18/04/15 05:06:42 INFO mapred.Task: Task 'attempt_local1823691639_0001_m_000000_0' done.
18/04/15 05:06:42 INFO mapred.Task: Final Counters for attempt_local1823691639_0001_m_000000_0: Counters: 22
    File System Counters
        FILE: Number of bytes read=302147
        FILE: Number of bytes written=675681
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1000000000
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1
    Map-Reduce Framework
        Map input records=10000000
        Map output records=3
        Map output bytes=82
        Map output materialized bytes=94
        Input split bytes=123
        Combine input records=0
        Spilled Records=3
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=74
        Total committed heap usage (bytes)=296747008
    File Input Format Counters 
        Bytes Read=1000000000
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:42 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Starting task: attempt_local1823691639_0001_r_000000_0
18/04/15 05:06:42 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:42 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:42 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:06:42 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5cde5e61
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/04/15 05:06:42 INFO reduce.EventFetcher: attempt_local1823691639_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/04/15 05:06:42 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1823691639_0001_m_000000_0 decomp: 90 len: 94 to MEMORY
18/04/15 05:06:42 INFO reduce.InMemoryMapOutput: Read 90 bytes from map-output for attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 90, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->90
18/04/15 05:06:42 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/04/15 05:06:42 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:06:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merged 1 segments, 90 bytes to disk to satisfy reduce memory limit
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merging 1 files, 94 bytes from disk
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/04/15 05:06:42 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:06:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/15 05:06:42 INFO mapred.Task: Task:attempt_local1823691639_0001_r_000000_0 is done. And is in the process of committing
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO mapred.Task: Task attempt_local1823691639_0001_r_000000_0 is allowed to commit now
18/04/15 05:06:42 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1823691639_0001_r_000000_0' to hdfs://localhost:8020/user/mapred/report/_temporary/0/task_local1823691639_0001_r_000000
18/04/15 05:06:42 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:06:42 INFO mapred.Task: Task 'attempt_local1823691639_0001_r_000000_0' done.
18/04/15 05:06:42 INFO mapred.Task: Final Counters for attempt_local1823691639_0001_r_000000_0: Counters: 29
    File System Counters
        FILE: Number of bytes read=302367
        FILE: Number of bytes written=675775
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1000000000
        HDFS: Number of bytes written=24
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=3
        Reduce shuffle bytes=94
        Reduce input records=3
        Reduce output records=1
        Spilled Records=3
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=296747008
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters 
        Bytes Written=24
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local1823691639_0001_r_000000_0
18/04/15 05:06:42 INFO mapred.LocalJobRunner: reduce task executor complete.
18/04/15 05:06:43 INFO mapreduce.Job:  map 100% reduce 100%
18/04/15 05:06:43 INFO mapreduce.Job: Job job_local1823691639_0001 completed successfully
18/04/15 05:06:43 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=604514
        FILE: Number of bytes written=1351456
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2000000000
        HDFS: Number of bytes written=24
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=10000000
        Map output records=3
        Map output bytes=82
        Map output materialized bytes=94
        Input split bytes=123
        Combine input records=0
        Combine output records=0
        Reduce input groups=3
        Reduce shuffle bytes=94
        Reduce input records=3
        Reduce output records=1
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=74
        Total committed heap usage (bytes)=593494016
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1000000000
    File Output Format Counters 
        Bytes Written=24
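
The job counters only show that TeraValidate ran to completion. Whether the data is actually sorted is recorded in the report directory it wrote. As far as I understand the TeraValidate example, a clean run leaves only a single checksum line in the report, and any out-of-order keys show up as additional error lines. The helper below is a hypothetical sketch based on that assumption (the name report_is_clean is not part of Hadoop): it reads a report on standard input and succeeds only if it contains checksum lines and nothing else.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop.
# Exits 0 only if stdin has at least one line starting with "checksum"
# and no other lines (assumed to indicate a correctly sorted data set).
report_is_clean() {
    awk '/^checksum/ { ok = 1; next }
         { bad = 1 }
         END { exit !(ok && !bad) }'
}

# Example usage against the report written by the teravalidate run above:
#   hadoop fs -cat report/part-r-00000 | report_is_clean && echo "sorted-data is in order"
```

The 24 bytes reported under "File Output Format Counters" are consistent with a report that holds just one short line, but printing the file with hadoop fs -cat is the reliable way to check.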

References

  • Tom White, Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale (Amazon listing)
    https://www.amazon.co.jp/Hadoop-Definitive-Storage-Analysis-Internet/dp/1491901632
