Hadoop Installation and Environment Setup
Installing Java
[ec2-user@ip-172-31-44-80 ~]$ java -version
java version "1.7.0_171"
OpenJDK Runtime Environment (amzn-2.6.13.0.76.amzn1-x86_64 u171-b01)
OpenJDK 64-Bit Server VM (build 24.171-b01, mixed mode)
Creating Unix User Accounts
It is good practice to create dedicated Unix user accounts so that each Hadoop process runs as a separate user.
[ec2-user@ip-172-31-44-80 ~]$ sudo su
[root@ip-172-31-44-80 ec2-user]# groupadd hadoop
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop hadoop
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop hdfs
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop mapred
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop yarn
[root@ip-172-31-44-80 ec2-user]# ls -al /home/
total 28
drwxr-xr-x 7 root root 4096 Apr 7 06:55 .
dr-xr-xr-x 25 root root 4096 Apr 7 06:42 ..
drwx------ 3 ec2-user ec2-user 4096 Apr 7 06:42 ec2-user
drwx------ 2 hadoop hadoop 4096 Apr 7 06:55 hadoop
drwx------ 2 hdfs hadoop 4096 Apr 7 06:48 hdfs
drwx------ 2 mapred hadoop 4096 Apr 7 06:48 mapred
drwx------ 2 yarn hadoop 4096 Apr 7 06:48 yarn
Setting Passwords
[ec2-user@ip-172-31-44-80 local]$ sudo passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd hdfs
Changing password for user hdfs.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd mapred
Changing password for user mapred.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd yarn
Changing password for user yarn.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
Grant sudo privileges to the created users
[ec2-user@ip-172-31-44-80 local]$ sudo visudo
[ec2-user@ip-172-31-44-80 local]$ sudo groupadd sudo
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo hadoop
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo hdfs
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo mapred
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo yarn
When running visudo, add the following:
hadoop ALL=(ALL) ALL
hdfs ALL=(ALL) ALL
mapred ALL=(ALL) ALL
yarn ALL=(ALL) ALL
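One caveat about the usermod calls above (the behaviour assumed here is that of the standard shadow-utils usermod): `-G` on its own replaces the user's supplementary group list, so a user who already belonged to other groups would lose them. `-aG` appends instead.

```shell
# Assumption: standard shadow-utils usermod. `-G sudo` REPLACES the user's
# supplementary groups with just `sudo`; `-aG sudo` appends, which is usually
# what you want:
#   sudo usermod -aG sudo hadoop
# A portable way to inspect a user's current groups (shown for root, which
# always exists):
id -Gn root
```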
Installing Hadoop
[ec2-user@ip-172-31-44-80 ~]$ cd /usr/local
[ec2-user@ip-172-31-44-80 local]$ sudo wget http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
--2018-04-07 07:00:46-- http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
Resolving ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)... 203.178.132.80, 2001:200:0:7c06::9393
Connecting to ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)|203.178.132.80|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244469481 (233M) [application/x-gzip]
Saving to: ‘hadoop-2.8.3.tar.gz.1’
hadoop-2.8.3.tar.gz.1 100%[==============================================================================================================>] 233.14M 11.0MB/s in 31s
2018-04-07 07:01:17 (7.48 MB/s) - ‘hadoop-2.8.3.tar.gz.1’ saved [244469481/244469481]
[ec2-user@ip-172-31-44-80 local]$ sudo tar xzf hadoop-2.8.3.tar.gz
[ec2-user@ip-172-31-44-80 local]$ sudo chown -R hadoop:hadoop hadoop-2.8.3
[ec2-user@ip-172-31-44-80 ~]$ sudo vim /etc/bashrc
Add the following to /etc/bashrc:
export HADOOP_HOME=/usr/local/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
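As a quick sanity check, the two exports can be exercised in a throwaway shell; the path below matches the install location used in this walkthrough.

```shell
# Reproduce the /etc/bashrc additions and confirm both Hadoop dirs landed on PATH.
export HADOOP_HOME=/usr/local/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
echo "$PATH" | tr ':' '\n' | grep -c "$HADOOP_HOME"   # prints 2 (bin and sbin)
```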
Reload the edits and confirm that Hadoop is installed
[ec2-user@ip-172-31-44-80 ~]$ . ~/.bashrc
[ec2-user@ip-172-31-44-80 local]$ hadoop version
Hadoop 2.8.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65cb1d1436a2
Compiled by jdu on 2017-12-05T03:43Z
Compiled with protoc 2.5.0
From source with checksum 9ff4856d824e983fa510d3f843e3f19d
This command was run using /usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3.jar
Configuring SSH
The hdfs and yarn users need to be able to log in to the machines in the cluster without a password, so set that up here. When generating the SSH key, enter a passphrase such as Test1234.
[ec2-user@ip-172-31-44-80 local]$ su hdfs
Password:
[hdfs@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdfs/.ssh/id_rsa.
Your public key has been saved in /home/hdfs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:b0iKwqpe1Vl6ziDjn1KC0M1HSnfKODL1od1AuI/7lAU hdfs@ip-172-31-44-80
The key's randomart image is:
+---[RSA 2048]----+
| o. |
| + = . |
| . = @E*. |
| . + O.*=. |
| . ++==So |
| . .o+o=Bo |
| o...=o.oo |
| ... oo .. |
|=. o+ |
+----[SHA256]-----+
[hdfs@ip-172-31-44-80 local]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hdfs@ip-172-31-44-80 local]$ exit
exit
[yarn@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Saving key "/home/yarn/.ssh/id_rsa" failed: passphrase is too short (minimum five characters)
[yarn@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/yarn/.ssh/id_rsa.
Your public key has been saved in /home/yarn/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EgaMuDWE64ANH3u4mE6bXY5H8Sk699ah0x+Trg3FJQo yarn@ip-172-31-44-80
The key's randomart image is:
+---[RSA 2048]----+
| +.o. |
|+ = .. |
|.B = oE . . |
|= = .o .. o o |
|oo o + S. o |
|oo. + + .. . |
|o + * . +..+ |
| + = + + o+ o |
| + o...o+ |
+----[SHA256]-----+
[yarn@ip-172-31-44-80 local]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Use ssh-agent so that SSH logins work without retyping the passphrase.
[ec2-user@ip-172-31-44-80 local]$ su hdfs
Password:
[hdfs@ip-172-31-44-80 local]$ eval `ssh-agent`
Agent pid 23016
[hdfs@ip-172-31-44-80 local]$ ssh-add ~/.ssh/id_rsa
Enter passphrase for /home/hdfs/.ssh/id_rsa:
Identity added: /home/hdfs/.ssh/id_rsa (/home/hdfs/.ssh/id_rsa)
[hdfs@ip-172-31-44-80 local]$ exit
exit
[ec2-user@ip-172-31-44-80 local]$ su yarn
Password:
[yarn@ip-172-31-44-80 local]$ eval `ssh-agent`
Agent pid 23037
[yarn@ip-172-31-44-80 local]$ ssh-add ~/.ssh/id_rsa
Enter passphrase for /home/yarn/.ssh/id_rsa:
Identity added: /home/yarn/.ssh/id_rsa (/home/yarn/.ssh/id_rsa)
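If the passphrase prompts are a nuisance, an alternative (not what was done above) is to generate the key with an empty passphrase in the first place, which makes the ssh-agent step unnecessary. The sketch below uses a temporary directory and illustrative file names so nothing under an existing ~/.ssh is touched.

```shell
# Sketch: key with an empty passphrase (-N ""), written to a temp dir.
# File names are illustrative; on a real host you would target ~/.ssh.
tmpdir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$tmpdir/id_rsa"
cat "$tmpdir/id_rsa.pub" >> "$tmpdir/authorized_keys"
ls "$tmpdir"   # authorized_keys  id_rsa  id_rsa.pub
```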
Configuring Hadoop
[hdfs@ip-172-31-44-80 ~]$ cd $HADOOP_HOME/sbin
[hdfs@ip-172-31-44-80 sbin]$ ls
distribute-exclude.sh hdfs-config.cmd kms.sh slaves.sh start-balancer.sh start-secure-dns.sh stop-all.cmd stop-dfs.cmd stop-yarn.cmd yarn-daemons.sh
hadoop-daemon.sh hdfs-config.sh mr-jobhistory-daemon.sh start-all.cmd start-dfs.cmd start-yarn.cmd stop-all.sh stop-dfs.sh stop-yarn.sh
hadoop-daemons.sh httpfs.sh refresh-namenodes.sh start-all.sh start-dfs.sh start-yarn.sh stop-balancer.sh stop-secure-dns.sh yarn-daemon.sh
Formatting the HDFS Filesystem
A new HDFS installation must be formatted before use. The namenode manages all of the filesystem metadata, while datanodes join and leave the cluster dynamically, so datanodes take no part in the format process. The eventual size of the filesystem is determined by the number of datanodes in the cluster, so there is no need to decide it in advance.
[ec2-user@ip-172-31-44-80 ~]$ su hdfs
Password:
[hdfs@ip-172-31-44-80 ec2-user]$ hdfs namenode -format
18/04/07 08:03:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = ip-172-31-44-80.us-west-2.compute.internal/172.31.44.80
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.3
STARTUP_MSG: classpath = /usr/local/hadoop-2.8.3/etc/hadoop:... (full classpath omitted)
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65cb1d1436a2; compiled by 'jdu' on 2017-12-05T03:43Z
STARTUP_MSG: java = 1.7.0_171
************************************************************/
18/04/07 08:03:50 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/04/07 08:03:50 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-e3ff70d9-024d-4a8c-b199-7a39526f4ee6
18/04/07 08:03:51 INFO namenode.FSEditLog: Edit logging is async:true
18/04/07 08:03:51 INFO namenode.FSNamesystem: KeyProvider: null
18/04/07 08:03:51 INFO namenode.FSNamesystem: fsLock is fair: true
18/04/07 08:03:51 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/04/07 08:03:51 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/04/07 08:03:51 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/04/07 08:03:51 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/04/07 08:03:51 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Apr 07 08:03:51
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map BlocksMap
18/04/07 08:03:51 INFO util.GSet: VM type = 64-bit
18/04/07 08:03:51 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
18/04/07 08:03:51 INFO util.GSet: capacity = 2^21 = 2097152 entries
18/04/07 08:03:51 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/04/07 08:03:51 INFO blockmanagement.BlockManager: defaultReplication = 3
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxReplication = 512
18/04/07 08:03:51 INFO blockmanagement.BlockManager: minReplication = 1
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
18/04/07 08:03:51 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/04/07 08:03:51 INFO blockmanagement.BlockManager: encryptDataTransfer = false
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
18/04/07 08:03:51 INFO namenode.FSNamesystem: fsOwner = hdfs (auth:SIMPLE)
18/04/07 08:03:51 INFO namenode.FSNamesystem: supergroup = supergroup
18/04/07 08:03:51 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/04/07 08:03:51 INFO namenode.FSNamesystem: HA Enabled: false
18/04/07 08:03:51 INFO namenode.FSNamesystem: Append Enabled: true
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map INodeMap
18/04/07 08:03:51 INFO util.GSet: VM type = 64-bit
18/04/07 08:03:51 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
18/04/07 08:03:51 INFO util.GSet: capacity = 2^20 = 1048576 entries
18/04/07 08:03:51 INFO namenode.FSDirectory: ACLs enabled? false
18/04/07 08:03:51 INFO namenode.FSDirectory: XAttrs enabled? true
18/04/07 08:03:51 INFO namenode.NameNode: Caching file names occurring more than 10 times
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map cachedBlocks
18/04/07 08:03:51 INFO util.GSet: VM type = 64-bit
18/04/07 08:03:51 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
18/04/07 08:03:51 INFO util.GSet: capacity = 2^18 = 262144 entries
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/04/07 08:03:51 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/04/07 08:03:51 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/04/07 08:03:51 INFO util.GSet: VM type = 64-bit
18/04/07 08:03:51 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
18/04/07 08:03:51 INFO util.GSet: capacity = 2^15 = 32768 entries
18/04/07 08:03:51 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1797533508-172.31.44.80-1523088231605
18/04/07 08:03:51 INFO common.Storage: Storage directory /tmp/hadoop-hdfs/dfs/name has been successfully formatted.
18/04/07 08:03:51 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/04/07 08:03:51 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/04/07 08:03:51 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/04/07 08:03:51 INFO util.ExitUtil: Exiting with status 0
18/04/07 08:03:51 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-172-31-44-80.us-west-2.compute.internal/172.31.44.80
************************************************************/
Starting and Stopping the Daemons
Start the HDFS daemons with start-dfs.sh
- The script starts a namenode on each machine returned by the hdfs getconf -namenodes command.
- It starts a datanode on each machine listed in the slaves file.
- It starts a secondary namenode on each machine returned by the hdfs getconf -secondaryNameNodes command.
[ec2-user@ip-172-31-44-80 sbin]$ su hdfs
[hdfs@ip-172-31-44-80 sbin]$ sudo mkdir /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ sudo chmod 775 /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ sudo chown -Rf hadoop:hadoop /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ ls -al /usr/local/hadoop-2.8.3
total 160
drwxrwxr-x 10 hadoop hadoop 4096 Apr 7 08:36 .
drwxr-xr-x 13 root root 4096 Apr 7 07:04 ..
drwxr-xr-x 2 hadoop hadoop 4096 Dec 5 04:28 bin
drwxr-xr-x 3 hadoop hadoop 4096 Dec 5 04:28 etc
drwxr-xr-x 2 hadoop hadoop 4096 Dec 5 04:28 include
drwxr-xr-x 3 hadoop hadoop 4096 Dec 5 04:28 lib
drwxr-xr-x 2 hadoop hadoop 4096 Dec 5 04:28 libexec
-rw-r--r-- 1 hadoop hadoop 99253 Dec 5 04:28 LICENSE.txt
drwxrwxr-x 2 hadoop hadoop 4096 Apr 7 08:36 logs
-rw-r--r-- 1 hadoop hadoop 15915 Dec 5 04:28 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop 1366 Dec 5 04:28 README.txt
drwxr-xr-x 2 hadoop hadoop 4096 Dec 5 04:28 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Dec 5 04:28 share
Editing core-site.xml
[hdfs@ip-172-31-44-80 ec2-user]$ sudo vim /usr/local/hadoop-2.8.3/etc/hadoop/core-site.xml
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
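On a live cluster, `hdfs getconf -confKey fs.defaultFS` prints the effective value. As an offline sanity check, the value can also be pulled straight out of the file; the sketch below writes the same XML to a temp file and greps it.

```shell
# Offline check: extract fs.defaultFS from a copy of core-site.xml.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF
grep -A1 '<name>fs.defaultFS</name>' "$conf" | grep -o 'hdfs://[^<]*'   # hdfs://localhost:8020
```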
Running start-dfs.sh
[hdfs@ip-172-31-44-80 ec2-user]$ start-dfs.sh
Starting namenodes on [localhost]
Enter passphrase for key '/home/hdfs/.ssh/id_rsa':
localhost: starting namenode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-namenode-ip-172-31-44-80.out
Enter passphrase for key '/home/hdfs/.ssh/id_rsa':
localhost: starting datanode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-datanode-ip-172-31-44-80.out
Starting secondary namenodes [0.0.0.0]
Enter passphrase for key '/home/hdfs/.ssh/id_rsa':
Enter passphrase for key '/home/hdfs/.ssh/id_rsa':
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-secondarynamenode-ip-172-31-44-80.out
Getting the namenode information
[hdfs@ip-172-31-44-80 sbin]$ hdfs getconf -namenodes
localhost
Getting the secondary namenode information
[hdfs@ip-172-31-44-80 sbin]$ hdfs getconf -secondaryNameNodes
0.0.0.0
The slaves file contains the following:
[hdfs@ip-172-31-44-80 hadoop-2.8.3]$ pwd
/usr/local/hadoop-2.8.3
[hdfs@ip-172-31-44-80 hadoop-2.8.3]$ cat etc/hadoop/slaves
localhost
Start the YARN daemons with start-yarn.sh
- The script starts a resource manager on the local machine.
- It starts a node manager on each machine listed in the slaves file.
[hdfs@ip-172-31-44-80 sbin]$ su yarn
Password:
[yarn@ip-172-31-44-80 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.8.3/logs/yarn-yarn-resourcemanager-ip-172-31-44-80.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JKwwVSwxYDwyPu0fSeyRd7+/TEDDw9JZxSQQSMjhCr8.
ECDSA key fingerprint is MD5:68:09:14:01:ae:9f:14:5f:ec:79:bd:f9:c8:93:9e:ce.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Enter passphrase for key '/home/yarn/.ssh/id_rsa':
localhost: starting nodemanager, logging to /usr/local/hadoop-2.8.3/logs/yarn-yarn-nodemanager-ip-172-31-44-80.out
Start the job history server, the MapReduce daemon
[yarn@ip-172-31-44-80 sbin]$ su mapred
Password:
[mapred@ip-172-31-44-80 sbin]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.8.3/logs/mapred-mapred-historyserver-ip-172-31-44-80.out
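With everything started, `jps` run as the respective users should show one JVM per daemon. The sketch below checks a captured listing (the PIDs are made up) for each expected daemon.

```shell
# Sample jps output (PIDs are invented); on the real host, run `jps` as each user.
jps_out='12001 NameNode
12185 DataNode
12400 SecondaryNameNode
12652 ResourceManager
12901 NodeManager
13120 JobHistoryServer'
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
  printf '%s: ' "$d"
  echo "$jps_out" | grep -q " $d$" && echo up || echo MISSING
done
```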
Once the Hadoop cluster is up and running, give users permission to access it
[ec2-user@ip-172-31-44-80 ~]$ sudo su
[root@ip-172-31-44-80 ec2-user]# hadoop fs -mkdir -p /user/ec2-user
[root@ip-172-31-44-80 ec2-user]# hadoop fs -chown ec2-user:ec2-user /user/ec2-user/
It is also a good idea to put a space quota on user directories. Command: hdfs dfsadmin -setSpaceQuota 1t /user/ec2-user/
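`hdfs dfs -count -q` shows the quota afterwards; its columns are QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME. The sketch below parses a made-up sample line (the numbers are illustrative; 1t is 2^40 bytes).

```shell
# Live commands (require a running cluster):
#   hdfs dfsadmin -setSpaceQuota 1t /user/ec2-user/
#   hdfs dfs -count -q /user/ec2-user/
# Parse an assumed sample of the -count -q output:
sample='none inf 1099511627776 1099509530624 2 1 2097152 /user/ec2-user'
set -- $sample
echo "space quota: $3 bytes, remaining: $4 bytes"
```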
Run benchmarks
Hadoop Configuration
Files that control the configuration of a Hadoop installation:
- hadoop-env.sh
- Environment variables used by the scripts that run Hadoop
- mapred-env.sh
- Environment variables used by the scripts that run MapReduce. Overrides hadoop-env.sh.
- yarn-env.sh
- Environment variables used by the scripts that run YARN. Overrides hadoop-env.sh.
- core-site.xml
- I/O settings and other configuration common to HDFS, MapReduce, and YARN
- hdfs-site.xml
- HDFS daemons (namenode, secondary namenode, datanodes)
- mapred-site.xml
- MapReduce daemons (job history server)
- yarn-site.xml
- YARN daemons (resource manager, web app proxy server, node managers)
- slaves
- The machines that run datanodes and node managers
- hadoop-metrics2.properties
- How Hadoop publishes its metrics
- log4j.properties
- System log files, the namenode audit log, and the task logs of task JVM processes
- hadoop-policy.xml
- Access control list settings for running Hadoop in secure mode
$ ls /usr/local/hadoop-2.8.3/etc/hadoop
capacity-scheduler.xml hadoop-env.sh httpfs-env.sh kms-env.sh mapred-env.sh ssl-server.xml.example
configuration.xsl hadoop-metrics2.properties httpfs-log4j.properties kms-log4j.properties mapred-queues.xml.template yarn-env.cmd
container-executor.cfg hadoop-metrics.properties httpfs-signature.secret kms-site.xml mapred-site.xml.template yarn-env.sh
core-site.xml hadoop-policy.xml httpfs-site.xml log4j.properties slaves yarn-site.xml
hadoop-env.cmd hdfs-site.xml kms-acls.xml mapred-env.cmd ssl-client.xml.example
Benchmarking the Hadoop cluster
The benchmarks are packaged in the tests JAR file.
- How to list them
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode w/ MR.
nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
timelineperformance: A job that launches mappers to test timlineserver performance.
- How to check the usage of a specific benchmark
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar \
> TestDFSIO
18/04/15 04:24:01 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
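A typical TestDFSIO cycle runs a write pass, reads the same files back, and then cleans up; the file count and size below are illustrative, not recommendations:

```shell
# Write benchmark: 10 files of 128 MB each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar \
  TestDFSIO -write -nrFiles 10 -size 128MB
# Read the same files back to measure read throughput
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar \
  TestDFSIO -read -nrFiles 10 -size 128MB
# Remove the benchmark data from HDFS
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar \
  TestDFSIO -clean
```

Throughput results are appended to TestDFSIO_results.log in the local working directory.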
Benchmarking MapReduce with TeraSort
- Generating a terabyte of data using 1,000 maps
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
teragen -Dmapreduce.job.maps=1000 10t random-data
- Run terasort
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
terasort random-data sorted-data
- Sanity check with teravalidate
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
teravalidate sorted-data report
Results of an actual run with 10m (10 million rows) instead
Data generation
[mapred@ip-172-31-44-80 ec2-user]$ HADOOP_USER_NAME=hdfs JAVA_HOME=/usr/lib/jvm/jre /usr/local/hadoop-2.8.3/bin/hadoop fs -chown mapred:hadoop /
[mapred@ip-172-31-44-80 ec2-user]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen -Dmapreduce.job.maps=1000 10m random-data
18/04/15 04:44:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 04:44:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 04:44:28 INFO terasort.TeraGen: Generating 10000000 using 1
18/04/15 04:44:28 INFO mapreduce.JobSubmitter: number of splits:1
18/04/15 04:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local339502331_0001
18/04/15 04:44:29 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 04:44:29 INFO mapreduce.Job: Running job: job_local339502331_0001
18/04/15 04:44:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 04:44:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 04:44:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 04:44:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 04:44:29 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 04:44:29 INFO mapred.LocalJobRunner: Starting task: attempt_local339502331_0001_m_000000_0
18/04/15 04:44:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 04:44:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 04:44:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 04:44:29 INFO mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@756a69a3
18/04/15 04:44:30 INFO mapreduce.Job: Job job_local339502331_0001 running in uber mode : false
18/04/15 04:44:30 INFO mapreduce.Job: map 0% reduce 0%
18/04/15 04:44:41 INFO mapred.LocalJobRunner:
18/04/15 04:44:41 INFO mapred.Task: Task:attempt_local339502331_0001_m_000000_0 is done. And is in the process of committing
18/04/15 04:44:41 INFO mapred.LocalJobRunner:
18/04/15 04:44:41 INFO mapred.Task: Task attempt_local339502331_0001_m_000000_0 is allowed to commit now
18/04/15 04:44:41 INFO output.FileOutputCommitter: Saved output of task 'attempt_local339502331_0001_m_000000_0' to hdfs://localhost:8020/user/mapred/random-data/_temporary/0/task_local339502331_0001_m_000000
18/04/15 04:44:41 INFO mapred.LocalJobRunner: map
18/04/15 04:44:41 INFO mapred.Task: Task 'attempt_local339502331_0001_m_000000_0' done.
18/04/15 04:44:41 INFO mapred.Task: Final Counters for attempt_local339502331_0001_m_000000_0: Counters: 21
File System Counters
FILE: Number of bytes read=302059
FILE: Number of bytes written=672763
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Input split bytes=82
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=96
Total committed heap usage (bytes)=137363456
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=21472776955442690
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1000000000
18/04/15 04:44:41 INFO mapred.LocalJobRunner: Finishing task: attempt_local339502331_0001_m_000000_0
18/04/15 04:44:41 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 04:44:42 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 04:44:42 INFO mapreduce.Job: Job job_local339502331_0001 completed successfully
18/04/15 04:44:42 INFO mapreduce.Job: Counters: 21
File System Counters
FILE: Number of bytes read=302059
FILE: Number of bytes written=672763
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Input split bytes=82
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=96
Total committed heap usage (bytes)=137363456
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=21472776955442690
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1000000000
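The counters above can be checked with a little arithmetic: teragen's size argument is a row count and each row is 100 bytes, so the 10m run writes exactly the 1,000,000,000 bytes reported; at the default 128 MiB HDFS block size that file yields the 8 input splits seen in the terasort log.

```shell
# teragen 10m => 10,000,000 rows of 100 bytes each
echo $((10000000 * 100))                                  # bytes written: 1000000000
# number of input splits at a 128 MiB block size, rounded up
echo $(( (10000000 * 100 + 134217728 - 1) / 134217728 ))  # splits: 8
```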
Run terasort
[mapred@ip-172-31-44-80 ~]$ hadoop jar \
> $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
> terasort random-data sorted-data
18/04/15 05:01:57 INFO terasort.TeraSort: starting
18/04/15 05:01:58 INFO input.FileInputFormat: Total input files to process : 1
Spent 79ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 81ms
Sampling 8 splits of 8
Making 1 from 100000 sampled records
Computing parititions took 707ms
Spent 790ms computing partitions.
18/04/15 05:01:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 05:01:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 05:01:59 INFO mapreduce.JobSubmitter: number of splits:8
18/04/15 05:01:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local2107661221_0001
18/04/15 05:02:00 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-mapred/mapred/local/1523768520025/_partition.lst <- /home/mapred/_partition.lst
18/04/15 05:02:00 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:8020/user/mapred/sorted-data/_partition.lst as file:/tmp/hadoop-mapred/mapred/local/1523768520025/_partition.lst
18/04/15 05:02:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 05:02:00 INFO mapreduce.Job: Running job: job_local2107661221_0001
18/04/15 05:02:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 05:02:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 05:02:00 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 05:02:00 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:00 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:00 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:0+134217728
18/04/15 05:02:00 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:00 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:00 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:00 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:00 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:00 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:01 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:01 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:01 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:01 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:01 INFO mapreduce.Job: Job job_local2107661221_0001 running in uber mode : false
18/04/15 05:02:01 INFO mapreduce.Job: map 0% reduce 0%
18/04/15 05:02:03 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:03 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:03 INFO mapred.LocalJobRunner:
18/04/15 05:02:03 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:03 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:03 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
18/04/15 05:02:03 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
18/04/15 05:02:05 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:05 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:05 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586498 bytes
18/04/15 05:02:06 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000000_0 is done. And is in the process of committing
18/04/15 05:02:06 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:06 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000000_0' done.
18/04/15 05:02:06 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000000_0: Counters: 22
File System Counters
FILE: Number of bytes read=139889898
FILE: Number of bytes written=279851724
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=144217800
HDFS: Number of bytes written=0
HDFS: Number of read operations=27
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342178
Map output records=1342178
Map output bytes=136902156
Map output materialized bytes=139586518
Input split bytes=123
Combine input records=0
Spilled Records=2684356
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=46
Total committed heap usage (bytes)=355991552
File Input Format Counters
Bytes Read=134217800
18/04/15 05:02:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:06 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:06 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:06 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:06 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:134217728+134217728
18/04/15 05:02:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:06 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:07 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:07 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:07 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:07 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:07 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:09 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:09 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:09 INFO mapred.LocalJobRunner:
18/04/15 05:02:09 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:09 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:09 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:09 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:10 INFO mapreduce.Job: map 13% reduce 0%
18/04/15 05:02:10 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:10 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:10 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:12 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000001_0 is done. And is in the process of committing
18/04/15 05:02:12 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:12 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000001_0' done.
18/04/15 05:02:12 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000001_0: Counters: 22
File System Counters
FILE: Number of bytes read=279477325
FILE: Number of bytes written=559024590
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=278435500
HDFS: Number of bytes written=0
HDFS: Number of read operations=29
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342177
Map output records=1342177
Map output bytes=136902054
Map output materialized bytes=139586414
Input split bytes=123
Combine input records=0
Spilled Records=2684354
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=226
Total committed heap usage (bytes)=423100416
File Input Format Counters
Bytes Read=134217700
18/04/15 05:02:12 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:12 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:12 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:12 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:12 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:268435456+134217728
18/04/15 05:02:12 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:12 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:12 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:12 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:12 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:12 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:12 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:12 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:12 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:12 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:13 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:14 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:14 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:14 INFO mapred.LocalJobRunner:
18/04/15 05:02:14 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:14 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:14 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:14 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:15 INFO mapreduce.Job: map 25% reduce 0%
18/04/15 05:02:16 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:16 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:16 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:17 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000002_0 is done. And is in the process of committing
18/04/15 05:02:17 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:17 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000002_0' done.
18/04/15 05:02:17 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000002_0: Counters: 22
File System Counters
FILE: Number of bytes read=419064752
FILE: Number of bytes written=838197456
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=412653200
HDFS: Number of bytes written=0
HDFS: Number of read operations=31
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342177
Map output records=1342177
Map output bytes=136902054
Map output materialized bytes=139586414
Input split bytes=123
Combine input records=0
Spilled Records=2684354
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=83
Total committed heap usage (bytes)=413663232
File Input Format Counters
Bytes Read=134217700
18/04/15 05:02:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:17 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:17 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:17 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:17 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:402653184+134217728
18/04/15 05:02:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:17 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:18 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:18 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:18 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:18 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:18 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:19 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:19 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:20 INFO mapred.LocalJobRunner:
18/04/15 05:02:20 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:20 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:20 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
18/04/15 05:02:20 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
18/04/15 05:02:20 INFO mapreduce.Job: map 38% reduce 0%
18/04/15 05:02:21 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:21 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:21 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586498 bytes
18/04/15 05:02:22 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000003_0 is done. And is in the process of committing
18/04/15 05:02:22 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:22 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000003_0' done.
18/04/15 05:02:22 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000003_0: Counters: 22
File System Counters
FILE: Number of bytes read=558652283
FILE: Number of bytes written=1117370530
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=546871000
HDFS: Number of bytes written=0
HDFS: Number of read operations=33
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342178
Map output records=1342178
Map output bytes=136902156
Map output materialized bytes=139586518
Input split bytes=123
Combine input records=0
Spilled Records=2684356
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=82
Total committed heap usage (bytes)=430440448
File Input Format Counters
Bytes Read=134217800
18/04/15 05:02:22 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:22 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:22 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:22 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:22 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:805306368+134217728
18/04/15 05:02:22 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:22 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:22 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:22 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:22 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:22 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:23 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:23 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:23 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:23 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:23 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:25 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:25 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:25 INFO mapred.LocalJobRunner:
18/04/15 05:02:25 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:25 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:25 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:25 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:26 INFO mapreduce.Job: map 50% reduce 0%
18/04/15 05:02:26 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:26 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:26 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:28 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000004_0 is done. And is in the process of committing
18/04/15 05:02:28 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:28 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000004_0' done.
18/04/15 05:02:28 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000004_0: Counters: 22
File System Counters
FILE: Number of bytes read=698239710
FILE: Number of bytes written=1396543396
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=681088700
HDFS: Number of bytes written=0
HDFS: Number of read operations=35
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342177
Map output records=1342177
Map output bytes=136902054
Map output materialized bytes=139586414
Input split bytes=123
Combine input records=0
Spilled Records=2684354
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=147
Total committed heap usage (bytes)=492306432
File Input Format Counters
Bytes Read=134217700
18/04/15 05:02:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:28 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:28 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:28 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:671088640+134217728
18/04/15 05:02:28 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:28 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:28 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:28 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:28 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:28 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:28 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:28 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:28 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:28 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:28 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:30 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:30 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:30 INFO mapred.LocalJobRunner:
18/04/15 05:02:30 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:30 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:30 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:30 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:31 INFO mapreduce.Job: map 63% reduce 0%
18/04/15 05:02:32 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:32 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:32 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:33 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000005_0 is done. And is in the process of committing
18/04/15 05:02:33 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:33 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000005_0' done.
18/04/15 05:02:33 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000005_0: Counters: 22
File System Counters
FILE: Number of bytes read=837826625
FILE: Number of bytes written=1675716262
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=815306400
HDFS: Number of bytes written=0
HDFS: Number of read operations=37
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342177
Map output records=1342177
Map output bytes=136902054
Map output materialized bytes=139586414
Input split bytes=123
Combine input records=0
Spilled Records=2684354
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=82
Total committed heap usage (bytes)=488636416
File Input Format Counters
Bytes Read=134217700
18/04/15 05:02:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:33 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:33 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:33 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:536870912+134217728
18/04/15 05:02:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:33 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:34 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:34 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:34 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:34 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:34 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:35 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:35 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:35 INFO mapred.LocalJobRunner:
18/04/15 05:02:35 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:35 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:35 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:35 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:36 INFO mapreduce.Job: map 75% reduce 0%
18/04/15 05:02:37 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:37 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:37 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:38 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000006_0 is done. And is in the process of committing
18/04/15 05:02:38 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:38 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000006_0' done.
18/04/15 05:02:38 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000006_0: Counters: 22
File System Counters
FILE: Number of bytes read=977413540
FILE: Number of bytes written=1954889128
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=949524100
HDFS: Number of bytes written=0
HDFS: Number of read operations=39
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=1342177
Map output records=1342177
Map output bytes=136902054
Map output materialized bytes=139586414
Input split bytes=123
Combine input records=0
Spilled Records=2684354
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=79
Total committed heap usage (bytes)=488112128
File Input Format Counters
Bytes Read=134217700
18/04/15 05:02:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:38 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:38 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:38 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:38 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:939524096+60475904
18/04/15 05:02:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:38 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:39 INFO mapred.LocalJobRunner:
18/04/15 05:02:39 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:39 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:39 INFO mapred.MapTask: bufstart = 0; bufend = 61685418; bufvoid = 104857600
18/04/15 05:02:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23795364(95181456); length = 2419033/6553600
18/04/15 05:02:39 INFO mapreduce.Job: map 88% reduce 0%
18/04/15 05:02:40 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:40 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000007_0 is done. And is in the process of committing
18/04/15 05:02:40 INFO mapred.LocalJobRunner: map
18/04/15 05:02:40 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000007_0' done.
18/04/15 05:02:40 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000007_0: Counters: 22
File System Counters
FILE: Number of bytes read=977414035
FILE: Number of bytes written=2017784102
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1010000000
HDFS: Number of bytes written=0
HDFS: Number of read operations=41
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Map-Reduce Framework
Map input records=604759
Map output records=604759
Map output bytes=61685418
Map output materialized bytes=62894942
Input split bytes=123
Combine input records=0
Spilled Records=604759
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
Total committed heap usage (bytes)=492306432
File Input Format Counters
Bytes Read=60475900
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:40 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_r_000000_0
18/04/15 05:02:40 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:40 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:40 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:40 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6cd08990
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=344614496, maxSingleShuffleLimit=86153624, mergeThreshold=227445584, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/04/15 05:02:40 INFO reduce.EventFetcher: attempt_local2107661221_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000005_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000005_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:40 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000002_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000002_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000001_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000001_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000004_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000004_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO mapreduce.Job: map 100% reduce 0%
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000007_0 decomp: 62894938 len: 62894942 to MEMORY
18/04/15 05:02:41 INFO reduce.InMemoryMapOutput: Read 62894938 bytes from map-output for attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 62894938, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->62894938
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000000_0: Shuffling to disk since 139586514 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000000_0 decomp: 139586514 len: 139586518 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586518 bytes from map-output for attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000003_0: Shuffling to disk since 139586514 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000003_0 decomp: 139586514 len: 139586518 to DISK
18/04/15 05:02:42 INFO reduce.OnDiskMapOutput: Read 139586518 bytes from map-output for attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:42 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000006_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:42 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000006_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:43 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:43 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/04/15 05:02:43 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 7 on-disk map-outputs
18/04/15 05:02:43 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:02:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 62894925 bytes
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merged 1 segments, 62894938 bytes to disk to satisfy reduce memory limit
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merging 8 files, 1040000048 bytes from disk
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/04/15 05:02:43 INFO mapred.Merger: Merging 8 sorted segments
18/04/15 05:02:43 INFO mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 1039999912 bytes
18/04/15 05:02:43 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/04/15 05:02:43 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/15 05:02:52 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:53 INFO mapreduce.Job: map 100% reduce 87%
18/04/15 05:02:57 INFO mapred.Task: Task:attempt_local2107661221_0001_r_000000_0 is done. And is in the process of committing
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:57 INFO mapred.Task: Task attempt_local2107661221_0001_r_000000_0 is allowed to commit now
18/04/15 05:02:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local2107661221_0001_r_000000_0' to hdfs://localhost:8020/user/mapred/sorted-data/_temporary/0/task_local2107661221_0001_r_000000
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:57 INFO mapred.Task: Task 'attempt_local2107661221_0001_r_000000_0' done.
18/04/15 05:02:57 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_r_000000_0: Counters: 29
File System Counters
FILE: Number of bytes read=3057414387
FILE: Number of bytes written=3057784150
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1010000000
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=44
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=10000000
Reduce shuffle bytes=1040000048
Reduce input records=10000000
Reduce output records=10000000
Spilled Records=10000000
Shuffled Maps =8
Failed Shuffles=0
Merged Map outputs=8
GC time elapsed (ms)=69
Total committed heap usage (bytes)=500170752
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=1000000000
18/04/15 05:02:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_r_000000_0
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce task executor complete.
18/04/15 05:02:58 INFO mapreduce.Job: map 100% reduce 100%
18/04/15 05:02:58 INFO mapreduce.Job: Job job_local2107661221_0001 completed successfully
18/04/15 05:02:58 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=7945392555
FILE: Number of bytes written=12897161338
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5848096700
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=316
HDFS: Number of large read operations=0
HDFS: Number of write operations=20
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Map output bytes=1020000000
Map output materialized bytes=1040000048
Input split bytes=984
Combine input records=0
Combine output records=0
Reduce input groups=10000000
Reduce shuffle bytes=1040000048
Reduce input records=10000000
Reduce output records=10000000
Spilled Records=29395241
Shuffled Maps =8
Failed Shuffles=0
Merged Map outputs=8
GC time elapsed (ms)=890
Total committed heap usage (bytes)=4084727808
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000
File Output Format Counters
Bytes Written=1000000000
18/04/15 05:02:58 INFO terasort.TeraSort: done
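The final counters can be cross-checked against the 1 GB dataset by hand. This is a small sketch of one plausible accounting, assuming the standard TeraSort record layout (100-byte records: 10-byte key + 90-byte value); the per-record framing figures are back-calculated from the counters above, not taken from the Hadoop source.

```shell
# Cross-check the job counters against the dataset: TeraGen writes
# 100-byte records (10-byte key + 90-byte value), and this job read 10M of them.
records=10000000
input_bytes=$(( records * 100 ))
echo "input: $input_bytes"          # 1000000000 -- matches "File Input Format Counters: Bytes Read"

# Map output serializes key and value as Text, each with a 1-byte length prefix:
map_output=$(( records * 102 ))
echo "map output: $map_output"      # 1020000000 -- matches "Map output bytes"

# Materialized (IFile) output adds two more length vints per record, plus a
# small per-segment trailer; 104 bytes/record over the 8 map segments lines up:
materialized=$(( records * 104 + 8 * 6 ))
echo "materialized: $materialized"  # 1040000048 -- matches "Map output materialized bytes"
```

The same arithmetic explains the reduce side: "Reduce shuffle bytes=1040000048" is exactly the materialized map output, and "HDFS: Number of bytes written=1000000000" is the sorted data written back out.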
Sanity check: verifying the sorted output with TeraValidate
[mapred@ip-172-31-44-80 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teravalidate sorted-data report
18/04/15 05:06:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 05:06:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 05:06:36 INFO input.FileInputFormat: Total input files to process : 1
Spent 37ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
18/04/15 05:06:36 INFO mapreduce.JobSubmitter: number of splits:1
18/04/15 05:06:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1823691639_0001
18/04/15 05:06:36 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 05:06:36 INFO mapreduce.Job: Running job: job_local1823691639_0001
18/04/15 05:06:36 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 05:06:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:36 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 05:06:36 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 05:06:36 INFO mapred.LocalJobRunner: Starting task: attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:36 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:06:36 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/sorted-data/part-r-00000:0+1000000000
18/04/15 05:06:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:06:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:06:36 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:06:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:06:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:06:36 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:06:37 INFO mapreduce.Job: Job job_local1823691639_0001 running in uber mode : false
18/04/15 05:06:37 INFO mapreduce.Job: map 0% reduce 0%
18/04/15 05:06:42 INFO mapred.LocalJobRunner:
18/04/15 05:06:42 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:06:42 INFO mapred.MapTask: Spilling map output
18/04/15 05:06:42 INFO mapred.MapTask: bufstart = 0; bufend = 82; bufvoid = 104857600
18/04/15 05:06:42 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
18/04/15 05:06:42 INFO mapred.MapTask: Finished spill 0
18/04/15 05:06:42 INFO mapred.Task: Task:attempt_local1823691639_0001_m_000000_0 is done. And is in the process of committing
18/04/15 05:06:42 INFO mapred.LocalJobRunner: map
18/04/15 05:06:42 INFO mapred.Task: Task 'attempt_local1823691639_0001_m_000000_0' done.
18/04/15 05:06:42 INFO mapred.Task: Final Counters for attempt_local1823691639_0001_m_000000_0: Counters: 22
File System Counters
FILE: Number of bytes read=302147
FILE: Number of bytes written=675681
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000000
HDFS: Number of bytes written=0
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=10000000
Map output records=3
Map output bytes=82
Map output materialized bytes=94
Input split bytes=123
Combine input records=0
Spilled Records=3
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=74
Total committed heap usage (bytes)=296747008
File Input Format Counters
Bytes Read=1000000000
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:42 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Starting task: attempt_local1823691639_0001_r_000000_0
18/04/15 05:06:42 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:42 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:42 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:06:42 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5cde5e61
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/04/15 05:06:42 INFO reduce.EventFetcher: attempt_local1823691639_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/04/15 05:06:42 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1823691639_0001_m_000000_0 decomp: 90 len: 94 to MEMORY
18/04/15 05:06:42 INFO reduce.InMemoryMapOutput: Read 90 bytes from map-output for attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 90, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->90
18/04/15 05:06:42 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/04/15 05:06:42 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:06:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merged 1 segments, 90 bytes to disk to satisfy reduce memory limit
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merging 1 files, 94 bytes from disk
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/04/15 05:06:42 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:06:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/15 05:06:42 INFO mapred.Task: Task:attempt_local1823691639_0001_r_000000_0 is done. And is in the process of committing
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO mapred.Task: Task attempt_local1823691639_0001_r_000000_0 is allowed to commit now
18/04/15 05:06:42 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1823691639_0001_r_000000_0' to hdfs://localhost:8020/user/mapred/report/_temporary/0/task_local1823691639_0001_r_000000
18/04/15 05:06:42 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:06:42 INFO mapred.Task: Task 'attempt_local1823691639_0001_r_000000_0' done.
18/04/15 05:06:42 INFO mapred.Task: Final Counters for attempt_local1823691639_0001_r_000000_0: Counters: 29
File System Counters
FILE: Number of bytes read=302367
FILE: Number of bytes written=675775
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000000
HDFS: Number of bytes written=24
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=94
Reduce input records=3
Reduce output records=1
Spilled Records=3
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=296747008
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=24
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local1823691639_0001_r_000000_0
18/04/15 05:06:42 INFO mapred.LocalJobRunner: reduce task executor complete.
18/04/15 05:06:43 INFO mapreduce.Job: map 100% reduce 100%
18/04/15 05:06:43 INFO mapreduce.Job: Job job_local1823691639_0001 completed successfully
18/04/15 05:06:43 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=604514
FILE: Number of bytes written=1351456
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2000000000
HDFS: Number of bytes written=24
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=10000000
Map output records=3
Map output bytes=82
Map output materialized bytes=94
Input split bytes=123
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=94
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=74
Total committed heap usage (bytes)=593494016
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000
File Output Format Counters
Bytes Written=24
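The verdict lands in the `report` output directory (presumably the 24 bytes written above are its one summary line; it can be viewed with `hadoop fs -cat report/part-r-00000`). Conceptually, TeraValidate just confirms that every key is greater than or equal to its predecessor. As a minimal local illustration of that idea (not the actual TeraValidate implementation), `sort -c` performs the same nondecreasing-order check on a plain file:

```shell
# A miniature of what TeraValidate checks: keys must be nondecreasing.
# sort -c exits 0 only when the input is already in order.
printf 'apple\nbanana\ncherry\n' > /tmp/in-order.txt
printf 'banana\napple\ncherry\n' > /tmp/misordered.txt

sort -c /tmp/in-order.txt && echo "in-order: ok"
sort -c /tmp/misordered.txt 2>/dev/null || echo "misordered: detected"
```

If TeraValidate had found any out-of-order keys, the job would have emitted error records into `report` instead of completing cleanly as it does here.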
References
- Tom White, *Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale* (O'Reilly) https://www.amazon.co.jp/Hadoop-Definitive-Storage-Analysis-Internet/dp/1491901632