Setting up Hadoop 2.3 on a single machine

| March 23, 2014

In some spare time I organized my notes from an earlier Hadoop setup. This walkthrough uses the latest Hadoop release; I wanted to run some tests on a single machine, so I took the opportunity to write the process up as this document.

I. Preparing the environment

1. Hadoop is written in Java, so JDK 1.6 or newer must be installed

apt-get install openjdk-6-jdk
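To double-check before moving on, verify that the JDK is on the PATH and find its install prefix. The path in the comment is what Debian's openjdk-6 package typically uses; confirm it on your own machine, since it becomes JAVA_HOME later.

java -version
readlink -f $(which java) # e.g. /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java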

2. Hadoop starts the daemons on slave hosts over SSH, so OpenSSH must be installed

apt-get install openssh-server

3. Hadoop releases move quickly; we will install the latest version, Hadoop 2.3

4. Configure the matching hosts entries, and disable iptables and SELinux (steps omitted)

5. Create the same user on every node and set up passwordless authentication

II. Configuring passwordless SSH login

helight:data1$ ssh-keygen -t rsa # keep pressing Enter at every prompt to generate the key pair

helight:data1$ cd

helight:~$ ls

helight:~$ ls -a

. .. .bash_logout .bashrc .profile .ssh

helight:~$ cd .ssh/

helight:.ssh$ ls

id_rsa id_rsa.pub

helight:.ssh$ cat id_rsa.pub >> authorized_keys

helight:.ssh$ ls

authorized_keys id_rsa id_rsa.pub

helight:.ssh$ chmod 600 authorized_keys

helight:.ssh$ chmod 700 ../.ssh # the directory permissions must be 700

root@debian:/data1# vim /etc/ssh/sshd_config # enable public-key (RSA) authentication

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile %h/.ssh/authorized_keys
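After changing sshd_config, restart the SSH service and confirm that a local login no longer asks for a password. The service name below is as on Debian; adjust it for your distribution.

root@debian:/data1# service ssh restart

helight:~$ ssh localhost # should drop into a shell with no password prompt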

III. Installing and configuring Hadoop

1. Download

Download a Hadoop release from http://hadoop.apache.org/. I downloaded version 2.3.0, the latest at the time of writing, and unpacked it into the local filesystem; I put it under the /data1/ directory.

$ tar xzf hadoop-2.3.0.tar.gz

2. Set the user environment variables. Edit ~/.bashrc and add JAVA_HOME, HADOOP_HOME, and PATH (the tarball unpacks to /data1/hadoop-2.3.0, so point HADOOP_HOME there):

JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

export JAVA_HOME

HADOOP_HOME=/data1/hadoop-2.3.0/

export HADOOP_HOME

PATH=$PATH:/data1/hadoop-2.3.0/bin:/data1/hadoop-2.3.0/sbin

export PATH

Once you have finished editing, open a new shell terminal; as you probably know, this is so the environment variables configured above take effect in the new terminal. Alternatively, run source ~/.bashrc in the current one.

Now check the result: see whether the hadoop command can be executed.

helight:data1$ hadoop version

Hadoop 2.3.0

Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123

Compiled by jenkins on 2014-02-11T13:40Z

Compiled with protoc 2.5.0

From source with checksum dfe46336fbc6a044bc124392ec06b85

This command was run using /data1/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar

helight:data1$

3. Hadoop environment variable configuration

cd hadoop-2.3.0/etc/hadoop

vim hadoop-env.sh

Replace the default export JAVA_HOME=${JAVA_HOME} line with an explicit path, and append HADOOP_HOME and PATH:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

export HADOOP_HOME=/data1/hadoop-2.3.0/

PATH=$PATH:/data1/hadoop-2.3.0/bin:/data1/hadoop-2.3.0/sbin

export PATH

4. slaves: set the slave nodes

There is only one node here, so the file contains just localhost. With multiple nodes you need hosts entries, the hostnames added to this file, and passwordless SSH login to each of them, as sketched after this listing.

helight:hadoop$ vi slaves

localhost
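For a multi-node layout the idea is the same. A hypothetical two-slave sketch (the hostnames and IPs below are made up for illustration) pairs /etc/hosts entries on every machine with hostnames in slaves:

192.168.1.11 datanode1 # add to /etc/hosts on every node

192.168.1.12 datanode2

helight:hadoop$ cat slaves

datanode1

datanode2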

5. Core configuration file: core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
  </property>
</configuration>

6. HDFS storage configuration file: hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <!-- number of block replicas; the default is 3, set to 1 for a single node -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
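The directories named in core-site.xml and hdfs-site.xml are not guaranteed to be created for you, so it is safest to create them up front, owned by the user that runs the daemons (helight here, matching the prompts above):

mkdir -p /home/hadoop/tmp /home/hadoop/dfs/name /home/hadoop/dfs/data

chown -R helight /home/hadoop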

7. Resource manager configuration file: yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- the key must match the aux-service name above: mapreduce_shuffle -->
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

8. MapReduce framework configuration file: mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
</configuration>
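Two caveats, assuming a stock 2.3.0 tarball: etc/hadoop ships only mapred-site.xml.template, so the file must first be created from the template; and start-all.sh does not start the job history server, so the two jobhistory addresses above only answer once that daemon is launched by hand:

helight:hadoop$ cp mapred-site.xml.template mapred-site.xml

helight:sbin$ ./mr-jobhistory-daemon.sh start historyserver # run after HDFS/YARN are up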

IV. Formatting the filesystem and starting Hadoop

1. Format a new distributed filesystem (hdfs namenode -format)

This initializes the HDFS directory structure and metadata.

helight:sbin$ hdfs namenode -format

2. Start Hadoop:

Here I used start-all.sh. This script actually just runs start-dfs.sh and start-yarn.sh, so you can also start things in two stages, as shown below: start-dfs.sh brings up HDFS, and start-yarn.sh brings up the MapReduce scheduling framework. The start-all.sh run itself follows.
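The equivalent two-stage startup, from the same sbin directory (the daemons each script starts are noted in the comments):

helight:sbin$ ./start-dfs.sh # NameNode, DataNode, SecondaryNameNode

helight:sbin$ ./start-yarn.sh # ResourceManager, NodeManager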

helight:sbin$ ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

14/03/23 22:38:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [localhost]

localhost: starting namenode, logging to /data1/hadoop-2.3.0/logs/hadoop-helight-namenode-debian.out

localhost: starting datanode, logging to /data1/hadoop-2.3.0/logs/hadoop-helight-datanode-debian.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /data1/hadoop-2.3.0/logs/hadoop-helight-secondarynamenode-debian.out

14/03/23 22:39:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /data1/hadoop-2.3.0/logs/yarn-helight-resourcemanager-debian.out

localhost: starting nodemanager, logging to /data1/hadoop-2.3.0/logs/yarn-helight-nodemanager-debian.out

3. Use jps to check whether the daemons started

helight:sbin$ jps

14506 NodeManager

14205 SecondaryNameNode

14720 Jps

14072 DataNode

13981 NameNode

14411 ResourceManager
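If any of the six processes is missing, its log file under $HADOOP_HOME/logs is the first place to look; the file names follow the pattern visible in the startup messages above, with the .log files holding the full logs. For example, for the NameNode on this machine:

helight:sbin$ tail -n 50 /data1/hadoop-2.3.0/logs/hadoop-helight-namenode-debian.log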

4. Check the cluster status

hdfs dfsadmin -report

14/03/23 22:43:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Configured Capacity: 83225169920 (77.51 GB)

Present Capacity: 78803722240 (73.39 GB)

DFS Remaining: 78803697664 (73.39 GB)

DFS Used: 24576 (24 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0


Datanodes available: 1 (1 total, 0 dead)

Live datanodes:

Name: 127.0.0.1:50010 (localhost)

Hostname: debian.xu

Decommission Status : Normal

Configured Capacity: 83225169920 (77.51 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 4421447680 (4.12 GB)

DFS Remaining: 78803697664 (73.39 GB)

DFS Used%: 0.00%

DFS Remaining%: 94.69%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Last contact: Sun Mar 23 22:43:48 HKT 2014

5. Browse resources via the web UI (http://localhost:8088)

This is mainly for watching MapReduce job and application management.

6. Check the HDFS status (http://localhost:50070)

Here you can see how HDFS is running; it serves as a simple operations console for HDFS.
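As a final smoke test, the examples jar bundled with the release can run a tiny MapReduce job end to end. The jar path is as shipped in the 2.3.0 tarball; the pi arguments mean 2 map tasks with 10 samples each, deliberately small:

helight:hadoop-2.3.0$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 2 10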
