Running Cloudera in Distributed Mode

This section contains instructions for installing the Cloudera Distribution for Hadoop (CDH3) in distributed mode on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 on Debian-based systems, listing every command, with a description, required to install Cloudera in distributed mode (a multi-node cluster).

Prerequisite: Before starting Cloudera in distributed mode you must first set up Cloudera in pseudo-distributed mode, and you need at least two machines, one master and one slave (you can also create more than one virtual machine on a single physical machine to form the cluster).

Deploy Cloudera (CDH3) on the cluster:

COMMAND / DESCRIPTION

for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
Before starting Cloudera in distributed mode, first stop the Hadoop daemons on every node.

update-alternatives --display hadoop-0.20-conf
List the alternative Hadoop configurations on your system.

cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
Copy the default configuration to your custom directory.

update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
Activate the new configuration on your systems.

update-alternatives --display hadoop-0.20-conf
Check the new configuration on your systems.

or

update-alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster
Manually set the configuration.

vi /etc/hosts
Then add one line per node, mapping each IP address to its hostname:
192.168.0.1 master
192.168.0.2 slave

sudo apt-get install openssh-server openssh-client
Install the ssh server and client.

ssh-keygen -t rsa -P ""
Generate an RSA key pair for passwordless ssh.

ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave
Copy the public key to the slave to enable passwordless ssh.
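Before continuing, it is worth confirming that passwordless ssh actually works; a quick check (assuming the slave is reachable by the hostname slave configured in /etc/hosts above):

ssh slave hostname

This should print the slave's hostname without prompting for a password; if you are asked for one, repeat the ssh-keygen and ssh-copy-id steps.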
Now go to your custom directory (conf.cluster) and change the configuration files. Example contents of both files follow this list.

vi masters
Erase the old contents and type: master
Despite its name, the masters file does not pick the namenode (that is set by fs.default.name in core-site.xml, below); it lists the host(s) on which the secondary namenode daemon runs.

vi slaves
Erase the old contents and type: slave
The slaves file lists the hosts, one per line, on which the Hadoop slave daemons (datanodes and tasktrackers) will run.
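For the two-machine layout used in this tutorial, the two files end up containing one hostname per line (add a line to slaves for each additional slave you bring into the cluster):

conf.cluster/masters:
master

conf.cluster/slaves:
slave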
vi core-site.xml
Edit the configuration file core-site.xml and add, inside its <configuration> element:

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>

fs.default.name points every node at the namenode running on master.
vi mapred-site.xml
Edit the configuration file mapred-site.xml and add, inside its <configuration> element:

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>

mapred.job.tracker points every node at the jobtracker running on master.
vi hdfs-site.xml
Edit the configuration file hdfs-site.xml and add, inside its <configuration> element:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

(Set the value to the number of slaves; here it is 1 because the cluster has a single datanode. dfs.replication must not exceed the number of datanodes, or block writes will remain under-replicated.)
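For reference, all three files share the same overall shape; a minimal complete hdfs-site.xml for this setup would look like the sketch below (the XML declaration and the <configuration> wrapper should already be present in the stock CDH files, so in practice you only insert the <property> block):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>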

Now copy the /etc/hadoop-0.20/conf.cluster directory to every node in your cluster.
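One way to do this from the master is with scp (a sketch assuming the same paths on the slave and a sudo-capable account there; scp cannot write to /etc as an ordinary user, hence the hop through /tmp):

scp -r /etc/hadoop-0.20/conf.cluster slave:/tmp/conf.cluster
ssh slave 'sudo mv /tmp/conf.cluster /etc/hadoop-0.20/conf.cluster'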
update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
Set the alternatives rule on every node to activate your configuration.
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done

Restart the daemons on all nodes in your cluster using the service scripts, so that the new configuration files are read, and then stop them again.
su -s /bin/bash - hdfs -c 'hadoop namenode -format'
Format the namenode manually (before starting the namenode for the first time). Formatting erases any existing HDFS data, so run it only once, on the master.
From here on, you must run each command on the correct server, according to its role.
On the master, start the namenode, secondary namenode, and jobtracker daemons:

/etc/init.d/hadoop-0.20-namenode start
/etc/init.d/hadoop-0.20-secondarynamenode start
/etc/init.d/hadoop-0.20-jobtracker start

On the slave, start the datanode and tasktracker daemons:

/etc/init.d/hadoop-0.20-datanode start
/etc/init.d/hadoop-0.20-tasktracker start
Congratulations, the Cloudera CDH setup is complete.
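To sanity-check the cluster (an extra step beyond the steps above, assuming a JDK is installed so that jps is available), list the running Java daemons on each node and ask HDFS for a report:

jps
sudo -u hdfs hadoop dfsadmin -report

On the master, jps should show NameNode, SecondaryNameNode, and JobTracker; on the slave, DataNode and TaskTracker. The dfsadmin report should list one live datanode.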
