【混沌初开】001


1. Use VMware Workstation to install an Ubuntu virtual machine (Master.Hadoop).

2. Configure the VM's IP address and hostname.

2.1 Configure the IP:

root@Master:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.142.141
netmask 255.255.255.0
gateway 192.168.142.1
root@Master:~#

2.2 Configure the hostname:

root@Master:~# cat /etc/hostname
Master.Hadoop

3. Create a hadoop group:

3.1 As root, run: groupadd hadoop

4. Create a hadoop user:

4.1 As root, run: useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop

5. Install the JDK and configure environment variables.

5.1 Download the Linux JDK package and upload it to /opt/java.
5.2 Unpack the package; the extracted directory is /opt/java/jdk1.7.0_79.
5.3 Configure the environment variables (append to the file):

vi /etc/profile
...
export JAVA_HOME=/opt/java/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:

6. Clone the VM, open the copies in VMware, and change each one's IP address and hostname, giving three virtual Ubuntu nodes:

6.1 Master.Hadoop: 192.168.142.141
6.2 Slave1.Hadoop: 192.168.142.142
6.3 Slave2.Hadoop: 192.168.142.143
6.4 Change the IP address: vi /etc/network/interfaces
6.5 Change the hostname: vi /etc/hostname

7. Configure the hosts file on all three VMs, adding the hostname mappings for all three nodes (a connectivity check sketch follows the core-site.xml listing below):

root@Master:~# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       ubuntu-cc.localdomain   ubuntu-cc
192.168.142.141 Master.Hadoop
192.168.142.142 Slave1.Hadoop
192.168.142.143 Slave2.Hadoop

8. Set up passwordless SSH login (a key-distribution and login-check sketch also follows the core-site.xml listing below):

8.1 As the hadoop user on Master, run: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
8.2 As the hadoop user on Master, run (in ~/.ssh): cp id_dsa.pub authorized_keys
8.3 On Slave1 and Slave2, run: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
8.4 Append Master's id_dsa.pub to the authorized_keys file on Slave1 and Slave2.
8.5 Test the passwordless login: ssh Slave1.Hadoop

9. Download the latest Hadoop release package.

10. Upload the downloaded hadoop-2.6.4.tar.gz to the hadoop user's home directory (/home/hadoop) and unpack it.

11. Go to /home/hadoop/hadoop-2.6.4/etc/hadoop and start configuring Hadoop.

11.1 vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master.Hadoop:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

 
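Once /etc/hosts is consistent on all three nodes (step 7), it is worth confirming that every hostname resolves and responds before touching SSH. A minimal sketch, run from Master:

# Each hostname should resolve to its static IP and answer a ping.
for host in Master.Hadoop Slave1.Hadoop Slave2.Hadoop; do
    ping -c 1 "$host" && echo "$host OK"
done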

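The key distribution in step 8.4 can also be done with ssh-copy-id (a standard OpenSSH helper) instead of appending the file by hand. A sketch, run as the hadoop user on Master and assuming the DSA key generated in 8.1:

# Push Master's public key to each slave (asks for the password once per host).
for host in Slave1.Hadoop Slave2.Hadoop; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@"$host"
done

# Each login should now succeed without a password prompt.
for host in Slave1.Hadoop Slave2.Hadoop; do
    ssh hadoop@"$host" hostname
done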
11.2 vi hadoop-env.sh and yarn-env.sh; add the following line near the top of each file. The path must be absolute, otherwise startup fails with "JAVA_HOME is not set and could not be found." (a quick check follows the hdfs-site.xml listing below):

export JAVA_HOME=/opt/java/jdk1.7.0_79

11.3 vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master.Hadoop:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

   

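Before moving on, the JAVA_HOME edits from 11.2 can be sanity-checked, since a wrong path only surfaces later when start-dfs.sh aborts. A minimal sketch:

# The export must appear in both files with an absolute path...
cd /home/hadoop/hadoop-2.6.4/etc/hadoop
grep '^export JAVA_HOME' hadoop-env.sh yarn-env.sh

# ...and that path must actually contain a working java binary.
/opt/java/jdk1.7.0_79/bin/java -version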
    11.4 vi mapred-site.xml (if the file does not exist, copy it from the template file; see the sketch after the listing below)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>Master.Hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master.Hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master.Hadoop:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://Master.Hadoop:9001</value>
    </property>
</configuration>

 

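For step 11.4, the binary distribution ships only a template for this file. A minimal sketch of the copy step, run before making the edits shown above:

# mapred-site.xml does not exist by default; create it from the shipped template.
cd /home/hadoop/hadoop-2.6.4/etc/hadoop
cp mapred-site.xml.template mapred-site.xml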
    11.5 vi yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master.Hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:8088</value>
    </property>
</configuration>

       

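With all four site files edited (11.1 through 11.5), a quick well-formedness check catches stray or unclosed tags before the daemons try to parse them. A sketch, assuming xmllint (from Ubuntu's libxml2-utils package) is available:

cd /home/hadoop/hadoop-2.6.4/etc/hadoop
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    xmllint --noout "$f" && echo "$f OK"
done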
12. In the hadoop user's home directory (/home/hadoop), create the working directories: mkdir tmp dfs dfs/name dfs/data. (At this point the single-machine server configuration is complete.)

13. Single-machine verification:

13.1 hadoop@Master:~/hadoop-2.6.4/bin$ hdfs namenode -format
13.2 hadoop@Master:~/hadoop-2.6.4/bin$ cd ../sbin/
13.3 hadoop@Master:~/hadoop-2.6.4/sbin$ start-dfs.sh
13.4 hadoop@Master:~/hadoop-2.6.4/sbin$ start-yarn.sh
13.5 hadoop@Master:~/hadoop-2.6.4/sbin$ mr-jobhistory-daemon.sh start historyserver
13.6 Verification URL 1: http://192.168.142.141:50070/
13.7 Verification URL 2: http://192.168.142.141:8088/

14. Run: hadoop dfsadmin -report (a read/write smoke test follows the report below)

hadoop@Master:~/hadoop-2.6.4/bin$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 121819234304 (113.45 GB)
Present Capacity: 111630487552 (103.96 GB)
DFS Remaining: 111630434304 (103.96 GB)
DFS Used: 53248 (52 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.142.143:50010 (Slave2.Hadoop)
Hostname: Slave2.Hadoop
Decommission Status : Normal
Configured Capacity: 60909617152 (56.73 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5094354944 (4.74 GB)
DFS Remaining: 55815237632 (51.98 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.64%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Feb 23 17:18:59 CST 2016

Name: 192.168.142.142:50010 (Slave1.Hadoop)
Hostname: Slave1.Hadoop
Decommission Status : Normal
Configured Capacity: 60909617152 (56.73 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 5094391808 (4.74 GB)
DFS Remaining: 55815196672 (51.98 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.64%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Feb 23 17:18:59 CST 2016
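Beyond dfsadmin -report, a quick read/write round trip confirms HDFS is actually usable. A minimal sketch, run as the hadoop user on Master; the /smoketest path and the test string are arbitrary illustrations:

# Write a small local file into HDFS.
hdfs dfs -mkdir -p /smoketest
echo "hello hdfs" > /tmp/smoke.txt
hdfs dfs -put /tmp/smoke.txt /smoketest/

# Read it back; the output should match the original line.
hdfs dfs -cat /smoketest/smoke.txt

# Clean up.
hdfs dfs -rm -r /smoketest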
15. Configure the slaves file:

15.1 Go to /home/hadoop/hadoop-2.6.4/etc/hadoop
15.2 Add the node entries:

hadoop@Master:~/hadoop-2.6.4/etc/hadoop$ vi slaves
Slave1.Hadoop
Slave2.Hadoop

16. Distribute Hadoop to the slave machines:

16.1 scp -r /home/hadoop/hadoop-2.6.4 192.168.142.142:/home/hadoop
16.2 scp -r /home/hadoop/hadoop-2.6.4 192.168.142.143:/home/hadoop

17. Final verification

17.1 Restart the services:

hadoop@Master:~/hadoop-2.6.4/sbin$ ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [Master.Hadoop]
Master.Hadoop: stopping namenode
Slave1.Hadoop: stopping datanode
Slave2.Hadoop: stopping datanode
stopping yarn daemons
stopping resourcemanager
Slave2.Hadoop: stopping nodemanager
Slave1.Hadoop: stopping nodemanager
no proxyserver to stop
hadoop@Master:~/hadoop-2.6.4/sbin$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [Master.Hadoop]
Master.Hadoop: starting namenode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave2.Hadoop: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
Slave1.Hadoop: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-Master.Hadoop.out
Slave1.Hadoop: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-Slave1.Hadoop.out
Slave2.Hadoop: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-Slave2.Hadoop.out
hadoop@Master:~/hadoop-2.6.4/sbin$

17.2 Processes on the Master machine:

hadoop@Master:~/hadoop-2.6.4/sbin$ jps
4870 NameNode
5112 ResourceManager
2897 JobHistoryServer
5384 Jps

17.3 Processes on a Slave machine:

hadoop@Slave1:~$ jps
2614 DataNode
2886 Jps
2758 NodeManager

18. View the nodes at: http://192.168.142.141:8088/cluster/nodes

19. Notes

19.1 Note 1: in step 15.2, if Master is not also added to the slaves file, Master will not appear in the Nodes list, and the Master machine will also lack a DataNode process.
19.2 Note 2: if sudo fails for the hadoop user, add the following line to /etc/sudoers (hadoop is the username that needs sudo): hadoop   ALL=(ALL)       ALL

20. Running the test program from Eclipse

20.1 Data preparation:

hadoop@Master:~$ hadoop fs -mkdir /input
hadoop@Master:~$ hadoop fs -put ./hadoop-2.6.4/README.txt /input
hadoop@Master:~$ hadoop fs -chmod 777 /

20.2 Code:
package com.ttfisher;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into tokens and emits a (word, 1) pair per token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

    20.3 Run arguments:

hdfs://Master.Hadoop:9000/input/README.txt hdfs://Master.Hadoop:9000/output

    20.4 VM argument: -Xmx512m

    20.5 Run on hadoop (a command-line alternative follows)
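If Eclipse is not available, the same job can be packaged and submitted from the command line. A sketch, assuming the class was compiled into Eclipse's default bin/ output directory; the jar name wordcount.jar is an arbitrary choice:

# Package the compiled classes (bin/ is an assumed Eclipse output path).
jar cf wordcount.jar -C bin .

# Submit to the cluster; the output directory must not already exist.
hadoop jar wordcount.jar com.ttfisher.WordCount /input /output

# Inspect the result (a single reducer writes one part file).
hadoop fs -cat /output/part-r-00000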
