Writing a MapReduce Word-Count Job in IDEA, Packaging It, and Submitting It to a Hadoop Cluster

Updated: 2024-10-26 09:34:03


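Contents

Preface
I. Writing the MapReduce program (using word count as the example)
- 1. Dataset and requirements
- 2. pom dependencies
- 3. Writing the MapReduce code
- 4. Packaging the code
II. Submitting to the Hadoop cluster
- 1. Uploading the jar from Windows to the Linux virtual machine
- 2. Running the MapReduce jar on Hadoop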

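Preface

If the dataset is small and the requirements are simple, we can run the MapReduce program directly in IDEA and get the result. But when the dataset is large, the requirements are numerous, and the code is complex, running it in IDEA wastes a lot of time. In that case it is better to package the code and submit it to the Hadoop cluster, which saves considerable time.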

I. Writing the MapReduce program (using word count as the example)

1. Dataset and requirements

Dataset:

computer,huangxiaoming,85,86,41,75,93,42,85
computer,xuzheng,54,52,86,91,42
computer,huangbo,85,42,96,38
english,zhaobenshan,54,52,86,91,42,85,75
english,liuyifei,85,41,75,21,85,96,14
algorithm,liuyifei,75,85,62,48,54,96,15
computer,huangjiaju,85,75,86,85,85
english,liuyifei,76,95,86,74,68,74,48
english,huangdatou,48,58,67,86,15,33,85
algorithm,huanglei,76,95,86,74,68,74,48
algorithm,huangjiaju,85,75,86,85,85,74,86
computer,huangdatou,48,58,67,86,15,33,85
english,zhouqi,85,86,41,75,93,42,85,75,55,47,22
english,huangbo,85,42,96,38,55,47,22
algorithm,liutao,85,75,85,99,66
computer,huangzitao,85,86,41,75,93,42,85
math,wangbaoqiang,85,86,41,75,93,42,85
computer,liujialing,85,41,75,21,85,96,14,74,86
computer,liuyifei,75,85,62,48,54,96,15
computer,liutao,85,75,85,99,66,88,75,91
computer,huanglei,76,95,86,74,68,74,48
english,liujialing,75,85,62,48,54,96,15
math,huanglei,76,95,86,74,68,74,48
math,huangjiaju,85,75,86,85,85,74,86
math,liutao,48,58,67,86,15,33,85
english,huanglei,85,75,85,99,66,88,75,91
math,xuzheng,54,52,86,91,42,85,75
math,huangxiaoming,85,75,85,99,66,88,75,91
math,liujialing,85,86,41,75,93,42,85,75
english,huangxiaoming,85,86,41,75,93,42,85
algorithm,huangdatou,48,58,67,86,15,33,85
algorithm,huangzitao,85,86,41,75,93,42,85,75

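Requirement: for each course, count the number of students who took the exam and compute the course's average score.

Analysis:
1. First compute each student's average score.
2. Because we need the number of exam takers per course, the key that reduce receives must be the course.
3. In reduce, compute the average score and the number of exam takers for each course.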
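
The three analysis steps can be sketched in plain Java without any Hadoop classes, just to check the arithmetic before writing the real job. This is a minimal illustration, not the article's actual Mapper and Reducer: the class name `CourseStatsSketch` and the methods `map`, `reduce`, and `run` are hypothetical, the grouping with a `TreeMap` only stands in for Hadoop's shuffle, and it assumes that the course average is the mean of the per-student averages and that every input line counts as one exam taker.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job; no Hadoop dependency.
class CourseStatsSketch {

    // Step 1 ("map"): parse "course,student,score1,score2,..." and emit
    // (course, that student's average score).
    public static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(",");
        String course = fields[0];              // fields[1] is the student name
        double sum = 0;
        for (int i = 2; i < fields.length; i++) {
            sum += Integer.parseInt(fields[i].trim());
        }
        double studentAverage = sum / (fields.length - 2);
        return new SimpleEntry<>(course, studentAverage);
    }

    // Step 3 ("reduce"): for one course, return
    // {number of exam takers, mean of the student averages}.
    public static double[] reduce(List<Double> studentAverages) {
        double total = 0;
        for (double avg : studentAverages) {
            total += avg;
        }
        return new double[] { studentAverages.size(), total / studentAverages.size() };
    }

    // Step 2 (stand-in for the shuffle): group map output by course,
    // so that each reduce call sees all values for one course key.
    public static Map<String, double[]> run(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three lines taken from the dataset above.
        List<String> sample = Arrays.asList(
                "computer,xuzheng,54,52,86,91,42",
                "computer,huangbo,85,42,96,38",
                "algorithm,liutao,85,75,85,99,66");
        // Prints one line per course: course, count, average.
        run(sample).forEach((course, r) ->
                System.out.println(course + "\t" + (int) r[0] + "\t" + r[1]));
    }
}
```

In a real job the same split would apply: the mapper emits the course as the key and the student's average as the value, and the reducer iterates over the values for one course to count them and average them.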

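2. pom dependencies

To write MapReduce code in IDEA, the pom needs the following dependencies:
hadoop-hdfs, hadoop-client, hadoop-common
They are available from a Maven repository; make sure the version matches the Hadoop version installed locally.
Below is my pom.xml file: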

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>MapReduceTest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>


Published: 2024-03-06 05:57:42

Article link: https://www.elefans.com/category/jswz/34/1714542.html