博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Hadoop学习笔记——WordCount
阅读量:6295 次
发布时间:2019-06-22

本文共 3701 字,大约阅读时间需要 12 分钟。

1.在IDEA下新建工程,选择from Mevan

GroupId:WordCount

ArtifactId:com.hadoop.1st

Project name:WordCount

2.pom.xml文件

4.0.0
WordCount
com.hadoop.1st
1.0-SNAPSHOT
apache
http://maven.apache.org
org.apache.hadoop
hadoop-core
1.2.1
org.apache.hadoop
hadoop-common
2.7.1
maven-dependency-plugin
false
true
./lib

 3.main/java目录下新建WordCount.java文件

import org.apache.hadoop.conf.Configuration;import org.apache.hadoop.fs.Path;import org.apache.hadoop.io.IntWritable;import org.apache.hadoop.io.LongWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Job;import org.apache.hadoop.mapreduce.Mapper;import org.apache.hadoop.mapreduce.Reducer;import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;import java.io.IOException;import java.util.StringTokenizer;/** * Created by common on 17-3-26. */public class WordCount {    public static class WordCountMap extends            Mapper
{ private final IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer token = new StringTokenizer(line); while (token.hasMoreTokens()) { word.set(token.nextToken()); context.write(word, one); } } } public static class WordCountReduce extends Reducer
{ public void reduce(Text key, Iterable
values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf); job.setJarByClass(WordCount.class); job.setJobName("wordcount"); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordCountMap.class); job.setReducerClass(WordCountReduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); }}

 4.在src同级目录下新建input目录,以及下面的test.segmented文件

test.segmented文件内容

aabbccddaacceeffffgghhaa

4.在run configuration下设置运行方式为Application

5.运行java文件,将会生成output目录,part-r-00000为运行的结果,下次运行必须删除output目录,否则会报错

 

转载地址:http://popta.baihongyu.com/

你可能感兴趣的文章
Linux state 方式 安装nginx 服务
查看>>
LNMP(php-fpm的pool,慢执行日志,定义open_bashdir,php-fpm进程管理
查看>>
Flask rst 文档转换为html格式文件
查看>>
python 安装第三方库pygame
查看>>
Linux下的grep命令详解
查看>>
磁盘系统管理
查看>>
Linux下ftp+ssl实现ftps
查看>>
JavaScript基础
查看>>
Nginx之反向代理与负载均衡实现动静分离实战
查看>>
Object类型转换为long或者Long
查看>>
16位流应用与代码统计器例题
查看>>
linux内核中符号地址的获取
查看>>
内存对齐的问题
查看>>
分析动态代理给Spring事务埋下的坑
查看>>
从不用 try-catch 实现的 async/await 语法说错误处理
查看>>
Zabbix Python API 应用实战
查看>>
DC学院学习笔记(六):数据库和SQL语言简述
查看>>
系统自动登录及盘符无法双击打开问题处理
查看>>
IE11下载文件时出现文件名乱码
查看>>
修行的心态,积极的态度
查看>>