Finding the largest line and its line number using Spark Java
I am facing a problem in which I have to find the largest line and its index. Here is my approach:

SparkConf conf = new SparkConf().setMaster("local").setAppName("basicavg");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> rdd = sc.textFile("/home/impadmin/ravi.txt");

// Key each line by its word count (number of space-separated tokens).
JavaRDD<Tuple2<Integer, String>> words = rdd.map(new Function<String, Tuple2<Integer, String>>() {
    @Override
    public Tuple2<Integer, String> call(String v1) throws Exception {
        return new Tuple2<Integer, String>(v1.split(" ").length, v1);
    }
});

// Sort by word count, descending, and print the first (largest) entry.
JavaPairRDD<Integer, String> linNoToWord = JavaPairRDD.fromJavaRDD(words).sortByKey(false);
System.out.println(linNoToWord.first()._1 + " ********************* " + linNoToWord.first()._2);

Best answer
Since you need both the line number and the text, try this.
First create a serializable class Line:
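// Requires: import java.io.Serializable;
public static class Line implements Serializable {
    public Line(Long lineNo, String text) {
        lineNo_ = lineNo;
        text_ = text;
    }

    public Long lineNo_;   // index of the line within the file
    public String text_;   // content of the line
}

Then do the following: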
SparkConf conf = new SparkConf().setMaster("local[1]").setAppName("basicavg");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> rdd = sc.textFile("/home/impadmin/words.txt");

// zipWithIndex() pairs each line with its zero-based index; key the result by word count.
JavaPairRDD<Long, Line> linNoToWord2 = rdd.zipWithIndex().mapToPair(
    new PairFunction<Tuple2<String, Long>, Long, Line>() {
        public Tuple2<Long, Line> call(Tuple2<String, Long> t) {
            return new Tuple2<Long, Line>(Long.valueOf(t._1.split(" ").length), new Line(t._2, t._1));
        }
    }).sortByKey(false);

// The index of the same line is available as linNoToWord2.first()._2.lineNo_
System.out.println(linNoToWord2.first()._1 + " ********************* " + linNoToWord2.first()._2.text_);

Alternatively, sort the tuple RDD by its key (the word count) in descending order, so that the first element of the sorted RDD is the line with the most words:
JavaRDD<String> rdd = sc.textFile("/home/impadmin/ravi.txt");

// Pair each line with its word count.
JavaRDD<Tuple2<Integer, String>> tupleRDD = rdd.map(new Function<String, Tuple2<Integer, String>>() {
    @Override
    public Tuple2<Integer, String> call(String v1) throws Exception {
        return new Tuple2<Integer, String>(v1.split(" ").length, v1);
    }
});

// Sort by word count, descending, into a single partition.
JavaRDD<Tuple2<Integer, String>> tupleRDD1 = tupleRDD.sortBy(new Function<Tuple2<Integer, String>, Integer>() {
    @Override
    public Integer call(Tuple2<Integer, String> v1) throws Exception {
        return v1._1;
    }
}, false, 1);

System.out.println(tupleRDD1.first());
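Both approaches above sort the whole RDD just to read its first element. If only the single largest line is needed, a one-pass reduction that keeps the larger of two candidates should be enough. The sketch below shows that logic in plain Java, without Spark, so it can be run on its own. `LongestLineSketch`, `IndexedLine`, `LARGER`, and `longest` are names made up for illustration and do not come from the answers. In Spark, I would expect the same kind of combiner to work with `reduce` on the RDD obtained after `zipWithIndex()`, but that variant is an assumption and has not been tested here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java sketch (no Spark dependency) of a single-pass "keep the larger one" reduction.
class LongestLineSketch {

    // Hypothetical holder for a line, its zero-based index, and its word count.
    static final class IndexedLine {
        final long lineNo;
        final String text;
        final int wordCount;

        IndexedLine(long lineNo, String text) {
            this.lineNo = lineNo;
            this.text = text;
            // Same measure as the code above: number of space-separated tokens.
            this.wordCount = text.split(" ").length;
        }
    }

    // Associative, commutative combiner: keeps the line with more words;
    // on a tie it keeps the line with the smaller index.
    static final BinaryOperator<IndexedLine> LARGER = (a, b) -> {
        if (a.wordCount != b.wordCount) {
            return a.wordCount > b.wordCount ? a : b;
        }
        return a.lineNo <= b.lineNo ? a : b;
    };

    // Pairs every line with its index, then reduces to the single largest one.
    static IndexedLine longest(List<String> lines) {
        IndexedLine best = null;
        for (int i = 0; i < lines.size(); i++) {
            IndexedLine current = new IndexedLine(i, lines.get(i));
            best = (best == null) ? current : LARGER.apply(best, current);
        }
        return best; // null when the input is empty
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("one two", "three four five six", "seven");
        IndexedLine best = longest(lines);
        // Prints: 1 ********************* three four five six
        System.out.println(best.lineNo + " ********************* " + best.text);
    }
}
```

Because the combiner is associative and commutative, the result does not depend on the order in which candidates are compared, which is the property a distributed reduction relies on.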