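Resource overview
Course project for the Peking University course "Network Big Data Management and Applications": PageRank analysis of Weibo data, with 4 Spark and 2 Hadoop implementations.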
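For orientation, here is a minimal in-memory sketch of the PageRank iteration that the Spark and Hadoop jobs distribute. It is not taken from the archive: the class name `PageRankSketch`, the damping factor 0.85, and the update rule `(1 - d) + d * sum` are assumptions chosen for illustration. The initial rank of 1.0 and the self-loop for nodes without outgoing edges mirror what the Hadoop snippet does in its graph-initialization step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PageRankSketch {
    // Build an adjacency list from (source, target) pairs.
    // A node without outgoing edges gets an edge to itself.
    static Map<String, List<String>> buildGraph(String[][] edges) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (String[] edge : edges) {
            out.computeIfAbsent(edge[0], k -> new TreeSet<>()).add(edge[1]);
            out.computeIfAbsent(edge[1], k -> new TreeSet<>());
        }
        Map<String, List<String>> graph = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : out.entrySet()) {
            List<String> targets = e.getValue().isEmpty()
                    ? Collections.singletonList(e.getKey())
                    : new ArrayList<String>(e.getValue());
            graph.put(e.getKey(), targets);
        }
        return graph;
    }

    // One iteration: every node splits its rank evenly among its targets.
    // The update rule (1 - d) + d * sum is an assumption, not the archive's code.
    static Map<String, Double> iterate(Map<String, List<String>> graph,
                                       Map<String, Double> ranks,
                                       double damping) {
        Map<String, Double> next = new TreeMap<>();
        for (String node : graph.keySet()) {
            next.put(node, 1.0 - damping);
        }
        for (Map.Entry<String, List<String>> e : graph.entrySet()) {
            double share = ranks.get(e.getKey()) / e.getValue().size();
            for (String target : e.getValue()) {
                next.merge(target, damping * share, Double::sum);
            }
        }
        return next;
    }

    // Start every node at rank 1.0 and run a fixed number of iterations.
    static Map<String, Double> pageRank(String[][] edges, int iterations, double damping) {
        Map<String, List<String>> graph = buildGraph(edges);
        Map<String, Double> ranks = new TreeMap<>();
        for (String node : graph.keySet()) {
            ranks.put(node, 1.0);
        }
        for (int i = 0; i < iterations; i++) {
            ranks = iterate(graph, ranks, damping);
        }
        return ranks;
    }

    public static void main(String[] args) {
        String[][] edges = {{"a", "b"}, {"b", "c"}, {"c", "a"}, {"d", "a"}};
        pageRank(edges, 20, 0.85).forEach((node, rank) -> System.out.println(node + "\t" + rank));
    }
}
```

Because every node ends up with at least one outgoing edge, the ranks always sum to the number of nodes under this update rule, which makes a convenient sanity check.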

Code snippet and file information
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.*;
import java.util.*;

public class hadoopPageRank {
    // Job 1, map: read one "source<TAB>target" edge per line and emit the edge
    // plus a "node" marker for both endpoints, so every node reaches the reducer.
    public static class initGraphMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] lineSplit = value.toString().split("\t");
            // remove the surrounding double quotes from both IDs
            lineSplit[0] = StringUtils.strip(lineSplit[0], "\"");
            lineSplit[1] = StringUtils.strip(lineSplit[1], "\"");
            // skip edges where either endpoint is "0"
            if (lineSplit[1].equals("0") || lineSplit[0].equals("0")) return;
            context.write(new Text(lineSplit[0]), new Text(lineSplit[1]));
            context.write(new Text(lineSplit[0]), new Text("node"));
            context.write(new Text(lineSplit[1]), new Text("node"));
        }
    }

    // Job 1, reduce: write "node<TAB>1.0<TAB>neighbor1<TAB>neighbor2...",
    // i.e. the adjacency list with an initial rank of 1.0.
    public static class initGraphReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Set<String> set = new HashSet<>();
            for (Text text : values) {
                String val = text.toString();
                if (val.equals("node")) continue;
                set.add(val);
            }
            // if a node does not have out edges then add an edge to itself
            if (set.size() > 0) context.write(key, new Text("1.0" + "\t" + String.join("\t", set)));
            else context.write(key, new Text("1.0" + "\t" + key.toString()));
        }
    }

    // Job 2, map: split a node's rank evenly among its neighbors and
    // re-emit each edge so the graph structure survives the iteration.
    public static class pageRankMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // line layout: node, rank, neighbor...
            String[] lineSplit = value.toString().split("\t");
            Double rank = Double.parseDouble(lineSplit[1]);
            int size = lineSplit.length - 2;
            for (int i = 0; i < lineSplit.length; i++) {
                if (i == 0 || i == 1) continue;
                context.write(new Text(lineSplit[i]), new Text("rank" + " " + rank / size));
                context.write(new Text(lineSplit[0]), new Text(lineSplit[i]));
            }
        }
    }

    // Job 2, reduce (the preview is cut off inside this method)
    public static class pageRankReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Double res = 0.0;
            // ...
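The preview stops inside `pageRankReducer`, so the author's actual update rule is not visible. As a hedged illustration only, the sketch below shows one way a reduce step could combine the two kinds of values that `pageRankMapper` emits for a node: `"rank <contribution>"` strings and plain neighbor IDs. The class `ReduceStepSketch`, the method `reduceNode`, and the damping factor are hypothetical and not part of the archive; the code uses plain Java collections instead of Hadoop types so it runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReduceStepSketch {
    // Hypothetical reduce step: sum the incoming rank contributions, collect the
    // outgoing neighbors, and rebuild a "node<TAB>rank<TAB>neighbor..." line that
    // the next iteration's mapper can parse. The damping rule is an assumption.
    static String reduceNode(String key, List<String> values, double damping) {
        double sum = 0.0;
        Set<String> neighbors = new TreeSet<>();
        for (String val : values) {
            if (val.startsWith("rank ")) {
                // contribution sent by a node that links to this key
                sum += Double.parseDouble(val.substring("rank ".length()));
            } else {
                // an outgoing edge of this key, passed through unchanged
                neighbors.add(val);
            }
        }
        double rank = (1.0 - damping) + damping * sum;
        return key + "\t" + rank + "\t" + String.join("\t", neighbors);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("rank 0.5", "c", "rank 0.5", "a");
        System.out.println(reduceNode("b", values, 0.85));
    }
}
```

Keeping the output in the same tab-separated layout as the graph-initialization step means the output of one iteration can be fed directly into the next.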

Type             Size  Date       Time   Name
----------- ---------  ---------- -----  ----
Directory           0  2017-12-09 15:39  input_small\
File          7105030  2017-11-25 15:15  input_small\page_rank_data_small.txt
File             2560  2017-12-01 14:49  pom.xm
Directory           0  2017-11-25 15:11  src\
Directory           0  2017-11-25 15:11  src\main\
Directory           0  2017-12-09 17:20  src\main\java\
File             9335  2017-12-02 10:35  src\main\java\hadoopPageRank.java
File             9207  2017-12-09 16:02  src\main\java\hadoopPageRankAverage.java
File             4395  2017-12-08 11:04  src\main\java\sparkPageRank.java
File             4389  2017-12-08 14:41  src\main\java\sparkPageRankAverage.java
File             8083  2017-12-09 17:20  src\main\java\sparkPageRankAverageV2.java
File             3136  2017-12-01 19:30  src\main\java\sparkPageRankBasic.java
File             3110  2017-12-01 23:23  src\main\java\sparkPageRankHashMap.java
File             6952  2017-12-08 13:46  src\main\java\sparkPageRankV2.java
Directory           0  2017-12-02 11:02  src\main\resources\
File              327  2017-12-02 11:02  src\main\resources\log4j.properties
Directory           0  2017-11-25 15:11  src\test\
Directory           0  2017-11-25 15:11  src\test\java\