
Resource Description

(1) Create an RDD.
(2) Convert the RDD to a DataFrame.
(3) Call registerTempTable to register it as a table named tb_book.
(4) Use an SQL statement to query the first 15 rows.
(5) Do a fuzzy query for books whose title contains "微積分" (calculus).
(6) Output the name and price fields of the first 10 books.
(7) Count the books whose title contains "微積分".
(8) Query books with a rating greater than 9, showing only the first 10 rows.
(9) Compute the average rating of all books whose title contains "微積分".
(10) Sort the books by rating in descending order, showing only the first 15 rows.
(11) Group the books by publisher and count the total number of books per publisher.
(12) Save the records of books whose title contains "微積分" to the local file system or HDFS in CSV format, using the student ID as the file name (學號.csv).
(13) Reload that CSV file, create a DataFrame from it, and query and display the result.
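The same flow can be sketched end to end with spark.read.csv, which parses the header row and infers column types directly. This is only a minimal sketch, assuming the input is a comma-separated file with a header line; the file name book.txt is taken from the file listing below, and the app name is illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("BookDemo").getOrCreate()

books = (spark.read
         .option("header", True)       # first line holds the column names
         .option("inferSchema", True)  # let Spark guess numeric columns such as the rating
         .csv("book.txt"))             # assumed comma-separated input

books.createOrReplaceTempView("tb_book")     # step (3): register as table tb_book
spark.sql("SELECT * FROM tb_book").show(15)  # step (4): show the first 15 rows

Steps (5) to (13) then reduce to the SQL statements shown in the code snippet below.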


Code Snippet and File Information

from pyspark.shell import sc
from pyspark.sql.types import *
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("Word Count").config("spark.some.config.option", "some-value").getOrCreate()

rdd = sc.textFile("xxxxx.txt")
header = rdd.first().split(",")   # the field delimiter was lost in the listing; a comma is assumed
header1 = rdd.first()
print(header)
schema = StructType([StructField(header[0], StringType(), True), StructField(header[1], StringType(), True),
                     StructField(header[2], StringType(), True), StructField(header[3], StringType(), True),
                     StructField(header[4], StringType(), True), StructField(header[5], StringType(), True)])

def filter_line(line):
    if line == header1:
        return False
    else:
        return True
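# Note: filter_line above is defined but never called; the filter below drops the
# header row with an equivalent lambda.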

rdd1 = rdd.filter(lambda line: line != header1).map(lambda line: line.split(",")).map(lambda x: tuple(x))

df = rdd1.toDF(schema)
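# Note: rdd1.toDF(schema) relies on the SparkSession created via pyspark.shell;
# the equivalent explicit call would be df = spark.createDataFrame(rdd1, schema).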

# # df.show()
# """
# Register the table
# """
df.registerTempTable("tb_book")
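# Note: registerTempTable is deprecated since Spark 2.0; the current equivalent is
# df.createOrReplaceTempView("tb_book").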


# spark.sql("select * from tb_book").show(15)
# spark.sql("select * from tb_book where `書名` like '%微積分%'").show()
# spark.sql("select `書名`, `價格` from tb_book").show(10)
# number = spark.sql("select * from tb_book where `書名` like '%微積分%'").count()
# print(number)
# spark.sql("select * from tb_book where `評分` > 9").show(10)
# spark.sql("select avg(`評分`) from tb_book where `書名` like '%微積分%'").show()
# spark.sql("select * from tb_book order by `評分` desc").show(15)
# group = spark.sql("select `出版社`, count(`出版社`) from tb_book group by `出版社`").collect()
# print(group)
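# The same filters and aggregations can also be written with the DataFrame API
# instead of SQL; a sketch (column names taken from the header row of the input):
# from pyspark.sql import functions as F
# df.filter(F.col("書名").like("%微積分%")).agg(F.avg("評分")).show()
# df.groupBy("出版社").count().show()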

# ddf = spark.sql("select * from tb_book where `書名` like '%微積分%'")
# ddf.write.parquet("/home/zhuang/138/tb_book")
# ddf.write.format('csv').save("/home/zhuang/138/16034460138.csv")
# userDF = spark.read.format("csv").load('/home/zhuang/138/16034460138.csv/part-00000-e2c9db96-961e-45d7-8221-c2ff5e90d174-c000.csv')
# userDF.printSchema()
# userDF.show()
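# Writing the CSV with a header and reading the whole output directory back avoids
# hard-coding the generated part-file name; a sketch reusing the same path:
# ddf.write.mode("overwrite").option("header", True).csv("/home/zhuang/138/16034460138.csv")
# userDF = spark.read.option("header", True).csv("/home/zhuang/138/16034460138.csv")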

# rdd_1 = sc.textFile("/home/zhuang/138/16034460138.csv/part-00000-e2c9db96-961e-45d7-8221-c2ff5e90d174-c000.csv")
# rdd_2 = rdd_1.map(lambda line: line.split(",")).map(lambda x: tuple(x))
# dfff = rdd_2.toDF(schema)
# dfff.select("書名").show()

 Attribute        Size  Date        Time   Name
-----------  ---------  ----------  -----  ----
      File     5526130  2019-05-15  02:46  book.txt
      File        2191  2019-05-16  00:54  test2.py
