TensorFlow Exercise 1: Classifying Movie Reviews

Size: 481 KB

File type: .zip

Coins: 2

Downloads: 0

Published: 2021-06-04
Language: Python
Tags: TensorFlow

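Description

Built by following the demo. I ran into quite a few errors along the way and fixed them one by one to get this runnable version. The data files are included. Development environment: Python 3.5.2 + TensorFlow 1.5.0; tested and working.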

Code snippet and file information

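import numpy as np
import tensorflow as tf
import random
import pickle
from collections import Counter

import nltk
# word_tokenize needs NLTK's 'punkt' tokenizer models and WordNetLemmatizer needs the 'wordnet' corpus;
# download them once with nltk.download('punkt') and nltk.download('wordnet')
from nltk.tokenize import word_tokenize
"""
"I'm super man"
tokenize:
['I', "'m", 'super', 'man']
"""
from nltk.stem import WordNetLemmatizer
"""
Lemmatization (lemmatizer) reduces an English word in any inflected form to its base form. It differs from stemming (stemmer), which only extracts the word's stem.
"""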

pos_file = 'pos.txt'
neg_file = 'neg.txt'

# Build the vocabulary (lexicon)
def create_lexicon(pos_file, neg_file):
	lex = []
	# Read one file and return all of its tokens
	def process_file(path):
		with open(path, 'r') as f:
			lex = []
			lines = f.readlines()
			#print(lines)
			for line in lines:
				words = word_tokenize(line.lower())
				lex += words
			return lex

	lex += process_file(pos_file)
	lex += process_file(neg_file)
	#print(len(lex))
	lemmatizer = WordNetLemmatizer()
	lex = [lemmatizer.lemmatize(word) for word in lex]  # lemmatize (cats -> cat)

	word_count = Counter(lex)
	#print(word_count)
	# {'.': 13944, ',': 10536, 'the': 10120, 'a': 9444, 'and': 7108, 'of': 6624, 'it': 4748, 'to': 3940......}
	# Drop very common words (the, a, and, ...) and very rare words; neither helps decide whether a review is positive or negative
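	lex = []
	for word in word_count:
		if word_count[word] < 2000 and word_count[word] > 20:  # thresholds are hard-coded; a percentage might work instead
			lex.append(word)  # Zipf's law: verifying the Zipf distribution of text with Python http://blog.topspeedsnail.com/archives/9546
	return lex

lex = create_lexicon(pos_file, neg_file)
# lex holds the words from the text that survived the frequency filter.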

# Convert each review into a vector. How the conversion works:
# Suppose lex is ['woman', 'great', 'feel', 'actually', 'looking', 'latest', 'seen', 'is'] (the real one is much larger).
# The review 'i think this movie is great' becomes [0, 1, 0, 0, 0, 0, 0, 1]: each lexicon word that appears in the review is marked 1, all others 0.
def normalize_dataset(lex):
	dataset = []
	# lex: vocabulary; review: review text; clf: label of the review, [0, 1] = negative, [1, 0] = positive
	def string_to_vector(lex, review, clf):
		words = word_tokenize(review.lower())
		lemmatizer = WordNetLemmatizer()
		words = [lemmatizer.lemmatize(word) for word in words]

		features = np.zeros(len(lex))
		for word in words:
			if word in lex:
				features[lex.index(word)] = 1  # a word may occur twice in one sentence; += 1 would also work, with little difference
		return [features, clf]

	with open(pos_file, 'r') as f:
		lines = f.readlines()
		for line in lines:
			one_sample = string_to_vector(lex, line, [1, 0])  # [array([ 0.,  1.,  0., ...,  0.,  0.,  0.]), [1, 0]]
			dataset.append(one_sample)
	with open(neg_file, 'r') as f:
		lines = f.readlines()
		for line in lines:
			one_sample = string_to_vector(lex, line, [0, 1])  # [array([ 0.,  0.,  0., ...,  0.,  0.,  0.]), [0, 1]]
			dataset.append(one_sample)

	#print(len(dataset))
	return dataset

dataset = normalize_dataset(lex)
random.shuffle(dataset)
"""
# Save the prepared data to a file for later use. This completes the data preparation.
with open('save.pickle', 'wb') as f:
	pickle.dump(dataset, f)
"""

# Use 10% of the samples as test data
test_size = int(len(dataset) * 0.1)

dataset = np.array(dataset, dtype=object)  # each sample is [feature vector, label], so keep an object array

train_dataset = dataset[:-test_size]
test_dataset = dataset[-test_size:]

# Feed-Forward Neural Network
# Define how many "neurons" each layer has
n_input_layer = len(lex)  # input layer

n_layer_1 = 1000  # hidden layer
n_layer_2 = 1000
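The snippet stops right after the layer sizes (n_layer_1 = n_layer_2 = 1000), so the TensorFlow model itself is not shown. The NumPy-only sketch below shows what a feed-forward network of that shape computes. It assumes ReLU on the hidden layers and a two-unit softmax output matching the [1, 0] / [0, 1] labels; both are assumptions, not taken from the archive. The names `init_params`, `forward` and `predict` are hypothetical.

```python
import numpy as np


def init_params(n_input, n_layer_1=1000, n_layer_2=1000, n_output=2, seed=0):
    # Assumed setup: small random weights, zero biases,
    # input -> hidden 1 -> hidden 2 -> output.
    rng = np.random.RandomState(seed)
    sizes = [n_input, n_layer_1, n_layer_2, n_output]
    return [
        (rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    # x: (batch, n_input) matrix of 0/1 bag-of-words features.
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # assumed ReLU, applied to hidden layers only
    # Softmax over the two classes; subtract the row max for numerical stability.
    h = h - h.max(axis=1, keepdims=True)
    e = np.exp(h)
    return e / e.sum(axis=1, keepdims=True)


def predict(params, x):
    # Column 0 corresponds to label [1, 0] (positive), column 1 to [0, 1] (negative).
    return np.argmax(forward(params, x), axis=1)


# Usage: 3 reviews over a toy 8-word lexicon, with small hidden layers.
params = init_params(n_input=8, n_layer_1=16, n_layer_2=16)
probs = forward(params, np.zeros((3, 8)))
print(probs.shape)  # (3, 2)
```

With zero biases, an all-zero input gives equal logits, so every row of `probs` is [0.5, 0.5]. Training (a cross-entropy loss and an optimizer) is not sketched here.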

Attribute  Size (bytes)  Date        Time   Name
---------  ------------  ----------  -----  ----
Directory             0  2018-02-08  16:06  CommentClass\
File               6160  2018-02-08  15:57  CommentClass\tesww.py
File             626757  2018-02-08  14:50  CommentClass\pos.txt
File             612846  2018-02-08  14:50  CommentClass\neg.txt
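The data ships as two plain-text files, pos.txt and neg.txt. The script reads them with readlines(), so each line is one review. The sketch below reproduces the preprocessing without NLTK: a frequency-band lexicon, binary bag-of-words vectors, and a 10% hold-out split. Assumptions: `tokenize` is a hypothetical whitespace-splitting stand-in for `word_tokenize`, lemmatization is skipped, and the helpers `build_lexicon`, `review_to_vector` and `split_train_test` are illustrative names, not part of the archive.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize: lowercase, split on whitespace.
    return text.lower().split()


def build_lexicon(reviews, min_count, max_count):
    # Count tokens across all reviews and keep words with min_count < count < max_count.
    counts = Counter(word for review in reviews for word in tokenize(review))
    return [word for word, n in counts.items() if min_count < n < max_count]


def review_to_vector(lex, review):
    # Binary bag-of-words: 1 if the lexicon word occurs in the review, else 0.
    index = {word: i for i, word in enumerate(lex)}
    features = [0] * len(lex)
    for word in tokenize(review):
        if word in index:
            features[index[word]] = 1
    return features


def split_train_test(dataset, test_fraction=0.1):
    # Hold out the last test_fraction of the (already shuffled) samples.
    test_size = int(len(dataset) * test_fraction)
    cut = len(dataset) - test_size
    return dataset[:cut], dataset[cut:]


# Usage: the worked example from the code comments.
toy_lex = ['woman', 'great', 'feel', 'actually', 'looking', 'latest', 'seen', 'is']
print(review_to_vector(toy_lex, 'i think this movie is great'))  # → [0, 1, 0, 0, 0, 0, 0, 1]
```

A dict lookup replaces `lex.index(word)`, which scans the whole lexicon for every token. Slicing at `len(dataset) - test_size` also avoids a pitfall of `dataset[:-test_size]`: when `test_size` is 0, that slice is empty instead of the full dataset.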
