Last time we gave a brief introduction to spaCy, covering installation and some basic features such as named entity recognition. Today we will explore more of spaCy's capabilities, including lemmatization, part-of-speech tagging, noun chunk detection, and dependency parsing.

Let's dive straight into the code:
```python
import en_core_web_sm

# Load the small English model
# (install first with: python -m spacy download en_core_web_sm)
parser = en_core_web_sm.load()

# Adjacent string literals are concatenated, so each one ends with a space
sentences = ("There is an art, it says, or rather, a knack to flying. "
             "The knack lies in learning how to throw yourself at the ground and miss. "
             "In the beginning the Universe was created. "
             "This has made a lot of people very angry and been widely regarded as a bad move.")

print("Sentences in the parsed text:")
sents = [sent for sent in parser(sentences).sents]
for sentence in sents:
    print(sentence)

# Keep only tokens longer than one character (drops punctuation)
print("\nTokenization:")
tokens = [token for token in sents[0] if len(token) > 1]
print(tokens)

print("\nLemmatization:")
lemma_tokens = [token.lemma_ for token in sents[0] if len(token) > 1]
print(lemma_tokens)

print("\nCoarse-grained POS tags:")
pos_tokens = [token.pos_ for token in sents[0] if len(token) > 1]
print(pos_tokens)

print("\nFine-grained POS tags:")
tag_tokens = [token.tag_ for token in sents[0] if len(token) > 1]
print(tag_tokens)

print("\nDependency relations:")
dep_tokens = [token.dep_ for token in sents[0] if len(token) > 1]
print(dep_tokens)

print("\nNoun chunks:")
doc = parser(u"Autonomous cars shift insurance liability toward manufacturers")
chunk_text = [chunk.text for chunk in doc.noun_chunks]
print(chunk_text)

print("\nText of each noun chunk's root token:")
chunk_root_text = [chunk.root.text for chunk in doc.noun_chunks]
print(chunk_root_text)

print("\nDependency relation of each noun chunk's root token:")
chunk_root_dep = [chunk.root.dep_ for chunk in doc.noun_chunks]
print(chunk_root_dep)

print("\nText of each noun chunk root's head token:")
chunk_root_head_text = [chunk.root.head.text for chunk in doc.noun_chunks]
print(chunk_root_head_text)
```
Finally, here is a reference on syntactic dependency parsing from the Stanford NLP group, which documents the dependency relations in detail:

Link: https://nlp.stanford.edu/software/dependencies_manual.pdf
If the link is inaccessible, you can contact me on WeChat for a copy. A Chinese version is also available on Baidu Wenku: https://wenku.baidu.com/view/1e92891dbceb19e8b8f6bae5.html

I hope this is helpful.