NLTK
How to use the NLTK library
- Installation
pip install nltk
- Download the required data
- On first use, you need to download NLTK's corpora and other data resources. Run the following code in a Python script or an interactive session:
import nltk
nltk.download()
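- This opens a downloader window where you can select the data to download, such as punkt (the pretrained models used for sentence and word tokenization) and averaged_perceptron_tagger (the part-of-speech tagger). Depending on the installed NLTK version, the examples may ask for differently named resources (for example punkt_tab); if one is missing, the LookupError that NLTK raises names the exact resource to download.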
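On a server or inside a script, a downloader window is often not an option. Here is a minimal sketch of a non-interactive alternative, assuming the two resource names mentioned above. The helper name `download_resources` and the constant `DEFAULT_RESOURCES` are made up for this example; `nltk.download` itself accepts a resource identifier and a `quiet` flag.

```python
import nltk

# Resource identifiers named in the text above. Assumption: your NLTK
# version may expect different names, e.g. "punkt_tab".
DEFAULT_RESOURCES = ("punkt", "averaged_perceptron_tagger")


def download_resources(names=DEFAULT_RESOURCES):
    """Download each resource by name; return the names that failed."""
    failed = []
    for name in names:
        # nltk.download returns True on success and False on failure.
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `download_resources()` once at start-up and checking that the returned list is empty is one way to fail early when the data is unavailable.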
III. Code Examples
1. Sentence and Word Tokenization
import nltk

text = "Natural Language Processing is an interesting field. It has many applications."

# Sentence tokenization
sentences = nltk.sent_tokenize(text)
print("Sentences:")
for sentence in sentences:
    print(sentence)

# Word tokenization
words = []
for sentence in sentences:
    word_tokens = nltk.word_tokenize(sentence)
    words.extend(word_tokens)
print("\nWords:")
for word in words:
    print(word)
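Both `sent_tokenize` and `word_tokenize` rely on the downloaded punkt models. For a quick experiment without any downloaded data, NLTK also ships a purely regex-based tokenizer, `wordpunct_tokenize`. The sketch below shows it as a fallback only: it is a different and cruder tokenizer than `word_tokenize`, and it splits contractions such as "don't" at the apostrophe.

```python
from nltk.tokenize import wordpunct_tokenize

text = "Natural Language Processing is an interesting field. It has many applications."

# Splits into runs of word characters and runs of punctuation; needs no data files.
tokens = wordpunct_tokenize(text)
print(tokens)

# Contractions are split at the apostrophe, unlike with word_tokenize.
print(wordpunct_tokenize("I don't know."))
```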
2. Part-of-Speech Tagging
import nltk

text = "I love apples. They are delicious."
words = nltk.word_tokenize(text)
# Tag each token with its part of speech (Penn Treebank tags by default)
tagged_words = nltk.pos_tag(words)
print("Tagged words:")
for word, tag in tagged_words:
    print(word, "-", tag)
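The tagger returns a list of `(word, tag)` tuples, which makes post-processing straightforward. Below is a sketch of filtering for nouns, assuming Penn Treebank tags, where noun tags start with `NN`. The `tagged_words` list is a hand-written sample in the same format, not captured tagger output, and `nouns_only` is a name invented for this example.

```python
# Hand-written sample in the format returned by nltk.pos_tag (an assumption,
# not real tagger output); replace it with the tagger's actual result.
tagged_words = [
    ("I", "PRP"), ("love", "VBP"), ("apples", "NNS"), (".", "."),
    ("They", "PRP"), ("are", "VBP"), ("delicious", "JJ"), (".", "."),
]


def nouns_only(tagged):
    """Return the words whose Penn Treebank tag marks a noun (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged if tag.startswith("NN")]


print(nouns_only(tagged_words))
```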
3. Named Entity Recognition
import nltk

text = "Apple Inc. is headquartered in Cupertino, California."
words = nltk.word_tokenize(text)
tagged_words = nltk.pos_tag(words)
# ne_chunk takes POS-tagged tokens and returns an nltk.Tree
named_entities = nltk.ne_chunk(tagged_words)
print("Named entities:")
print(named_entities)
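Printing the tree shows the structure, but downstream code usually wants the entities as plain data. The sketch below walks the tree and collects `(label, text)` pairs. The `sample` tree is a hand-built stand-in for an `ne_chunk` result (the real chunker may label or group these tokens differently), and `extract_entities` is a hypothetical helper name.

```python
from nltk.tree import Tree

# Hand-built stand-in for an ne_chunk result (an assumption for illustration;
# the real chunker may label or group these tokens differently).
sample = Tree("S", [
    Tree("ORGANIZATION", [("Apple", "NNP"), ("Inc.", "NNP")]),
    ("is", "VBZ"),
    ("headquartered", "VBN"),
    ("in", "IN"),
    Tree("GPE", [("Cupertino", "NNP")]),
    (",", ","),
    Tree("GPE", [("California", "NNP")]),
    (".", "."),
])


def extract_entities(tree):
    """Return (label, text) pairs for each labeled chunk directly under the root."""
    entities = []
    for node in tree:
        if isinstance(node, Tree):  # plain (word, tag) tuples are not entities
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


print(extract_entities(sample))
```

The same function can be applied to the `named_entities` tree from the example above in place of `sample`.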
4. Stemming
from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["running", "runs", "ran", "easily", "fairly"]
for word in words:
    stem = ps.stem(word)
    print(word, "->", stem)
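A common reason to stem is to treat inflected forms as one term. The sketch below groups the same word list by stem; the `groups` dictionary is introduced here for illustration. Because the Porter algorithm only strips suffixes by rule, the irregular form "ran" is not merged with "running" and "runs", and the resulting stems need not be dictionary words.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["running", "runs", "ran", "easily", "fairly"]

# Map each stem to the original words that reduce to it.
groups = defaultdict(list)
for word in words:
    groups[ps.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, "<-", members)
```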