Tutorial (Stopwords for nagisa)
How to use stopwords for nagisa
This tutorial shows how to use stopwords for Japanese text in nagisa.
Install Python libraries
Before we get started, please run the following command to install the libraries used in this tutorial.
pip install nagisa
Get stopwords
Nagisa provides a built-in Japanese stopwords list. You can use nagisa.stopwords to easily filter out common Japanese stopwords (such as particles and auxiliary verbs) from tokenized results.
python tutorial_stopwords.py
tutorial_stopwords.py
import nagisa

text = "日本語のストップワードを簡単に利用できます。"
tokens = nagisa.tagging(text)
print(tokens.words)
# => ['日本', '語', 'の', 'ストップ', 'ワード', 'を', '簡単', 'に', '利用', 'でき', 'ます', '。']

# Filter out stopwords from the tokenized result
words = [word for word in tokens.words if word not in nagisa.stopwords]
print(words)
# => ['日本', '語', 'ストップ', 'ワード', '簡単', '利用', '。']
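In the output above, the punctuation mark '。' survives the filter. If you also want to drop extra tokens like this, one option is to merge the stopword list with your own additions before filtering. The following is a minimal sketch, not part of nagisa: remove_stopwords is a hypothetical helper, and sample_stopwords is a small stand-in list containing only the words removed in the example above, so the snippet runs without nagisa installed. In real use you would presumably pass nagisa.stopwords instead.

```python
def remove_stopwords(words, stopwords, extra=()):
    """Return the words that appear in neither `stopwords` nor `extra`.

    Hypothetical helper for illustration; it is not provided by nagisa.
    """
    # Merge both collections into one set for fast membership checks.
    blocked = set(stopwords) | set(extra)
    return [word for word in words if word not in blocked]


# Token list copied from the tagging output shown above.
words = ['日本', '語', 'の', 'ストップ', 'ワード', 'を', '簡単', 'に', '利用', 'でき', 'ます', '。']

# Stand-in for nagisa.stopwords (assumption): only the words removed above.
sample_stopwords = ['の', 'を', 'に', 'でき', 'ます']

# Treat the full stop as an additional, user-defined stopword.
filtered = remove_stopwords(words, sample_stopwords, extra=['。'])
print(filtered)
# => ['日本', '語', 'ストップ', 'ワード', '簡単', '利用']
```

Building a set once keeps each lookup cheap when many documents are filtered, and the extra argument leaves the original stopword collection unmodified.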