Tutorial (Stopwords for nagisa)

How to use stopwords for nagisa

This tutorial shows how to use stopwords for Japanese text in nagisa.

Install Python libraries

Before we get started, run the following command to install nagisa, the library used in this tutorial.

pip install nagisa

Get stopwords

Nagisa provides a built-in list of Japanese stopwords, nagisa.stopwords. You can use it to filter common Japanese stopwords, such as particles and auxiliary verbs, out of tokenized results by checking whether each token is in that list.

Save the following script as tutorial_stopwords.py and run it with python tutorial_stopwords.py.
import nagisa

text = "日本語のストップワードを簡単に利用できます。"
# Tokenize the text; the result exposes the tokens through .words
tokens = nagisa.tagging(text)
print(tokens.words)
# => ['日本', '語', 'の', 'ストップ', 'ワード', 'を', '簡単', 'に', '利用', 'でき', 'ます', '。']

# Filter out stopwords from the tokenized result
words = [word for word in tokens.words if word not in nagisa.stopwords]
print(words)
# => ['日本', '語', 'ストップ', 'ワード', '簡単', '利用', '。']
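If you filter many texts, it can help to wrap the list comprehension above in a small reusable function. The sketch below is one possible approach, not part of nagisa's API: remove_stopwords and EXAMPLE_STOPWORDS are hypothetical names, and EXAMPLE_STOPWORDS is a tiny hand-picked set standing in for nagisa.stopwords (it is NOT nagisa's actual list) so the sketch runs without nagisa installed. Because this sketch makes no assumption about the container type of the stopword list, it converts whatever it receives to a frozenset once, so each membership check is a hash lookup.

```python
from typing import Iterable, List

# HYPOTHETICAL stand-in for nagisa.stopwords: a small hand-picked subset,
# used only so this sketch runs without nagisa installed.
EXAMPLE_STOPWORDS = {"の", "を", "に", "でき", "ます"}


def remove_stopwords(words: Iterable[str], stopwords: Iterable[str]) -> List[str]:
    """Return the words that are not stopwords, preserving their order."""
    # Build a frozenset once so each membership test is a hash lookup,
    # whatever container type the stopword list arrives in.
    stopword_set = frozenset(stopwords)
    return [word for word in words if word not in stopword_set]


# Token list copied from the tokenized output shown in the tutorial script.
tokens = ['日本', '語', 'の', 'ストップ', 'ワード', 'を', '簡単', 'に', '利用', 'でき', 'ます', '。']
print(remove_stopwords(tokens, EXAMPLE_STOPWORDS))
# => ['日本', '語', 'ストップ', 'ワード', '簡単', '利用', '。']
```

With nagisa installed, the same helper could be called as remove_stopwords(nagisa.tagging(text).words, nagisa.stopwords). Since the helper accepts any iterable of stopwords, you can also pass an extended set (for example, EXAMPLE_STOPWORDS | {"。"} to drop the full stop as well) without changing the function.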