textrankr
Reorders sentences using the TextRank algorithm.
- Designed mainly for Korean, but not limited to it.
- Check out lexrankr, which is another awesome summarizer!
- Python 2 is no longer supported (if you need it, use version 0.3).
Installation
pip install textrankr
Tokenizers
Tokenizers are not included. You have to implement one yourself.
Example:
from typing import List

class MyTokenizer:
    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = text.split()
        return tokens
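Judging from the example above, a tokenizer is any callable that takes a string and returns a list of strings. As a further illustration, here is a minimal sketch of a slightly stricter tokenizer that lowercases the text and drops punctuation. `RegexTokenizer` is a hypothetical name and is not part of textrankr.

```python
import re
from typing import List


class RegexTokenizer:
    """Hypothetical tokenizer: lowercases text and keeps runs of word characters."""

    # \w+ matches letters, digits, and underscores; it is Unicode-aware for str patterns
    _word = re.compile(r"\w+")

    def __call__(self, text: str) -> List[str]:
        return self._word.findall(text.lower())


tokenizer = RegexTokenizer()
print(tokenizer("Hello, world! Hello again."))  # ['hello', 'world', 'hello', 'again']
```

Because it follows the same call signature as `MyTokenizer`, it should be usable wherever `MyTokenizer` is.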
For Korean, one option is to use KoNLPy. Using phrases, as in the example below,
is not strictly tokenization, but it seems to give better results.
from typing import List
from konlpy.tag import Okt

class OktTokenizer:
    okt: Okt = Okt()

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.okt.phrases(text)
        return tokens
Usage
from typing import List
from textrankr import TextRank

mytokenizer: MyTokenizer = MyTokenizer()
textrank: TextRank = TextRank(mytokenizer)

# your_text_here is a placeholder for the string you want to summarize
k: int = 3  # number of sentences in the resulting summary
summarized: str = textrank.summarize(your_text_here, k)
print(summarized)  # prints the summary as a single string

# with verbose=False, summarize returns a list of sentences instead
summaries: List[str] = textrank.summarize(your_text_here, k, verbose=False)
for summary in summaries:
    print(summary)
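For intuition about what a TextRank-style summarizer does, here is a rough, standard-library-only sketch. It is not textrankr's actual implementation: the sentence splitter, the Jaccard similarity, the damping factor, the fixed iteration count, and the choice to return the selected sentences in their original order are all assumptions made for illustration, and every function name is hypothetical.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def similarity(a: Set[str], b: Set[str]) -> float:
    # Jaccard overlap between two token sets
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sentences(text: str, k: int, damping: float = 0.85, iterations: int = 50) -> List[str]:
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []

    # Build a weighted sentence graph: edge weight = token-set similarity
    bags = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    weights = [
        [similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank by power iteration over the weighted graph
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores

    # Keep the k highest-scoring sentences, then restore document order
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = "Cats like fish. Dogs like bones. Cats and dogs like food. The sky is blue."
print(rank_sentences(text, 1))  # ['Cats and dogs like food.']
```

The sentence that shares the most vocabulary with the others ends up with the highest score, which is why the third sentence is picked here.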
Test
Use Docker.
docker build -t textrankr -f Dockerfile .
docker run --rm -it textrankr