Jieba PHP
The Jieba Chinese Word Segmentation (结巴分词) Implemented in PHP
Wraps jieba-rs using FFI, which was added in PHP 7.4.
Requirements
PHP >= 7.4 with the FFI extension enabled
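You can confirm both requirements before installing with a small check script. This is a sketch and not part of the package: `checkJiebaRequirements()` is a hypothetical helper, and it relies only on PHP built-ins (`version_compare`, `extension_loaded`, `ini_get`). Note that `ffi.enable` defaults to `preload`, which allows the FFI API only from the CLI and from preloaded files, not from web SAPIs such as PHP-FPM.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of binaryoung/jieba-php): reports whether the
 * current PHP runtime meets the stated requirements.
 *
 * @return array{php_ok: bool, ffi_loaded: bool, ffi_enable: string}
 */
function checkJiebaRequirements(): array
{
    // ini_get() returns false when the directive is unknown (FFI not compiled in).
    $ffiEnable = ini_get('ffi.enable');

    return [
        'php_ok'     => version_compare(PHP_VERSION, '7.4.0', '>='),
        'ffi_loaded' => extension_loaded('ffi'),
        // "preload" (the default) allows FFI only from the CLI and preloaded files.
        'ffi_enable' => $ffiEnable === false ? 'n/a' : (string) $ffiEnable,
    ];
}

// Print one "key value" line per check.
foreach (checkJiebaRequirements() as $key => $value) {
    printf("%-10s %s\n", $key, is_bool($value) ? var_export($value, true) : $value);
}
```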
Installation
You can install the package via Composer:
composer require binaryoung/jieba-php
Usage
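<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
use Binaryoung\Jieba\Jieba;
var_dump(Jieba::cut('PHP是世界上最好的语言!')); // "PHP is the best language in the world!"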
API
array cut(string $sentence, bool $hmm = true)
array cutAll(string $sentence)
array cutForSearch(string $sentence, bool $hmm = true)
array TFIDFExtract(string $sentence, int $topK = 20, array $allowedPOS = [])
array textRankExtract(string $sentence, int $topK = 20, array $allowedPOS = [])
array tokenize(string $sentence, string $mode = 'default', bool $hmm = true)
array tag(string $sentence, bool $hmm = true)
int suggestFrequency(string $segment)
self addWord(string $word, ?int $frequency = null, ?string $tag = null)
self useDictionary(string $path)
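`TFIDFExtract` ranks keywords by TF-IDF weight and returns the `$topK` highest, while `$allowedPOS` presumably restricts candidates to the given part-of-speech tags. The sketch below illustrates generic TF-IDF ranking on an already-segmented token list so the `$topK` parameter is concrete. It is not the package's or jieba-rs's implementation: `tfidfTopK()`, the toy IDF table, and the `$defaultIdf` fallback for unknown words are hypothetical, and POS filtering is omitted.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration of TF-IDF keyword ranking (not jieba-rs's code).
 *
 * @param string[]             $tokens     an already-segmented sentence
 * @param array<string, float> $idf        inverse document frequency per word
 * @param float                $defaultIdf IDF assumed for words missing from $idf
 * @return array<string, float> up to $topK words with the highest weight, best first
 */
function tfidfTopK(array $tokens, array $idf, int $topK = 20, float $defaultIdf = 1.0): array
{
    // Drop whitespace-only tokens before counting.
    $tokens = array_filter($tokens, fn (string $t): bool => trim($t) !== '');
    $total = count($tokens);
    if ($total === 0 || $topK <= 0) {
        return [];
    }

    // Weight = term frequency (count / total tokens) * IDF.
    $weights = [];
    foreach (array_count_values($tokens) as $word => $count) {
        $weights[$word] = ($count / $total) * ($idf[$word] ?? $defaultIdf);
    }

    arsort($weights); // highest weight first
    return array_slice($weights, 0, $topK, true);
}

// Toy input: hand-segmented tokens and made-up IDF values.
$tokens = ['PHP', '是', '世界', '上', '最好', '的', '语言', '语言'];
$idf = ['PHP' => 5.0, '世界' => 3.0, '语言' => 4.0, '最好' => 4.5];
print_r(tfidfTopK($tokens, $idf, 3, 0.5));
```

Here 语言 ranks first (weight 1.0) because it occurs twice, followed by PHP (0.625) and 最好 (0.5625).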
Examples
See examples/example.php, or run:
composer example
Testing
composer test
Benchmark
composer bench
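Compared against jukuball/jieba-php: each line of the novel Fortress Besieged (围城) is segmented as a separate sentence, repeated for 50 loops, with both libraries using HMM mode.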
Name | Total time | Time per loop | Memory usage | Peak memory |
---|---|---|---|---|
jukuball/jieba-php | 51.593 | 1.032 | 493.00MB | 515.03MB |
binaryoung/jieba-php | 8.408 | 0.16816 | 10.00MB | 22.01MB |
Difference | ↓513.59% | ↓513.59% | ↓4830.00% | ↓2240.20% |
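The methodology above (50 loops over every line of the text, recording total time, time per loop, memory and peak memory) can be approximated with PHP built-ins such as `hrtime()`, `memory_get_usage()` and `memory_get_peak_usage()`. This is a hypothetical harness, not the script behind `composer bench`: `benchSegmenter()` accepts any segmentation callable, and because the table does not state its time unit, this sketch reports seconds. `relativeDifference()` assumes the Difference row is (jukuball - binaryoung) / binaryoung × 100. That reproduces the memory column exactly (4830.00%), while the other columns land within a fraction of a percentage point, presumably because the table lists rounded inputs.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical benchmark harness (not the package's `composer bench` script).
 *
 * @param callable(string): array $segment a segmenter, e.g. fn ($s) => Jieba::cut($s)
 * @param string[]                $lines   one sentence per element
 * @return array{total_s: float, per_run_s: float, memory_mb: float, peak_mb: float}
 */
function benchSegmenter(callable $segment, array $lines, int $runs = 50): array
{
    if ($runs < 1) {
        throw new InvalidArgumentException('$runs must be at least 1');
    }

    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        foreach ($lines as $line) {
            $segment($line);
        }
    }
    $total = (hrtime(true) - $start) / 1e9;

    return [
        'total_s'   => $total,
        'per_run_s' => $total / $runs,
        'memory_mb' => memory_get_usage() / 1048576,
        'peak_mb'   => memory_get_peak_usage() / 1048576,
    ];
}

/** Percentage by which $other exceeds $base, e.g. 493 MB vs 10 MB gives 4830. */
function relativeDifference(float $other, float $base): float
{
    return ($other - $base) / $base * 100;
}

// Rounded values copied from the table above.
printf("%.2f%%\n", relativeDifference(493.00, 10.00)); // 4830.00%
printf("%.2f%%\n", relativeDifference(515.03, 22.01)); // 2239.98% (table: 2240.20%)

// Toy run with a stand-in segmenter that splits a string into UTF-8 characters.
$result = benchSegmenter(
    fn (string $s): array => preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY),
    ['PHP是世界上最好的语言!'],
    5
);
print_r($result);
```

To benchmark the real library, pass a closure that calls `Jieba::cut($line)` together with the lines of your own test text.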
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security
If you discover any security-related issues, please email me instead of using the issue tracker.
Credits
License
The MIT License (MIT). Please see License File for more information.