Commit Graph

3 Commits

Author SHA1 Message Date
ed108d48fa [fix](invert index) fix query use char filter (#24268) 2023-09-14 11:42:47 +08:00
0cd5183556 [Refactor](inverted index) refact tokenize function for inverted index (#22313) 2023-08-02 19:12:22 +08:00
fc2b9db0ad [Feature](inverted index) add tokenize function for inverted index (#21813)
In this PR, we introduce TOKENIZE function for inverted index, it is used as following:
```
SELECT TOKENIZE('I love my country', 'english');
```
It has two arguments, first is text which has to be tokenized, the second is parser type which can be **english**, **chinese** or **unicode**.
It also can be used with existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese')              |
+---------------------------------------+
| ["来到", "北京", "清华大学"]          |
| ["我爱你", "中国"]                    |
| ["人民", "得到", "更", "实惠"]        |
+---------------------------------------+
```
2023-07-25 15:05:35 +08:00