75 Commits

Author SHA1 Message Date
d95d6be91e [Feature](tokenizer) add lowercase option for tokenizer (#157) 2023-12-20 14:43:55 +08:00
d5512c9890 [opt](write)use more efficient string_compare from doris (#155) 2023-12-19 19:25:24 +08:00
4bd7d45017 [feature](writer) add null document API to optimize index empty doc (#153) 2023-12-19 12:22:08 +08:00
d6adff12de [fix](index dict) fix index compaction to write the .tis and .tii file structures (#148) 2023-12-14 14:43:17 +08:00
a6a13b8911 [Fix](PFOR) fix DefaultDDEC (#147) 2023-12-13 18:52:49 +08:00
04ed43c3c7 [optimize](reader) optimize the tii, tis file structure (#146) 2023-12-07 20:30:35 +08:00
26206be160 [Fix](bkd) fix estimate_point_count for empty tree (#145) 2023-12-06 17:14:03 +08:00
3c5d1e4a4b [fix](reader) fix file descriptor (fd) leak issue (#144) 2023-12-05 18:05:53 +08:00
b90069ec93 [feature](input) Add the file cache interface (#142) 2023-12-01 10:02:52 +08:00
af6f847d17 [Improvement](bkd) improve bkd performance by reuse bkd reader (#140) 2023-11-30 16:27:20 +08:00
6f8a21ffe1 [feature](multi phrase) Add the MultiPhraseQuery interface (#138) 2023-11-22 15:07:40 +08:00
97d40fdf94 [feature](invert index) add the reader buffer as a parameter (#137) 2023-11-21 14:06:12 +08:00
70c1a692bb [feature](cmake) coverage compilation option added (#136) 2023-11-16 17:12:11 +08:00
12bac905df [Fix] return default version for MultiSegmentReader::getIndexVersion (#133)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-10-26 08:16:20 -05:00
41896c9112 [fix](compiler) fix compiler error (#131) 2023-10-26 08:15:38 -05:00
3596b6d574 [Fix](PFor) make default dec/enc for noavx2 (#130)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-10-26 08:15:01 -05:00
dd2a9c9292 [Fix](PFOR) fix PFOR 'illegal operand' error for none SSE4.2/AVX cpu (#129) 2023-10-20 02:35:10 -05:00
a950c0ab37 [optimize](chinese) optimize stopwords (#128) 2023-10-07 16:29:32 +08:00
d2328f05b4 [optimize](chinese) fix debug compiler error (#126) 2023-09-25 19:51:44 +08:00
e0fb04c027 [optimize](chinese) optimize chinese tokenizer load (#123) 2023-09-21 16:13:14 +08:00
0be3c4aeb6 [fix](index compaction) ignore doc which does not exist in destination segment (#125) 2023-09-21 15:29:33 +08:00
3b51f707d4 [fix](reader) fix missing-override compiler error (#121) 2023-09-11 17:27:54 +08:00
021666f9a8 [fix](reader) fix overloaded-virtual compiler error (#119) 2023-09-11 13:16:55 +08:00
2761b1afe4 [Fix](PFOR) fix build error in arm (#120) 2023-09-09 13:29:03 +08:00
fd45366505 [feature](analysis) add tokenizer CharFilter preprocessing interface (#118) 2023-09-08 16:14:54 +08:00
e15a89a562 [Optimize](search) Optimize multiple terms conjunction query (#117) 2023-09-04 16:38:32 +08:00
179282c609 [Fix](compile) make PFOR function adapt to compile arch (#116) 2023-09-04 11:24:06 +08:00
5754b41bbf [Optimize](chinese) Optimize chinese tokenizer index process (#115)
chinese tokenizer use sDocument
2023-09-01 17:30:33 +08:00
9e60ec666b [fix](keyword) fix the keyword type index length limit (#114) 2023-08-25 17:24:57 +08:00
fa33b52263 [Optimize](search) Optimize implement the new query interface (#113) 2023-08-23 17:08:54 +08:00
dda894af51 [fix] compatible with utf8 and invalid utf8 (#110)
1. supports utf8 and non-utf8 strings
2. optimize string_to_wstring function
2023-08-03 15:58:43 +08:00
6a9171e247 [fix] fix string to wstring buffer overflow (#109) 2023-07-31 18:55:05 +08:00
313ae23c47 [Fix] fix keyword type query is incorrect for CHAR(n) column (#105) 2023-07-25 21:25:03 +08:00
454cdd2c99 [improvement](invert index) chinese word cutall does not cut english and numbers (#104) 2023-07-25 20:40:36 +08:00
ea54f3ae46 [improvement](invert index) Change the standard to lucene9.5 (#103) 2023-07-24 16:12:27 +08:00
5dd6fca31d [fix](jieba) Dictionary opening failure cause output log (#102) 2023-07-19 11:24:54 +08:00
da906ebf3c [Feature] add string to wstring function (#101) 2023-07-13 10:24:35 +08:00
a24fa95aa8 [Fix] fix compile and unit test problems (#100)
1. fix CMake when building clucene test alone.
2. revise and add more chinese unit tests.
2023-07-12 15:58:48 +08:00
34c2c6712e [Compile] fix compile problem (#99) 2023-07-11 17:59:35 +08:00
4caf10866a [improvement](keyword) keyword type uses the SDocument process (#97) 2023-07-06 13:09:07 +08:00
103e88a8a3 [fix](jieba) fix cut word greater than 225 heap-use-after-free (#96) 2023-06-21 11:06:44 +08:00
ae26e078dd [Feature](jieba) jieba add stop words filter (#93) 2023-06-20 23:41:46 +08:00
0a06d9f9da [fix](chinese) fix chinese word cut memory leak (#95) 2023-06-20 23:07:47 +08:00
5428108ff1 [Fix](standard analyzer) change standard analyzer CJK tokenizer, align it to newest standard analyzer mode (#92) 2023-06-20 21:25:20 +08:00
60f5eab7ac [Fix] NoLockFactory does not use static variables (#91) 2023-06-19 14:35:12 +08:00
383dc02905 [Fix] fix clucene gcc compile error (#89) 2023-06-14 16:26:09 +08:00
0f33e06d5c [improvement]optimize reduce index write time (#87) 2023-06-14 11:09:33 +08:00
dae2b5d830 [Fix](PFOR) revert TurboPFOR to last version, and fix some build issue (#88) 2023-06-13 10:49:18 +08:00
6033b8c33c [Fix] fix compiler warning (#79) 2023-05-27 14:57:31 +08:00
a1cc94c0b6 [Enhancement] directly use char* from utf-8 chinese character (#78) 2023-05-27 10:50:37 +08:00