9e1fd04733
[Fix](index writer) fix core dump if docWriter is nullptr when IndexWriter calls doFlush ( #213 )
2024-05-10 20:03:16 +08:00
d3de160871
[fix](inverted index) special characters cause buffer overflow in Unicode tokenization. ( #210 )
2024-04-30 16:25:32 +08:00
c736f8317b
[opt](standard95) the ‘standard95’ tokenizer does not include stop words by default. ( #209 )
2024-04-30 16:21:24 +08:00
9f849a47f7
[fix] fix field data deleting values without resetting the length ( #208 )
2024-04-02 17:27:57 +08:00
70d7a7b68b
[Fix](index writer) fix deadlock when closeInternal catches CLuceneError ( #207 )
This pull request addresses an issue within the closeInternal method.
When Thread A invokes closeInternal and encounters a CLucene error, the method exits without setting the closing flag back to false.
As a result, if another Thread B subsequently calls closeInternal and enters the waitForClose method, it becomes trapped in an infinite loop.
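The failure mode described above can be sketched as follows. This is a minimal illustration, not the CLucene implementation: `WriterSketch`, `close`, `failDuringClose`, and the member names are hypothetical, and the thrown `std::runtime_error` stands in for a CLuceneError. The point is only that the closing flag is reset and waiters are woken on the error path as well as on success.

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the writer; not the real IndexWriter.
class WriterSketch {
public:
    // Closes the writer. If failDuringClose is true, the close work throws,
    // simulating a CLuceneError raised inside closeInternal.
    void close(bool failDuringClose) {
        {
            std::unique_lock<std::mutex> lock(mu_);
            // Plays the role of waitForClose: block while another close runs.
            cv_.wait(lock, [this] { return !closing_; });
            if (closed_) return;
            closing_ = true;
        }
        try {
            if (failDuringClose) {
                throw std::runtime_error("simulated CLuceneError");
            }
        } catch (...) {
            // Without this reset, closing_ stays true and later callers
            // wait forever in the predicate above.
            finish(false);
            throw;
        }
        finish(true);
    }

    bool isClosing() {
        std::lock_guard<std::mutex> lock(mu_);
        return closing_;
    }

    bool isClosed() {
        std::lock_guard<std::mutex> lock(mu_);
        return closed_;
    }

private:
    // Clears the closing flag, records the outcome, and wakes any waiters.
    void finish(bool success) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closing_ = false;
            if (success) closed_ = true;
        }
        cv_.notify_all();
    }

    std::mutex mu_;
    std::condition_variable cv_;
    bool closing_ = false;
    bool closed_ = false;
};
```

With the reset in the catch branch, a second call to `close` after a failed one proceeds instead of blocking.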
2024-03-26 15:31:46 +08:00
ff2cd82f9e
Revert "[opt](chinese) chinese tokenizer lowercase interface ( #203 )" ( #204 )
This reverts commit cf210eaaadc3ad5d7b27ff2e7b9635ad45cf227b.
2024-03-19 15:08:03 +08:00
f1eccdde78
[Fix](ram) fix ram directory seek last problem ( #202 )
2024-03-18 21:39:25 +08:00
cf210eaaad
[opt](chinese) chinese tokenizer lowercase interface ( #203 )
2024-03-18 17:43:54 +08:00
7ee46851ae
support multi add different field for one doc ( #200 )
2024-03-18 10:25:36 +08:00
ef95e67ae3
Pick fix mow index compaction and revert comparePostings ( #199 )
* [fix](write)revert comparePostings due to write core (#195 )
revert https://github.com/apache/doris-thirdparty/pull/156
* [fix](index compaction)Remove INT32_MAX out of destPostingQueues (#198 )
2024-03-12 15:18:39 +08:00
c5ba0a26e9
fix some implicit conversion ( #197 )
2024-03-08 18:02:28 +08:00
df7e5d4017
(fix)[mlk] if sb_getenc is unknown, then sb_stemmer_new API will leak memory ( #193 )
2024-02-18 10:33:21 +08:00
63ae98a8bc
[fix](chinese) fix the issue where the BE crashes due to the missing Chinese dict ( #182 )
2024-02-01 18:04:26 +08:00
f4829cc50f
[fix](index compaction) Support merge null_bitmap during index compaction ( #178 )
Support merging null_bitmap during index compaction.
We read the null_bitmap files of the source indices and write them to new ones according to the translation vector's doc id mapping.
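A rough sketch of that remapping, under stated assumptions: the translation vector is modeled here as `trans[src][doc] = (destination index, destination doc id)`, null bitmaps are modeled as `std::set` rather than the on-disk bitmap format, and `kDropped` is a hypothetical sentinel for documents that do not survive compaction. None of these names come from the commit itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>
#include <utility>
#include <vector>

using DocId = std::uint32_t;
// (destination index, destination doc id); layout is an assumption.
using Mapping = std::pair<std::uint32_t, DocId>;

// Hypothetical sentinel marking a source doc with no destination.
constexpr std::uint32_t kDropped = std::numeric_limits<std::uint32_t>::max();

// Rewrites each source null doc id into the destination index it maps to.
std::vector<std::set<DocId>> mergeNullBitmaps(
        const std::vector<std::set<DocId>>& srcNulls,
        const std::vector<std::vector<Mapping>>& trans,
        std::size_t destCount) {
    std::vector<std::set<DocId>> destNulls(destCount);
    for (std::size_t src = 0; src < srcNulls.size(); ++src) {
        for (DocId doc : srcNulls[src]) {
            // Skip doc ids that the translation vector does not cover.
            if (src >= trans.size() || doc >= trans[src].size()) continue;
            const Mapping& m = trans[src][doc];
            // Skip docs that were dropped or point outside the destinations.
            if (m.first == kDropped || m.first >= destCount) continue;
            destNulls[m.first].insert(m.second);
        }
    }
    return destNulls;
}
```

For example, if source 0 has null doc 1 mapped to (0, 1) and source 1 has null docs 0 and 2 mapped to (0, 2) and (1, 0), the destination null sets are {1, 2} and {0}.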
2024-01-24 18:42:44 +08:00
1c76e25b55
[fix](MultiPhrase) fix MultiPhraseQuery memory leak ( #175 )
2024-01-10 10:10:10 +08:00
8b305872ea
[Fix](multi segment) fix multisegment doc overflow ( #174 )
2024-01-09 20:53:42 +08:00
d3bedb2d55
[Fix](memory leak) fix memory leak found in fault injection case ( #170 )
2024-01-08 19:13:26 +08:00
1b59ae8184
[opt](position) add position iterator interface ( #169 )
2024-01-03 10:01:29 +08:00
d05cb8154e
[fix](index writer)fix max buffered docs not working ( #167 )
2023-12-27 18:54:39 +08:00
df3ab39ca6
[opt](RAM dir)fix compile error in _RAMDirectory ( #165 )
Change `sizeInBytes` from private to public so it can be accessed from outside CLucene.
2023-12-26 18:04:44 +08:00
486ce95095
[unitest](tokenizer) fix chinese tokenizer unitest ( #164 )
2023-12-25 19:40:23 +08:00
38fa525c5b
[fix](compile)Fix compile error in MacOS ( #162 )
2023-12-25 14:45:40 +08:00
d75e5a152a
[Update](unitest) make unitest work for clucene ( #160 )
2023-12-22 18:50:37 +08:00
91da90f18c
[fix](index compaction)support compact multi segments in one index ( #152 ) ( #159 )
2023-12-22 09:43:48 +08:00
d95d6be91e
[Feature](tokenizer) add lowercase option for tokenizer ( #157 )
2023-12-20 14:43:55 +08:00
d5512c9890
[opt](write)use more efficient string_compare from doris ( #155 )
2023-12-19 19:25:24 +08:00
4bd7d45017
[feature](writer) add null document API to optimize index empty doc ( #153 )
2023-12-19 12:22:08 +08:00
d6adff12de
[fix](index dict) fix index compaction to write the .tis and .tii file structures ( #148 )
2023-12-14 14:43:17 +08:00
a6a13b8911
[Fix](PFOR) fix DefaultDDEC ( #147 )
2023-12-13 18:52:49 +08:00
04ed43c3c7
[optimize](reader) optimize the tii, tis file structure ( #146 )
2023-12-07 20:30:35 +08:00
26206be160
[Fix](bkd) fix estimate_point_count for empty tree ( #145 )
2023-12-06 17:14:03 +08:00
3c5d1e4a4b
[fix](reader) fix file descriptor (fd) leak issue. ( #144 )
2023-12-05 18:05:53 +08:00
b90069ec93
[feature](input) Add the file cache interface ( #142 )
2023-12-01 10:02:52 +08:00
af6f847d17
[Improvement](bkd) improve bkd performance by reuse bkd reader ( #140 )
2023-11-30 16:27:20 +08:00
6f8a21ffe1
[feature](multi phrase) Add the MultiPhraseQuery interface ( #138 )
2023-11-22 15:07:40 +08:00
97d40fdf94
[feature](invert index) add the reader buffer as a parameter ( #137 )
2023-11-21 14:06:12 +08:00
70c1a692bb
[feature](cmake) coverage compilation option added ( #136 )
2023-11-16 17:12:11 +08:00
12bac905df
[Fix] return default version for MultiSegmentReader::getIndexVersion ( #133 )
Co-authored-by: airborne12 <airborne12@gmail.com >
2023-10-26 08:16:20 -05:00
41896c9112
[fix](compiler) fix compiler error ( #131 )
2023-10-26 08:15:38 -05:00
3596b6d574
[Fix](PFor) make default dec/enc for noavx2 ( #130 )
Co-authored-by: airborne12 <airborne12@gmail.com >
2023-10-26 08:15:01 -05:00
dd2a9c9292
[Fix](PFOR) fix PFOR 'illegal operand' error on CPUs without SSE4.2/AVX ( #129 )
2023-10-20 02:35:10 -05:00
a950c0ab37
[optimize](chinese) optimize stopwords ( #128 )
2023-10-07 16:29:32 +08:00
d2328f05b4
[optimize](chinese) fix debug compiler error ( #126 )
2023-09-25 19:51:44 +08:00
e0fb04c027
[optimize](chinese) optimize chinese tokenizer load ( #123 )
2023-09-21 16:13:14 +08:00
0be3c4aeb6
[fix](index compaction) ignore docs which do not exist in the destination segment ( #125 )
2023-09-21 15:29:33 +08:00
3b51f707d4
[fix](reader) fix missing-override compiler error ( #121 )
2023-09-11 17:27:54 +08:00
021666f9a8
[fix](reader) fix overloaded-virtual compiler error ( #119 )
2023-09-11 13:16:55 +08:00
2761b1afe4
[Fix](PFOR) fix build error in arm ( #120 )
2023-09-09 13:29:03 +08:00
fd45366505
[feature](analysis) add tokenizer CharFilter preprocessing interface ( #118 )
2023-09-08 16:14:54 +08:00
e15a89a562
[Optimize](search) Optimize multiple terms conjunction query ( #117 )
2023-09-04 16:38:32 +08:00