99 Commits

Author SHA1 Message Date
9e1fd04733 [Fix](index writer) fix core if docWriter nullptr when IndexWriter doFlush (#213) 2024-05-10 20:03:16 +08:00
d3de160871 [fix](inverted index) special characters cause buffer overflow in Unicode tokenization. (#210) 2024-04-30 16:25:32 +08:00
c736f8317b [opt](standard95) the ‘standard95’ tokenizer does not include stop words by default. (#209) 2024-04-30 16:21:24 +08:00
9f849a47f7 [fix] fix field data delete values but not make length reset (#208) 2024-04-02 17:27:57 +08:00
70d7a7b68b [Fix](index writer) fix deadlock when closeInternal catches CLuceneError (#207)
This pull request addresses an issue within the closeInternal method. 
When Thread A invokes closeInternal and encounters a CLucene error, the method exits without setting the closing flag back to false. 
As a result, if another Thread B subsequently calls closeInternal and enters the waitForClose method, it becomes trapped in an infinite loop.
2024-03-26 15:31:46 +08:00
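
The failure mode described in #207 can be illustrated with a minimal sketch. It is not the actual CLucene patch: the `Writer` class, its members, and the `fail` parameter are hypothetical stand-ins, and the scope guard is just one assumed way to make sure the closing flag is reset on every exit path, including when an error is thrown.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the index writer; not the CLucene IndexWriter API.
class Writer {
public:
    // Closes the writer. `fail` simulates an error raised while closing.
    void closeInternal(bool fail) {
        {
            std::unique_lock<std::mutex> lock(mu_);
            // Plays the role of waitForClose: block while another thread is closing.
            cv_.wait(lock, [this] { return !closing_; });
            if (closed_) return;
            closing_ = true;
        }

        // Scope guard: resets `closing_` and wakes waiters on every exit path.
        // Without it, an exception below would leave `closing_` stuck at true
        // and later callers would wait forever.
        struct ResetClosing {
            Writer& w;
            ~ResetClosing() {
                {
                    std::lock_guard<std::mutex> g(w.mu_);
                    w.closing_ = false;
                }
                w.cv_.notify_all();
            }
        } guard{*this};

        if (fail) throw std::runtime_error("simulated close error");

        std::lock_guard<std::mutex> g(mu_);
        closed_ = true;
    }

    bool isClosing() const {
        std::lock_guard<std::mutex> g(mu_);
        return closing_;
    }

    bool isClosed() const {
        std::lock_guard<std::mutex> g(mu_);
        return closed_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    bool closing_ = false;
    bool closed_ = false;
};
```

With this shape, a first call that throws leaves the flag cleared, so a second call can proceed and close the writer instead of waiting forever.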
ff2cd82f9e Revert "[opt](chinese) chinese tokenizer lowercase interface (#203)" (#204)
This reverts commit cf210eaaadc3ad5d7b27ff2e7b9635ad45cf227b.
2024-03-19 15:08:03 +08:00
f1eccdde78 [Fix](ram) fix ram directory seek last problem (#202) 2024-03-18 21:39:25 +08:00
cf210eaaad [opt](chinese) chinese tokenizer lowercase interface (#203) 2024-03-18 17:43:54 +08:00
7ee46851ae support multi add different field for one doc (#200) 2024-03-18 10:25:36 +08:00
ef95e67ae3 Pick fix mow index compaction and revert comparePostings (#199)
* [fix](write)revert comparePostings due to write core (#195)

revert https://github.com/apache/doris-thirdparty/pull/156

* [fix](index compaction)Remove INT32_MAX out of destPostingQueues (#198)
2024-03-12 15:18:39 +08:00
Pxl
c5ba0a26e9 fix some implicit conversion (#197) 2024-03-08 18:02:28 +08:00
df7e5d4017 (fix)[mlk] if sb_getenc is unknown, then sb_stemmer_new API will leak memory (#193) 2024-02-18 10:33:21 +08:00
63ae98a8bc [fix](chinese) fix the issue where the be crashes due to the missing Chinese dict (#182) 2024-02-01 18:04:26 +08:00
f4829cc50f [fix](index compaction) Support merge null_bitmap during index compaction (#178)
Support merging null_bitmap during index compaction.
We read the source indices' null_bitmap files and write them to new ones according to the translation vector's doc id mapping.
2024-01-24 18:42:44 +08:00
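
The remapping described in #178 might look roughly like the sketch below. It is an illustration under assumptions, not the code from the pull request: `NullBitmap`, `DestDoc`, `TransVec`, and `mergeNullBitmaps` are hypothetical names, a `std::set` stands in for the on-disk null_bitmap, and the translation vector is assumed to map each source doc id to a (destination index, destination doc id) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical types for illustration; the real code reads and writes null_bitmap files.
using NullBitmap = std::set<uint32_t>;               // doc ids whose value is null
using DestDoc = std::pair<uint32_t, uint32_t>;       // (destination index, destination doc id)
using TransVec = std::vector<std::vector<DestDoc>>;  // trans[src][srcDocId] -> DestDoc

// For every null doc id in each source index, look up where that document
// lands after compaction and mark it as null in the destination's bitmap.
std::vector<NullBitmap> mergeNullBitmaps(const std::vector<NullBitmap>& srcNulls,
                                         const TransVec& trans,
                                         std::size_t destCount) {
    std::vector<NullBitmap> destNulls(destCount);
    for (std::size_t src = 0; src < srcNulls.size(); ++src) {
        for (uint32_t srcDoc : srcNulls[src]) {
            const DestDoc& dest = trans[src][srcDoc];
            destNulls[dest.first].insert(dest.second);
        }
    }
    return destNulls;
}
```

The sketch assumes every source doc id has an entry in the translation vector; handling documents that are absent from the destination is left out.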
1c76e25b55 [fix](MultiPhrase) fix MultiPhraseQuery memory leak (#175) 2024-01-10 10:10:10 +08:00
8b305872ea [Fix](multi segment) fix multisegment doc overflow (#174) 2024-01-09 20:53:42 +08:00
d3bedb2d55 [Fix](memory leak) fix memory leak found in fault injection case (#170) 2024-01-08 19:13:26 +08:00
1b59ae8184 [opt](position) add position iterator interface (#169) 2024-01-03 10:01:29 +08:00
d05cb8154e [fix](index writer)fix max buffered docs not working (#167) 2023-12-27 18:54:39 +08:00
df3ab39ca6 [opt](RAM dir)fix compile error in _RAMDirectory (#165)
Change `sizeInBytes` from private to public so it can be accessed from outside CLucene.
2023-12-26 18:04:44 +08:00
486ce95095 [unitest](tokenizer) fix chinese tokenizer unitest (#164) 2023-12-25 19:40:23 +08:00
38fa525c5b [fix](compile)Fix compile error in MacOS (#162) 2023-12-25 14:45:40 +08:00
d75e5a152a [Update](unitest) make unitest work for clucene (#160) 2023-12-22 18:50:37 +08:00
91da90f18c [fix](index compaction)support compact multi segments in one index (#152) (#159) 2023-12-22 09:43:48 +08:00
d95d6be91e [Feature](tokenizer) add lowercase option for tokenizer (#157) 2023-12-20 14:43:55 +08:00
d5512c9890 [opt](write)use more efficient string_compare from doris (#155) 2023-12-19 19:25:24 +08:00
4bd7d45017 [feature](writer) add null document API to optimize index empty doc (#153) 2023-12-19 12:22:08 +08:00
d6adff12de [fix](index dict) fix index compaction to write the .tis and .tii file structures (#148) 2023-12-14 14:43:17 +08:00
a6a13b8911 [Fix](PFOR) fix DefaultDDEC (#147) 2023-12-13 18:52:49 +08:00
04ed43c3c7 [optimize](reader) optimize the tii, tis file structure (#146) 2023-12-07 20:30:35 +08:00
26206be160 [Fix](bkd) fix estimate_point_count for empty tree (#145) 2023-12-06 17:14:03 +08:00
3c5d1e4a4b [fix](reader) fix file descriptor (fd) leak issue. (#144) 2023-12-05 18:05:53 +08:00
b90069ec93 [feature](input) Add the file cache interface (#142) 2023-12-01 10:02:52 +08:00
af6f847d17 [Improvement](bkd) improve bkd performance by reuse bkd reader (#140) 2023-11-30 16:27:20 +08:00
6f8a21ffe1 [feature](multi phrase) Add the MultiPhraseQuery interface (#138) 2023-11-22 15:07:40 +08:00
97d40fdf94 [feature](invert index) add the reader buffer as a parameter (#137) 2023-11-21 14:06:12 +08:00
70c1a692bb [feature](cmake) coverage compilation option added (#136) 2023-11-16 17:12:11 +08:00
12bac905df [Fix] return default version for MultiSegmentReader::getIndexVersion (#133)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-10-26 08:16:20 -05:00
41896c9112 [fix](compiler) fix compiler error (#131) 2023-10-26 08:15:38 -05:00
3596b6d574 [Fix](PFor) make default dec/enc for noavx2 (#130)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-10-26 08:15:01 -05:00
dd2a9c9292 [Fix](PFOR) fix PFOR 'illegal operand' error for non-SSE4.2/AVX cpu (#129) 2023-10-20 02:35:10 -05:00
a950c0ab37 [optimize](chinese) optimize stopwords (#128) 2023-10-07 16:29:32 +08:00
d2328f05b4 [optimize](chinese) fix debug compiler error (#126) 2023-09-25 19:51:44 +08:00
e0fb04c027 [optimize](chinese) optimize chinese tokenizer load (#123) 2023-09-21 16:13:14 +08:00
0be3c4aeb6 [fix](index compaction)ignore doc which does not exist in destination segment (#125) 2023-09-21 15:29:33 +08:00
3b51f707d4 [fix](reader) fix missing-override compiler error (#121) 2023-09-11 17:27:54 +08:00
021666f9a8 [fix](reader) fix overloaded-virtual compiler error (#119) 2023-09-11 13:16:55 +08:00
2761b1afe4 [Fix](PFOR) fix build error in arm (#120) 2023-09-09 13:29:03 +08:00
fd45366505 [feature](analysis) add tokenizer CharFilter preprocessing interface (#118) 2023-09-08 16:14:54 +08:00
e15a89a562 [Optimize](search) Optimize multiple terms conjunction query (#117) 2023-09-04 16:38:32 +08:00