5733 Commits

Author SHA1 Message Date
e77b98be88 [fix](months_diff) fix wrong result of months_diff (#25577) 2023-10-19 14:29:47 +08:00
3d1206d325 [date](fix) modify push-down predicate for datev1 type (#25571)
For a comparison predicate, if either argument is of date type, both arguments must be cast to datetime before the predicate is pushed down to storage. This PR disables predicate push-down for this case.
2023-10-19 14:18:27 +08:00
63c89df474 [enhencement](RowsetWriter) Don't delete files when beta rowset writer destructed (#25578) 2023-10-19 09:37:04 +08:00
dbf5787682 [fix](be) Make DorisCallOnce's function exception-safe (#25579) 2023-10-18 22:13:30 +08:00
11fecafb74 [fix](move-memtable) fallback if target table contains inverted index (#25498) 2023-10-18 22:11:59 +08:00
32fc8a1799 [chore](compaction) Do not print the stack trace when the compaction task already exists (#25597) 2023-10-18 21:44:17 +08:00
c21eb315b0 [feature](thrift api) support expr in MemoryScratchSink and make arrow::Schema recalculate with block info (#24603) 2023-10-18 07:51:56 -05:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
e4a83a22d1 [opt](error msg) Make data codec error clearly when load csv data can't display (#25540)
Co-authored-by: Tanya-W <tanya1218w@163.com>
2023-10-18 16:12:22 +08:00
80e5e72202 [fix](scanner) coredump caused by 'prune_predicates_by_zone_map' (#25555) 2023-10-18 16:11:41 +08:00
d2400d1d7b [feature](profile) profilev2 distinguish Sink and Operator in pipelineX (#25491)
* update

* update
2023-10-18 13:12:29 +08:00
6cb947f72b [refactor](unused code) delete unused method from field.h (#25554)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-18 13:11:14 +08:00
64aeeb971b [Fix](partial-update) Correct the alignment process when the table has sequence column and add cases (#25346)
This PR fixes the alignment process during the publish phase when a conflict occurs between concurrent partial updates. If we encounter a row with the same key and a larger value in the sequence column, another load that introduced a row with the same keys and a larger sequence column value was published successfully after the commit phase of the current load. We should act as follows:

- If the columns we update include the sequence column, we should delete the current row because the partial update on the current row has been overwritten by the previous one with the larger sequence column value.
- Otherwise, we should combine the values of the missing columns in the previous row and the values of the included columns in the current row into a new row.
2023-10-18 11:32:51 +08:00
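The two alignment rules described in #25346 can be sketched as follows. This is a minimal illustration in Python, not the Doris implementation: the function name `align_row`, the dict-based row representation, and the `ValueError` for the no-conflict case are all assumptions made for the example.

```python
from typing import Optional


def align_row(current: dict, previous: dict, updated_cols: set, seq_col: str) -> Optional[dict]:
    """Resolve a publish-phase conflict for one key (hypothetical helper).

    `previous` is the row already published by another load; `current` is the
    row written by this partial update. Returns None when the current row
    should be deleted, otherwise the merged row.
    """
    # The rules only apply when the published row has the larger sequence value.
    if previous[seq_col] <= current[seq_col]:
        raise ValueError("no conflict: previous row does not have a larger sequence value")

    # Rule 1: this update wrote the sequence column and lost, so its row is dropped.
    if seq_col in updated_cols:
        return None

    # Rule 2: take the missing columns from the previous row and the updated
    # columns from the current row.
    merged = dict(previous)
    for col in updated_cols:
        merged[col] = current[col]
    return merged


previous = {"k": 1, "a": 1, "b": 99, "seq": 9}
current = {"k": 1, "a": 10, "b": 2, "seq": 5}
merged = align_row(current, previous, {"a"}, "seq")
dropped = align_row(current, previous, {"a", "seq"}, "seq")
```

In the first call only column `a` was updated, so the result keeps `b` and `seq` from the previously published row. In the second call the update included the sequence column, so the current row is discarded.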
b0e0a0569a [Fix](row store) Real default value should be used instead of default… (#25230)
Before this PR, the default value was incorrect; the default value from the Frontend schema should be used.
2023-10-18 10:13:44 +08:00
47689fd452 [refactor](jni) unified jni framework for java udf (#25302)
Use the unified jni framework to refactor java udf.
The unified jni framework takes VectorTable as the container to transform data between C++ and Java, and hides the details of data format conversion.
In addition, the unified framework supports complex and nested types.
The performance of basic types remains consistent, with a 30% improvement in string types and an order of magnitude improvement in complex types.
2023-10-18 09:27:54 +08:00
18c2a13e09 [fix](multi-catalog)fix maxcompute partition filter and session creation (#24911)
add maxcompute partition support
fix maxcompute partition filter
modify maxcompute session create method
2023-10-17 22:36:10 +08:00
b74836050a [chore](config) turnoff fuzzy for enable_simdjson_reader (#25521) 2023-10-17 18:42:11 +08:00
06ff59bc03 [Performance](sink) SIMD the tablet sink valied data function (#25480) 2023-10-17 16:21:08 +08:00
31a5e072e7 [refactor](pipelineX) Simplify set operation (#25502) 2023-10-17 15:11:46 +08:00
1514f78b87 [refactor](partial-update) Split partial update infos from tablet schema (#25147) 2023-10-17 14:21:40 +08:00
c2fe34dec7 [refine](pipelineX) refactor local state (#25448) 2023-10-17 11:23:29 +08:00
5f844486e3 [enhancement](invert index) read columns by index reduce seek time (#24735) 2023-10-17 10:34:33 +08:00
ef7d8aa99a [fix](be)confix bug of converting outer join probe block to nullable (#25492)
_do_evaluate adds a temporary result column to the original table block, so to convert only the correct columns to nullable, convert_block_to_null must be called before _do_evaluate.
2023-10-17 10:10:56 +08:00
cda8fb6b8b [fix](load) return Status when error in RowsetWriter::build (#25381) 2023-10-17 09:40:23 +08:00
f1a5e393c7 [feature](insert) Support group commit insert use new syntax like insert into table_id(xxx) (#25484) 2023-10-17 09:23:09 +08:00
f75ee49cb4 [chore](fmt) Remove stringstream by fmt (#25474)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-16 21:31:54 +08:00
59ebbb351e [feature](merge-cloud) Enable write into cache when uploading file to s3 using s3 file writer (#24364) 2023-10-16 21:31:02 +08:00
e9157a3dba [fix](path gc) fix data dir path gc (#25420) 2023-10-16 20:25:20 +08:00
eaf5febc97 [enhancement](cooldown) Improve cooldown logs (#25432) 2023-10-16 20:17:00 +08:00
f9df3bae61 [Enhancement](functions) change some nullable mode and clear some smooth upgrade (#25334) 2023-10-16 19:50:17 +08:00
b2e3ecb81d [opt](load)change load_to_single_tablet tablet search algorithm from random to round-robin (#25256)
At present, the `load_to_single_tablet` import selects a tablet by taking the remainder of a random number, which does not distribute data evenly. This leads to uneven disk IO and uneven use of cluster resources. To solve this problem, this PR uses round-robin selection over the tablets of each partition on every import, so that the load is spread evenly across tablets.

When generating the load query plan, the tablet index record currently imported is passed to BE.
Add a daemon task in FE to regularly clean up the `loadTabletRecordMap`. The map gets the bucket_number of the partition and updates the `load_tablet_index` when `getCurrentLoadTabletIndex` is called.
2023-10-16 16:43:25 +08:00
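The difference between the two selection strategies in #25256 can be illustrated with a small sketch. This is Python for illustration only and does not reflect the FE code: `RoundRobinPicker`, `pick_random`, and the choice to start each partition at index 0 are assumptions.

```python
import random
from collections import Counter


def pick_random(num_tablets: int, rng: random.Random) -> int:
    """Old behaviour as described: a random number modulo the tablet count."""
    return rng.randrange(1 << 30) % num_tablets


class RoundRobinPicker:
    """Keeps the next tablet index per partition (hypothetical stand-in for a
    per-partition load tablet record)."""

    def __init__(self) -> None:
        self._next_index = {}

    def pick(self, partition_id: int, num_tablets: int) -> int:
        index = self._next_index.get(partition_id, 0) % num_tablets
        # Advance so the next load on this partition goes to the following tablet.
        self._next_index[partition_id] = (index + 1) % num_tablets
        return index


picker = RoundRobinPicker()
sequence = [picker.pick(partition_id=1, num_tablets=3) for _ in range(6)]
counts = Counter(sequence)
```

With round-robin, six loads on a three-tablet partition hit every tablet exactly twice, while the random variant only approaches an even split over many loads.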
Pxl
292ccaeda8 insert default when json array parse failed (#25447)
insert default when json array parse failed
2023-10-16 14:51:26 +08:00
Pxl
d00d029ffb Separate fixed key hash map context creator (#25438)
Separate fixed key hash map context creator
2023-10-16 11:20:30 +08:00
c482c22a74 [case](regresscases) add regress cases for nested type nested type with csv format (#25355)
This PR:
1. Fixes a heap-use-after-free caused by calling podarray push_back() with back() as the argument: when the podarray has reached capacity, push_back() may free the underlying heap buffer.
2. Adds cases for the CSV format with nested types. The CSV files come in two forms: without quotes, or formatted like JSON text.
2023-10-16 11:13:44 +08:00
dfc7d04626 [fix](functions) add quantile_state_empty function signature (#25306) 2023-10-16 11:05:48 +08:00
9649e09aaa [feature](function) support bitmap type in min/max_by agg function (#25430)
support bitmap type in min/max_by agg function
2023-10-16 11:05:32 +08:00
97c0af1a80 [fix](build) aarch64 compilation fix # (#25443)
Issue: #25442

Compilation to include execinfo when building on aarch64
2023-10-16 09:53:50 +08:00
08f305dd79 [chore](build) Fix compilation errors reported by GCC-13 (#25439)
1. Fix lots of compilation errors reported by GCC-13.
2. Fix the workflow BE UT (macOS).
2023-10-15 07:57:36 -05:00
7ea456ef91 [fix](insert) make group commit wal_manager exit elegantly (#25250) 2023-10-14 23:14:06 +08:00
e5ef0aa6d4 [refactor](mysql result format) use new serde framework to tuple convert (#25006) 2023-10-14 19:46:42 +08:00
de03c152ce [fix](thrift)cancel thrift msg max size limit (#25194)
On Thrift 0.14.0+, TConfiguration must be used to raise the max message size.
See https://github.com/apache/arrow/pull/11123/files
2023-10-13 20:21:36 +08:00
3e83fb8729 [enhancement](compaction) record base compaction schedule time and status (#25283) 2023-10-13 19:51:55 +08:00
9f67bcf380 [chore](format) fix tablet_meta.cpp (#25410)
Fix the format error introduced by #25124.
The clang-format check had a bug at the time, so PR #25124 passed the check.
2023-10-13 17:58:54 +08:00
789210bc38 [chore](format) Refactor BaseTablet _full_name by using fmt replacing stringstream (#25400)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-13 03:59:03 -05:00
ac8fbdd53c [pipelineX](fix) Fix use-after-free in shuffling (#25409) 2023-10-13 16:57:34 +08:00
37dbda6209 [pipelineX](refactor) Use class template to simplify join (#25369) 2023-10-13 16:51:55 +08:00
Pxl
f4e2eb6564 remove unused code and adjust clang-tidy checks (#25405)
remove unused code and adjust clang-tidy checks
2023-10-13 16:27:37 +08:00
2ec53ff60e [fix](multi-table) fix single stream multi table load can not finish (#25379) 2023-10-13 15:47:16 +08:00
283bd59eba [improvement](scanner) Remove the predicate that is always true for the segment (#25366)
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
2023-10-13 15:25:38 +08:00
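The zonemap check described in #25366 can be sketched as below. This is an illustrative Python sketch, not the Doris scanner code: the `ZoneMap` class, the `always_true` function, and the conservative handling of segments containing nulls are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneMap:
    """Per-segment min/max summary of one column (hypothetical structure)."""
    min_value: int
    max_value: int
    has_null: bool = False


def always_true(zone: ZoneMap, op: str, literal: int) -> bool:
    """Return True if `col <op> literal` holds for every row in the segment."""
    # Assumption: a comparison against NULL is not true, so be conservative.
    if zone.has_null:
        return False
    if op == "<":
        return zone.max_value < literal
    if op == "<=":
        return zone.max_value <= literal
    if op == ">":
        return zone.min_value > literal
    if op == ">=":
        return zone.min_value >= literal
    # Unknown operators are never treated as always true.
    return False


segment = ZoneMap(min_value=0, max_value=100)
can_drop = always_true(segment, "<", 101)
must_keep = always_true(segment, "<", 100)
```

Here `can_drop` corresponds to the example in the commit message: the segment's maximum is 100, so `col < 101` holds for every row and the predicate can be skipped for this segment.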
9cc0e9526a [enhancement](merge-on-write) consider version count on size-based cu compaction policy (#25352) 2023-10-13 14:52:21 +08:00