Commit Graph

5298 Commits

Author SHA1 Message Date
58b9bce954 [fix](load) add rowset builder init error handling (#23166) 2023-08-19 17:13:10 +08:00
433a6103ab [Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#23182)
Introduced #19389, removed #20785
2023-08-19 12:13:24 +08:00
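A minimal sketch of the allocate-on-demand, free-on-close pattern named in this commit title, assuming a fixed cap on the number of blocks. `LazyBlockPool`, `get_block`, `return_block` and the byte-array blocks are hypothetical stand-ins, not the Doris `ScannerContext` API.

```python
class LazyBlockPool:
    """Hypothetical block pool: allocates lazily, reuses returned blocks, frees all on close."""

    def __init__(self, max_blocks, block_size):
        self.max_blocks = max_blocks
        self.block_size = block_size
        self.free_blocks = []  # returned blocks, ready for reuse
        self.allocated = 0     # blocks created so far (nothing is allocated up front)

    def get_block(self):
        if self.free_blocks:
            return self.free_blocks.pop()  # reuse before allocating a new block
        if self.allocated >= self.max_blocks:
            raise RuntimeError("block limit reached")
        self.allocated += 1
        return bytearray(self.block_size)  # allocated only on first demand

    def return_block(self, block):
        self.free_blocks.append(block)

    def close(self):
        # Drop every cached block so the memory can be reclaimed.
        self.free_blocks.clear()
        self.allocated = 0
```

Nothing is allocated until a consumer asks for a block, and `close()` releases the cached blocks instead of keeping them for the lifetime of the pool.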
0838ff4bf4 [fix](Outfile) fix bug that the fileSize is not correct when outfile is completed (#22951) 2023-08-18 22:31:44 +08:00
26905e36e5 [fix](load) fix nullptr in memtable limiter flush (#23149) 2023-08-18 19:55:53 +08:00
419e922a69 [fix](json)Fix the bug that does not stop when reading json files (#23062)
2023-08-18 18:23:19 +08:00
Pxl 477961dc21 [Chore](agg) refactor of hash map (#22958)
2023-08-18 17:59:30 +08:00
f0ad3ef244 [fix](merge-on-write) should use write lock of tablet's header lock in #23047 (#23161) 2023-08-18 17:50:44 +08:00
3d4ec1ac88 [pipeline](exec) support async writer in jdbc sink in pipeline query engine (#23144)
2023-08-18 17:07:57 +08:00
1c3cc77a54 [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236)
* add ut

* fix nereids

* fix regression-test
2023-08-18 14:37:49 +08:00
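A minimal sketch of the new `to_bitmap` behaviour: input that cannot be parsed yields NULL (`None` here) instead of an empty bitmap. The bitmap is modelled as a Python set, and the accepted input (plain ASCII decimal strings in the unsigned 64-bit range) is an assumption, not taken from the commit.

```python
def to_bitmap(value):
    """Return a one-element bitmap (modelled as a set), or None if parsing fails."""
    # Assumption: only plain ASCII decimal strings such as "42" are accepted.
    if not isinstance(value, str) or not (value.isascii() and value.isdigit()):
        return None  # parse failure -> NULL, not an empty bitmap
    n = int(value)
    if n >= 2 ** 64:
        return None  # assumed valid range: unsigned 64-bit
    return {n}
```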
cf368728be [fix](merge-on-write) Fix a typo and remove useless member rowset in CommitTabletTxnInfo (#23151)
Fix a typo in #23078
2023-08-18 14:14:34 +08:00
795006ea3d [fix](multi-catalog) conversion of compatible numerical types (#23113)
Hive supports schema changes but does not rewrite the Parquet files, so the physical type in a Parquet file may not match the logical type in the table schema.
2023-08-18 14:05:33 +08:00
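A minimal sketch of reading a Parquet column whose physical type differs from the Hive table's logical type after a schema change. The table of compatible (widening) pairs below is an illustrative assumption, not the set of conversions Doris actually supports.

```python
# Assumed widening rules: Parquet physical type -> Hive logical types it may be read as.
COMPATIBLE = {
    "INT32": {"int", "bigint", "double"},
    "INT64": {"bigint"},
    "FLOAT": {"float", "double"},
    "DOUBLE": {"double"},
}

# How each logical type materializes a value in this sketch.
CONVERTERS = {"int": int, "bigint": int, "float": float, "double": float}


def read_column(physical_type, logical_type, values):
    """Convert values stored as `physical_type` to the table's `logical_type`."""
    if logical_type not in COMPATIBLE.get(physical_type, set()):
        raise TypeError(f"cannot read {physical_type} as {logical_type}")
    convert = CONVERTERS[logical_type]
    return [None if v is None else convert(v) for v in values]
```

Narrowing conversions (for example INT64 read as `int`) are rejected rather than silently truncated.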
4f7760a5f4 [bugfix](segment cache) Recycle the fds when dropping a table (#23081) 2023-08-18 13:31:34 +08:00
e6fe8c05d1 [fix](inverted index change) fix incomplete delete bitmap update when building inverted index on mow table (#23047) 2023-08-18 11:15:39 +08:00
a5ca6cadd6 [Improvement] Optimize count operation for iceberg (#22923)
Iceberg has its own metadata, which includes count statistics for the table data. If the table does not contain equality deletes, we can get the row count of the current table directly from these statistics.
2023-08-18 09:57:51 +08:00
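A minimal sketch of answering `count(*)` from Iceberg snapshot metadata, using the standard snapshot-summary keys `total-records`, `total-equality-deletes` and `total-position-deletes`. Subtracting position deletes assumes each position delete removes exactly one distinct row; the function name is hypothetical and this is not the Doris implementation.

```python
def count_from_snapshot_summary(summary):
    """Row count from an Iceberg snapshot summary, or None if a data scan is needed."""
    try:
        records = int(summary["total-records"])
        eq_deletes = int(summary["total-equality-deletes"])
        pos_deletes = int(summary["total-position-deletes"])
    except KeyError:
        return None  # statistics missing: fall back to scanning
    if eq_deletes > 0:
        return None  # equality deletes must be matched against data rows
    # Assumption: every position delete removes exactly one distinct row.
    return records - pos_deletes
```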
de98324ea7 [fix](inverted index change) make mutex for ALTER_INVERTED_INDEX task and STORAGE_MEDIUM_MIGRATE task (#22995) 2023-08-18 08:35:30 +08:00
314f5a5143 [Fix](orc-reader) Fix incorrect row count used when filling partition or missing columns. (#23096)
`_row_reader->nextBatch` returns the number of rows read. When ORC lazy materialization is turned on, this number includes filtered rows, so the caller must look at `numElements` in the row batch to determine how many rows were not filtered and should therefore be filled into the block.

In this case, filling the partition or missing columns used an incorrect row count, which could crash BE with `filter.size() != offsets.size()` in the filter column step.

When orc lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.
2023-08-17 23:26:11 +08:00
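A minimal sketch of the row-count rule described above: with lazy materialization, the partition (or missing) column must be sized by the surviving rows (`numElements`), not by the count returned from `nextBatch`. `RowBatch`, `fill_partition_column` and `columns_consistent` are hypothetical Python stand-ins for the C++ reader.

```python
from dataclasses import dataclass, field


@dataclass
class RowBatch:
    rows_read: int     # value returned by nextBatch(): includes filtered rows
    num_elements: int  # rows that survived lazy-materialization filtering
    columns: dict = field(default_factory=dict)


def fill_partition_column(batch, name, value):
    # Size the constant column by num_elements, not rows_read, so that it has
    # the same length as the columns that were actually materialized.
    batch.columns[name] = [value] * batch.num_elements
    return batch


def columns_consistent(batch):
    """True when every column has the same length (the invariant the crash violated)."""
    return len({len(col) for col in batch.columns.values()}) <= 1
```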
57568ba472 [fix](be)shouldn't use arena to alloc memory for SingleValueDataString (#23075)
* format code
2023-08-17 22:18:09 +08:00
29ff7b7964 [fix](merge-on-write) add sentinel mark when do compaction (#23078) 2023-08-17 20:08:01 +08:00
c5c984b79b [refactor](bitmap) using template to reduce duplicate code (#23060)
* [refactor](bitmap) support for batch value insertion

* fix values not being filled for int8 and int16
2023-08-17 18:14:29 +08:00
330f369764 [enhancement](file-cache) limit the file cache handle num and init the file cache concurrently (#22919)
1. The effective value of BE config `file_cache_max_file_reader_cache_size` will be one third of the process's maximum number of open files.
2. Use a thread pool to create or initialize the file caches concurrently.
    This fixes slow BE startup when the file cache directories contain many files, since BE previously traversed all of them sequentially.
2023-08-17 16:52:08 +08:00
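A minimal sketch of the two changes, assuming a Unix host: the reader-cache limit is derived from the soft `RLIMIT_NOFILE`, and the cache directories are scanned by a thread pool instead of one after another. `init_cache_dir` only counts files as a stand-in for the real initialization work; all names are hypothetical.

```python
import os
import resource
from concurrent.futures import ThreadPoolExecutor


def reader_cache_limit(max_open_files=None):
    """One third of the process's open-file limit (soft RLIMIT_NOFILE by default)."""
    if max_open_files is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft == resource.RLIM_INFINITY:
            raise ValueError("open-file limit is unlimited; pass a value explicitly")
        max_open_files = soft
    return max_open_files // 3


def init_cache_dir(path):
    """Stand-in for initializing one cache directory: count the files under it."""
    count = 0
    for _root, _dirs, files in os.walk(path):
        count += len(files)
    return path, count


def init_all_cache_dirs(paths, workers=4):
    """Initialize all cache directories concurrently instead of sequentially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(init_cache_dir, paths))
```

Startup time then scales with the slowest directory (bounded by the worker count) rather than with the sum over all directories.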
b252c49071 [fix](hash join) fix heap-use-after-free of HashJoinNode (#23094) 2023-08-17 16:29:47 +08:00
e289e03a1a [fix](executor)fix no return with old type in time_round 2023-08-17 15:34:26 +08:00
Pxl cf1865a1c8 [Bug](scan) fix core dump due to store_path_map (#23084)
2023-08-17 15:24:43 +08:00
8b51da0523 [Fix](load) fix partition null pointer exception (#22965) 2023-08-17 14:09:47 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
390c52f73a [Improve](complex-type) update for array/map element_at with nested complex type with local tvf (#22927) 2023-08-16 20:47:36 +08:00
a5c73c7a39 [fix](partial update) set io_ctx.reader_type when reading columns for partial update (#22630) 2023-08-16 19:34:39 +08:00
0aa57d159e [Fix](Partial update) Fix wrong position used in segment writer (#22782) 2023-08-16 19:31:06 +08:00
b815cf327a [enhancement](merge-on-write) Add more log info when delete bitmap correctness check failed (#22984) 2023-08-16 17:25:11 +08:00
6cf1efc997 [refactor](load) use smart pointers to manage writers in memtable memory limiter (#23019) 2023-08-16 16:34:57 +08:00
4510e16845 [improvement](delete) support delete predicate on value column for merge-on-write unique table (#21933)
Previously, delete statements with conditions on value columns were only supported on duplicate tables. After the delete sign mechanism was introduced for batch delete, a delete statement with conditions on value columns on a unique table is transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for unique tables with merge-on-write enabled, the overhead of inserting this data can be eliminated. This PR therefore allows delete predicates on value columns for merge-on-write unique tables.
2023-08-16 12:18:05 +08:00
Pxl d5df3bae25 [Bug](exchange) fix dcheck fail when VDataStreamRecvr input empty block (#22992)
2023-08-16 10:21:19 +08:00
da097629ea [chore](build) Fix the build with MySQL support (#23020) 2023-08-16 09:28:56 +08:00
d3dddeea8a [fix](load) remove incorrect DCHECK in BetaRowsetWriter dtor (#23016)
The DCHECK may not always hold in the case of vertical compaction.
Remove it so that DEBUG builds can run.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-15 23:55:02 +08:00
61d2f37bdc [fix](jdbc catalog) fix string type insert into odbc table (#22961) 2023-08-15 20:09:38 +08:00
f191736bfe [bug](shuffle) Fix DCHECK failure if exchange node has limit (#22993) 2023-08-15 19:14:37 +08:00
9b2323b7fd [Pipeline](exec) support async writer in pipeline query engine (#22901) 2023-08-15 17:32:53 +08:00
50f66b1246 [fix](pipeline) fix bug of datastream sender when doing BUCKET_SHFFULE_HASH_PARTITIONED shuffle (#22988)
This issue was introduced by #22765. If #22765 is picked to 2.0, this PR also needs to be picked.

When the shuffle type is BUCKET_SHFFULE_HASH_PARTITIONED, data from multiple buckets may be sent to the same channel, so sending eos too early may cause data loss.
2023-08-15 17:30:27 +08:00
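A minimal sketch of why eos must wait until every bucket has been sent when several buckets share a channel. Channels are modelled as message lists and the receiver stops at the first `EOS`; all names are hypothetical.

```python
def send_eos_per_bucket(bucket_rows, bucket_to_channel):
    """Buggy variant: sends EOS right after each bucket."""
    channels = {}
    for bucket, rows in bucket_rows.items():
        ch = channels.setdefault(bucket_to_channel[bucket], [])
        ch.extend(rows)
        ch.append("EOS")  # too early if another bucket also maps to this channel
    return channels


def send_eos_per_channel(bucket_rows, bucket_to_channel):
    """Fixed variant: send all buckets first, then exactly one EOS per channel."""
    channels = {}
    for bucket, rows in bucket_rows.items():
        channels.setdefault(bucket_to_channel[bucket], []).extend(rows)
    for ch in channels.values():
        ch.append("EOS")
    return channels


def received(messages):
    """A receiver stops at the first EOS; anything after it is lost."""
    out = []
    for m in messages:
        if m == "EOS":
            break
        out.append(m)
    return out
```

With buckets 0 and 2 both mapped to channel `"x"`, the buggy variant delivers only bucket 0's rows on that channel, while the fixed variant delivers both.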
f1864d9fcf [fix](function) fix str_to_date with specific format #22981 2023-08-15 15:30:48 +08:00
9b42093742 [feature](agg) Make 'map_agg' support array type as value (#22945) 2023-08-15 14:44:50 +08:00
1d825f57bc [fix](load) expose error root cause msg for load (#22968)
Currently, we only return an ambiguous "INTERNAL ERROR" to the user on load
failure. This commit stops hiding the root cause.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-15 13:22:45 +08:00
c2ff940947 [refactor](parquet)change decimal type export as fixed-len-byte on parquet write (#22792)
Previously, the Parquet writer exported decimals as byte-binary,
but those fields could not be imported into Hive.
Now decimals are exported as fixed-len-byte-array so that they can be imported into Hive directly.
2023-08-15 13:17:50 +08:00
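A minimal sketch of writing a decimal as a Parquet FIXED_LEN_BYTE_ARRAY: per the Parquet logical-type spec, the unscaled value is stored as big-endian two's complement, and n bytes hold up to floor(log10(2^(8n-1) - 1)) digits. Choosing the smallest sufficient width is an assumption; the commit does not say how Doris sizes the array.

```python
from decimal import Decimal, localcontext


def min_bytes_for_precision(precision):
    """Smallest n whose signed n-byte range covers every `precision`-digit unscaled value."""
    n = 1
    while 2 ** (8 * n - 1) - 1 < 10 ** precision - 1:
        n += 1
    return n


def encode_decimal(value, precision, scale):
    """Encode a Decimal as a big-endian two's-complement FIXED_LEN_BYTE_ARRAY."""
    with localcontext() as ctx:
        ctx.prec = 100  # wide enough that scaleb() never rounds a DECIMAL(38, s)
        unscaled = value.scaleb(scale)
        if unscaled != unscaled.to_integral_value():
            raise ValueError("value has more fractional digits than `scale`")
    unscaled = int(unscaled)
    if abs(unscaled) >= 10 ** precision:
        raise ValueError("value does not fit in `precision` digits")
    return unscaled.to_bytes(min_bytes_for_precision(precision), "big", signed=True)
```

For example, `encode_decimal(Decimal("-1.23"), 5, 2)` stores the unscaled value -123 in 3 bytes.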
94bf8fb3c5 [performance](executor) optimize time_round function with only one argument (#22855) 2023-08-15 13:16:42 +08:00
707a527775 [FIX](map)insert into doris table with array/map type by local tvf (#22955) 2023-08-15 13:11:23 +08:00
Pxl 34399e2965 [Bug](exchange) init _instance_to_rpc_ctx on register_sink (#22976)
2023-08-15 13:02:28 +08:00
ce3267fcca [refactor](load) change segcompaction worker interface (#22928) 2023-08-15 11:29:57 +08:00
d431a35721 [Fix](inverted index) fix non-index match function core (#22959) 2023-08-15 11:27:12 +08:00
xy b5ea3454a6 [Bug](aggregation)fix for map_agg when columns[1] is nullable (#22932)
In the map_agg handler function, added a check on columns[1]->is_nullable().
2023-08-15 11:26:03 +08:00
Pxl 3f55d5d4d5 [Chore](execution) change some log fatal and dcheck to exception (#22890)
2023-08-15 10:45:00 +08:00
8318dfa9a3 [fix](datastream sender) fix wrong result of BUCKET_SHFFULE_HASH_PARTITIONED shuffle (#22973)
2023-08-15 10:21:14 +08:00