Commit Graph

1809 Commits

Author SHA1 Message Date
55a6649da9 [fix](testcase) fix test case failure of insert null value into not null column (#20963) 2023-06-20 20:46:07 +08:00
190debaac9 [Improvement](load) single partition load optimize (#20876)
1. When creating a single partition,partition and tablet are not looked up for each row of data
2. Only DISTRIBUTED BY random
2023-06-20 20:29:39 +08:00
c85271d2ae [Fix](orc-reader) Fix filter size mismatch in orc reader. (#20998)
Fix filter size mismatch in orc reader introduced by #20806
2023-06-20 12:27:16 +08:00
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly to remove decoding&encoding process.
2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.
2023-06-20 11:20:21 +08:00
824bc02603 [Function] Support date function: microsecond() (#20044) 2023-06-20 10:32:54 +08:00
26cca5e00a [Enhancement](tvf) Add frontends table-valued-function (#20857) 2023-06-19 13:57:40 +08:00
08fff8923f [improvement](serde) Optimizing the performance of mysql result writter (#20928)
When converting query results into MySQL format, it involves transforming columnar data storage into row-based storage. This process raises the question of choosing between sequential reading and sequential writing. In reality, sequential writing is the better choice for performance optimization.

Test with 9M rows contains more than 20 columns, this patch can reduce the conversion time from 20s to 11s.
2023-06-19 12:29:01 +08:00
fb9fcf460a [fix](leftjoin) fix bug of left and full join with other conjuncts (#20946)
Fix bug of left and full outer join with other conjuncts. When equal matched row count of a probe row exceed batch_size, some times the _join_node->_is_any_probe_match_row_output flag is not set correcty, which result in outputing extra rows for the probe row.
2023-06-19 12:27:06 +08:00
Pxl
85c5d7c6a9 [Chore](materialized-view) add ssb_flat mv test case (#20869)
add ssb_flat mv test case
2023-06-19 10:51:50 +08:00
d6b7640cf0 [fix](inverted index) fix check failed for block erase temp column (#20924) 2023-06-18 19:27:48 +08:00
0a59580aa4 [Enhancement](function) fix compatibility issues of sum/count during upgrade process (#20890)
in order to solve agg of sum/count is not compatibility during the upgrade process.
in PR [refactor](agg_state) refactor agg_state type to support fixed length object type #20370 have changed the serialize type and serialize column of sum/count
before is ColumnVector, now sum/count change to use ColumnFixedLengthObject
so during the upgrade process, will be not compatible if exist Old BE and Newer BE
2023-06-17 12:51:01 +08:00
ab32299ba4 [feature](nereids) Support multi target rf #20714
Support multi target runtime filter, mainly for set operation, such as union/intersect/except.
2023-06-16 20:26:00 +08:00
97135a1cbb [Feature] (json)add json_contains function (#20824) 2023-06-16 15:10:12 +08:00
b7a50a09fe [Opt](orc-reader) Optimize orc reader by dict filtering. (#20806)
Optimize orc reader by dict filtering.  It is similar with #17594.
Test result
**ssb-flat-100**: (3 nodes)
| Query        | before opt           | after opt  |
| ------------- |:-------------:| ---------:|
Q1.1 | 1.239 | 1.145
Q1.2 | 1.254 | 1.128
Q1.3 | 1.931 | 1.644
Q2.1 | 1.359 | 1.006
Q2.2 | 1.229 | 0.674
Q2.3 | 0.934 | 0.427
Q3.1 | 2.226 | 1.712
Q3.2 | 2.042 | 1.562
Q3.3 | 1.631 | 1.021
Q3.4 | 1.618 | 0.732
Q4.1 | 2.294 | 1.858
Q4.2 | 2.511 | 1.961
Q4.3 | 1.736 | 1.446
total | 22.004 | 16.316
2023-06-16 13:11:37 +08:00
420603317b [improve](match) Improve performance for match query without inverted index (#20815) 2023-06-16 10:35:38 +08:00
bb5d36c5cb [Log](load) change VLOG to INFO when write replica failing (#20783) 2023-06-15 16:11:14 +08:00
Pxl
b6835840f7 [Bug](table-function) return InvalidArgument when explode_split meet empty delimiter (#20795)
return InvalidArgument when explode_split meet empty delimiter
2023-06-15 15:17:22 +08:00
Pxl
01e53f4e67 [Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658)
fix problems about create mv on ssb_flat q4.1 failed
2023-06-15 14:38:21 +08:00
Pxl
17a395f5e3 [Bug](runtime-filter) fix runtime filter not register on vdata_gen_scan_node (#20787)
fix runtime filter not register on vdata_gen_scan_node
2023-06-15 14:06:14 +08:00
2151f5d04d [fix](bitmap) fix bug: incorrect orthogonal bitmap result in some cases (#20819) (#20822)
Issue Number: close #20819

If there is only one aggregation (update finalize) phase, result field will not be updated.

This pr is aim to resolve it.
2023-06-15 14:05:24 +08:00
1ce8f13837 [fix](memory) fix mem tracker in NodeChannel rpc callback (#20779) 2023-06-15 10:35:25 +08:00
460399f214 [fix](profile) remove same profile in join node (#20734) 2023-06-15 08:08:39 +08:00
09d187ec77 [improvement](ck jdbc) Optimized reading of datetime and ip types of the ClickHouse JDBC Catalog (#20804) 2023-06-14 23:28:08 +08:00
bb617ee2cc [fix](parquet-reader)fix page v2 header offset (#20814)
fix page v2 header offset.
get correct offset when read next page in file.
2023-06-14 23:27:31 +08:00
31a4f96f01 [refactor](exprcontext) move close to expr context's dector method (#20747)
The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.
2023-06-14 18:01:07 +08:00
Pxl
a0d4f11667 [Bug](function) catch error state in function cast to avoid core dump (#20751)
catch error state in function cast to avoid core dump
2023-06-14 17:34:34 +08:00
0f470fec0e [Bug](topn opt) Fix Two-Phase read when some rowset swept (#20732)
* [Bug](topn opt) Fix Two-Phase read when some rowset swept

If this is a Two-Phase read query, and we need to delay the release of Rowset by row->update_delayed_expired_timestamp() to expand the lifespan of rowsets. This is necessary to avoid data loss during the second phase reading, where some stale rowsets may be swept and result in missing data.
2023-06-14 15:46:29 +08:00
Pxl
9244cb6553 [Chore](runtime-filter) do not make query fail when rf publish failed (#20742)
do not make query fail when rf publish failed
2023-06-13 18:23:46 +08:00
feb21fc9e9 [fix](group_concat) use default seperator ',' instead of ', ' for group_concat, to be consistant with mysql (#20741) 2023-06-13 17:20:29 +08:00
2dddab03a1 [compatibility](schema cache) ensure schema version when using schema cache (#20729)
When FE is old version, be is new version, issue a schema change(add column) and
then query, old version of FE query without schema version could result in reading
stale schema from schema cache
2023-06-13 15:19:26 +08:00
4b15185e25 [improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544)
1. Add hdfs file handle cache for hdfs file reader

    Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks for the Impala team)
    This is a lru cache that can store multi entries with same key.
    The key is build with {file name + modification time}
    The value is the hdfsFile pointer that point to a certain hdfs file.
    
    This cache is to avoid reopen same hdfs file mutli time, which can save
    query time.
    
    Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number
    of file handle cache, default is 20000.

2. Add file meta cache

	The file meta cache is a lru cache. the key is {file name + modification time},
	the value is the parsed file meta info of the certain file, which can save
	the time of re-parsing file meta everytime.
	Currently, it is only used for caching parquet file footer.
	
The test show that is cache is hit, the `FileOpenTime` and `ParseFooterTime` is reduce to almost 0
in query profile, which can save time when there are lots of files to read.
2023-06-13 15:13:57 +08:00
Pxl
e010fa8d4f [Chore](runtime filter) remove runtime filter ready_for_publish/publish_finally (#20593) 2023-06-13 11:20:49 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as : java-commom、java-udf、jdbc-scanner、hudi-scanner、 paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00
51bbf17786 [Refactor](Profile) Add and refactor the join profile (#20693) 2023-06-13 09:06:51 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After supporting insert-only transactional hive full acid tables #19518, #19419, this PR support transactional hive full acid tables.

Support hive3 transactional hive full acid tables.
Hive2 transactional hive full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00
1433544c56 [fix](case expr) fix coredump of case for null value 3 #20711 2023-06-12 20:58:01 +08:00
9d47c6a871 [fix](columnstring) fix bug of columnstring prefetch (#20698) 2023-06-12 17:03:44 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
ea264ce9de [Opt](join) short circuit probe for join node (#20585)
Support the _short_circuit_for_probe for join node
2023-06-12 16:01:09 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
14f59bef1d [improvement](profile)add sum/avg rpc time (#20511) 2023-06-12 11:34:49 +08:00
4c340f2851 [Feature] (Multi-Catalog) support query hll column in doris jdbc table - part 1 (#19413)
Issue Number: close #17895
2023-06-12 11:16:19 +08:00
a6f625676b [profile](remove child) child is for node, should not be used to organize counters (#20676)
Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label.

                          -  MemoryUsage:  
                              -  HashTable:  23.98  KB
                              -  SerializeKeyArena:  446.75  KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-12 10:00:35 +08:00
Pxl
ab7ac31d89 [Chore](case) fix failed on test_big_pad when enable pipeline engine #20644 2023-06-12 09:15:55 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
9a83d78dfe [Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570)
PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries.
Key Implementations:
1. Use hudi meta information to generate the table schema, not from hive client.
2. Use hive client to list hudi partitions, so it strongly depends the sync-tools(https://hudi.apache.org/docs/syncing_metastore/) which syncs the partitions of hudi into hive metastore. However, we may get the hudi partitions directly from .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like glue is compatible with hive catalog.
4. Read the COW table originally from c++.
5. Hudi RecordReader will use ProcessBuilder to start a hotspot debugger process, which may be stuck when attaching the origin JNI process, soI use a tricky method to kill this useless process.
2023-06-10 12:25:53 +08:00
656b9ad3da [enhancement](index) Nereids support no need to read raw data for index column that only in filter conditions (#20605) 2023-06-09 21:54:48 +08:00
93b53cf2f4 [improvement](exception-safe) create and prepare node/sink support exception safe (#20551) 2023-06-09 21:06:59 +08:00
abb2048d5d [performance](executor) remove repeated call within the loop in validate_column 2023-06-09 19:59:25 +08:00