6355 Commits

Author SHA1 Message Date
fd1db4da3d [agg](profile) fix incorrect profile (#28004) 2023-12-05 20:48:10 +08:00
05adbfdb3d [feature](inverted index) match_phrase_prefix feature added (#27404)
select count() from test_index_match_phrase_prefix where request match_phrase_prefix 'xxx';
2023-12-05 20:15:13 +08:00
ffa4ea66d5 [enhancement](main) do not coredump when BE cannot start (#27928) 2023-12-05 20:11:24 +08:00
ea275e687a [pipelineX](minor) remove unused code (#28016) 2023-12-05 19:41:40 +08:00
6074cddcf8 [feature](mtmv)add Job and task tvf (#27967)
add:
select * from jobs("type"="mv");
select * from tasks("type"="mv");
select * from jobs("type"="insert");
select * from tasks("type"="insert");

add check priv for mv_infos("database"="xxx");

change JobType MTMV==>MV
2023-12-05 15:12:36 +08:00
3595f21405 [improvement](executor)clear unused cgroup path (#27798)
* clear unused cgroup path

* use C++ api

* add gcc header
2023-12-05 14:18:23 +08:00
c98b80ae6a [Feature](functions) support ignore and nullable functions (#27848)
support ignore and nullable functions
2023-12-05 14:09:32 +08:00
79f6f85cf1 [FIX](serde)fix datetimev2 serde parse from string with scale (#27965) 2023-12-05 13:58:32 +08:00
54fe1a166b [Refactor](scan) refactor scan scheduler to improve performance (#27948)
* [Refactor](scan) refactor scan scheduler to improve performance

* fix pipeline x core
2023-12-05 13:03:16 +08:00
da87fcb477 [bug](function) fix compound expr coredump problem (#27988) 2023-12-05 13:00:14 +08:00
17016b9797 [improvement](decimal) use new way for decimal arithmetic precision promotion (#27787)
* [DNM](decimal) use new way for decimal arithmetic precision promotion

* [improvement](decimal) [DNM](decimal) use new way for decimal arithmetic precision promotion
1. [DNM](decimal) use new way for decimal arithmetic precision promotion
2. throw exception if it overflows for decimal arithmetics
3. throw exception if it overflows when casting among number types

* fix compile error of gcc

* improvement

---------

Co-authored-by: morrySnow <morrysnow@126.com>
2023-12-05 12:54:40 +08:00
358d73a0ae [FIX](complextype) fix empty quote with complex type (#27942) 2023-12-05 12:25:26 +08:00
fd2e60a2db [fix](move-memtable) exclude memtable insert memory in query tracker (#27953) 2023-12-05 12:04:15 +08:00
a13227cf4b [fix](move-memtable) fix sink v2 profile (#27982) 2023-12-05 11:53:18 +08:00
1ed99c4d8a [Improvement](inverted index) improve inverted index bkd performance in high concurrent scenario (#27820)
Improve BKD performance by enabling the BKD reader cache and by speeding up comparison and visiting of compressed values in the BKD tree.
2023-12-05 11:39:53 +08:00
75d0beb8cc [fix](move-memtable) only report load stream profile in the end (#27983) 2023-12-05 11:30:54 +08:00
bd9db7423b [fix](move-memtable) free resources before storage engine stop (#27980) 2023-12-05 11:15:05 +08:00
d69cdf8635 [improve](heartbeat) show more info when receive invalid cluster id (#27975) 2023-12-05 11:10:22 +08:00
a06ac930a0 [refactor](memtable) remove unused stream output (#27889)
Co-authored-by: ziyang zhang <zhangziyang@stu.cdut.edu.cn>
2023-12-05 11:10:10 +08:00
2b4c4bb442 [Fix][Opt](parquet-reader) Fix filter push down with decimal types in parquet reader. (#27897)
Fix filter push down with decimal types in parquet reader introduced by #22842
2023-12-04 22:25:39 +08:00
1afdbfe723 [enhance](BE) Refactor TaskWorkerPool (#27555) 2023-12-04 21:46:10 +08:00
b096062680 [feature-wip](arrow-flight)(step6) Support regression test (#27847)
Design documentation linked to #25514.

Regression tests add a new group, arrow_flight_sql:

* ./run-regression-test.sh -g arrow_flight_sql runs the regression test, using jdbc:arrow-flight-sql for all suites whose group contains arrow_flight_sql.
* ./run-regression-test.sh -g p0,arrow_flight_sql runs the regression test, using jdbc:arrow-flight-sql for all suites whose group contains arrow_flight_sql, and jdbc:mysql for the other suites whose group contains p0 but not arrow_flight_sql.

Note that the query result formats of jdbc:arrow-flight-sql, jdbc:mysql, and the mysql client differ. For example:

* Datetime field type: jdbc:mysql returns 2010-01-02T05:09:06, the mysql client returns 2010-01-02 05:09:06, and jdbc:arrow-flight-sql also returns 2010-01-02 05:09:06.
* Array and Map field types: jdbc:mysql returns ["ab", "efg", null] and {"f1": 1, "f2": "a"}; jdbc:arrow-flight-sql returns ["ab","efg",null] and {"f1":1,"f2":"a"}, without the spaces.
* Float field type: jdbc:mysql and the mysql client return 6.333; jdbc:arrow-flight-sql returns 6.333000183105469 (in query_p0/subquery/test_subquery.groovy).
* If the query result is empty, jdbc:arrow-flight-sql returns empty and jdbc:mysql returns \N.
* use database; and the query should be executed as two separate SQL statements whenever possible; otherwise the results may not be as expected. For example, USE information_schema; select cast ("0.0101031417" as datetime) returns 2000-01-01 03:14:1 (constant fold), while select cast ("0.0101031417" as datetime) on its own returns null (no constant fold).

In addition, Doris jdbc:arrow-flight-sql still has unfinished parts, such as:

* Unsupported data type: Decimal256. INVALID_ARGUMENT: [INTERNAL_ERROR]Fail to convert block data to arrow data, error: [E3] write_column_to_arrow with type Decimal256
* Unsupported null value of map key. INVALID_ARGUMENT: [INTERNAL_ERROR]Fail to convert block data to arrow data, error: [E33] Can not write null value of map key to arrow.
* Unsupported data type: ARRAY<MAP<TEXT,TEXT>>
* jdbc:arrow-flight-sql does not support connecting to a specified DB name, such as `jdbc:arrow-flight-sql://127.0.0.1:9090/{db_name}`. To stay compatible with regression-test, `use db_name` is added before all SQL statements when `jdbc:arrow-flight-sql` runs the regression test.
* select timediff("2010-01-01 01:00:00", "2010-01-02 01:00:00"); fails with java.lang.NumberFormatException: For input string: "-24:00:00"
2023-12-04 19:23:56 +08:00
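The format differences listed in the commit above can be reproduced outside Doris. The sketch below is only an illustration in Python, not code from the regression framework. It assumes, which the commit does not state, that the float difference comes from a 32-bit value being widened to a 64-bit double; the helper name widen_float32 is hypothetical.

```python
import json
import struct
from datetime import datetime


def widen_float32(value: float) -> float:
    """Round value to the nearest 32-bit float, then read it back as a double."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Datetime: 'T' separator versus space separator.
ts = datetime(2010, 1, 2, 5, 9, 6)
iso_style = ts.isoformat()            # form reported for jdbc:mysql
space_style = ts.isoformat(sep=" ")   # form reported for the mysql client

# Array: default JSON separators include spaces, compact separators do not.
array_value = ["ab", "efg", None]
spaced = json.dumps(array_value)
compact = json.dumps(array_value, separators=(",", ":"))

# Float: assumed float32-to-double widening (see the note above the block).
widened = widen_float32(6.333)

print(iso_style, space_style, spaced, compact, widened, sep="\n")
```

A test harness that compares results across drivers would need to normalize at least these three cases before diffing.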
a7d1e92fc2 [Fix](variant) handle StorageReadOptions to avoid crash in new_column_iterator_with_path (#27936)
In partial update, reading a variant without `opt` leads to a crash.
2023-12-04 17:02:35 +08:00
2022a8ab32 [fix](invert index) fix reader does not close fd (#27918) 2023-12-04 16:44:50 +08:00
4d1aa131ee [Feature](datatype) add be ut codes for IPv4/v6 (#26534)
Add unit test codes for IP types
2023-12-04 15:25:02 +08:00
a6a6892f90 [chore](status code) avoid print stack for DATA_QUALITY_ERROR (#27935)
issue introduced by #27065
2023-12-04 15:04:27 +08:00
48935c14e2 [Improvement](variant) limit the column size on tablet schema (#27399) (#27785)
1. limit the column count to default 2048
2. fix get_inverted_index returning nullptr when the variant's unique id is -1, by using its parent's unique id instead
3. avoid adding the same path subcolumn to the tablet schema more than once
4. make extracted column unique id -1
2023-12-04 14:47:36 +08:00
Pxl
2b715924c5 [Chore](function) set normal function use_default_implementation_for_constants to default (#27891)
set normal function use_default_implementation_for_constants to default
2023-12-04 14:19:25 +08:00
Pxl
45a49ac059 [Bug](column) support insert default for ColumnFixedLengthObject #27927 2023-12-04 12:52:50 +08:00
e62d19d90d [improve](partition) support auto list partition with more columns (#27817)
Previously, the partition by clause could have only one column.
That limit is now removed, so it can have more columns.
2023-12-04 11:33:18 +08:00
Pxl
e3d2425d47 [Improvement](join) remove insert_indices_from_join and special judge for -1 (#27779)
remove insert_indices_from_join and special judge for -1
2023-12-04 11:03:22 +08:00
d2a99aa03b [refactor](scan) change scan reschedule into scan context (#27766)
* [refactor](scan) change scan reschedule into scan context
2023-12-04 10:25:52 +08:00
a64656748b [Enhancement](wal) disable group commit when streamload size is too large (#27781) 2023-12-03 23:05:11 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
80d2c7ab41 [feature](parquet)support read parquet lzo compress. (#27706) 2023-12-03 09:55:52 +08:00
fc8b32be7a [Opt](multi-catalog) Opt parquet orc reader numeric copy by memcpy() and memset(). (#27545)
Opt parquet orc reader null map decoding by memset().
2023-12-03 09:55:05 +08:00
be30bd1e40 [improvement](spinlock) remove some potential bad spinlock usage (#27904)
* [improvement](spinlock) remove some potential spinlock usage

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-12-02 20:33:54 +08:00
421ab56c3e [pipelineX](improvement) Support local shuffle for join and agg (#27852) 2023-12-02 20:17:18 +08:00
10483ea12c [fix](profile) fix error set with peak_memory_usage in pipeline #27749 2023-12-02 14:12:38 +08:00
2e1ce758f1 [feature](function) support ip function ipv6numtostring(alias inet6_ntoa) (#27342) 2023-12-02 11:48:19 +08:00
54b5d04ff9 [improve](csv_reader) handle csv reader error (#27892) 2023-12-02 10:05:02 +08:00
Pxl
f65103e2a6 [Chore](runtime-filter) unify interfaces of bloom filter and remove some unused code (#27822)
* unify interfaces of bloom filter and remove some unused code
2023-12-02 07:42:55 +08:00
a1a75fcfbd [fix](runtime filter) Fix extremely high CPU usage caused by rf merge #27894 2023-12-02 07:40:52 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. MaxCompute partition pruning:
previously MC partitions could only be filtered by '=', which selects just one partition.
Partition pruning is needed to support filtering multiple partitions and range operators ('>', '<', '>=', ..).

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
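The difference between an equality-only filter and partition pruning with range operators, as described in the commit above, can be sketched as follows. This is a minimal illustration and not the Doris implementation; prune_partitions and the sample partition values are hypothetical, and values are compared as plain strings.

```python
import operator
from typing import Callable, Dict, List

# Operators named in the commit message; each compares a partition value to a literal.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
}


def prune_partitions(partition_values: List[str], op: str, literal: str) -> List[str]:
    """Keep only the partitions whose value satisfies `value op literal`."""
    compare = OPERATORS[op]
    return [value for value in partition_values if compare(value, literal)]


# Hypothetical partition values for a table partitioned by a date string.
partitions = ["20231129", "20231130", "20231201", "20231202"]

equal_match = prune_partitions(partitions, "=", "20231201")   # a single partition
range_match = prune_partitions(partitions, ">=", "20231201")  # several partitions
print(equal_match, range_match)
```

String comparison only orders these values correctly because they share a fixed-width format; a real pruner would compare typed values.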
68525fc112 [feature](profile) add RuntimeFilterInfo in merge profile #27869 2023-12-01 21:42:25 +08:00
7e3d6bc9f1 [Fix](Variant) Implement ColumnObject::update_hash_with_value (#27873) 2023-12-01 20:14:47 +08:00
007506ce42 [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode (#27842) 2023-12-01 17:32:46 +08:00
18338a33b6 [bugfix](mergeprofile) ignore null profile to avoid bug (#27860)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-12-01 16:56:29 +08:00
137f94eac9 [Bug](func) coredump in equal for null in function (#27844) 2023-12-01 15:48:01 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Optimize zstd block decompression by using `ZSTD_decompressDCtx()` in place of streaming decompression.
This improves performance but consumes more memory.

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
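The contrast between one-shot block decompression and streaming decompression in the commit above can be sketched in Python. The example uses zlib from the standard library as a stand-in, so it does not call the zstd API, and it makes no claim about the timings the commit reports; the function names are hypothetical.

```python
import zlib

payload = b"doris" * 100_000
compressed = zlib.compress(payload)


def decompress_one_shot(data: bytes) -> bytes:
    """Decompress the whole block in a single call."""
    return zlib.decompress(data)


def decompress_streaming(data: bytes, chunk_size: int = 64) -> bytes:
    """Feed the input in small chunks through a stateful decompressor."""
    decompressor = zlib.decompressobj()
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(decompressor.decompress(data[start:start + chunk_size]))
    pieces.append(decompressor.flush())  # collect any output still buffered
    return b"".join(pieces)


print(decompress_one_shot(compressed) == decompress_streaming(compressed))
```

Both paths yield the same bytes; the one-shot call avoids the per-chunk bookkeeping but needs the whole compressed block and the whole output buffer available at once.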