Commit Graph

1030 Commits

Author SHA1 Message Date
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
f38ecd349c [enhancement](memory) return error if allocate memory failed during add rows method (#35085)
* return error when add rows failed

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-22 00:53:34 +08:00
fb28d0b185 [BUG] fix scan range boundary handling is incorrect (#34832)
fix scan range boundary handling is incorrect
Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
2024-05-21 13:00:50 +08:00
c0fd98abe5 [Fix](tvf) Fix that tvf reading empty files in compressed formats. (#34926)
1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
2024-05-21 12:59:31 +08:00
42425808a1 [Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788" (#35056)
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)

* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.

**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.

**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.

* 2

* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
2024-05-20 15:43:46 +08:00
9b5028785d [fix](prepare) fix datetimev2 return err when binary_row_format (#34662)
fix datetimev2 return err when binary_row_format. before pr, Backend return datetimev2 alwary by to_string.
fix datatimev2 return metadata loss scale.
2024-05-18 18:37:41 +08:00
cc00666be6 [opt](inverted index) add inlist condition handling to compound (#34134)
1. Previously, the compound did not support the inlist condition, which could impact performance if an inverted index was created.
2024-05-10 14:35:47 +08:00
39fdc9ba0c [refactor](executor)Rename workload schedule policy #34497 2024-05-08 08:35:20 +08:00
299d069da9 Fix alter policy failed (#33910) 2024-04-22 22:33:24 +08:00
03c3419265 [Refactor](executor)Add workload schedule policy table (#33729) 2024-04-20 20:06:34 +08:00
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This is imported by #33511. wrongly used

ColumnStr<T> ();

which violate C++20 standard(see https://wg21.cmeerw.net/cwg/issue2237) but still supported by clang up until now(see llvm/llvm-project#58112)
2024-04-19 23:41:37 +08:00
74590e4836 [refine](node) Remove the cse DCHECK from the constructor (#33856)
It's possible that a failure in the fe caused the check to fail, and at that moment, it may not be possible to retrieve the corresponding query ID from be.out.
2024-04-19 23:41:37 +08:00
1300317723 [Exec](join) Support column string64 to avoid join failed in string size overflow the uint32 (#33511) (#33850) 2024-04-18 19:43:08 +08:00
06a155abb0 [branch-2.1](cherry-pick) Pick some partial-update PR from master (#33639)
* [Fix](partial-update) Fix partial update fail when the datetime default value is 'current_time' (#32926)

* Problem: When importing data that includes datetime with a default value of current time for partial column updates, the import fails.
Reason: Partial column updates do not handle the logic for datetime default values.
Solution: During partial column updates, when the default value is set to current time, read the current time from the runtime state and write it into the data.

* [Enhancement](partial update)Add timezone case for partial update timestamp #33177

* [fix](partial update) Support partial update when the date default value is 'current_date'. This PR is a extension of PR #32926. (#33394)
2024-04-17 23:42:12 +08:00
face7c42fd [enhancement](plsql) Support select * from routines (#32866)
Support show of plsql procedure using select * from routines.
2024-04-17 23:42:12 +08:00
1be753ed75 [enhancement](mysql compatible) add user and procs_priv tables to mysql db in all catalogs (#33058)
Issue Number: close #xxx

This PR aims to enhance the compatibility of BI tools (such as Dbeaver, DataGrip) when using the mysql connector to connect to Doris, because some BI tools query some tables in the mysql database. In our tests, the user and procs_priv tables were mainly queried. This PR adds these two tables and adds actual data to the user table. However, please note that most of the fields in the user table are in Doris' own format rather than mysql format, so it can only ensure that the BI tool is querying No error is reported when accessing these tables, which does not guarantee that the data is completely displayed, and the tables under Doris's mysql database do not support data modification.
Thanks to @liujiwen-up for assisting in testing
2024-04-17 23:42:12 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00
8e19cdd745 [featrue](expr) support common subexpression elimination be part (#32673) 2024-04-10 11:56:21 +08:00
97a2977f2a [improvement](executor)Add tag property for workload group #32874 2024-04-10 11:34:29 +08:00
e574b35833 [Enhancement](partition) Refine some auto partition behaviours (#32737) (#33412)
fix legacy planner grammer
fix nereids planner parsing
fix cases
forbid auto range partition with null column
fix CreateTableStmt with auto partition and some partition items.
1 and 2 are about #31585
doc pr: apache/doris-website#488
2024-04-09 15:51:02 +08:00
fae55e0e46 [Feature](information_schema) add processlist table for information_schema db (#32511) 2024-04-07 23:24:22 +08:00
ad2d20348a [fix](pipeline) fix use error row desc when origin block clear #32803 (#32849)
* fix

* add case
2024-03-26 20:02:46 +08:00
326a264fcd [Improvement](executor)Add spill property for workload group #32554 2024-03-22 16:38:19 +08:00
baf3ae1a93 [refactor](nereids)unify outputTupleDesc and projection be part (#32439) 2024-03-22 16:35:43 +08:00
fd0bc720e9 [opt](information_schema) Add DEFAULT_ENCRYPTION column to schemata table (#32501) 2024-03-22 08:52:16 +08:00
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
In previous, the counter in `profile` may be updated when close the file reader.
And the file reader may be closed when the object being deconstruted.
But at that time, the `profile` object may already be deleted, causing NPE and BE will crash.

This PR try to fix this issue:

1. Remove the "profile counter update" logic from all `close()` method.

2. Add a new interface `ProfileCollector`

	It has 2 methods:
	
	- `collect_profile_at_runtime()`

		It can be called at runtime, eg, in every `get_next_block()` method.
		So that the counter in profile can be updated at runtime.
		
	- `collect_profile_before_close()`

		Should be called before the object call `close()`. And it will only be called once.
		
3. Derived from `ProfileCollector`

	All classes which may update the profile counter in `close()` method should extends
	the `ProfileCollector`. Such as `GenericReader`, etc. And implement `collect_profile_before_close()`
	
	And `collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
83ab61ad22 Add QUEUE_START_TIME/QUEUE_END_TIME/QUERY_STATUS column for active_queries (#32259) 2024-03-16 20:53:46 +08:00
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
df5ec16d7c [Refactor](exectuor)Add schema type table active_queries (#32057)
* Add schema type table active_queries
2024-03-15 17:57:28 +08:00
b031c95324 [Opt](exec) use libbase64 to replace base64 code in doris (#32078)
* [Opt](exec) use libbase64 to replace base64 code in doris
2024-03-14 09:20:50 +08:00
c5390d00bb [Improvement]Add schema table backend_active_tasks (#31945) 2024-03-09 19:55:48 +08:00
7c30cb20fd [Fix](partial update) Fix partial update load false when schema includes auto increment column (#31725)
Problem:
When partially updating columns without specifying the auto-increment column, and the imported data contains new keys, an error stating the auto-increment column could not be found occurs.

Reason:
The logic for partial column updates does not account for new keys in auto-increment columns. Since auto-increment columns can be generated by the system, it's possible to omit this column data during import. However, partial column updates treat this as a regular column, expecting it to be nullable or have a default value for automatic filling, overlooking the fact that auto-increment columns can also be auto-filled. This oversight leads to the error.

Solution:
Incorporate a check for auto-increment columns into the partial column update logic, and include the logic for generating auto-increment column values in the process of completing partial updates.
2024-03-06 13:06:27 +08:00
3777ffb43f [enhancement](nereids)support null partition for list partition (#31613) 2024-03-06 13:05:22 +08:00
d8b9909675 [Fix](Status) Handle returned Status correctly #31434 2024-03-01 04:25:43 +08:00
92e3b31f50 [feature](invert index) match_phrase_edge feature added (#31142) 2024-02-29 19:51:18 +08:00
b177b26d39 [branch-2.1](tracing) Pick pipeline tracing and relative bugfix (#31367)
* [Feature](pipeline) Trace pipeline scheduling (part I) (#31027)

* [fix](compile) Fix performance compile fail #31305

* [fix](compile) Fix macOS compilation issues for PURE macro and CPU core identification (#31357)

* [fix](compile) Correct PURE macro definition to fix compilation on macOS

* 2

---------

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-02-29 08:42:35 +08:00
8fc9d80479 [compatibility](MySQL) update charset to utf8mb4, collation to utf8mb4_0900_bin (#31046)
Doris's behaviour is more like utf8mb4 and utf8mb4_0900_bin than utf8 and utf8_general_ci
2024-02-21 17:01:39 +08:00
eaaab33f0a [Fix](Top-N opt) evicting quering rowsets in prior to correct use_count (#102) (#30904)
This addresses the scenario where a rowset cannot be removed.
2024-02-16 10:16:40 +08:00
0d32aeeaf6 [improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573)
Issue Number: close #29406

1. increase lzop version to 0x1040,
    I set to 0x1040 only for decompressing lzo files compressed by higher version of lzop,
	no change of decompressing logic,
	actully, 0x1040 should have "F_H_FILTER" feature,
	but it mainly for audio and image data, so we do not support it.
2. use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data
3. use crc32c::Extend() instead of lzo_crc32()
4. use olap_adler32() instead of lzo_adler32()
5. thus, remove dependency of Markus F.X.J. Oberhumer's lzo library
6. remove DORIS_WITH_LZO, so lzo file are supported by stream and broker load by default
7. add some regression test
2024-02-05 22:00:24 +08:00
1ac5b45180 [fix](invert index) fixed the issue of insufficient index idx generation during partial column updates. (#30678) 2024-02-01 19:01:08 +08:00
221308f78a [fix](datatype) fix bugs for IPv4/v6 datatype and add some basic regression test cases (#30261) 2024-01-31 23:53:39 +08:00
0433b8730d [Feature](profile)add shuffle send rows/bytes #30456 2024-01-28 18:25:08 +08:00
d191809372 [fix](pipeline) Fix non-prepared execute of UnionOperator (#30355) 2024-01-27 09:11:44 +08:00
ce5ba61640 [refactor](close)Full refactor async writer (#30082)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-23 13:22:15 +08:00
24ed3e4103 [Fix](Expr&code-style) check prepare&open before every VExpr execute (#26673) 2024-01-23 10:09:54 +08:00
1b1e088e83 [fix](exec_node) crashing caused by cancelled query in ExecNode (#30192) 2024-01-23 10:09:54 +08:00