Commit Graph

1014 Commits

Author SHA1 Message Date
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00
8e19cdd745 [featrue](expr) support common subexpression elimination be part (#32673) 2024-04-10 11:56:21 +08:00
97a2977f2a [improvement](executor)Add tag property for workload group #32874 2024-04-10 11:34:29 +08:00
e574b35833 [Enhancement](partition) Refine some auto partition behaviours (#32737) (#33412)
fix legacy planner grammer
fix nereids planner parsing
fix cases
forbid auto range partition with null column
fix CreateTableStmt with auto partition and some partition items.
1 and 2 are about #31585
doc pr: apache/doris-website#488
2024-04-09 15:51:02 +08:00
fae55e0e46 [Feature](information_schema) add processlist table for information_schema db (#32511) 2024-04-07 23:24:22 +08:00
ad2d20348a [fix](pipeline) fix use error row desc when origin block clear #32803 (#32849)
* fix

* add case
2024-03-26 20:02:46 +08:00
326a264fcd [Improvement](executor)Add spill property for workload group #32554 2024-03-22 16:38:19 +08:00
baf3ae1a93 [refactor](nereids)unify outputTupleDesc and projection be part (#32439) 2024-03-22 16:35:43 +08:00
fd0bc720e9 [opt](information_schema) Add DEFAULT_ENCRYPTION column to schemata table (#32501) 2024-03-22 08:52:16 +08:00
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
In previous, the counter in `profile` may be updated when close the file reader.
And the file reader may be closed when the object being deconstruted.
But at that time, the `profile` object may already be deleted, causing NPE and BE will crash.

This PR try to fix this issue:

1. Remove the "profile counter update" logic from all `close()` method.

2. Add a new interface `ProfileCollector`

	It has 2 methods:
	
	- `collect_profile_at_runtime()`

		It can be called at runtime, eg, in every `get_next_block()` method.
		So that the counter in profile can be updated at runtime.
		
	- `collect_profile_before_close()`

		Should be called before the object call `close()`. And it will only be called once.
		
3. Derived from `ProfileCollector`

	All classes which may update the profile counter in `close()` method should extends
	the `ProfileCollector`. Such as `GenericReader`, etc. And implement `collect_profile_before_close()`
	
	And `collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
83ab61ad22 Add QUEUE_START_TIME/QUEUE_END_TIME/QUERY_STATUS column for active_queries (#32259) 2024-03-16 20:53:46 +08:00
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
df5ec16d7c [Refactor](exectuor)Add schema type table active_queries (#32057)
* Add schema type table active_queries
2024-03-15 17:57:28 +08:00
b031c95324 [Opt](exec) use libbase64 to replace base64 code in doris (#32078)
* [Opt](exec) use libbase64 to replace base64 code in doris
2024-03-14 09:20:50 +08:00
c5390d00bb [Improvement]Add schema table backend_active_tasks (#31945) 2024-03-09 19:55:48 +08:00
7c30cb20fd [Fix](partial update) Fix partial update load false when schema includes auto increment column (#31725)
Problem:
When partially updating columns without specifying the auto-increment column, and the imported data contains new keys, an error stating the auto-increment column could not be found occurs.

Reason:
The logic for partial column updates does not account for new keys in auto-increment columns. Since auto-increment columns can be generated by the system, it's possible to omit this column data during import. However, partial column updates treat this as a regular column, expecting it to be nullable or have a default value for automatic filling, overlooking the fact that auto-increment columns can also be auto-filled. This oversight leads to the error.

Solution:
Incorporate a check for auto-increment columns into the partial column update logic, and include the logic for generating auto-increment column values in the process of completing partial updates.
2024-03-06 13:06:27 +08:00
3777ffb43f [enhancement](nereids)support null partition for list partition (#31613) 2024-03-06 13:05:22 +08:00
d8b9909675 [Fix](Status) Handle returned Status correctly #31434 2024-03-01 04:25:43 +08:00
92e3b31f50 [feature](invert index) match_phrase_edge feature added (#31142) 2024-02-29 19:51:18 +08:00
b177b26d39 [branch-2.1](tracing) Pick pipeline tracing and relative bugfix (#31367)
* [Feature](pipeline) Trace pipeline scheduling (part I) (#31027)

* [fix](compile) Fix performance compile fail #31305

* [fix](compile) Fix macOS compilation issues for PURE macro and CPU core identification (#31357)

* [fix](compile) Correct PURE macro definition to fix compilation on macOS

* 2

---------

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-02-29 08:42:35 +08:00
8fc9d80479 [compatibility](MySQL) update charset to utf8mb4, collation to utf8mb4_0900_bin (#31046)
Doris's behaviour is more like utf8mb4 and utf8mb4_0900_bin than utf8 and utf8_general_ci
2024-02-21 17:01:39 +08:00
eaaab33f0a [Fix](Top-N opt) evicting quering rowsets in prior to correct use_count (#102) (#30904)
This addresses the scenario where a rowset cannot be removed.
2024-02-16 10:16:40 +08:00
0d32aeeaf6 [improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573)
Issue Number: close #29406

1. increase lzop version to 0x1040,
    I set to 0x1040 only for decompressing lzo files compressed by higher version of lzop,
	no change of decompressing logic,
	actully, 0x1040 should have "F_H_FILTER" feature,
	but it mainly for audio and image data, so we do not support it.
2. use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data
3. use crc32c::Extend() instead of lzo_crc32()
4. use olap_adler32() instead of lzo_adler32()
5. thus, remove dependency of Markus F.X.J. Oberhumer's lzo library
6. remove DORIS_WITH_LZO, so lzo file are supported by stream and broker load by default
7. add some regression test
2024-02-05 22:00:24 +08:00
1ac5b45180 [fix](invert index) fixed the issue of insufficient index idx generation during partial column updates. (#30678) 2024-02-01 19:01:08 +08:00
221308f78a [fix](datatype) fix bugs for IPv4/v6 datatype and add some basic regression test cases (#30261) 2024-01-31 23:53:39 +08:00
0433b8730d [Feature](profile)add shuffle send rows/bytes #30456 2024-01-28 18:25:08 +08:00
d191809372 [fix](pipeline) Fix non-prepared execute of UnionOperator (#30355) 2024-01-27 09:11:44 +08:00
ce5ba61640 [refactor](close)Full refactor async writer (#30082)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-23 13:22:15 +08:00
24ed3e4103 [Fix](Expr&code-style) check prepare&open before every VExpr execute (#26673) 2024-01-23 10:09:54 +08:00
1b1e088e83 [fix](exec_node) crashing caused by cancelled query in ExecNode (#30192) 2024-01-23 10:09:54 +08:00
f66f6b2a82 [refactor](close) refactor ispendingfinish logic and close logic to do close more quickly (#30021) 2024-01-23 10:06:05 +08:00
9e30a67a2a [Improve](topn opt) avoid crash when rpc returned row contains duplicated row entry (#29872)
1. Add more info to trace potential bug and avoid crash
2. use correct permutation size to do `column->permute`
2024-01-16 18:40:31 +08:00
ebfbe0c8dd [opt](information_schema) support information_schema in external catalog (#28919)
Add `information_schema` database for all catalog.
This is useful when using BI tools to connect to Doris,
the tools can get meta info from `information_schema`.

This PR mainly changes:

1. There will be a `information_schema` db in each catalog.
2. Each `information_schema` db only store the meta info of the catalog it belongs to.
3. For `information_schema`, the `TABLE_SCHEMA` column's value is the database name.
4. There is a new global variable `show_full_dbname_in_info_schema_db`, default is false, if set to true,
    The `TABLE_SCHEMA` column's value is the like `ctl.db`, because:

	When connect to Doris, the `database` info in connection url will be: `xxx?db=ctl.db`.
	
	And then some BI will try to query `information_schema` with sql like:
	
	`select * from information_schema.columns where TABLE_SCHEMA = "ctl.db"`
	
	So it has to be format as `ctl.db`
	
	eg, the `information_schema.columns` table in external catalog `doris` is like:
	
	```
	mysql> select * from information_schema.columns limit 1\G
	*************************** 1. row ***************************
	           TABLE_CATALOG: doris
	            TABLE_SCHEMA: doris.__internal_schema
	              TABLE_NAME: column_statistics
	             COLUMN_NAME: id
	        ORDINAL_POSITION: 1
	          COLUMN_DEFAULT: NULL
	             IS_NULLABLE: NO
	               DATA_TYPE: varchar
	CHARACTER_MAXIMUM_LENGTH: 4096
	  CHARACTER_OCTET_LENGTH: 16384
	       NUMERIC_PRECISION: NULL
	           NUMERIC_SCALE: NULL
	      DATETIME_PRECISION: NULL
	      CHARACTER_SET_NAME: NULL
	          COLLATION_NAME: NULL
	             COLUMN_TYPE: varchar(4096)
	              COLUMN_KEY:
	                   EXTRA:
	              PRIVILEGES:
	          COLUMN_COMMENT:
	             COLUMN_SIZE: 4096
	          DECIMAL_DIGITS: NULL
	   GENERATION_EXPRESSION: NULL
	                  SRS_ID: NULL
	```
	
6. Modify the behavior of

	- show tables
	- shwo databases
	- show columns
	- show table status

	The above statements may query the `information_schema` db if there is `where` predicate after them
2024-01-12 13:58:19 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
bd8113f424 [bugfix](scannerscheduler) should minus num_of_scanners before check should schedule #28926 (#29331)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-03 20:47:35 +08:00
2c4e52e44e [fix](es catalog) only es_query function can push down to ES (#29320)
Issue Number: close #29318 
1. Only push down `es_query` function to ES
2. Add null check where ES query result not have `_source` or `fields` fields.
2023-12-30 09:33:26 +08:00
a525d5c5a3 [refactor](decimal) change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion (#29265)
change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion
2023-12-29 10:11:44 +08:00
c75e63a2a5 [Improvement](scan) Use scanner to do projection of scan node (#29124) 2023-12-27 16:00:52 +08:00
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
1545c36d16 Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727)" (#28931)
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
4066de375e [bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727) 2023-12-23 11:09:46 +08:00
0b9b1be1f1 [fix](function) Fix from_second functions overflow and wrong result (#28685) 2023-12-22 10:22:49 +08:00
bcc32b5b26 [feature](invert index) match_regexp feature added (#28257) 2023-12-20 14:30:35 +08:00
b142ade69e [refactor](renamefile) rename some files according to the class names (#28606) 2023-12-19 14:10:11 +08:00
73f7b61019 [refactor](scanner) use weak ptr to lock task execution context to avoid core in scanner dctor (#28493)
using weak ptr as a lock between fragment execute thread and scanner thread, to solve the core problem in scanner's dctor to access scannode's profile.
2023-12-18 14:09:32 +08:00
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00