Commit Graph

987 Commits

Author SHA1 Message Date
1ac5b45180 [fix](invert index) fixed the issue of insufficient index idx generation during partial column updates. (#30678) 2024-02-01 19:01:08 +08:00
221308f78a [fix](datatype) fix bugs for IPv4/v6 datatype and add some basic regression test cases (#30261) 2024-01-31 23:53:39 +08:00
0433b8730d [Feature](profile)add shuffle send rows/bytes #30456 2024-01-28 18:25:08 +08:00
d191809372 [fix](pipeline) Fix non-prepared execute of UnionOperator (#30355) 2024-01-27 09:11:44 +08:00
ce5ba61640 [refactor](close)Full refactor async writer (#30082)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-23 13:22:15 +08:00
24ed3e4103 [Fix](Expr&code-style) check prepare&open before every VExpr execute (#26673) 2024-01-23 10:09:54 +08:00
1b1e088e83 [fix](exec_node) crashing caused by cancelled query in ExecNode (#30192) 2024-01-23 10:09:54 +08:00
f66f6b2a82 [refactor](close) refactor ispendingfinish logic and close logic to do close more quickly (#30021) 2024-01-23 10:06:05 +08:00
9e30a67a2a [Improve](topn opt) avoid crash when rpc returned row contains duplicated row entry (#29872)
1. Add more info to trace potential bug and avoid crash
2. use correct permutation size to do `column->permute`
2024-01-16 18:40:31 +08:00
ebfbe0c8dd [opt](information_schema) support information_schema in external catalog (#28919)
Add `information_schema` database for all catalog.
This is useful when using BI tools to connect to Doris,
the tools can get meta info from `information_schema`.

This PR mainly changes:

1. There will be a `information_schema` db in each catalog.
2. Each `information_schema` db only store the meta info of the catalog it belongs to.
3. For `information_schema`, the `TABLE_SCHEMA` column's value is the database name.
4. There is a new global variable `show_full_dbname_in_info_schema_db`, default is false, if set to true,
    The `TABLE_SCHEMA` column's value is the like `ctl.db`, because:

	When connect to Doris, the `database` info in connection url will be: `xxx?db=ctl.db`.
	
	And then some BI will try to query `information_schema` with sql like:
	
	`select * from information_schema.columns where TABLE_SCHEMA = "ctl.db"`
	
	So it has to be format as `ctl.db`
	
	eg, the `information_schema.columns` table in external catalog `doris` is like:
	
	```
	mysql> select * from information_schema.columns limit 1\G
	*************************** 1. row ***************************
	           TABLE_CATALOG: doris
	            TABLE_SCHEMA: doris.__internal_schema
	              TABLE_NAME: column_statistics
	             COLUMN_NAME: id
	        ORDINAL_POSITION: 1
	          COLUMN_DEFAULT: NULL
	             IS_NULLABLE: NO
	               DATA_TYPE: varchar
	CHARACTER_MAXIMUM_LENGTH: 4096
	  CHARACTER_OCTET_LENGTH: 16384
	       NUMERIC_PRECISION: NULL
	           NUMERIC_SCALE: NULL
	      DATETIME_PRECISION: NULL
	      CHARACTER_SET_NAME: NULL
	          COLLATION_NAME: NULL
	             COLUMN_TYPE: varchar(4096)
	              COLUMN_KEY:
	                   EXTRA:
	              PRIVILEGES:
	          COLUMN_COMMENT:
	             COLUMN_SIZE: 4096
	          DECIMAL_DIGITS: NULL
	   GENERATION_EXPRESSION: NULL
	                  SRS_ID: NULL
	```
	
6. Modify the behavior of

	- show tables
	- shwo databases
	- show columns
	- show table status

	The above statements may query the `information_schema` db if there is `where` predicate after them
2024-01-12 13:58:19 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
bd8113f424 [bugfix](scannerscheduler) should minus num_of_scanners before check should schedule #28926 (#29331)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-03 20:47:35 +08:00
2c4e52e44e [fix](es catalog) only es_query function can push down to ES (#29320)
Issue Number: close #29318 
1. Only push down `es_query` function to ES
2. Add null check where ES query result not have `_source` or `fields` fields.
2023-12-30 09:33:26 +08:00
a525d5c5a3 [refactor](decimal) change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion (#29265)
change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion
2023-12-29 10:11:44 +08:00
c75e63a2a5 [Improvement](scan) Use scanner to do projection of scan node (#29124) 2023-12-27 16:00:52 +08:00
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
1545c36d16 Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727)" (#28931)
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
4066de375e [bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727) 2023-12-23 11:09:46 +08:00
0b9b1be1f1 [fix](function) Fix from_second functions overflow and wrong result (#28685) 2023-12-22 10:22:49 +08:00
bcc32b5b26 [feature](invert index) match_regexp feature added (#28257) 2023-12-20 14:30:35 +08:00
b142ade69e [refactor](renamefile) rename some files according to the class names (#28606) 2023-12-19 14:10:11 +08:00
73f7b61019 [refactor](scanner) use weak ptr to lock task execution context to avoid core in scanner dctor (#28493)
using weak ptr as a lock between fragment execute thread and scanner thread, to solve the core problem in scanner's dctor to access scannode's profile.
2023-12-18 14:09:32 +08:00
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
05adbfdb3d [feature](inverted index) match_phrase_prefix feature added (#27404)
select count() from test_index_match_phrase_prefix where request match_phrase_prefix 'xxx';
2023-12-05 20:15:13 +08:00
e62d19d90d [improve](partition) support auto list partition with more columns (#27817)
before the partition by column only have one column.
now remove those limit, could have more columns.
2023-12-04 11:33:18 +08:00
10483ea12c [fix](profile) fix error set with peak_memory_usage in pipeline #27749 2023-12-02 14:12:38 +08:00
ce271ff382 [fix](parquet)fix can not read parquet lz4 compress. (#27383)
Fixed the problem of not being able to read parquet lz4 compressed format. By default, it is decompressed according to the Hadoop lz4 format. If it fails, it will fall back to the standard lz4 compression format.
2023-11-29 19:04:53 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
6ed0be8e3c [refactor](profilev2) unify the counter name in shuffle operator and normal operator (#27267)
using blocksproduced and rowsproduced to unify the counter name in DataStreamSender and other exec node, or exchange operator and other operators.
blocks produced and rows produced are more easy to understand.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-11-20 14:21:39 +08:00
836cda65d8 [refactor](profilev2) split merged profile to a single runtime profile to make the logic more clear (#27184) 2023-11-19 13:21:50 +08:00
2f41e0c823 [FIX](complextype)fix information schema for complex type (#27203)
when we select in information schema , here do not show complex type information
2023-11-18 11:32:32 +08:00
e29d8cb110 [feature](move-memtable) support pipelineX in sink v2 (#27067) 2023-11-16 15:00:55 +08:00
83edcdead9 [enhancement](random_sink) change tablet search algorithm from random to round-robin for random distribution table (#26611)
1. fix race condition problem when get tablet load index
2. change tablet search algorithm from random to round-robin for random distribution table when load_to_single_tablet set to false
2023-11-15 19:55:31 +08:00
d3fd923447 [opt](pipeline) Return InternalError to FE instead of doing a useless DCHECK in ExecNode #27035
Effect: Client will see error message like below when BE meeting plan logical error.

RROR 1105 (HY000): errCode = 2, detailMessage = ([xxx]())[CANCELLED]Logical error during processing VNewOlapScanNode(dr_case_tag), output of projections 2 mismatches with exec node output 3
2023-11-15 18:15:21 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
baae7bf339 [fix](information_schema)fix bug that metadata_name_ids error tableid and append information_schema case. (#26238)
fix bug that  #24059 .
Added some information_schema scanner tests.
files
schema_privileges
table_privileges
partitions
rowsets
statistics
table_constraints

Based on infodb_support_ext_catalog=false, it currently includes tests for all tables under the information_schema database.
2023-11-09 14:07:12 +08:00
95f74f1544 [FIX](complextype)fix shrink in topN for complex type #26609 2023-11-09 10:56:14 +08:00
a3666aa87e [feature](decimal) support decimal256 when creating table (#26308) 2023-11-08 15:21:01 +08:00
16644eff7f [opt](load) optimize the performance of row distribution (#25546)
For non-pipeline non-sinkv2:
before: 14s
now: 6s-
For pipeline + sinkv2:
before: 230ms *48 instances
now: 38ms *48 instances
2023-11-07 10:04:59 +08:00
dd8bcc831c [keyword](decimalv2) Add DecimalV2 keyword (#26283) 2023-11-02 16:27:12 +08:00
e20cab64f4 [improvement](scan) avoid too many scanners for file scan node (#25727)
In previous, when using file scan node(eq, querying hive table), the max number of scanner for each scan node
will be the `doris_scanner_thread_pool_thread_num`(default is 48).
And if the query parallelism is N, the total number of scanner would be 48 * N, which is too many.

In this PR, I change the logic, the max number of scanner for each scan node
will be the `doris_scanner_thread_pool_thread_num / query parallelism`. So that the total number of scanners
will be up to `doris_scanner_thread_pool_thread_num`.

Reduce the number of scanner can significantly reduce the memory usage of query.
2023-10-29 17:41:31 +08:00
d6c64d305f [chore](log) Add log to trace query execution #25739 2023-10-26 14:09:25 +08:00
693982fd1a [feature](decimal) support decimal256 (#25386) 2023-10-25 15:47:51 +08:00
6b2eed779c [feature](AuditLog) add scanRows scanBytes in auditlog (#25435) 2023-10-25 10:00:35 +08:00
Pxl
2972daaed9 [Bug](status) process error status on es_scroll_parser and compaction_action (#25745)
process error status on es_scroll_parser and compaction_action
2023-10-24 15:51:01 +08:00
a4c9beba85 [fix](move-memtable) fallback if partial update (#25801) 2023-10-24 10:29:59 +08:00
0e0f8090f7 [refactor](text_convert)Use serde to replace text_convert. (#25543)
Remove text_convert and use serde to replace it.
2023-10-24 09:52:43 +08:00
b5ee4a9dbb [enhancement](profilev2) add some fields for profile v2 (#25611)
Add 3 counters for ExecNode:

ExecTime - Total execution time(excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
2023-10-23 15:55:40 +08:00
Pxl
2e2d5bcba2 [Improvements](status) catch some error status (#25677)
catch some error status
2023-10-23 10:19:08 +08:00
Pxl
642c149e6a remove datetime_value and move vecdatetime_value to doris namespace (#25695)
remove datetime_value and move vecdatetime_value to doris namespace
2023-10-20 22:08:17 +08:00