Commit Graph

930 Commits

Author SHA1 Message Date
7e9ffad933 [fix](ES catalog) Doris cannot parse ES date field without time zone (#24864)
1. Add support for Doris to parse ES date fields without time zone info, e.g. `2023-04-17T23:01:18.151`. Such a time is treated as UTC, since ES assumes that the time zone for time fields without one is UTC.
2. Change the local time zone conversion to use the session variable time zone instead of the system local time zone.
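A minimal C++ sketch of the intended behavior: a zone-less timestamp is read as UTC, then shifted into the session time zone. The +08:00 offset and the whole helper are illustrative assumptions, not Doris code.

```cpp
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>

int main() {
    // "2023-04-17T23:01:18.151" carries no zone; per ES semantics, read it as UTC.
    std::tm tm = {};
    std::istringstream in("2023-04-17T23:01:18.151");
    in >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%S"); // fractional seconds ignored here
    std::time_t utc = timegm(&tm);                 // interpret the parsed fields as UTC

    // Shift into the session time zone (assumed +08:00 for this example),
    // mirroring the switch away from the system local time zone.
    std::time_t shifted = utc + 8 * 3600;
    std::tm out{};
    gmtime_r(&shifted, &out);
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &out);
    std::cout << buf << '\n';                      // prints 2023-04-18 07:01:18
}
```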
2023-10-08 19:28:08 +08:00
0631ed61b0 [feature](profilev2) Preliminary support for profilev2. (#24881)
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1: `set profile_level = 1;`

For example:

```sql
select count(*) from customer join item on c_customer_sk = i_item_sk
```

The profile:

```
Simple profile

PLAN FRAGMENT 0
  OUTPUT EXPRS:
    count(*)
  PARTITION: UNPARTITIONED

  VRESULT SINK
    MYSQL_PROTOCAL

  7:VAGGREGATE (merge finalize)
  |  output: count(partial_count(*))[#44]
  |  group by:
  |  cardinality=1
  |  TotalTime: avg 725.608us, max 725.608us, min 725.608us
  |  RowsReturned: 1
  |
  6:VEXCHANGE
     offset: 0
     TotalTime: avg 52.411us, max 52.411us, min 52.411us
     RowsReturned: 8

PLAN FRAGMENT 1

  PARTITION: HASH_PARTITIONED: c_customer_sk

  STREAM DATA SINK
    EXCHANGE ID: 06
    UNPARTITIONED

    TotalTime: avg 106.263us, max 118.38us, min 81.403us
    BlocksSent: 8

  5:VAGGREGATE (update serialize)
  |  output: partial_count(*)[#43]
  |  group by:
  |  cardinality=1
  |  TotalTime: avg 679.296us, max 739.395us, min 554.904us
  |  BuildTime: avg 33.198us, max 48.387us, min 28.880us
  |  ExecTime: avg 27.633us, max 40.278us, min 24.537us
  |  RowsReturned: 8
  |
  4:VHASH JOIN
  |  join op: INNER JOIN(PARTITIONED)[]
  |  equal join conjunct: c_customer_sk = i_item_sk
  |  runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576)
  |  cardinality=17,740
  |  vec output tuple id: 3
  |  vIntermediate tuple ids: 2
  |  hash output slot ids: 22
  |  RowsReturned: 18.0K (18000)
  |  ProbeRows: 18.0K (18000)
  |  ProbeTime: avg 862.308us, max 1.576ms, min 666.28us
  |  BuildRows: 18.0K (18000)
  |  BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms
  |
  |----1:VEXCHANGE
  |       offset: 0
  |       TotalTime: avg 48.822us, max 67.459us, min 30.380us
  |       RowsReturned: 18.0K (18000)
  |
  3:VEXCHANGE
     offset: 0
     TotalTime: avg 33.162us, max 39.480us, min 28.854us
     RowsReturned: 18.0K (18000)

PLAN FRAGMENT 2

  PARTITION: HASH_PARTITIONED: c_customer_id

  STREAM DATA SINK
    EXCHANGE ID: 03
    HASH_PARTITIONED: c_customer_sk

    TotalTime: avg 753.954us, max 1.210ms, min 499.470us
    BlocksSent: 64

  2:VOlapScanNode
     TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON
     runtime filters: RF000[bloom] -> c_customer_sk
     partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ...
     cardinality=100000, avgRowSize=0.0, numNodes=1
     pushAggOp=NONE
     TotalTime: avg 18.417us, max 41.319us, min 10.189us
     RowsReturned: 18.0K (18000)
```

Co-authored-by: yiguolei <676222867@qq.com>
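The gist of the level mechanism, as a rough C++ sketch: each counter carries a level, and the merged simple profile keeps only counters at or below `profile_level`. The `Counter` struct and `merge` function are illustrative stand-ins, not the RuntimeProfile API.

```cpp
#include <iostream>
#include <map>
#include <string>

struct Counter {
    long value;
    int level; // set via ADD_COUNTER_WITH_LEVEL / ADD_TIMER_WITH_LEVEL
};

// Keep only counters whose level is at or below the requested profile level.
std::map<std::string, Counter> merge(const std::map<std::string, Counter>& profile,
                                     int max_level) {
    std::map<std::string, Counter> merged;
    for (const auto& [name, c] : profile) {
        if (c.level <= max_level) merged[name] = c; // drop verbose counters
    }
    return merged;
}

int main() {
    std::map<std::string, Counter> p = {
        {"RowsReturned", {18000, 1}},
        {"TotalTime", {725, 1}},
        {"MemoryUsage", {4096, 2}}, // level-2 counter, hidden at profile_level=1
    };
    for (const auto& [name, c] : merge(p, /*max_level=*/1))
        std::cout << name << ": " << c.value << '\n';
}
```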
2023-10-07 11:16:53 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
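A simplified illustration of the pattern the change above enforces, with a stripped-down stand-in for `doris::Status` (not the real type):

```cpp
#include <string>
#include <utility>

// With [[nodiscard]], silently dropping a returned Status becomes a compiler
// warning, so every call site must check or propagate it.
struct [[nodiscard]] Status {
    bool ok_ = true;
    std::string msg_;
    bool ok() const { return ok_; }
    static Status OK() { return {}; }
    static Status Error(std::string m) { return {false, std::move(m)}; }
};

Status do_work(bool fail) {
    return fail ? Status::Error("disk full") : Status::OK();
}

Status caller() {
    Status st = do_work(false); // do_work(false); alone would now warn
    if (!st.ok()) return st;    // propagate instead of ignoring
    return Status::OK();
}

int main() { return caller().ok() ? 0 : 1; }
```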
864a0f9bcb [opt](pipeline) Make pipeline fragment context send_report asynchronous (#23142) 2023-09-28 17:55:53 +08:00
3b4d8b4ac8 [pipelineX](feature) Support schema scan operator (#24850) 2023-09-25 14:42:25 +08:00
ac55d45f79 [Fix](topn opt) fix heap use after free when shrink in fetch phase (#24774) 2023-09-22 19:48:05 +08:00
09e03247ec [chore](readability) Better readability of ExecNode.cpp #24733 2023-09-22 08:54:57 +08:00
58ab25ccaa Revert "[Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773)" (#24731)
This reverts commit 3ee89aea35726197cb7e94bb4f2c36bc9d50da84.
2023-09-21 21:01:28 +08:00
dc9fa1a4f1 [Refactor](Sink) convert tablet sink to tablet writer (#24474) 2023-09-20 14:47:18 +08:00
b9ddcbf729 [feature](merge-cloud) Rewrite code related to IOContext (#24269) 2023-09-15 19:57:58 +08:00
3ee89aea35 [Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773) 2023-09-14 18:03:51 +08:00
11afd321cb [fix](es catalog) fix issue with select and insert from es catalog core (#24318)
Issue Number: close #24315

The root cause of this issue is that Elasticsearch's long type allows inserting floats and strings. Doris did not handle these cases when doing type conversion. The current strategy is to take the integer part before the decimal point when a float or string is encountered.
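A minimal sketch of that truncation strategy (a hypothetical helper, not the actual conversion code):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// ES "long" fields may actually hold floats or numeric strings; take the
// integer part before the decimal point instead of failing the conversion.
int64_t parse_es_long(const std::string& raw) {
    size_t dot = raw.find('.');
    std::string int_part = (dot == std::string::npos) ? raw : raw.substr(0, dot);
    return std::stoll(int_part); // throws on non-numeric input
}

int main() {
    std::cout << parse_es_long("42") << '\n';   // 42
    std::cout << parse_es_long("3.99") << '\n'; // 3 (truncated, not rounded)
    std::cout << parse_es_long("-7.5") << '\n'; // -7
}
```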
2023-09-13 23:07:31 +08:00
e30c3f3a65 [fix](csv_reader) fix bug that reading garbled files caused BE to crash (#24164) 2023-09-13 14:12:55 +08:00
d3f1388717 [Feature](partitions) Support auto-partition (#24153)
Co-authored-by: zhangstar333 <2561612514@qq.com>
2023-09-12 15:23:15 +08:00
82dc970916 [feature](insert) Support group commit insert (#22829) 2023-09-08 15:51:03 +08:00
fdb7a44f57 Revert "[Feature](partitions) Support auto partition" (#24024)
* Revert "[Feature](partitions) Support auto partition (#23236)"

This reverts commit 6c544dd2011d731b8c9c51384c77bcf19c017981.

* Update config.h
2023-09-07 17:08:26 +08:00
6c544dd201 [Feature](partitions) Support auto partition (#23236)
Co-authored-by: zhangstar333 <2561612514@qq.com>
2023-09-06 16:26:45 +08:00
c74ca15753 [pipeline](sink) Support Async Writer Sink for result file sink and memory scratch sink (#23589) 2023-08-31 22:44:25 +08:00
e680d42fe7 [feature](information_schema) add metadata_name_ids for quickly getting catalogs, dbs, and tables, and add a profiling table for MySQL compatibility (#22702)
Add `information_schema.metadata_name_ids` for quickly getting catalogs, dbs, and tables.

1. Table structure:
```mysql
mysql> desc  internal.information_schema.metadata_name_ids;
+---------------+--------------+------+-------+---------+-------+
| Field         | Type         | Null | Key   | Default | Extra |
+---------------+--------------+------+-------+---------+-------+
| CATALOG_ID    | BIGINT       | Yes  | false | NULL    |       |
| CATALOG_NAME  | VARCHAR(512) | Yes  | false | NULL    |       |
| DATABASE_ID   | BIGINT       | Yes  | false | NULL    |       |
| DATABASE_NAME | VARCHAR(64)  | Yes  | false | NULL    |       |
| TABLE_ID      | BIGINT       | Yes  | false | NULL    |       |
| TABLE_NAME    | VARCHAR(64)  | Yes  | false | NULL    |       |
+---------------+--------------+------+-------+---------+-------+
6 rows in set (0.00 sec) 


mysql> select * from internal.information_schema.metadata_name_ids where CATALOG_NAME="hive1" limit 1 \G;
*************************** 1. row ***************************
   CATALOG_ID: 113008
 CATALOG_NAME: hive1
  DATABASE_ID: 113042
DATABASE_NAME: ssb1_parquet
     TABLE_ID: 114009
   TABLE_NAME: dates
1 row in set (0.07 sec)
```

2. When you create or drop a catalog, there is no need to refresh the catalog.
```mysql
mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.34 sec)


mysql> drop catalog hive2;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 


mysql> create catalog hive3 ... 
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;                                                                        
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.32 sec)
```

3. When you create or drop a table, there is no need to refresh the catalog.
```mysql
mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl ... ;


mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10666
1 row in set (0.04 sec)

mysql> drop table demo.example_tbl;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 

```

4. You can set a query timeout to prevent queries from taking too long:
```
fe.conf: query_metadata_name_ids_timeout

the time used to obtain all tables in one database
```
5. Add `information_schema.profiling` for compatibility with MySQL.

```mysql
mysql> select * from information_schema.profiling;
Empty set (0.07 sec)

mysql> set profiling=1;                                                                                 
Query OK, 0 rows affected (0.01 sec)
```
2023-08-31 21:22:26 +08:00
62c075bf7e [improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672) 2023-08-31 14:44:17 +08:00
030df6db35 [fix](odbc) fix odbc insert string data to sqlserver (#23364) 2023-08-29 21:47:50 +08:00
Pxl
3049533e63 [Bug](materialized-view) fix core dump on create materialized view when different mv columns have the same reference base column (#23425)
* Remove redundant predicates on scan node

update

fix core dump on create materialized view when different mv columns have the same reference base column

Revert "update"

This reverts commit d9ef8dca123b281dc8f1c936ae5130267dff2964.

Revert "Remove redundant predicates on scan node"

This reverts commit f24931758163f59bfc47ee10509634ca97358676.

* update

* fix

* update

* update
2023-08-28 14:40:51 +08:00
40be6a0b05 [fix](hive) do not split compress data file and support lz4/snappy block codec (#23245)
1. Do not split compressed data files.
Some data files in hive are compressed with gzip, deflate, etc. These kinds of files cannot be split.

2. Support the lz4 block codec.
For the hive scan node, use the lz4 block codec instead of the lz4 frame codec.

3. Support the snappy block codec.
For hadoop-snappy compressed files.

4. Optimize the `count(*)` query over csv files.
For queries like `select count(*) from tbl`, only the lines need to be split, not the columns (a sketch follows below).

Need to pick to branch-2.0 after this PR: #22304
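A sketch of the `count(*)` idea from point 4, under the assumption that only line boundaries matter (illustrative, not the Doris CSV reader):

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

// For `select count(*) from tbl` over plain CSV, rows can be counted by
// scanning for line delimiters only; no per-column splitting is needed.
std::size_t count_rows(const char* buf, std::size_t len, char line_delim = '\n') {
    std::size_t rows = 0;
    for (std::size_t i = 0; i < len; ++i) rows += (buf[i] == line_delim);
    return rows;
}

int main() {
    const char* chunk = "1,a\n2,b\n3,c\n";
    std::cout << count_rows(chunk, std::strlen(chunk)) << '\n'; // 3
}
```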
2023-08-26 12:59:05 +08:00
2b6d876280 [feature](move-memtable)[6/7] add options to enable memtable on sink node (#23470)
Co-authored-by: Siyang Tang <82279870+TangSiyang2001@users.noreply.github.com>
2023-08-25 22:32:22 +08:00
9d1f2cd8e0 [Improvement](pipeline) Terminate early for short-circuit join (#23378) 2023-08-23 19:40:17 +08:00
527293aa41 [refactor](dynamic table) remove dynamic table (#23298) 2023-08-23 14:15:14 +08:00
Pxl
8ed4045df9 [Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842) 2023-08-23 08:37:40 +08:00
5c2fae7ce5 [pipeline](exec) Refactor the table sink code and remove useless code (#23223) 2023-08-22 20:42:14 +08:00
12075f9853 [pipelineX](projection) Support projection and blocking agg (#23256) 2023-08-21 22:23:02 +08:00
3d4ec1ac88 [pipeline](exec) support async writer in jdbc sink in pipeline query engine (#23144) 2023-08-18 17:07:57 +08:00
61d2f37bdc [fix](jdbc catalog) fix string type insert into odbc table (#22961) 2023-08-15 20:09:38 +08:00
9b2323b7fd [Pipeline](exec) support async writer in pipeline query engine (#22901) 2023-08-15 17:32:53 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09: 
It's a pity, but experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is only applied to enclose-related code; the plain CSV parser keeps the original logic.

Some performance fallback is unavoidable anyway. From the CSV reader's perspective, the real weak point may be the column-writing behavior, as the flame graph shows.

Trimming escapes will be enabled after fix #22411 is merged.

Cases that should be discussed:

1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter will be unreachable until EOF; will the buffer become extremely large?
2. What if an infinitely long line occurs? Essentially, case 1 is equivalent to this.

Only stream load is supported as a trial in this PR, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.
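A simplified sketch of enclose-aware field splitting; the '"' enclose character and ',' separator are illustrative defaults, and escape handling is omitted:

```cpp
#include <iostream>
#include <string_view>
#include <vector>

// A separator inside an enclosed field must not split the field.
std::vector<std::string_view> split_fields(std::string_view line,
                                           char sep = ',', char enclose = '"') {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    bool in_enclose = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == enclose) {
            in_enclose = !in_enclose;
        } else if (line[i] == sep && !in_enclose) {
            fields.push_back(line.substr(start, i - start));
            start = i + 1;
        }
    }
    fields.push_back(line.substr(start));
    return fields;
}

int main() {
    // Prints: 1, "a,b", c -- the enclosed comma does not split the field.
    for (auto f : split_fields(R"(1,"a,b",c)")) std::cout << f << '\n';
}
```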
2023-08-15 09:23:53 +08:00
a1223218f3 [pipeline](exec) Support shared scan in jdbc and odbc scan node (#22826)
Support shared scan in jdbc and odbc scan node to improve exec performance
2023-08-10 18:34:45 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
f1db6bd8c1 [feature](hive) append support for struct and map column types on textfile format of hive table (#22347)
1. Append support for struct and map column types in the textfile format of hive tables.
2. Optimize the code for the array column type.

A map column, for example:

```mysql
+------+------------------------------------+
| id   | perf                               |
+------+------------------------------------+
| 1    | {"key1":"value1", "key2":"value2"} |
| 1    | {"key1":"value1", "key2":"value2"} |
| 2    | {"name":"John", "age":"30"}        |
+------+------------------------------------+
```

A struct column, for example:

```mysql
+---------+------------------+
| column1 | column2          |
+---------+------------------+
|       1 | {10, "data1", 1} |
|       2 | {20, "data2", 0} |
|       3 | {30, "data3", 1} |
+---------+------------------+
```
Summary of supported complex types (delimiters can be assigned):

1. array<primitive_type> and array<array<...>>
2. map<primitive_type, primitive_type>
3. struct<primitive_type, primitive_type, ...>
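A small sketch of reading such a map column from text with assignable delimiters; ',' and ':' stand in for Hive's defaults (\002 and \003), and the helper is hypothetical:

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Entries are separated by the collection delimiter and key/value pairs by
// the map-key delimiter; both are configurable per table.
std::map<std::string, std::string> parse_map(const std::string& cell,
                                             char entry_delim = ',',
                                             char kv_delim = ':') {
    std::map<std::string, std::string> m;
    std::istringstream in(cell);
    std::string entry;
    while (std::getline(in, entry, entry_delim)) {
        size_t pos = entry.find(kv_delim);
        if (pos != std::string::npos)
            m[entry.substr(0, pos)] = entry.substr(pos + 1);
    }
    return m;
}

int main() {
    for (auto& [k, v] : parse_map("key1:value1,key2:value2"))
        std::cout << k << " -> " << v << '\n';
}
```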
2023-08-10 13:47:58 +08:00
Pxl
591aee528d [Bug](exchange) change BlockSerializer from unique_ptr to object (#22653) 2023-08-07 14:47:21 +08:00
96f42ca20a [fix](memory) Independent count exec node memory profile (#22598)
Independent count exec node memory profile, after #22582
2023-08-06 10:56:31 +08:00
bad8237850 [BugFix](Es Catalog) fix bug that es catalog returns an error when querying partial columns (#22423)
Bug:
When some ES column values are empty, querying those values in doc_values mode returns an error.

Reason:
In doc_values mode these values come back as empty arrays, so we need to check whether the array is empty.
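A minimal sketch of the fix's idea, assuming a rapidjson-style doc_values response (the JSON shape and field names are illustrative):

```cpp
#include <iostream>
#include <rapidjson/document.h>

int main() {
    // In doc_values mode a missing value comes back as an empty array,
    // so emptiness must be checked before reading element 0.
    rapidjson::Document doc;
    doc.Parse(R"({"fields":{"price":[]}})");
    const rapidjson::Value& col = doc["fields"]["price"];
    if (!col.IsArray() || col.Empty()) {
        std::cout << "NULL\n";  // treat as null instead of returning an error
    } else {
        std::cout << col[rapidjson::SizeType(0)].GetInt() << '\n';
    }
}
```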
2023-08-04 11:28:30 +08:00
86e6f5d039 [FIX](decimal) fix decimal precision (#22364)
Decimal parsing from strings is currently wrong: when the given string's precision exceeds the defined decimal precision, we return an overflow error. However, overflow should only be reported when the integer digit part exceeds what the type allows; we should detect this while traversing the string into the decimal value.
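A sketch of the corrected overflow rule for DECIMAL(precision, scale); the check is hypothetical, not the Doris parser:

```cpp
#include <iostream>
#include <string>

// Overflow should depend only on the integer digits: more than
// (precision - scale) integer digits cannot fit, while extra fractional
// digits are simply truncated/rounded.
bool overflows(const std::string& s, int precision, int scale) {
    std::size_t begin = (!s.empty() && (s[0] == '+' || s[0] == '-')) ? 1 : 0;
    while (begin < s.size() && s[begin] == '0') ++begin;   // skip leading zeros
    std::size_t dot = s.find('.');
    std::size_t end = (dot == std::string::npos) ? s.size() : dot;
    std::size_t int_digits = (end > begin) ? end - begin : 0;
    return static_cast<int>(int_digits) > precision - scale;
}

int main() {
    // DECIMAL(5,2) allows 3 integer digits.
    std::cout << overflows("123.456", 5, 2) << '\n';  // 0: fraction just truncates
    std::cout << overflows("123456.0", 5, 2) << '\n'; // 1: 6 integer digits > 3
}
```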
2023-08-03 21:13:58 +08:00
e991f607d5 [fix](string-column) fix unescape length error (#22411) 2023-08-02 12:18:05 +08:00
b8399148ef [fix](DOE) es catalog not working with pipeline, datetimev2, array and esquery (#22046) 2023-08-01 21:45:16 +08:00
1c6246f7ee [improve](agg) support distinct agg node (#22169)
select c_name from customer union select c_name from customer
This SQL uses an agg node to get the distinct rows of c_name, so there is no need to wait until all data has been inserted into the hash map; rows can be output as soon as they are successfully inserted into the hash map (see the sketch below).
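A rough sketch of the streaming-distinct idea (illustrative, not the aggregation node code):

```cpp
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// DISTINCT needs no aggregate value, so a row can be emitted the moment its
// key is first inserted into the hash set, instead of after all input arrives.
int main() {
    std::vector<std::string> input = {"a", "b", "a", "c", "b"};
    std::unordered_set<std::string> seen;
    for (const auto& name : input) {
        if (seen.insert(name).second) { // first time seen -> emit immediately
            std::cout << name << '\n';  // streams a, b, c
        }
    }
}
```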
2023-07-28 13:54:10 +08:00
7d5d416b25 [Fix](EsCatalog) fix be core when query the table of Es catalog with null fields (#22279) 2023-07-28 09:53:55 +08:00
8caa5a9ba4 [Fix](multi-catalog) Fix null partitions error in iceberg tables. (#22185)
### Issue
When a table has null partitions, reading them throws the error
`Failed to fill partition column: t_int=null`

### Resolution
- Fix the null partitions error in iceberg tables by replacing the null partition value with '\N'.
- Add a regression test for hive null partitions.
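A tiny sketch of the substitution; the helper is hypothetical, with '\N' taken from the resolution above:

```cpp
#include <iostream>
#include <string>

// Map a null partition value to the "\N" null literal instead of the string
// "null", which previously failed with "Failed to fill partition column".
std::string partition_value(const char* raw) {
    return raw == nullptr ? "\\N" : std::string(raw);
}

int main() {
    std::cout << partition_value(nullptr) << '\n';   // \N -> column becomes NULL
    std::cout << partition_value("2023-10") << '\n'; // normal partition value
}
```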
2023-07-27 23:57:35 +08:00
23e7423748 [pipeline](refactor) refactor pipeline task schedule logics (#22028) 2023-07-25 17:18:26 +08:00
d0219062ef [refactor](be) use std::move to improve performance of push_back #22056 2023-07-24 08:51:28 +08:00
6875ef4b8b [refactor](mem_reuse) refactor mem_reuse in MutableBlock (#21564) 2023-07-20 22:53:19 +08:00
d86c67863d Remove unused code (#21735) 2023-07-12 14:48:13 +08:00
ff42cd9b49 [feature](hive)add read of the hive table textfile format array type (#21514) 2023-07-11 22:37:48 +08:00