doris

Author	SHA1	Message	Date
Jibing-Li	f1a64ea09f	[fix](new-scan)Fix new scanner load job bugs (#12903 ) Fix bugs: 1. FE needs to send the file format (e.g. parquet, orc ...) to BE while processing load jobs with the new scanner. 2. Try to get the parquet file column type from SchemaElement.type before falling back to the Logical type and Converted type.	2022-09-24 17:21:19 +08:00
zhannngchen	3bb920ba54	[Enhancement](load) Refine the load channel flush policy on mem limit (#12716 ) 1. Remove the single load channel mem limit; only use the load channel mgr mem limit. 2. Raise the default load channel mgr mem limit from 50% to 80%. 3. Add a soft mem limit to the load channel mgr. When the soft limit is exceeded, other threads will not hang; only the current thread triggers a flush. 4. When the load channel mgr mem limit is exceeded, find the load channel with the largest mem usage, then the tablet channel with the largest mem usage within it, and try to flush 1/3 of that tablet channel's mem usage.	2022-09-24 10:01:13 +08:00
yiguolei	7b230e41a8	[bugfix](scanner) olap scanner compute is wrong (#12857 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-24 09:59:59 +08:00
HappenLee	d65756b504	[Bug](bucket shuffle) fix error bucket shuffle join plan in two same table (#12930 )	2022-09-24 09:59:23 +08:00
Xinyi Zou	34d6d36ff5	fix transfer to tracker (#12932 ) ~MemTrackerLimiter() consumed _untracked_mem repeatedly, making the process mem tracker inaccurate.	2022-09-24 09:01:05 +08:00
jiafeng.zhang	943814a86f	build extension docs failed fix (#12915 ) build extension docs fix	2022-09-23 21:58:02 +08:00
Jeffrey	1cb43b7f38	[fix](frontend) fix peerDependencies error (#12373 ) ```npm install``` problem with peer dependencies in the latest version of npm (v7+) Use ```npm install --legacy-peer-deps``` to fix it. Reference: https://blog.npmjs.org/post/626173315965468672/npm-v7-series-beta-release-and-semver-major	2022-09-23 21:54:52 +08:00
Yongqiang YANG	9dc35ab534	[fix](streamload) set coord for streamLoad (#12744 ) When a stream load is canceled, status is reported to coord.	2022-09-23 20:23:19 +08:00
jakevin	7f5970d62f	[fix](Nereids): add stats in plan. (#12790 ) * [improve](Nereids): add stats for bestPlan and correct fix selectivity	2022-09-23 19:26:49 +08:00
Ashin Gau	5bfdfac387	[feature-wip](parquet-reader) add parquet reader profile (#12797 ) Add profile for parquet reader. New counters: - ParquetFilteredGroups: Filtered row groups by `RowGroup` min-max statistics - ParquetReadGroups: The number of row groups to read - ParquetFilteredRowsByGroup: The number of filtered rows by `RowGroup` min-max statistics - ParquetFilteredRowsByPage: The number of filtered rows by page min-max statistics - ParquetFilteredBytes: The filtered bytes by `RowGroup` min-max statistics - ParquetReadBytes: The total bytes in `ParquetReadGroups`, may be further filtered If a page is skipped as a whole ## Result ``` ┌──────────────────────────────────────────────────────┐ │[0: VFILE_SCAN_NODE] │ │(Active: 1s29ms, non-child: 96.42) │ │ - Counters: │ │ - BytesRead: 0.00 │ │ - FileReadCalls: 1.826K (1826) │ │ - FileReadTime: 510.627ms │ │ - FileRemoteReadBytes: 65.23 MB │ │ - FileRemoteReadCalls: 1.146K (1146) │ │ - FileRemoteReadRate: 128.29331970214844 MB/sec │ │ - FileRemoteReadTime: 508.469ms │ │ - NumDiskAccess: 0 │ │ - NumScanners: 1 │ │ - ParquetFilteredBytes: 0.00 │ │ - ParquetFilteredGroups: 0 │ │ - ParquetFilteredRowsByGroup: 0 │ │ - ParquetFilteredRowsByPage: 6.600003M (6600003)│ │ - ParquetReadBytes: 2.13 GB │ │ - ParquetReadGroups: 20 │ │ - PeakMemoryUsage: 0.00 │ │ - PredicateFilteredRows: 3.399797M (3399797) │ │ - PredicateFilteredTime: 133.302ms │ │ - RowsRead: 3.399997M (3399997) │ │ - RowsReturned: 200 │ │ - RowsReturnedRate: 194 │ │ - TotalRawReadTime(*): 726.566ms │ │ - TotalReadThroughput: 0.0 /sec │ │ - WaitScannerTime: 1s27ms │ └──────────────────────────────────────────────────────┘ ```	2022-09-23 18:42:14 +08:00
HappenLee	f7e3ca29b5	[Opt](Vectorized) Support push down no grouping agg (#12803 ) Support push down no grouping agg	2022-09-23 18:29:54 +08:00
Yongqiang YANG	a7d42b5d81	[fix](streamload&sink) release and allocate memory in the same tracker (#12820 ) 1. HttpServer threads allocate bytebuffers and put them into the streamload pipe, but the scanner thread releases them under the query tracker. 2. We can assume brpc allocates memory in doris threads. These problems lead to wrong memtracker results.	2022-09-23 17:51:44 +08:00
morrySnow	bd12a49baf	[feature](Nereids) enable bucket shuffle join on fragment without scan node (#12891 ) In the past, with the legacy planner, we could only do bucket shuffle join on a join node belonging to a fragment with at least one scan node. However, bucket shuffle join should apply to every join node whose left child's data distribution satisfies the join's requirement. In Nereids, data distribution info is available on each node, so we can enable bucket shuffle join on fragments without a scan node.	2022-09-23 15:01:50 +08:00
morrySnow	c100d24116	[enhancement](Nereids) remove unnecessary ExchangeNode under AssertNumRowsNode (#12841 ) Currently, we always add an exchange under AssertNumRowsNode. However, if its child node is unpartitioned, there is no need to add an exchange at all.	2022-09-23 14:50:27 +08:00
ElvinWei	892e53a15b	[fix](test) fix a test failure problem after merging (#12902 )	2022-09-23 14:22:29 +08:00
ElvinWei	e28e30fe71	[Improvement](statistics) collect statistics in parallel and add test cases (#12839 ) This PR mainly improves some functions of the statistics module (#6370): 1. When collecting partition statistics, filter out empty partitions in advance and do not generate statistical tasks for them. 2. Fix the old statistics update method, which could have problems when updating statistics in parallel. 3. Optimize internal-query. 4. Add test cases related to statistics. 5. Modify some comments as prompted by CheckStyle.	2022-09-23 11:59:53 +08:00
morrySnow	bb36490d95	[test](Nereids) add TPC-H Q2 as regression test case (#12840 )	2022-09-23 11:00:31 +08:00
Adonis Ling	b7eea72d1d	[feature-wip](MTMV) Support showing and dropping materialized view for multiple tables (#12762 ) Use cases: mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); Query OK, 0 rows affected (0.05 sec) mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); Query OK, 0 rows affected (0.01 sec) mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk; Query OK, 0 rows affected (0.02 sec) mysql> SHOW TABLES; +---------------+ \| Tables_in_dev \| +---------------+ \| mv \| \| t1 \| \| t2 \| +---------------+ 3 rows in set (0.00 sec) mysql> SHOW CREATE TABLE mv; +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Materialized View \| Create Materialized View \| +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| mv \| CREATE MATERIALIZED VIEW `mv` BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND KEY(`mv_pk`) DISTRIBUTED BY HASH(`mv_pk`) BUCKETS 10 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", 
"storage_format" = "V2", "disable_auto_compaction" = "false" ) AS SELECT `t1`.`pk` AS `mv_pk` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; \| +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) mysql> DROP MATERIALIZED VIEW mv; Query OK, 0 rows affected (0.01 sec)	2022-09-23 10:36:40 +08:00
yuanyuan8983	d77ea64ae4	[typo](docs) Changing the Jump Address of SparkLoad in BrokerLoad (#12731 )	2022-09-23 09:15:17 +08:00
zhangstar333	617820b1f5	[Refactor](parquet) refactor parquet write to uniform and consistent logic (#12730 )	2022-09-23 09:12:34 +08:00
Liqf	0203b36cc4	[regressiontest](test_with)add with_case test (#12814 )	2022-09-23 09:10:33 +08:00
Stalary	84dd3edd0d	[Bug](view) Show create view support comment #12838	2022-09-23 09:09:44 +08:00
Zhengguo Yang	8fcd8ed8b3	[chore](build) add option to disable -frecord-gcc-switches (#12846 )	2022-09-22 15:38:14 +08:00
ElvinWei	340784e294	[feature-wip](statistics) add statistics module related syntax (#12766 ) This pull request implements part of the statistics module (#6370): it adds the statistics-related syntax. For now, the syntax for collecting statistics does not actually collect statistics (collection will be enabled once testing is stable). - `ANALYZE` syntax (collect statistics) ```SQL ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [PARTITIONS(...)] [ PROPERTIES(...) ] ``` > db_name.tb_name: collect table and column statistics from tb_name. > column_name: collect column statistics from column_name. > properties: properties of statistics jobs. Example: ```SQL ANALYZE; -- collect statistics for all tables in the current database ANALYZE table1(pv, citycode); -- collect pv and citycode statistics for table1 ANALYZE test.table2 PARTITIONS(partition1); -- collect statistics for partition1 of table2 ``` - `SHOW ANALYZE` syntax (show statistics job info) ```SQL SHOW ANALYZE [TABLE | ID] [ WHERE [STATE = ["PENDING"|"SCHEDULING"|"RUNNING"|"FINISHED"|"FAILED"|"CANCELLED"]] ] [ORDER BY ...] [LIMIT limit][OFFSET offset]; ``` - `SHOW TABLE STATS` syntax (show table statistics) ```SQL SHOW TABLE STATS [ db_name.tb_name ] ``` - `SHOW COLUMN STATS` syntax (show column statistics) ```SQL SHOW COLUMN STATS [ db_name.tb_name ] ```	2022-09-22 11:15:00 +08:00
ElvinWei	3fa820ec50	[feature-wip](statistics) collect statistics by sql task (#12765 ) This pull request implements part of the statistics module (#6370): it implements sql-task, which collects statistics based on internal-query (#9983). After the ANALYZE statement is parsed, statistical tasks are generated. These include meta-task (get statistics from metadata) and sql-task (get statistics by SQL query). A sql-task obtains statistics such as row_count, the number of null values, and the maximum value by SQL query. Statistical tasks also include a sampling sql-task, which will be implemented in the next PR.	2022-09-22 11:13:35 +08:00
xueweizhang	70ab9cb43e	[feature](http) refactor version info and add new http api for get version info (#12513 ) Refactor version info and add new http api for get version info	2022-09-22 10:53:04 +08:00
Yongqiang YANG	77e423042c	(brpc) donot use pooled brpc (#12754 ) It seems that pooled brpc does not release port timely.	2022-09-22 10:00:26 +08:00
starocean999	57b3c03371	[enhancement](like)pass data to like function in block not in row (#12825 ) The LIKE predicate performs better when it processes data by block rather than by row. Currently, only non-nullable columns are optimized; nullable columns will be handled later. SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%'; before: ~680ms, after: ~570ms	2022-09-22 09:59:30 +08:00
yiguolei	32551a7263	[bugfix](predicate column) data maybe wrong if not a single page (#12796 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-22 09:55:31 +08:00
Lei Zhang	6cd4c9ecb5	[bugfix](fe) Fix test_materialized_view_hll case npt. (#12829 ) When light schema change is enabled, running the test_materialized_view_hll case throws a NullPointerException: java.lang.NullPointerException: null at org.apache.doris.analysis.SlotDescriptor.setColumn(SlotDescriptor.java:153) at org.apache.doris.planner.OlapScanNode.updateSlotUniqueId(OlapScanNode.java:399)	2022-09-22 09:50:53 +08:00
Jibing-Li	4b95b4e41d	[feature-wip](file-scanner)Get column type from parquet schema (#12833 ) Get the schema from the parquet reader. The new VFileScanner needs to get the file schema (a map from column name to type) from the parquet file while processing a load job. This PR sets the type information for parquet columns.	2022-09-22 09:35:37 +08:00
slothever	1ca6d559e4	[feature-wip](parquet-reader) refactor some arguments for parquet reader (#12771 ) refactor some arguments for parquet reader 1. Add new parquet context to wrap reader arguments 2. Reduced some arguments for function call Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-09-22 09:34:01 +08:00
jakevin	cbadbecd9a	[fix](Nereids) anti join could not be reorder (#12827 )	2022-09-22 09:19:12 +08:00
wxy	1ae7c4e307	[fix](LOAD statement): fix bug for `toSql` func of LoadStmt. (#12648 )	2022-09-22 09:07:46 +08:00
morrySnow	c58e4ca03b	[enhancement](Nereids) turn on all reorder rule that needed by zig-zag tree (#12767 )	2022-09-22 02:35:31 +08:00
jakevin	0dee640a3e	[feature](Nereids): eliminate filter true and add checker. (#12821 )	2022-09-22 02:31:11 +08:00
Gabriel	e21ffac419	[Improvement](dateformat) Improve efficiency for function `date_format` (#12811 )	2022-09-21 22:38:16 +08:00
yuanyuan8983	35f07ede26	[typo](docs)Changing the Jump Address of BrokerLoad in SparkLoad (#12735 ) * [typo](docs)Changing the Jump Address of BrokerLoad in SparkLoad Changing the Jump Address of BrokerLoad in SparkLoad * Update spark-load-manual.md	2022-09-21 22:03:28 +08:00
zy-kkk	b09cc95701	[typo](docs) fix get-starting doc err (#12777 )	2022-09-21 21:58:41 +08:00
morrySnow	1c98c3a8f0	[fix](Nereids) GroupExpression never be optimize if it run with exploration job (#12815 ) An exploration job only explores and never calls optimize, so a GroupExpression produced by an exploration-only job is never implemented.	2022-09-21 21:03:37 +08:00
Jibing-Li	fbdebe2424	[feature-wip](new-scan)Add load counter for VFileScanner (#12812 ) The new scanner (VFileScanner) needs a counter to record two values in a load job: 1. the number of rows unselected by the pre-filter, and 2. the number of rows filtered out because of an unmatched schema or other errors. This PR implements the counter.	2022-09-21 20:59:13 +08:00
Xinyi Zou	c55d08fa2f	[fix](memtracker) Refactor load channel mem tracker to improve accuracy (#12791 ) The mem hook record tracker cannot guarantee that the final consumption is 0, nor that memory alloc and free are recorded in one-to-one correspondence. Over the life cycle of a memtable, from insert to flush, the hook records more memory freed than allocated, so the tracker consumption drops below 0. To avoid cumulative error in the upper load channel tracker, the memtable tracker consumption is reset to zero in the destructor.	2022-09-21 20:16:19 +08:00
Xinyi Zou	b41eaa5ac0	[fix](memtracker) Introduce orphan mem tracker to verify memory tracking accuracy (#12794 ) The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker. In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption. Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.	2022-09-21 15:47:10 +08:00
Jerry Hu	8f4bb0f804	[improvement](agg) iterate aggregation data in memory written order (#12704 ) Following the iteration order of the hash table results in out-of-order access to aggregate states, which is very inefficient. Traversing aggregate states in memory write order can significantly improve memory read efficiency. Test with 3.35M hash table items: before this optimization, inserting keys into the column takes 500ms; with this optimization, it takes only 80ms.	2022-09-21 14:58:50 +08:00
zhannngchen	27f7ae258d	[Enhancement](load) optimize flush policy to avoid small segments #12706 Under the current policy, when the mem limit is exceeded, the load channel picks the tablets that consume the most memory. However, mem_consumption includes memory that is already being flushed: if a delta writer is flushing a full memtable (200MB by default), its current memtable might be very small. We should avoid flushing such a memtable, since that can generate a very small segment.	2022-09-21 14:33:05 +08:00
Jibing-Li	ec2b3bf220	[feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793 ) Refactor of scanners. Support broker load. This PR is part of the scanner refactoring tasks. It provides support for broker load using the new VFileScanner. Work still in progress.	2022-09-21 12:49:56 +08:00
morrySnow	7b46e2400f	[enhancement](Nereids) add all necessary PhysicalDistribute on Join's child to ensure get correct cost (#12483 ) An earlier PR, #11976 , added shuffle join and bucket shuffle support. However, when the distribution spec of a join's right child satisfied the join's requirement, we did not add a distribute on the right child and instead added it in the plan translator. This makes it hard to calculate an accurate cost, since some distribute costs are not counted. This PR introduces a new shuffle type, BUCKET, and changes how enforcers are added so that all necessary distributes are added in the cost and enforcer job.	2022-09-21 12:18:37 +08:00
ChPi	a7993755ae	[typo](docs)rename doc file name (#12783 ) Co-authored-by: chenjie <chenjie@cecdat.com>	2022-09-21 11:25:38 +08:00
jakevin	52a0da1f5c	[improve](Nereids): add check validator during post. (#12702 )	2022-09-21 11:25:04 +08:00
luozenglin	b6e20db997	[fix](outfile) select OBJECT and HLL columns into outfile as null. (#12734 )	2022-09-21 11:24:31 +08:00


6444 Commits
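
The flush policy described in commit 3bb920ba54 (#12716), item 4, can be sketched roughly as follows. This is an illustrative Python model only, not the Doris C++ implementation: the class names `TabletChannel` and `LoadChannel`, the function `reduce_mem_usage`, and the byte counts are all hypothetical, and the soft limit from item 3 is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TabletChannel:
    """Hypothetical stand-in for a tablet channel holding buffered rows."""
    tablet_id: int
    mem_usage: int  # bytes currently buffered

    def flush(self, nbytes: int) -> int:
        """Pretend to flush up to nbytes and return the bytes released."""
        released = min(nbytes, self.mem_usage)
        self.mem_usage -= released
        return released


@dataclass
class LoadChannel:
    """Hypothetical stand-in for a load channel owning tablet channels."""
    load_id: str
    tablets: List[TabletChannel] = field(default_factory=list)

    @property
    def mem_usage(self) -> int:
        return sum(t.mem_usage for t in self.tablets)


def reduce_mem_usage(channels: List[LoadChannel], mem_limit: int) -> int:
    """If total usage exceeds mem_limit, flush 1/3 of the largest tablet
    channel inside the largest load channel. Returns the bytes released."""
    total = sum(c.mem_usage for c in channels)
    if total <= mem_limit:
        return 0  # under the limit: nothing to flush
    biggest_load = max(channels, key=lambda c: c.mem_usage)
    biggest_tablet = max(biggest_load.tablets, key=lambda t: t.mem_usage)
    return biggest_tablet.flush(biggest_tablet.mem_usage // 3)
```

Flushing only a third of one tablet channel keeps each flush small, at the cost of possibly needing several rounds before usage falls back under the limit.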