Optimize the planner by:
1. Reducing duplicated computation in `equals`, `getOutput`, `computeOutput`, etc.
2. `getOnClauseUsedSlots`: both sides of an `equalTo` are certainly slots, so there is no need to use a List.
This change serves the following purposes:
1. Use `ScanPredicate` instead of `TCondition` for external tables, so the old code branch can be reused.
2. Simplify and delete some obsolete code.
3. Use `ColumnValueRange` to store predicates.
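As an illustration (the table and column names below are hypothetical), the point of a value range is that several conjuncts on one column can be folded into a single range instead of being carried as a list of separate conditions:

```SQL
-- Both conjuncts on k1 collapse into one ColumnValueRange, (10, 100] on k1,
-- rather than being kept as two independent TConditions.
SELECT * FROM external_tbl WHERE k1 > 10 AND k1 <= 100;
```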
The basic idea of star-schema support is:
1. fact_table JOIN dimension_table: if the dimension table is filtered, the result can be regarded as applying a filter on the fact table.
2. fact_table JOIN dimension_table: if the dimension table is not filtered, the number of join result tuples equals the number of fact tuples.
3. dimension_table JOIN fact_table: the number of join result tuples is that of the fact table, or twice that of the dimension table.
If star-schema support is enabled:
1. Nereids regards the duplicate key (unique key/aggregate key) as the primary key.
2. Nereids tries to regard one join key as the primary key and the other join key as the foreign key.
3. If Nereids finds that no join key is a primary key, it falls back to the normal estimation.
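A sketch of how rules 1 and 2 above play out (all table names and row counts here are invented for illustration): suppose `fact` has 1,000,000 rows, `dim` has 100 rows, and `dim.d_id` is the duplicate key that Nereids treats as the primary key:

```SQL
-- Rule 2: no filter on the dimension side, so the estimated join output
-- is the number of fact tuples, ~1,000,000 rows.
SELECT f.* FROM fact f JOIN dim d ON f.d_id = d.d_id;

-- Rule 1: suppose the filter keeps 10 of the 100 dimension rows; the join
-- is then estimated like a 10%-selective filter on fact, ~100,000 rows.
SELECT f.* FROM fact f JOIN dim d ON f.d_id = d.d_id WHERE d.region = 'US';
```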
If a scan node has a predicate, we cannot limit the concurrency of scanners, because we do not know how much data needs to be scanned. If we limited the concurrency, the query could become very slow.
For example:
- `select * from tbl limit 1`: the concurrency will be 1;
- `select * from tbl where k1=1 limit 1`: the concurrency will not be limited.
Now, regression test data is stored in `sf1DataPath`, which can be local or remote.
For performance reasons, we use a local directory for the community pipeline; however, that requires preparing the data on every machine, and this process is error-prone. So we transparently cache data from S3 locally; this way, we only need to configure one data source.
Currently, we always disassemble aggregation into two stages: local and global. However, in some cases one-stage aggregation is enough, and it has two advantages:
1. It avoids an unnecessary exchange.
2. It gives a chance to do a colocate join on top of the aggregation.
This PR moves the AggregateDisassemble rule from the rewrite stage to the optimization stage, and chooses one-stage or two-stage aggregation according to cost.
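For example (assuming a hypothetical table `t` distributed by `HASH(k)`), grouping by the distribution column is exactly the case where one stage suffices:

```SQL
-- Each backend already holds all rows of every group k, so a single local
-- aggregation is complete: no exchange is needed (advantage 1), and the
-- result stays bucketed by k, so a later join on k can still be colocated
-- (advantage 2).
SELECT k, SUM(v) FROM t GROUP BY k;
```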
In the policy changed by PR #12716, when the hard limit is reached, multiple threads may pick the same LoadChannel and call reduce_mem_usage on the same TabletsChannel. Although a lock and condition variable prevent multiple threads from reducing memory usage concurrently, they can still perform the same reduce work on that channel multiple times, one after another, even when it has just been reduced.
Eliminate an outer join if we have a null-rejecting predicate on slots of the inner side of the outer join.
TODO:
1. Use a constant variable to handle it (we can handle more cases, like nullSafeEqual ...).
2. Use constant folding to handle null values; it is more general and does not require writing long logical judgments.
3. Handle null-safe equals (`<=>`).
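A minimal illustration (hypothetical tables `t1` and `t2`): an `IS NOT NULL` predicate on the inner side rejects every null-extended row the left join could produce, so the outer join degenerates to an inner join:

```SQL
-- Before: rows of t1 with no match would get t2.v = NULL, but the WHERE
-- clause discards exactly those rows.
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t2.v IS NOT NULL;

-- After elimination, the equivalent inner join:
SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t2.v IS NOT NULL;
```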
This is the second step for #12303.
The previous PR #12464 added the framework to select the rollup index for an OLAP table, but pre-aggregation was turned on by default.
This PR sets pre-aggregation when scanning an OLAP table.
The main steps are as below:
1. Select the rollup index when an aggregate is present; this is handled by the `SelectRollupWithAggregate` rule. Expressions in aggregate functions, grouping expressions, and pushed-down predicates are used to check whether pre-aggregation should be turned off.
2. When selecting from an OLAP table without an aggregate plan, it is handled by `SelectRollupWithoutAggregate`.
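For example (hypothetical aggregate table `t` with key column `k1` and a value column `v` aggregated by `SUM`):

```SQL
-- The aggregate matches the table's aggregation, so SelectRollupWithAggregate
-- can keep pre-aggregation on.
SELECT k1, SUM(v) FROM t GROUP BY k1;

-- No aggregate plan: SelectRollupWithoutAggregate handles this, and
-- pre-aggregation must be off, since v is not yet fully aggregated.
SELECT k1, v FROM t;
```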
This pull request includes some implementations of statistics (https://github.com/apache/incubator-doris/issues/6370).
Execute SQL such as `ANALYZE`, `SHOW ANALYZE`, and `SHOW TABLE/COLUMN STATS ...` to collect statistics and query them.
The following are the changes in this PR:
1. Added the necessary test cases for statistics.
2. Statistics update policy. To ensure validity, statistics can only be updated after a statistics task completes, or manually via SQL; the collected statistics should not be changed in any other way, so that they are never distorted.
3. Adjusted some code and comments to fix checkstyle problems.
4. Removed some code that was previously added because statistics were not available.
5. Added a configuration option indicating whether to enable statistics. The current statistics may not be stable yet, so it is disabled by default (`enable_cbo_statistics=false`); currently it is mainly used for CBO testing.
See PR #12766 for the syntax; some simple examples of statistics:
```SQL
-- enable statistics
SET enable_cbo_statistics=true;
-- collect statistics for all tables in the current database
ANALYZE;
-- collect all column statistics for table1
ANALYZE test.table1;
-- collect statistics for siteid of table1
ANALYZE test.table1(siteid);
ANALYZE test.table1(pv, citycode);
-- collect statistics for partition of table1
ANALYZE test.table1 PARTITION(p202208);
ANALYZE test.table1 PARTITIONS(p202208, p202209);
-- display table statistics
SHOW TABLE STATS test.table1;
-- display partition statistics of table1
SHOW TABLE STATS test.table1 PARTITION(p202208);
-- display column statistics of table1
SHOW COLUMN STATS test.table1;
-- display column statistics of partition
SHOW COLUMN STATS test.table1 PARTITION(p202208);
-- display the details of the statistics jobs
SHOW ANALYZE;
SHOW ANALYZE idxxxx;
```
> Due to the dangers inherent to automatic processing of PRs, GitHub's standard `pull_request` workflow trigger by
> default prevents write permissions and secrets access to the target repository. However, in some scenarios such
> access is needed to properly process the PR. To this end the `pull_request_target` workflow trigger was introduced.
According to the article [Keeping your GitHub Actions and workflows secure](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/), the `pull_request` trigger condition in `shellcheck.yml` means the workflow can't comment on the PR due to its lack of write permissions. Although the `ShellCheck` workflow checks out the source, it doesn't build or test the source code, so I think it is safe to change the trigger condition from `pull_request` to `pull_request_target`, which gives the workflow the write permissions needed to comment on the PR.
When the physical memory of the process reaches 90% of the memory limit, trigger the load channel manager to flush data.
The default value of `mem_limit` in be.conf is changed from 90% to 80%; stability is the priority.
Fix a deadlock between the `arena_locks` in `BufferPool::BufferAllocator::ScavengeBuffers` and `_lock` in `DebugString`.