8276 Commits

Author SHA1 Message Date
a607c30ad4 [docs] Fe build idea doc (#10996)
* [doc](fe): enhance the fe-idea-dev
* [doc](fe)add solution for m1 mac compile error

Co-authored-by: jackwener <jakevingoo@gmail.com>
2022-07-20 19:03:29 +08:00
b62e3e7aa0 [regression test]Add ssb sf1 test under unique table with zstd (#11004)
Co-authored-by: smallhibiscus <844981280>
2022-07-20 18:59:46 +08:00
0a8ae6aeec Refractor COLLECT_LIST and COLLECT_SET register logic (#10956)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-20 18:02:39 +08:00
1ca00e0107 [tools] add clickbench tools (#11009)
* [tools] add clickbench tools

Co-authored-by: stephen <hello-stephen@qq.com>
2022-07-20 17:59:04 +08:00
e5663f9872 [Bug](array-type) Fix the core dump caused by unaligned __int128 (#11020)
Fix the core dump caused by unaligned __int128 and change DEFAULT_ALIGNMENT
2022-07-20 16:37:27 +08:00
a71822a74d [refactor]remove col_unique_id (#11025) 2022-07-20 16:35:14 +08:00
7bdce8f572 [refactor](policy) refactor some policy create and check logic (#11007)
* [refactor](policy) refactor some policy create and check logic
2022-07-20 16:20:59 +08:00
658a9f7531 [fix](planner)unnecessary cast will be added on children in InPredicate (#11033) 2022-07-20 16:00:26 +08:00
6233b5200e [refactor] (Nereids) rename GroupExpression.getParent() to getOwnerGroup() (#11027)
GroupExpression.getParent() returns the group that contains this expression. The name is misleading, especially in tree structures,
so we rename it to getOwnerGroup.
2022-07-20 15:57:59 +08:00
a1c1cfce47 Add some comments for the feature mow (#11028) 2022-07-20 15:35:41 +08:00
ec5471f048 [feature-wip](unique-key-merge-on-write) Implement tablet lookup interface, using rowset-tree, DSIP-018[3/5] (#10938) 2022-07-20 14:52:14 +08:00
9b91f86c38 [Feature](Nereids) Reorder join to eliminate cross join. (#10890)
Try to eliminate cross joins by finding join conditions in filters and changing the join order.
For example:

-- input:
SELECT * FROM t1, t2, t3 WHERE t1.id=t3.id AND t2.id=t3.id

-- output:
SELECT * FROM t1 JOIN t3 ON t1.id=t3.id JOIN t2 ON t2.id=t3.id
This feature is controlled by the session variable enable_nereids_reorder_to_eliminate_cross_join, which is true by default.
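
The reordering idea can be illustrated with a minimal sketch. This is not the Nereids implementation: the class `CrossJoinReorderSketch`, the `EqCondition` type, and the greedy strategy are all assumptions made for illustration, and conditions are modeled only as pairs of table names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: greedy join reordering that avoids cross joins where possible.
class CrossJoinReorderSketch {
    /** An equality condition between two tables, e.g. t1.id = t3.id (hypothetical model). */
    static final class EqCondition {
        final String left;
        final String right;

        EqCondition(String left, String right) {
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Orders tables so that each newly joined table shares a condition with a
     * table that is already joined. Falls back to the next table in input order
     * (a cross join) only when no remaining table is connected.
     */
    static List<String> reorder(List<String> tables, List<EqCondition> conditions) {
        List<String> remaining = new ArrayList<>(tables);
        Set<String> joined = new LinkedHashSet<>();
        if (remaining.isEmpty()) {
            return new ArrayList<>();
        }
        joined.add(remaining.remove(0));
        while (!remaining.isEmpty()) {
            String next = null;
            for (String candidate : remaining) {
                if (isConnected(candidate, joined, conditions)) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                next = remaining.get(0); // unavoidable cross join
            }
            remaining.remove(next);
            joined.add(next);
        }
        return new ArrayList<>(joined);
    }

    /** True if some condition links the table to an already joined table. */
    private static boolean isConnected(String table, Set<String> joined, List<EqCondition> conditions) {
        for (EqCondition c : conditions) {
            if (c.left.equals(table) && joined.contains(c.right)) {
                return true;
            }
            if (c.right.equals(table) && joined.contains(c.left)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> order = reorder(
                Arrays.asList("t1", "t2", "t3"),
                Arrays.asList(new EqCondition("t1", "t3"), new EqCondition("t2", "t3")));
        System.out.println(order); // prints [t1, t3, t2]
    }
}
```

For the query above, the sketch joins t3 before t2 because t2 has no condition against t1, which matches the order of the rewritten statement.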

Simplify usage of Memo and rewrite rule application.
Before this PR, applying a rewrite rule to a plan required code like the following:

    Memo memo = new Memo();
    memo.initialize(root);

    PlannerContext plannerContext = new PlannerContext(memo, new ConnectContext());
    JobContext jobContext = new JobContext(plannerContext, new PhysicalProperties(), 0);
    RewriteTopDownJob rewriteTopDownJob = new RewriteTopDownJob(memo.getRoot(),
            ImmutableList.of(new AggregateDisassemble().build()), jobContext);
    plannerContext.pushJob(rewriteTopDownJob);
    plannerContext.getJobScheduler().executeJobPool(plannerContext);

    Plan after = memo.copyOut();
After this PR, we can use chained calls:

    new Memo(plan)
        .newPlannerContext(connectContext)
        .setDefaultJobContext()
        .topDownRewrite(new AggregateDisassemble())
        .getMemo()
        .copyOut();
Rename the session variable enable_nereids to enable_nereids_planner to make it more meaningful.
2022-07-20 13:53:54 +08:00
56e036e68b [feature-wip](multi-catalog) Support runtime filter for file scan node (#11000)
* [feature-wip](multi-catalog) Support runtime filter for file scan node

Co-authored-by: morningman <morningman@apache.org>
2022-07-20 12:36:57 +08:00
a5a50726bf [Ehancement](planner) Rewrite implicit cast to the predicates (#10920)
During the analysis of a BinaryPredicate, a CastExpr is generated if the slot needs an implicit cast, as in the case below:
SELECT * FROM t1 WHERE t1.col1 = '1';
where col1 is an integer column.

The cast prevents the binary predicate from being pushed down to OlapScan, which hurts performance.
2022-07-20 12:28:29 +08:00
dc2b709f6f [Bug](compaction) fix uniq key compaction bug that does not count merged rows right(#10971)
When a rowset includes multiple segments, the segments' rows are merged in generic_iterator, but merged_rows is not maintained, so compaction fails in check_correctness.
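
A minimal sketch of the bookkeeping involved, not the actual Doris code: rows are reduced to integer keys, the class `UniqueKeyMergeSketch` and its members are hypothetical, and the check is assumed to compare input rows against output rows plus merged rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of sorted segments under a unique-key model.
class UniqueKeyMergeSketch {
    final List<Integer> output = new ArrayList<>();
    long mergedRows = 0;

    /** Merges sorted segments; rows whose key was already emitted are counted as merged. */
    void merge(List<List<Integer>> segments) {
        // Heap entries are {key, segment index, offset within the segment}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new int[] {segments.get(s).get(0), s, 0});
            }
        }
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            boolean duplicate = !output.isEmpty() && output.get(output.size() - 1) == top[0];
            if (duplicate) {
                mergedRows++; // the counter that must be maintained during the merge
            } else {
                output.add(top[0]);
            }
            List<Integer> segment = segments.get(top[1]);
            int nextOffset = top[2] + 1;
            if (nextOffset < segment.size()) {
                heap.add(new int[] {segment.get(nextOffset), top[1], nextOffset});
            }
        }
    }

    /** Assumed form of the check: every input row is either output or merged away. */
    static boolean checkCorrectness(long inputRows, long outputRows, long mergedRows) {
        return inputRows == outputRows + mergedRows;
    }

    public static void main(String[] args) {
        UniqueKeyMergeSketch sketch = new UniqueKeyMergeSketch();
        sketch.merge(Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(2, 3, 4)));
        System.out.println(sketch.output + " merged=" + sketch.mergedRows);
        System.out.println(checkCorrectness(6, sketch.output.size(), sketch.mergedRows));
    }
}
```

If the counter were left at zero, the same check would compare 6 input rows against 4 output rows and fail, which is the kind of mismatch the commit message describes.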
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-07-20 12:07:45 +08:00
989e6d1cf9 [chore]fix clang compile error (#11021) 2022-07-20 08:28:47 +08:00
ba9c7e50aa [doc] missing sidebar for cloudcanal (#10998) 2022-07-19 23:51:12 +08:00
fd2c374426 [fix]Empty string key in aggregation was output as NULL (#11011) 2022-07-19 23:25:28 +08:00
Pxl
95366de7f6 cast array element to same type (#10980)
Fix a problem that occurs when an array contains elements of different types.
2022-07-19 21:47:10 +08:00
371c7be235 [feature-wip](unique-key-merge-on-write) add segment lookup interface implementation, DSIP-018 (#10922) 2022-07-19 21:14:32 +08:00
d7770db5e2 Revert "[regressiontest] add tpcds_sf1 test (#10852)" (#11008)
This reverts commit d2bee602514e8238dd8ef3d3b9b34fb6171bd26f.
2022-07-19 18:41:53 +08:00
2d90f4b87c [feature-wip](statistics) step4: collect statistics by implementing statistics tasks (#8861)
This pull request includes some implementations of statistics (https://github.com/apache/incubator-doris/issues/6370). It will not affect any existing code, and users will not be able to create statistics jobs.

Currently only MetaStatisticsTask, which collects statistics directly by reading FE meta, is implemented. SQLStatisticsTask is still being implemented; it needs to query BE through FE.

This PR implements the following:
1. Support statistics collection for partitioned and non-partitioned tables. For partitioned tables, the collection of statistics for the specified partition is implemented.
2. Tasks are divided according to whether the table is partitioned or non-partitioned, down to the tablet level at the finest granularity. A meta task collects as many statistics as possible.
3. Add partition statistics (Table -> Partition -> Column). For example, the size of the table, the number of rows, the size of the partition, the number of rows, the maximum and minimum values of the columns, etc.
4. Display and modify partition-level statistics.
 …
2022-07-19 16:22:25 +08:00
ac4ce4d874 Revert "[regression] Add ssb sf1 test under unique table with zstd (#10957)" (#10992)
This reverts commit 216a55c12c0be5c4090523195b2aff9d96c64f65.
2022-07-19 15:44:32 +08:00
d5fa66d9a3 [Enhancement] [Memory] Limit memory usage use process actual physical memory (#10924) 2022-07-19 11:08:39 +08:00
b70274e2af [docs] Changing the symbol of dataX doriswriter table creation statement (#10632)
* Update datax.md
2022-07-19 10:15:27 +08:00
f6cb7a838b [Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355)
* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false
Co-authored-by: heguangnan <heguangnan@bytedance.com>
2022-07-19 08:33:04 +08:00
d2bee60251 [regressiontest] add tpcds_sf1 test (#10852)
Co-authored-by: smallhibiscus <844981280>
Co-authored-by: stephen <hello-stephen@qq.com>
2022-07-19 08:30:53 +08:00
2acd5efcd8 [improvement-log]print a log when got a lower image version (#10910) 2022-07-19 08:29:58 +08:00
842ff2b1e2 [refactor] Refactor time LUT (#10982) 2022-07-19 08:23:29 +08:00
68b9a2936a [improvement](doe) Step1: Fe generates the DSL and is used to explain (#9895)
For the first step, I will only change FE and then change BE once I make sure the DSL is ok.
2022-07-18 23:20:58 +08:00
e769597fd2 [Improvement] (datetime) support microsecond for date literal (#10917)
* [Improvement] (datetime) support microsecond for date literal

* remove joda dependency
2022-07-18 21:39:39 +08:00
8a366c9ba2 [feature](multi-catalog) read parquet file by start/offset (#10843)
To avoid reading the same row group more than once, offsets should be aligned.
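
One common way to keep byte-range splits from reading a row group twice is to assign each row group to the single split that contains its start offset. The sketch below shows that idea only; it is an assumption about the approach, not the code from this commit, and `RowGroupSplitSketch` and `rowGroupsForSplit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick the row groups a split [start, start + length) should read.
class RowGroupSplitSketch {
    /**
     * Returns the indexes of row groups whose start offset falls inside the split.
     * Because splits do not overlap, each row group is selected by exactly one split.
     */
    static List<Integer> rowGroupsForSplit(long[] rowGroupStarts, long splitStart, long splitLength) {
        List<Integer> selected = new ArrayList<>();
        long splitEnd = splitStart + splitLength;
        for (int i = 0; i < rowGroupStarts.length; i++) {
            if (rowGroupStarts[i] >= splitStart && rowGroupStarts[i] < splitEnd) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long[] starts = {4L, 100L, 250L};
        System.out.println(rowGroupsForSplit(starts, 0L, 128L));   // prints [0, 1]
        System.out.println(rowGroupsForSplit(starts, 128L, 128L)); // prints [2]
    }
}
```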
2022-07-18 20:51:08 +08:00
60dd322aba [feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942)
FileScanNode in BE launches as many threads as there are splits.
The thrift interface of FileScanNode is excessively redundant.
2022-07-18 20:50:34 +08:00
a849f5be71 [feature](Nereids): hashCode(), equals() and UT. (#10870)
Add hashCode(), equals() for operator.

Add basic UTs for them (more detailed tests are still needed).

**future ticket**: add hashCode(), equals() and UT for `Expression`
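
For readers unfamiliar with the contract, a minimal sketch of paired `equals()` and `hashCode()` on an operator-like class follows. `LimitOperatorSketch` and its fields are hypothetical and do not correspond to a class in this commit.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical operator-like class with value-based equality.
class LimitOperatorSketch {
    private final long limit;
    private final long offset;

    LimitOperatorSketch(long limit, long offset) {
        this.limit = limit;
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        LimitOperatorSketch that = (LimitOperatorSketch) o;
        return limit == that.limit && offset == that.offset;
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals() so equal objects hash alike.
        return Objects.hash(limit, offset);
    }

    public static void main(String[] args) {
        Set<LimitOperatorSketch> seen = new HashSet<>();
        seen.add(new LimitOperatorSketch(10, 0));
        seen.add(new LimitOperatorSketch(10, 0)); // equal to the first, so not added again
        System.out.println(seen.size()); // prints 1
    }
}
```

Value-based equality of this kind is what lets structurally identical operators be deduplicated in hash-based collections.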
2022-07-18 20:33:10 +08:00
4c161b7e2c [regression-test] add tpch_sf1 test (#10846)
Co-authored-by: stephen <hello-stephen@qq.com>
2022-07-18 20:00:02 +08:00
b185545243 [refactor](Nereids)remove generic type from Rule and Job (#10897) 2022-07-18 19:35:16 +08:00
Pxl
afc1d0c05c [Chore][Compile] fix compile fail on clang (#10837)
Fix a compile failure on clang caused by outputting int128.
2022-07-18 19:21:01 +08:00
899acb6564 [improvement][agg]import sub hashmap (#10937) 2022-07-18 18:36:45 +08:00
b037aca4fd [improvement](dynamic-partition) add replication allocation check for dynamic partition when creating table(#10892) 2022-07-18 18:02:33 +08:00
a2ed4b5c78 [improvement] improvement for light weight schema change (#10860)
* improvement for dynamic schema
* Stop using the schema as the LRU cache key.
* Load segments with the rowset's original schema, not the current read schema.
* Generate column readers and column iterators from the original schema, or from the read schema for a new column.
* Use the column unique id as the key instead of the column ordinal.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-07-18 17:53:31 +08:00
ba04c983ae [regression-test]Add order by for qt_select1 in test_aggregate_all_functions (#10951) 2022-07-18 17:44:23 +08:00
890fd70620 [improvement] dynamically calculate max rows to read in a batch to avoid oom (#10972) 2022-07-18 17:43:53 +08:00
6736e06679 [feature](udf) Vectorization support remote udaf #10683 (#10685) 2022-07-18 17:15:34 +08:00
9adbd8abbd [feature](resource-tag) support multi tag for a single Backend (#10901) 2022-07-18 16:50:45 +08:00
091e17ecab fix(fe): add , with json_root property in stmt show create routine load for xx_job (#10929)
Fix issue: https://github.com/apache/doris/issues/10928
2022-07-18 16:44:40 +08:00
216a55c12c [regression] Add ssb sf1 test under unique table with zstd (#10957)
* Add ssb sf1 test under unique table with zstd

Co-authored-by: smallhibiscus <844981280>
2022-07-18 16:35:14 +08:00
d9095922d9 [Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936)
In strict memory usage mode (STRICT_MEMORY_USE=ON), when the capacity of the vectorized hash table is greater than 2G, the table starts to grow once 75% of its capacity is filled, and the memory usage of the vectorized join becomes 50% of the previous value.

`STRICT_MEMORY_USE=ON` expects BE to use less memory and gives priority to stability when cluster memory is limited.
2022-07-18 16:16:43 +08:00
d199283df0 [Docs] add doc of tablet local debug (#10944)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-18 16:02:29 +08:00
006d7c9225 [fix]The spring boot startup banner is lost, and the maven package does not package the pictures in the resources directory (#10955) 2022-07-18 16:00:14 +08:00
234e822b36 [Regression](Array) add more array test (#10770) 2022-07-18 15:27:13 +08:00