Commit Graph

7178 Commits

Author SHA1 Message Date
a45685d028 [fix](regression) concurrent regression cases may fail #14271
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-15 15:46:34 +08:00
Pxl
e298696baf [Chore](env) add error information when DORIS_GCC_HOME not set well (#14249) 2022-11-15 15:45:35 +08:00
87544a017f [fuzztest](fe session variable) add fuzzy test config for fe session variables. (#14272)
Many feature in fe session variable is disabled by default. So that these features do not pass github workflow test actually. I add a fuzzy test config in fe.conf. If it is set to true, then we will use fuzzy session variables for every connection so that every feature developer could set fuzzy values for its config.



Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-15 15:43:21 +08:00
f86886f8f5 [Feature](function) Support array_compact function (#14141) 2022-11-15 14:24:37 +08:00
5ae046b208 [bugfix](log) fix wrong print introduced by 49fecd2a6dae #14266 2022-11-15 11:39:05 +08:00
a3062c662c [feature-wip](statistics) support statistics injection and show statistics (#14201)
1. Reduce the configuration options for statistics framework, and add comment for those rest.
2. Move the logic of creation of analysis job to the `StatisticsRepository` which defined all the functions used to interact with internal statistics table
3. Move AnalysisJobScheduler to the statistics package
4. Support display and injections manually for statistics
2022-11-15 11:29:51 +08:00
6cc5ae077e [Improvement](Sequence function) Capitalize const variables (#14270) 2022-11-15 10:41:53 +08:00
89db3fee00 [feature-wip](MTMV)Add show statement for MTMV (#13786)
Use Case

mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql > CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;

mysql> SHOW MTMV JOB;
mysql> SHOW MTMV TASK;
2022-11-15 10:32:47 +08:00
215a4c6e02 [Bug](BHJ) Fix wrong result when use broadcast hash join for naaj (#14253) 2022-11-15 09:40:00 +08:00
cffdeff4ec [fix](memory) Fix memory leak by calling boost::stacktrace (#14269)
boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace.
The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace.
refer to:
boostorg/stacktrace#118
boostorg/stacktrace#111
2022-11-15 08:58:57 +08:00
93e5d8e660 [Vectorized](function) support bitmap_from_array function (#14259) 2022-11-15 01:55:51 +08:00
37fdd011b4 [fix](fe-metric) Prometheus read format error #13831 (#13832)
Co-authored-by: 迟成 <chicheng@meituan.com>
2022-11-14 22:07:00 +08:00
b0ff852d74 [opt](Nereids) right deep tree penalty adjust: use right rowCount, not abs(left - right) (#14239)
in origin algorithm, the penalty is abs(leftRowCount - RightRowCount). this will make some right deep tree escape from penalty, because the substraction is almost zero. Penalty by RightRowCount can avoid this escape.
2022-11-14 16:40:26 +08:00
bea66e6a12 [fix](nereids) cannot generate RF on colocate join and prune useful RF in RF prune (#14234)
1. when we translate colocated join, we lost RF information attached to the right child, and hence BE will not generate those RFs.
2. when a RF is useless, we prune all RFs on the scan node by mistake
2022-11-14 16:36:55 +08:00
8dd2f8b349 [enhancement](nereids) set Ndv=rowCount if ndv is almost equal to rowCount on ColumnStatisitics load (#14238) 2022-11-14 16:30:35 +08:00
bdf7d2779a [fix](Nereids) aggregate always report has 1 row count (#14236)
the data structure of new stats is changed, bug Agg-estimation is not changed
2022-11-14 16:27:55 +08:00
47326f951d [fix](nereids) count(*) reports npe when do filter selectivity estimation (#14235) 2022-11-14 16:11:08 +08:00
cf5e2a2eb6 [fix](nereids) new statistics use wrong default selectivity (#14233)
by default, column selectivity MUST be 1.0, not ZERO
2022-11-14 16:09:17 +08:00
fc70179acb [multi-catalog](fix) the eof of lazy read columns may be not equal to the eof of predicate columns (#14212)
Fix three bugs:
1. The EOF of lazy read columns may be not equal to the EOF of predicate columns.
(for example: If the predicate column has 3 pages, with 400 rows for each, but the last page
is filtered by page index. When batch_size=992, the EOF of predicate column is true.
However, we should set batch_size=800 for lazy read column, so the EOF of lazy read column may be false.)
2. The array column does not count the number of nulls
3. Generate wrong NullMap for array column
2022-11-14 14:37:21 +08:00
7eed5a292c [feature-wip](multi-catalog) Support hive partition cache (#14134) 2022-11-14 14:12:40 +08:00
30f36070b5 [test](multi-catalog)Regression test for external hive parquet table (#13611) 2022-11-14 14:10:10 +08:00
594e3b8224 [feature](Nereids) add circle detector and avoid overlap (#14164) 2022-11-14 14:02:14 +08:00
23a8c7eeb6 (fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229)
Fix error result because not used fields_context
2022-11-14 14:00:55 +08:00
49fecd2a6d [improvement](log) print info of error replicas (#14220) 2022-11-14 11:37:18 +08:00
15eb07b829 [BugFix](file cache) don't clean clone dir when doing _gc_unused_file_caches (#14194)
* use another file_size overload for noexcept

* don't gc clone dir

* use better status
2022-11-14 11:35:08 +08:00
13b1f92c63 [enhancement](Nereids) add output set and output exprid set cache (#14151) 2022-11-14 11:24:57 +08:00
7bb3792d51 [chore](build) Split the compliation units to build them in parallel (#14232) 2022-11-14 10:57:10 +08:00
d55faa7f6a [feature](remote)Only query can use local cache when reading remote files. (#13865)
When calling select on remote files, download cache files to local disk.
When calling alter table on remote files, read files directly from remote storage. So if tablet is too large, it will not take up too many local disk when creating local cache file.
2022-11-14 10:30:15 +08:00
24b51b9035 [fix](compaction) segcompaction coredump if the rowset starts with a big segment (#14174) (#14176)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-14 09:54:08 +08:00
139c4a77f1 [enhancement](be)close ExecNode ASAP to release resource earlier (#14203) 2022-11-14 09:41:35 +08:00
8263c34da6 [fix](ctas) use json_object in CTAS get wrong result (#14173)
* [fix](ctas) use json_object in CTAS get wrong result

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-14 09:13:05 +08:00
a179b22937 [fix](schema) Release memory of TabletSchemaPB in RowsetMetaPB #13993 2022-11-14 08:36:30 +08:00
3bc26f773d [hotfix](memtracker) Fix expired DCHECK(_limit != -1); and segment_meta_mem_tracker inelegant end (#14223) 2022-11-13 17:15:29 +08:00
72748c229a update (#14215) 2022-11-13 12:31:42 +08:00
33b50860c7 [improvement](load) release load channel actively when error occurs (#14218) 2022-11-13 12:31:15 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
7682c08af0 [improvement](load) reduce memory in batch for small load channels (#14214) 2022-11-12 22:14:01 +08:00
beaf2fcaf6 [feature](partition) support new create partition syntax (#13772)
Create partitions use :
```
PARTITION BY RANGE(event_day)(
        FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
        FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
        FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
        FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
        PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
)

PARTITION BY RANGE(event_time)(
        FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR
)
```
can create a year/month/week/day/hour's date partitions in a batch,
also it is compatible with the single partitioning method.
2022-11-12 20:52:37 +08:00
376b4fda9f [fix](scankey) fix extended scan key errors. (#14200)
Issue Number: close #14199
2022-11-12 20:44:09 +08:00
082028b2a2 [test](jdbc postgresql case)add jdbc test case for postgresql (#14162) 2022-11-12 20:43:13 +08:00
78fa167b0a [test](jdbc external table) add jdbc regression test case (#14086) 2022-11-12 20:42:57 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
bf79805a66 [regression-test] sleep longer to void error (#14186) 2022-11-12 11:13:52 +08:00
44eb1cf1c3 [fix](chore) read max_map_count from proc and make notice much more understandable (#14137)
Some users can not use sysctl under non-root in linux, so we read max_map_count from proc.
Notice users that they can change max_map_count under root.
2022-11-11 23:05:54 +08:00
43490a33a5 [feature-array](array-type) Add array function array_with_constant (#14115)
Return array of constants with length num.

```
mysql> select array_with_constant(4, 1223);
+------------------------------+
| array_with_constant(4, 1223) |
+------------------------------+
| [1223, 1223, 1223, 1223]     |
+------------------------------+
1 row in set (0.01 sec)
```
co-authored-by @eldenmoon
2022-11-11 22:08:43 +08:00
0ba13af8ff [feature](running_difference) support running_difference function (#13737) 2022-11-11 21:22:56 +08:00
28ae281936 [chore](cmake) Fix wrong statements (#14187) 2022-11-11 18:22:49 +08:00
43f80e2633 [enhancement](load) Increase batch size of node channel to improve import performance (#13912) 2022-11-11 18:05:36 +08:00
2e29b15c6a [test](array function)add array_range function test (#14123)
* add array_range function test

* add array_range function test
2022-11-11 18:04:33 +08:00
d9913b1317 [Enhancement](Nerieds) Support numbers TableValuedFunction and some bitmap/hll aggregate function (#14169)
## Problem summary
This pr support
1. `numbers` TableValuedFunction for nereids test, like `select * from numbers(number = 10, backend_num = 1)`
2. bitmap/hll aggregate function
3. support find variable length function in function registry, like `coalesce`
4. fix a bug that print nerieds trace will throw exception because use RewriteRule in ApplyRuleJob, e.g: `AggregateDisassemble`, introduced by #13957
2022-11-11 16:29:15 +08:00