7163 Commits

Author SHA1 Message Date
bdf7d2779a [fix](Nereids) aggregate always report has 1 row count (#14236)
The data structure of the new stats was changed, but the Agg estimation was not updated.
2022-11-14 16:27:55 +08:00
47326f951d [fix](nereids) count(*) reports npe when do filter selectivity estimation (#14235) 2022-11-14 16:11:08 +08:00
cf5e2a2eb6 [fix](nereids) new statistics use wrong default selectivity (#14233)
by default, column selectivity MUST be 1.0, not ZERO
2022-11-14 16:09:17 +08:00
fc70179acb [multi-catalog](fix) the eof of lazy read columns may be not equal to the eof of predicate columns (#14212)
Fix three bugs:
1. The EOF of lazy read columns may differ from the EOF of predicate columns.
(For example, suppose the predicate column has 3 pages of 400 rows each, and the last page
is filtered out by the page index. With batch_size=992, the EOF of the predicate column is true.
However, batch_size should be set to 800 for the lazy read column, so its EOF may be false.)
2. The array column does not count the number of nulls.
3. A wrong NullMap is generated for array columns.
2022-11-14 14:37:21 +08:00
7eed5a292c [feature-wip](multi-catalog) Support hive partition cache (#14134) 2022-11-14 14:12:40 +08:00
30f36070b5 [test](multi-catalog)Regression test for external hive parquet table (#13611) 2022-11-14 14:10:10 +08:00
594e3b8224 [feature](Nereids) add circle detector and avoid overlap (#14164) 2022-11-14 14:02:14 +08:00
23a8c7eeb6 (fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229)
Fix the wrong result caused by not using fields_context.
2022-11-14 14:00:55 +08:00
49fecd2a6d [improvement](log) print info of error replicas (#14220) 2022-11-14 11:37:18 +08:00
15eb07b829 [BugFix](file cache) don't clean clone dir when doing _gc_unused_file_caches (#14194)
* use another file_size overload for noexcept

* don't gc clone dir

* use better status
2022-11-14 11:35:08 +08:00
13b1f92c63 [enhancement](Nereids) add output set and output exprid set cache (#14151) 2022-11-14 11:24:57 +08:00
7bb3792d51 [chore](build) Split the compilation units to build them in parallel (#14232) 2022-11-14 10:57:10 +08:00
d55faa7f6a [feature](remote)Only query can use local cache when reading remote files. (#13865)
When running SELECT on remote files, cache files are downloaded to local disk.
When running ALTER TABLE on remote files, the files are read directly from remote storage, so a large tablet does not take up too much local disk space for cache files.
2022-11-14 10:30:15 +08:00
24b51b9035 [fix](compaction) segcompaction coredump if the rowset starts with a big segment (#14174) (#14176)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-14 09:54:08 +08:00
139c4a77f1 [enhancement](be)close ExecNode ASAP to release resource earlier (#14203) 2022-11-14 09:41:35 +08:00
8263c34da6 [fix](ctas) use json_object in CTAS get wrong result (#14173)
* [fix](ctas) use json_object in CTAS get wrong result

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-14 09:13:05 +08:00
a179b22937 [fix](schema) Release memory of TabletSchemaPB in RowsetMetaPB #13993 2022-11-14 08:36:30 +08:00
3bc26f773d [hotfix](memtracker) Fix expired DCHECK(_limit != -1); and segment_meta_mem_tracker inelegant end (#14223) 2022-11-13 17:15:29 +08:00
72748c229a update (#14215) 2022-11-13 12:31:42 +08:00
33b50860c7 [improvement](load) release load channel actively when error occurs (#14218) 2022-11-13 12:31:15 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
7682c08af0 [improvement](load) reduce memory in batch for small load channels (#14214) 2022-11-12 22:14:01 +08:00
beaf2fcaf6 [feature](partition) support new create partition syntax (#13772)
Create partitions using:
```
PARTITION BY RANGE(event_day)(
        FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
        FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
        FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
        FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
        PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
)

PARTITION BY RANGE(event_time)(
        FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR
)
```
This creates year/month/week/day/hour date partitions in a batch
and remains compatible with the single-partition syntax.
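
One way to read the batch syntax above is that each FROM ... TO ... INTERVAL clause expands into consecutive half-open ranges. The Python sketch below illustrates that reading only. The function names are hypothetical, and clipping a final partial interval at the TO bound is an assumption, not confirmed Doris behavior.

```python
from datetime import datetime, timedelta


def add_interval(start, n, unit):
    """Advance start by n units of YEAR, MONTH, WEEK, DAY, or HOUR."""
    if unit == "HOUR":
        return start + timedelta(hours=n)
    if unit == "DAY":
        return start + timedelta(days=n)
    if unit == "WEEK":
        return start + timedelta(weeks=n)
    if unit in ("MONTH", "YEAR"):
        # Calendar arithmetic; assumes the day of month exists in the target
        # month (true for the 14th used in the example above).
        months = start.month - 1 + (n if unit == "MONTH" else 12 * n)
        return start.replace(year=start.year + months // 12,
                             month=months % 12 + 1)
    raise ValueError(f"unsupported unit: {unit}")


def expand_range(start, end, n, unit):
    """Return half-open [lower, upper) ranges covering [start, end)."""
    ranges = []
    lower = start
    while lower < end:
        # Assumption: a final partial interval is clipped at `end`.
        upper = min(add_interval(lower, n, unit), end)
        ranges.append((lower, upper))
        lower = upper
    return ranges


days = expand_range(datetime(2023, 1, 3), datetime(2023, 1, 14), 1, "DAY")
months = expand_range(datetime(2021, 11, 14), datetime(2022, 11, 14), 1, "MONTH")
```

Under these assumptions, `days` holds one range per day and `months` one range per month, matching the DAY and MONTH clauses in the example.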
2022-11-12 20:52:37 +08:00
376b4fda9f [fix](scankey) fix extended scan key errors. (#14200)
Issue Number: close #14199
2022-11-12 20:44:09 +08:00
082028b2a2 [test](jdbc postgresql case)add jdbc test case for postgresql (#14162) 2022-11-12 20:43:13 +08:00
78fa167b0a [test](jdbc external table) add jdbc regression test case (#14086) 2022-11-12 20:42:57 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
bf79805a66 [regression-test] sleep longer to avoid error (#14186) 2022-11-12 11:13:52 +08:00
44eb1cf1c3 [fix](chore) read max_map_count from proc and make notice much more understandable (#14137)
Some users cannot run sysctl as a non-root user on Linux, so we read max_map_count from proc.
Notify users that they can change max_map_count as root.
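
A minimal Python sketch of the idea, not the code from this commit: read the value from procfs instead of invoking sysctl. It assumes the standard Linux path `/proc/sys/vm/max_map_count`; the function names and the `required` threshold are hypothetical.

```python
from pathlib import Path


def parse_max_map_count(text):
    """Parse the contents of /proc/sys/vm/max_map_count into an int."""
    return int(text.strip())


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read vm.max_map_count from procfs; needs no root and no sysctl binary."""
    return parse_max_map_count(Path(path).read_text())


def check_max_map_count(value, required):
    """Return a notice string if value is below `required`, else None.

    `required` is a hypothetical threshold supplied by the caller.
    """
    if value >= required:
        return None
    return (f"vm.max_map_count is {value}, expected at least {required}; "
            f"as root, run: sysctl -w vm.max_map_count={required}")
```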
2022-11-11 23:05:54 +08:00
43490a33a5 [feature-array](array-type) Add array function array_with_constant (#14115)
Return array of constants with length num.

```
mysql> select array_with_constant(4, 1223);
+------------------------------+
| array_with_constant(4, 1223) |
+------------------------------+
| [1223, 1223, 1223, 1223]     |
+------------------------------+
1 row in set (0.01 sec)
```
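
For readers who prefer code to prose, the semantics shown in the query above can be sketched in Python. This is an illustration only; how Doris handles NULL or negative `num` is not covered here.

```python
def array_with_constant(num, value):
    """Return a list holding `value` repeated `num` times."""
    return [value] * num


result = array_with_constant(4, 1223)
```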
co-authored-by @eldenmoon
2022-11-11 22:08:43 +08:00
0ba13af8ff [feature](running_difference) support running_difference function (#13737) 2022-11-11 21:22:56 +08:00
28ae281936 [chore](cmake) Fix wrong statements (#14187) 2022-11-11 18:22:49 +08:00
43f80e2633 [enhancement](load) Increase batch size of node channel to improve import performance (#13912) 2022-11-11 18:05:36 +08:00
2e29b15c6a [test](array function)add array_range function test (#14123)
* add array_range function test
2022-11-11 18:04:33 +08:00
d9913b1317 [Enhancement](Nereids) Support numbers TableValuedFunction and some bitmap/hll aggregate function (#14169)
## Problem summary
This PR:
1. supports the `numbers` TableValuedFunction for Nereids tests, e.g. `select * from numbers(number = 10, backend_num = 1)`
2. supports bitmap/hll aggregate functions
3. supports finding variable-length functions in the function registry, e.g. `coalesce`
4. fixes a bug, introduced by #13957, where printing the Nereids trace throws an exception because ApplyRuleJob uses a RewriteRule, e.g. `AggregateDisassemble`
2022-11-11 16:29:15 +08:00
fe2944d56d [Bug](nljoin) Keep compatibility for nljoin (#14182) 2022-11-11 15:54:55 +08:00
a162dab40a [feature](docs) add docs for SHOW-CATALOG-RECYCLE-BIN (#14185) 2022-11-11 15:54:05 +08:00
74a1e28af3 [Opt](exec) prevent the scan key split whole range (#14088)
prevent the scan key split whole range
2022-11-11 15:46:00 +08:00
015f8ab78d [enhancement](thirdparty) support create stripe reader by column names (#14184)
ORC NextStripeReader currently supports reading columns only by index, but column indices are hard to obtain for complex types.
We patch the ORC adapter to support reading columns by name.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-11 15:10:20 +08:00
02a86d2215 [Bug](runtimefilter) Fix concurrent bug in runtime filter #14177
For a runtime filter, signal is called by a thread different from the awaiting thread, so there is a potential race on the variable is_ready.
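
The general pattern for avoiding this kind of race is to read and write the flag only while holding the lock that the waiter sleeps on. The Python sketch below is an analogy under that assumption. It is not the C++ fix from this commit, and the class and method names are hypothetical.

```python
import threading


class ReadyFlag:
    """A ready flag that one thread sets and another thread waits for."""

    def __init__(self):
        self._cond = threading.Condition()
        self._is_ready = False

    def signal(self):
        # Write the flag under the lock, then wake every waiter.
        with self._cond:
            self._is_ready = True
            self._cond.notify_all()

    def await_ready(self, timeout=None):
        # wait_for re-checks the flag under the lock, so a signal sent
        # before the wait starts is not lost. Returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._is_ready, timeout=timeout)
```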
2022-11-11 14:16:18 +08:00
7c48168a53 [refactor](Nereids) remove DecimalType, use DecimalV2Type instead (#14166) 2022-11-11 13:58:16 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
5fad4f4c7b [feature](Nereids) replace order by keys by child output if possible (#14108)
To support queries such as:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY c1 + 1

After the rewrite, the plan is equivalent to:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY a
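
The rewrite above can be pictured as a lookup from select expressions to their aliases. The Python sketch below works on plain strings purely for illustration. The real optimizer operates on expression trees, and `rewrite_order_by` is a hypothetical name.

```python
def rewrite_order_by(select_items, order_by):
    """Replace ORDER BY expressions with the alias of a matching select item.

    select_items: list of (expression, alias or None) pairs.
    order_by: list of expressions.
    """
    alias_of = {expr: alias for expr, alias in select_items if alias}
    return [alias_of.get(expr, expr) for expr in order_by]


keys = rewrite_order_by([("c1 + 1", "a"), ("sum(c2)", None)], ["c1 + 1"])
```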
2022-11-11 13:34:29 +08:00
9b50888aaf [feature](Nereids) prune runtime filters which cannot reduce the tuple number of probe table (#13990)
1. add a post processor: runtime filter pruner
Doris generates RFs (runtime filters) on Join nodes to reduce the probe table at the scan stage. But some RFs have no effect because their selectivity is 100%. This PR removes them.
An RF is effective if
a. the build column value range covers only part of that of the probe column, OR
b. the build column ndv is less than that of the probe column, OR
c. the build column's ColumnStats.selectivity < 1, OR
d. the build column is reduced by another RF that satisfies the above criteria.

2. explain graph
a. add RF info in Join and Scan node
b. add predicate count in Scan node

3. Rename session variable
rename `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune` 

4. fix min/max column stats derive bug
`select max(A) as X from T group by B`  
X.min is A.min, not A.max
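
The effectiveness criteria a to d in item 1 can be sketched as a single predicate. This Python sketch is a reading of the commit message, not the Nereids implementation. The `ColumnStats` fields and the `is_effective` signature are hypothetical, and criterion a is interpreted as "the build range does not contain the whole probe range".

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    min_value: float
    max_value: float
    ndv: float
    selectivity: float = 1.0


def is_effective(build, probe, build_reduced_by_effective_rf=False):
    """Return True if an RF built on `build` can reduce rows of `probe`."""
    # a. the build range covers only part of the probe range (assumed reading)
    covers_part = (build.min_value > probe.min_value
                   or build.max_value < probe.max_value)
    # b. fewer distinct values on the build side
    fewer_distinct = build.ndv < probe.ndv
    # c. the build column is itself filtered
    selective = build.selectivity < 1.0
    # d. another effective RF already reduces the build column
    return (covers_part or fewer_distinct or selective
            or build_reduced_by_effective_rf)
```

An RF for which all four checks are false has 100% selectivity under this reading and would be pruned.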
2022-11-11 13:13:29 +08:00
118a7dff07 [chore](build) Optimize the compilation time (#14170)
Currently, it takes too much time to build BE from source in workflow environments (P0/P1) which affects the efficiency of daily development.

We can measure the time by executing the following command:

    time EXTRA_CXX_FLAGS='-O3' BUILD_TYPE=ASAN ./build.sh --be --fe --clean -j "$(nproc)"

This PR optimizes the compilation time by exploiting the following methods:

1. Reduce the codegen by removing some useless std::visit.
2. Disable the optimization for some template functions which are instantiated by std::visit conditionally (except for the RELEASE build).
2022-11-11 12:09:54 +08:00
8e17fcef3f [fix](cast)fix cast to char(N) error (#14168) 2022-11-11 11:27:51 +08:00
883dfa38ab [fix](decimal) change log fatal to log warning to avoid core dump on decimal type (#14150) 2022-11-11 11:22:41 +08:00
de00ade6dd [Docs](README)Update the README.md (#14156)
Add the new release to README.md
2022-11-11 11:22:17 +08:00
8812a680fc [fix](metric) fix the bug of not updating the query latency metric #14172 2022-11-11 11:21:17 +08:00
d204c7dc1e [Improvement](profile) Improve readability for runtime filters in profile string (#14165)
* [Improvement](profile) Improve readability for runtime filters in profile string

* update
2022-11-11 11:19:24 +08:00