8276 Commits

Author SHA1 Message Date
74a1e28af3 [Opt](exec) prevent the scan key split whole range (#14088)
2022-11-11 15:46:00 +08:00
015f8ab78d [enhancement](thirdparty) support create stripe reader by column names (#14184)
ORC's NextStripeReader currently supports reading columns only by index, but column indices are hard to obtain for complex types.
This patch extends the ORC adapter to support reading columns by column name.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-11 15:10:20 +08:00
02a86d2215 [Bug](runtimefilter) Fix concurrent bug in runtime filter #14177
For a runtime filter, signal is called by a different thread from the one that calls await, so there is a potential race on the variable is_ready.
2022-11-11 14:16:18 +08:00
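The race described in 02a86d2215 can be illustrated with a minimal Python sketch. This is not the Doris C++ code: the class `RuntimeFilterSignal` and its methods are hypothetical names, and the only assumption carried over from the commit message is that one thread sets `is_ready` while another waits on it. Guarding the flag with the same lock as the condition variable prevents the waiter from missing the update.

```python
import threading


class RuntimeFilterSignal:
    """Hypothetical stand-in for a runtime filter's ready flag."""

    def __init__(self):
        # The condition owns the lock that protects _is_ready.
        self._cond = threading.Condition()
        self._is_ready = False

    def signal(self):
        # Called by the producing thread: set the flag under the lock,
        # then wake every waiter.
        with self._cond:
            self._is_ready = True
            self._cond.notify_all()

    def wait(self, timeout):
        # Called by the consuming thread: the flag is checked under the
        # same lock, so a concurrent signal() cannot be lost.
        with self._cond:
            return self._cond.wait_for(lambda: self._is_ready, timeout=timeout)
```

Here `wait` returns whether the flag was set before the timeout expired, so the caller can tell a ready filter from a timeout.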
7c48168a53 [refactor](Nereids) remove DecimalType, use DecimalV2Type instead (#14166) 2022-11-11 13:58:16 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
5fad4f4c7b [feature](Nereids) replace order by keys by child output if possible (#14108)
To support queries such as:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY c1 + 1

After the rewrite, the plan is equivalent to:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY a
2022-11-11 13:34:29 +08:00
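The rewrite in 5fad4f4c7b can be sketched as a lookup from expressions to the aliases the child already outputs. This is an illustration only: `replace_order_keys` is a hypothetical helper, expressions are plain strings compared textually, and the real Nereids rule works on expression trees rather than text.

```python
def replace_order_keys(select_items, order_keys):
    """Replace each ORDER BY key with the alias of a matching select item.

    select_items: list of (expression, alias) pairs; alias may be None.
    order_keys:   list of expression strings.
    """
    # Map every aliased expression to its alias.
    alias_of = {expr: alias for expr, alias in select_items if alias}
    # Keys with no matching aliased output are left unchanged.
    return [alias_of.get(key, key) for key in order_keys]
```

For the query in the commit message, the select items are `("c1 + 1", "a")` and `("sum(c2)", None)`, so the key `c1 + 1` is replaced by `a`.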
9b50888aaf [feature](Nereids) prune runtime filters which cannot reduce the tuple number of probe table (#13990)
1. Add a post processor: the runtime filter pruner
Doris generates RFs (runtime filters) on Join nodes to reduce the probe table at the scan stage. Some RFs have no effect because their selectivity is 100%. This PR removes them.
An RF is effective if
a. the build column's value range covers part of the probe column's value range, OR
b. the build column's ndv is less than the probe column's, OR
c. the build column's ColumnStats.selectivity < 1, OR
d. the build column is reduced by another RF that satisfies the above criteria.

2. explain graph
a. add RF info in Join and Scan node
b. add predicate count in Scan node

3. Rename session variable
rename `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune` 

4. Fix the min/max column stats derivation bug
For `select max(A) as X from T group by B`,
X.min is A.min, not A.max
2022-11-11 13:13:29 +08:00
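The effectiveness criteria (a) to (d) listed in 9b50888aaf can be written as one predicate. The sketch below is an interpretation, not the Nereids implementation: the `ColumnStats` fields and the `is_effective` function are hypothetical, criterion (a) is read as "the build range does not span the whole probe range", and criterion (d) is reduced to a precomputed flag.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical, simplified column statistics."""
    min_value: float
    max_value: float
    ndv: int
    selectivity: float = 1.0
    # Assumed to be set when another effective RF already reduces this column.
    reduced_by_effective_rf: bool = False


def is_effective(build, probe):
    # (a) the build range covers only part of the probe range
    covers_part = (build.min_value > probe.min_value
                   or build.max_value < probe.max_value)
    # (b) the build column has fewer distinct values than the probe column
    fewer_distinct = build.ndv < probe.ndv
    # (c) the build column has already been filtered
    already_selective = build.selectivity < 1.0
    # (d) the build column is reduced by another effective RF
    return (covers_part or fewer_distinct or already_selective
            or build.reduced_by_effective_rf)
```

A filter for which none of the four conditions holds would keep every probe row, which is the case the pruner removes.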
118a7dff07 [chore](build) Optimize the compilation time (#14170)
Currently, building BE from source takes too long in the workflow environments (P0/P1), which hurts the efficiency of daily development.

We can measure the time by executing the following command:

time EXTRA_CXX_FLAGS='-O3' BUILD_TYPE=ASAN ./build.sh --be --fe --clean -j "$(nproc)"

This PR reduces the compilation time with the following methods:

1. Reduce the generated code by removing some useless std::visit calls.
2. Disable the optimization for some template functions which are instantiated by std::visit conditionally (except for the RELEASE build).
2022-11-11 12:09:54 +08:00
8e17fcef3f [fix](cast)fix cast to char(N) error (#14168) 2022-11-11 11:27:51 +08:00
883dfa38ab [fix](decimal) change log fatal to log warning to avoid code dump on decimal type (#14150) 2022-11-11 11:22:41 +08:00
de00ade6dd [Docs](README)Update the README.md (#14156)
Add the new release in Readme.md
2022-11-11 11:22:17 +08:00
8812a680fc [fix](metric) fix the bug of not updating the query latency metric #14172 2022-11-11 11:21:17 +08:00
d204c7dc1e [Improvement](profile) Improve readability for runtime filters in profile string (#14165)
2022-11-11 11:19:24 +08:00
1f9fb4dc8b [Bugfix] Fix upgrade from 1.1 coredump (#14163)
When upgrading from 1.1 to master, rolling back to 1.1, and then upgrading to master again, BE coredumps because some rowsets have a schema and some do not. During the first upgrade from 1.1, BE flushes the schema into all rowsets. After the rollback to 1.1, BE runs compaction and creates new rowsets without a schema. On the second upgrade from 1.1, BE coredumps because some conditions assume that either all of the rowsets have a schema or none of them do.
2022-11-11 10:29:34 +08:00
7782fb63ca [docs](outfile) Add ORC to outfile document (#14153) 2022-11-11 09:42:30 +08:00
6297ef10e9 [enhancement](plugin) import audit logs for slow queries into a separate table (#14100)
2022-11-11 09:06:01 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) using config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
b62e700f4e [fix](doc): remove incubator. (#14159) 2022-11-11 08:58:42 +08:00
e1e63f8354 [feature-wip](statistic) persistence table statistics into olap table (#13883)
1. Support persisting collected statistics to a pre-built OLAP table named `column_statistics`.
2. Use a much simpler mechanism to collect statistics: all the gauges are collected in a single SQL statement for each partition and then for the whole column, as defined in class `AnalysisJob`.
3. Implement a cache to manage the statistics records in FE.

TODO:

1. Use opentelemetry to monitor the execution time of each job
2. Format the internal analysis SQL
3. Split the SQL generated for deleting expired records so that the child count of the IN expr does not exceed the FE limit
4. Implement show statements
2022-11-10 22:08:08 +08:00
45a3bb87c4 [docs](recover) modify recover doc (#13904) 2022-11-10 20:20:39 +08:00
1ef85ae1f2 [Improvement](join) Support nested loop outer join (#13965) 2022-11-10 19:50:46 +08:00
6c13126e5c [enhancement](Nereids) analyze check input slots must in child's output (#14107) 2022-11-10 19:28:01 +08:00
ae4f2aead7 [fix](nereids) column stats min/max missing (#14091)
In the result of SHOW COLUMN STATS tbl, the min/max values are not displayed.
2022-11-10 17:08:44 +08:00
6bd5378f66 [feature-wip](multi-catalog) lazy read for ParquetReader (#13917)
Read the predicate columns first, and use VExprContext (the pushed-down predicates)
to generate the select vector, which is then applied when reading the non-predicate columns.
Data in the non-predicate columns can be skipped according to the select vector, which reduces the value-decode time.
If a whole page can be skipped, the decompression time is reduced as well.
2022-11-10 16:56:14 +08:00
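The lazy-read order described in 6bd5378f66 can be sketched in a few lines. This is a toy model, not the ParquetReader code: `lazy_read` is a hypothetical function, a row group is modelled as a dict of zero-argument loaders standing in for decompress-and-decode work, and a single predicate column is assumed.

```python
def lazy_read(row_group, predicate_column, predicate, lazy_columns):
    """Read the predicate column first, then only what the select vector needs.

    row_group: dict mapping column name -> zero-argument loader returning a list.
    """
    # Step 1: decode only the predicate column.
    predicate_values = row_group[predicate_column]()
    # Step 2: evaluate the pushed-down predicate to build the select vector.
    select_vector = [bool(predicate(v)) for v in predicate_values]
    result = {
        predicate_column: [v for v, keep in zip(predicate_values, select_vector) if keep]
    }
    # Step 3: if nothing is selected, skip loading the other columns entirely.
    if not any(select_vector):
        for name in lazy_columns:
            result[name] = []
        return result
    # Step 4: otherwise load each lazy column and keep only the selected rows.
    for name in lazy_columns:
        values = row_group[name]()
        result[name] = [v for v, keep in zip(values, select_vector) if keep]
    return result
```

In this model the saving shows up as loaders that are never called; a real reader can also skip individual values and pages inside a column, which the sketch does not attempt.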
724cf1cdb8 [chore][build] add instructions to build version string (#14067) 2022-11-10 16:23:34 +08:00
9b5b411112 [fix](schemeChange) fe oom because replicas too many when schema change (#12850) 2022-11-10 16:17:25 +08:00
151a72d158 [feature](Nereids) support circle graph (#14082) 2022-11-10 15:54:21 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
2022-11-10 15:48:46 +08:00
a73f4dfdc1 [fix](memtracker) Fix scanner thread ending after fragment thread causing mem tracker null pointer #14143 2022-11-10 15:42:53 +08:00
4cde9c4765 [enhance](Nereids): add missing hypergraph rule. (#14087) 2022-11-10 15:23:31 +08:00
90bfd87660 [feature](function) add new function uuid() (#14092) 2022-11-10 14:55:41 +08:00
0dfdbe4508 [feature](Nereids): InnerJoinLeftAssociate, InnerJoinRightAssociate and JoinExchange. (#14051) 2022-11-10 12:21:06 +08:00
8c5c6d9d7f [fix](ctas) fix wrong string column length after executing ctas from external table (#14090) 2022-11-10 11:36:56 +08:00
17867e446f [feature](nereids) let user define right deep tree penalty by session variable (#14040)
It is hard to find a single penalty factor that suits all queries.
The default is 0.7.
2022-11-10 11:25:02 +08:00
57225d69f3 [Fix] add hll param for if function (#12366)
* [Fix] add hll param for if function

* add ut

Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
2022-11-10 11:20:58 +08:00
84b969a25c [fix](grouping)the grouping expr should check col name from base table first, then alias (#14077)
* [fix](grouping)the grouping expr should check col name from base table first, then alias

* fix FE UT; the behavior is now the same as MySQL
2022-11-10 11:10:42 +08:00
994d563f52 [fix](nereids) cannot collect decimal column stats (#13961)
When executing analyze table, Doris fails on decimal columns.
The root cause is that the scale in decimalV2 is 9, but the scale in the schema is 2.
There is no need to check the scale for decimalV2, since it is not a floating-point type.
2022-11-10 11:06:38 +08:00
184cee2d2b [Bug](outfile) Fix wrong decimal format for ORC (#14124) 2022-11-10 11:01:30 +08:00
43eb946543 [feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130
S3 table valued function supports parquet/orc/json file format.
For example: parquet format
2022-11-10 10:33:12 +08:00
10df61b5bf [improvement](join) Share hash table in fragments for broadcast join (#13921) 2022-11-10 09:48:34 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
55cae6202f [typo](docs)add udf doc and optimize udf regression test (#14000) 2022-11-10 09:24:45 +08:00
3690c4dbe7 [fix](load) fix that load channel failed to be released in time (#14119) 2022-11-09 22:38:08 +08:00
Pxl
794a551b0f [Enhancement][fix](profile)() modify some profiles (#14074)
1. add RemainedDownPredicates
2. fix core dump when _scan_ranges is empty
3. fix invalid memory access on vLiteral's debug_string()
4. enlarge mv test wait time
2022-11-09 21:59:28 +08:00
322ac5cf89 [refractor](array) refractor DataTypeArray from_string (#13905)
Refactor DataTypeArray from_string to make it clearer:
support ',' and ']' inside string elements, for example: ['hello,,,', 'world][]']
support empty elements, such as [,] ==> [0,0]
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 16:58:08 +08:00
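The two parsing behaviours named in 322ac5cf89 can be illustrated with a small quote-aware splitter. This is a sketch under stated assumptions, not DataTypeArray::from_string: the function names are hypothetical, escape sequences inside quotes are not handled, and the default of 0 for an empty integer element is taken from the `[,] ==> [0,0]` example in the commit message.

```python
def split_array_elements(text):
    """Split '[a, b, c]' into raw element strings, honouring quotes."""
    text = text.strip()
    if len(text) < 2 or text[0] != "[" or text[-1] != "]":
        raise ValueError("array literal must be wrapped in [ ]")
    body = text[1:-1]
    if body.strip() == "":
        return []
    elements, current, quote = [], [], None
    for ch in body:
        if quote is not None:
            # Inside a quoted string: ',' and ']' are ordinary characters.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ",":
            # An unquoted comma ends the current element.
            elements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    elements.append("".join(current).strip())
    return elements


def parse_string_array(text):
    # Strip one pair of matching quotes from each element, if present.
    return [e[1:-1] if len(e) >= 2 and e[0] == e[-1] and e[0] in "'\"" else e
            for e in split_array_elements(text)]


def parse_int_array(text):
    # Empty elements fall back to 0, following the commit's [,] example.
    return [int(e) if e else 0 for e in split_array_elements(text)]
```

With these helpers, `parse_string_array` keeps the commas and brackets inside the quoted elements of the commit's first example, and `parse_int_array("[,]")` yields two zeros.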
3117ac9289 [enhancement](Nereids) use post-order to generate runtime filter in RuntimeFilterGenerator (#13949)
Change the runtime filter generator from pre-order to post-order traversal; this may change the number of generated runtime filters.
The UT is corrected accordingly.
2022-11-09 14:28:49 +08:00
b74d0a4747 [feature](table-valued-function) Support desc from s3() and modify the syntax of tvf (#14047)
This PR does two things:

1. Support desc function s3()
2. Modify the syntax of tvf
2022-11-09 14:12:43 +08:00
f912d4e392 [fix](compile) fix compile error #14103
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 14:10:06 +08:00
e692636b4f [performance-wip] (vectorization) Opt HashJoin Performance (#12390) 2022-11-09 14:07:49 +08:00
84bb82acc0 [fix](Nereids) aggregate disassemble generate error output list on GLOBAL phase aggregate (#14079)
We must use localAggregateFunction as the key of globalOutputSMap, because we use the local output exprs to generate the global output in disassembleDistinct.
2022-11-09 13:43:12 +08:00