Commit Graph

108 Commits

Author SHA1 Message Date
471db80f69 [Bug](date) Fix invalid date (#16205)
Issue Number: close #15777
2023-01-31 10:08:44 +08:00
f72efd4523 [Improvement](decimal) do not log fatal when precision is invalid (#16207) 2023-01-30 09:54:22 +08:00
Pxl
46347a51d2 [Bug](exec) enable warning on ignoring function return value for vctx (#16157)
* enable warning on ignoring function return value for vctx
2023-01-29 17:23:21 +08:00
e49766483e [refactor](remove unused code) remove many xxxVal structure (#16143)
remove many xxxVal structure
remove BetaRowsetWriter::_add_row
remove anyval_util.cpp
remove non-vectorized geo functions
remove non-vectorized like predicate
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:17:43 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimize topn query like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

TopN is is compose of SortNode and ScanNode, when user table is wide like 100+ columns the order by clause is just a few columns.But ScanNode need to scan all data from storage engine even if the limit is very small.This may lead to lots of read amplification.So In this PR I devide TopN query into two phase:
1. The first phase we just need to read `columnA`'s data from storage engine along with an extra RowId column called `__DORIS_ROWID_COL__`.The other columns are pruned from ScanNode.
2. The second phase I put it in the ExchangeNode beacuase it's the central node for topn nodes in the cluster.The ExchangeNode will spawn a RPC to other nodes using the RowIds(sorted and limited from SortNode) read from the first phase and read row by row from storage engine.

After the second phase read, Block will contain all the data needed for the query
2023-01-19 10:01:33 +08:00
d5a3e8df3a [Exec](opt) Opt the vexplode_split function performance (#15945) 2023-01-17 19:02:57 +08:00
d062ca2944 [refactor](vectorized) remove unnecessary vectorization check (#15984) 2023-01-17 12:21:46 +08:00
Pxl
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
0fbdf8e3e1 [Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772) 2023-01-12 15:08:21 +08:00
8f31a36429 [feature] support spill to disk for sort node (#15624) 2023-01-11 08:40:58 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node
2022-12-30 21:48:14 +08:00
520b6d7910 [Improvement](decimalv3) Add a config to check overflow for DECIMALV3 (#15463) 2022-12-30 14:02:24 +08:00
305dd15fea [improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035)
Current bitmap index only can apply pushed down predicates which in AND conditions. When predicates in OR conditions and other complex compound conditions, it will not be pushed down to the storage layer, this leads to read more data.

Based on that situation, this pr will do:

1. this pr in order to support bitmap index apply compound predicates, query sql like:
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200);
select * from tb where (not a> 'hello') or b < 100;
...
above sql,column a and b and c has created bitmap_index.

2. this optimization can reduce reading data by index
3. set config enable_index_apply_compound_predicates to use this optimization
2022-12-28 20:08:57 +08:00
fe562bc3e7 [Bug](Agg) fix crash when encountering not supported agg function like last_value(bitmap) (#15257)
The former logic inside aggregate_function_window.cpp would shutdown BE once encountering agg function with complex type like BITMAP. This pr makes it don't crash and would return one more concrete error message which tells the unsupported function signature to user.
2022-12-23 14:23:21 +08:00
b2438b076d [Bug](case function) do not crash if prepare failed (#15113) 2022-12-16 11:02:05 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
fcea89bcf4 [fix](const_expr) fix coredump caused by unsupported cast const expr (#14825) 2022-12-06 10:31:15 +08:00
8c0e13ab51 [improvement](profile) add detail memory counter for exec nodes (#14806)
* [improvement](profile) improve accuraccy of memory usage and add detail memory counter

* fix
2022-12-05 11:51:52 +08:00
3e8b3658c7 [feature-wip](decimalv3) Support basic agg and arithmetic operations for decimal v3 (#14513) 2022-11-29 15:12:41 +08:00
52c6ba051e [feature](jsonb type)refactor JSONB type using column and add testcase (#13778)
1. Refactor JSONB type using ColumnString instead making a copy.
2. Add regression testcase for JSONB load and functions.
2022-11-26 10:06:15 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00
b04ec41c1d [Vectorized](udaf) fix java-udaf couldn't get jar core dump (#14393)
fix java-udaf couldn't get jar core dump
2022-11-22 20:49:02 +08:00
Pxl
c18a471303 [Optimize](predicate) update inplace on VcompoundPred (#14402)
select count(*) from lineorder where lo_orderkey<100000000 OR lo_orderkey>100000000 AND lo_orderkey<200000000 OR lo_orderkey >200000000;

0.6s -> 0.5s
2022-11-21 09:12:30 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) useing config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
Pxl
794a551b0f [Enhancement][fix](profile)() modify some profiles (#14074)
1. add RemainedDownPredicates
2. fix core dump when _scan_ranges is empty
3. fix invalid memory access on vLiteral's debug_string()
4. enlarge mv test wait time
2022-11-09 21:59:28 +08:00
5a700223fe [fix](function) fix coredump cause by return type mismatch of vectorized repeat function (#13868)
Will not support repeat function during upgrade in vectorized engine.
2022-11-03 09:53:02 +08:00
7b4c2cabb4 [feature](new-scan) support transactional insert in new scan framework (#13858)
Support running transactional insert operation with new scan framework. eg:

admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitation to transactional insert

Do not support non-literal value in insert stmt
Fix some issue about array type:

Forbid cast other non-array type to NESTED array type, it may cause BE crash.
Add getStringValueForArray() method for Expr, to get valid string-formatted array type value.
Add useLocalSessionState=true in regression-test jdbc url
without this config, the jdbc driver will send some init cmd each time it connect to server, such as
select @@session.tx_read_only.
But when we use transactional insert, after begin command, Doris do not support any other type of
stmt except for insert, commit or rollback.
So adding this config to let the jdbc NOT send cmd when connecting.
2022-11-03 08:36:07 +08:00
c14277e587 [fix](analytic) fix coredump cause by empty analytic parameter types (#13808)
* fix fe compile error
2022-11-01 17:25:36 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
36053d2419 [fix](array-type) fix the be core dump when select the invalid array format (#13514)
1. this pr is used to fix the be core dump when select the invalid array.
2. before the change, we run "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" will cause be core dump.
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
ERROR 1105 (HY000): RpcException, msg: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
3. after the change, we run "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" will get error message.
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
errCode = 2, detailMessage = No matching function with signature: array_intersect(array<tinyint(4)>, varchar(-1))"
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-27 23:11:12 +08:00
6ea9a65bb6 [Opt](vec) opt runtime filter for TPCH Q22 (#13339) 2022-10-17 10:30:07 +08:00
8218cfed40 [Bug](function) Fix constant predicate evaluation (#13346) 2022-10-15 01:05:29 +08:00
4e4f8afa28 [fix](array-type) fix get_data_at for zero element array #13225
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-11 15:41:34 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
d80b7b9689 [feature-wip](new-scan) support more load situation (#12953) 2022-09-27 21:48:32 +08:00
59699a4321 [feature](JSON datatype)Support JSON datatype (#10322)
Add `JSON` datatype, following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE 
3. `SELECT` JSON columns

Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)

* add JSONB data storage format type

* fix JsonLiteral resolve bug

* add DataTypeJson case in data_type_factory

* add JSON syntax check in FE

* add operators for jsonb_document, currently not support comparison between any JSON type value

* add ColumnJson and DataTypeJson

* add JsonField to store JsonValue

* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral

* add push_json for MysqlResultWriter

* JSON column need no zone_map_index

* Revert "JSON column need no zone_map_index"

This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.

* add JSON writer and reader, ignore zone-map for JSON column

* add json_to_string for DataTypeJson

* add olap_data_convertor for JSON type

* add some enum

* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions

* fix column_json offsets overflow bug, format code

* remove useless TODOs, add CmpType cases for JSON type

* add license header

* format license

* format be codes

* resolve rebase master conflicts

* fix bugs for CREATE and meta related code

* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes

* modification be codes along code review advice

* fix rebase conflicts with master

* add unit test for json_value and column_json

* fix rebase error

* rename json to jsonb

* fix some data convert bugs, set Mysql type to JSON
2022-09-25 14:06:49 +08:00
ec2b3bf220 [feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793)
Refactor of scanners. Support broker load.
This pr is part of the refactor scanner tasks. It provide support for borker load using new VFileScanner.
Work still in progress.
2022-09-21 12:49:56 +08:00
c72a19f410 [BugFix](VExprContext) capture error status to prevent incorrect func call which causes coredump #12779 2022-09-21 09:20:16 +08:00
54d1630c42 [Opt](vectorized) speed up hash function compute in hash partition (#12334)
After do the opt of hash function, the compute of siphash in HASH_PARTITION in vdata_stream_sender

Before: 1s800ms
After: 800ms
2022-09-07 10:11:40 +08:00
9a74ad1702 [feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842)
We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column prune on ScanNode in FE

Co-authored-by: HappenLee <happenlee@hotmail.com>
2022-08-30 16:17:10 +08:00
a16cf0e2c8 [feature-wip](scan) add profile for new olap scan node (#12042)
Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner.
Fix some blocking bug of new scan framework.
TODO:

Memtracker
Opentelemetry spen
The new framework is still disabled by default, so it will not effect other feature.
2022-08-30 10:55:48 +08:00
62e3bd338e [refactor](BE) return error status when vslot_ref contains invalid slot_id (#12106)
In current implementation, we detect invalid slot at execute phase. At execute phase, it is hard to get useful information for further debug. This pr moves error detection ahead to prepare phase, so that we can log related tuple descriptors.
2022-08-29 12:07:08 +08:00
17b809210a [Bug](runtime filter) fix bug for late-arrival runtime filters (#12049) 2022-08-26 09:13:10 +08:00
d06edd4b8b [minor](runtime-filter) add DCHECK for runtimefilter bug (#11996)
Not a fix, just add debug info to try find root cause of #11995
2022-08-24 07:53:30 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00