doris

Author	SHA1	Message	Date
Pxl	64988cb3d4	[Enhancement](optimize) optimize for insert_indices_from (#12807 )	2022-09-27 15:49:15 +08:00
starocean999	c4341d3d43	[fix](like)prevent null pointer by unimplemented like_vec functions (#12910 ) * [fix](like)prevent null pointer by unimplemented like_vec functions * fix pushed like predicate on dict encoded column bug	2022-09-27 10:02:10 +08:00
Shane	35076431ab	[fix](column)fix get_shrinked_column misspell (#12961 ) Fix misspell	2022-09-26 17:32:03 +08:00
Gabriel	f879a51ce9	[Improvement](dict) optimize dictionary column (#12852 )	2022-09-25 18:29:10 +08:00
Gabriel	d8e8bc0e69	[Improvement](predicate) Replace for-loop by memcpy (#12867 )	2022-09-25 18:27:59 +08:00
Shane	59699a4321	[feature](JSON datatype)Support JSON datatype (#10322 ) Add `JSON` datatype. The following features are implemented by this PR: 1. `CREATE` tables with `JSON` type columns 2. `INSERT` values containing a `JSON` type value stored in `String`, which is represented in binary format (AKA `JSONB`) at BE 3. `SELECT` JSON columns For the detailed design, see [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type) * add JSONB data storage format type * fix JsonLiteral resolve bug * add DataTypeJson case in data_type_factory * add JSON syntax check in FE * add operators for jsonb_document; comparison between JSON type values is not currently supported * add ColumnJson and DataTypeJson * add JsonField to store JsonValue * add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral * add push_json for MysqlResultWriter * JSON column need no zone_map_index * Revert "JSON column need no zone_map_index" This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79. * add JSON writer and reader, ignore zone-map for JSON column * add json_to_string for DataTypeJson * add olap_data_convertor for JSON type * add some enum * add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions * fix column_json offsets overflow bug, format code * remove useless TODOs, add CmpType cases for JSON type * add license header * format license * format be codes * resolve rebase master conflicts * fix bugs for CREATE and meta related code * refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes * modify be code following code review advice * fix rebase conflicts with master * add unit test for json_value and column_json * fix rebase error * rename json to jsonb * fix some data convert bugs, set Mysql type to JSON	2022-09-25 14:06:49 +08:00
yiguolei	32551a7263	[bugfix](predicate column) data maybe wrong if not a single page (#12796 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-22 09:55:31 +08:00
Gabriel	3cfaae0031	[Improvement](sort) Use heap sort to optimize sort node (#12700 )	2022-09-21 10:01:52 +08:00
yiguolei	415721ef20	[enhancement](pred column) improve predicate column insert performance (#12690 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-19 10:53:48 +08:00
HappenLee	35b97a5af0	[Opt](hash) Speed up insert from dict data map and not datetime (#12670 ) Speed up dict data read and not datetime. same target #12636	2022-09-17 17:02:43 +08:00
Pxl	d44ec74988	[Enhancement](column) optimize for ColumnString::insert_many_dict_data (#12636 ) optimize for ColumnString::insert_many_dict_data	2022-09-16 10:23:04 +08:00
HappenLee	e413a2b8e9	[Opt](vectorized) Use new way to do hash shuffle to speed up query (#12586 )	2022-09-15 11:08:04 +08:00
Pxl	0ead048b93	[Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456 ) 1. remove ColumnString terminating zero 2. add a data_version for pblock 3. change EncryptionMode to enum class	2022-09-14 21:25:22 +08:00
Yongqiang YANG	5dcf933012	[Bug](column) ColumnNullable::replace_column_data should DCHECK size > sel… #12558	2022-09-14 08:42:15 +08:00
camby	56b2fc43d4	[enhancement](array-type) shrink column suffix zero for type ARRAY<CHAR> (#12443 ) At the compute level, the CHAR type shrinks suffix zeros. To keep the logic the same as for the CHAR type, we also shrink them for ARRAY or ARRAY<ARRAY> types. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-13 23:24:48 +08:00
HappenLee	d913ca5731	[Opt](vectorized) Speed up bucket shuffle join hash compute (#12407 ) * [Opt](vectorized) Speed up bucket shuffle join hash compute	2022-09-13 20:19:22 +08:00
Gabriel	66491ec137	[Improvement](sort) improve partial sort algorithm (#12349 ) * [Improvement](sort) improve partial sort algorithm	2022-09-09 15:44:18 +08:00
camby	26cf2d3742	[enhancement](array-type) avoid abuse of Offset and Offset64 #12378 We already separated Array Offset64 and String Offset (32-bit) in PR #12341. Now we restrict Offset to IColumn and Offset64 to ColumnArray to avoid misuse. Using the wrong one causes a compile error.	2022-09-08 14:53:07 +08:00
HappenLee	54d1630c42	[Opt](vectorized) speed up hash function compute in hash partition (#12334 ) After optimizing the hash function, the siphash computation in HASH_PARTITION in vdata_stream_sender takes: Before: 1s800ms After: 800ms	2022-09-07 10:11:40 +08:00
Gabriel	922b04fdc1	[Improvement](vectorized) change `static_cast` to `assert_cast` for reference (#12379 ) * [Improvement](vectorized) change `static_cast` to `assert_cast` for reference	2022-09-07 09:27:13 +08:00
camby	cf5d194fe1	[enhancement](array-type) Split Array Offsets and String Offsets (#12341 ) In old Doris versions string offsets are 32-bit, which is not enough for the Array type. If we changed string offsets from 32-bit to 64-bit, upgrading BEs one by one would be a problem, because strings with 32-bit offsets and strings with 64-bit offsets would exist at the same time. As a result, we separate the code for Array offsets. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-06 11:18:27 +08:00
Gabriel	90fb3b7783	[Improvement](load) accelerate tablet sink (#12174 )	2022-09-01 10:08:09 +08:00
carlvinhust2012	fba2658a1d	[fix](array-type) fix the be core dump when use collect_list result to insert (#12045 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-26 18:00:43 +08:00
Pxl	620d33a763	[Enhancement](optimize) set result_size_hint to filter_block (#11972 )	2022-08-25 11:42:52 +08:00
TengJianPing	55fdb555be	[bugfix](dict) fix coredump of dict column range predicate when there is null value (#11967 )	2022-08-23 16:07:48 +08:00
Jerry Hu	dc8f64b3e3	[improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801 )	2022-08-22 10:12:06 +08:00
Gabriel	7d97aa194b	[feature-wip](datev2) Support to use datev2 as partition column (#11618 )	2022-08-12 11:54:01 +08:00
wangbo	4c8cc7f03e	[fix](storage)fix column dict incorrect result (#11694 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-12 11:05:57 +08:00
Gabriel	2068bf2dea	[Refactor](predicate) Use primitive type as template argument for predicate (#11647 )	2022-08-11 12:06:44 +08:00
Pxl	ec3c911f97	[Feature][Materialized-View] support materialized view on vectorized engine (#10792 )	2022-08-04 14:07:48 +08:00
lihangyu	667689e9ba	[Fix](array) fix array permute (#11389 )	2022-08-01 22:46:03 +08:00
TengJianPing	2ed46eee64	[bugfix] fix coredump caused by nullable const column compare to non-nullable const column (#11227 )	2022-07-27 12:00:26 +08:00
Pxl	6e98ebba27	[Vectorized] Support sort combinator (#10469 )	2022-07-23 17:58:31 +08:00
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085 )	2022-07-23 16:07:59 +08:00
HappenLee	fdb4193e1b	[Vectorized][Refactor] Refactor the function of `tuple_is_null`, only do work in hash join node (#11109 )	2022-07-23 11:50:07 +08:00
Jerry Hu	b7c9007776	[improvement][agg]Process aggregated results in the vectorized way (#11084 )	2022-07-22 22:04:43 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) Hash join node adds three new attributes. The following SQL is used as an example to illustrate the meaning of these three attributes ``` select t1.a from t1 left join t2 on t1.a=t2.b; ``` 1. vOutputTupleDesc: Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> Each slot in the intermediate tuple corresponds one-to-one to a slot in the output tuple through the expr calculation of the left child in vSrcToOutputSMap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 The following is the specific description of the first PR In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple. The following is the specific description of the second PR This PR mainly fixes the following problems: 1. Solve the query combined with inline view and outer join. After adding a tuple to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` will be calculated in the HashJoinNode.computeOutputTuple() function. 2. Column nullable property error problem. At present, once the outer join occurs, the column on the null side will be planned to be nullable in the semantic parsing stage. 
For example: ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` At this time, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp will be used in constructing the output tuple of HashJoinNode (specifically, the function HashJoinNode.computeOutputTuple()). So the correctness of the column nullable property of this tableRef is very important. In the above case, since the tmp table needs to perform a right join with the b table, tmp is the null side, so the column attributes involved in the tmp table must be changed to nullable. In non-vectorized code, since the virtual tableRef tmp is not used at all, it uses the `TupleIsNull` function in `outputsmp` to ensure data correctness. That is to say, the a column of the original table test is still non-null, and it does not affect the correctness of the result. The vectorized nullable attribute requirements are very strict. Outer join will change the nullable attribute of the join column, thereby changing the nullable attribute of the column in the upper operator layer by layer. Since FE has no mechanism to modify the nullable attribute in the upper operator tuple layer by layer after the analyzer, at present we can only preset the attributes below the join as nullable in the analyzer stage in advance, so as to avoid the problem. (At the same time, BE also contains some workaround code to deal with the problem of null to non-null.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Jet He	f6cb7a838b	[Optimize] Improve performance like/not like filter through pushdown function to storage engine (#10355 ) * support like/not like conjuncts push down to storage engine * vectorized engine support like/not like conjuncts push down to storage engine * support both evaluate and evaluate_vec method in like predicate * reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts * change #ifndef to pragma once as per comments * change enable_function_pushdown default to false Co-authored-by: heguangnan <heguangnan@bytedance.com>	2022-07-19 08:33:04 +08:00
Pxl	afc1d0c05c	[Chore][Compile] fix compile fail on clang (#10837 ) fix compile fail on clang because of output int128	2022-07-18 19:21:01 +08:00
plat1ko	3bc6655069	[refactor] remove BlockManager (#10913 ) * remove BlockManager * remove deprecated field in tablet meta	2022-07-17 14:10:06 +08:00
Jerry Hu	d245ab76cc	[improvement]Use uint32 instead of size_t to reduce agg key's length (#10832 )	2022-07-14 14:11:55 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Jerry Hu	277a7dd97e	[bugfix]ColumnDecimal missed some interfaces about pre-serialization (#10751 )	2022-07-11 14:00:58 +08:00
Jerry Hu	e293fbd277	[improvement]pre-serialize aggregation keys (#10700 )	2022-07-09 06:21:56 +08:00
Pxl	0b251481d5	[Enhancement][Storage] refactor Comparison Predicates (#10380 )	2022-07-04 09:22:27 +08:00
Pxl	a9d23ce337	[refactor] remove collator (#10518 )	2022-07-01 10:35:32 +08:00
Jerry Hu	18ad8ebfbb	[improvement]Add reading by rowids to speed up lazy materialization (#10506 )	2022-06-30 21:03:41 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00

103 Commits