doris

Author	SHA1	Message	Date
Jerry Hu	c8418d13b5	[improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641 )	2022-08-10 19:25:02 +08:00
carlvinhust2012	df47b6941d	[feature-wip](array-type) support the array type in reverse function (#11213 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-09 20:49:09 +08:00
Tiewei Fang	169996d8e4	[feature](information_schema) add `rowsets` table into information_s… (#11266 ) * [feature](information_schema) add 'segments' table into information_schema	2022-08-09 18:15:54 +08:00
Kang	f9b151744d	optimize topn query if order by columns is prefix of sort keys of table (#10694 ) * [feature](planner): push limit to olapscan when meet sort. * if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key and read_orderby_key_reverse for olap scanner * There is a common query pattern to find the latest time series data, e.g. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100. If the ORDER BY columns are a prefix of the table's sort key, this can be greatly optimized to read much less data instead of reading all data between t1 and t2. Because the ORDER BY columns and the table's sort key share the same order, it is enough to read LIMIT N rows from each related segment and merge them into N rows. 1. set read_orderby_key to true for read_params and _reader_context if olap_scan_node's sort info is set. 2. set read_orderby_key_reverse to true for read_params and _reader_context if is_asc_order is false. 3. rowset reader force merge read segments if read_orderby_key is true. 4. block reader and tablet reader force merge read rowsets if read_orderby_key is true. 5. for ORDER BY DESC, read and compare in reverse order: 5.1 the segment iterator reads backward using a new BackwardBitmapRangeIterator and reverses the result block before returning it to the caller. 5.2 VCollectIterator::LevelIteratorComparator and VMergeIteratorContext return the opposite comparison result in their compare functions when _is_reverse is set. Co-authored-by: jackwener <jakevingoo@gmail.com>	2022-08-09 09:08:44 +08:00
weizuo93	f730a048b1	[feature-wip](load) Support single replica load (#10298 ) During the load process, the same operations, such as sort and aggregation, are performed on all replicas, and they are resource-intensive. Concurrent data loads consume a lot of CPU and memory. It's better to perform the write process (writing data into the MemTable and then flushing it) on a single replica and synchronize the data files to the other replicas before the transaction finishes.	2022-08-02 11:44:18 +08:00
luozenglin	1cf57a985d	[fix] Fix the query result error caused by the grouping sets statemen… (#11316 ) * [fix] Fix the query result error caused by the grouping sets statement grouping as an expression	2022-08-01 13:52:18 +08:00
Jerry Hu	0325fa436e	[fix](agg)Add field of 'is_first_phase' in TAggregationNode (#11321 )	2022-08-01 11:49:50 +08:00
plat1ko	a6537a90cd	[Enhancement] Garbage collection of unused data on remote storage backend (#10731 ) * [Feature](cold_on_s3) support unused remote rowset gc * return aborted when skip drop tablet * perform unused remote rowset gc	2022-07-29 14:38:39 +08:00
morrySnow	8edbe39de8	[project-node]add projection thrift (#11309 )	2022-07-29 14:15:06 +08:00
HappenLee	0b1d06bfd6	[Vectorized] Support order by aggregate function (#11187 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-28 09:12:58 +08:00
Gabriel	72d2feae99	[feature-wip] Support all date functions for datev2/datetimev2 (#11265 ) * [feature-wip] (datetimev2) support convert_tz function * [feature-wip] Support all date functions for datev2/datetimev2	2022-07-28 08:18:59 +08:00
xy720	5913c7c52c	[feature-wip](array-type) add function array_slice (#11054 ) array_slice function returns a slice of the array.	2022-07-27 18:43:52 +08:00
Stalary	4f3b4c7efc	[Improvement] information_schema.columns support COLUMN KEY (#11228 )	2022-07-27 12:22:17 +08:00
starocean999	f46a801b1b	[FIX]string pad function should be always nullable for both string and varchar type (#11196 )	2022-07-26 17:55:06 +08:00
Gabriel	823088a9eb	[FOLLOW-UP] (datetimev2) complete date function ut and built-in function declaration (#11154 )	2022-07-26 17:48:57 +08:00
starocean999	3e3b2d15d4	[bug]string pad functions should always be nullable (#11140 ) * string pad functions should always be nullable	2022-07-26 10:20:11 +08:00
slothever	84ce7eddf6	[feature-wip](parquet-reader) add thrift file for new parquet reader (#11150 )	2022-07-25 10:11:15 +08:00
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085 )	2022-07-23 16:07:59 +08:00
HappenLee	fdb4193e1b	[Vectorized][Refactor] Refactor the function of `tuple_is_null`, only do work in hash join node (#11109 )	2022-07-23 11:50:07 +08:00
xy720	3744321f01	[feature-wip](array-type) add function array_union/array_except/array_intersect (#10781 ) Add array_union/array_except/array_intersect function.	2022-07-22 13:50:13 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) The hash join node adds three new attributes. The following SQL illustrates their meaning: ``` select t1.a from t1 left join t2 on t1.a=t2.b; ``` 1. vOutputTupleDesc: Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> Each slot in the intermediate tuple corresponds one-to-one to a slot in the output tuple through the expr calculation of the left child in vSrcToOutputSMap. This change mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 Description of the first PR: In a vectorized scenario, the query plan generates a new tuple for the join node. This tuple describes the output schema of the join node. Adding it solves the problem that the input schema of the join node differs from its output schema, for example: 1. columns on the null side of an outer join are converted to nullable; 2. the projection of the outer tuple. Description of the second PR: This PR mainly fixes the following problems: 1. Queries that combine an inline view with an outer join. After a tuple is added to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. The vectorized `tupleisnull` is now calculated in the HashJoinNode.computeOutputTuple() function. 2. Incorrect column nullable properties. Currently, once an outer join occurs, the columns on the null side are planned as nullable in the semantic analysis stage. For example: ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` Here, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp is used to construct the output tuple of HashJoinNode (specifically, in HashJoinNode.computeOutputTuple()), so the correctness of the nullable property of this tableRef's columns is very important. In the above case, because tmp is right-joined with b, tmp is the null side, so the columns involved from tmp must be changed to nullable. In non-vectorized code, the virtual tableRef tmp is not used at all; the `TupleIsNull` function in `outputsmp` ensures data correctness instead. That is, column a of the original table test remains non-null without affecting the correctness of the result. The vectorized engine is much stricter about the nullable attribute. An outer join changes the nullable attribute of the join column, which in turn changes the nullable attribute of columns in the operators above it, layer by layer. Because FE has no mechanism to update the nullable attribute of upper operator tuples layer by layer after analysis, we currently preset the attributes below the join as nullable in the analyzer stage to avoid the problem. (BE also includes some workaround code to handle the null-to-non-null problem.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
huangzhaowei	7147a7c290	[feature-wip](multi-catalog) Support s3 storage for file scan node (#10977 ) Example of an hms_catalog backed by S3: ```sql CREATE CATALOG hms_catalog properties( "type" = "hms", "hive.metastore.uris"="thrift://localhost:9083", "AWS_ACCESS_KEY" = "your access key", "AWS_SECRET_KEY"="your secret key", "AWS_ENDPOINT"="s3 endpoint", "AWS_REGION"="s3-region", "fs.s3a.paging.maximum"="1000"); ``` All of these parameters are required.	2022-07-21 17:38:53 +08:00
Lightman	a71822a74d	[refactor]remove col_unique_id (#11025 )	2022-07-20 16:35:14 +08:00
Ashin Gau	60dd322aba	[feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942 ) FileScanNode in BE launches as many threads as there are splits. The thrift interface of FileScanNode is excessively redundant.	2022-07-18 20:50:34 +08:00
chenlinzhong	6736e06679	[feature](udf) Vectorization support remote udaf #10683 (#10685 )	2022-07-18 17:15:34 +08:00
carlvinhust2012	2d5aca18fb	[feature-wip](array) add the array_sort function (#10598 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-07-18 10:52:42 +08:00
zxealous	5793cb11d0	[feature-wip] (array-type) function concat_ws support array (#10749 ) Issue #10052 function concat_ws support array	2022-07-17 17:50:39 +08:00
plat1ko	3bc6655069	[refactor] remove BlockManager (#10913 ) * remove BlockManager * remove deprecated field in tablet meta	2022-07-17 14:10:06 +08:00
xy720	cc84cfcc0e	[feature-wip](array-type) add function array_remove (#10385 ) Description: the array_remove function removes all elements in the array that are equal to the target.	2022-07-15 17:57:49 +08:00
zhannngchen	13e9cb146f	[feature-wip](unique-key-merge-on-write) Add option to enable unique-key-merge-on-write, DSIP-018[5/1] (#10814 ) * Add option in FE * add opt in be * some fix * update * fix code style * fix typo * fix typo * update * code format	2022-07-14 12:10:58 +08:00
zhangstar333	e361eb385e	[vectorized][udf] improvement java-udaf with group by clause (#10296 ) save for file about udaf add bool _destory_deserialize update some code according reviewer change destroy all data at once	2022-07-14 11:23:42 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Xin Liao	56b55563c6	[feature-wip](unique-key-merge-on-write) add bloom filter index for primary key, DSIP-018[1.2] (#10706 )	2022-07-13 18:58:45 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136 ) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be support that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for light sc. resolve the deadlock when schema change (#124) fix columns from fe don't have bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix light schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repetitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
carlvinhust2012	eb079950cb	[feature-wip] (array-type) add the array_distinct function (#10388 ) * add the array_distinct function * add the support for decimal and update variable names * add docs and regression test for array_distinct function Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-07-12 17:02:42 +08:00
Compilation Success	2084d8bdf3	[feature-wip](unique-key-merge-on-write) Add delete bitmap for DSIP-018 (#10548 ) Add delete bitmap for DSIP-018: Support Merge-On-Write implementation for UNIQUE KEY data model	2022-07-12 16:34:42 +08:00
Gabriel	c51badb1ae	[feature-wip](datev2) add FE functions and fix some bugs (#10767 )	2022-07-11 19:25:31 +08:00
zhannngchen	feeef7e4da	[feature-wip](unique-key-merge-on-write) add interface for segment key bounds, DSIP-018[3/2] (#10655 ) Add interfaces for segment key bounds. Key bounds will be used to speed up point lookups on the primary key index of each segment. For details, see DSIP-018: https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model KeyBounds are updated by BetaRowsetWriter and will be used to construct a RowsetTree (based on an IntervalTree, to be added in the next patch).	2022-07-08 21:39:13 +08:00
Pxl	f58a071605	[Bug][Function] pass intermediate argument list to be (#10650 )	2022-07-08 20:50:05 +08:00
plat1ko	331fa50501	[feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280 ) This PR supports rowset-level data upload on the BE side, so that a tablet can contain both cold data and hot data, and there is no need to prohibit loading new data into cooled tablets. Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without being aware of the underlying filesystem. The abstracted `RemoteFileSystem` can try local caching strategies with different granularities, instead of caching segment files as before. To avoid conflicts with the code in be/src/io, we temporarily put the file-system-related code in the be/src/io/fs directory. In the future, `FileReader`s and `FileWriter`s should be unified.	2022-07-08 12:18:39 +08:00
jakevin	3ce9e7cfca	[enhance](planner): remove redundant field in sort (#10624 ) SortInfo is in SortNode, but SortNode also duplicates some of its fields. Issue Number: close #10616 Remove the redundant fields in `TSortNode` that already exist in `TSortInfo`. [API-BREAK] This changes the `Thrift` file.	2022-07-07 22:32:07 +08:00
Pxl	6d092a6d53	set strleft to always_nullable (#10496 )	2022-07-06 17:56:01 +08:00
Jibing-Li	73ba806046	[feature-wip](multi-catalog) Add catalog to information_schema table "columns". (#10592 )	2022-07-05 09:57:19 +08:00
yiguolei	aab7dc956f	[refactor](load) Remove mini load (#10520 )	2022-06-30 23:21:41 +08:00
camby	ec6620ae3e	[feature-wip](array-type) add function arrays_overlap (#10233 )	2022-06-30 08:12:29 +08:00
yiguolei	4ec6e3ee81	[refactor] Remove debug action since it is never used. (#10484 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-29 20:37:51 +08:00
huangzhaowei	abd10f0f3e	[feature-wip](multi-catalog) Impl FileScanNode in be (#10402 ) Define a new file scanner node for hms tables in BE. This file scanner node differs from the broker scan node as follows: 1. The broker scan node defines a src slot and a dest slot, so there are two memory copies: first from the file to the src slot, and second from the src slot to the dest slot. In contrast, FileScanNode has only one memory copy, directly from the file to the dest slot. 2. The broker scan node reads all the fields in the file into the src slot, whereas FileScanNode reads only the needed fields. 3. The broker scan node converts values to string type for the src slot and then uses cast to convert them to the dest slot type, whereas FileScanNode produces the final type directly. For now FileScanNode is standalone code, but file scan and broker scan will be unified in the future.	2022-06-29 11:04:01 +08:00
Tiewei Fang	17eb8c00d3	[feature] add table valued function framework and numbers table valued function (#10214 )	2022-06-28 14:01:57 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00
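Commit f9b151744d (#10694) describes answering `ORDER BY ... LIMIT N` by reading at most N rows from each segment and merging them, when the ORDER BY columns are a prefix of the table's sort key. The Python sketch below illustrates that idea under simplifying assumptions: each segment is modeled as a list of `(key, payload)` tuples already sorted ascending by key, and `segment_top_n` and `top_n` are hypothetical names, not Doris APIs. The real BE reader works on column blocks and merges with VCollectIterator/VMergeIteratorContext.

```python
import heapq
from itertools import islice


def segment_top_n(segment, lo, hi, n, descending):
    """Return up to n rows of one segment with lo < key < hi, in query order.

    `segment` is a hypothetical list of (key, payload) tuples sorted
    ascending by key, standing in for a segment sorted by the table's sort key.
    """
    # For ORDER BY ... DESC, walk the segment backward instead of re-sorting it.
    rows = reversed(segment) if descending else iter(segment)
    matching = (row for row in rows if lo < row[0] < hi)
    # The segment is already ordered, so its first n matches are its top n.
    return list(islice(matching, n))


def top_n(segments, lo, hi, n, descending=True):
    """Global top-n across segments: per-segment top-n runs, then a k-way merge."""
    runs = [segment_top_n(seg, lo, hi, n, descending) for seg in segments]
    # heapq.merge expects every run sorted in the direction given by `reverse`.
    merged = heapq.merge(*runs, key=lambda row: row[0], reverse=descending)
    return list(islice(merged, n))


if __name__ == "__main__":
    seg_a = [(1, "a1"), (4, "a4"), (9, "a9")]
    seg_b = [(2, "b2"), (5, "b5"), (7, "b7")]
    # Roughly: SELECT * ... WHERE t > 1 AND t < 9 ORDER BY t DESC LIMIT 3
    print(top_n([seg_a, seg_b], lo=1, hi=9, n=3))  # [(7, 'b7'), (5, 'b5'), (4, 'a4')]
```

Capping each segment at N rows bounds the merge input at N times the number of segments, no matter how many rows fall between t1 and t2.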
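Commit feeef7e4da (#10655) adds per-segment key bounds so that a point lookup can skip segments whose key range cannot contain the key. A minimal Python sketch of that pruning follows, assuming each segment is reduced to a sorted list of primary keys and that a binary search stands in for the segment's primary key index; `SegmentSketch`, `segments_to_probe` and `lookup` are hypothetical names. The sketch checks every segment's bounds linearly; according to the commit, the real bounds will feed a RowsetTree built on an IntervalTree.

```python
from bisect import bisect_left


class SegmentSketch:
    """Hypothetical stand-in for a segment: sorted primary keys plus key bounds."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("a segment must contain at least one key")
        self.keys = sorted(keys)
        # Key bounds: the smallest and largest primary key in the segment.
        self.min_key = self.keys[0]
        self.max_key = self.keys[-1]


def segments_to_probe(segments, key):
    """Indexes of segments whose [min_key, max_key] range could contain key."""
    return [i for i, seg in enumerate(segments) if seg.min_key <= key <= seg.max_key]


def lookup(segments, key):
    """Return the index of the first probed segment holding key, or None."""
    for i in segments_to_probe(segments, key):
        keys = segments[i].keys
        # Binary search stands in for probing the segment's primary key index.
        pos = bisect_left(keys, key)
        if pos < len(keys) and keys[pos] == key:
            return i
    return None


if __name__ == "__main__":
    segs = [SegmentSketch([1, 3, 5]), SegmentSketch([10, 12]), SegmentSketch([4, 6, 8])]
    print(segments_to_probe(segs, 5))  # [0, 2]: segment 1 is skipped by its bounds
    print(lookup(segs, 6))  # 2
    print(lookup(segs, 7))  # None
```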


480 Commits