doris

Author	SHA1	Message	Date
jakevin	0dfdbe4508	[feature](Nereids): InnerJoinLeftAssociate, InnerJoinRightAssociate and JoinExchange. (#14051 )	2022-11-10 12:21:06 +08:00
Mingyu Chen	8c5c6d9d7f	[fix](ctas) fix wrong string column length after executing ctas from external table (#14090 )	2022-11-10 11:36:56 +08:00
minghong	17867e446f	[feature](nereids) let user define right deep tree penalty by session variable (#14040 ) It is hard for us to find a proper factor for all queries. The default is 0.7.	2022-11-10 11:25:02 +08:00
shee	57225d69f3	[Fix] add hll param for if function (#12366 ) * [Fix] add hll param for if function * add ut Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>	2022-11-10 11:20:58 +08:00
starocean999	84b969a25c	[fix](grouping)the grouping expr should check col name from base table first, then alias (#14077 ) * [fix](grouping)the grouping expr should check col name from base table first, then alias * fix fe ut, the behavior would be same as mysql	2022-11-10 11:10:42 +08:00
minghong	994d563f52	[fix](nereids) cannot collect decimal column stats (#13961 ) When executing ANALYZE TABLE, Doris fails on decimal columns. The root cause is that the scale in decimalV2 is 9, but 2 in the schema. There is no need to check the scale for decimalV2, since it is not a floating-point type.	2022-11-10 11:06:38 +08:00
Gabriel	184cee2d2b	[Bug](outfile) Fix wrong decimal format for ORC (#14124 )	2022-11-10 11:01:30 +08:00
Tiewei Fang	43eb946543	[feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130 S3 table valued function supports parquet/orc/json file format. For example: parquet format	2022-11-10 10:33:12 +08:00
Jerry Hu	10df61b5bf	[improvement](join) Share hash table in fragments for broadcast join (#13921 )	2022-11-10 09:48:34 +08:00
zhangstar333	df622d8b7d	[Bug](udf) fix java-udaf process string type error and add some tests (#14106 )	2022-11-10 09:30:57 +08:00
Liqf	55cae6202f	[typo](docs)add udf doc and optimize udf regression test (#14000 )	2022-11-10 09:24:45 +08:00
Xin Liao	3690c4dbe7	[fix](load) fix that load channel failed to be released in time (#14119 )	2022-11-09 22:38:08 +08:00
Pxl	794a551b0f	[Enhancement][fix](profile)() modify some profiles (#14074 ) 1. add RemainedDownPredicates 2. fix core dump when _scan_ranges is empty 3. fix invalid memory access on vLiteral's debug_string() 4. enlarge mv test wait time	2022-11-09 21:59:28 +08:00
camby	322ac5cf89	[refractor](array) refractor DataTypeArray from_string (#13905 ) Refactor DataTypeArray from_string to make it clearer; support ',' and ']' inside string elements, for example: ['hello,,,', 'world][]']; support empty elements, such as [,] ==> [0,0]. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-09 16:58:08 +08:00
mch_ucchi	3117ac9289	[enhancement](Nereids) use post-order to generate runtime filter in RuntimeFilterGenerator (#13949 ) Change the runtime filter generator from pre-order to post-order. This may change the number of generated runtime filters, and the ut will be corrected.	2022-11-09 14:28:49 +08:00
Tiewei Fang	b74d0a4747	[feature](table-valued-function) Support `desc from s3()` and modify the syntax of tvf (#14047 ) This PR does two things: (1) support desc function s3(); (2) modify the syntax of tvf.	2022-11-09 14:12:43 +08:00
camby	f912d4e392	[fix](compile) fix compile error #14103 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-09 14:10:06 +08:00
WenYao	e692636b4f	[performance-wip] (vectorization) Opt HashJoin Performance (#12390 )	2022-11-09 14:07:49 +08:00
morrySnow	84bb82acc0	[fix](Nereids) aggregate disassemble generate error output list on GLOBAL phase aggregate (#14079 ) we must use localAggregateFunction as key of globalOutputSMap, because we use local output exprs to generate global output in disassembleDistinct	2022-11-09 13:43:12 +08:00
jakevin	b144d2b4f4	[improve](Nereids): remove redundant code, add annotation in Memo. (#14083 )	2022-11-09 13:39:20 +08:00
morrySnow	aff62655c4	[feature](Nereids) binding slot in order by that not show in project (#14042 ) 1. binding slot in order by that not show in project, such as: SELECT c1 FROM t WHERE c2 > 0 ORDER BY c3 2. not check unbound when bind slot reference. Instead, do it in analysis check.	2022-11-09 13:25:41 +08:00
carlvinhust2012	7362460525	[docs](array-type) update the docs to specify how to use array function when import data (#13995 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-11-09 12:21:26 +08:00
Gabriel	a3c5fa8c01	[Compile](join) Boost compiling and linking (#14081 )	2022-11-09 11:27:46 +08:00
ChPi	55ca810445	[fix](Vectorized)fix json_object and json_array function return wrong result on vectorized engine (#13775 ) Issue Number: close #13598	2022-11-09 11:26:55 +08:00
Kang	aec214b4b0	[bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061 ) * call set_decimalv2_type when cloning ColumnDecimal * clang format	2022-11-09 11:23:43 +08:00
xueweizhang	572f491756	[fix](ctas) text column type len = 1 when create table as select (#13906 ) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2022-11-09 09:09:34 +08:00
Adonis Ling	291fa499e9	[fix](JSON) Fail to parse JSONPath (libc++) (#13941 )	2022-11-09 08:58:01 +08:00
Liqf	287c3893b9	[typo](docs)update array type doc #14057	2022-11-09 08:40:38 +08:00
zhengyu	6a1c7fac9d	[enhancement](load) shrink reserved buffer for page builder (#14012 ) (#14014 ) * [enhancement](load) shrink reserved buffer for page builder (#14012) For a table with hundreds of text-type columns, flushing its memtable may consume a huge amount of memory. This memory is consumed when initializing the page builder, which reserves 1MB for each column, so memory consumption grows in proportion to the number of columns. Shrinking the reservation may reduce memory usage substantially during load. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> * response to the review Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> * Update binary_plain_page.h * Update binary_dict_page.cpp * Update binary_plain_page.h Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-09 08:40:07 +08:00
xueweizhang	a0f136a0bc	[docs](odbc) fix docs for sqlserver odbc table (#14017 ) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2022-11-09 08:39:39 +08:00
Mingyu Chen	cd8f0713ea	[refactor](new-scan) remove old vectorized scan node (#14029 )	2022-11-09 08:39:20 +08:00
HappenLee	75b6b267ea	[opt](ssb) Add query hint for the SSB queries (#14089 )	2022-11-09 08:37:31 +08:00
Kang	151842a1fe	[feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430 ) Introduce a SQL syntax for creating inverted index and related metadata changes. ``` -- create table with INVERTED index CREATE TABLE httplogs ( ts datetime, clientip varchar(20), request string, status smallint, size int, INDEX idx_size (size) USING INVERTED, INDEX idx_status (status) USING INVERTED, INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none") ) DUPLICATE KEY(ts) DISTRIBUTED BY RANDOM BUCKETS 10 -- add an INVERTED index to a table CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english"); ```	2022-11-08 23:46:53 +08:00
Tiewei Fang	826cfdaf93	[feature](information_schema) add `backends` information_schema table (#13086 )	2022-11-08 22:15:10 +08:00
Pxl	ae3c513d74	use extern template to date_time_add (#13970 )	2022-11-08 22:11:41 +08:00
luozenglin	115c6bd411	[fix](keyranges) fix the split error of keyranges (#14049 )	2022-11-08 22:09:16 +08:00
shee	3f3f2eb098	[Nereids][Improve] infer predicate after push down predicate (#12996 ) This PR implements predicate inference. For example: ``` sql select * from student left join score on student.id = score.sid where score.sid > 1 ``` Transformed logical plan tree: a left join with two children, filter(sid > 1) over a scan and filter(id > 1) over a scan, where filter(id > 1) is the inferred predicate. See `InferPredicatesTest` for more cases. The logic is as follows: 1. Pull up the bottom predicates, then infer additional predicates. For example, starting from: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id (a) pull up the bottom predicate: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 (b) infer: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1 Finally transformed sql: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1 2. Put these predicates into `otherJoinConjuncts`; they are processed in the next round of predicate push-down. Now only inferring `ComparisonPredicate` is supported. TODO: We should determine whether `expression` satisfies the condition for replacement eg: Satisfy `expression` is non-deterministic	2022-11-08 21:36:17 +08:00
Mingyu Chen	b6f91b6eff	[improvement](profile) support ordinary user to get query profile via http api (#14016 )	2022-11-08 20:39:01 +08:00
Kikyou1997	ecfdf0320d	[fix](statistics) ColumnStatistics was changed unexpectedly when show stats (#14068 ) The logic of show stats would change the internal collected ColumnStat unexpectedly which would cause inaccurate cost and inefficient plan	2022-11-08 20:26:37 +08:00
Yongqiang YANG	a58ac48a6e	[chore](bin) do not set heap limit for tcmalloc until doris does not allocates large unused memory (#13761 ) We set a heap limit for tcmalloc to avoid OOM introduced by tcmalloc, which allocates memory for its cache even when the machine has little free memory. However, Doris allocates large amounts of unused memory in some cases, so tcmalloc would throw an OOM exception even when there is a lot of free memory on the machine. We can set the limit again after we fix the problem.	2022-11-08 19:26:30 +08:00
minghong	cdc635610b	[enhancement](Nereids) tpch q21 anti and semi join reorder (#14037 ) The estimation of anti and semi joins needs rework; for now we just let tpch q21 pass.	2022-11-08 17:21:50 +08:00
morrySnow	54c07f8782	[regression](Nereids) add back tpch regression test cases (#13826 ) 1. add back TPC-H regression test cases 2. fix decimal problem on aggregate function sum and agg introduced by #13764 3. fix memo merge group NPE introduced by #13900	2022-11-08 16:40:46 +08:00
Pxl	df89e46761	[fix](build) fix compile fail on Segment::open (#14058 )	2022-11-08 14:38:40 +08:00
zhangstar333	f7ecb6d79f	[Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978 )	2022-11-08 14:10:12 +08:00
Mingyu Chen	1c07a01038	[feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994 ) 1. Support Aliyun DLF. 2. Support data on s3-compatible object storage, such as aliyun oss. 3. Refactor some interfaces of catalog to make them tidier. 4. Fix the bug that the default text format field delimiter of hive should be \x01. 5. Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.	2022-11-08 14:02:41 +08:00
谢健	61d4974ba1	[fix](Nereids) Use simple cost to calculate benefit and avoid unuseless calculation (#14056 ) In GraphSimplifier, we can use simple cost to calculate the benefit. And only when the best neighbor of the apply step is the processing edge, we need to update recursively.	2022-11-08 13:11:38 +08:00
slothever	c2a01e84b4	[feature-wip](multi-catalog) fix page index filter bug (#14015 ) Fix the page index filter not taking effect with multiple columns. Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-11-08 12:10:12 +08:00
HappenLee	63ea233ae2	[thirdpart](lib) Add lock free queue of concurrentqueue (#14045 )	2022-11-08 11:34:23 +08:00
morrySnow	e6b12ce8e8	[feature](Nereids) support query that group by use alias generated in aggregate output (#14030 ) support query having alias in group by list, such as: SELECT c1 AS a, SUM(c2) FROM t GROUP BY a;	2022-11-08 11:02:42 +08:00
Pxl	9d8b4bc176	[Enhancement](Dictionary-codec) update dict once on same segment (#13936 )	2022-11-08 10:59:35 +08:00

7095 Commits