doris

Author	SHA1	Message	Date
Xin Liao	fb140d0180	[Enhancement](sequence-column) optimize the use of sequence column (#13872 ) When you create the Uniq table, you can specify the mapping of sequence column to other columns. You no longer need to specify mapping column when importing.	2022-11-17 22:39:09 +08:00
spaces-x	1a035e2073	[fix](profile)(AggNode) fix the GetResultsTime is always zero (#14366 ) add scoped_timer in _serialize_with_serialized_key_result	2022-11-17 22:30:21 +08:00
Gabriel	50bfd99b59	[feature](join) support nested loop semi/anti join (#14227 )	2022-11-17 22:20:08 +08:00
HappenLee	d5af4f6558	[Nereids](Profile) Add projection timer for Nereids (#14286 )	2022-11-17 22:17:55 +08:00
Mingyu Chen	8fe5211df4	[improvement](multi-catalog)(cache) invalidate catalog cache when refresh (#14342 ) Invalidate catalog/db/table cache when doing refresh catalog/db/table. Tested table with 10000 partitions. The refresh operation will cost about 10-20 ms.	2022-11-17 20:47:46 +08:00
Jibing-Li	ccf4db394c	[feature-wip](multi-catalog) Collect external table statistics (#14160 ) Collect HMS external table statistic information through external metadata. Insert the result into __internal_schema.column_statistics using insert into SQL.	2022-11-17 20:41:09 +08:00
Ashin Gau	44ee4386f7	[test](multi-catalog)Regression test for external hive orc table (#13762 ) Add regression test for external hive orc table. This PR generates data for all basic types supported by hive orc, and creates a hive external table that reads them in a docker environment. Functions to be tested: 1. Ensure that all types are parsed correctly 2. Ensure that the null map of every type is parsed correctly 3. Ensure that the `SearchArgument` of `OrcReader` works well 4. Only select partition columns	2022-11-17 20:36:02 +08:00
Kikyou1997	98956dfa19	[fix](statistics) statistics inaccurate after analyze same table more than once (#14279 ) If a table has already been analyzed and is analyzed again, the new statistics are larger than expected: the incremental result also contains the values from the table-level statistics, because the SQL lacks a predicate on the nullability of part_id	2022-11-17 20:18:14 +08:00
TengJianPing	a382bb95e7	[fix](runtimefilter) fix heap-use-after-free of runtime filter merge (#14362 )	2022-11-17 19:38:45 +08:00
yiguolei	dba19e591c	[cherry-pick](scanner) using avg rowset to calculate batch size instead of using total_bytes since it costs a lot of cpu (#14345 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-17 18:57:21 +08:00
slothever	6da2948283	[feature-wip](multi-catalog) support iceberg v2(step 1) (#13867 ) Support position delete(part of).	2022-11-17 17:56:48 +08:00
Yongqiang YANG	a9e53e5c86	[improvement](test) add conf for pipeline (#14254 ) Add the conf used by the pipeline to git, so that pipeline conf changes go through a PR like a code commit.	2022-11-17 16:05:24 +08:00
morrySnow	af462b07c7	[enhancement](explain) compress descriptor table explain string (#14152 ) 1. compress slot descriptor explain string to one row 2. remove unmaterialized tuple descriptor and slot descriptor before this PR descriptor table explain string is like this: ``` TupleDescriptor{id=0, tbl=lineitem, byteSize=176, materialized=true} SlotDescriptor{id=0, col=l_shipdate, type=DATEV2} parent=0 materialized=true byteSize=4 byteOffset=0 nullIndicatorByte=0 nullIndicatorBit=-1 nullable=false slotIdx=0 SlotDescriptor{id=1, col=l_orderkey, type=BIGINT} parent=0 materialized=true byteSize=8 byteOffset=24 nullIndicatorByte=0 nullIndicatorBit=-1 nullable=false slotIdx=6 ``` after this PR descriptor table explain string is like this: ``` TupleDescriptor{id=2, tbl=lineitem} SlotDescriptor{id=1, col=l_extendedprice, type=DECIMAL(15,2), nullable=false} SlotDescriptor{id=2, col=l_discount, type=DECIMAL(15,2), nullable=false} ```	2022-11-17 15:19:17 +08:00
jiafeng.zhang	a4d4fc8c02	datax doris writer doc fix (#14344 )	2022-11-17 13:08:32 +08:00
minghong	afc9065b51	[test](nereids) add filter estimation ut cases (#14293 ) fix a bug for filter estimation, in pattern of A>10 and A<20.	2022-11-17 11:01:30 +08:00
jiafeng.zhang	0bf6d1fd79	[typo](doc)Datax doris writer doc update (#14328 )	2022-11-17 08:53:55 +08:00
Mingyu Chen	7182f14645	[improvement][fix](multi-catalog) speed up list partition prune (#14268 ) In the previous implementation, list partition pruning had to generate `rangeToId` every time it ran. But `rangeToId` is actually static data that should be created once and used everywhere. So for hive partitions, I create `rangeToId` and all other data structures needed for partition pruning in the partition cache, so that they can be used directly. In my test, the cost of partition pruning for 10000 partitions dropped from 8s to 0.2s. Also add "partition" info to the explain string for hive tables. ``` \| 0:VEXTERNAL_FILE_SCAN_NODE \| \| predicates: `nation` = '0024c95b' \| \| inputSplitNum=1, totalFileSize=4750, scanRanges=1 \| \| partition=1/10000 \| \| numNodes=1 \| \| limit: 10 \| ``` Bug fix: 1. Fix bug that es scan node can not filter data 2. Fix bug that querying es with a predicate like `where substring(test2,2) = "ext2";` fails at the planner phase. `Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef` TODO: 1. A problem when querying es version 8: ` Unexpected exception: Index: 0, Size: 0`, will be fixed later.	2022-11-17 08:30:03 +08:00
xu tao	3259fcb790	[typo](docs) fix docs kafka-load.md (#14313 )	2022-11-16 23:17:30 +08:00
wxy	943e014414	[enhancement](decommission) speed up decommission process (#14028 ) (#14006 )	2022-11-16 20:43:07 +08:00
morrySnow	47a6373e0a	[feature](Nereids) support datev2 and datetimev2 type (#14263 ) 1. split DateLiteral and DateTimeLiteral into V1 and V2 2. add a type coercion about DateLikeType: DateTimeV2Type > DateTimeType > DateV2Type > DateType 3. add a rule to remove unnecessary CAST on DateLikeType in ComparisonPredicate	2022-11-16 15:51:48 +08:00
Gabriel	6881989dd9	[Bug](jvm memory) Support multiple java version to get max heap size (#14295 ) `sun.misc.VM.maxDirectMemory` is used in JDK1.8 only. This PR adds the interface for JDK11.	2022-11-16 10:58:58 +08:00
Ashin Gau	20634ab7e3	[feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264 ) PR https://github.com/apache/doris/pull/13917 added lazy read for non-predicate columns in ParquetReader, but lazy read is not triggered when the predicate columns are partition or missing columns. This PR supports that case, and fills partition and missing columns in `FileReader`.	2022-11-16 08:43:11 +08:00
yuanyuan8983	442b844b22	[regressiontest](delete)delete-where-in-test (#14036 ) * delete-where-in-test * Update test_delete_where_in.groovy * Update test_delete_where_in.groovy	2022-11-15 18:35:31 +08:00
camby	3ea9d3f2e1	[enhancement](array) support read list(Array) type from orc file (#14132 ) Before this PR, loading an ORC file with native list (or array) type data crashed the BE. Because a complex type in an ORC file spans multiple real columns, we need to filter columns by column name; otherwise we cannot read all the columns we need. Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to support creating a stripe reader by column names. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-15 17:48:17 +08:00
yixiutt	9d70c531a3	[improvement](publish) fix publish timeout in concurrent load (#14231 ) In concurrent load, publish timeouts happen occasionally. They are caused by the meta lock being held by another thread, so adding the incremental rowset during publish hangs for several seconds. StorageEngine::start_delete_unused_rowset holds gc_mutex for a long time, so add_unused_rowset waits on that lock; compaction's modify_rowset and other tablet methods hold meta_lock while calling add_unused_rowset, which keeps meta_lock occupied for too long and finally makes publish time out. This PR copies unused_rowsets under the lock and deletes those rowsets without the lock, which makes gc_mutex more lightweight so the publish thread can acquire meta_lock immediately. In my test, no publish timeout occurred in concurrent stream load.	2022-11-15 16:39:38 +08:00
zhangstar333	70cc725649	[Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209 ) * [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions * update add to stringRef	2022-11-15 16:38:38 +08:00
huangzhaowei	5badd70db2	[fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196 )	2022-11-15 16:06:59 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
Adonis Ling	333c6390ee	[fix](be-ut) AddressSanitizer detects container-overflow issues (#14255 ) * [chore] Fix the container-overflow errors detected by address sanitizer * Fix compilation errors	2022-11-15 15:49:55 +08:00
camby	a45685d028	[fix](regression) concurrent regression cases may fail #14271 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-15 15:46:34 +08:00
Pxl	e298696baf	[Chore](env) add error information when DORIS_GCC_HOME not set well (#14249 )	2022-11-15 15:45:35 +08:00
yiguolei	87544a017f	[fuzztest](fe session variable) add fuzzy test config for fe session variables. (#14272 ) Many features controlled by fe session variables are disabled by default, so they are not actually covered by the github workflow tests. I add a fuzzy test config in fe.conf. If it is set to true, fuzzy session variables are used for every connection, so that every feature developer can set fuzzy values for their config. Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-15 15:43:21 +08:00
abmdocrt	f86886f8f5	[Feature](function) Support array_compact function (#14141 )	2022-11-15 14:24:37 +08:00
Yongqiang YANG	5ae046b208	[bugfix](log) fix wrong print introduced by 49fecd2a6dae #14266	2022-11-15 11:39:05 +08:00
Kikyou1997	a3062c662c	[feature-wip](statistics) support statistics injection and show statistics (#14201 ) 1. Reduce the configuration options for the statistics framework, and add comments for the remaining ones. 2. Move the logic that creates analysis jobs to `StatisticsRepository`, which defines all the functions used to interact with the internal statistics table 3. Move AnalysisJobScheduler to the statistics package 4. Support displaying and manually injecting statistics	2022-11-15 11:29:51 +08:00
abmdocrt	6cc5ae077e	[Improvement](Sequence function) Capitalize const variables (#14270 )	2022-11-15 10:41:53 +08:00
huangzhaowei	89db3fee00	[feature-wip](MTMV)Add show statement for MTMV (#13786 ) Use Case mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql > CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk; mysql> SHOW MTMV JOB; mysql> SHOW MTMV TASK;	2022-11-15 10:32:47 +08:00
Gabriel	215a4c6e02	[Bug](BHJ) Fix wrong result when use broadcast hash join for naaj (#14253 )	2022-11-15 09:40:00 +08:00
Xinyi Zou	cffdeff4ec	[fix](memory) Fix memory leak by calling boost::stacktrace (#14269 ) boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace. The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace. refer to: boostorg/stacktrace#118 boostorg/stacktrace#111	2022-11-15 08:58:57 +08:00
zhangstar333	93e5d8e660	[Vectorized](function) support bitmap_from_array function (#14259 )	2022-11-15 01:55:51 +08:00
ccoffline	37fdd011b4	[fix](fe-metric) Prometheus read format error #13831 (#13832 ) Co-authored-by: 迟成 <chicheng@meituan.com>	2022-11-14 22:07:00 +08:00
minghong	b0ff852d74	[opt](Nereids) right deep tree penalty adjust: use right rowCount, not abs(left - right) (#14239 ) In the original algorithm, the penalty is abs(leftRowCount - RightRowCount). This lets some right deep trees escape the penalty, because the subtraction is almost zero. Using RightRowCount as the penalty avoids this escape.	2022-11-14 16:40:26 +08:00
minghong	bea66e6a12	[fix](nereids) cannot generate RF on colocate join and prune useful RF in RF prune (#14234 ) 1. when we translate colocated join, we lost RF information attached to the right child, and hence BE will not generate those RFs. 2. when a RF is useless, we prune all RFs on the scan node by mistake	2022-11-14 16:36:55 +08:00
minghong	8dd2f8b349	[enhancement](nereids) set Ndv=rowCount if ndv is almost equal to rowCount on ColumnStatisitics load (#14238 )	2022-11-14 16:30:35 +08:00
minghong	bdf7d2779a	[fix](Nereids) aggregate always report has 1 row count (#14236 ) The data structure of the new stats has changed, but Agg estimation was not updated accordingly	2022-11-14 16:27:55 +08:00
minghong	47326f951d	[fix](nereids) count(*) reports npe when do filter selectivity estimation (#14235 )	2022-11-14 16:11:08 +08:00
minghong	cf5e2a2eb6	[fix](nereids) new statistics use wrong default selectivity (#14233 ) by default, column selectivity MUST be 1.0, not ZERO	2022-11-14 16:09:17 +08:00
Ashin Gau	fc70179acb	[multi-catalog](fix) the eof of lazy read columns may be not equal to the eof of predicate columns (#14212 ) Fix three bugs: 1. The EOF of lazy read columns may be not equal to the EOF of predicate columns. (for example: If the predicate column has 3 pages, with 400 rows for each, but the last page is filtered by page index. When batch_size=992, the EOF of predicate column is true. However, we should set batch_size=800 for lazy read column, so the EOF of lazy read column may be false.) 2. The array column does not count the number of nulls 3. Generate wrong NullMap for array column	2022-11-14 14:37:21 +08:00
Mingyu Chen	7eed5a292c	[feature-wip](multi-catalog) Support hive partition cache (#14134 )	2022-11-14 14:12:40 +08:00
Jibing-Li	30f36070b5	[test](multi-catalog)Regression test for external hive parquet table (#13611 )	2022-11-14 14:10:10 +08:00
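
The locking change described in commit 9d70c531a3 (copy the unused rowsets while holding the lock, then delete them with the lock released) can be illustrated with a minimal Python sketch. This is not the Doris code: the class `UnusedRowsetGC`, the `delete_fn` callback, and the dictionary layout are assumptions made for illustration, and only the method names echo the commit message.

```python
import threading


class UnusedRowsetGC:
    """Toy model of a collector for unused rowsets (hypothetical, not Doris code)."""

    def __init__(self, delete_fn):
        self._gc_mutex = threading.Lock()
        self._unused = {}            # rowset_id -> rowset payload
        self._delete_fn = delete_fn  # slow operation, e.g. removing files

    def add_unused_rowset(self, rowset_id, rowset):
        # Callers may already hold other locks, so keep this section short.
        with self._gc_mutex:
            self._unused[rowset_id] = rowset

    def start_delete_unused_rowset(self):
        # Step 1: swap out the pending map while holding the lock (cheap).
        with self._gc_mutex:
            to_delete, self._unused = self._unused, {}
        # Step 2: run the slow deletions with the lock released, so
        # add_unused_rowset is never blocked behind them.
        for rowset_id, rowset in to_delete.items():
            self._delete_fn(rowset_id, rowset)
        return len(to_delete)
```

Because the slow work runs outside the critical section, a thread calling `add_unused_rowset` only ever waits for a dictionary swap or insert, which is the effect the commit message describes for `gc_mutex`.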


7207 Commits