doris

Author	SHA1	Message	Date
Kikyou1997	e656dae3f0	[fix](fe) fix leaks of connect context (#14529 ) Remove the ConnectContext built for internal statistics from the thread-local to avoid memory leaks	2022-11-24 13:26:59 +08:00
starocean999	ae4f4b9bf1	[fix](agg)having clause should use column name first then alias (#14408 ) * [fix](agg)having clause should use column name first then alias * fix fe ut	2022-11-24 10:31:58 +08:00
Mingyu Chen	6ccdaf0aaf	[fix](storage-policy) use Long instead of Date to persist cooldowntime in storage policy (#14532 ) Previously, we used the "Date" type for cooldownTime in StoragePolicy. But the serialization method of the Date type in Gson differs between java8 and java11, which may cause an inconsistent meta error. This PR uses Long to save cooldownTime. Note that in FE the cooldownTime is saved in milliseconds, and in BE it is saved in seconds.	2022-11-24 08:32:21 +08:00
Gabriel	496a92b668	[JavaUDF](loader) Fix compatible problem for JAVA 11 (#14519 )	2022-11-23 23:36:39 +08:00
Jibing-Li	404cac42f9	[fix](multi catalog)Fix external table partition name and type inconsistent bug. (#14522 ) The original code used a Set to store hms external table partition columns, which does not guarantee the order of the columns. This could cause column names and column types to mismatch. Use a List instead of a Set to fix the problem.	2022-11-23 21:40:44 +08:00
morrySnow	8d5eabb64f	[enhancement](Nereids) reduce CostAndEnforcerJob call times (#14442 ) Record the pruned plan's cost to avoid optimizing the same GroupExpression more than once.	2022-11-23 16:57:41 +08:00
谢健	45975dd321	[enhancement](Nereids): Change circle detector for better performance (#14438 )	2022-11-23 14:31:14 +08:00
minghong	7a7e714fce	[fix](nereids) width and penalty not derive when do stats derive (#14474 ) A previous PR (#13883) refactored the stats derive code but missed width and penalty.	2022-11-23 14:26:51 +08:00
minghong	fb385dcf23	[opt](nereids) make fragment id in explain get inline with profile (#14421 ) Nereids assigns fragment IDs in its own way, so the fragment ID in explain differs from the fragment ID in profile. This difference makes the profile hard to understand. This PR prints the same fragment ID in explain as in profile.	2022-11-23 14:14:20 +08:00
xueweizhang	7955e52b3e	[fix](version) fix recover bug for lower version (#14457 )	2022-11-23 14:05:17 +08:00
xueweizhang	79688c34a1	[feature](catalog) add max num of same name meta information in catalog recycle bin (#14482 )	2022-11-23 14:04:14 +08:00
starocean999	d36b561520	[fix](in)fix in predicate datatype mismatch after union (#14497 )	2022-11-23 09:57:03 +08:00
xueweizhang	2eca51f3ba	[enhancement](broker) broker load support tencent cos (#12801 )	2022-11-22 21:51:15 +08:00
Mingyu Chen	6eeebd47a9	[improvement](doc) add missing documents (#14460 )	2022-11-22 21:42:00 +08:00
Kikyou1997	3360bdf124	[feature-wip](statistics) update cache when analysis job finished (#14370 ) 1. Update cache when analysis job finished 2. Rename `StatisticsStorageInitializer` to `InernalSchemaInitializer`	2022-11-22 21:33:10 +08:00
shee	89c676e597	[Bug] fix bug for grouping set query which where condition is false (#14401 )	2022-11-22 16:03:43 +08:00
yinzhijian	663f7dddcc	[improvement](planner) eliminating useless sort node (#14377 )	2022-11-22 15:13:25 +08:00
shee	730cd1a0c1	[Feature](Nereids) Simplify range of predicate (#14113 ) Simplify range of predicate, for example: 1. `a > 1 or a > 2` => `a > 1` 2. `a in (1,2,3) or a in (3,4,5)` => `a in (1,2,3,4,5)`	2022-11-21 20:24:03 +08:00
ChenJiaHao	91bd76a902	[enhancement](FE) use forEach() to replace stream().forEach() (#14039 )	2022-11-21 15:40:43 +08:00
谢健	a91fe11b4d	[feature](Nereids) Add random test framework (#14388 )	2022-11-21 15:16:03 +08:00
zy-kkk	ce489cf723	[Feature](JDBC)support clickhouse jdbc external table (#14244 )	2022-11-21 10:33:53 +08:00
xueweizhang	a9a6fdd8c3	[fix](insert) fix insert into table which contains column name prefix mv_ (#14361 )	2022-11-21 10:31:01 +08:00
周翱	4976021bf7	[Enhancement] Doris broker support aliyun-oss #13665 (#14305 )	2022-11-21 10:29:14 +08:00
zhannngchen	3e1e8db173	[fix](exec) fix thread token shutdown (#14418 ) Fix the "Thread pool token was shut down" error. When a query has more than one fragment on one BE, the thread token may be reset incorrectly, causing the thread token to shut down early. Cherry-picked from master. Introduced by #13021	2022-11-20 00:04:48 +08:00
Gabriel	2c42f0a905	[refactor](decimalv3) Refine code for DecimalV3 (#14394 )	2022-11-19 16:57:17 +08:00
caiconghui	1f2c06dd6e	[enhancement](rewrite) Remove unused wide common factors to improve scan performance in ExtractCommonFactorsRule (#14381 ) * [enhancemeng](sql) Remove unused wide common factors to improve scan performance in ExtractCommonFactorsRule * fix regression test Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2022-11-19 13:23:49 +08:00
FreeOnePlus	f5f2e84e31	[refactor](planner) remove the limit return rows of order by (#12478 ) Originally, Order By Limit returned a maximum of 65535 rows by default, but many businesses do not want this limit. To get the full result, users had to append a larger LIMIT to the query statement, which is extremely inconvenient, so adjustments have been made. This PR also adds the variable DEFAULT_ORDER_BY_LIMIT to SessionVariable, with a default value of -1. If the user does not use the LIMIT keyword or the LIMIT value is a negative integer, the default maximum number of returned rows is Long.MAX_VALUE. If a maximum query value is set, rows are returned according to that maximum or the value following the LIMIT keyword.	2022-11-19 12:45:44 +08:00
gnehil	1b6e872a8a	[improvement](common) table name length exceeds limit error message (#14368 ) In the table name check, a regex mismatch and a name that exceeds the length limit both display the message "Incorrect table name 'xxx'. Table name regex is 'xxx'". The message does not make clear which kind of error occurred, so this PR separates the two error messages.	2022-11-19 11:36:08 +08:00
924060929	63a2344e68	[Enhancement](Nereids) Refactor AggregateFunction and support explain plan (#14380 ) # Proposed changes - Refactor AggregateFunction 1. AggregateFunction implements ComputeSignature 2. Add a CustomSignature to dynamically compute the signature; we can check input types and compute implicit cast types in the `customSignature` method 3. Add PartialAggType to record some type information before disassembling the aggregate 4. Refine and create a custom catalog function when translating AggregateFunction, without `finalizeForNereids` - Support explain plan 1. explain parsed plan select ... 2. explain analyzed plan select ... 3. explain rewritten/logical plan select ... 4. explain optimized/physical plan select ... 5. explain all plan select ...	2022-11-18 23:40:33 +08:00
minghong	c4bade71c8	[refactor](nereids) remove ColumnStatistics.UNKNOWN from StatsDerive (#14343 ) ColumnStatistics.UNKNOWN can be replaced by ColumnStatistics.DEFAULT	2022-11-18 23:40:00 +08:00
Xin Liao	a82896f420	[fix](broker-load) fix that broker load does not set be exec version and limit node channel memory (#14399 )	2022-11-18 23:38:37 +08:00
xueweizhang	68da6bccb7	[fix](type) fix DECIMAL scale when cast function on fe (#12877 ) before: MySQL [test]> select cast('135.759999999' as DECIMAL(10,3)); +----------------------------------------+ \| CAST('135.759999999' AS DECIMAL(10,3)) \| +----------------------------------------+ \| 135.759999999 \| +----------------------------------------+ 1 row in set (0.00 sec) now: MySQL [stage]> select cast('135.759999999' as DECIMAL(10,3)); +----------------------------------------+ \| CAST('135.759999999' AS DECIMAL(10,3)) \| +----------------------------------------+ \| 135.759 \| +----------------------------------------+ 1 row in set (0.01 sec)	2022-11-18 19:36:14 +08:00
Mingyu Chen	2c4236fd24	[improvement](ctas) use string type for varchar/char/string (#14382 ) When executing a create table as select stmt, the varchar/char/string columns in the created table are unified to string type. When selecting from an external table (mysql/pg, etc), the length of varchar in the external database is counted in "char" length, not "byte" length. So if there is a varchar(10) column in the external table, there will be the same varchar(10) in the created table. But the byte length of the data in the external table may be larger than 10, causing the CTAS to fail. Changing to string will not impact performance or disk storage capacity. Note that if a string type column is the first column, it will be changed to varchar(65535), because we do not allow a string type column as a sort key column.	2022-11-18 14:20:13 +08:00
Tiewei Fang	a1d02f36ac	[feature](table-valued-function) support `hdfs()` tvf (#14213 ) This pr does two things: 1. support `hdfs()` table valued function. 2. add regression test	2022-11-18 14:17:02 +08:00
morrySnow	7952bce03f	[compatibility](Nereids) process escape in string literal (#14294 )	2022-11-18 11:24:00 +08:00
谢健	9e25aa8d3e	[feature](Nereids): Add subgraph enumerator #14291 Add subgraph enumerator to find the best plan For DPHyp, we need an enumerator for all csg-cmp pairs to find the best plan	2022-11-18 10:33:30 +08:00
morrySnow	da0b09caea	[fix](Nereids) DateTimeType migrate to DateType is wrong when hour, minute and second all zero (#14327 ) 1. fix DateTimeType migrate to DateType is wrong when hour, minute and second all zero 2. add TPC-H regression test with DATEV2 type	2022-11-18 01:38:03 +08:00
Xin Liao	fb140d0180	[Enhancement](sequence-column) optimize the use of sequence column (#13872 ) When you create the Uniq table, you can specify the mapping of sequence column to other columns. You no longer need to specify mapping column when importing.	2022-11-17 22:39:09 +08:00
Gabriel	50bfd99b59	[feature](join) support nested loop semi/anti join (#14227 )	2022-11-17 22:20:08 +08:00
Mingyu Chen	8fe5211df4	[improvement](multi-catalog)(cache) invalidate catalog cache when refresh (#14342 ) Invalidate catalog/db/table cache when doing refresh catalog/db/table. Tested table with 10000 partitions. The refresh operation will cost about 10-20 ms.	2022-11-17 20:47:46 +08:00
Jibing-Li	ccf4db394c	[feature-wip](multi-catalog) Collect external table statistics (#14160 ) Collect HMS external table statistic information through external metadata. Insert the result into __internal_schema.column_statistics using insert into SQL.	2022-11-17 20:41:09 +08:00
Ashin Gau	44ee4386f7	[test](multi-catalog)Regression test for external hive orc table (#13762 ) Add regression test for external hive orc table. This PR generates all basic types supported by hive orc and creates a hive external table to exercise them in a docker environment. Functions to be tested: 1. Ensure that all types are parsed correctly 2. Ensure that the null map of all types is parsed correctly 3. Ensure that the `SearchArgument` of `OrcReader` works well 4. Only select partition columns	2022-11-17 20:36:02 +08:00
Kikyou1997	98956dfa19	[fix](statistics) statistics inaccurate after analyze same table more than once (#14279 ) If a table has already been analyzed and is analyzed again, the new statistics would be larger than expected, because the increment would include the values from the table-level statistics: the SQL lacks a predicate on the nullability of part_id	2022-11-17 20:18:14 +08:00
slothever	6da2948283	[feature-wip](multi-catalog) support iceberg v2(step 1) (#13867 ) Support position delete(part of).	2022-11-17 17:56:48 +08:00
morrySnow	af462b07c7	[enhancement](explain) compress descriptor table explain string (#14152 ) 1. compress slot descriptor explain string to one row 2. remove unmaterialized tuple descriptor and slot descriptor. Before this PR, the descriptor table explain string looked like this: ``` TupleDescriptor{id=0, tbl=lineitem, byteSize=176, materialized=true} SlotDescriptor{id=0, col=l_shipdate, type=DATEV2} parent=0 materialized=true byteSize=4 byteOffset=0 nullIndicatorByte=0 nullIndicatorBit=-1 nullable=false slotIdx=0 SlotDescriptor{id=1, col=l_orderkey, type=BIGINT} parent=0 materialized=true byteSize=8 byteOffset=24 nullIndicatorByte=0 nullIndicatorBit=-1 nullable=false slotIdx=6 ``` After this PR, the descriptor table explain string looks like this: ``` TupleDescriptor{id=2, tbl=lineitem} SlotDescriptor{id=1, col=l_extendedprice, type=DECIMAL(15,2), nullable=false} SlotDescriptor{id=2, col=l_discount, type=DECIMAL(15,2), nullable=false} ```	2022-11-17 15:19:17 +08:00
minghong	afc9065b51	[test](nereids) add filter estimation ut cases (#14293 ) Fix a bug in filter estimation for the pattern A>10 and A<20.	2022-11-17 11:01:30 +08:00
Mingyu Chen	7182f14645	[improvement][fix](multi-catalog) speed up list partition prune (#14268 ) In the previous implementation, list partition pruning regenerated `rangeToId` on every prune. But `rangeToId` is actually static data that should be created once and used everywhere. So for hive partitions, I created `rangeToId` and all other data structures needed for partition pruning in the partition cache, so that they can be used directly. In my test, the cost of partition pruning for 10000 partitions dropped from 8s to 0.2s. Also add "partition" info in explain string for hive table. ``` \| 0:VEXTERNAL_FILE_SCAN_NODE \| \| predicates: `nation` = '0024c95b' \| \| inputSplitNum=1, totalFileSize=4750, scanRanges=1 \| \| partition=1/10000 \| \| numNodes=1 \| \| limit: 10 \| ``` Bug fix: 1. Fix bug that es scan node can not filter data 2. Fix bug that query es with predicate like `where substring(test2,2) = "ext2";` will fail at planner phase. `Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef` TODO: 1. Some problem when querying es version 8: ` Unexpected exception: Index: 0, Size: 0`, will be fixed later.	2022-11-17 08:30:03 +08:00
wxy	943e014414	[enhancement](decommission) speed up decommission process (#14028 ) (#14006 )	2022-11-16 20:43:07 +08:00
morrySnow	47a6373e0a	[feature](Nereids) support datev2 and datetimev2 type (#14263 ) 1. split DateLiteral and DateTimeLiteral into V1 and V2 2. add a type coercion about DateLikeType: DateTimeV2Type > DateTimeType > DateV2Type > DateType 3. add a rule to remove unnecessary CAST on DateLikeType in ComparisonPredicate	2022-11-16 15:51:48 +08:00
Gabriel	6881989dd9	[Bug](jvm memory) Support multiple java version to get max heap size (#14295 ) `sun.misc.VM.maxDirectMemory` is used in JDK1.8 only. This PR adds the interface for JDK11.	2022-11-16 10:58:58 +08:00
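
The two predicate rewrites given as examples in commit 730cd1a0c1 (#14113) can be illustrated with a minimal sketch. This is not the Nereids implementation (which is Java code in the FE); the function names below are hypothetical and exist only for illustration, and the sketch assumes every predicate refers to the same column `a`.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of Doris. They model the two rewrites from the commit message.

def simplify_or_greater_than(bounds):
    """`a > b1 OR a > b2 OR ...` is equivalent to `a > min(b1, b2, ...)`.

    A row satisfies the disjunction exactly when it exceeds the
    smallest lower bound, so only that bound needs to be kept.
    """
    return min(bounds)


def simplify_or_in_lists(*value_lists):
    """`a IN (...) OR a IN (...)` is equivalent to one IN over the union."""
    merged = set()
    for values in value_lists:
        merged.update(values)
    # Sort only to make the result deterministic for display.
    return sorted(merged)


if __name__ == "__main__":
    # `a > 1 or a > 2` => `a > 1`
    print("a >", simplify_or_greater_than([1, 2]))
    # `a in (1,2,3) or a in (3,4,5)` => `a in (1,2,3,4,5)`
    print("a in", tuple(simplify_or_in_lists([1, 2, 3], [3, 4, 5])))
```

For conjunctions the direction flips: `a > 1 and a > 2` would keep the larger bound, which is why the sketch handles only the `or` cases named in the commit message.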

3128 Commits