doris

Author	SHA1	Message	Date
ElvinWei	9a916cffe4	[Optimization](String) Optimize the injection of statistics. #18401 1. Remove useless partition statistics injection. 2. Adding judgment logic to avoid exception during numerical transformation.	2023-04-06 11:42:11 +08:00
minghong	f73189860f	[tpch](nereids) trustable join condition (#18272 ) (affects tpch q14/7/9) 1. equation estimation confidence level For equation, if any side is almost unique, its estimation confidence is high, we call it trustable condition. if a join contains more than one un-trustable condition, we only use the one whose selectivity is biggest in order to avoid error propagation. 2. like expression estimation factor: 0.2 give a separate default shrink ratio for like operator, default ratio is 0.2 3. disable fat-child-penalty set HEAVY_OPERATOR_PUNISH_FACTOR=1 this change affect tpch q15. This factor should be adaptive to the implementation of BE.	2023-04-06 11:20:47 +08:00
Pxl	76d76f672c	[Chore](build) enchancement for backend build time usage (#18344 )	2023-04-06 11:13:21 +08:00
slothever	d0219180a9	[feature-wip](multi-catalog)add properties converter (#18005 ) Refactor properties of each cloud , use property converter to convert properties accessing fe metadata and be data. user docs #18287	2023-04-06 09:55:30 +08:00
minghong	60bad33e7e	[fix](nereids) explain shape refactor #18399 previous pr 18296 has a bug when parse SHAPE_PLAN.	2023-04-06 08:55:05 +08:00
abmdocrt	1ec400c786	[fix](SSL) fix ssl connection buffer overflow (#18359 )	2023-04-05 08:42:41 +08:00
Jibing-Li	ea60d65384	[Improvement](multi catalog)Move split size config to session variable (#18355 ) Move split size config to session variable. Before, it was in Config class, user need to restart FE after change it.	2023-04-05 01:02:47 +08:00
gitccl	7f8d92656e	[fix](streamload) fix stream load failed when enable profile (#18364 ) #18015 enables stream load profile log, however be will encounter rpc fail when loading tpch data(see #18291). This is because when `is_report_success` is true, be will reportExecStatus to fe, but fe cannot find QueryInfo in `coordinatorMap`, thus it will return error to be.	2023-04-05 01:01:46 +08:00
xueweizhang	d8b293de07	[fix](multi-catalog) add catalog info for show proc (#18276 ) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-04-04 22:49:22 +08:00
huangzhaowei	7c36bef6bc	[Feature-Wip](MySQL Load)Show load warning for my sql load (#18224 ) 1. Support the show load warnings for mysql load to get the detail error message. 2. Fix fillByteBufferAsync not mark the load as finished in same data load 3. Fix drain data only in client mode.	2023-04-04 22:44:48 +08:00
morrySnow	e29fc3b46b	[fix](chore) fix compile failed in JdbcExecutor and revert #18306 since be crash randomly (#18371 ) fix 2 problems: 1. PR #18187 use the api resizeColumn in JNINativeMethod has been removed by #17960 2. revert PR #18306 to fix pipeline core when load	2023-04-04 20:04:28 +08:00
Ashin Gau	66bfd18601	[opt](file_reader) add prefetch buffer to read csv&json file (#18301 ) Co-authored-by: ByteYue <[yj976240184@gmail.com](mailto:yj976240184@gmail.com)> This PR is an optimization for https://github.com/apache/doris/pull/17478: 1. Change the buffer size of `LineReader` to 4MB to align with the size of prefetch buffer. 2. Lazily prefetch data in the first read to prevent wasted reading. 3. S3 block size is 32MB only, which is too small for a file split. Set 128MB as default file split size. 4. Add `_end_offset` for prefetch buffer to prevent wasted reading. The query performance of reading data on object storage is improved by more than 3x+.	2023-04-04 19:05:22 +08:00
minghong	3fc8c19735	[improve](nereids)compute statsRange.length() according to the column datatype (#18331 ) we map date/datetime/V2 to double. this map reserves date order, but it does not reserve range length. For example, from 1990-01-01 to 1991-01-01, there are 12 months. for filter `A < 1990-02-01`, the selectivity should be `1/12`. if we compute this filter by their corresponding double value, `sel = (19900201 - 19900101) / (19910101 - 19900101) = 100/10000 = 1/100` the error is about 10 times. This pr aims to fix this error. Describe your changes. Solution: convert double to its corresponding dataType(date/datev2), then compute the range length with respect to its datatype.	2023-04-04 14:20:34 +08:00
zhangstar333	54dbb4af67	[vectorzied](jdbc) refactor jdbc table read array type (#18187 ) jdbc read array type get result from Doris is string, PG is java.sql.array, CK is java.lang.object it's difficult to maintain and read the code, so change all database's array result to string, then add a cast function from string to doris array type	2023-04-04 11:57:04 +08:00
yongkang.zhong	6231ca80f7	[improve](clickhouse catalog) Add `"` wrap select column for the sql query clickhouse jdbc (#18352 )	2023-04-04 10:19:24 +08:00
minghong	3e7a9424e4	[feature](nereids) explain shape plan (#18296 ) `explain shape plan select ...` only print plan shape related information, including - node name - join type, join condition - filter condition - agg phase It is painful to maintain regression cases using explain since there are a lot of mutable information, like slot id. By this pr, we could use explain shape plan in regression cases. for exmaple: this is tpch q2 +-----------------------------------------------------------------------------------------------------------+ \| Explain String \| +-----------------------------------------------------------------------------------------------------------+ \| PhysicalTopN \| \| --PhysicalDistribute \| \| ----PhysicalTopN \| \| ------PhysicalProject \| \| --------filter((cast(ps_supplycost as DECIMAL(27, 9)) = min(ps_supplycost) OVER(PARTITION BY p_partkey))) \| \| ----------PhysicalWindow \| \| ------------PhysicalQuickSort \| \| --------------PhysicalProject \| \| ----------------hashJoin[INNER_JOIN](supplier.s_suppkey = partsupp.ps_suppkey) \| \| ------------------PhysicalProject \| \| --------------------hashJoin[INNER_JOIN](part.p_partkey = partsupp.ps_partkey) \| \| ----------------------PhysicalProject \| \| ------------------------PhysicalOlapScan[partsupp] \| \| ----------------------PhysicalProject \| \| ------------------------filter((part.p_size = 15)(p_type like '%BRASS')) \| \| --------------------------PhysicalOlapScan[part] \| \| ------------------PhysicalDistribute \| \| --------------------hashJoin[INNER_JOIN](supplier.s_nationkey = nation.n_nationkey) \| \| ----------------------PhysicalOlapScan[supplier] \| \| ----------------------PhysicalDistribute \| \| ------------------------hashJoin[INNER_JOIN](nation.n_regionkey = region.r_regionkey) \| \| --------------------------PhysicalProject \| \| ----------------------------PhysicalOlapScan[nation] \| \| --------------------------PhysicalDistribute \| \| ----------------------------PhysicalProject \| \| ------------------------------filter((region.r_name = 'EUROPE')) \| \| --------------------------------PhysicalOlapScan[region] \| +-----------------------------------------------------------------------------------------------------------+	2023-04-04 09:44:15 +08:00
xueweizhang	798d2e5160	[fix](catalog) all properties should be checked when create unpartitioned table (#18149 ) all properties should be checked when create unpartitioned table like partitioned table. Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-04-04 08:53:45 +08:00
ZhangYu0123	8b85c55117	[vectorized](function) Support array_shuffle and shuffle function. (#18116 ) --------- Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-04-04 08:53:13 +08:00
starocean999	88c5e64c4a	[fix](nereids) fix bug of SelectMaterializedIndexWithAggregate rule (#18265 ) 1. create a project node to adjust the output column position when a mv is selected in olap scan node 2. pass SlotReference's column info when call Alias's toSlot() method 3. should compare plan's logical properties when compare two plans after rewrite	2023-04-03 22:32:43 +08:00
yongkang.zhong	fe9d2b00fc	[test](jdbc catalog) add clickhouse jdbc catalog base type test (#18007 )	2023-04-03 20:18:36 +08:00
Gabriel	96a64dc9e8	[Improvement](pipeline) Use bloom runtime filter by default for pipeline engine (#18177 )	2023-04-03 15:31:48 +08:00
yongjinhou	aff260c06f	[Enhancement](HttpServer) Support https interface (#16834 ) 1. Organize http documents 2. Add http interface authentication for FE 3. Support https interface for FE 4. Provide authentication interface 5. Add http interface authentication for BE 6. Support https interface for BE	2023-04-03 14:18:17 +08:00
Mingyu Chen	ecd3fd07f6	[feature](colocate) support cross database colocate join (#18152 )	2023-04-03 14:03:42 +08:00
Jibing-Li	e260dca7a1	[Improvement](multi catalog)Change hive metastore cache split value type to Doris defined Split. Fix split file length -1 bug (#18319 ) HiveMetastoreCache type for file split was Hadoop InputSplit. In this pr, change it to Doris defined Split This change could avoid convert it every time. Also fix the explain verbose result return -1 for split file length.	2023-04-03 13:54:28 +08:00
Liqf	961f5d1bb7	[feature](function)Add St_Angle/St_Azimuth function (#18293 ) Add St_Angle/St_azimuth function： St_Angle： Enter three point, which represent two intersecting lines. Returns the angle between these lines. Point 2 and point 1 represent the first line and point 2 and point 3 represent the second line. The angle between these lines is in radians, in the range [0, 2pi). The angle is measured clockwise from the first line to the second line. ` mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1)); +----------------------------------------------------------------------+ \| st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) \| +----------------------------------------------------------------------+ \| 4.71238898038469 \| +----------------------------------------------------------------------+ 1 row in set (0.04 sec) ` St_azimuth： Enter two point, and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians measured between the line from point 1 facing true North to the line segment from point 1 to point 2. ` mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0)); +----------------------------------------------------+ \| st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) \| +----------------------------------------------------+ \| 1.5707963267948966 \| +----------------------------------------------------+ 1 row in set (0.04 sec)	2023-04-03 13:01:59 +08:00
Pxl	e77833bfa1	[Bug](materialized-view) fix where clause persistence replay incorrect (#18228 ) fix where clause persistence replay incorrect	2023-04-03 12:49:01 +08:00
AKIRA	ce4dc681be	[test](stats) Test framework for stats estimation on TPCH-1G dataset (#18267 ) Implement a test framework for stats estimation on TPCH-1G dataset to ensure accuracy	2023-04-03 11:01:57 +08:00
WenYao	2bce4db81a	[Enchancement](mysql-compatable) add regression-test for MySQLdump #18208 add regression-test for like this: mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases > /backup/mysqldump/test.db To prevent errors Unknown table 'column_statistics' in information_schema (1109), the table information_schema.column_statistics was added.	2023-04-03 09:49:07 +08:00
minghong	b9381570d6	[feature](nereids) semi and anti join estimation (#18129 ) in this pr, we add a new algorithm to estimate semi/anti join row count. In original alg., we reduce row count from cross join. usually, this is not good. for example, L left semi join R on L.a=R.a suppose L is larger than R, and ndv(L.a) < ndv(R.a) the estimated row count is rowcount(R) * rowcount(L) / ndv(R.a). in most cases, the estimated row count is larger than rowcount(L). in new alg, we use ndv(R.a)/originalNdv(R.a) to estimate result rowCount. the basic idea is as following: 1. Suppose ndv(R.a) reduced from m to n. 2. Assume that the value space of L.a is the same as R.a if R.a is not filtered.(this assumption is also hold in original alg.) regard `L left join R` as a filter applied on L, that is, if L.a is in R.a, then this tuple stays in result. R.a shrinks to m/n, so L.a also shrinks to m/n	2023-04-03 09:11:10 +08:00
Mingyu Chen	7131c60e05	[fix](audit-log) fixslow query missing in audit log (#18317 ) #17738 changed the column name in audit log, causing "slow_query" will not be recorded in fe.audit.log	2023-04-03 08:52:14 +08:00
mch_ucchi	4fcd93ac00	[Enhancement](Nereids)add datelikev2 type support for fold constant. #18275 add datelikev2 type support for fold constant. date_add / years_add / mouths_add / days_add / hours_add / minutes_add / seconds_add and xxx_sub.	2023-04-03 08:47:47 +08:00
Jack Drogon	7d49d9cf99	[improvement](dynamic partition) Fix dynamic partition no bucket (#18300 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-04-02 15:51:21 +08:00
slothever	97aab138aa	[fix](parquet-reader) reset value idx in bool rle decoder and support iceberg datetime(3) (#18245 ) 1. Fix value idx in bool rle decoder 2. Iceberg table support datetimev2(3). In the previous version, we converted hive timestamp to datetimev2(0) default.	2023-04-01 21:00:01 +08:00
jakevin	9e087622ab	[fix](Nereids): fix JoinReorderContext in withXXX() of LogicalJoin. (#18299 )	2023-04-01 16:51:27 +08:00
abmdocrt	365867a867	[feature](SSL) default enable SSL MySQL connection to FE (#18285 )	2023-03-31 21:31:23 +08:00
Mingyu Chen	7e61a85331	[refactor](libhdfs) introduce hadoop libhdfs (#18204 ) 1. Introduce hadoop libhdfs 2. For Linux-X86 platform, use the hadoop libhdfs 3. For other platform, use libhdfs3, because currently we don't have hadoop libhdfs binary for other platform Co-authored-by: adonis0147 <adonis0147@gmail.com>	2023-03-31 18:41:39 +08:00
mch_ucchi	3ea98b65df	[Fix](Nereids) fix nereids failed to parse set operation with query in parenthesis (#18062 ) sql like the format (q1, q2, q3 is a query): ``` sql (q1) UNION ALL (q2) UNION ALL (q3) ORDER BY keys ``` cannot be parsed by nereids, because order will be recognized as an alias of query, we add queryOrganization to avoid it.	2023-03-31 15:55:52 +08:00
morrySnow	1a56c56e90	[fix](planner) lateral view do not support lower case table name config (#18165 ) TableFunctionNode lower_case_table_names set to 1 and 2	2023-03-31 13:42:24 +08:00
yongkang.zhong	1c2f95b887	[improve](clickhouse jdbc) support clickhouse jdbc 4.x version (#18258 ) In clickhouse's 4.x version of jdbc, some UInt types use special Java types, so I adapted Doris's ClickHouse JDBC External ``` com.clickhouse.data.value.UnsignedByte; com.clickhouse.data.value.UnsignedInteger; com.clickhouse.data.value.UnsignedLong; com.clickhouse.data.value.UnsignedShort; ```	2023-03-31 13:40:10 +08:00
gitccl	20b3bdb000	[vectorized](function) support array_first_index function (#18175 ) mysql> select array_first_index(x->x+1>3, [2, 3, 4]); +-------------------------------------------------------------------+ \| array_first_index(array_map([x] -> x(0) + 1 > 3, ARRAY(2, 3, 4))) \| +-------------------------------------------------------------------+ \| 2 \| +-------------------------------------------------------------------+ mysql> select array_first_index(x -> x is null, [null, 1, 2]); +----------------------------------------------------------------------+ \| array_first_index(array_map([x] -> x(0) IS NULL, ARRAY(NULL, 1, 2))) \| +----------------------------------------------------------------------+ \| 1 \| +----------------------------------------------------------------------+ mysql> select array_first_index(x->power(x,2)>10, [1, 2, 3, 4]); +---------------------------------------------------------------------------------+ \| array_first_index(array_map([x] -> power(x(0), 2.0) > 10.0, ARRAY(1, 2, 3, 4))) \| +---------------------------------------------------------------------------------+ \| 4 \| +---------------------------------------------------------------------------------+	2023-03-31 12:51:29 +08:00
Pxl	307170030c	[Bug](materialized-view) fix core dump when create mv have case different with base table (#18206 ) fix core dump when create mv have case different with base table	2023-03-31 12:32:09 +08:00
zhangstar333	1b2aaab2f2	[vectorized](bug) fix some case in enable fold constant (#17997 ) fix some case in enable fold constant	2023-03-31 11:41:31 +08:00
Pxl	e7bcd970f5	[Bug](materialized-view) fix isDisableTuplesMVRewriter rreturn true when expr is literal (#18246 ) fix isDisableTuplesMVRewriter rreturn true when expr is literal	2023-03-31 11:30:47 +08:00
jakevin	8e15388074	[fix](Nereids): use CBO rule instead of using rewrite rule. (#18256 )	2023-03-31 11:23:26 +08:00
Kang	4e1e0ce06d	[bugfix](topn) fix topn optimzation wrong result for NULL values (#18121 ) 1. add PassNullPredicate to fix topn wrong result for NULL values 2. refactor RuntimePredicate to avoid using TCondition 3. refactor using ordering_exprs in fe and vsort_node	2023-03-31 10:01:34 +08:00
minghong	1abb19d0fd	filter estimation refactor (#18170 )	2023-03-31 08:49:38 +08:00
abmdocrt	a88e80f8ee	[fix](ssl)refactor some SSL info logs to debug logs (#18234 )	2023-03-31 08:41:02 +08:00
AKIRA	b5ea299697	[fix](planner) Fix agg on inlineview which with constant slot (#18201 ) Since slot that reference to constant has been marked as constant expr either, just add condition check to make sure such slot wouldn't be eliminated as constant from group exprs	2023-03-30 23:54:37 +08:00
jakevin	28793b6441	[fix](Nereids): fix copyIn() in Memo when useless project with groupplan (#18223 )	2023-03-30 23:49:21 +08:00
Ashin Gau	d6b0fe9072	[feature](jni) jni table scanner framework (#17960 ) A framework that read data from jni scanner, which can support the data source from java ecosystem(java API). ## Java Interface Java scanner should extends `org.apache.doris.jni.JniScanner`, implements the following methods: ``` // Initialize JniScanner public abstract void open() throws IOException; // Close JniScanner and release resources public abstract void close() throws IOException; // Scan data and save as vector table public abstract int getNext() throws IOException; ``` See demo usage in `org.apache.doris.jni.MockJniScanner` ## c++ interface C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`. ## Pushed-down predicates Java scanner can get pushed-down predicates by `org.apache.doris.jni.vec.ScanPredicate`. ## Remaining works: 1. Implement complex nested types. 2. Read hudi MOR table as the end-to-end demo usage.	2023-03-30 23:47:45 +08:00

... 29 30 31 32 33 ...

5755 Commits