doris

Author	SHA1	Message	Date
yixiutt	87e83081ff	[test](compaction) add delete test (#18335 )	2023-04-04 12:28:19 +08:00
yixiutt	0cada3f81d	[Enhancement](compaction) return error instead of core when ctx not valid (#18363 )	2023-04-04 12:27:13 +08:00
zhangstar333	54dbb4af67	[vectorized](jdbc) refactor jdbc table read array type (#18187 ) When reading array types over JDBC, the result from Doris is a string, from PG a java.sql.Array, and from CK a java.lang.Object, which makes the code difficult to read and maintain. This change converts every database's array result to a string, then adds a cast function from string to the Doris array type.	2023-04-04 11:57:04 +08:00
Xin Liao	418ea0a24e	[fix](merge-on-write) fix that failed to capture_consistent_rowsets when full clone (#18346 ) When full clone, if the max version of the local table is less than or equal to the max version of the clone table, there is no need to calculate the delete bitmap again.	2023-04-04 10:39:28 +08:00
Mingyu Chen	2a301eb437	[deps](arrow) update arrow download link (#18360 )	2023-04-04 10:39:04 +08:00
zhangstar333	50e6c4216a	[vectorized](function) support date_trunc function truncate week mode (#18334 ) date_trunc can now truncate to the week, e.g.: select date_trunc('2023-4-3 19:28:30', 'week');	2023-04-04 10:24:26 +08:00
Gabriel	a724443eb9	[Improvement](predicate) optimize short-circuit predicates (#18278 ) For a scan node with no vectorized predicate, the input column for the first short-circuit predicate is dense, so we don't need to access the selector column. This PR improves performance by ~30% on TPCH Q3.	2023-04-04 10:21:41 +08:00
yongkang.zhong	6231ca80f7	[improve](clickhouse catalog) Add `"` wrap select column for the sql query clickhouse jdbc (#18352 )	2023-04-04 10:19:24 +08:00
Lightman	af80e65094	[Improve](FileCache) Support the file cache profile in olap scan node and update the profile (#17710 ) We want to use the file cache for caching cold data in S3. When reading that data, we want to know where it comes from and how long the read takes, so this PR adds the metrics to the olap scan node. To make the information clearer, the fields for these metrics are also updated.	2023-04-04 10:18:30 +08:00
minghong	3e7a9424e4	[feature](nereids) explain shape plan (#18296 ) `explain shape plan select ...` prints only plan-shape information: node name; join type and join condition; filter condition; agg phase. It is painful to maintain regression cases using explain because the output contains a lot of mutable information, such as slot ids. With this PR, regression cases can use explain shape plan instead. For example, this is the Explain String for TPC-H q2:
PhysicalTopN
--PhysicalDistribute
----PhysicalTopN
------PhysicalProject
--------filter((cast(ps_supplycost as DECIMAL(27, 9)) = min(ps_supplycost) OVER(PARTITION BY p_partkey)))
----------PhysicalWindow
------------PhysicalQuickSort
--------------PhysicalProject
----------------hashJoin[INNER_JOIN](supplier.s_suppkey = partsupp.ps_suppkey)
------------------PhysicalProject
--------------------hashJoin[INNER_JOIN](part.p_partkey = partsupp.ps_partkey)
----------------------PhysicalProject
------------------------PhysicalOlapScan[partsupp]
----------------------PhysicalProject
------------------------filter((part.p_size = 15)(p_type like '%BRASS'))
--------------------------PhysicalOlapScan[part]
------------------PhysicalDistribute
--------------------hashJoin[INNER_JOIN](supplier.s_nationkey = nation.n_nationkey)
----------------------PhysicalOlapScan[supplier]
----------------------PhysicalDistribute
------------------------hashJoin[INNER_JOIN](nation.n_regionkey = region.r_regionkey)
--------------------------PhysicalProject
----------------------------PhysicalOlapScan[nation]
--------------------------PhysicalDistribute
----------------------------PhysicalProject
------------------------------filter((region.r_name = 'EUROPE'))
--------------------------------PhysicalOlapScan[region]	2023-04-04 09:44:15 +08:00
xueweizhang	798d2e5160	[fix](catalog) all properties should be checked when create unpartitioned table (#18149 ) all properties should be checked when create unpartitioned table like partitioned table. Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-04-04 08:53:45 +08:00
ZhangYu0123	8b85c55117	[vectorized](function) Support array_shuffle and shuffle function. (#18116 ) --------- Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-04-04 08:53:13 +08:00
Qi Chen	eb0fd0017e	[Fix](orc-reader) Fix the scale of decimal column is incorrect when query orc tables. (#18324 ) The scale of decimal column is incorrect when query orc tables.	2023-04-04 08:50:47 +08:00
wangbo	fc407f4afe	[improvement](executor) Reduce ScannnerCtx Scheduling times (#18306 ) * remove sche in scan operator	2023-04-03 22:54:34 +08:00
starocean999	88c5e64c4a	[fix](nereids) fix bug of SelectMaterializedIndexWithAggregate rule (#18265 ) 1. create a project node to adjust the output column position when a mv is selected in olap scan node 2. pass SlotReference's column info when call Alias's toSlot() method 3. should compare plan's logical properties when compare two plans after rewrite	2023-04-03 22:32:43 +08:00
Jerry Hu	1e51af0784	[fix](scan) Avoid using incorrect cache code in ComparisonPredicate (#18332 ) * [fix](scan) Avoid using incorrect cache code in ComparisonPredicate * recovery the regression test	2023-04-03 20:37:35 +08:00
Xinyi Zou	dd78001cc1	[fix](memory) Fix memtable flush mem tracker #18330	2023-04-03 20:37:14 +08:00
yongkang.zhong	fe9d2b00fc	[test](jdbc catalog) add clickhouse jdbc catalog base type test (#18007 )	2023-04-03 20:18:36 +08:00
ZhangYu0123	b627088e8c	[Optimization](String) Optimize q20 q21 q22 q23 LIKE_SUBSTRING (like '%xxx%') (#18309 ) Optimize q20, q21, q22, q23 LIKE_SUBSTRING (like '%xxxx%'). The idea comes from the ClickHouse StringSearcher: by using two chars at the beginning of the search in SIMD, StringSearcher is about 10%~20% faster than the Volnitsky algorithm when the needle size is less than 10, and it remains faster than Volnitsky when the needle size is less than 21. The changes are as follows: use the first two chars of the needle at the beginning of the search. We can compare two chars of the needle against the [n:n+17) chars of the haystack in SIMD in one loop, so filter efficiency is higher. When the environment supports SIMD, we use StringSearcher. Test results in ClickBench: q20 is about 15% up. q20: SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%'; q21 and q22 are about 1%~5% up. q21: SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q22: SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q23 is about 30%~40% up but not stable. q23: SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;	2023-04-03 18:09:15 +08:00
yongkang.zhong	eb6dbc03e0	[typo](docs) add regression test doc & fix api doc (#18329 )	2023-04-03 17:40:41 +08:00
ZhangYu0123	d4688620e9	[opt](array) optimize array_sortby using qsort instead of bubble sort #18311	2023-04-03 17:10:51 +08:00
Gabriel	96a64dc9e8	[Improvement](pipeline) Use bloom runtime filter by default for pipeline engine (#18177 )	2023-04-03 15:31:48 +08:00
Gabriel	368a2f7ace	[Bug](decimal) Fix string to decimal (#18282 )	2023-04-03 15:30:48 +08:00
caoliang-web	3078ee1854	[regression](decimalv3)Add decimal type as filter condition in regression test (#17160 ) Add decimal type as filter condition in regression test	2023-04-03 14:20:09 +08:00
yongjinhou	aff260c06f	[Enhancement](HttpServer) Support https interface (#16834 ) 1. Organize http documents 2. Add http interface authentication for FE 3. Support https interface for FE 4. Provide authentication interface 5. Add http interface authentication for BE 6. Support https interface for BE	2023-04-03 14:18:17 +08:00
Mingyu Chen	ecd3fd07f6	[feature](colocate) support cross database colocate join (#18152 )	2023-04-03 14:03:42 +08:00
Jibing-Li	e260dca7a1	[Improvement](multi catalog)Change hive metastore cache split value type to Doris defined Split. Fix split file length -1 bug (#18319 ) The HiveMetastoreCache type for file splits was Hadoop InputSplit. This PR changes it to the Doris-defined Split, which avoids converting it every time. It also fixes explain verbose returning -1 for the split file length.	2023-04-03 13:54:28 +08:00
Xin Liao	6677841b7e	[fix](merge-on-write) fix that failed to capture_consistent_rowsets when revise tablet meta (#18283 ) Should modify _timestamped_version_tracker firstly before capture_consistent_rowsets when update delete bitmap in revise_tablet_meta.	2023-04-03 13:02:34 +08:00
Liqf	961f5d1bb7	[feature](function)Add St_Angle/St_Azimuth function (#18293 ) Add St_Angle/St_Azimuth functions: St_Angle: Takes three points, which represent two intersecting lines, and returns the angle between these lines. Point 2 and point 1 represent the first line, and point 2 and point 3 represent the second line. The angle between these lines is in radians, in the range [0, 2pi). The angle is measured clockwise from the first line to the second line. ` mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1)); +----------------------------------------------------------------------+ \| st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) \| +----------------------------------------------------------------------+ \| 4.71238898038469 \| +----------------------------------------------------------------------+ 1 row in set (0.04 sec) ` St_Azimuth: Takes two points and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians measured between the line from point 1 facing true North and the line segment from point 1 to point 2. ` mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0)); +----------------------------------------------------+ \| st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) \| +----------------------------------------------------+ \| 1.5707963267948966 \| +----------------------------------------------------+ 1 row in set (0.04 sec)	2023-04-03 13:01:59 +08:00
Pxl	e77833bfa1	[Bug](materialized-view) fix where clause persistence replay incorrect (#18228 ) fix where clause persistence replay incorrect	2023-04-03 12:49:01 +08:00
zhangstar333	94e3472050	[bug](function) fix count equal function return incorrect value (#18200 ) fix count equal function return incorrect value	2023-04-03 11:20:36 +08:00
AKIRA	ce4dc681be	[test](stats) Test framework for stats estimation on TPCH-1G dataset (#18267 ) Implement a test framework for stats estimation on TPCH-1G dataset to ensure accuracy	2023-04-03 11:01:57 +08:00
WenYao	2bce4db81a	[Enhancement](mysql-compatible) add regression-test for MySQLdump #18208 Add a regression test for commands like this: mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases > /backup/mysqldump/test.db To prevent the error Unknown table 'column_statistics' in information_schema (1109), the table information_schema.column_statistics was added.	2023-04-03 09:49:07 +08:00
TengJianPing	7cd8f7c9ba	[fix](grouping) fix coredump of grouping function for outer join (#18292 ) The result of the functions grouping and grouping_id is always not nullable, but an outer join converts the result column to nullable when necessary, which causes a mismatch between column type and column object when executing the functions grouping and grouping_id.	2023-04-03 09:35:31 +08:00
Xin Liao	b66e9f8906	[fix](load) handle null map right in OlapDataConvertor (#18236 ) The offsets of _nullmap and _value are inconsistent in OlapDataConvertor, so the null flag obtained when calling the get_data_at function is incorrect. When the key column or sequence column has null values, the encoding of the short key index or primary key index may be wrong. This was introduced by #10883 #10925.	2023-04-03 09:14:05 +08:00
minghong	b9381570d6	[feature](nereids) semi and anti join estimation (#18129 ) This PR adds a new algorithm to estimate the row count of semi/anti joins. The original algorithm reduces the row count from the cross join, which usually gives a poor estimate. For example, for L left semi join R on L.a=R.a, suppose L is larger than R and ndv(L.a) < ndv(R.a): the estimated row count is rowcount(R) * rowcount(L) / ndv(R.a), which in most cases is larger than rowcount(L). The new algorithm uses ndv(R.a)/originalNdv(R.a) to estimate the result row count. The basic idea is as follows: 1. Suppose ndv(R.a) is reduced from m to n. 2. Assume that the value space of L.a is the same as that of R.a when R.a is not filtered (the original algorithm makes this assumption too). Regard `L left semi join R` as a filter applied on L: if L.a is in R.a, the tuple stays in the result. R.a shrinks to n/m of its original ndv, so L.a also shrinks to n/m.	2023-04-03 09:11:10 +08:00
Xinyi Zou	4b914c196a	[fix](expr pushdown) Fix VRuntimeFilterWrapper cannot get children #18289	2023-04-03 09:09:52 +08:00
ZhangYu0123	03e49b986d	[fix](conf) fix be JAVA_OPTS conf #18305 Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-04-03 09:07:13 +08:00
Mingyu Chen	03fc41ea51	[doc](catalog) add faq for hive catalog (#18298 )	2023-04-03 09:01:49 +08:00
Yongqiang YANG	8011bdb30d	[improvement](test) print exception when streamload fails (#18315 )	2023-04-03 08:56:54 +08:00
Mingyu Chen	7131c60e05	[fix](audit-log) fix slow query missing in audit log (#18317 ) #17738 changed the column name in the audit log, so "slow_query" was no longer recorded in fe.audit.log	2023-04-03 08:52:14 +08:00
mch_ucchi	4fcd93ac00	[Enhancement](Nereids)add datelikev2 type support for fold constant. #18275 Add datelikev2 type support for fold constant: date_add / years_add / months_add / days_add / hours_add / minutes_add / seconds_add and xxx_sub.	2023-04-03 08:47:47 +08:00
Yongqiang YANG	ff66efd7d0	[improvement](test) print response of streamload (#18313 ) We need the response text to reason about streamload failures.	2023-04-02 20:08:28 +08:00
Yongqiang YANG	419aa4f12a	[fix](thrift_server) do not check started state in ThriftServer::join (#18314 ) started may be set to false when server thread is stopped.	2023-04-02 19:24:41 +08:00
morrySnow	04929ff6d4	[fix](doc) suggest use window function to replace running_difference (#18281 )	2023-04-02 16:35:10 +08:00
Jack Drogon	7d49d9cf99	[improvement](dynamic partition) Fix dynamic partition no bucket (#18300 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-04-02 15:51:21 +08:00
slothever	97aab138aa	[fix](parquet-reader) reset value idx in bool rle decoder and support iceberg datetime(3) (#18245 ) 1. Fix value idx in bool rle decoder 2. Iceberg table support datetimev2(3). In the previous version, we converted hive timestamp to datetimev2(0) default.	2023-04-01 21:00:01 +08:00
jakevin	9e087622ab	[fix](Nereids): fix JoinReorderContext in withXXX() of LogicalJoin. (#18299 )	2023-04-01 16:51:27 +08:00
abmdocrt	365867a867	[feature](SSL) default enable SSL MySQL connection to FE (#18285 )	2023-03-31 21:31:23 +08:00
Xinyi Zou	5e7ea5e305	[fix](memory) Fix `bthread_setspecific` log fatal on UBSAN build (#18274 )	2023-03-31 19:46:53 +08:00

9698 Commits