doris

Author	SHA1	Message	Date
yixiutt	0cada3f81d	[Enhancement](compaction) return an error instead of a core dump when ctx is not valid (#18363)	2023-04-04 12:27:13 +08:00
zhangstar333	54dbb4af67	[vectorized](jdbc) refactor how jdbc tables read the array type (#18187) When jdbc reads an array type, the result is a string from Doris, a java.sql.Array from PG and a java.lang.Object from CK, which makes the code difficult to read and maintain. So the array results of all databases are now converted to string, and a cast function from string to the Doris array type is added.	2023-04-04 11:57:04 +08:00
Xin Liao	418ea0a24e	[fix](merge-on-write) fix capture_consistent_rowsets failure during full clone (#18346) During a full clone, if the max version of the local table is less than or equal to the max version of the clone table, there is no need to recalculate the delete bitmap.	2023-04-04 10:39:28 +08:00
zhangstar333	50e6c4216a	[vectorized](function) support week mode in the date_trunc function (#18334) date_trunc can now truncate to week, e.g. select date_trunc('2023-4-3 19:28:30', 'week');	2023-04-04 10:24:26 +08:00
Gabriel	a724443eb9	[Improvement](predicate) optimize short-circuit predicates (#18278) For a scan node with no vectorized predicate, the input column of the first short-circuit predicate is dense, so the selector column does not need to be accessed. This PR improves performance by ~30% on TPC-H Q3.	2023-04-04 10:21:41 +08:00
Lightman	af80e65094	[Improve](FileCache) Support the file cache profile in olap scan node and update the profile (#17710) We want to use the file cache to cache cold data in S3. When reading that data, we want to know where it comes from and how long it takes to read, so these metrics are now supported in the olap scan node. To make the information clearer, the metric fields are also updated.	2023-04-04 10:18:30 +08:00
ZhangYu0123	8b85c55117	[vectorized](function) Support array_shuffle and shuffle functions. (#18116) Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-04-04 08:53:13 +08:00
Qi Chen	eb0fd0017e	[Fix](orc-reader) Fix incorrect scale of decimal columns when querying orc tables. (#18324)	2023-04-04 08:50:47 +08:00
wangbo	fc407f4afe	[improvement](executor) Reduce ScannerCtx scheduling times (#18306) * remove scheduling in scan operator	2023-04-03 22:54:34 +08:00
Jerry Hu	1e51af0784	[fix](scan) Avoid using incorrect cache code in ComparisonPredicate (#18332) * restore the regression test	2023-04-03 20:37:35 +08:00
Xinyi Zou	dd78001cc1	[fix](memory) Fix memtable flush mem tracker #18330	2023-04-03 20:37:14 +08:00
ZhangYu0123	b627088e8c	[Optimization](String) Optimize LIKE_SUBSTRING (like '%xxx%') for q20, q21, q22 and q23 (#18309) The idea comes from ClickHouse's StringSearcher: when the needle size is less than 10, StringSearcher is about 10%~20% faster than the Volnitsky algorithm because it searches with the first two chars using SIMD, and it remains faster than Volnitsky while the needle size is less than 21. The changes are as follows: search using the first two chars of the needle, comparing them against the [n:n+17) chars of the haystack with SIMD in one loop iteration, which makes filtering more effective. When the environment supports SIMD, StringSearcher is used. Test results on ClickBench: q20 is about 15% faster. q20: SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%'; q21 and q22 are about 1%~5% faster. q21: SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q22: SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10; q23 is about 30%~40% faster, but not stable. q23: SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;	2023-04-03 18:09:15 +08:00
ZhangYu0123	d4688620e9	[opt](array) optimize array_sortby using qsort instead of bubble sort #18311	2023-04-03 17:10:51 +08:00
Gabriel	368a2f7ace	[Bug](decimal) Fix string to decimal (#18282)	2023-04-03 15:30:48 +08:00
Xin Liao	6677841b7e	[fix](merge-on-write) fix capture_consistent_rowsets failure when revising tablet meta (#18283) When updating the delete bitmap in revise_tablet_meta, _timestamped_version_tracker should be modified before calling capture_consistent_rowsets.	2023-04-03 13:02:34 +08:00
Liqf	961f5d1bb7	[feature](function) Add ST_Angle/ST_Azimuth functions (#18293) ST_Angle: takes three points, which represent two intersecting lines, and returns the angle between these lines. Point 2 and point 1 represent the first line, and point 2 and point 3 represent the second line. The angle is in radians, in the range [0, 2π), measured clockwise from the first line to the second line. mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1)); +----------------------------------------------------------------------+ | st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) | +----------------------------------------------------------------------+ | 4.71238898038469 | +----------------------------------------------------------------------+ 1 row in set (0.04 sec) ST_Azimuth: takes two points and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians between the line from point 1 facing true North and the line segment from point 1 to point 2. mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0)); +----------------------------------------------------+ | st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) | +----------------------------------------------------+ | 1.5707963267948966 | +----------------------------------------------------+ 1 row in set (0.04 sec)	2023-04-03 13:01:59 +08:00
Pxl	e77833bfa1	[Bug](materialized-view) fix incorrect replay of persisted where clause (#18228)	2023-04-03 12:49:01 +08:00
zhangstar333	94e3472050	[bug](function) fix count equal function returning incorrect value (#18200)	2023-04-03 11:20:36 +08:00
TengJianPing	7cd8f7c9ba	[fix](grouping) fix coredump of grouping function for outer join (#18292) The results of the grouping and grouping_id functions are never nullable, but an outer join converts the result column to nullable when necessary, which causes a mismatch between column type and column object when executing grouping and grouping_id.	2023-04-03 09:35:31 +08:00
Xin Liao	b66e9f8906	[fix](load) handle null map correctly in OlapDataConvertor (#18236) The offsets of _nullmap and _value are inconsistent in OlapDataConvertor, so the null flag obtained by the get_data_at function is incorrect. When the key column or sequence column has null values, the encoding of the short key index or primary key index may be wrong. This was introduced by #10883 and #10925.	2023-04-03 09:14:05 +08:00
Xinyi Zou	4b914c196a	[fix](expr pushdown) Fix VRuntimeFilterWrapper being unable to get children #18289	2023-04-03 09:09:52 +08:00
Yongqiang YANG	419aa4f12a	[fix](thrift_server) do not check started state in ThriftServer::join (#18314) started may be set to false when the server thread is stopped.	2023-04-02 19:24:41 +08:00
slothever	97aab138aa	[fix](parquet-reader) reset value idx in bool rle decoder and support iceberg datetime(3) (#18245) 1. Fix value idx in bool rle decoder. 2. Iceberg tables support datetimev2(3); previously, hive timestamps were converted to datetimev2(0) by default.	2023-04-01 21:00:01 +08:00
Xinyi Zou	5e7ea5e305	[fix](memory) Fix `bthread_setspecific` log fatal on UBSAN build (#18274)	2023-03-31 19:46:53 +08:00
Mingyu Chen	7e61a85331	[refactor](libhdfs) introduce hadoop libhdfs (#18204) 1. Introduce hadoop libhdfs. 2. On the Linux-X86 platform, use hadoop libhdfs. 3. On other platforms, use libhdfs3, because there is currently no hadoop libhdfs binary for them. Co-authored-by: adonis0147 <adonis0147@gmail.com>	2023-03-31 18:41:39 +08:00
yiguolei	a77921d767	[refactor](typesystem) remove unused rpc common file and use function rpc (#18270) rpc common is a duplicate: all of its methods are included in function rpc, so it is removed. get_field_type is never used, so it is removed too. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-31 18:13:25 +08:00
Jerry Hu	22a705543b	[fix](string_ref) Incorrect result caused by improper comparison of StringRef on macOS with Apple silicon or without AVX2 #18264 On macOS systems with Apple silicon, the '==' operator of StringRef uses string_compare, which treats a StringRef as a null-terminated C string.	2023-03-31 15:11:11 +08:00
Xin Liao	c3e2269c4c	[fix](merge-on-write) fix missed rows not matching merged rows in base compaction (#18262)	2023-03-31 15:06:51 +08:00
yiguolei	1027abe0d3	[enhancement](query exec) should print error status when a query meets an error (#18247) If the BE is under heavy load, the query may fail; the BE then tries to connect to the FE using thrift, and if the FE is also under heavy load, the thrift connection fails. The status is then rewritten at line 342, and the actual failure reason for the query is lost. The error status should be printed on every update. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-31 14:08:24 +08:00
yongkang.zhong	1c2f95b887	[improve](clickhouse jdbc) support clickhouse jdbc 4.x version (#18258) In version 4.x of the ClickHouse JDBC driver, some UInt types use special Java types, so Doris's ClickHouse JDBC External is adapted to handle com.clickhouse.data.value.UnsignedByte, com.clickhouse.data.value.UnsignedInteger, com.clickhouse.data.value.UnsignedLong and com.clickhouse.data.value.UnsignedShort.	2023-03-31 13:40:10 +08:00
gitccl	20b3bdb000	[vectorized](function) support array_first_index function (#18175) mysql> select array_first_index(x->x+1>3, [2, 3, 4]); +-------------------------------------------------------------------+ | array_first_index(array_map([x] -> x(0) + 1 > 3, ARRAY(2, 3, 4))) | +-------------------------------------------------------------------+ | 2 | +-------------------------------------------------------------------+ mysql> select array_first_index(x -> x is null, [null, 1, 2]); +----------------------------------------------------------------------+ | array_first_index(array_map([x] -> x(0) IS NULL, ARRAY(NULL, 1, 2))) | +----------------------------------------------------------------------+ | 1 | +----------------------------------------------------------------------+ mysql> select array_first_index(x->power(x,2)>10, [1, 2, 3, 4]); +---------------------------------------------------------------------------------+ | array_first_index(array_map([x] -> power(x(0), 2.0) > 10.0, ARRAY(1, 2, 3, 4))) | +---------------------------------------------------------------------------------+ | 4 | +---------------------------------------------------------------------------------+	2023-03-31 12:51:29 +08:00
Pxl	307170030c	[Bug](materialized-view) fix core dump when creating an mv whose case differs from the base table (#18206)	2023-03-31 12:32:09 +08:00
zhangstar333	1b2aaab2f2	[vectorized](bug) fix some cases when fold constant is enabled (#17997)	2023-03-31 11:41:31 +08:00
zclllyybb	f800ba8f4c	[Exec](opt) Optimize function call for const columns (#18212)	2023-03-31 11:36:21 +08:00
lihangyu	35bae25568	[Improve](row store) add more profile info in log for point query and make row column page size more configurable (#18181) Saves about 20% of FE CPU cost for point queries with prepared statements on a table with 100 columns.	2023-03-31 10:58:59 +08:00
camby	7d92bf095a	[fix](expr) refactor create_tree_from_thrift to avoid stack overflow (#18214)	2023-03-31 10:38:20 +08:00
Kang	4e1e0ce06d	[bugfix](topn) fix wrong topn optimization result for NULL values (#18121) 1. add PassNullPredicate to fix wrong topn result for NULL values 2. refactor RuntimePredicate to avoid using TCondition 3. refactor using ordering_exprs in fe and vsort_node	2023-03-31 10:01:34 +08:00
HappenLee	8be43857ef	[feature](executor) Add memory limit for pip_scanner_context (#18238) Co-authored-by: wangbo <506340561@qq.com>	2023-03-31 09:36:57 +08:00
Xinyi Zou	e5793249cd	[opt](hashtable) Change default fill strategy to 75% (#18242)	2023-03-31 09:28:11 +08:00
lihangyu	e0f6083e73	[refactor](dynamic table) add `get_type_as_tprimitive_type` and `get_type_as_primitive_type` in IDataType to get `PrimitiveType` and `TPrimitiveType` (#18260)	2023-03-31 09:03:06 +08:00
Ashin Gau	d6b0fe9072	[feature](jni) jni table scanner framework (#17960) A framework that reads data from a jni scanner, which can support data sources from the java ecosystem (java API). ## Java Interface Java scanners should extend `org.apache.doris.jni.JniScanner` and implement the following methods: ``` // Initialize JniScanner public abstract void open() throws IOException; // Close JniScanner and release resources public abstract void close() throws IOException; // Scan data and save as vector table public abstract int getNext() throws IOException; ``` See demo usage in `org.apache.doris.jni.MockJniScanner`. ## C++ interface C++ readers should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`. ## Pushed-down predicates Java scanners can get pushed-down predicates through `org.apache.doris.jni.vec.ScanPredicate`. ## Remaining work: 1. Implement complex nested types. 2. Read hudi MOR tables as the end-to-end demo usage.	2023-03-30 23:47:45 +08:00
HappenLee	1d2dbe7898	[Bug][Pipeline] Running clickbench deadlocks in pipeline exec engine (#18211) In the pipeline exec engine, some clickbench queries may deadlock.	2023-03-30 21:41:57 +08:00
Mingyu Chen	1050df7076	[fix](fs) fix local file system copy bug (#18243) `copy_dirs` has a bug that causes infinite iteration	2023-03-30 21:36:07 +08:00
amory	ea41d94582	[Improve](complex-type) Support Count(complexType) (#17868) Support count function for ARRAY/MAP/STRUCT types	2023-03-30 15:43:32 +08:00
huanghaibin	e3bd812887	[fix](stream-load) finding the line delimiter in csv should start with no offset (#18161) When loading a big file with a multi-byte line delimiter, some line records may be incomplete because of _output_buf_limit, so the incomplete data is moved to the beginning of the output buf and more data is read into the output buf. In this case, the search for the line delimiter should start with no offset, to avoid a bug that treats two lines as one line.	2023-03-30 14:42:34 +08:00
Gabriel	b7af110f61	[Bug](bloomfilter) Fix bloom filter for date type (#18205)	2023-03-30 14:15:06 +08:00
zhangstar333	525f15dddf	[vectorized](function) support array_sortby function (#18071)	2023-03-30 11:07:49 +08:00
TengJianPing	9877143210	[fix](like) fix wrong result of like pattern with backslash (#18039) The query select * from person where address like '%\\\\%'; returns an empty result, but MySQL returns one row. CREATE TABLE `person` ( `id` int(11) NULL, `name` text NULL, `age` int(11) NULL, `class` int(11) NULL, `address` text NULL ) ENGINE=OLAP UNIQUE KEY(`id`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`id`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2", "disable_auto_compaction" = "false" ); insert into person values (10001,'test1',30,2,'test\\\\,xxx'); Adding logs: select * from person where address like '%\\\\%'; I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30 I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3 I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2 I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2 I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0 I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1 I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\% I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \% It matches LIKE_ENDS_WITH_RE, which is wrong; the SQL should match strings that contain one backslash in any position.	2023-03-30 11:05:09 +08:00
yiguolei	a1114d46e8	[refactor](unify type system) remove switch case in histogram helper (#18222)	2023-03-30 10:54:08 +08:00
Lijia Liu	2ee1468576	[improvement](executor) Support task group schedule in pipeline engine (#17615)	2023-03-30 10:49:50 +08:00
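The ST_Angle/ST_Azimuth entry (#18293) defines both functions as directions measured clockwise from a reference line. The C++ below is a minimal planar sketch of those definitions, not Doris's implementation: the PlanarPoint type and the planar_azimuth/planar_angle names are hypothetical, and treating x as longitude (east) and y as latitude (north) on a flat plane is an assumption.

```cpp
#include <cmath>

// Hypothetical planar model: x stands for longitude (east), y for latitude (north).
struct PlanarPoint {
    double x;
    double y;
};

const double kPi = std::acos(-1.0);
const double kTwoPi = 2.0 * kPi;

// Normalise an angle in radians into [0, 2*pi).
double wrap_to_two_pi(double radians) {
    double r = std::fmod(radians, kTwoPi);
    return r < 0 ? r + kTwoPi : r;
}

// Azimuth of the segment a -> b: angle from north (+y), measured clockwise, in [0, 2*pi).
double planar_azimuth(PlanarPoint a, PlanarPoint b) {
    // atan2(east offset, north offset) is 0 for due north and pi/2 for due east.
    return wrap_to_two_pi(std::atan2(b.x - a.x, b.y - a.y));
}

// Angle measured clockwise from line p2 -> p1 to line p2 -> p3, in [0, 2*pi).
double planar_angle(PlanarPoint p1, PlanarPoint p2, PlanarPoint p3) {
    return wrap_to_two_pi(planar_azimuth(p2, p3) - planar_azimuth(p2, p1));
}
```

With these definitions, planar_angle({1, 0}, {0, 0}, {0, 1}) is 3π/2 ≈ 4.71238898038469 and planar_azimuth({0, 0}, {1, 0}) is π/2 ≈ 1.5707963267948966, the same values as the two mysql outputs in #18293.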

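The array_first_index examples (#18175) imply these semantics: the lambda is evaluated on each element in order, and the result is the 1-based position of the first element for which it is true. A hypothetical C++ sketch of that behaviour follows; modelling SQL NULL as std::nullopt and returning 0 when no element matches are assumptions, not taken from the commit.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch of array_first_index semantics (not Doris's code).
// NULL elements are modelled as std::nullopt.
template <typename T, typename Pred>
std::size_t array_first_index(const std::vector<std::optional<T>>& arr, Pred pred) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        if (pred(arr[i])) {
            return i + 1;  // SQL array positions are 1-based
        }
    }
    return 0;  // assumption: 0 means "no element matched"
}
```

For instance, calling it on {2, 3, 4} with a predicate equivalent to x + 1 > 3 returns 2, like the first query in #18175; a NULL input makes that predicate false here, which is also an assumption.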

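The LIKE_SUBSTRING optimization (#18309) filters candidate positions by the needle's first two characters before doing a full comparison. A scalar sketch of that filter, assuming plain byte comparison; the function name is hypothetical, and the real StringSearcher performs the two-character check for many haystack positions at once with SIMD, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical scalar sketch: returns the offset of the first occurrence of
// needle in haystack, or std::string_view::npos if there is none.
std::size_t find_with_two_char_filter(std::string_view haystack, std::string_view needle) {
    if (needle.empty()) return 0;
    if (needle.size() > haystack.size()) return std::string_view::npos;
    if (needle.size() == 1) return haystack.find(needle[0]);

    const char first = needle[0];
    const char second = needle[1];
    const std::string_view rest = needle.substr(2);
    const std::size_t last_start = haystack.size() - needle.size();
    for (std::size_t i = 0; i <= last_start; ++i) {
        // Cheap filter: only positions whose first two bytes match reach the full comparison.
        if (haystack[i] != first || haystack[i + 1] != second) continue;
        if (haystack.substr(i + 2, rest.size()) == rest) return i;
    }
    return std::string_view::npos;
}
```

Only positions that pass the two-byte check pay for the full comparison, so a more selective filter means fewer full comparisons; the SIMD version additionally checks a whole block of candidate positions per loop iteration.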
4186 Commits
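#18311 replaces the bubble sort inside array_sortby with qsort. A hypothetical sketch of the operation itself, assuming array_sortby(values, keys) reorders values by ascending keys, with non-NULL keys and arrays of equal length; it sorts an index permutation with std::stable_sort rather than qsort.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: return `values` reordered so that their `keys` are ascending.
template <typename V, typename K>
std::vector<V> array_sortby(const std::vector<V>& values, const std::vector<K>& keys) {
    if (values.size() != keys.size()) {
        throw std::invalid_argument("values and keys must have the same length");
    }
    // Sort positions 0..n-1 by their key instead of moving the values around.
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    std::vector<V> sorted;
    sorted.reserve(values.size());
    for (std::size_t idx : order) {
        sorted.push_back(values[idx]);
    }
    return sorted;
}
```

std::stable_sort keeps elements with equal keys in their original order, as bubble sort does; qsort gives no such guarantee, and the commit does not say how Doris orders ties.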