doris

Author	SHA1	Message	Date
luozenglin	17c8123371	[test](regression) add some regression cases on constant evaluation. (#16599 )	2023-02-28 10:57:37 +08:00
zhangguoqiang	da2e9f4179	[improvement](test)Add nereids p0 pipeline trigger not required (#17193 )	2023-02-28 10:51:54 +08:00
Luzhijing	b0de8d1925	[doc][community]correct the number of committers (#16905 )	2023-02-28 10:48:06 +08:00
zhannngchen	00723e36cf	[enhancement](merge-on-write) add delete bitmap correctness check for single load (#17147 ) For Unique Key MoW table, if there are duplicate keys in one single load job and there's multiple segments, we need to calculate delete bitmap to mark these duplicate keys deleted. Add a check here to detect any bugs that might cause duplicate keys.	2023-02-28 10:06:36 +08:00
奕冷	049ecccc57	[feature-wip](BE http)Support BE http service using brpc (#16123 ) Now, streamload is not supported.	2023-02-28 09:59:29 +08:00
Jibing-Li	76e539dbda	[Improvement](multi catalog)(nereids)Support JDBC external table for new planner. (#17063 ) Support JDBC external table for Nereids planner. JDBC table is another type of table, like olap table, hms table and so on.	2023-02-28 09:43:04 +08:00
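morrySnow	bf9997ae3d	[fix](Nereids) date/datetime floor and ceil should always be nullable (#17188 )	2023-02-28 09:37:10 +08:00
xueweizhang	e0cd8599d2	[fix](delete) fix delete from bug which can get wrong result (#17146 ) In theory, two independent deletes, such as delete from table where a=1; delete from table where a=2;, should be able to use this optimization. However, the current code puts the delete predicates of all versions and all columns together, which loses the version information and the fact that predicates may be ANDed. Every delete predicate is treated as independent, so a page is dropped as soon as any one delete predicate matches. This PR keeps the current code but passes delete predicates down only when there is exactly one (a == 1 check), because only then is the subsequent page elimination guaranteed to be correct. Evaluating pages with complete and rigorous logic across delete predicates of different versions and different columns would require considerable design changes, so the current approach prioritizes fixing the bug. Later work can refine the logic that uses delete predicates to speed up zone checks for page elimination and broaden the scenarios in which delete predicates apply.	2023-02-28 09:20:10 +08:00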
Zhengguo Yang	b51ce415e7	[Feature](load) Add submitter and comments to load job (#16878 ) * [Feature](load) Add submitter and comments to load job	2023-02-28 09:06:19 +08:00
Jibing-Li	dd1bd6d8f1	[Fix](multi catalog)Support hive default partition. (#17179 ) Hive stores all data without partition column values in a default partition named __HIVE_DEFAULT_PARTITION__. Doris fails to get this partition when the partition column type is INT or another type that __HIVE_DEFAULT_PARTITION__ cannot be converted to. This PR supports the Hive default partition by setting the column value to NULL for the missing partition columns.	2023-02-28 00:08:29 +08:00
huangzhaowei	d3a6cab716	[Fix](MySQLLoad) Fix load a big local file bug since bytebuffer from mysql packet using the same byte array (#16901 ) Loading a big local file causes the `INTERNAL_ERROR]too many filtered rows` issue because the bytebuffer from the mysql client always uses the same byte array, so later bytes overwrite earlier ones and the bytes are sent over the network in the wrong order. The fix copies the byte array before filling it into the network.	2023-02-28 00:06:44 +08:00
zhannngchen	84413f33b8	[enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127 )	2023-02-27 23:31:28 +08:00
Yusheng Xu	e8de07a6a5	[feature](cooldown) Forbid storage policy for MoW tables (#17148 ) * disable setting storage policy on MoW table * fix error in regression test * make the name of test table unique * use Strings.isNullOrEmpty to replace equals * fix error in if statement	2023-02-27 18:42:31 +08:00
yongjinhou	c807596c51	[Docs](docs) Modify plugin documents (#17161 ) * modify plugin docs * add qe_slow_log_ms description * add version description	2023-02-27 18:42:02 +08:00
奕冷	0db58800d3	[fix](stmt-forward) fix result missing (#17173 )	2023-02-27 18:01:43 +08:00
Xin Liao	d5b1d3403f	[fix](merge-on-write) fix that the version of delete bitmap is incorrect when calculating delete bitmap between segments (#17095 ) Different version numbers were used to calculate the delete bitmap between segments and between rowsets, so the last update of the delete bitmap failed.	2023-02-27 17:17:25 +08:00
Xin Liao	cec3d19dd2	[fix](regression) drop table before and after test for streamLoad_action case (#17164 )	2023-02-27 17:14:49 +08:00
Bowen Liang	29bf31c138	[chore](thirdparty) Show progress bar when downloading dependencies (#16736 )	2023-02-27 15:21:22 +08:00
Stalary	95837b7958	[Enhancement](ES): Support mapping es date format and replace simple json with jackson (#16806 ) * Support mapping es date format, default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis * Replace simple json with jackson, resolve column order random problem * Add es array doc version	2023-02-27 14:47:21 +08:00
Pxl	b06f3da96c	[Bug] fix not close when pipeline context prepare failed (#17061 )	2023-02-27 14:24:39 +08:00
Pxl	f26f0a1059	[Regression Test] modify expectRelativeError from 1e-10 to 1e-8 (#17162 )	2023-02-27 14:23:28 +08:00
奕冷	c0360f80bb	[enhancement](aggregate-function) enhance aggregate function collect and add group_array aliases (#15339 ) Enhance aggregate functions `collect_set` and `collect_list` to support an optional `max_size` param, which limits the number of elements in the result array.	2023-02-27 14:22:30 +08:00
Pxl	0723e55f76	[Bug](build) fix compile fail on unused value #17165 error: variable 'nullcount' set but not used [-Werror,-Wunused-but-set-variable] int nullcount = 0;	2023-02-27 14:19:44 +08:00
huangzhaowei	2626995fc1	[Doc](Load)Add mysql load document (#16483 ) * Add doc * 1 * doc2 * review again * fix comment * fix comment * format * add recommended dir * client --local-infile * add streaming_load_max_mb	2023-02-27 13:25:34 +08:00
huangzhaowei	26ccb6ba5a	[feature-wip](MTMV) Add some metrics for MTMV (#16913 ) Demo: ``` # HELP doris_fe_mtmv_job Total job number of mtmv. # TYPE doris_fe_mtmv_job gauge doris_fe_mtmv_job{type="TOTAL-JOB"} 1 doris_fe_mtmv_job{type="ACTIVE-JOB"} 1 # HELP doris_fe_mtmv_task Running task number of mtmv. # TYPE doris_fe_mtmv_task gauge doris_fe_mtmv_task{type="RUNNING-TASK"} 0 doris_fe_mtmv_task{type="PENDING-TASK"} 0 doris_fe_mtmv_task{type="FAILED-TASK"} 0 doris_fe_mtmv_task{type="TOTAL-TASK"} 1 ```	2023-02-27 11:27:23 +08:00
yiguolei	33acaa067b	[refactor](mempool) remove mempool parameter from key decoder methods (#17137 ) decode method is only used for big int and other decode method is only used in unit test. I remove the useless method and we can remove mempool parameter from decode method. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-27 11:16:14 +08:00
TengJianPing	aab8dad191	[fix](sort) fix bug of sort (#17151 ) The logic of topn and full sort is wrong when there are both an offset and a limit: the offset is not considered when doing the max heap optimization, which leads to wrong results.	2023-02-27 10:55:12 +08:00
lihangyu	29dc08fc45	[Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint. (#17124 ) * `_simdjson_set_column_value` could become a hot spot while parsing json in simdjson mode. Introduce `_prev_positions` to cache the search results for the previous row (keyed as index in JSON object), since the order of json field names should be much the same from line to line * fix case	2023-02-27 10:39:22 +08:00
starocean999	2d5f32caf1	[fix](nereids) dphyper join reorder may lose join condition in some case (#16995 ) In emitCsgCmp, we should check whether any missed edges should be used as connection edges. If there is a missed edge that can't be used as a connection edge, emitCsgCmp should return and look for another plan.	2023-02-27 10:36:11 +08:00
WenYao	f228cfdd00	[enhancement](session-variable)add a use_fix_replica session variable to fix query replica (#17101 ) Add a use_fix_replica session variable to make replica inconsistency problems easier to debug. The default value of use_fix_replica is -1, which means no replica is fixed; otherwise the {use_fix_replica} smallest replica is chosen.	2023-02-27 10:20:23 +08:00
Xinyi Zou	857d38e24b	[fix](scan) Default enable function(Like) pushdown #17154 function pushdown: #10355 NGram BloomFilter Index apply like pushdown: #11579 Enabled by default, make sure it stays active. If NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.	2023-02-27 09:58:37 +08:00
DuRipeng	aefcc98715	[Enhancement](datetimev2-enhance) support 'microseconds_sub' function for datetimev2 (#17130 ) Based on #16970 , introduce microseconds_sub function for datetimev2	2023-02-27 08:47:30 +08:00
morrySnow	4d1f3b8abf	[fix](Nereids) mow unique table's preagg should work like duplicate table (#17028 )	2023-02-26 22:52:50 +08:00
morrySnow	469b6b8466	[enhancement](Nereids) datetime v2 type precision derive (#17079 )	2023-02-26 22:33:55 +08:00
jakevin	710529b060	[enhance](Nereids): refactor LogicalJoin. (#17099 )	2023-02-26 22:28:54 +08:00
Kang	7cb6c522b0	[Enhancement](array) vectorized string equal comparison in array_contains function Use StringRef instead of string_view operator == in the vectorized impl of the array_contains function. - test data: 10,000,000 rows with an ARRAY<STRING> column. Each row's array column has 10 elements with an average length of 11 chars. - test SQL: `select count() from test_like_array where array_contains(s_arr, 'xxxxxxxx');` - test result: 0.76 sec vs. 0.52 sec, 30% time reduced	2023-02-26 19:42:26 +08:00
Mingyu Chen	e9619368e9	[fix](s3) fix SdkClientException: Multiple HTTP implementations were found on the classpath (#17136 )	2023-02-26 15:32:43 +08:00
Pxl	6bb721d86b	[Chore](build) fix some warning on code generate and webui #17078 [WARNING:gensrc/thrift/parquet.thrift:22] Uncaptured doctext at on line 18. [WARNING:gensrc/thrift/parquet.thrift:23] Uncaptured doctext at on line 22. [WARNING:gensrc/thrift/parquet.thrift:436] Uncaptured doctext at on line 428. WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB). This can impact web performance WARNING in entrypoint size limit: The following entrypoint(s) combined asset size exceeds the recommended limit Warning : Macro "NonTerminator" has been declared but never used.	2023-02-26 13:01:19 +08:00
plat1ko	0251cb8941	[fix](cooldown) Handle re-add replica with cooldowned data #17047 Modify rule of choosing cooldown replica, only alive replica can be cooldown replica. Handle re-add replica with cooldowned data.	2023-02-26 12:36:55 +08:00
zxealous	a0782a1855	[fix](file reader) fix be core in broker file reader (#17039 ) A const reference member variable of the class stored a temporary object, which can no longer be accessed after the temporary object is destroyed. This caused a BE core dump when debug level logging was enabled, because _broker_addr had already been destroyed in BrokerFileReader.	2023-02-26 12:35:31 +08:00
zhangstar333	94927b3b1c	[vectorized](bug) fix open fold constant cause be core dump (#17055 ) Add a defer in fold constant to call close. Support more types when calling the _get_result function in fold constant. Fix `in` not handling null, e.g. select 1 in (2, NULL, 1);. In java udf, jni_ctx can be nullptr, so calling close causes a core dump.	2023-02-26 12:30:03 +08:00
奕冷	5018223176	[Enhancement](stmt-forward) better error msg for follower fe #17132 The error log msg for the FE follower's forward to master failure is ambiguous as seen, so we should clarify it.	2023-02-26 12:28:33 +08:00
奕冷	605d840231	[improvement](log)enhance log msg of finding be policy failure (#17134 )	2023-02-26 11:52:25 +08:00
xueweizhang	d3a7cb8bde	[fix](stream_load) can abort 2pc stream load when table dropped #17088 When a stream load uses 2pc and the table is dropped before commit, both commit and abort return an error and the transaction cannot finish. The error for commit or abort is: { "status": "ANALYSIS_ERROR", "msg": "errCode = 7, detailMessage = unknown table, tableId=52579" } After this PR, the abort succeeds.	2023-02-26 11:20:41 +08:00
Bowen Liang	d8eb3ec6f7	fix set command example to `enable_pipeline_engine` (#17103 )	2023-02-26 11:06:04 +08:00
caoliang-web	14e80b18c8	Add csv file header filter documentation example (#17115 )	2023-02-26 11:05:45 +08:00
wangtianyi2004	32d08c9556	Update run-docker-cluster.md (#17116 )	2023-02-26 11:05:28 +08:00
ZhangJian He	8e179d3a54	[minor][typo] fix typo in load-clickbench-data script (#17133 )	2023-02-26 10:56:04 +08:00
Tiewei Fang	3a9aa03aab	[BugFix](oracle-catalog) Modify the doris data type mapping of oracle `NUMBER(p,s)` type (#17051 ) The oracle data type `NUMBER(p,s)` differs semantically from the doris decimal type. For the Oracle Number(p,s) type: 1. if s<0, it is an integer. This `NUMBER(p,s)` has (p+|s|) significant digits, and rounding is performed at position s. eg: if we insert 1234567 into a `NUMBER(5,-2)` type, oracle stores 1234600. In this case, Doris uses an int type (`TINYINT/SMALLINT/INT/.../LARGEINT`). 2. if s>=0 && s<p, it behaves just like doris Decimal(p,s). 3. if s>=0 && s>p, it is a decimal (like 0.xxxxx). p is the number of significant digits that can be stored after the leading zeros that follow the decimal point, and the value is rounded at the s-th digit after the decimal point. eg: we can not insert 0.0123456 into a `NUMBER(5,7)` type, because there must be two zeros on the right side of the decimal point, but we can insert 0.0012345 into a `NUMBER(5,7)` type. In this case, Doris uses `DECIMAL(s,s)`. 4. if p and s are not specified, as in plain `NUMBER`, the p and s of `NUMBER` are uncertain. In this case, doris can not determine p and s, so doris can not determine the data type.	2023-02-26 09:05:41 +08:00
Tiewei Fang	f6ce072297	[Enhancement](csv-reader) Optimize csv_reader `_split_value` and fix json_reader case sensitive (#17093 ) 1. Enhancement: For a single-character column separator, csv_reader uses another method of `split value`. 2. BugFix: Set `json` file format loading to be case sensitive.	2023-02-26 09:03:04 +08:00
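
As a rough illustration of the Oracle `NUMBER(p,s)` mapping described in commit 3a9aa03aab, the four cases can be sketched in Python as below. This is not the Doris implementation: the function name `map_oracle_number`, the `None` return for an undeterminable type, and the digit thresholds used to pick an integer type are all assumptions made for the example. The commit message only says that an integer type from TINYINT to LARGEINT is chosen.

```python
from typing import Optional

# ASSUMPTION: number of decimal digits that always fits in each integer type.
# The commit message does not state the thresholds Doris actually uses.
_INT_TYPES = [(2, "TINYINT"), (4, "SMALLINT"), (9, "INT"),
              (18, "BIGINT"), (38, "LARGEINT")]


def map_oracle_number(p: Optional[int], s: Optional[int]) -> Optional[str]:
    """Hypothetical helper: map Oracle NUMBER(p,s) to a Doris type name."""
    # Case 4: plain NUMBER, precision and scale unknown, type undeterminable.
    if p is None or s is None:
        return None
    # Case 1: negative scale means an integer with p + |s| significant digits.
    if s < 0:
        digits = p + abs(s)
        for capacity, name in _INT_TYPES:
            if digits <= capacity:
                return name
        return None  # too wide for any integer type in this sketch
    # Case 2: 0 <= s < p behaves like DECIMAL(p,s).
    if s < p:
        return f"DECIMAL({p},{s})"
    # Case 3: s > p maps to DECIMAL(s,s); s == p yields the same type either way.
    return f"DECIMAL({s},{s})"


if __name__ == "__main__":
    for args in [(5, -2), (10, 2), (5, 7), (None, None)]:
        print(args, "->", map_oracle_number(*args))
```

The commit message leaves `s == p` unstated, but both branches agree there, since `DECIMAL(p,s)` and `DECIMAL(s,s)` are the same type when the two values are equal.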

8965 Commits