doris

Author	SHA1	Message	Date
YueW	c19e35116b	[fix](inverted index)fix transaction id not unique for one index change job when light index change (#21180 )	2023-06-26 19:54:05 +08:00
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160 ) 1. Support dynamic schema to filter illegal data. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimensional arrays by default, since some bugs remain unresolved	2023-06-26 19:32:43 +08:00
yagagagaga	05d94e5a4c	[typo](docs) add a create table as select sample (#21078 )	2023-06-26 19:27:05 +08:00
yuanyuan8983	eb2a08bdf2	[typo](docs) Update the audit document (#21185 )	2023-06-26 19:25:10 +08:00
YueW	65d81c04e6	[Docs](inverted index) update docs for build index (#21184 )	2023-06-26 19:24:44 +08:00
zy-kkk	839ad8786a	[typo](docs) improvement SQL manual ddl drop doc (#21188 )	2023-06-26 18:51:28 +08:00
zy-kkk	986f3b2176	[typo](docs) improvement SQL manual ddl alter doc (#21179 )	2023-06-26 18:17:01 +08:00
zy-kkk	5ebac73a93	[typo](docs) improvement SQL manual ddl create doc (#21181 )	2023-06-26 18:16:50 +08:00
zy-kkk	9c5a0cc471	[bug](jdbc catalog) fix getPrimaryKeys fun bug (#21137 )	2023-06-26 17:13:50 +08:00
jakevin	cdc2d42c3a	[refactor](Nereids): adjust order of rewrite rules. (#21133 ) Put the rules that eliminate plans first so that they do not block other rules, which avoids invoking pushdown filter/limit again	2023-06-26 16:47:33 +08:00
HappenLee	5fdd9b9254	[Bug](RuntimeFiter) Fix bf error change the murmurhash to crc32 in regression test p2 (#21167 )	2023-06-26 16:39:45 +08:00
starocean999	102b7f8873	remove useless case (#21166 )	2023-06-26 16:27:32 +08:00
starocean999	f2ed1bce1a	[fix](nereids)change PushdownFilterThroughProject post processor from bottom up to top down rewrite (#21125 ) 1. pass physicalProperties in withChildren function 2. use top down traverse in PushdownFilterThroughProject post processor	2023-06-26 15:34:41 +08:00
YueW	960e04b0ed	[fix](inverted index) fix build inverted index failed but not return immediately (#21165 )	2023-06-26 14:05:12 +08:00
slothever	2b3c82f57a	[fix](multi-catalog)fix max compute scanner OOM and datetime (#20957 ) 1. Fix MC jni scanner OOM 2. add the second datetime type for MC SDK timestamp 3. make s3 uri case insensitive by the way 4. optimize max compute scanner parallel model	2023-06-26 13:53:29 +08:00
slothever	d4240ac21b	[fix](multi-catalog)add oss sdk, supported oss properties (#21029 )	2023-06-26 13:00:44 +08:00
Armando Zhu	5d2b69b06d	[Enhancement](regression) let test case fail fast when job is cancelled (#20578 ) (#21103 ) In doris regression-test/suites, a lot of test cases quit immediately only if "FINISHED", otherwise they will wait till timeout. For example: while (max_try_secs--) { String res = getJobState(tbName1) if (res == "FINISHED") { sleep(3000) break } else { Thread.sleep(1000) if (max_try_secs < 1) { println "test timeout," + "state:" + res assertEquals("FINISHED", res) } } } This PR added checks so that these test cases can quit immediately also if "CANCELLED", which is the only unchanging status other than "FINISHED".	2023-06-26 12:58:51 +08:00
ZhangYu0123	66005570c9	[fix](regression) fix p1 test_backup_restore fail caused by http download 401 invalid token error #21107	2023-06-26 12:56:46 +08:00
Mingyu Chen	1dec592e91	[improvement](fs_bench) optimize the usage of fs benchmark tool for hdfs (#21154 ) Optimize the usage of fs benchmark tool: 1. Remove `Open` benchmark, it is useless. 2. Remove `Delete` benchmark, it is dangerous. 3. Add `SingleRead` benchmark, user can specify an exist file to test read operation: `sh bin/run-fs-benchmark.sh --conf=conf/hdfs_read.conf --fs_type=hdfs --operation=single_read` 4. Modify the `run-fs-benchmark.sh`, remove `OPTS` section, use options in `fs_benchmark_tool` directly 5. Add some custom counters in the benchmark result, eg: ``` -------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------------------------------------------------- HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6864 ms 2385 ms 1 ReadRate=200.936M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3919 ms 1828 ms 1 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3839 ms 1819 ms 1 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean 4874 ms 2011 ms 3 ReadRate=304.054M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median 3919 ms 1828 ms 3 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev 1724 ms 324 ms 3 ReadRate=89.3768M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv 35.37 % 16.11 % 3 ReadRate=29.40% HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max 6864 ms 2385 ms 3 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min 3839 ms 1819 ms 3 ReadRate=200.936M/s ``` - For `open_read` and `single_read`, add `ReadRate` as `bytes per second`. - For `create_write`, add `WriteRate` as `bytes per second`. 
- For `exists` and `rename`, add `ExistsCost` and `RenameCost` as `time cost per one operation`.	2023-06-26 11:37:14 +08:00
Mingyu Chen	1138ed1d70	[doc](catalog) update and improve doc of multi catalog (#21105 ) Update the document of multi catalog feature.	2023-06-26 11:36:44 +08:00
Kang	2e6d91aa99	[chore](block) temporarily disable DCHECK for column name equality in MutableBlock (#21116 ) * temporarily disable DCHECK for column name equality in MutableBlock::add_rows * num columns EQ to LE	2023-06-26 10:49:27 +08:00
yiguolei	28abeef72b	[performace](colddata) opt cold data read performance (#21141 ) In #10370, we tried to optimize string evaluation performance by rewriting the predicate using dict values. That requires checking whether the string column is fully dict-encoded, so logic was added to read the last page of the string column to check it. This performs poorly on cold data because it has to load the column's ordinal index and zone map index. In some scenarios, for example select * from table where pk_col=1, the query condition is on the primary key, so the result may be just a few rows but may have 100 columns, and loading these indices costs a lot of time. Much of that time shows up in block_init_time. In my test, on a table with 50 string columns queried by primary key, the first read time dropped from 220ms to 40ms.	2023-06-26 10:39:20 +08:00
TengJianPing	baf9a2107b	[fix](regression) fix case failure by adding sync after stream load (#21155 )	2023-06-26 10:38:46 +08:00
Xinyi Zou	6f7759b08d	[fix](memory) fix mem tracker grace exit (#21136 )	2023-06-26 10:28:24 +08:00
zy-kkk	880252984b	[typo](docs) fix jdbc catalog doc example err (#21152 )	2023-06-26 10:14:17 +08:00
caiconghui	f8ef4ed18f	[fix](log4j) fix some issues when modify log config (#21099 ) Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-06-26 08:46:33 +08:00
Mingyu Chen	af51a31c21	[deps](benchmark) bump benchmakr from 1.5.6 -> 1.8.0 (#21121 ) To support some new methods used in #21074	2023-06-25 23:42:54 +08:00
Pxl	0122aa79df	[Chore](vectorized) remove all isVectorized (#21076 ) isVectorized is always true now	2023-06-25 23:13:34 +08:00
starocean999	58b3e5ebdb	[fix](nereids)scan node's smap should use materiazlied slots and project list as left and right expr list (#21142 )	2023-06-25 22:34:43 +08:00
yuxuan-luo	8f7a62c79b	[improvement](mutil-catalog) PaimonColumnValue support short and Decimal (#20723 )	2023-06-25 22:31:38 +08:00
Xiangyu Wang	2c2d56e8a0	[Feature](broker-load) Add priority info for ShowLoadStmt. (#20984 ) Following pr #20628 , add priority information of the load job.	2023-06-25 22:11:21 +08:00
airborne12	1ac8cdec7e	[Fix](inverted index) fix inverted query cache for chinese tokenizer (#21106 ) 1. query cache for chinese tokenizer is confusing when just converting w_char to char. 2. separate query_type from inverted_index_reader to clean code.	2023-06-25 22:04:02 +08:00
yiguolei	64790a3a86	[bugfix](workloadgroup) could not upgrade from 2.0 alpha (#21149 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-25 22:02:53 +08:00
minghong	2d1163c4d8	[refactor](nereids) update Agg stats derive method #21036 This pr has no effect on tpch queries. Some tpcds queries are impacted. They are 4/11/23/24/47/51/57/65/74, in which 4 and 51 are improved	2023-06-25 21:47:32 +08:00
minghong	34b048a2bd	[fix](nereids) update outer join estimation #21126 the row count of left outer join should be no less than left child row count.	2023-06-25 21:37:55 +08:00
Xiangyu Wang	af2b67e65a	[Fix](multi-catalog) Invalidate cache when enable auto refresh catalog. (#21070 ) The default value of RefreshCatalogStmt.invalidCache is false now, but the RefreshManager.RefreshTask does not invoke RefreshCatalogStmt.analyze() so it will not invalidate the cache. This pr mainly fix this problem	2023-06-25 19:14:44 +08:00
AKIRA	638aa41988	[fix](planner) fix push filter through agg #21080 In the previous implementation, the check for groupby exprs was ignored. Add this necessary check to make sure it works. You can reproduce it by running the SQL below: CREATE TABLE t_push_filter_through_agg (col1 varchar(11451) not null, col2 int not null, col3 int not null) UNIQUE KEY(col1) DISTRIBUTED BY HASH(col1) BUCKETS 3 PROPERTIES( "replication_num"="1" ); CREATE VIEW `view_i` AS SELECT `b`.`col1` AS `col1`, `b`.`col2` AS `col2` FROM ( SELECT `col1` AS `col1`, sum(`cost`) AS `col2` FROM ( SELECT `col1` AS `col1`, sum(CAST(`col3` AS INT)) AS `cost` FROM `t_push_filter_through_agg` GROUP BY `col1` ) a GROUP BY `col1` ) b; SELECT SUM(`total_cost`) FROM view_a WHERE `dt` BETWEEN '2023-06-12' AND '2023-06-18' LIMIT 1;	2023-06-25 19:14:20 +08:00
Kang	69d5adaee3	[Improvement](doc) improve ngram and inverted index documents #21091	2023-06-25 19:13:41 +08:00
Hong Liu	ee2492dd78	[typo](doc)fix delete table associate to other table only support unique model (#21129 ) Co-authored-by: smallhibiscus <844981280>	2023-06-25 19:04:27 +08:00
shuke	55e7af1e31	[fix](test) fix two case bug #21124	2023-06-25 18:53:20 +08:00
starocean999	b6c9feb458	[fix](nereids) check table privilege when it's needed (#21130 ) check privilege on LogicalOlapScan, LogicalEsScan, LogicalFileScan and LogicalSchemaScan	2023-06-25 18:35:39 +08:00
Siyang Tang	46f0295b78	[feature](load-refactor-with-tvf) S3 load with S3 tvf and native insert (#19937 )	2023-06-25 17:45:31 +08:00
AKIRA	771b0cbb4c	[fix](stats) Update analyze task execute time (#21026 ) Before this PR, last_execute_time of pending analyze jobs would be 1970-01-01. You can reproduce it by running show analyze	2023-06-25 15:52:33 +08:00
AKIRA	cf66280e60	[opt](stats) Sampling when aggregate column stats (#21020 ) In the previous implementation, when aggregating partition statistics into column statistics, the calculation of distinct values (ndv) for the entire column was performed without using sampling, resulting in reduced efficiency of the sampling process. Before this PR analyze below table which has 1000000 lines would cost 5.75sec, after this PR, it would cost 3.39sec. ```sql CREATE TABLE IF NOT EXISTS `duplicate_all` ( `k3` int(11) null comment "", `k0` boolean null comment "", `k1` tinyint(4) null comment "", `k2` smallint(6) null comment "", `k4` bigint(20) null comment "", `k5` decimalv3(9, 3) null comment "", `k6` char(36) null comment "", `k10` date null comment "", `k11` datetime null comment "", `k7` varchar(64) null comment "", `k8` double null comment "", `k9` float null comment "", `k12` string null comment "", `k13` largeint(40) null comment "" ) engine=olap DUPLICATE KEY(`k3`) DISTRIBUTED BY HASH(`k3`) BUCKETS 5 properties("replication_num" = "3") ```	2023-06-25 15:52:01 +08:00
AKIRA	dd99468b8f	[fix](stats) Fix jdbc timeout with multiple FE when execute analyze table (#21115 ) SQL may be forwarded to the master for execution when the client is connected to a follower node, so the result should be set to `StmtExecutor#proxyResultSet`. Before this PR, in the above scenario, submitting analyze SQL through the mysql client or jdbc would return a malformed packet / Communication failed error.	2023-06-25 15:49:36 +08:00
Lijia Liu	76bdcf1d26	[improvement](pipeline) task group scan entity (#19924 )	2023-06-25 14:43:35 +08:00
mch_ucchi	80d54368e0	[minor](Nereids) replace some nullable field to Optional (#20967 )	2023-06-25 12:02:25 +08:00
Mryange	6896776034	[test](regression) update some case in p2 (#21094 ) update some case in p2	2023-06-25 11:16:56 +08:00
yiguolei	207bc53b06	[functionpushdown](performance) move function pushdown as default false since its performance is not good (#21111 ) set enable function pushdown default to false. enable it in fuzzy mode to test this feature. We should remove function pushdown in the future since we already have common expr pushdown. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-25 10:36:20 +08:00
didiaode18	20b92b0812	[Feature](log)friendly hint for creating table failed (#20617 )	2023-06-25 10:02:26 +08:00

11475 Commits