doris

Author	SHA1	Message	Date
gnehil	c78341b728	[improvement](spark-load) support datev2 and datetimev2 #21839	2023-07-24 09:07:53 +08:00
shee	ff9811fa1b	[Bug][Colocate] when adding a table to the colocate group, we should check that the number of buckets per partition is the same (#21906 ) For example: CREATE TABLE `colocate_a` ( dt date, k1 int, v1 int ) ENGINE=OLAP DUPLICATE KEY(`k1`) PARTITION BY RANGE(`dt`) (PARTITION p1 VALUES [('2022-10-02'), ('2022-10-03')) DISTRIBUTED BY HASH(`k1`) BUCKETS 2 PROPERTIES ( "replication_num" = "3", "in_memory" = "false", "storage_format" = "V2" ); ALTER TABLE colocate_a set ("colocate_with" = "ab"); CREATE TABLE `colocate_b` ( dt date, k1 int, v1 int ) ENGINE=OLAP DUPLICATE KEY(`k1`) PARTITION BY RANGE(`dt`) (PARTITION p1 VALUES [('2022-10-02'), ('2022-10-03')) DISTRIBUTED BY HASH(`k1`) BUCKETS 2 PROPERTIES ( "replication_num" = "3", "in_memory" = "false", "storage_format" = "V2" ); ALTER TABLE colocate_b ADD PARTITION p2 VALUES [("2022-10-03"),("2022-10-04")) DISTRIBUTED BY HASH(k1) BUCKETS 10; ALTER TABLE colocate_b set ("colocate_with" = "ab"); Partition p2 of table colocate_b has a bucket num of 10, and the table is then added to group ab. ColocateTableCheckerAndBalancer.matchGroup then throws: java.lang.IllegalStateException: 2 vs. 10 at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[guava-30.0-jre.jar:?] at org.apache.doris.clone.ColocateTableCheckerAndBalancer.matchGroup(ColocateTableCheckerAndBalancer.java:242) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.clone.ColocateTableCheckerAndBalancer.runAfterCatalogReady(ColocateTableCheckerAndBalancer.java:95) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT] --------- Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>	2023-07-24 09:01:16 +08:00
赵立伟	d0219062ef	[refactor](be) use std::move to improve performance of push_back #22056	2023-07-24 08:51:28 +08:00
wuwenchi	64348055a1	[improvement](iceberg) Optimize the split to the user-specified size #22078 According to the specified split size, the split tasks are merged to keep a single task near the expected size.	2023-07-24 08:48:10 +08:00
Chenyang Sun	0396ac9d38	fix(compaction) release the block and segment iterator after reading to the end of the segment file (#22082 ) When reading to the end of the segment file, clearing the block did not release the memory, leading to high memory usage during compaction. When reading through segment file for columns that are dictionary encoded, the column iterator in the segment iterator will hold the dictionary. Release the segment iterator to free up the dictionary.	2023-07-24 08:47:19 +08:00
Mingyu Chen	0c811edb78	[deps](hadoop) update hadoop libs to 3.3.4.5 (#22062 )	2023-07-23 20:17:16 +08:00
Mingyu Chen	a5099a2d3b	[minor](log) print error msg to fe.out before log is initialized (#22106 ) An exception, for example one caused by a wrong config value, may be thrown before LOG is initialized. So we need to print it to fe.out, otherwise we can't know what's wrong. After this PR, the error can be found in fe.out, such as: ``` java.lang.NumberFormatException: For input string: "3g" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at org.apache.doris.common.ConfigBase.setConfigField(ConfigBase.java:253) at org.apache.doris.common.ConfigBase.setFields(ConfigBase.java:232) at org.apache.doris.common.ConfigBase.initConf(ConfigBase.java:146) at org.apache.doris.common.ConfigBase.init(ConfigBase.java:112) at org.apache.doris.DorisFE.start(DorisFE.java:101) at org.apache.doris.DorisFE.main(DorisFE.java:73) ```	2023-07-23 19:20:10 +08:00
Liqf	ddd7e9871d	[improvement](Jsonb) optimization Jsonb path parse (#21495 ) The previous logic read the jsonbvalue while parsing the json path. For complex json paths, this caused a lot of repeated parsing work. The optimization separates parsing the json path from reading the value.	2023-07-23 18:59:12 +08:00
Xin Liao	4f0158c458	[fix](partial-update) fix update core for merge-on-write table (#22090 )	2023-07-23 13:35:08 +08:00
yiguolei	2c16fe0da9	[bugfix](runtimefilter) runtime filter is shared between multi instances with same node id, should not cache exprs (#22114 ) A runtime filter is shared among multiple instances. In the past, we cached the pushdown expr (generated by the runtime filter). Every scan node (runtime filter consumer) tries to call prepare on the expr, but the expr may have been generated with a different fn_context_id. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-07-23 13:04:33 +08:00
zhangstar333	256051a965	[bug](node) fix partition sort node core dump when eos (#22108 )	2023-07-23 12:00:53 +08:00
Siyang Tang	22aa54e335	[enhancement](config) enlarge max_bytes_per_broker_scanner to 5G #22099	2023-07-23 12:00:32 +08:00
zhangdong	eceb30f47e	[doc](catalog)paimon doc (#21966 ) code pr: #21910	2023-07-23 11:24:40 +08:00
zhangdong	dfb5d4bc13	[fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941 ) Supplement to #21104	2023-07-23 11:24:20 +08:00
yiguolei	f8307f1a1a	[bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117 ) Co-authored-by: yiguolei <yiguolei@gmail.com> If the scanner fails during init or open, there is no need to update counters, because the query has failed and the counters are useless. Updating counters may also core dump. For example, updating counters depends on the scanner's tablet, but tablet == null when init failed.	2023-07-23 10:19:20 +08:00
caiconghui	8cb532230a	[fix](metric) fix prometheus metric format error (#22045 ) we should define metric name only once like following: # HELP doris_fe_query_latency_ms # TYPE doris_fe_query_latency_ms summary doris_fe_query_latency_ms{quantile="0.75"} 1.0 doris_fe_query_latency_ms{quantile="0.95"} 2.0 doris_fe_query_latency_ms{quantile="0.98"} 100.0 doris_fe_query_latency_ms{quantile="0.99"} 100.0 doris_fe_query_latency_ms{quantile="0.999"} 100.0 doris_fe_query_latency_ms{quantile="0.75",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.95",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.98",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.99",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.999",user="default_cluster:test1"} 1.0	2023-07-22 22:38:29 +08:00
Pxl	0755fd16d8	remove create hot partition failed check (#22093 )	2023-07-22 17:47:46 +08:00
amory	3d0f952934	[FIX](complex-type)delete enable_map/struct_type switch #21957	2023-07-22 15:29:32 +08:00
Pxl	ae809fbeba	[Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969 ) fix dead lock when create_tablet need lock two tablet && update mv_p0/ssb case	2023-07-22 15:27:05 +08:00
zhannngchen	50c8563f35	[fix](partial update) fix some bugs of sequence column (#21896 )	2023-07-22 15:26:48 +08:00
Jibing-Li	355ac18363	[Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998 ) JdbcScanNode needs the conjuncts to generate sql in the finalize function, but the conjuncts have not yet been passed to JdbcScanNode when finalize is called. This pr passes the conjuncts to the scan node before they are used, to avoid scanning the whole table.	2023-07-22 14:08:44 +08:00
zy-kkk	42ec92fd12	[enhancement](jdbc catalog) Add sqlserver jdbc url param `useBulkCopyForBatchInsert=true` (#22032 ) When useBulkCopyForBatchInsert=false, the JDBC driver will not use SQL Server's Bulk Copy API for batch insertions. Thus, during the batch insertion process, each insert statement needs to be individually sent to the SQL Server, leading to a higher number of network roundtrips. Network latency could potentially become a significant factor contributing to performance degradation. For this reason, we recommend setting this parameter to true by default to enhance the performance of PreparedStatement batch insertions. In this manner, when performing batch insertions, the JDBC driver will send all insertion data to SQL Server in one go via the Bulk Copy API, rather than sending each insert statement individually. This can significantly reduce the number of network roundtrips, thereby improving performance. Please note that this option is only effective for fully parameterized INSERT statements. If your INSERT statement is mixed with other SQL statements, or if it contains values specified directly in the statement, then the JDBC driver will not use the Bulk Copy API, but instead will use the standard insert method.	2023-07-22 11:32:21 +08:00
amory	f7e3cc1553	[FIX](map)fix map proto contains_null #22107 When we select a map with order by and limit, the BE node will core dump.	2023-07-22 10:41:55 +08:00
Jibing-Li	82f5a3f684	[Fix] (multi catalog)Fix external table couldn't find db bug (#22074 ) The getDatabase function of Nereids LogicalCatalogRelation and PhysicalCatalogRelation only searches InternalCatalog to find a table. This causes every query on an external table to fail, because the external database cannot be found in the internal catalog. ``` mysql> explain select count(*) from multi_partition_orc; ERROR 1105 (HY000): AnalysisException, msg: Database [default_cluster:multi_partition] does not exist. ``` This pr uses the catalog name to find the correct catalog first, and then tries to get the database in that catalog.	2023-07-22 00:13:26 +08:00
starocean999	93f9a8cbf5	[fix](nereids)PredicatePropagation only support integer types for now (#22096 )	2023-07-21 23:40:08 +08:00
catpineapple	32fce013f7	[feature](docs) add docs dbt-doris adapter (#22067 )	2023-07-21 23:34:47 +08:00
xzj7019	0b1c82b021	[opt](nereids) enhance runtime filter pushdown (#21883 ) Currently a runtime filter can't be pushed down into complicated plan patterns, such as a set operation as a join child, or a cte sender acting as a filter before shuffling. This pr refines the pushdown ability so that the filter can be pushed into different plan tree layers recursively, such as nested subquery, set op, cte sender, etc.	2023-07-21 23:31:30 +08:00
zhangstar333	afeac4419f	[Bug](node) fix partition sort node forget handle some type of key in hashmap (#22037 ) * [enhancement](repeat) add filter in repeat node in BE * update	2023-07-21 23:30:40 +08:00
Chenyang Sun	f7ac827c90	[fix](compaction) fix time series compaction point policy (#21670 )	2023-07-21 23:09:02 +08:00
YueW	ef01988ae1	[opt](inverted index) support the same column create different type index (#21972 )	2023-07-21 23:02:39 +08:00
starocean999	acf4aa2818	[fix](planner)shouldn't force push down conjuncts for union statement (#22079 ) * [fix](planner)shouldn't force push down conjuncts for union statement	2023-07-21 21:12:56 +08:00
Mingyu Chen	85cc044aaa	[feature](create-table) support setting replication num for creating table operation globally (#21848 ) Add a new FE config `force_olap_table_replication_num`. If this config is larger than 0, the replication num of a table is forced to this value when the table is created. The default is 0, which has no effect. This config only affects the create olap table operation; other operations such as `add partition` and `modify table properties` are not affected. The motivation is that most regression test cases create tables with a single replica, which lets the regression tests run well in the p0 and p1 pipelines. But we also need to run these cases in a multi-backend Doris cluster, so we need test cases with multiple replicas, and it is hard to modify each test case. With this config, we can simply set it to create all tables with the specified replication number.	2023-07-21 19:36:04 +08:00
Siyang Tang	e489b60ea3	[feature](load) support line delimiter for old broker load (#22030 )	2023-07-21 19:31:19 +08:00
谢健	b76d0d84ac	[enhancement](Nereids) support other join framework in DPHyper (#21835 ) Implement the CD-A algorithm in order to support other joins in DPHyper. The algorithm details are in the paper "On the Correct and Complete Enumeration of the Core Search Space".	2023-07-21 18:31:52 +08:00
Kaijie Chen	bed940b7fc	[fix](log) column index off-by-one error in scanner logs (#19747 )	2023-07-21 18:30:01 +08:00
mch_ucchi	7cac36d9e8	[chore](Nereids) fix typo in some plan visitor (#21830 )	2023-07-21 18:22:20 +08:00
Dongyang Li	37f230ee3e	[pipeline](regression) do not run build if only modified regression conf (#22075 ) in order to fast exclude cases that block regression pipeline.	2023-07-21 17:13:28 +08:00
Calvin Kirs	c3663c5ff1	[Fix](Sonar)sonar not working due to changing thrift code generation … (#22076 )	2023-07-21 17:08:48 +08:00
lihangyu	40299d280d	[Fix](json reader) fix rapidjson `array->PushBack` may take ownership… (#21988 ) With the json paths below `["$.data","$.data.datatimestamp"]` After `array_obj->PushBack`, ownership of the `data` field is transferred to array_obj, which leads to null values for json path `$.data.datatimestamp` Rapidjson doc: ``` //! Append a GenericValue at the end of the array. \note The ownership of \c value will be transferred to this array on success. */ GenericValue& PushBack(GenericValue& value, Allocator& allocator); ```	2023-07-21 17:02:01 +08:00
HHoflittlefish777	d1c5025bce	[Fix](compaction) add error message when load unsupport compression code (#22033 )	2023-07-21 16:50:46 +08:00
plat1ko	6b20cdb170	[Fix](compaction) Fix SizeBasedCumulativeCompactionPolicy pick_input_rowsets (#21732 )	2023-07-21 16:48:53 +08:00
bobhan1	2b2ac10e93	[feature](partial update) add failure tolerance for strict mode partial update stream load	2023-07-21 16:46:44 +08:00
yagagagaga	63b17bc7ba	[typo](docs) fix some mistake in Doris & Spark Column Type Mapping (#19998 )	2023-07-21 16:37:51 +08:00
wudi	67a3f37779	[doc](routineload)add routine load ssl example for access ali-kafka (#21877 )	2023-07-21 16:03:10 +08:00
Xin Liao	db69af1165	[fix](merge-on-write) fix query result wrong when schema change (#22044 )	2023-07-21 15:29:04 +08:00
Kaijie Chen	e4c6b9893a	[improve](load) add more profiles in tablets channel (#21838 )	2023-07-21 13:59:15 +08:00
bobhan1	732e0d14ff	[Enhancement](window-funnel)add different modes for window_funnel() function (#20563 )	2023-07-21 13:57:27 +08:00
yujun	94e2c3cf0f	[fix](tablet clone) sched wait slot if has be path (#22015 )	2023-07-21 13:27:40 +08:00
bobhan1	74313c7d54	[feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036 )	2023-07-21 13:24:41 +08:00
ZenoYang	6512893257	[refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026 ) * [refactor](vectorized) Remove useless control variables to simplify aggregation node code * fix	2023-07-21 12:45:23 +08:00
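
The Prometheus format rule described in commit 8cb532230a (each metric name gets one HELP/TYPE header, followed by all of its samples) can be illustrated with a minimal Python sketch. This is not the Doris FE code, which is Java; `render_metrics` and its parameters are hypothetical names chosen for this example.

```python
def render_metrics(samples, metric_type="summary"):
    """Render (name, labels, value) samples in Prometheus text format.

    Hypothetical helper for illustration only. Samples that share a metric
    name are grouped so the HELP/TYPE header is written once per name.
    """
    grouped = {}
    for name, labels, value in samples:
        # dicts preserve insertion order, so metrics keep first-seen order
        grouped.setdefault(name, []).append((labels, value))

    lines = []
    for name, entries in grouped.items():
        lines.append(f"# HELP {name}")
        lines.append(f"# TYPE {name} {metric_type}")
        for labels, value in entries:
            label_text = ",".join(f'{k}="{v}"' for k, v in labels.items())
            if label_text:
                lines.append(f"{name}{{{label_text}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_metrics([
        ("doris_fe_query_latency_ms", {"quantile": "0.75"}, 1.0),
        ("doris_fe_query_latency_ms",
         {"quantile": "0.75", "user": "default_cluster:test1"}, 1.0),
    ]))
```

Grouping before rendering is what keeps the per-user samples under the same header as the global ones, instead of repeating the header for each label set.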

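The check motivated by commit ff9811fa1b (every partition of a table must have the same bucket number as the colocate group before the table joins it) can be sketched as follows. This is a simplified Python illustration under assumed data shapes, not the actual Doris implementation; both function names are hypothetical.

```python
def find_bucket_mismatches(group_bucket_num, partition_bucket_nums):
    """Return the partitions whose bucket number differs from the group's.

    partition_bucket_nums maps partition name -> bucket number
    (an assumed shape for this sketch).
    """
    return {
        name: num
        for name, num in partition_bucket_nums.items()
        if num != group_bucket_num
    }


def check_can_join_group(group_bucket_num, partition_bucket_nums):
    """Reject the table up front instead of failing later in a background check."""
    mismatches = find_bucket_mismatches(group_bucket_num, partition_bucket_nums)
    if mismatches:
        raise ValueError(
            f"bucket num mismatch, group expects {group_bucket_num}: {mismatches}"
        )


if __name__ == "__main__":
    # Mirrors the commit's example: p1 has 2 buckets, p2 has 10.
    try:
        check_can_join_group(2, {"p1": 2, "p2": 10})
    except ValueError as err:
        print(err)
```

Validating at ALTER time surfaces the problem to the user who issued the statement, rather than as an IllegalStateException inside the periodic checker.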
12012 Commits