doris

Author	SHA1	Message	Date
mch_ucchi	05bdbce8fc	[Feature](Nereids) support update unique table statement (#20313 )	2023-06-06 20:32:43 +08:00
Chengpeng Yan	ae428c29e2	[feature](planner)(nereids) support user defined variable (#20334 ) Support user-defined variables. After this PR, we can use `set @a = xx` to define a user variable and use it in the query like `select @a`. the changes of this PR: 1. Support the grammar for `set user variable` in the parser. 2. Add the `userVars` in `VariableMgr` to store the user-defined variables. 3. For the `set @a = xx`, we will store the variable name and its value in the `userVars` in `VariableMgr`. 4. For the `select @a`, we will get the value for the variable name in `userVars`.	2023-06-06 14:35:16 +08:00
amory	1f032a551d	[Improve](array-functions) support array first function (#20397 ) add array_first(lambda, [1,2,3,null]) function for doris	2023-06-06 12:08:46 +08:00
TengJianPing	1b94b6368f	[fix](load) in strict mode, return error for insert if datatype convert fails (#20378 ) * [fix](load) in strict mode, return error for load and insert if datatype convert fails Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)" This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87, since it changed other behaviours: e.g. in strict mode, insert into t_int values ("a") results in 0 being inserted into the table, when it should return an error instead. * fix be ut * fix regression tests	2023-06-06 12:04:03 +08:00
morrySnow	e553615a27	[opt](Nereids) prefer to use datev2 / datetimev2 in date related functions (#20224 ) 1. update the order of all date related functions' signatures. 1.1. if the return value needs to be computed with time info, args with datetimev2 are at the top of the list, followed by datev2, datetime and date 1.2. if the return value needs to be computed with only date info, args with datev2 are at the top of the list, followed by datetimev2, date and datetime 2. Prefer datev2 if we must cast date to datev2 or datetime/datetimev2	2023-06-06 11:42:29 +08:00
Mryange	2fc1141c5f	[test](regression) update some case in p2 (#20436 )	2023-06-06 11:05:56 +08:00
Yang, Xu	d02737a293	[feature](struct-type) support struct_element function (#19045 ) This commit supports a function that returns a field column of a named struct column. Since the function can return any type, this commit also supports ANY_STRUCT_TYPE and ANY_ELEMENT_TYPE.	2023-06-06 10:44:08 +08:00
Mingyu Chen	f839c90c27	[fix][refactor](backend-policy)(compute) refactor the hierarchy of external scan node and fix compute node bug #20402 There should be 2 kinds of ScanNode: OlapScanNode ExternalScanNode The Backends used for ExternalScanNode should be controlled by FederationBackendPolicy. But currently, only FileScanNode is controlled by FederationBackendPolicy; other scan nodes such as MysqlScanNode and JdbcScanNode will use Mix Backend even if we enable and prefer to use Compute Backend. In this PR, I modified the hierarchy of ExternalScanNode, the new hierarchy is: ScanNode OlapScanNode SchemaScanNode ExternalScanNode MetadataScanNode DataGenScanNode EsScanNode OdbcScanNode MysqlScanNode JdbcScanNode FileScanNode FileLoadScanNode FileQueryScanNode MaxComputeScanNode IcebergScanNode TVFScanNode HiveScanNode HudiScanNode Previously, the BackendPolicy was a member of FileScanNode; now I have moved it to ExternalScanNode, so that all subtypes of ExternalScanNode can use BackendPolicy to choose Compute Backend to execute the query. All ExternalScanNode subtypes should implement the abstract method createScanRangeLocations(). For scan nodes like jdbc scan node/mysql scan node, the scan range locations will be selected randomly from compute nodes (if preferred). As for compute node selection: if all scan nodes are external scan nodes, and prefer_compute_node_for_external_table is set to true, the BE for this query will only select compute nodes.	2023-06-06 10:35:30 +08:00
yangshijie	0a90a9d507	[feature-wip](duplicate_no_keys) Add some test cases of all the duplicate tables in test case tpcds_sf100_dup_without_key_p2 and make them duplicate tables without keys (#20431 )	2023-06-05 21:04:41 +08:00
Xin Liao	05d497d21e	[fix](sequence) value predicates shouldn't be push down when has sequence column (#20408 ) * (fix)[sequence] value predicates shouldn't be push down when has sequence column * add case	2023-06-05 19:18:34 +08:00
mch_ucchi	fac0b50f56	[Fix](Planner)fix cast date/datev2/datetime to float/double return null. (#20008 )	2023-06-05 19:06:50 +08:00
minghong	92721c84d3	[improve](nereids)derive analytics node stats (#20340 ) 1. derive analytic node stats, add support for rank() 2. filter estimation stats derivation updated: update the row count of the filter column. 3. use ColumnStatistics.orginal to replace ColumnStatistics.orginalNdv, where ColumnStatistics.orginal is the column statistics obtained from TableScan. TPCDS 70 on tpcds_sf100 improved from 23 sec to 2 sec. This PR causes no performance regression on other tpcds queries and tpch queries.	2023-06-05 18:56:20 +08:00
starocean999	5b2efd196b	[fix](execution) result_filter_data should be filled by 0 when can_filter_all is true (#20438 )	2023-06-05 17:05:35 +08:00
jakevin	9d39fd7aae	[fix](Nereids): fix filter can't be pushdown unionAll (#20310 )	2023-06-05 16:56:25 +08:00
Gabriel	20bf309ffb	[test](p2) Fix p2 output (#20432 )	2023-06-05 15:58:04 +08:00
minghong	0dc6d3a568	[fix](nereids) avg size of column stats always be 0 (#20341 ) It takes a lot of effort to compute the avgSizeByte for col stats, so we use schema information to avoid computing the actual average size.	2023-06-05 13:01:58 +08:00
AKIRA	cd0379df4e	[fix](nereids) select with specified partition name is not work as expected (#20269 ) This PR is to fix the select specific partition issue, certain codes related to this feature were accidentally deleted.	2023-06-05 12:48:54 +08:00
ZhangYu0123	d03bb4ba7b	[Optimize](function) Optimize locate function by compare across strings (#20290 ) Optimize the locate function by comparing across strings. About 90% speedup when tested with sum().	2023-06-05 12:43:14 +08:00
starocean999	c6387847aa	[fix](nereids) change defaultConcreteType function's return value for decimal (#20380 ) 1. add default decimalv2 and decimalv3 for NullType 2. change defaultConcreteType of decimalv3 to this	2023-06-05 10:50:07 +08:00
amory	59a0f80233	[Improve](array-function)Improve array function intersect (#20085 ) Currently the array function supports only 2 arrays, but the intersect operator can support more than 2 arrays.	2023-06-05 10:38:48 +08:00
Jerry Hu	e90b78d783	[chore](regression) add case in test_delete (#20372 ) Add some cases of deletion conditions with numeric values.	2023-06-05 09:38:29 +08:00
Kang	ffadaa4935	[improvement](inverted index) skip write index on load and generate index on compaction (#20325 )	2023-06-03 16:03:21 +08:00
YueW	b62c5a70c7	[fix](match query) fix array column match query failed without inverted index (#20344 )	2023-06-02 21:10:12 +08:00
YueW	adc3acb283	[fix](match) fix match query with compound predicates return -6003 (#20361 )	2023-06-02 18:25:37 +08:00
zy-kkk	a20a6d2bea	[refactor](jdbc catalog) Refactor the JdbcClient code (#20109 ) This PR does the following: 1. This PR is a substantial refactor of the JDBC client architecture. The previous monolithic JDBC client has been refactored into an abstract base class `JdbcClient` and a set of database-specific subclasses (e.g., `JdbcMySQLClient`, `JdbcOracleClient`, etc.), and the config required by JdbcClient has been abstracted into an object. This allows for improved modularity, easier addition of support for new databases, and cleaner, more maintainable code. This change is backward-compatible and does not affect existing functionality. 2. As a result of the client refactoring, OceanBaseClient can automatically recognize the mode of operation as MySQL or Oracle, so we remove the oceanbase_mode property from the Jdbc Catalog. Because of this removal, when creating a single OceanBase Jdbc Table, the table type needs to be filled in as oceanbase (mysql mode) or oceanbase_oracle (oracle mode). Please note that this is a change in usage behavior. 3. For the PostgreSQL Jdbc Catalog, I did two things: 1. Added the adaptation to MATERIALIZED VIEW and FOREIGN TABLE 2. Fixed reading jsonb, which had been incorrectly changed to json in a previous PR 4. fix some jdbc catalog test cases 5. modify oceanbase jdbc doc And thanks to @wolfboys for the guidance	2023-06-02 17:58:10 +08:00
amory	d68f3f3b3d	[Feature](array-functions)improve array functions for array_last_index (#20294 ) Now we just support array_first_index for lambda input , but no array_last_index	2023-06-02 13:54:03 +08:00
Jerry Hu	8ff8705b3f	[fix](olap) deletion statement with space conditions did not take effect (#20349 ) Deletion statement like this: delete from tb where k1 = ' '; The rows whose k1's value is ' ' will not be deleted.	2023-06-02 13:52:57 +08:00
starocean999	a8a4da9b9e	[fix](nereids)dphyper join reorder may cache wrong project list for project node (#20209 ) * [fix](nereids)dphyper join reorder may cache wrong project list for project node	2023-06-02 09:35:28 +08:00
xueweizhang	ecdc5124be	[feature-wip](duplicate-no-keys) schema change support for duplicate no keys (#19326 )	2023-06-02 09:22:41 +08:00
HappenLee	608d2a3eca	[Bug](exec) push down no group by agg min cause error result (#20289 ) sql """ CREATE TABLE t1_int ( num int(11) NULL, dgs_jkrq bigint(20) NULL ) ENGINE=OLAP DUPLICATE KEY(num) COMMENT 'OLAP' DISTRIBUTED BY HASH(num) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); """ sql """insert into t1_int values(1,1),(1,2),(1,3),(1,4),(1,null);""" qt_sql """ select min(dgs_jkrq) from t1_int; """ Before the change we get the wrong result: 4. After the change we get the right result: 1.	2023-06-01 17:29:46 +08:00
Gabriel	a8b273ae31	[P2](test) Fix P2 output (#20311 )	2023-06-01 15:11:12 +08:00
Mryange	519f01133a	[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )	2023-06-01 13:09:58 +08:00
Jibing-Li	1b968c4ade	[fix](multi catalog)Fix nereids planner text format include extra column index bug (#20260 ) The Nereids planner includes all column indexes in TFileScanRangeParams, which may make the column projection incorrect for text format tables, because the csv reader uses the column index position to split a line and extra column indexes cause a wrong split result. This PR resets the column indexes after Projection, removing the useless ones.	2023-06-01 12:17:47 +08:00
mch_ucchi	cc41cb0e7e	[Fix](Nereids) fix some insert into select bugs (#20052 ) fix 3 bugs: 1. failed to insert into a table with mv. ```sql create table t ( id int, c1 int, c2 int, c3 int ) duplicate key(id) distributed by hash(id) buckets 4 create materialized view k12s3m as select id, sum(c1), max(c3) from t group by id; insert into t select -4, -4, -4, 'd'; ``` The insert raises an exception because the mv column is not handled. Now we add a target column with its value as defineExpr. 2. failed to insert into a table when not all the columns are specified. ```sql insert into t(c1, c2) select c1, c2 from t ``` With t(id ukey, c1, c2, c3), this inserts too much data; we fix it by changing the output partitions. 3. failed to insert into a table with a complex select. When the select statement has a join or agg, the bug is fixed in a way similar to the 2nd bug.	2023-06-01 12:15:19 +08:00
starocean999	68e593fbf1	[fix](nereids)(planner) case when should return NullLiteral when all case result is NullLiteral (#20280 )	2023-06-01 11:11:41 +08:00
lihangyu	9e21318834	[refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594 ) 1. make ColumnObject exception safe 2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema 3. add more test cases	2023-06-01 10:25:04 +08:00
LiBinfeng	65a75abecb	[Fix](Nereids) bitmap type should not be used in comparison predicate (#19807 ) When using Nereids, applying a comparison operator to a bitmap type should throw an analysis exception, e.g.: select id from (select BITMAP_EMPTY() as c0 from expr_test) as ref0 where c0 = 1 order by id Here c0 is a bitmap type, and this scenario is not supported right now.	2023-05-31 23:09:36 +08:00
YueW	6adb3fdf11	[fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258 ) if create inverted index without support_phrase property, remaining the match_phrase condition to filter by match function.	2023-05-31 18:09:50 +08:00
AKIRA	d93ff5d1ab	[fix](pipeline) Enable pipeline explicitly in the plan shape check cases. (#20221 ) enable pipeline explicitly in tpcds plan shape check	2023-05-31 14:40:24 +08:00
starocean999	1f22aa6961	[fix](nereids) like function's nullable property should be PropagateNullable (#20237 )	2023-05-31 12:13:38 +08:00
Jibing-Li	3f91127854	[fix](regression)Update external Brown test case out file. #20232 Update external Brown test case out file to match the new precision.	2023-05-31 09:21:04 +08:00
Gabriel	ff05217a1e	[regression](p0) fix test for `array_enumerate_uniq` (#20231 )	2023-05-30 22:14:19 +08:00
Ashin Gau	b7a69fbf4b	[test](regression) add regression test from materialized slot bug (#20207 ) The test query includes the conversion of string types to other types, and the processing of materialized columns for nested subqueries, which is the regression test for bug fix(#18783)	2023-05-30 21:23:05 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
Chengpeng Yan	a855253543	[fix](Nereids) filter should not push through union to OneRowRelation (#20132 ) ## Problem summary When we want to push the filter through the union, we should check whether the union's children are `OneRowRelation` or not. If some children are `OneRowRelation`, we shouldn't push the filter down to that part. Before this PR ``` mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1; +------+------+ | a | b | +------+------+ | 1 | 2 | | 3 | 3 | +------+------+ 2 rows in set (0.01 sec) ``` After this PR ``` mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1; +------+------+ | a | b | +------+------+ | 1 | 2 | +------+------+ 1 row in set (0.38 sec) ```	2023-05-30 17:06:52 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix create catalog with resource replay bug. If a user creates a catalog using `create catalog hive with resource xxx`, there is a bug when replaying the edit log: the resource may be dropped, causing an NPE, and FE will fail to start. In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true, so that `with resource` will not be allowed, and it will be deprecated later. Also fix the replay bug to avoid the NPE. 2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication. When a user creates 2 hive catalogs, one using simple auth and the other using kerberos auth, the query may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. This property will be added automatically when a user creates a hive catalog, to avoid such a problem. 3. Fix calling `hdfsExists()` issue. When `hdfsExists()` returns a non-zero return code, we should check whether it encountered an error or the file was not found. 4. Some code refactor. Avoid import `org.apache.parquet.Strings`	2023-05-30 16:57:39 +08:00
Mingyu Chen	49ce4e6fda	[fix](test) fix p2 broker load (#20196 )	2023-05-30 16:26:00 +08:00
Gabriel	631494e05d	[regression](decimalv3) Fix output for P1 regression (#20213 )	2023-05-30 15:21:29 +08:00
bobhan1	bb12a1cb49	[Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724 )	2023-05-30 13:09:19 +08:00
Mryange	94e1072d14	Revert "[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926 )" (#20204 ) This reverts commit 8ca4f9306763b5a18ffda27a07ab03cc77351e35.	2023-05-30 10:35:33 +08:00

1261 Commits