In my scenario, we need to exclude certain databases from being synchronized to Doris,
e.g., databases that store temporary tables.
Since #17803 introduced `specified_database_list` to specify 'include databases',
this PR introduces a new config `exclude_database_list` to specify 'exclude databases',
and renames `specified_database_list` to `include_database_list` for naming symmetry.
Note that when `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` takes precedence over `include_database_list`.
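As a hedged illustration (the exact placement of these properties depends on the catalog/job type, and the catalog name here is just an example), the two lists could be combined like this:

```SQL
-- Hypothetical example: synchronize everything except the database that
-- stores temporary tables. Other required connection properties omitted.
CREATE CATALOG my_catalog PROPERTIES (
    "include_database_list" = "db1,db2,db_tmp",
    "exclude_database_list" = "db_tmp"  -- excluded even though it is also listed above
);
```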
Sometimes the dictionary is not yet initialized when the comparison predicate runs here; for example, when an entire page is null, the reader skips reading it, so the dictionary is never initialized. The cached codes are wrong in this case, because a following page may be non-null, and the dictionary may gain items later.
As a result, queries on dictionary-encoded string columns can return wrong results when the column contains many null values.
I also added regression tests for equal, greater-than, and less-than queries on dictionary columns.
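A sketch of the added regression coverage, assuming a dictionary-encoded string column `k` that is mostly NULL (table and column names are illustrative):

```SQL
-- With many NULLs, the first pages read can be entirely null, leaving the
-- dictionary uninitialized; later non-null pages must still compare correctly.
SELECT count(*) FROM t WHERE k = 'abc';  -- equal
SELECT count(*) FROM t WHERE k > 'abc';  -- greater than
SELECT count(*) FROM t WHERE k < 'abc';  -- less than
```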
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Added a p0 test for sampling-based statistics collection
2. Modified the unique keys of the `analysis_jobs` table so that rows can be deleted based on the relevant conditions
3. Made the incremental statistics p0 test more stable
This PR mainly optimizes the following items:
- Statistics collection: clean up invalid historical statistics before collecting, so they do not affect the final table statistics.
- Incremental statistics collection: in the incremental case, only the corresponding partition statistics need to be collected (see the example below).
TODO: support incremental collection of materialized view statistics.
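For example (using the ANALYZE syntax documented later in these notes; the table name is a placeholder), an incremental run only collects the statistics of the affected partitions:

```SQL
-- Illustrative: incremental collection touches only the corresponding
-- partition statistics instead of rescanning the whole table.
ANALYZE TABLE db1.tbl WITH INCREMENTAL;
```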
Dynamic mode is used for the array type when serializing it to the MySQL row buffer. When the binary row format is combined with dynamic mode, something goes wrong and produces an invalid binary row format.
Since we cannot derive stats and estimate cost very well for aggregates,
this PR removes some aggregate patterns that are usually not good (a query shape that can produce such plans is sketched below):
1. One-stage aggregation after an exchange; this pattern is good only when processing very few rows.
2. Three-stage distinct aggregation with a gather middle merge.
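As a hedged illustration (names are hypothetical), a grouped distinct aggregate is the kind of query that could previously be planned with these patterns:

```SQL
-- Depending on stats, the planner could choose a multi-stage distinct plan
-- here, e.g. three-stage distinct aggregation with a gather middle merge.
SELECT c1, count(DISTINCT c2)
FROM t
GROUP BY c1;
```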
Hive supports creating a partition with a specific location. In this case, the file path of the created partition may not contain the partition name and value, which causes Doris to fail to query the Hive partition.
This PR fixes this bug.
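For reference, standard Hive DDL allows a partition to point at an arbitrary directory, so the path need not contain `partition_col=value`:

```SQL
-- The partition data lives under a custom location whose path does not
-- embed the partition name and value.
ALTER TABLE hive_tbl ADD PARTITION (dt = '2023-01-01')
LOCATION '/warehouse/custom/some_dir';
```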
segcompaction_p1 contains fairly large load jobs, which exceed the
memory limit or time out in the pipeline engine under such heavy loads.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Fix a bug when reading the array type in Parquet files:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` calls `ColumnSelectVector::set_run_length_null_map` to initialize the select vector, but `ScalarColumnReader::_read_nested_column` doesn't do this, making the number of values wrong.
The situation where this error occurs is particularly extreme: the column pages still have remaining values to be read,
but all of them are null at the ancestor level, so there is no actual read operation, just skipping the null values at the ancestor level.
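A hypothetical reproduction shape (table and column names are illustrative): an array column read from Parquet where long runs of rows are NULL at the array (ancestor) level, so whole pages are skipped rather than read:

```SQL
-- Illustrative: arr is an ARRAY column in a Parquet file; rows where the
-- whole array is NULL exercise the skip path in the nested column reader.
SELECT arr FROM parquet_tbl;
```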
1. Remove the unnecessary project node above the scan node.
2. Fix a bug where an IN subquery may be recognized as a scalar subquery.
3. Fix a bug in the return type of some Quantile-related functions.
Fix broker load p2 test case errors:
1. Moved test data from the COS Hong Kong region to the Beijing region.
2. Moved the broker load tests to the p2 group.
3. Fixed an error message mismatch.
1. Support collecting statistics by sampling
2. Improved the syntax for collecting statistics
3. Support specifying the number of buckets for histogram collection
4. Tweaked some code structure
---
The syntax supports both WITH clauses and PROPERTIES, consistent with the previous syntax.
Column Statistics Collection Syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
[ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
[ PROPERTIES ('key' = 'value', ...) ];
```
Column Histogram Collection Syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
UPDATE HISTOGRAM
[ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
[ PROPERTIES ('key' = 'value', ...) ];
```
Explanation:
- sync: Collect statistics synchronously; the statement returns after collection finishes.
- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows: Collect statistics by sampling, either a percentage of the data or a number of rows.
- buckets: Specifies the maximum number of buckets generated when collecting histogram statistics.
- table_name: The target table for collecting statistics. It can be of the form `db_name.table_name`.
- column_name: The specified target columns, which must exist in `table_name`; multiple column names are separated by commas.
- properties: Properties used to configure the statistics job. Currently only the following configurations are supported (equivalent to the WITH clauses; see the examples after this list):
  - 'sync' = 'true'
  - 'incremental' = 'true'
  - 'sample.percent' = '50'
  - 'sample.rows' = '1000'
  - 'num.buckets' = '10'
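For example, combining the clauses above (table and column names are placeholders):

```SQL
-- Sample 50 percent of the data and collect column statistics synchronously.
ANALYZE SYNC TABLE db1.tbl (c1, c2) WITH SAMPLE PERCENT 50;

-- Collect a histogram for c1 with at most 128 buckets.
ANALYZE TABLE db1.tbl (c1) UPDATE HISTOGRAM WITH BUCKETS 128;
```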
---
TODO:
- Add the complete p0 tests
- For `incremental` statistics, see #18653
Fix decimal v3 precision loss issues in the multi-catalog module.
Decimal v3 is now used to represent the decimal type in the multi-catalog module.
Regression test: `test_load_with_decimal.groovy`
1. Evict dropped stats from the cache
2. Remove the code for partition-level stats collection
3. Disable analyzing a whole database directly
4. Fix a potential infinite loop in the stats cleaner
5. Sleep in each loop iteration when scanning the stats table, to avoid excessive IO usage by this task