Fix some bugs of ORC lazy materialization (#18615)
- Fix an issue where the number of columns in the block kept growing after `execute_conjuncts()`, by calling `Block::erase_useless_column()` (see the sketch after this list).
- Fix partition-related issues in ORC lazy materialization.
- Fix lazy materialization not being used when the predicate columns are inconsistent with the columns in the ORC file.
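The column-growth issue in the first item comes from expression evaluation appending temporary result columns to the block. A minimal, self-contained toy of the pattern (the real types are Doris's `vectorized::Block` and its conjunct-evaluation code; everything below is simplified for illustration):
```
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for vectorized::Block: just an ordered list of column names.
struct Block {
    std::vector<std::string> columns;
    // Trim temporary columns appended after the original width.
    void erase_useless_column(size_t origin_count) { columns.resize(origin_count); }
};

// Evaluating conjuncts appends a temporary result column as a side effect.
void execute_conjuncts(Block& block) {
    block.columns.push_back("tmp_predicate_result");
}

int main() {
    Block block{{"c1", "c2"}};
    for (int batch = 0; batch < 1000; ++batch) {
        size_t origin = block.columns.size();
        execute_conjuncts(block);
        // The fix: trim back after filtering; without this line the block
        // gains a column on every batch.
        block.erase_useless_column(origin);
    }
    assert(block.columns.size() == 2); // stays at the original width
    return 0;
}
```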
The test query covers converting string types to other types and handling materialized columns for nested subqueries; it is the regression test for bug fix (#18783).
If the user manually removes a Hive partition (deleting the partition directory through HDFS), Doris fails to query the Hive
table with the error message `get file split failed for table`. That is because the Hive metastore still contains the removed partition.
This PR fixes the bug by skipping directories that do not exist, as sketched below.
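A hedged sketch of the idea (the actual fix runs on the FE against HDFS; this version uses the local filesystem and illustrative names only):
```
#include <filesystem>
#include <string>
#include <vector>

// Build the list of files to split from the partition directories reported
// by the Hive metastore, skipping directories that were removed on storage.
std::vector<std::string> list_split_paths(const std::vector<std::string>& partition_dirs) {
    std::vector<std::string> files;
    for (const auto& dir : partition_dirs) {
        if (!std::filesystem::exists(dir)) {
            continue; // partition dir deleted behind the metastore's back
        }
        for (const auto& entry : std::filesystem::directory_iterator(dir)) {
            files.push_back(entry.path().string());
        }
    }
    return files;
}
```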
Iceberg table partition names may contain upper-case characters, for example `City=xxx`, `Nation=xxx`.
In Doris, however, all column names are lower case, so we convert the partition name to lower case to keep it consistent with the column names (sketch below).
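A small sketch of the normalization, assuming only the column-name part of a `Name=value` pair needs lowering (the value is user data and is left untouched; the actual fix may differ in scope):
```
#include <algorithm>
#include <cctype>
#include <string>

std::string normalize_partition_name(const std::string& part) {
    size_t eq = part.find('=');
    if (eq == std::string::npos) return part;
    std::string key = part.substr(0, eq);
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return key + part.substr(eq); // "City=Beijing" -> "city=Beijing"
}
```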
Fix partition field conjuncts not working.
Conjuncts on predicate partition columns found in `_slot_id_to_filter_conjuncts` (single-slot conjuncts) are now added to `_filter_conjuncts`; the others should already have been added from `not_single_slot_filter_conjuncts`.
Fix three bugs of timestampv2 precision:
1. The Hive catalog doesn't set the precision of timestampv2 and can't get the precision from the Hive metastore, so the largest precision is now used for timestampv2;
2. The JDBC catalog used datetimev1 to parse timestamps and then converted to timestampv2, so the precision was lost (illustrated after this list);
3. TVF didn't use the precision from the file format's metadata.
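A worked illustration of the second bug, using plain strings to stand in for the datetime types (the values are made up):
```
#include <cstdio>
#include <string>

int main() {
    std::string raw = "2023-01-01 00:00:00.123456";
    // Parsing through a seconds-precision type keeps only "YYYY-MM-DD HH:MM:SS":
    std::string as_datetimev1 = raw.substr(0, 19);
    // Widening to timestampv2(6) afterwards can only pad zeros; the fraction
    // is unrecoverable, so the parse must target the precise type directly.
    std::string widened = as_datetimev1 + ".000000";
    std::printf("%s vs %s\n", widened.c_str(), raw.c_str());
    return 0;
}
```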
Hive supports creating a partition with a specific location. In this case, the file path of the created partition may not contain the partition name and value, which causes Doris to fail to query the Hive partition.
This PR fixes the bug.
Fix a bug when reading array types in parquet files:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` calls `ColumnSelectVector::set_run_length_null_map` to initialize the select vector, but `ScalarColumnReader::_read_nested_column` didn't, making the number of values wrong.
The situation where this error occurs is particularly extreme: the column's pages still have remaining values to be read,
but all of them are null at an ancestor level, so there is no actual read operation, only the skipping of ancestor-level nulls (see the sketch below).
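A simplified sketch of the missing step (`build_null_runs` is a hypothetical helper; Doris's real `ColumnSelectVector` API differs): derive run-length null runs from the definition levels and apply them even when nothing is actually decoded.
```
#include <cstdint>
#include <utility>
#include <vector>

// Collapse definition levels into (is_null, run_length) runs. A level below
// max_def_level means the value is null somewhere up the nesting chain.
std::vector<std::pair<bool, size_t>> build_null_runs(
        const std::vector<int16_t>& def_levels, int16_t max_def_level) {
    std::vector<std::pair<bool, size_t>> runs;
    for (int16_t level : def_levels) {
        bool is_null = level < max_def_level;
        if (!runs.empty() && runs.back().first == is_null) {
            ++runs.back().second;
        } else {
            runs.emplace_back(is_null, 1);
        }
    }
    // These runs must be applied to the select vector even when *every* run
    // is a null run and no value is decoded from the page.
    return runs;
}
```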
Fix decimal v3 precision loss issues in the multi-catalog module.
Decimal types in the multi-catalog module are now represented with decimal v3; decimal v2 is fixed at precision 27 and scale 9, so wider external decimals lost digits.
Regression test: `test_load_with_decimal.groovy`
1. Fix the value index in the bool RLE decoder.
2. Iceberg tables now support datetimev2(3). In the previous version, we converted Hive timestamps to datetimev2(0) by default.
Support delta encoding and RLE (bool) to read Glue data:
- add delta bit pack decoder
- add delta length byte array decoder
- add delta byte array decoder
- add RLE bool decoder

We found that some data types are read with delta encoding on AWS Glue, so they should be supported.
For the definition of delta encoding, refer to the delta encoding in the parquet format; a sketch of the bool decoder follows.
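As a reference for the simplest of the new decoders, here is a hedged, self-contained sketch of parquet's RLE/bit-packed hybrid encoding specialized to BOOLEAN (bit width 1); Doris's actual decoder is structured differently and adds bounds checks.
```
#include <cstdint>
#include <vector>

// Decode a ULEB128 varint, advancing the cursor. Bounds checks omitted.
static uint64_t read_uleb128(const uint8_t*& p) {
    uint64_t v = 0;
    int shift = 0;
    while (*p & 0x80) {
        v |= uint64_t(*p++ & 0x7F) << shift;
        shift += 7;
    }
    v |= uint64_t(*p++) << shift;
    return v;
}

// RLE/bit-packed hybrid with bit width 1, as used for parquet BOOLEANs.
std::vector<bool> decode_rle_bool(const uint8_t* p, size_t num_values) {
    std::vector<bool> out;
    out.reserve(num_values);
    while (out.size() < num_values) {
        uint64_t header = read_uleb128(p);
        if (header & 1) {
            // Bit-packed run: (header >> 1) groups of 8 values,
            // 1 bit per value, least significant bit first.
            uint64_t groups = header >> 1;
            for (uint64_t g = 0; g < groups && out.size() < num_values; ++g) {
                uint8_t byte = *p++;
                for (int i = 0; i < 8 && out.size() < num_values; ++i) {
                    out.push_back((byte >> i) & 1);
                }
            }
        } else {
            // RLE run: (header >> 1) copies of one value, stored in a
            // single byte because the bit width of a BOOLEAN is 1.
            uint64_t run_len = header >> 1;
            bool value = (*p++) & 1;
            for (uint64_t i = 0; i < run_len && out.size() < num_values; ++i) {
                out.push_back(value);
            }
        }
    }
    return out;
}
```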
Hive stores all data without partition columns in a default partition named `__HIVE_DEFAULT_PARTITION__`.
Doris fails to read this partition when the partition column type is INT or anything else that
`__HIVE_DEFAULT_PARTITION__` cannot be converted to.
This PR supports the Hive default partition by setting the column value to NULL for the missing partition columns, as sketched below.
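A minimal sketch of the handling for an INT partition column (`parse_int_partition_value` is a hypothetical helper, not Doris's API):
```
#include <optional>
#include <string>

// Map the sentinel partition to SQL NULL instead of trying to cast it,
// which fails for INT and most other non-string partition column types.
std::optional<int32_t> parse_int_partition_value(const std::string& raw) {
    if (raw == "__HIVE_DEFAULT_PARTITION__") {
        return std::nullopt; // fill the partition column with NULL
    }
    return std::stoi(raw); // normal partitions carry a parseable value
}
```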
External HMS catalog table column names in Doris are all lower case,
while an Iceberg table or a Hive table created by spark-sql may contain upper-case column names,
which causes empty query results. This PR fixes the bug:
1. For parquet files, convert all column names to lower case while parsing the parquet metadata.
2. For ORC files, store the original column names and the lower-case column names in two vectors, and use the appropriate names in each case (see the sketch after this list).
3. On the FE side, change the column name back to the original column name in Iceberg when doing convertToIcebergExpr.
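A sketch of the ORC-side bookkeeping from the second item (names are illustrative, not the actual reader structures):
```
#include <string>
#include <vector>

// Keep both spellings at the same index: match Doris slots against the
// lower-case names, and use the original names when talking to the ORC
// reader (e.g. for column selection and predicate push-down).
struct OrcColumnNames {
    std::vector<std::string> origin_names; // as stored in the ORC file
    std::vector<std::string> lower_names;  // as known to Doris

    // Find the file's original column name for a Doris column name.
    const std::string* find_origin(const std::string& doris_name) const {
        for (size_t i = 0; i < lower_names.size(); ++i) {
            if (lower_names[i] == doris_name) return &origin_names[i];
        }
        return nullptr;
    }
};
```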
Fix a bug where creating a JDBC resource with only the JDBC driver file name failed the checksum.
This is because we forgot to pass the full driver URL to JdbcClient.
Set `ResultSet.FETCH_FORWARD` and disable auto-commit on the JDBC connection to avoid OOM when fetching large amounts of data;
set `useCursorFetch` in the JDBC URL for both MySQL and PostgreSQL, for example:
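An illustrative MySQL connection string with cursor fetching enabled (host, port, and database are placeholders; `useCursorFetch` is a MySQL Connector/J property, and the exact PostgreSQL parameters may differ):
```
jdbc:mysql://127.0.0.1:3306/demo?useCursorFetch=true
```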
Fix some P2 external datasource bugs:
1. Fix one bug:
A null pointer exception was thrown when reading data after the reader reached the end of the file, so we should return directly when `_do_lazy_read` reads no data (see the sketch after this list).
2. Optimize code:
Remove unused parameters.
3. Fix regression tests.
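A pseudocode-level sketch of the guard from the first item (`Status`, `RETURN_IF_ERROR`, and the surrounding reader body are simplified from Doris's actual interfaces):
```
Status get_next_batch(Block* block, size_t* read_rows, bool* eof) {
    RETURN_IF_ERROR(_do_lazy_read(block, read_rows, eof));
    if (*read_rows == 0) {
        // Nothing was decoded: the file is exhausted. Returning here avoids
        // touching per-batch state that was never populated, which is what
        // triggered the null pointer exception.
        *eof = true;
        return Status::OK();
    }
    // ... fill partition and missing columns, run remaining conjuncts, etc.
    return Status::OK();
}
```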