doris

Author	SHA1	Message	Date
Mryange	6dddd4c499	[function](cast)Make string casting to integers more like MySQL's behavior (#38847) (#41541 ) https://github.com/apache/doris/pull/38847 ## Proposed changes There are two issues here. First, the results of casting are inconsistent between FE and BE. ``` FE mysql [(none)]>select cast('3.000' as int); +----------------------+ \| cast('3.000' as INT) \| +----------------------+ \| 3 \| +----------------------+ mysql [(none)]>set debug_skip_fold_constant = true; BE mysql [(none)]>select cast('3.000' as int); +----------------------+ \| cast('3.000' as INT) \| +----------------------+ \| NULL \| +----------------------+ ``` The second issue is that casting on BE converts '3.0' to NULL. Here, the casting logic for FE and BE has been unified. --------- Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>	2024-10-11 09:32:00 +08:00
morrySnow	a45dc8796a	[fix](Nereids) simplify decimal comparison wrong when cast to smaller scale (#41151 ) (#41618 ) pick from master #41151	2024-10-09 23:03:01 +08:00
hui lai	8eda15ae16	[opt](routine load) support routine load perceived schema change (#39412 ) (#40508 ) pick #39412 At present, if the table structure changes, the routine load cannot perceive it. As a long-running load, it should be able to perceive the changes in the table structure.	2024-09-10 11:05:58 +08:00
lihangyu	677435cef8	[Pick](Branch-2.1) pick json reader fix and support specify $. as column (#39271 ) #39206 #38213	2024-08-13 17:44:45 +08:00
hui lai	0ee0dd6ae3	[fix](routine load) reset Kafka progress cache when routine load job topic change (#38474 ) (#39181 ) pick (#38474) When changing the routine load job topic from test_topic_before to test_topic_after by ``` ALTER ROUTINE LOAD FOR test_topic_change FROM KAFKA("kafka_topic" = "test_topic_after"); ``` (test_topic_before has 5 rows and test_topic_after has 1 row), an exception occurs and no data can be consumed: ``` 2024-07-29 15:57:28,122 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,123 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,125 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,126 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,128 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,129 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,131 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,133 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,134 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,136 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 2024-07-29 15:57:28,137 WARN (Routine load task scheduler\|55) [KafkaRoutineLoadJob.hasMoreDataToConsume():792] Kafka offset fallback. partition: 0, cache offset: 5 get latest offset: 1, task 16656914-ba0a-465d-8e79-8252b423b0fc, job 16615 ``` The Kafka progress cache must be reset when the routine load job topic changes.	2024-08-10 23:00:39 +08:00
morrySnow	3c535e80dd	[fix](compatibility) type toSql should return lowercase string (#38012 ) (#38517 ) pick from master #38012 revert #25951	2024-08-09 11:35:42 +08:00
hui lai	79b07d0b8a	[fix](routine load) fix enclose and escape can not set in routine load job (#38402 ) (#38825 ) pick (#38402)	2024-08-04 22:17:12 +08:00
hui lai	8e4fad99a1	[test](routine load) add routine load case with timestamp as offset(#38567 ) (#38822 ) pick (#38567)	2024-08-04 22:05:19 +08:00
amory	338fa32303	[pick](simdjson) fix simdjson with object array when jsonroot is not empty (#38633 ) ## Proposed changes backport: https://github.com/apache/doris/pull/38490	2024-08-01 11:04:54 +08:00
lihangyu	d9fd419e47	[Fix](JsonReader) fix json with duplicate key entry may result out of bound exception (#38147 ) #38146	2024-07-19 22:53:02 +08:00
caiconghui	9ff129b630	[fix](stream_load) fix stream load may failed caused by column name with keyword (#35822 ) (#37890 ) #35938 #35822 Make KW_SQL, KW_CACHE, KW_COLOCATE, KW_COMPRESS_TYPE, KW_DORIS_INTERNAL_TABLE_ID, KW_HOTSPOT, KW_PRIVILEGES, KW_RECENT, KW_STAGES, KW_WARM, KW_UP, and KW_CONVERT_LSC non-reserved keywords. --------- Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2024-07-16 20:20:34 +08:00
Xin Liao	0a95757a4d	[opt](test) Optimize execution time of test_s3_load case #37562 (#37612 ) cherry pick from #37562	2024-07-10 19:09:46 +08:00
hui lai	c66df8d9e6	[branch-2.1](load) fix no error url if no partition can be found (#36831 ) (#37401 ) ## Proposed changes pick #36831 before ``` Stream load result: { "TxnId": 2014, "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf", "Comment": "", "TwoPhaseCommit": "false", "Status": "Fail", "Message": "[DATA_QUALITY_ERROR]Encountered unqualified data, stop processing", "NumberTotalRows": 1, "NumberLoadedRows": 1, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 1669, "LoadTimeMs": 58, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 10, "ReadDataTimeMs": 0, "WriteDataTimeMs": 47, "CommitAndPublishTimeMs": 0 } ``` after ``` Stream load result: { "TxnId": 2014, "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf", "Comment": "", "TwoPhaseCommit": "false", "Status": "Fail", "Message": "[DATA_QUALITY_ERROR]too many filtered rows", "NumberTotalRows": 1, "NumberLoadedRows": 0, "NumberFilteredRows": 1, "NumberUnselectedRows": 0, "LoadBytes": 1669, "LoadTimeMs": 58, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 10, "ReadDataTimeMs": 0, "WriteDataTimeMs": 47, "CommitAndPublishTimeMs": 0, "ErrorURL": "http://XXXX:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_c6461270125a615b-2873833fb48d56a3_c6461270125a615b_2873833fb48d56a3" } ```	2024-07-08 10:41:33 +08:00
Mingyu Chen	ceef9ee123	[feature](serde) support presto compatible output format (#37039 ) (#37253 ) bp #37039	2024-07-04 13:56:05 +08:00
starocean999	6f944549d1	[fix](regression)fix case failure (#37058 )	2024-07-02 09:55:18 +08:00
gnehil	a6a84b8ecc	[improvement](stream load)(cherry-pick) support hll_from_base64 for stream load column mapping (#36819 ) picked from https://github.com/apache/doris/pull/35923	2024-06-26 20:12:40 +08:00
starocean999	cbaff8a700	[fix](nereids)change the decimal's precision and scale for cast(xx as decimal) (#36540 ) pick from master #36316 The datatype of the expression cast(xx as decimal) may be decimalv3 or decimalv2, depending on the enable_decimal_conversion value in the FE conf file. If enable_decimal_conversion is true, the datatype is decimalv3(9, 0), but it was decimalv3(38, 9) in 2.0 releases. This PR changes the datatype to match the 2.0 releases to keep the behavior consistent.	2024-06-20 17:46:11 +08:00
lw112	48d4601ee3	[regression-test](load) add something like $.tag.[a.b] key's json case (#35134 )	2024-05-31 22:45:09 +08:00
minghong	50e81d9db7	[feat](nereids) add more rules to eliminate empty relation (#34997 ) -branch-2.1 (#35534 ) Eliminate empty relations for the following patterns: topn->empty, sort->empty, distribute->empty, project->empty (cherry picked from commit 8340f23946c0c8e40510ce937acd3342cb2e28b7)	2024-05-28 18:12:42 +08:00
feiniaofeiafei	1e07971a98	[Feat](nereids)when dealing insert into stmt with empty table source, fe returns directly (#35333 ) * [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418) When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation. When handling an INSERT INTO statement with an empty table source, FE returns directly. * [Fix](nereids) fix when insert into select empty table --------- Co-authored-by: feiniaofeiafei <moailing@selectdb.com>	2024-05-24 16:25:00 +08:00
HHoflittlefish777	7ba66c5890	[branch-2.1](routine-load) do not schedule task when there is no data (#34654 )	2024-05-11 11:01:18 +08:00
Mingyu Chen	8da260ee0d	[fix](hdfs)read 'fs.defaultFS' from core-site.xml for hdfs load which has no default fs (#34217 ) (#34372 ) bp #34217 Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>	2024-05-01 00:31:49 +08:00
Xin Liao	cd1c9edd71	[fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072 ) (#34204 ) Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>	2024-04-27 18:19:08 +08:00
HHoflittlefish777	55d5ed9ab6	[test](streamload) add load empty file regression test (#34110 )	2024-04-26 07:42:09 +08:00
xy720	080c07ad87	[bug](random distribution) fix data loss and incorrect in random distribution table #33962	2024-04-24 17:13:50 +08:00
Pxl	5a5063be20	[bug](fix) heap use after free when json parse failed (#33955 )	2024-04-22 22:33:24 +08:00
Tiewei Fang	36a70ba1e7	[Fix](Csv-Reader)Fix the issue of BE core dump caused by improper configuration of column_seperator and line_delimiter. (#33693 )	2024-04-20 20:06:48 +08:00
amory	2648a92594	[FIX](load)fix load with split-by-string (#33713 )	2024-04-17 23:42:14 +08:00
HHoflittlefish777	b85bf3b6b0	[test](cast) add test for stream load cast (#33189 )	2024-04-10 15:26:09 +08:00
超威老仲	b0b5f84e40	[feature](load) support compressed JSON format data for broker load (#30809 )	2024-04-10 14:20:53 +08:00
meiyi	7484a7ba5f	[fix](broker load) improve the checking of overlapping partitions of same table (#32254 )	2024-03-21 14:07:24 +08:00
Mryange	8bd101129a	[behavior change](output) change float output format (#32049 )	2024-03-21 14:07:22 +08:00
lihangyu	278b232e76	[Bug](json reader) object should stop processing when encounter error (#31159 ) If a DATA_QUALITY_ERROR is encountered, we should stop processing this document. Otherwise there will be UB in simdjson.	2024-02-21 13:53:32 +08:00
HowardQin	0d32aeeaf6	[improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573 ) Issue Number: close #29406 1. Increase the lzop version to 0x1040. This is only so that LZO files compressed by a higher version of lzop can be decompressed; the decompression logic is unchanged. Actually, 0x1040 should include the "F_H_FILTER" feature, but it is mainly for audio and image data, so we do not support it. 2. Use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress LZO data. 3. Use crc32c::Extend() instead of lzo_crc32(). 4. Use olap_adler32() instead of lzo_adler32(). 5. Thus, remove the dependency on Markus F.X.J. Oberhumer's lzo library. 6. Remove DORIS_WITH_LZO, so LZO files are supported by stream and broker load by default. 7. Add some regression tests.	2024-02-05 22:00:24 +08:00
Guangdong Liu	009bca9652	[regression test](broker load) add partition load case (#28259 )	2024-01-30 15:30:39 +08:00
Guangdong Liu	5f20d7c5d0	[regression test](stream load) test for `enable_profile` (#28534 )	2024-01-30 15:30:39 +08:00
HowardQin	17cf4ab2c1	[case](regression) streamload publish timeout (#29457 ) Co-authored-by: qinhao <qinhao@newland.com.cn>	2024-01-07 19:50:16 +08:00
HowardQin	8a169b9906	[case](regression) Test enable pipeline load (#28172 ) Co-authored-by: qinhao <qinhao@newland.com.cn>	2023-12-28 10:49:19 +08:00
lw112	172f68480b	[Enhancement](load) Limit the number of incorrect data drops and add documents (#27727 ) During a load, if there are problems with the original data, the error rows are stored in an error_log file on disk for subsequent debugging. However, a large number of error rows can occupy a lot of disk space, so we want to limit how much error data is saved to disk. Get familiar with the usage of Doris' import function and its internal implementation; add a new BE configuration item load_error_log_limit_bytes (default value 200MB); use the new threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk; write regression cases for testing and verification. Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>	2023-12-22 10:43:18 +08:00
Guangdong Liu	1e08845fc5	[regression test](broker load) add case for sequence col (#27583 )	2023-12-16 22:47:20 +08:00
Zhiyu Hu	4f5821407f	[case]Load data with load_parallelism=any > 1 and stream load with compress type (#27306 )	2023-12-13 18:41:14 +08:00
Guangdong Liu	43327383c3	[regression test](broker load) add exception case for merge type (#27840 )	2023-12-13 18:34:34 +08:00
meiyi	45b2dbab6a	[improve](group commit) Group commit support max filter ratio when rows is less than value in config (#28139 )	2023-12-12 16:33:36 +08:00
Ma1oneZhang	e49ed3d885	[regression test](memtable) add case for aggregation memtable (#28056 ) 1. Create an aggregation table. 2. Insert some data. 3. Drop the table and create it again. 4. Modify some parameters for some branch. 5. Insert some data. 6. Change the parameters back to their defaults.	2023-12-12 11:14:59 +08:00
meiyi	1e5ff40e17	[refactor](group commit) remove future block (#27720 ) Co-authored-by: huanghaibin <284824253@qq.com>	2023-12-11 08:41:51 +08:00
HHoflittlefish777	3f3843fd4f	[fix](load) fix loaded rows error (#28061 )	2023-12-07 14:43:51 +08:00
Guangdong Liu	616ba2823f	[regression test](broker load) Test case missing files (#27580 )	2023-12-07 10:17:07 +08:00
abmdocrt	605257ccb7	[Enhancement](group commit) Add regression case for wal limit (#27949 )	2023-12-06 14:23:50 +08:00
amory	358d73a0ae	[FIX](complextype) fix empty quote with complex type (#27942 )	2023-12-05 12:25:26 +08:00
HHoflittlefish777	97d36b4f38	[fix](csv_reader) fix trim_double_quotes behavior change (#27882 )	2023-12-03 22:57:55 +08:00

181 Commits