Commit Graph

81 Commits

SHA1 Message Date
9cd9e195d8 [test](load) add some s3 load regression test (#24399) 2023-09-23 22:40:45 +08:00
ce79711b0d [FIX](serde) fix map/array deserialize string with quote pair (#24808) 2023-09-23 01:12:20 +08:00
74bba4bdaf [enhancement](regression-test) Add routine load case (#24536) 2023-09-22 14:55:01 +08:00
7c32b2b4ed [Fix](broker load) fix broker load error with OR predicate #24157
Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>
2023-09-20 14:56:32 +08:00
c704497d02 [fix](csv_reader)Fixed bug when parsing multi-character delimiters. (#24572)
Fixed bug when parsing multi-character delimiters.
2023-09-20 12:41:35 +08:00
4b5cea1ef8 [enhancement](fix) change ordinary type null value to \N and complex type null value to null (#24207) 2023-09-16 21:46:42 +08:00
f61e6483bf [enhancement](broker-load) support compress type for old broker load, and split compress type from file format (#23882) 2023-09-14 21:42:28 +08:00
268c867679 [Improve](serde)replace function_cast from_string to serde (#24087)
Previously, stream load could not support columns whose type is map/array nested in map/array.
Serde can handle this now, so we can replace the function_cast from_string path with it.
Note: if an item inside complex type data is empty, we simply return an error instead of making up a default value, because we currently cannot define a correct default for complex types.
2023-09-14 13:53:16 +08:00
232b58a27d [fix](broker-load) make sequence column name case insensitive (#24071) 2023-09-10 10:51:07 +08:00
650af8f4df [fix](test) fix broker load with default value test case (#24123) 2023-09-10 10:28:22 +08:00
f9a75b5c4f [feature](csv_serde) 1. Add a CSV serde for serializing to and deserializing from CSV. 2. Let csvReader use the CSV serde instead of text_converter. (#23352)
1. Add a CSV serde for serializing to and deserializing from CSV.
2. Let csvReader use the CSV serde instead of text_converter.
2023-09-10 00:16:21 +08:00
4dac2d3b94 [Fix](Plan)StreamLoad cannot be parsed correctly when it contains complex where conditions (#23874) 2023-09-05 11:26:59 +08:00
e525e021ee [Enhancement](Load) stream tvf support csv header (#23797)
Co-authored-by: yiguolei <676222867@qq.com>
2023-09-05 11:15:45 +08:00
6630f92878 [Enhancement](Load) stream tvf support json (#23752)
stream tvf supports JSON.

Example file (/root/json_file.json):

[{"id":1, "name":"ftw", "age":18}]
[{"id":2, "name":"xxx", "age":17}]
[{"id":3, "name":"yyy", "age":19}]

Example command:
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select id, name from http_stream(\"format\" = \"json\", \"strip_outer_array\" = \"true\", \"read_json_by_line\" = \"true\")" -T /root/json_file.json http://127.0.0.1:8030/api/_http_stream
2023-09-02 01:09:06 +08:00
05771e8a14 [Enhancement](Load) stream Load using SQL (#23362)
Use stream load in SQL mode.

For example, with example.csv:

10000,北京
10001,天津
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
2023-08-30 19:02:48 +08:00
a5761a25c5 [feature](move-memtable)[7/7] add regression tests (#23515)
Co-authored-by: laihui <1353307710@qq.com>
2023-08-26 17:52:10 +08:00
7f1857b4e7 [fix](regression-test) fix unstable case load_p0/insert/test_insert.groovy (#23326) 2023-08-23 16:22:49 +08:00
2dda44d7b5 [fix](csv-reader)fix bug of multi-char delimiter in csv reader
Fix a bug in how csv_reader parses a line to split it into columns when a multi-character delimiter is used.
2023-08-23 15:19:13 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09:
Unfortunately, experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is only applied to the enclose-related code; the plain CSV parser keeps the original logic.

Some performance fallback is unavoidable anyway. From the CSV reader's perspective, the real weak point may be the column-writing behavior, as shown by the flame graph.

Trimming the escape character will be enabled after fix #22411 is merged.

Cases that should be discussed:

1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter will be unreachable until EOF; will the buffer become extremely large?
2. What if an infinitely long line occurs? Essentially, case 1 is equivalent to this.

This PR only supports stream load, as a trial, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.
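A hedged usage sketch, assuming the stream load headers are simply named enclose and escape and that escape lets the enclose character appear inside an enclosed field; the table test.t_csv and the sample data are illustrative, not taken from this PR:

enclose.csv:

1,"Beijing, China",north
2,"a \"quoted\" word",east

# illustrative target table; enclose/escape characters and header names are assumptions
curl -v --location-trusted -u root: -H "format: csv" -H "column_separator: ," -H "enclose: \"" -H "escape: \\" -T enclose.csv http://127.0.0.1:8030/api/test/t_csv/_stream_load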
2023-08-15 09:23:53 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940, but it has not been updated by the original author @Cai-Yao for a long time. For now, we will merge part of the code into master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
3024b82918 [fix](load) Fix wrong default value for CHAR and VARCHAR when reading JSON data (#22626)
If a column is defined as col VARCHAR/CHAR NULL with no default value, and we load JSON data that is missing column col, the queried result is incorrect:
+------+
| col |
+------+
| 1 |
+------+
But expect:
+------+
| col |
+------+
| NULL |
+------+
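A hedged reproduction sketch; the database, table, and file names are illustrative, and the table is assumed to have an INT column id plus a nullable VARCHAR column col with no default:

miss_col.json:

{"id": 1}

# illustrative db/table; after this fix, querying col for the loaded row should return NULL
curl -v --location-trusted -u root: -H "format: json" -T miss_col.json http://127.0.0.1:8030/api/test/t_json/_stream_load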

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-08-05 12:47:27 +08:00
7261845b3d [FIX](complex-type)fix complex type nested col_const (#22375)
For array/map/struct, unpack_if_const in mysql_writer only unpacks the column itself, not its nested columns, so col_const should not be used for nested columns.
2023-07-31 14:53:18 +08:00
40299d280d [Fix](json reader) fix rapidjson array->PushBack may take ownership… (#21988)
With the below JSON path:
`["$.data","$.data.datatimestamp"]`

After `array_obj->PushBack`, ownership of the `data` field is transferred to `array_obj`, leading to null values for the JSON path `$.data.datatimestamp`.

Rapidjson doc:
```
//! Append a GenericValue at the end of the array.
//! \note The ownership of \c value will be transferred to this array on success.
GenericValue& PushBack(GenericValue& value, Allocator& allocator);
```
2023-07-21 17:02:01 +08:00
20242d9a0e [Improve](simdjson) store the unescaped string value after parsing (#21866)
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If it is not unescaped, the later JSONB parse will fail.
2023-07-20 10:33:17 +08:00
6a0a21d8b0 [regression-test](load) add streamload default value test (#21536) 2023-07-06 10:14:13 +08:00
2beed11256 [Bug](streamload) fix inconsistent load result of be and fe (#20950) 2023-06-21 18:12:51 +08:00
8366ce7a81 [enhancement](insert-stmt) Make insert into tbl values(); compatible with mysql (#20694) 2023-06-18 19:56:07 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
ee34b6de2d [Refactor](serde) refactor mysql serde with data type (#19543)
Refactor MySQL output (de)serialization to go through data type serde, avoiding the switch-case over primitive types written in mysqlWriter.
2023-05-26 14:11:17 +08:00
f14e6189a9 [feature](load-refactor) Unified mysql load using InsertStmt (#19571) 2023-05-24 12:09:16 +08:00
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
925efc1902 [bug](map-type)fix some bugs in map and map element function (#18935)
fix some bugs in map and map element function.
2023-04-26 22:10:15 +08:00
425101bf53 [fix](test)Move broker test to p2. Move test data to cos in Beijing region (#18893)
Fix broker load p2 test case error.
1. Move test data from cos Hong kong region to Beijing region.
2. Move broker load test to p2 group.
3. Fix error message mismatch error.
2023-04-21 22:15:52 +08:00
3328a65b75 [Fix](multi-catalog) Use decimal v3 type to fix decimal loss issue in multi-catalog module. (#18835)
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
2023-04-20 11:02:53 +08:00
6c0af24e9d [Improve](simdjson reader) support UTF-8 unicode (with BOM) (#18585) 2023-04-13 21:58:44 +08:00
7c36bef6bc [Feature-Wip](MySQL Load) Show load warnings for MySQL load (#18224)
1. Support show load warnings for MySQL load to get the detailed error message.
2. Fix fillByteBufferAsync not marking the load as finished in some data loads.
3. Fix draining data only in client mode.
2023-04-04 22:44:48 +08:00
ea41d94582 [Improve](complex-type) Support Count(complexType) (#17868)
Support count function for ARRAY/MAP/STRUCT type
2023-03-30 15:43:32 +08:00
bae9d8d7f2 [Feature-Wip](MySQL LOAD)Add trim quotes property for mysql load (#17775)
Add trim quotes property for mysql load to trim double quotes in the load files.
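A hedged sketch of how this might be used from the MySQL client; the property name trim_double_quotes, the LOAD DATA syntax details, and the host/port/table/file names are assumptions, not taken from this PR:

# property name and connection details are assumptions based on the PR title
mysql --local-infile=1 -h127.0.0.1 -P9030 -uroot -e "LOAD DATA LOCAL INFILE 'example.csv' INTO TABLE test.t1 COLUMNS TERMINATED BY ',' PROPERTIES (\"trim_double_quotes\" = \"true\");"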
2023-03-21 00:32:58 +08:00
5d3de05976 [feature](map) basic functions for map datatype (#16916)
basic functions for map datatype:
- MAP<K, V> map(K k1, V v1, ...)
- BIGINT map_size(MAP<K, V> m)
- BOOL map_contains_key(MAP<K, V> m, K k1)
- BOOL map_contains_value(MAP<K, V> m, V v1)
- ARRAY< K> map_keys(MAP<K, V> m)
- ARRAY< V> map_values(MAP<K, V> m)
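A hedged usage sketch via the MySQL client (the host, port, and sample key/value literals are illustrative):

# illustrative host/port; calls the listed map functions on a small literal map
mysql -h127.0.0.1 -P9030 -uroot -e "SELECT map_size(map('k1', 1, 'k2', 2)), map_contains_key(map('k1', 1, 'k2', 2), 'k1'), map_keys(map('k1', 1, 'k2', 2)), map_values(map('k1', 1, 'k2', 2));"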
2023-03-17 10:28:17 +08:00
ee7226348d [FIX](Map) fix map compaction error (#17795)
In the compaction case, the in-memory map offsets arriving at the same OLAP convertor all range from 0 to 0+size, but they should be continuous across different pages within one segment writer.
e.g.:
last block with map offsets: [3, 6, 8, ... 100]
this block with map offsets: [5, 10, 15 ..., 100]
The same convertor should record the last offset so that later incoming offsets follow it.
So after the convertor, the current offsets should be [105, 110, 115, ... 200]; then the column writer just calls append_data() to append pages with the correct offset data.
2023-03-16 13:54:01 +08:00
13e05c4a5d [Enhancement](stream load) add some regression tests for json format streamload (#17520) 2023-03-12 20:13:07 +08:00
06dee69174 [Refactor](map) remove using column array in map to reduce offset column (#17330)
1. Remove the column array in map.
2. Add an offsets column in map.
The aim is to reduce the duplicated offsets stored on disk by the key array and the value array.
2023-03-09 11:22:26 +08:00
4b743061b4 [feature](function) support type template in SQL function (#17344)
This PR proposes a new mechanism similar to C++ templates. The previous functions can be defined much more simply using template functions.

    # array element extract template function
    [['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']],

    # map element extract template function
    [['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']],


BTW, plain-type functions are not affected, and the legacy ARRAY_X and MAP_K_V forms are still supported for compatibility.
2023-03-08 10:51:31 +08:00
e7cba11680 [fix](array)(parquet) fix be core dump due to load from parquet file containing array types (#17298) 2023-03-06 15:18:42 +08:00
0732eb54bc [feature](struct-type) support csv format stream load for struct type (#17143)
Refactor from_string method in data_type_struct.cpp to support csv format stream load for struct type.
2023-03-01 15:48:48 +08:00
d3a6cab716 [Fix](MySQLLoad) Fix load a big local file bug since bytebuffer from mysql packet using the same byte array (#16901)
Loading a big local file will cause an `INTERNAL_ERROR]too many filtered rows` issue, because the ByteBuffer from the MySQL client always uses the same byte array.

The later bytes then overwrite the previous ones, producing a wrong byte order over the network.

The fix: copy the byte array and then write it to the network.
2023-02-28 00:06:44 +08:00
29dc08fc45 [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint. (#17124)
* [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint.

`_simdjson_set_column_value` could become a hot spot while parsing JSON in simdjson mode.
Introduce `_prev_positions` to cache the search results for the previous row (keyed by index in the JSON object); since the order of JSON field names should be largely the same between lines, the cache works well as a hint.

* fix case
2023-02-27 10:39:22 +08:00
113023fb86 (Enhancement)[load-json] support simdjson in new json reader (#16903)
be config:
enable_simdjson_reader=true

related PR #11665
2023-02-21 11:31:00 +08:00