doris

Author	SHA1	Message	Date
Siyang Tang	9cd9e195d8	[test](load) add some s3 load regression test (#24399 )	2023-09-23 22:40:45 +08:00
HHoflittlefish777	74bba4bdaf	[enhancement](regression-test) Add routine load case (#24536 )	2023-09-22 14:55:01 +08:00
wangqt	7c32b2b4ed	[Fix](broker load) broker load with or predicate error fix #24157 Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>	2023-09-20 14:56:32 +08:00
Xinyi Zou	fc12362a6d	[feature-wip](arrow-flight)(step2) FE support Arrow Flight server (#24314 ) This is a POC, the design documentation will be updated soon	2023-09-20 14:42:54 +08:00
daidai	c704497d02	[fix](csv_reader)Fixed bug when parsing multi-character delimiters. (#24572 )	2023-09-20 12:41:35 +08:00
mch_ucchi	79fbc2e819	[regression-test](planner)add test for insert default values (#23559 ) Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>	2023-09-18 12:24:54 +08:00
Siyang Tang	f61e6483bf	[enhancement](broker-load) support compress type for old broker load, and split compress type from file format (#23882 )	2023-09-14 21:42:28 +08:00
amory	268c867679	[Improve](serde)replace function_cast from_string to serde (#24087 ) Now we can not support streamload with column which is map/array nested map/array serde can do this now , so we can replace it Notice. if item data in complex type data is empty we just return error, instead of makeup default value , because now we can not define right default for complex type	2023-09-14 13:53:16 +08:00
zhangguoqiang	026cefbfbc	[Regresstion](external)open case test_doris_jdbc_catalog (#24093 )	2023-09-13 10:09:58 +08:00
XuJianxu	7e467c91d3	[test](regression) add routine load cases (#24194 )	2023-09-12 18:00:01 +08:00
Siyang Tang	232b58a27d	[fix](broker-load) make sequence column name case insensitive (#24071 )	2023-09-10 10:51:07 +08:00
Mingyu Chen	650af8f4df	[fix](test) fix broker load with default value test case (#24123 )	2023-09-10 10:28:22 +08:00
daidai	f9a75b5c4f	[feature](csv_serde)1.append csv serde for serialize to csv and deserialize from csv. 2.let csvReader use csv serde not text_converter. (#23352 )	2023-09-10 00:16:21 +08:00
Calvin Kirs	4dac2d3b94	[Fix](Plan)StreamLoad cannot be parsed correctly when it contains complex where conditions (#23874 )	2023-09-05 11:26:59 +08:00
zzzzzzzs	e525e021ee	[Enhancement](Load) stream tvf support csv header (#23797 ) Co-authored-by: yiguolei <676222867@qq.com>	2023-09-05 11:15:45 +08:00
zzzzzzzs	6630f92878	[Enhancement](Load) stream tvf support json (#23752 ) stream tvf support json [{"id":1, "name":"ftw", "age":18}] [{"id":2, "name":"xxx", "age":17}] [{"id":3, "name":"yyy", "age":19}] example: curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select id, name from http_stream(\"format\" = \"json\", \"strip_outer_array\" = \"true\", \"read_json_by_line\" = \"true\")" -T /root/json_file.json http://127.0.0.1:8030/api/_http_stream	2023-09-02 01:09:06 +08:00
zzzzzzzs	05771e8a14	[Enhancement](Load) stream Load using SQL (#23362 ) Using stream load in SQL mode for example: example.csv 10000,北京 10001,天津 curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql	2023-08-30 19:02:48 +08:00
shuke	93db9b455a	[test](fix case) fix sql user conflict in test case (#23583 )	2023-08-29 11:33:49 +08:00
Kaijie Chen	82fe5aa5a0	[fix](regression) rename tables in test_stream_load_move_memtable (#23545 )	2023-08-28 14:31:00 +08:00
Kaijie Chen	a5761a25c5	[feature](move-memtable)[7/7] add regression tests (#23515 ) Co-authored-by: laihui <1353307710@qq.com>	2023-08-26 17:52:10 +08:00
mch_ucchi	7f1857b4e7	[fix](regression-test) fix unstable case load_p0/insert/test_insert.groovy (#23326 )	2023-08-23 16:22:49 +08:00
daidai	2dda44d7b5	[fix](csv-reader)fix bug of multi-char delimiter in csv reader fix bug that csv_reader parse line in order to get column.	2023-08-23 15:19:13 +08:00
Siyang Tang	2ad46c5826	[fix](show) show load warning support load v2 (#22759 )	2023-08-22 20:08:19 +08:00
Dongyang Li	dcd51c304a	Update test_csv_with_enclose_and_escape.groovy (#23173 )	2023-08-21 17:08:25 +08:00
Siyang Tang	b49dc8042d	[feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539 ) ## Proposed changes Refactor thoughts: close #22383 Descriptions about `enclose` and `escape`: #22385 ## Further comments 2023-08-09: It's a pity that experiment shows that the original way for parsing plain CSV is faster. Therefor, the refactor is only applied on enclose related code. The plain CSV parser use the original logic. Fallback of performance is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the write column behavior, proved by the flame graph. Trimming escape will be enable after fix: #22411 is merged Cases should be discussed: 1. When an incomplete enclose appears in the beginning of a large scale data, the line delimiter will be unreachable till the EOF, will the buffer become extremely large? 2. What if an infinite line occurs in the case? Essentially, `1.` is equivalent to this. Only support stream load as trial in this PR, avoid too many unrelated changes. Docs will be added when `enclose` and `escape` is available for all kinds of load.	2023-08-15 09:23:53 +08:00
zzzzzzzs	66784cef71	[Enhancement](Load) Stream Load using SQL (#22509 ) This PR was originally #16940 , but it has not been updated for a long time due to the original author @Cai-Yao . At present, we will merge some of the code into the master first. thanks @Cai-Yao @yiguolei	2023-08-08 13:49:04 +08:00
Xujian Duan	3024b82918	[fix](load)Fix wrong default value for char and varchar of reading json data (#22626 ) If a column is defined as: col VARCHAR/CHAR NULL and no default value. Then we load json data which misses column col, the result queried is not correct: +------+ \| col \| +------+ \| 1 \| +------+ But expect: +------+ \| col \| +------+ \| NULL \| +------+ --------- Co-authored-by: duanxujian <duanxujian@jd.com>	2023-08-05 12:47:27 +08:00
Dongyang Li	ef0e0b7d79	[case](fix) add sync after stream load (#22601 )	2023-08-05 08:28:26 +08:00
Ashin Gau	89433f6a13	[fix](complex_type) throw error when reading complex types in broker/stream load (#22331 ) Check whether there are complex types in parquet/orc reader in broker/stream load. Broker/stream load will cast any type as string type, and complex types will be casted wrong. This is a temporary method, and will be replaced by tvf.	2023-07-31 22:23:08 +08:00
amory	7261845b3d	[FIX](complex-type)fix complex type nested col_const (#22375 ) for array/map/struct in mysql_writer unpack_if_const only unpack self column not nested , so col_const should not used in nested column.	2023-07-31 14:53:18 +08:00
zhangguoqiang	3eeca7ee55	[enhance](regresstion case)add external group mark 0727 (#22287 ) * add external group mark 0727 * add external pipeline regression conf 0727 * update pipeline regression config 0727 * open es config from docker 0727	2023-07-28 17:11:19 +08:00
shuke	ef218d79da	[fix](case) add sync after stream load (#22232 )	2023-07-28 17:05:20 +08:00
amory	3d0f952934	[FIX](complex-type)delete enable_map/struct_type switch #21957	2023-07-22 15:29:32 +08:00
lihangyu	40299d280d	[Fix](json reader) fix rapidjson `array->PushBack` may take ownership… (#21988 ) With bellow json path `["$.data","$.data.datatimestamp"]` After `array_obj->PushBack` the `data` field owner will be taken from array_obj, and lead to null values for json path `$.data.datatimestamp` Rapidjson doc: ``` //! Append a GenericValue at the end of the array. \note The ownership of \c value will be transferred to this array on success. */ GenericValue& PushBack(GenericValue& value, Allocator& allocator); ```	2023-07-21 17:02:01 +08:00
lihangyu	20242d9a0e	[Improve](simdjson) put unescaped string value after parsed (#21866 ) In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB. If not unescape, then later jsonb parse will be failed	2023-07-20 10:33:17 +08:00
HHoflittlefish777	6a0a21d8b0	[regression-test](load) add streamload default value test (#21536 )	2023-07-06 10:14:13 +08:00
shuke	f77c69ab95	[fix](test) case bug, streamload without sync. (#21161 )	2023-06-28 18:22:19 +08:00
HHoflittlefish777	2beed11256	[Bug](streamload) fix inconsistent load result of be and fe (#20950 )	2023-06-21 18:12:51 +08:00
Siyang Tang	8366ce7a81	[enhancement](insert-stmt) Make `insert into tbl values();` compatible with mysql (#20694 )	2023-06-18 19:56:07 +08:00
lvshaokang	37db0145b4	[fix](load) fix mysql load parse response npe (#20699 )	2023-06-13 18:14:03 +08:00
Xujian Duan	0b228b3414	[fix](load)Support load json data with default value (#20624 ) * support json default value --------- Co-authored-by: duanxujian <duanxujian@jd.com>	2023-06-12 14:51:31 +08:00
Kaijie Chen	6c96e1dc9f	[fix](regression) add sync after streamload in test_stream_load (#20425 ) Add sync after streamload in test_stream_load to fix following error: Exception in load_p0/stream_load/test_stream_load.groovy(line 180): throw exception } log.info("Stream load result: ${result}".toString()) def json = parseJson(result) assertEquals("success", json.Status.toLowerCase()) assertEquals(1, json.NumberTotalRows) assertEquals(0, json.NumberFilteredRows) } } order_qt_sql1 " SELECT * FROM ${tableName2}" ^^^^^^^^^^^^^^^^^^^^^^^^^^ERROR LINE^^^^^^^^^^^^^^^^^^^^^^^^^^ // test common case def tableName3 = "test_all" def tableName4 = "test_less_col" def tableName5 = "test_bitmap_and_hll" def tableName6 = "test_unique_key" def tableName7 = "test_unique_key_with_delete" def tableName8 = "test_array" def tableName10 = "test_struct" sql """ DROP TABLE IF EXISTS ${tableName3} """ Exception: java.lang.IllegalStateException: Check tag 'sql1' failed: Check tag 'sql1' failed, line 1 mismatch, real line is empty, but expect is 2019 9 9 9 7.700 a 2019-09-09 1970-01-01T08:33:39 k7 9.0 9.0 sql: SELECT * FROM load_nullable_to_not_nullable	2023-06-06 08:32:25 +08:00
shuke	01770ba68a	[fix](regression-test) variable's scope returned by curl (#20347 )	2023-06-01 23:38:39 +08:00
shuke	05b7c65509	[fix](regression-test) fix multi-thread problem of regression-test #20322	2023-06-01 18:57:17 +08:00
shuke	4a682a0a46	[fix][regression-test] set timeout of curl in regression test to avoid hanged when be crashed. (#20222 ) Currently in regression-test, when a be crash, because curl does not set a timeout, suite-thread will get stuck. To solve this, encapsulate the call to be into a function, set the timeout uniformly, and avoid getting stuck	2023-06-01 11:00:09 +08:00
lvshaokang	f14e6189a9	[feature](load-refactor) Unfied mysql load use InsertStmt (#19571 )	2023-05-24 12:09:16 +08:00
Mingyu Chen	a7f3bfec89	[refactor](cluster)(step-2) remove cluster related to Backend (#19842 )	2023-05-21 09:00:35 +08:00
WenYao	481e9aebdb	[Refactor](spark load) remove parquet scanner (#19251 )	2023-05-18 19:19:13 +08:00
Zhengguo Yang	6748ae4a57	[Feature] Collect the information statistics of the query hit (#18805 ) 1. Show the query hit statistics for `baseall` ```sql MySQL [test_query_db]> show query stats from baseall; +-------+------------+-------------+ \| Field \| QueryCount \| FilterCount \| +-------+------------+-------------+ \| k0 \| 0 \| 0 \| \| k1 \| 0 \| 0 \| \| k2 \| 0 \| 0 \| \| k3 \| 0 \| 0 \| \| k4 \| 0 \| 0 \| \| k5 \| 0 \| 0 \| \| k6 \| 0 \| 0 \| \| k10 \| 0 \| 0 \| \| k11 \| 0 \| 0 \| \| k7 \| 0 \| 0 \| \| k8 \| 0 \| 0 \| \| k9 \| 0 \| 0 \| \| k12 \| 0 \| 0 \| \| k13 \| 0 \| 0 \| +-------+------------+-------------+ 14 rows in set (0.002 sec) MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall where k9 > 1 group by k0,k1,k2; +------+------+--------+-------------+ \| k0 \| k1 \| k2 \| sum(`k3`) \| +------+------+--------+-------------+ \| 0 \| 6 \| 32767 \| 3021 \| \| 1 \| 12 \| 32767 \| -2147483647 \| \| 0 \| 3 \| 1989 \| 1002 \| \| 0 \| 7 \| -32767 \| 1002 \| \| 1 \| 8 \| 255 \| 2147483647 \| \| 1 \| 9 \| 1991 \| -2147483647 \| \| 1 \| 11 \| 1989 \| 25699 \| \| 1 \| 13 \| -32767 \| 2147483647 \| \| 1 \| 14 \| 255 \| 103 \| \| 0 \| 1 \| 1989 \| 1001 \| \| 0 \| 2 \| 1986 \| 1001 \| \| 1 \| 15 \| 1992 \| 3021 \| +------+------+--------+-------------+ 12 rows in set (0.050 sec) MySQL [test_query_db]> show query stats from baseall; +-------+------------+-------------+ \| Field \| QueryCount \| FilterCount \| +-------+------------+-------------+ \| k0 \| 1 \| 0 \| \| k1 \| 1 \| 0 \| \| k2 \| 1 \| 0 \| \| k3 \| 1 \| 0 \| \| k4 \| 0 \| 0 \| \| k5 \| 0 \| 0 \| \| k6 \| 0 \| 0 \| \| k10 \| 0 \| 0 \| \| k11 \| 0 \| 0 \| \| k7 \| 0 \| 0 \| \| k8 \| 0 \| 0 \| \| k9 \| 1 \| 1 \| \| k12 \| 0 \| 0 \| \| k13 \| 0 \| 0 \| +-------+------------+-------------+ 14 rows in set (0.001 sec) ``` 2. Show the query hit statistics summary for all the mv in a table ```sql MySQL [test_query_db]> show query stats from baseall all; +-----------+------------+ \| IndexName \| QueryCount \| +-----------+------------+ \| baseall \| 1 \| +-----------+------------+ 1 row in set (0.005 sec) ``` 3. Show the query hit statistics detail info for all the mv in a table ```sql MySQL [test_query_db]> show query stats from baseall all verbose; +-----------+-------+------------+-------------+ \| IndexName \| Field \| QueryCount \| FilterCount \| +-----------+-------+------------+-------------+ \| baseall \| k0 \| 1 \| 0 \| \| \| k1 \| 1 \| 0 \| \| \| k2 \| 1 \| 0 \| \| \| k3 \| 1 \| 0 \| \| \| k4 \| 0 \| 0 \| \| \| k5 \| 0 \| 0 \| \| \| k6 \| 0 \| 0 \| \| \| k10 \| 0 \| 0 \| \| \| k11 \| 0 \| 0 \| \| \| k7 \| 0 \| 0 \| \| \| k8 \| 0 \| 0 \| \| \| k9 \| 1 \| 1 \| \| \| k12 \| 0 \| 0 \| \| \| k13 \| 0 \| 0 \| +-----------+-------+------------+-------------+ 14 rows in set (0.017 sec) ``` 4. Show the query hit for a database ```sql MySQL [test_query_db]> show query stats for test_query_db; +----------------------------+------------+ \| TableName \| QueryCount \| +----------------------------+------------+ \| compaction_tbl \| 0 \| \| bigtable \| 0 \| \| empty \| 0 \| \| tempbaseall \| 0 \| \| test \| 0 \| \| test_data_type \| 0 \| \| test_string_function_field \| 0 \| \| baseall \| 1 \| \| nullable \| 0 \| +----------------------------+------------+ 9 rows in set (0.005 sec) ``` 5. Show query hit statistics for all the databases ```sql MySQL [(none)]> show query stats; +-----------------+------------+ \| Database \| QueryCount \| +-----------------+------------+ \| test_query_db \| 1 \| +-----------------+------------+ 1 rows in set (0.005 sec) ```	2023-05-15 10:56:34 +08:00
zhangdong	b129c9901b	[improvement](FQDN)Change the implementation of fqdn (#19123 ) Main changes: 1. If fqdn is enabled in the configuration file, when fe starts, localAddr will obtain fqdn instead of IP, priority_networks will fail 2. The IP and host names of Backend and Front are combined into one field, host. When fqdn is enabled, it represents the host name, and when not enabled, it represents the IP address 3. The communication between clusters directly uses fqdn, and various Connection pool add authentication mechanisms to prevent the IP address of the domain name from changing and the connection between nodes from making errors 4. No longer requires polling to verify if the IP has changed, delete fqdnManager 5. Change the method of verifying the legitimacy of nodes between FEs from obtaining client IP to displaying the identity of the transmitting node itself in the HTTP request header or the message body of the throttle 6. When processing the heartbeat, if BE finds that the host stored by itself is inconsistent with the host stored by the master, after verifying the legitimacy of the host, it will change its own host instead of directly reporting an error 7. Simplify the generation logic of fe name Scope of influence: 1. Establishing communication connections between clusters 2. Determine whether it is the same node through attributes such as IP 3. Print Log 4. Information display 5. Address Splicing 6. k8s deployment 7. Upgrade compatibility Test plan: 1. Change the IP address of the node, while keeping the fqdn unchanged, change the IP addresses of fe and be, and verify whether the cluster can read and write data normally 2. Use the master code to generate metadata, and use the previous metadata on the current pr to verify whether it is compatible with the old version (upgrading is no longer supported if fqdn has been enabled before) 3. Deploy fe and be clusters using k8s to verify whether the cluster can read and write data normally 4. According to https://doris.apache.org/zh-CN/docs/dev/admin-manual/cluster-management/fqdn?_highlight=fqdn#%E6%97%A7%E9%9B%86%E7%BE%A4%E5%90%AF%E7%94%A8fqdn Upgrading old clusters 5. Use streamload to specify the fqdn of fe and be to import data separately 6. Use different users to start transactions and write data using insert statements	2023-05-11 00:44:48 +08:00


120 Commits
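
Commit 6630f92878 above describes loading JSON through the `http_stream` table-valued function by sending the statement in a `sql:` HTTP header. The sketch below only assembles that header and the matching `curl` argument list from the values quoted in the commit message. The helper names `http_stream_sql` and `curl_argv` are hypothetical and are not part of Doris, and nothing is sent over the network.

```python
import shlex


def http_stream_sql(table, columns, select_exprs, props):
    """Build the SQL text carried in the `sql:` header (hypothetical helper)."""
    # Render properties as "key" = "value" pairs, as in the commit example.
    prop_text = ", ".join(f'"{k}" = "{v}"' for k, v in props.items())
    return (
        f"insert into {table}({', '.join(columns)}) "
        f"select {', '.join(select_exprs)} from http_stream({prop_text})"
    )


def curl_argv(sql, data_file, endpoint, user="root:"):
    """Return the curl argument list for the request (hypothetical helper)."""
    return [
        "curl", "-v", "--location-trusted",
        "-u", user,
        "-H", f"sql: {sql}",
        "-T", data_file,
        endpoint,
    ]


# Values below are copied from the commit message of 6630f92878.
sql = http_stream_sql(
    "test.t1",
    ["c1", "c2"],
    ["id", "name"],
    {"format": "json", "strip_outer_array": "true", "read_json_by_line": "true"},
)
argv = curl_argv(sql, "/root/json_file.json", "http://127.0.0.1:8030/api/_http_stream")

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Building the argument list instead of a single string avoids the hand-written `\"` escaping seen in the commit's shell example, because `shlex.join` applies the quoting.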