13721 Commits

Author SHA1 Message Date
27f5b623e6 [Chore](docs)Add SSL Faq (#22956) 2023-08-15 09:49:39 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09: 
Unfortunately, experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is applied only to the enclose-related code; the plain CSV parser uses the original logic.

Some performance regression is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the column-writing behavior, as the flame graph shows.
 
Trimming escape will be enabled after the fix #22411 is merged.

Cases that should be discussed:

1. When an incomplete enclose appears at the beginning of a large data set, the line delimiter is unreachable until EOF. Will the buffer become extremely large?
2. What if an infinitely long line occurs? Essentially, case `1.` is equivalent to this.

This PR only supports stream load as a trial, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.
2023-08-15 09:23:53 +08:00
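As a rough illustration of what `enclose` and `escape` mean for #22539, the sketch below splits one already-delimited line into fields. It is not the Doris implementation: the function name `split_csv_line`, the default characters, and the rule that the escape character applies both inside and outside an enclosed field are all assumptions made for the example. It also does not address the buffer-growth question raised in the PR; it only rejects a line whose enclose is never closed.

```python
def split_csv_line(line, sep=",", enclose='"', escape="\\"):
    """Split one line into fields, honoring an enclose and an escape character.

    Hypothetical helper for illustration only; defaults are assumptions.
    """
    fields, buf = [], []
    in_enclose = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Escaped character: take the next character literally.
            buf.append(line[i + 1])
            i += 2
            continue
        if ch == enclose:
            # Toggle the enclosed state; the enclose character itself is dropped.
            in_enclose = not in_enclose
        elif ch == sep and not in_enclose:
            # A separator outside an enclosed region ends the current field.
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    if in_enclose:
        # An incomplete enclose: the field never ends within this line.
        raise ValueError("unterminated enclose")
    fields.append("".join(buf))
    return fields
```

With these assumed defaults, a separator inside an enclosed field stays part of the field, and an escaped enclose character is kept as data.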
7bc98748cf [fix](datastream sender) fix wrong result of broadcast join; fix wrong result of pipeline (#22942)
Fix bug of #22765
Close #22924
2023-08-14 18:59:19 +08:00
hzq
ad8a8203a2 [fix](mysql compatibility) add an internal database mysql to improve mysql compatibility (#22868) 2023-08-14 17:03:11 +08:00
45481f5fe2 [optimize](Nereids): optimize Nereids performance (#22885) 2023-08-14 15:21:29 +08:00
8f471a3a1f [fix](Nereids) push agg to meta scan does not work well (#22811) 2023-08-14 14:35:21 +08:00
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
Pxl
d371101bfd [Improvement](aggregation) make fixed hashmap's bitmap_size flexible (#22573)
make fixed hashmap's bitmap_size flexible
2023-08-14 10:47:06 +08:00
29fbe749cd [refactor](load) split rowset builder out of delta writer (#22805) 2023-08-14 10:32:58 +08:00
c67d1cc805 [docs](releasenote)2.0.0 release note (#22904) 2023-08-14 10:11:03 +08:00
e2b06cd0cf [opt](docs) Optimize docs to avoid user set wrong replication_allocation (#22767) 2023-08-14 09:38:22 +08:00
48037622cd [Bug](pipeline) fix pipeline jdbc coredump in regression test (#22892)
Issue Number: Bug fix pipeline jdbc coredump in regression test
2023-08-13 22:56:09 +08:00
Pxl
49d503911e [MV](exec) disable create mv with select star (#22895) 2023-08-13 19:28:51 +08:00
abc9de07b3 [Bug](pipeline) make sure sink is not blocked before try close (#22765)
make sure sink is not blocked before try close
2023-08-13 13:20:48 +08:00
bddab94121 [Enhancement](partial update) Support including delete sign column in partial update stream load (#22874) 2023-08-13 10:32:21 +08:00
395840cbbb [Chore](refactor) Split IndexChannel from vtablet_sink.h into vtablet_sink.cc (#22848)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-13 10:21:12 +08:00
bff3b90263 [fix](tablet clone) fix tablet sched failed when tablet missing tag and version incomplete (#22861) 2023-08-13 10:18:01 +08:00
79a61ced42 [docs](load) fix indentation in stream load manual (#22807) 2023-08-13 10:16:11 +08:00
23add67d14 [fix](load) fix core at memtable writer mem_consumption (#22914) 2023-08-13 10:10:28 +08:00
41ff48f838 [regression][external]fix case test_show_where and es_query 0811 (#22898) 2023-08-12 19:41:55 +08:00
1f8cb3f54a [Chore](doc) Fix doc zh-CN typo (#22903)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-12 16:14:06 +08:00
23094a01d4 [fix](test) load data inpath will remove the data in hdfs (#22908)
Loading data from HDFS in Hive moves the source directory into the table's location directory, which leads to errors like "Can not get first file, please check uri" in the TVF test.
2023-08-12 15:12:00 +08:00
4e880288c6 [refactor]use clear concept to replace std::enable_if_t (#22801)
Signed-off-by: flynn <fenglv15@mails.ucas.ac.cn>
2023-08-12 15:10:30 +08:00
2b81553879 [doc](docs) Add some docs of baidu cloud bos (#22833)
* [doc](docs) Add some docs of baidu cloud bos

* fix
2023-08-12 07:09:57 +08:00
5e2748d2b4 [Improve](complex-type)update orc reader for complex type and add regress tests (#22856) 2023-08-12 07:06:12 +08:00
04272f398d [docs](release note) Update README.md (#22900) 2023-08-11 22:33:32 +08:00
5b09254fac [improvement](external statistics)Fix external stats collection bugs (#22788)
1. Collect external table row count when executing analyze database.
2. Support showing cached table stats (row count).
3. Support altering external table column stats.
4. Refresh/invalidate the in-memory table row count cache when an analyze task finishes and when table stats are dropped.
2023-08-11 21:58:24 +08:00
cd6453434b [Enhancement](merge-on-write) add correctness check for the calculation of delete bitmap (#22282)
Currently, for a merge-on-write unique table, the delete bitmap of a rowset is calculated during the flush, commit, and publish phases. In this PR, we add a special mark to every rowset that is considered when calculating the delete bitmap in these three phases. Before finally merging the delete bitmap into the table meta's delete bitmap, we check whether all rowsets contain the special mark, to verify that every rowset was considered during the three phases.
Because the executor cannot fail in the publish phase once the coordinator has received successful commit info from all executors, we only print logs if this correctness check fails rather than reporting a failure.
2023-08-11 21:12:35 +08:00
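The check described in #22282 can be pictured with the minimal sketch below. It is only a model under stated assumptions: the "special mark" is represented as a set of rowset ids collected across the flush, commit, and publish phases, and the names `check_all_rowsets_marked`, `visible_rowset_ids`, and `marked_rowset_ids` are hypothetical. As in the PR description, a failed check is logged rather than raised.

```python
import logging

logger = logging.getLogger("mow_check")


def check_all_rowsets_marked(visible_rowset_ids, marked_rowset_ids):
    """Return the rowsets that no phase marked; log instead of failing.

    Hypothetical model: a rowset is "marked" if its id was recorded while
    the delete bitmap was calculated in any of the three phases.
    """
    missing = sorted(set(visible_rowset_ids) - set(marked_rowset_ids))
    if missing:
        # The publish phase must not fail, so only a warning is emitted.
        logger.warning("delete bitmap check failed, unmarked rowsets: %s", missing)
    return missing


# Usage: collect marks phase by phase, then verify before the final merge.
marked = set()
marked.update([1, 2])  # rowsets considered in the flush phase
marked.update([3])     # rowsets considered in the commit phase
marked.update([4])     # rowsets considered in the publish phase
unmarked = check_all_rowsets_marked([1, 2, 3, 4, 5], marked)
```

Here rowset 5 was never considered, so it is reported in `unmarked` and a warning is logged, but no exception interrupts the publish.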
44475b64ef [fix](pg test) fix postgresql jdbc catalog test case (#22875) 2023-08-11 20:50:47 +08:00
28561f77e9 [fix](regression)fix test_hdfs_tvf regression_test out file : decimalv3 -> decimal (#22852) 2023-08-11 20:44:18 +08:00
84ee814bc3 [docs](docs) Update invalid pics of release note 1.1.0 and 2.0-beta (#22804) 2023-08-11 20:08:21 +08:00
130c47e669 [Fix](Nereids)add need forward for enable_nereids_dml and format some cases (#22888) 2023-08-11 19:35:29 +08:00
045843991a [Fix](Nereids) fix insert into table of random distribution for nereids (#22831)
Currently, inserting into a table with random distribution is not supported. We fix this by setting the physical properties to Any.
2023-08-11 19:26:39 +08:00
a2fd488438 [chore](Nereids): polish StatsCalculatorTest (#22884) 2023-08-11 18:08:18 +08:00
7ac4df67ab [Fix](regression)Fix test_mysql_jdbc_catalog_nereids p2 test case (#22870)
Fix test_mysql_jdbc_catalog_nereids p2 test case.
2023-08-11 17:57:48 +08:00
a089fe3e43 [Improve](jni-avro)Reduce the volume of the avro-scanner-jar package (#22276)
The avro-scanner-jar package is reduced from 204M to 160M.

Hadoop-related dependencies in the original avro pom were packaged directly into the jar, resulting in a jar size of about 200M. Since the be lib directory already provides the Hadoop jars, they can now be referenced directly.
2023-08-11 17:26:14 +08:00
db69457576 [fix](avro)Fix S3 TVF avro format reading failure (#22199)
This PR fixes two issues:

1. When using the S3 TVF to query files in AVRO format, a change to `TFileType` turned the originally queried `FILE_S3` into `FILE_LOCAL`, causing the query to fail.
2. Both parameters `s3.virtual.key` and `s3.virtual.bucket` are removed. A new `S3Utils` in jni-avro parses the bucket and key of S3.
The main purpose of this change is to unify the S3 parameters.
2023-08-11 17:22:48 +08:00
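To illustrate the idea in #22199 of deriving the bucket and key from the URI instead of passing them as separate parameters, here is a minimal sketch. The real `S3Utils` lives in the Java jni-avro module and may behave differently; the function name `parse_s3_uri`, the accepted schemes, and the `s3://bucket/key` URI shape are assumptions for this example.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an S3-style URI into (bucket, key).

    Hypothetical helper; the accepted schemes are an assumption.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The host part is the bucket; the path without its leading slash is the key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = parse_s3_uri("s3://my-bucket/path/to/file.avro")
```

Deriving both values from one URI removes the need to keep separate bucket and key parameters in sync with it.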
72e264dd59 [fix](executor)fix error when FixedContainer with null (#22850) 2023-08-11 17:20:50 +08:00
4f7c6aa27f [fix](case) update nereids_delete_using to pass run without load data (#22853)
Co-authored-by: stephen <hello-stephen@qq.com>
2023-08-11 17:16:29 +08:00
3e169511e3 [test](jdbc_mysql)update test_jdbc_query_mysql regression test result #22866 2023-08-11 17:15:14 +08:00
548226acfc [fix](planner)shouldn't change the child type to assignmentCompatibleType if it's INVALID_TYPE (#22841)
If the child type is changed to INVALID_TYPE, the later getBuiltinFunction call will fail.
2023-08-11 17:14:49 +08:00
bcac160013 [fix](broadcast shuffle) fix wrong result of broadcast shuffle (#22847)
When the data stream sender is doing a broadcast shuffle, it accumulates blocks up to the batch size and then sends them to the destinations, but for local receivers it sends ONLY the current block, which causes data loss.

This issue was introduced by #22218.

If #22218 is picked to the 2.0 branch, this PR also needs to be picked.
2023-08-11 17:01:11 +08:00
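The behavior fixed in #22847 can be modeled with the small sketch below. It is a toy model, not the Doris data stream sender: the class `BroadcastSender`, its fields, and the use of plain Python lists as blocks and receivers are all assumptions. The point it shows is that the accumulated batch, not just the most recent block, must go to local and remote receivers alike.

```python
class BroadcastSender:
    """Toy broadcast sender that accumulates rows up to a batch size."""

    def __init__(self, local_receivers, remote_receivers, batch_size):
        self.local = local_receivers    # each receiver is a list of batches
        self.remote = remote_receivers
        self.batch_size = batch_size
        self.pending = []               # rows accumulated but not yet sent

    def send(self, block):
        self.pending.extend(block)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        # Local and remote receivers get the same accumulated batch.
        # Sending only the current block to local receivers would drop
        # the rows accumulated by earlier send() calls.
        for receiver in self.local + self.remote:
            receiver.append(list(batch))


local, remote = [[]], [[]]
sender = BroadcastSender(local, remote, batch_size=4)
sender.send([1, 2])
sender.send([3, 4])
sender.send([5])
sender.flush()
```

After the final `flush()`, the local receiver and the remote receiver hold identical batches covering rows 1 through 5.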
3e9ba632d7 [typo](docs) Add a guide to using SQL for the jdbc catalog (#22880) 2023-08-11 16:28:42 +08:00
0c38f42827 [fix](doc) Remove introduction to unstable features (#22832)
1. Remove introduction to unstable features
2. Rename some sub-titles to avoid mixed use of Chinese and English
2023-08-11 15:59:16 +08:00
f88f021e52 [fix](bug) Fix BE thread safe start and stop #22560 2023-08-11 15:34:10 +08:00
8c3b95c523 [Fix](multi-catalog) sync default catalog when forwarding query to master. (#22684)
Assume there is a Hive catalog named hive_ctl, a Hive database named db1, and a table named tbl1. If we connect to a slave FE and execute the following commands:

1. `switch hive_ctl`
2. `show partitions from db1.tbl1`

Then we get an error like this:
```
MySQL [(none)]> show partitions from db1.tbl1;
ERROR 1049 (42000): errCode = 2, detailMessage = Unknown database 'default_cluster:db1'
```

The reason is that the slave FE forwards the `ShowPartitionStmt` to the master FE without syncing the default catalog information, so the parser cannot find the database and throws this exception. This is just one case; other similar cases fail too.
2023-08-11 14:59:04 +08:00
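The forwarding problem described in #22684 can be sketched as follows. This is a simplified model, not FE code: `ForwardRequest`, `build_forward_request`, `resolve_database`, and the dictionary of catalogs are hypothetical names, and `internal` is used here as the assumed name of the built-in catalog. The sketch shows why the follower's current catalog has to travel with the forwarded statement.

```python
from dataclasses import dataclass


@dataclass
class ForwardRequest:
    sql: str
    default_catalog: str  # the session state the follower must pass along


def build_forward_request(sql, session_catalog):
    """Follower side: attach the session's current catalog to the request."""
    return ForwardRequest(sql=sql, default_catalog=session_catalog)


def resolve_database(request, db_name, catalogs):
    """Master side: look the database up in the forwarded default catalog."""
    databases = catalogs.get(request.default_catalog, set())
    if db_name not in databases:
        raise LookupError(
            f"Unknown database '{db_name}' in catalog '{request.default_catalog}'"
        )
    return f"{request.default_catalog}.{db_name}"


catalogs = {"internal": {"db0"}, "hive_ctl": {"db1"}}
request = build_forward_request("show partitions from db1.tbl1", "hive_ctl")
resolved = resolve_database(request, "db1", catalogs)
```

If the request carried `internal` instead of `hive_ctl`, the same lookup would raise, which mirrors the `Unknown database` error in the report.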
72837a3ab4 [enhancement](Nereids): Plan equals() hashcode() don't need LogicalProperties (#22774)
- deepEquals doesn't need to compare LogicalProperties
- Plan equals() hashcode() don't need logicalProperty
2023-08-11 14:53:47 +08:00
209f36f1bf [fix](multi-catalog)fix jdbc loader (#22814) 2023-08-11 14:36:19 +08:00
94a7b44540 [Improvement](log) add config to control compression of fe log & fe audit log (#22865)
The FE log is large for a busy Doris cluster, so preserving historical logs costs too much disk space.
Enabling compression is a good way to save space,
and a gzip-compressed text file can be viewed without decompressing it first.
2023-08-11 14:08:08 +08:00
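As a small illustration of the last point in #22865, a gzip-compressed text log can be read as a stream without first writing a decompressed copy to disk. The sketch below uses only the Python standard library; the function name `grep_gzip_log` and the UTF-8 encoding are assumptions, and nothing here reflects the actual FE log configuration.

```python
import gzip


def grep_gzip_log(path, needle):
    """Yield lines containing `needle` from a gzip-compressed text log.

    Hypothetical helper; the log is assumed to be UTF-8 text.
    """
    # Mode "rt" decompresses on the fly and yields decoded text lines.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if needle in line:
                yield line.rstrip("\n")
```

Because the file is decompressed lazily, memory use stays bounded by the line length rather than the size of the whole log.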
f2075d0a81 [Fix](multi-catalog) Fix decimal precision issue in regression test result. (#22819)
Fix decimal precision issue in regression test result.
2023-08-11 13:49:30 +08:00