Type checking could not work because some plan nodes contain no expressions: sink and scan nodes have no expressions at all, so their types cannot be checked.
This PR adds expressions on the logical sink so that type checking works correctly.
## Proposed changes
Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385
## Further comments
2023-08-09:
It's a pity, but experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is only applied to enclose-related code; the plain CSV parser keeps the original logic.
Some performance fallback is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the column-writing behavior, as shown by the flame graph.
Trimming of the escape character will be enabled after fix #22411 is merged.
Cases that should be discussed:
1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter will be unreachable until EOF. Will the buffer become extremely large?
2. What if an infinitely long line occurs? Essentially, case 1 is equivalent to this.
To avoid too many unrelated changes, this PR only supports stream load as a trial. Docs will be added when `enclose` and `escape` are available for all kinds of load. A small illustration of the intended semantics follows below.
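A hypothetical sample, assuming `"` as the enclose character, `\` as the escape character, and the default `,` column delimiter:
```
1,"Tom, Jr.",ok
2,"say \"hi\"",ok
```
Here the enclosed second field may contain the column delimiter literally, and an escaped quote inside an enclosed field is taken as data rather than as the closing enclose.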
1. Collect external table row counts when executing `ANALYZE DATABASE`.
2. Support showing cached table stats (row count).
3. Support altering external table column stats.
4. Refresh/invalidate the table row count stats memory cache when an analyze task finishes or the table stats are dropped (see the SQL sketch after this list).
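A minimal SQL sketch of the workflow, assuming the current ANALYZE/stats statement syntax; the catalog, db, and table names are hypothetical:
```sql
-- 1. Collect row counts for the external tables in a database
ANALYZE DATABASE ext_ctl.ext_db;

-- 2. Show the cached table stats (row count)
SHOW TABLE STATS ext_ctl.ext_db.ext_tbl;

-- 3. Manually alter the column stats of an external table
ALTER TABLE ext_ctl.ext_db.ext_tbl MODIFY COLUMN col1 SET STATS ('row_count'='10000');

-- 4. Dropping the stats should also invalidate the cached row count
DROP STATS ext_ctl.ext_db.ext_tbl;
```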
The avro-scanner-jar package is reduced from 204M to 160M.
Previously, the Hadoop-related dependencies in the avro module's pom were packaged directly into the jar, resulting in a jar of about 200M. Since the Hadoop jars already exist in the BE `lib` directory, they can now be referenced directly instead.
This PR fixes two issues:
1. When using the S3 TVF to query files in Avro format, due to a change of `TFileType`, the file type that used to be `FILE_S3` became `FILE_LOCAL`, causing the query to fail.
2. The parameters `s3.virtual.key` and `s3.virtual.bucket` are both removed. A new `S3Utils` class in jni-avro parses the bucket and key of S3 from the URI.
The main purpose of this change is to unify the S3 parameters.
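For example, a query of this shape exercises both fixes (a sketch only; the URI and credentials are placeholders, and the property names are assumed to follow the current S3 TVF conventions):
```sql
SELECT *
FROM S3(
    "uri" = "s3://my-bucket/path/to/file.avro",
    "format" = "avro",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint" = "s3.us-east-1.amazonaws.com",
    "s3.region" = "us-east-1"
);
```
With this change, the bucket (`my-bucket`) and key (`path/to/file.avro`) are derived from the URI by `S3Utils` instead of being passed as `s3.virtual.bucket`/`s3.virtual.key`.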
Assume there is a hive catalog named `hive_ctl`, a hive db named `db1`, and a table named `tbl1`. If we connect to a slave FE and execute the following commands:
1. `switch hive_ctl`
2. `show partitions from db1.tbl1`
Then we will get an error like this:
```
MySQL [(none)]> show partitions from db1.tbl1;
ERROR 1049 (42000): errCode = 2, detailMessage = Unknown database 'default_cluster:db1'
```
The reason is that the slave FE forwards the `ShowPartitionStmt` to the master FE, but we do not sync the current catalog information, so the parser cannot find the db and throws this exception. This is just one case; some other similar cases will fail too.
FE logs are large on a busy Doris cluster; if you want to preserve some historical logs, they cost too much disk space.
Enabling compression is a good way to save space, and a gzip-compressed text file can be viewed without explicit decompression (e.g. with `zcat` or `zless`).
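A sketch of the relevant FE configuration; the option names below are assumptions based on this change, so check `fe.conf` for the exact switches:
```
# fe.conf: gzip-compress rolled-over logs to save disk space
sys_log_enable_compress = true
audit_log_enable_compress = true
```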
Upgrade guava to 32.1.2-jre
Set ck dependency scope to provided
Upgrade okio to 3.4.0
Upgrade snakeyaml to 1.33
Upgrade aws-java-sdk to 1.12.519
Upgrade hadoop to 3.3.6
1. If derived from an origin column, e.g. `create table tbl1 as select col1 from tbl2`, the length will be the same as the origin column's.
2. If derived from a function, e.g. `create table tbl1 as select func(col1) from tbl2`, the length will be 65533.
3. If derived from a constant value, e.g. `create table tbl1 as select "abc" from tbl2`, the length will be 65533.
Controlled by the session variable `truncate_char_or_varchar_columns`: truncate char or varchar columns if their size is smaller than that of the file columns, or if they are not found in the file column schema. A usage sketch follows below.
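The session variable name comes from this PR, while the catalog and table below are hypothetical:
```sql
-- Truncate char/varchar values that are longer than the table column's
-- declared length instead of failing the read
SET truncate_char_or_varchar_columns = true;
SELECT * FROM hive_ctl.db1.tbl1;
```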
This function can be used to replace `bitmap_union(to_bitmap(expr))`, because `bitmap_union(to_bitmap(expr))` needs to create many small bitmaps first and then merge them into a single bitmap.
`bitmap_agg` converts the column values into a bitmap directly, so its performance is better than `bitmap_union(to_bitmap(expr))`: in our test there is about a 30% improvement.
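A minimal comparison, assuming a hypothetical table `t` with a BIGINT column `user_id`:
```sql
-- Old: materializes one single-value bitmap per row, then merges them all
SELECT bitmap_count(bitmap_union(to_bitmap(user_id))) FROM t;

-- New: builds the bitmap from the column values directly
SELECT bitmap_count(bitmap_agg(user_id)) FROM t;
```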
Implement the `TransientTaskRegister` to support submitting transient tasks which do not require a timer trigger.
Rename some classes:
TimerTaskDisruptor -> TaskDisruptor
TimerTaskEvent -> TaskEvent
TimerTaskExpirationHandler -> TaskHandler
AsyncJobManager -> TimerJobManager
MemoryTask -> TransientTask
Make `PlanNode.getNumInstance()` abstract to force every `PlanNode` to specify how it defines its `numInstance`.
By default, `PlanNode.numInstance` is 1. `PlanNode`s other than `ExchangeNode` should not use this default value directly; `numInstance` is meant for nodes that change the instance count, such as the exchange node.
Execute SQL:
```sql
delete from test_table
```
```
2023-08-09 11:51:46,586 WARN (mysql-nio-pool-7|540) [StmtExecutor.analyze():987] Analyze failed. stmt[25, 519f916eeb94a8b-afe8e1094fb39fc1]
java.lang.NullPointerException: null
at org.apache.doris.rewrite.ExprRewriter.applyRuleBottomUp(ExprRewriter.java:236) ~[classes/:?]
at org.apache.doris.rewrite.ExprRewriter.applyRule(ExprRewriter.java:226) ~[classes/:?]
at org.apache.doris.rewrite.ExprRewriter.applyRuleRepeatedly(ExprRewriter.java:216) ~[classes/:?]
at org.apache.doris.rewrite.ExprRewriter.rewrite(ExprRewriter.java:166) ~[classes/:?]
at org.apache.doris.rewrite.ExprRewriter.rewrite(ExprRewriter.java:151) ~[classes/:?]
at org.apache.doris.analysis.DeleteStmt.analyze(DeleteStmt.java:127) ~[classes/:?]
at org.apache.doris.qe.StmtExecutor.analyze(StmtExecutor.java:983) ~[classes/:?]
at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:660) ~[classes/:?]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:448) ~[classes/:?]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:419) ~[classes/:?]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:441) ~[classes/:?]
at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:589) ~[classes/:?]
at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:826) ~[classes/:?]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[classes/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
```
Fix result:
```
[HY000][1105] errCode = 2, detailMessage = Where clause is not set
```
Affected version: 2.0-Alpha+