When the user imports data that contains certain special characters, the import fails with the following error message:
2023-07-28 15:15:28.960 INFO 21756 --- [-interval-flush] c.a.d.p.w.d.DorisWriterManager : Doris interval Sinking triggered: label[datax_doris_writer_7aa415e6-5a9c-4070-a699-70b4a627ae64].
2023-07-28 15:15:29.015 INFO 21756 --- [ Thread-3] c.a.d.p.w.d.DorisStreamLoadObserver : Start to join batch data: rows[95968] bytes[3815834] label[datax_doris_writer_7aa415e6-5a9c-4070-a699-70b4a627ae64].
2023-07-28 15:15:29.038 INFO 21756 --- [ Thread-3] c.a.d.p.w.d.DorisStreamLoadObserver : Executing stream load to: 'http://10.38.60.218:8030/api/ods_prod/ods_pexweb_online_product/_stream_load', size: '3911802'
2023-07-28 15:15:31.559 WARN 21756 --- [ Thread-3] c.a.d.p.w.d.DorisStreamLoadObserver : Request failed with code:500
2023-07-28 15:15:31.561 INFO 21756 --- [ Thread-3] c.a.d.p.w.d.DorisStreamLoadObserver : StreamLoad response :null
2023-07-28 15:15:31.564 WARN 21756 --- [ Thread-3] c.a.d.p.w.d.DorisWriterManager : Failed to flush batch data to Doris, retry times = 0
java.io.IOException: Unable to flush data to Doris: unknown result status.
at com.alibaba.datax.plugin.writer.doriswriter.DorisStreamLoadObserver.streamLoad(DorisStreamLoadObserver.java:66) ~[doriswriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.doriswriter.DorisWriterManager.asyncFlush(DorisWriterManager.java:163) [doriswriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.doriswriter.DorisWriterManager.access$000(DorisWriterManager.java:19) [doriswriter-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.writer.doriswriter.DorisWriterManager$1.run(DorisWriterManager.java:134) [doriswriter-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_221]
The following error is found in fe.log:
java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: " l"
at java.net.URLDecoder.decode(URLDecoder.java:194) ~[?:1.8.0_221]
at org.springframework.http.converter.FormHttpMessageConverter.read(FormHttpMessageConverter.java:352) ~[spring-web-5.3.22.jar:5.3.22]
at org.springframework.web.filter.FormContentFilter.parseIfNecessary(FormContentFilter.java:109) ~[spring-web-5.3.22.jar:5.3.22]
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:88) ~[spring-web-5.3.22.jar:5.3.22]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117) ~[spring-web-5.3.22.jar:5.3.22]
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) ~[spring-web-5.3.22.jar:5.3.22]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117) ~[spring-web-5.3.22.jar:5.3.22]
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[jetty-security-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandle
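The FE-side failure is plain java.net.URLDecoder behavior: a '%' in the request body that is not followed by two hex digits is an illegal escape sequence once the body is parsed as form content. A minimal, standalone sketch that reproduces the same exception (not Doris code):

```java
import java.net.URLDecoder;

public class UrlDecodeRepro {
    public static void main(String[] args) throws Exception {
        // A '%' not followed by two hex digits is an illegal escape sequence,
        // so decoding fails the same way the FE does when it treats the
        // stream load body as form content.
        URLDecoder.decode("value=% l", "UTF-8");
        // throws java.lang.IllegalArgumentException:
        //   URLDecoder: Illegal hex characters in escape (%) pattern - For input string: " l"
    }
}
```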
Cache the Iceberg table so that, when the same table is accessed again, its metadata is loaded only once.
Cache the snapshot of the table to improve the performance of the Iceberg table function.
Add cache support for Iceberg's manifest file content.
A simple test goes from 2.0s to 0.8s.
Before:
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
| 3 | a | c |
....
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (2.10 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
| 3 | a | c |
...
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (2.00 sec)
After:
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
...
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (2.05 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
| 3 | a | c |
...
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (0.80 sec)
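The caching idea can be sketched roughly as follows, assuming a Guava LoadingCache keyed by a "db.table" string; the class and method names are illustrative only, not the actual Doris implementation:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

// Illustrative only: cache table metadata per "db.table" so that repeated
// queries against the same Iceberg table skip the expensive catalog round trip.
public class IcebergMetaCache {
    static class TableMeta {
        final long snapshotId;
        TableMeta(long snapshotId) { this.snapshotId = snapshotId; }
    }

    private final LoadingCache<String, TableMeta> cache = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(10, TimeUnit.MINUTES)   // eventually refresh stale entries
            .build(new CacheLoader<String, TableMeta>() {
                @Override
                public TableMeta load(String key) {
                    return loadFromCatalog(key);      // only hit the catalog on a cache miss
                }
            });

    public TableMeta getTable(String key) {
        return cache.getUnchecked(key);
    }

    // Placeholder for the real load (catalog + snapshot + manifest contents).
    private TableMeta loadFromCatalog(String key) {
        return new TableMeta(System.nanoTime());
    }

    // REFRESH TABLE should drop the cached entry so the next query reloads it.
    public void invalidate(String key) {
        cache.invalidate(key);
    }
}
```

With such a cache the first query after a refresh still pays the full metadata load, which matches the numbers above: the first select stays around 2s while the second drops to 0.8s.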
When adding a BE, the host:port string is required to contain exactly one colon, otherwise an error is reported. However, an IPv6 address contains multiple colons:
```
String[] pair = hostPort.split(":");
if (pair.length != 2) {
throw new AnalysisException("Invalid host port: " + hostPort);
}
```
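One possible way to parse the address while tolerating IPv6 is to split on the last colon and also accept the bracketed "[addr]:port" form. This is only a sketch of the idea, reusing the AnalysisException from the snippet above; it is not necessarily the exact fix that was merged:

```java
// Sketch only: take the LAST colon as the host/port separator, and also
// accept the bracketed form "[::1]:9050".
static String[] parseHostPort(String hostPort) throws AnalysisException {
    if (hostPort.startsWith("[")) {
        int end = hostPort.indexOf(']');
        if (end < 0 || end + 1 >= hostPort.length() || hostPort.charAt(end + 1) != ':') {
            throw new AnalysisException("Invalid host port: " + hostPort);
        }
        return new String[] {hostPort.substring(1, end), hostPort.substring(end + 2)};
    }
    int idx = hostPort.lastIndexOf(':');
    if (idx <= 0 || idx == hostPort.length() - 1) {
        throw new AnalysisException("Invalid host port: " + hostPort);
    }
    return new String[] {hostPort.substring(0, idx), hostPort.substring(idx + 1)};
}
```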
The fragment executor's destructor calls close(), which depends on the query context's object pool, because many objects (such as runtime filters) are placed in that pool.
The executor should therefore be deleted before the query context; otherwise there will be a heap-use-after-free error.
This was fixed in #17675, but it is unclear why that fix is not in master; as a result, 1.2-lts does not have this problem.
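The lifetime rule can be illustrated with a Java analogy (the real issue is in the C++ BE; the class names below are illustrative, not Doris code): the executor's close() uses objects owned by the query context's pool, so the executor must be released first.

```java
import java.util.ArrayList;
import java.util.List;

public class LifetimeOrder {
    static class QueryContext implements AutoCloseable {
        final List<Object> objectPool = new ArrayList<>(); // e.g. runtime filters
        @Override public void close() { objectPool.clear(); }
    }

    static class FragmentExecutor implements AutoCloseable {
        private final QueryContext ctx;
        FragmentExecutor(QueryContext ctx) { this.ctx = ctx; }
        @Override public void close() {
            // Touches pooled objects; only safe while the context is still alive.
            ctx.objectPool.size();
        }
    }

    public static void main(String[] args) {
        // try-with-resources closes resources in reverse declaration order,
        // so the executor is closed before the context: the safe ordering.
        try (QueryContext ctx = new QueryContext();
             FragmentExecutor executor = new FragmentExecutor(ctx)) {
            // run the fragment ...
        }
    }
}
```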
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .
1. 4k files, 60kw (600 million) rows
before: 1 min 37.70 sec
after: 50.18 sec
2. 50 files, 60kw (600 million) rows
before: 1.12 sec
after: 0.82 sec
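For context on why count(*) can be pushed down for Parquet: the per-row-group row counts are already stored in the file footer, so no column data needs to be read at all. A standalone sketch using the standard parquet-hadoop API (not the Doris BE code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetRowCount {
    // Returns the total row count of a Parquet file by reading footer
    // metadata only; the column chunks themselves are never scanned.
    public static long rowCount(String file) throws Exception {
        Configuration conf = new Configuration();
        try (ParquetFileReader reader =
                 ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))) {
            long rows = 0;
            for (BlockMetaData block : reader.getFooter().getBlocks()) {
                rows += block.getRowCount();
            }
            return rows;
        }
    }
}
```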
Enhance broadcast join cost calculation by considering both the extra build-side effort of building a bigger hash table and the extra probe-side effort from the higher cost of ProbeWhenBuildSideOutput and ProbeWhenSearchHashTable, when parallel_fragment_exec_instance_num is greater than 1.
The current solution applies a penalty factor to rightRowCount; the factor is the total instance number raised to the power of 2.
A penalty on outputRows is not applied for now and will be refined in the next-generation cost model.
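In rough Java terms, the described penalty amounts to the following; the function and parameter names are illustrative, and computing the total instance number as BE count times instances per BE is an assumption, not the actual cost-model code:

```java
// Illustrative only: the broadcast build side is penalized by the square of
// the total instance number, as described above.
static double penalizedRightRowCount(double rightRowCount, int beNumber, int instanceNumPerBe) {
    int totalInstanceNum = beNumber * instanceNumPerBe;
    return rightRowCount * Math.pow(totalInstanceNum, 2);
}
```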
Also brings some updates to shape checking:
update the original control variable in the shape files from parallel_fragment_exec_instance_num to parallel_pipeline_task_num when the pipeline engine is enabled;
fix an issue where the be_number variable did not take effect.
Consider the following SQL:
```
SELECT *
FROM sub_query_correlated_subquery1 t1
WHERE coalesce(
          bitand(
              cast((SELECT sum(k1) FROM sub_query_correlated_subquery3) AS int),
              cast(t1.k1 AS int)),
          coalesce(t1.k1, t1.k2)) IS NULL
ORDER BY t1.k1, t1.k2;
```
The "is NULL" conjunct is lost in the SubqueryToApply rule. This PR fixes it.
After auto retry was merged, it is hard to determine at compile time how many times the doExecute method will be invoked; if the expected invocation count in the expectation block is missed, an unexpected-invocation exception is thrown. So the expected execution count is simply removed.
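As a rough illustration, assuming a JMockit-style expectation block (the Executor class and method below are hypothetical, not the actual test code), dropping the fixed count looks like this:

```java
import mockit.Expectations;
import mockit.Mocked;
import org.junit.Test;

public class RetryExpectationTest {
    // Hypothetical collaborator standing in for whatever owns doExecute().
    static class Executor {
        void doExecute() {}
    }

    @Mocked
    Executor executor;

    @Test
    public void doExecuteCountIsNotFixed() {
        new Expectations() {{
            executor.doExecute();
            minTimes = 1;   // no fixed `times = N`: with auto retry the exact
                            // invocation count is unknown at compile time
        }};

        executor.doExecute();
        executor.doExecute();   // an extra retry no longer triggers an
                                // unexpected-invocation failure
    }
}
```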
select c_name from customer union select c_name from customer
This SQL uses an agg node to get the distinct rows of c_name,
so there is no need to wait until all data has been inserted into the hash map;
rows can be output as soon as they have been successfully inserted into the hash map.
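A hedged sketch of this streaming-distinct idea (not the Doris BE code; class and method names are illustrative): for a distinct-only aggregation, a row can be emitted the moment its key is inserted into the hash set for the first time, instead of buffering until the whole input has been consumed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

public class StreamingDistinct {
    private final Set<String> seen = new HashSet<>();

    // Consume one input key and emit it immediately if it is new.
    public void consume(String key, Consumer<String> output) {
        if (seen.add(key)) {     // first time this key is inserted
            output.accept(key);  // emit right away; no need to wait for end of input
        }
    }
}
```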