Upgrade thirdparties from 1.4.0 to 1.4.1:
1. Add patch for aws-c-cal-0.4.5
2. Fix the `undefined reference` errors related to libpsl.
3. Move libgsasl to fix the link problem of libcurl.
4. Downgrade OpenSSL to 1.0.2k to fix a problem with low-version glibc.
Some users do not know that Doris supports the type BOOLEAN and use TINYINT instead,
so I added type BOOLEAN to the output of 'help create table' in the MySQL client.
Currently, a BOOLEAN column takes 1 byte, but the value of a boolean column is only in {0,1},
which wastes some memory, and I want to change its implementation to 1 bit in the future (a sketch follows below).
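A minimal sketch of what that 1-bit representation could look like, packing eight values per byte (the struct and names below are illustrative, not Doris's actual storage code):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed boolean column: 8 values per byte instead of
// 1 byte per value.
struct BitPackedBoolColumn {
    std::vector<uint8_t> bytes;
    size_t size = 0;

    void append(bool v) {
        if (size % 8 == 0) bytes.push_back(0);  // start a new byte
        if (v) bytes[size / 8] |= uint8_t(1u << (size % 8));
        ++size;
    }
    bool get(size_t i) const { return (bytes[i / 8] >> (i % 8)) & 1; }
};
```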
When the memory usage of BE reaches mem_limit, printing ReservationTrackerCounters through MemTracker
may cause BE to crash under high concurrency.
ReservationTrackerCounters is not actually used in current Doris, and the memory tracker in Doris
will be redesigned in the future.
1. Refactor the create method of the HDFS reader & writer.
libhdfs3 does not support arm64, so the HDFS reader & writer are not supported on arm64.
2. Add a macro for LowerUpperImpl.
```
EXPORT TABLE xxx
...
PROPERTIES
(
"label" = "mylabel",
...
);
```
Then the user can use the label to get the job info via the SHOW EXPORT stmt:
```
show export from db where label="mylabel";
```
For compatibility, if no label is specified, a random label will be used. And for history jobs, the label will be "export_job_id".
Unlike the LOAD stmt, here we specify the label in `properties` because this does not cause grammatical conflicts,
and there is no need to modify the meta version of the metadata.
1. Avoid sending tasks if there is no data to be consumed,
by fetching the latest offset of each partition before sending tasks. (Fixes [Optimize] Avoid too many abort task in routine load job #6803)
2. Add a preCheckNeedSchedule phase in update() of routine load,
to avoid holding the write lock of the job for a long time when getting all Kafka partitions from the Kafka server (see the sketch after this list).
3. Upgrade librdkafka to 1.7.0 to fix a bug of "Local: Unknown partition".
See offsetsForTimes fails with 'Local: Unknown partition' edenhill/librdkafka#3295
4. Avoid unnecessary storage migration tasks if that storage medium does not exist on the BE.
(Fixes [Bug] Too many unnecessary storage migration tasks #6804)
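Item 2 follows a common locking pattern: do the slow remote fetch before taking the lock. A minimal sketch under that assumption; fetch_partitions_from_kafka and update_job_with_partitions are hypothetical stand-ins, not Doris's actual API:
```
#include <mutex>
#include <vector>

std::vector<int> fetch_partitions_from_kafka() { return {0, 1, 2}; }  // stub for the slow RPC
void update_job_with_partitions(const std::vector<int>&) {}           // stub for the in-memory update

std::mutex job_lock;

void schedule_job() {
    // Pre-check phase: talk to the Kafka server with no lock held.
    std::vector<int> partitions = fetch_partitions_from_kafka();
    // Hold the job's write lock only for the short in-memory update.
    std::lock_guard<std::mutex> guard(job_lock);
    update_job_with_partitions(partitions);
}
```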
* [Feature] Support lateral view
The syntax:
```
select k1, e1 from test lateral view explode_split(k1, ",") tmp as e1;
```
`explode_split` is a special function of Doris,
which is used to split a string column by the specified separator string
and then convert the rows to columns.
It is a combination of a string-split function and a table function,
and its behavior is equivalent to Hive's `explode(split(string, string))`.
The implementation:
A table function operator is added to handle the lateral view syntax separately.
The query plan is as follows:
```
MySQL [test]> explain select k1, e1 from test_explode lateral view explode_split (k2, ",") tmp as e1;
+---------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:`k1` | `e1` |
| |
| RESULT SINK |
| |
| 1:TABLE FUNCTION NODE |
| | table function: explode_split(`k2`, ',') |
| | |
| 0:OlapScanNode |
| TABLE: test_explode |
+---------------------------------------------------------------------------+
```
* Add ut
* Add multi table function node
* Add session variable 'enable_lateral_view'
* Fix ut
I tested hex in a loop of 10 million iterations with random numbers:
the old hex averaged 4.92 s, while the optimized hex averaged 0.46 s, nearly 10x faster.
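The PR text does not show the code, but a typical way to get this kind of speedup is a lookup table that emits one hex digit per 4-bit nibble instead of going through a formatting routine. A minimal sketch of that approach (names are illustrative):
```
#include <cstdint>
#include <string>

// Table-driven hex encoding: each 4-bit nibble indexes directly into the
// digit table, avoiding snprintf-style formatting on the hot path.
static const char kHexDigits[] = "0123456789ABCDEF";

std::string to_hex(uint64_t v) {
    if (v == 0) return "0";
    char buf[16];
    int pos = 16;
    while (v != 0) {
        buf[--pos] = kHexDigits[v & 0xF];  // low nibble -> one digit
        v >>= 4;
    }
    return std::string(buf + pos, 16 - pos);
}
```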
This reverts part of commit 11ec38dd6fd9f86632d83c47bd9d8bc05db69a2b (#6736),
because it causes the view query problem described in #6792.
The following bug fix is kept:
1. Fix the problem that the WITH statement cannot be printed when UNION is included in the SQL.
1. Replace std::max with a ternary expression; std::max is much heavier than the ternary operator (see the sketch after this list).
2. Replace std::set with arrays; std::set is based on red-black trees, so traversal follows the pointer chain and cache hits are poor.
3. Optimize the serialize function: improve the calculation speed of num_non_zero_registers by reducing branches, so serializing _registers is faster after the optimization.
4. Tests show an obvious performance improvement.
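A minimal sketch of items 1-3, assuming an HLL with 16384 registers (the struct and names are illustrative, not Doris's actual HLL code):
```
#include <cstdint>

constexpr int HLL_REGISTERS_COUNT = 16384;

struct HllSketch {
    // Item 2: a flat array instead of std::set -- contiguous memory,
    // cache-friendly sequential traversal.
    uint8_t registers[HLL_REGISTERS_COUNT] = {0};

    void update(int idx, uint8_t first_one_bit) {
        // Item 1: a ternary expression instead of std::max on the hot path.
        registers[idx] =
            registers[idx] < first_one_bit ? first_one_bit : registers[idx];
    }

    // Item 3 flavor: count non-zero registers with a branch-free loop body;
    // the comparison yields 0/1 directly.
    int num_non_zero_registers() const {
        int count = 0;
        for (int i = 0; i < HLL_REGISTERS_COUNT; ++i) {
            count += (registers[i] != 0);
        }
        return count;
    }
};
```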
When creating sync jobs, we should forbid different jobs from connecting to the same canal instance;
otherwise these jobs will compete with each other for the data produced by that canal instance,
which may cause data inconsistency.
A query with multiple where conditions, such as `where dt in (20210926,20210919) and hour<=13`,
can cause an int * int product to overflow, after which the function extend_scan_key calls
`range.convert_to_fixed_value()` mistakenly. For a big range `[_low_value, _high_value)`,
a massive number of values will be inserted into _fixed_values, finally resulting in OOM.
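A minimal illustration of the overflow with made-up numbers: the product of two 32-bit counts wraps around before anyone widens it, so a check that should reject the big range can pass by accident.
```
#include <cstdint>
#include <iostream>

int main() {
    int per_column = 100000;            // e.g. values implied by one condition
    int per_range = 100000;             // e.g. width of a large [_low_value, _high_value)
    int bad = per_column * per_range;   // 10^10 overflows 32-bit int (undefined behavior)
    int64_t good = static_cast<int64_t>(per_column) * per_range;  // widen before multiplying
    std::cout << bad << " vs " << good << std::endl;
    return 0;
}
```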
Remove part of the dynamic_cast calls to reduce the overhead caused by type conversion,
which probably reduces the CPU consumption of Parquet file import by about 10%.
Fixes #6726
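A minimal sketch of the general technique with hypothetical reader types (not Doris's actual Parquet code): when the schema already guarantees the concrete type, a static_cast can replace a per-value dynamic_cast.
```
#include <cassert>
#include <cstdint>

struct ColumnReader { virtual ~ColumnReader() = default; };
struct Int32ColumnReader : ColumnReader {
    int32_t next_value() { return 42; }  // stub
};

int32_t read_int32(ColumnReader* reader) {
    // The schema fixes the concrete type up front, so the RTTI walk that
    // dynamic_cast performs on every value is pure overhead.
    assert(dynamic_cast<Int32ColumnReader*>(reader) != nullptr);  // debug-only sanity check
    return static_cast<Int32ColumnReader*>(reader)->next_value();
}
```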
If a plan fragment contains a colocated agg plan node, it is a colocated fragment.
The scan range and backend id of a colocated fragment instance should differ from the ordinary scheduler logic:
tablets in the same bucket must fall on the same BE.
For example, for the same bucket in different partitions,
even though the tablet ids are different, they must be scheduled to the same BE for the scan node.
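A minimal sketch of that invariant with illustrative names (not the actual coordinator code): the first time a bucket sequence is seen, pick a backend; every later tablet in that bucket, from any partition, reuses the same choice.
```
#include <cstdint>
#include <map>
#include <vector>

std::map<int, int64_t> bucket_seq_to_backend;  // bucket sequence -> backend id

int64_t assign_backend(int bucket_seq, const std::vector<int64_t>& backends) {
    auto it = bucket_seq_to_backend.find(bucket_seq);
    if (it != bucket_seq_to_backend.end()) {
        return it->second;  // same bucket, same BE, even across partitions
    }
    int64_t be = backends[bucket_seq % backends.size()];  // any balanced pick works
    bucket_seq_to_backend[bucket_seq] = be;
    return be;
}
```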