Commit Graph

7341 Commits

Author SHA1 Message Date
45fa2fc56b [fix](multi catalog)Use -1 as external es table column id instead of uniq id (#14557)
External table columns are now stored in a cache, so the uniq id of external columns is no longer persisted.
Use -1 as the column id for ES external tables instead.
This prevents a non-master FE from trying to get a uniq id, which would cause it to fail to write to bdbje.
2022-11-25 16:13:16 +08:00
9630257704 [fix](Nereids): fix bugs in random construct join plan (#14575) 2022-11-25 16:05:29 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1. Support the in-bitmap syntax, such as 'where k1 in (select bitmap_column from tbl)'.
2. Support the bitmap runtime filter: generate a bitmap filter from the right table's bitmap and push it down to the left table's storage layer for filtering.
2022-11-25 15:22:44 +08:00
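The bitmap runtime filter described in 4728e75079 can be pictured with a minimal Python sketch. It is illustrative only and is not the Doris implementation: the helper names (`build_bitmap`, `contains`, `scan_with_filter`) are hypothetical, a plain integer stands in for the bitmap column, and keys are assumed to be non-negative integers.

```python
def build_bitmap(values):
    """Build a bitmap (here a plain int) with one bit set per value."""
    bitmap = 0
    for v in values:
        bitmap |= 1 << v  # assumes v is a non-negative integer
    return bitmap


def contains(bitmap, v):
    """Return True if bit v is set in the bitmap."""
    return v >= 0 and ((bitmap >> v) & 1) == 1


def scan_with_filter(rows, key, bitmap):
    """Mimic the storage layer: drop rows whose key is not in the bitmap."""
    return [row for row in rows if contains(bitmap, row[key])]


# Right side: the values held in the bitmap column.
right_bitmap = build_bitmap([5, 9, 12])

# Left side: rows scanned from the left table, filtered before the join.
left_rows = [{"k1": 1}, {"k1": 5}, {"k1": 9}]
filtered = scan_with_filter(left_rows, "k1", right_bitmap)
```

The point of pushing the filter down is that rows such as `k1 = 1` are discarded during the scan and never reach the join.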
d5777bb1e9 [enhancement](outfile) add retry for broker pwrite #14556
Problem:
We frequently got the following error while running SELECT xxx INTO OUTFILE:
ERROR 1064 (HY000): RpcException, msg: Fail to write to broker, broker:TNetworkAddress(hostname=a.b.c.d, port=8111) failed:write() send(): Broken pipe

Reason:

- We cache the broker thrift client in BE.
- When the thrift client checks whether the connection is open, it only returns a cached flag and does not check whether the real socket is open or closed.
- After we get a client from the cache, the socket may already be closed, so pwrite fails.

How to fix:
Other interfaces, such as open and close, reopen the connection and retry, but pwrite does not retry.
Because pwrite carries a write offset, and the broker (server) side also checks the write offset, it is safe to retry pwrite.
2022-11-25 14:20:33 +08:00
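The reasoning in d5777bb1e9 (retry is safe because both sides agree on the write offset) can be sketched in Python as follows. This is a toy model, not the BE code: `FakeBroker`, `FakeClient`, `BrokenPipe` and `pwrite_with_retry` are hypothetical names, and the rule that a write must start exactly at the current end of file is an assumption made for the sketch.

```python
class BrokenPipe(Exception):
    """Stands in for a write failure on a stale cached connection."""


class FakeBroker:
    """Hypothetical broker that checks the offset of every write."""

    def __init__(self):
        self.data = bytearray()

    def pwrite(self, offset, payload):
        # Assumed server-side check: writes must start at the end of file.
        if offset != len(self.data):
            raise ValueError(f"unexpected offset {offset}")
        self.data += payload


class FakeClient:
    """Hypothetical cached client whose socket may already be closed."""

    def __init__(self, broker, socket_open):
        self.broker = broker
        self.socket_open = socket_open

    def reopen(self):
        self.socket_open = True

    def pwrite(self, offset, payload):
        if not self.socket_open:
            raise BrokenPipe("write() send(): Broken pipe")
        self.broker.pwrite(offset, payload)


def pwrite_with_retry(client, offset, payload, max_attempts=2):
    """Reopen the connection and retry when the cached socket is dead."""
    for attempt in range(1, max_attempts + 1):
        try:
            client.pwrite(offset, payload)
            return attempt
        except BrokenPipe:
            if attempt == max_attempts:
                raise
            client.reopen()


broker = FakeBroker()
stale_client = FakeClient(broker, socket_open=False)
attempts = pwrite_with_retry(stale_client, 0, b"hello")
```

In this model a duplicated write cannot slip through: if the first attempt had already reached the broker, a retry at the same offset would be rejected by the offset check instead of appending the data twice.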
7ba4cd764a [enhancement](array-function) array_position,array_contains,countequal which in FunctionArrayIndex handle target NULL (#14564)
Previously, the result was:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
|                                 NULL |
+--------------------------------------+
1 row in set (0.02 sec)
```

After this commit, the result becomes:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
|                                    2 |
+--------------------------------------+
1 row in set (0.02 sec)
```
2022-11-25 14:19:50 +08:00
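The behavior change in 7ba4cd764a can be restated as a small Python model, with `None` standing in for SQL NULL. This is only a sketch of the semantics shown in the output above, not the `FunctionArrayIndex` code; returning 0 when nothing matches is an assumption of the sketch.

```python
def array_position(arr, target):
    """Return the 1-based position of the first match, or 0 if none."""
    for position, elem in enumerate(arr, start=1):
        if target is None:
            # A NULL target now matches a NULL element.
            if elem is None:
                return position
        elif elem is not None and elem == target:
            return position
    return 0


def array_contains(arr, target):
    """True if the target (including NULL) occurs in the array."""
    return array_position(arr, target) != 0


position = array_position([1, None], None)
```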
5efdcb9ed0 [improvement](storage) For debugging problem: add session variable (#14576) 2022-11-25 14:16:00 +08:00
d5d356b17f [vectorized](function) support order by field function (#14528)
* [vectorized](function) support order by field function

* update

* update test
2022-11-25 14:00:46 +08:00
25de068a05 [fix](parquet-reader) the value of null map will overflow when LazyRead merges too many empty batches (#14558)
The run length of null map is saved as `uint16_t`. Previously, the run length of null map was
limited by `batch_size` in the `ParquetReader`, by setting `batch_size = std::min(batch_size, (size_t)USHRT_MAX)`.
It works well when the batch size is less than `USHRT_MAX`.
However, [Lazy read](https://github.com/apache/doris/pull/13917) will merge empty batches until reading
a non-empty batch or reaching the EOF of a row group, so the `batch_size` may be greater than `USHRT_MAX`
in non-predicate columns.
In addition, even if the `batch_size` does not exceed `USHRT_MAX`, adjacent batches may still make
the run length exceed `USHRT_MAX` in `ColumnSelectVector::get_next_run`.
2022-11-25 12:22:18 +08:00
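The overflow described in 25de068a05 can be reproduced with plain arithmetic. The Python sketch below mimics `uint16_t` storage by masking to 16 bits, then shows one possible way to keep every stored run within range by splitting it. The helpers `as_uint16` and `split_run` are hypothetical, and splitting is an illustration rather than a claim about how the PR fixes the problem.

```python
USHRT_MAX = 0xFFFF  # largest value a uint16_t can hold (65535)


def as_uint16(value):
    """Mimic storing a value in a uint16_t: the high bits are dropped."""
    return value & USHRT_MAX


def split_run(run_length):
    """Split one logical run into pieces that each fit in a uint16_t."""
    pieces = []
    while run_length > USHRT_MAX:
        pieces.append(USHRT_MAX)
        run_length -= USHRT_MAX
    if run_length > 0:
        pieces.append(run_length)
    return pieces


# Two adjacent batches of 40000 rows each, merged into a single run.
merged_run = 40000 + 40000
wrapped = as_uint16(merged_run)  # the value a uint16_t would actually hold
pieces = split_run(merged_run)   # every piece is <= USHRT_MAX
```

Each batch alone fits in 16 bits, but the merged run of 80000 does not, which is the adjacent-batch case the commit message points out.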
f68fa442cd [Bug](regression-test) Fix regression aggregate failed muti distinct (#14563)
Fix the regression failure in the multi-distinct aggregate case.
2022-11-25 10:58:10 +08:00
deef491e01 [fix](Nereids) refactor CTE and EliminateAliasNode and fix the bug that CTE reuse relationId (#14534)
This PR contributes:
- support explain for CTE;
- refine CTE and fix the bug where the same analyzed plan was reused, so that its LogicalOlapScan nodes shared the same relationId;
- change EliminateAliasNode to LogicalSubQueryAliasToLogicalProject and move it to the top of the rewrite stage, so that we can easily observe the analyzed plan through the LogicalSubQueryAlias with its alias;
- jobs traverse the left child first, so ExprIds grow from the left child to the right child.
2022-11-25 10:54:53 +08:00
225e4981ed [feature](selectdb-cloud) Fix leak in VCollectorIterator (#962) (#14549)
`VCollectIterator::build_heap()` leaks memory when a `VCollectIterator::LevelIterator::init()` call fails.
2022-11-25 10:25:24 +08:00
5ccc875824 [fix](recycle) refactor the logic of erase meta with same name (#14551)
In #14482, we implemented a feature to keep a specific number of metas with the same name in the catalog recycle bin.
But it causes a meta replay bug.
Every time we drop a db/table/partition, it tries to erase a certain number of metas with the same name.
Replaying the "drop" edit log does the same thing. But the number of metas to erase is based on the current config value,
which is not persisted in the edit log, so "drop" and "replay drop" can become inconsistent.

In this PR, I move the "erase meta with same name" logic to the daemon thread of the catalog recycle bin.
2022-11-25 09:47:24 +08:00
d12112b930 [fix](fe) Fix mem leaks (#14570)
1. Fix memory leaks in StmtExecutor::executeInternalQuery
2. Limit the number of concurrent running load task for statistics cache
2022-11-25 09:16:54 +08:00
0ae246a93b [chore](github) Optimize BE UT workflows (#14565)
In #14533, we run BE UT workflows periodically to share the cache with brand new pull requests. However, we don't need to save the cache when the unit tests don't run; otherwise it may occupy a huge amount of cache space, and some useful caches will be evicted by GitHub.
2022-11-25 07:52:03 +08:00
9103ded1dd [improvement](join)optimize sharing hash table for broadcast join (#14371)
This PR makes sharing the hash table for broadcast joins more robust:

- Add a session variable to enable/disable this function.
- Do not block the hash join node's close function.
- Use shared pointers to share the hash table and runtime filters among broadcast join nodes.
- A hash join node that doesn't need to build the hash table closes the right child without reading any data (the child closes the corresponding sender).
2022-11-24 21:06:44 +08:00
bc699511d0 [Fix](array-function) fix array_distinct null values (#14544)
Previously, the result was:
```
mysql> select array_distinct([1,1,3,3,null, null, null]);
+-----------------------------------------------------+
| array_distinct(ARRAY(1, 1, 3, 3, NULL, NULL, NULL)) |
+-----------------------------------------------------+
| [1, 3, NULL, NULL, NULL]                            |
+-----------------------------------------------------+
1 row in set (0.00 sec)
```

After this fix, the result becomes:
```
mysql> select array_distinct([1,1,3,3,null, null, null]);
+-----------------------------------------------------+
| array_distinct(ARRAY(1, 1, 3, 3, NULL, NULL, NULL)) |
+-----------------------------------------------------+
| [1, 3, NULL]                                        |
+-----------------------------------------------------+
1 row in set (0.00 sec)
```
2022-11-24 19:07:28 +08:00
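The corrected `array_distinct` behavior from bc699511d0 can be modeled in a few lines of Python, again with `None` standing in for SQL NULL. This is a sketch of the expected result only. Tracking NULL with a separate flag is a choice made for the sketch, to make explicit that NULLs are deduplicated like any other value.

```python
def array_distinct(arr):
    """Keep the first occurrence of each value, treating NULLs as equal."""
    seen = set()
    seen_null = False
    result = []
    for elem in arr:
        if elem is None:
            # Emit NULL once, no matter how many times it appears.
            if not seen_null:
                seen_null = True
                result.append(None)
        elif elem not in seen:
            seen.add(elem)
            result.append(elem)
    return result


distinct = array_distinct([1, 1, 3, 3, None, None, None])
```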
ac46922433 [fix](ut) Fix failures for BE UT macOS (#14543) 2022-11-24 17:39:37 +08:00
0c4830600d test(grouping sets) add regression test case for grouping sets (#14539)
Co-authored-by: tongyang.hty <hantongyang@douyu.tv>
2022-11-24 17:38:12 +08:00
59b31a03c4 [Improvement](agg function) support group_bit_and/group_bit_or/group_bit_xor functions (#14386) 2022-11-24 16:46:42 +08:00
608cb6c4ad [test](jdbc)add new case for mysql external table (#14530) 2022-11-24 16:36:44 +08:00
b4d8ae5204 [test](jdbc)add new pg case from other source (#14445) 2022-11-24 16:35:59 +08:00
a04e1b49ec [feature](Nereids) Implement group by grouping sets, cube and rollup (#14496)
Issue Number: close #13615

The main work:

- Implement grouping sets, cube and rollup.
- Fix the infinite loop problem in the if function.
- Support for isNull transitions to legacy optimizers.
2022-11-24 16:34:31 +08:00
0680b3b4d5 [opt](nereids) adjust nereids related regression test cases (#14439)
1. In dateV2, adjust the directory structure to avoid creating a tpch-1G database.
2. Use `drop table XXX` to replace `delete * from XXX where key>0`.
3. Remove explain cases, because:
- the explain string itself is variable, so the cases are hard to maintain;
- it is the original planner's explain, not Nereids'.
2022-11-24 16:02:52 +08:00
fde474609e [feature](Nereids) Add dphyp job (#14485) 2022-11-24 15:50:05 +08:00
8afe298a0f [Fix](function) fix function retention lost ARRAY's element type … (#14538) 2022-11-24 15:19:50 +08:00
2389a90cd0 [enhancement](snapshot) add missed version log when make_snapshot in engine clone task (#14284) 2022-11-24 14:51:28 +08:00
7f4cc61286 [fix](cast)prevent be from crashing when cast function is not available (#14540)
* [fix](cast)prevent be from crashing when cast function is not available

* format code
2022-11-24 14:17:49 +08:00
6c7f758ef7 [improvement](hashjoin) support partitioned hash table in hash join (#14480) 2022-11-24 14:16:47 +08:00
e656dae3f0 [fix](fe) fix leaks of connect context (#14529)
Remove the ConnectContext that was built for internal statistics from the thread-local to avoid memory leaks.
2022-11-24 13:26:59 +08:00
wxy
6472d5506f [fix](cache) fix cache overflow problem #14515 (#14516)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-11-24 11:18:46 +08:00
ae4f4b9bf1 [fix](agg)having clause should use column name first then alias (#14408)
* [fix](agg)having clause should use column name first then alias

* fix fe ut
2022-11-24 10:31:58 +08:00
f6de03eb6c [chore](github) Add a workflow to check the build for third-party libraries (#14533)
Currently, we build the third-party libraries and release them automatically (See https://github.com/apache/doris-thirdparty/pull/13). We must make sure that the changes for third-party libraries are valid.
2022-11-24 10:07:39 +08:00
70ea07bc4b [fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463) 2022-11-24 09:35:08 +08:00
6ccdaf0aaf [fix](storage-policy) use Long instead of Date to persiste cooldowntime in storage policy (#14532)
Previously, we used the "Date" type for cooldownTime in StoragePolicy.
But Gson serializes the Date type differently in Java 8 and Java 11, which may cause inconsistent meta errors.

This PR uses Long to save cooldownTime.
Note that in FE the cooldownTime is saved in milliseconds, while in BE it is saved in seconds.
2022-11-24 08:32:21 +08:00
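The unit difference noted in 6ccdaf0aaf (milliseconds in FE, seconds in BE) is easy to get wrong, so a short Python sketch of the conversion may help. The function names are hypothetical and the timestamp is an arbitrary example. The sketch only illustrates why a plain integer is a stable representation, not how FE or BE store the value internally.

```python
import json
from datetime import datetime, timezone


def to_fe_millis(dt):
    """Epoch milliseconds, the unit the commit says FE uses."""
    return int(dt.timestamp() * 1000)


def fe_millis_to_be_seconds(millis):
    """Epoch seconds, the unit the commit says BE uses."""
    return millis // 1000


cooldown = datetime(2022, 11, 24, 0, 0, 0, tzinfo=timezone.utc)
fe_value = to_fe_millis(cooldown)
be_value = fe_millis_to_be_seconds(fe_value)

# A plain integer serializes the same way everywhere, unlike a formatted date.
serialized = json.dumps({"cooldownTime": fe_value})
restored = json.loads(serialized)["cooldownTime"]
```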
724e57bb87 [feature](docker)Add runtime docker image related files (#14436) 2022-11-23 23:58:44 +08:00
496a92b668 [JavaUDF](loader) Fix compatible problem for JAVA 11 (#14519) 2022-11-23 23:36:39 +08:00
404cac42f9 [fix](multi catalog)Fix external table partition name and type inconsistent bug. (#14522)
The original code used a Set to store the partition columns of HMS external tables,
which can't guarantee the order of the columns.
This could cause column names and column types to mismatch.
Use a List instead of a Set to fix the problem.
2022-11-23 21:40:44 +08:00
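The ordering problem fixed in 404cac42f9 shows up whenever names and types are paired by position. The Python sketch below is illustrative only: `build_partition_schema` and the sample columns are hypothetical, and the actual fix is in the Java FE code.

```python
def build_partition_schema(names, types):
    """Pair column names with types by position.

    Both arguments must be ordered sequences. An unordered set would
    give no guarantee that names[i] still belongs with types[i].
    """
    if len(names) != len(types):
        raise ValueError("names and types must have the same length")
    return list(zip(names, types))


# Hypothetical partition columns, in the order the metastore reports them.
partition_names = ["year", "month", "region"]
partition_types = ["INT", "INT", "STRING"]

schema = build_partition_schema(partition_names, partition_types)
```

Because a list keeps insertion order, `region` always lines up with `STRING`. With an unordered collection the iteration order is unspecified, so the same pairing could attach `STRING` to `year`.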
9e39a04b63 [Doc](flink connector) add flink connector faq (#14520) 2022-11-23 21:40:00 +08:00
181f1cf176 [Docs](function) add some missing function docs (#14510) 2022-11-23 21:39:17 +08:00
6770bfc7f0 [fix](pipeline) adjust mem limit to 30% (#14523) 2022-11-23 20:07:45 +08:00
648fd93dc5 [DOCS](function) add document for grouping and grouping_id (#14472) 2022-11-23 18:07:48 +08:00
d14e1d25ff [Bug](vectorized) Fix wrong column type (#14387) 2022-11-23 18:07:33 +08:00
1520e5c88a [enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484)
* [enhancement](agg)use new method to serialize keys in batch if the key is too large

* fix compile error
2022-11-23 17:35:39 +08:00
fd3af489a4 [memory](chunkallocator) disable chunkallocator when reserved bytes == 0 (#14494)
disable chunkallocator when reserved bytes == 0
disable chunkallocator by default
2022-11-23 17:12:53 +08:00
8d5eabb64f [enhancement](Nereids) reduce CostAndEnforcerJob call times (#14442)
Record the pruned plan's cost to avoid optimizing the same GroupExpression more than once.
2022-11-23 16:57:41 +08:00
388f067300 [chore](workflow) Disable memory tracker by default on BE UT (macOS) (#14508) 2022-11-23 16:25:42 +08:00
6fcffd041c [test](jdbc)add new mysql jdbc case from other source (#14495) 2022-11-23 16:23:42 +08:00
09cc385caa [Docs](fucntion) Add docs for function random, mod, fmod (#14444) 2022-11-23 16:22:57 +08:00
45975dd321 [enhancement](Nereids): Change circle detector for better performance (#14438) 2022-11-23 14:31:14 +08:00
7a7e714fce [fix](nereids) width and penalty not derive when do stats derive (#14474)
A previous PR (#13883) refactored the stats derive code but missed width and penalty.
2022-11-23 14:26:51 +08:00