Commit Graph

3171 Commits

Author SHA1 Message Date
ba9a777554 [fix](function) StringRef should not be key of timezone cache (#14719) 2022-12-01 16:31:47 +08:00
9dd1d989e8 [test](decimalv3) add regression test cases for decimalv3 (#14672) 2022-12-01 15:18:40 +08:00
176f519fa1 [enhancement](memtracker) Optimize exec node memory tracking (#14711) 2022-12-01 14:52:21 +08:00
b4d32a0c44 [fix](join) runtime filter shared from other instance wasn't be published (#14717) 2022-12-01 14:17:23 +08:00
Pxl
bba77fa9dd [Enhancement](profile) enhance column predicates display on profile (#14664) 2022-12-01 13:07:12 +08:00
7873bc95a6 [Enhancement](bitmapfilter) Support bitmap filter to apply zone_map index to filter pages (#14635) 2022-12-01 10:41:09 +08:00
ce9a160d16 [enhancement](macOS) Make CLion work out of the box (#14689)
We can't build the project after import it to CLion on macOS. Some options must be provided by default.
2022-12-01 10:40:04 +08:00
6c70d794f6 [fix](bitmapfilter) fix core dump caused by bitmap filter (#14702) 2022-12-01 09:56:22 +08:00
79688a54d6 [bug](jsonb) fix be core at insert invalid json to JSONB column (#14686) 2022-11-30 14:00:50 +08:00
486a77fec0 [fix](tcmalloc) use low_watermark instead of hard_mem_limit (#14660)
* [fix](tcmalloc) use low_watermark instead of hard_mem_limit

hard_mem_limit is removed.

* format
2022-11-30 11:29:57 +08:00
9272680d00 [feature](multi-catalog) support Jdbc catalog (#14527)
Issue Number: close #xxx

I add jdbc catalog for doris multi-catalog feature.
Currently, the jdbc catalog only supports MYSQL DBMS.

TODO:

support for postgre DB
Support for other databases.
Problem summary
For jdbc catalog, we can create catalog like:

CREATE CATALOG jdbc4 PROPERTIES (
    "type"="jdbc",
    "jdbc.user"="root",
    "jdbc.password"="123456",
    "jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=false",
    "jdbc.driver_url" = "file:/mnt/disk2/ftw/tools/jar/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar",
    "jdbc.driver_class" = "com.mysql.jdbc.Driver"
);
Note:
yearIsDateType is a param of jdbc:
If yearIsDateType configuration property is set to false, then the returned object type is java.sql.Short. If set to true (the default), then the returned object is of type java.sql.Date with the date set to January 1st, at midnight.
To compat with mysql, we force the use of yearIsDateType=false in FE. if user sets yearIsDateType=true, doris FE will force to change yearIsDateType=false.
2022-11-30 11:28:08 +08:00
Pxl
7a1fde379c [Enhancement](function) optimize for decimal arithmetic calculation (#14674)
* optimize for decimal arithmetic calculation

* Apply suggestions from code review

Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>

Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2022-11-30 10:41:03 +08:00
898d0d42f1 [improvement](load)add more log for better bug tracing experience for be write (#14424)
Recently when tracing one bug happened in version 1.1.4
I found out there were some places we can add more log for a better tracing.
2022-11-29 22:28:39 +08:00
82579126cf [fix](Dictionary-codec) heap overflow with in-predicate on nullable columns (#14319) (#14641)
Losing segmentid info will mess up the _segment_id_to_value_in_dict_flags map
in InListPredicate, causing two distinct segments to collide and crash the BE
at last.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-29 21:22:18 +08:00
22883e7e08 [fuzzy](test) be fuzzy conf (#14654) 2022-11-29 19:38:40 +08:00
a60490651f [improvement](function) add timezone cache for convert_tz (#14616) 2022-11-29 17:00:54 +08:00
fe95b84c34 [fix](jsonb)fix CAST String to JSONB nullable problem (#14626)
fix CAST String to SONB nullable problem in DEBUG mode.
2022-11-29 16:22:22 +08:00
7a08a799e9 [Vectorized](function) support order by convert_to function (#14555) 2022-11-29 15:22:27 +08:00
3e8b3658c7 [feature-wip](decimalv3) Support basic agg and arithmetic operations for decimal v3 (#14513) 2022-11-29 15:12:41 +08:00
Pxl
82da071b45 [Chore](format) update clang-format version to 15 (#13036)
update clang-format version to 15
2022-11-29 14:46:10 +08:00
f7a827c06b [fix](new-scan) fix some bugs about new scan node and readers (#14504)
json reader DCHECK fail because of missing TYPE_STRING

fix bug that if no file is found, the tvf will throw NPE.

The predicate conjuncts can not be pushed down to parquet reader if this is a load task.
Because the predicate should be applied on column of dest table, not on column of source file.

Add a temp property "use_new_load_scan_node" of broker load to make regression test happy.
So that we can use new load scan node for a certain job and avoid setting global FE config.
2022-11-29 10:21:41 +08:00
e1f0fa069c [enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580) 2022-11-29 10:15:25 +08:00
7513c82431 [NLJoin](conjuncts) separate join conjuncts and general conjuncts (#14608) 2022-11-29 08:55:54 +08:00
daeabcf053 [improvement](vec) optimize the logic for _has_null in ColumnNullable (#14633) 2022-11-29 08:53:30 +08:00
0702277196 [improvement](tcmalloc) add moderate mode and avoid oom with a lot of cache (#14374)
ReleaseToSystem aggressively when there are little free memory.
2022-11-28 20:17:51 +08:00
1e690ea6aa [fix](bitmapfilter) Set bitmap filter waiting time to the query timeout. (#14623)
bitmap filter is precise filter and only filter once, so it must be applied.
2022-11-28 18:57:27 +08:00
529bdfb153 [Fix](function) Fix retention function return wrong value type (#14552)
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
2022-11-28 15:56:18 +08:00
73a600fba8 bug fix for outfile (#14550)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-28 15:46:41 +08:00
9d087a01f3 [improvement](startup) not print error stack for OLAP_ERR_TABLE_ALREADY_DELETED_ERROR because there are too much errors during startup (#14627)
If there are too much deleted tablets in RocksDB, there are many OLAP_ERR_TABLE_ALREADY_DELETED_ERROR during startup and will try to get error stack. It will cost a lot of time and the start process taks very very long.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-28 15:43:24 +08:00
Pxl
d712c4efe1 [Enhancement](predicate) move create column predicate to create_predicate_function (#14588)
move create column predicate to create_predicate_function
use same macro to create_column_predicate and create_predicate_function
2022-11-28 14:13:40 +08:00
ed92a8f81e [feature](jsonb function)change jsonb_extract_string behavior and doc (#14619)
1. change jsonb_extract_string behavior: convert to string instead of NULL if the type of json path is not string
2. move jsonb tutorial doc to JSONB data type
2022-11-28 11:36:54 +08:00
39c47d930b [improvement](load) add more log on rpc error (#14559)
* [improvement](load) add more log on rpc error

* update
2022-11-28 08:32:20 +08:00
78adecac1b [enhancemennt](be)optimize mem usage in join and set node (#14602) 2022-11-27 13:38:49 +08:00
36419fae48 [fix](JdbcExecutor) fix that JdbcExecutor did not load the class jar (#14598)
JdbcExecutor did not load jdbc driver jar, so add classloader to load jdbc jar.
2022-11-26 23:53:05 +08:00
d5d3f7e0b7 [fix](memtracker) Fix thrift BackendService thread local is not initialized, memtracker init fail (#14589) 2022-11-26 13:04:39 +08:00
70a424d6e3 [Bug](regression) Fail regression test in test_grouping_sets in fuzzy mode (#14601) 2022-11-26 12:17:31 +08:00
81fece5360 [improvement](cache) close compaction&schema_change&checksum index meta cache (#14586) 2022-11-26 12:15:32 +08:00
064b8d2aa6 [fix](multi-catalog) fix coredump when querying partitioned hive table with text format (#14604)
BE will crash when querying partitioned hive table with text format
and put partition column at first of select items.

1. FE should use file slots to set the column mapping index of csv file.
2. BE should use `get_by_name` of block to get right column in a block in csv reader.
2022-11-26 11:42:40 +08:00
52c6ba051e [feature](jsonb type)refactor JSONB type using column and add testcase (#13778)
1. Refactor JSONB type using ColumnString instead making a copy.
2. Add regression testcase for JSONB load and functions.
2022-11-26 10:06:15 +08:00
7ae7830c50 [improvement](function)add size function alias array_size (#14594)
* add size function alias

* fix
2022-11-25 22:29:48 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00
d5777bb1e9 [enhancement](outfile) add retry for broker pwrite #14556
Problem:
We got following error frequently while SELECT xxx INTO OUTFILE:
ERROR 1064 (HY000): RpcException, msg: Fail to write to broker, broker:TNetworkAddress(hostname=a.b.c.d, port=8111) failed:write() send(): Broken pipe

Reason:

we cache broker thrift client in BE;
thrift client check connect isOpen only return cached flag, not care the real socket is opened or closed;
after we get client from cache, the socket may already closed, then pwrite will failed.
How to fix:
Other interfaces such as open and close, will reopen and retry again, but pwrite do not retry.
As there are write offset inside pwrite, and the broker(server) side also will check the write offset, it is safe to retry pwrite.
2022-11-25 14:20:33 +08:00
7ba4cd764a [enhancement](array-function) array_position,array_contains,countequal which in FunctionArrayIndex handle target NULL (#14564)
in the previous, the result is:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
|                                 NULL |
+--------------------------------------+
1 row in set (0.02 sec)
```

but after this commit, the result become:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
|                                    2 |
+--------------------------------------+
1 row in set (0.02 sec)
```
2022-11-25 14:19:50 +08:00
d5d356b17f [vectorized](function) support order by field function (#14528)
* [vectorized](function) support order by field function

* update

* update test
2022-11-25 14:00:46 +08:00
25de068a05 [fix](parquet-reader) the value of null map will overflow when LazyRead merges too many empty batches (#14558)
The run length of null map is saved as `uint16_t`. Previously, the run length of null map was
limited by `batch_size` in the `ParquetReader`, by setting `batch_size = std::min(batch_size, (size_t)USHRT_MAX)`.
It works well when the batch size is less than `USHRT_MAX`.
However, [Lazy read](https://github.com/apache/doris/pull/13917) will merge empty batches until reading
a non-empty batch or reaching the EOF of a row group, so the `batch_size` may be greater than `USHRT_MAX`
in non-predicate columns.
In addition, even if the `batch_size` does not exceed `USHRT_MAX`, the adjacent batches may also make
the run  length exceed the `USHRT_MAX` in `ColumnSelectVector::get_next_run`.
2022-11-25 12:22:18 +08:00
f68fa442cd [Bug](regression-test) Fix regression aggregate failed muti distinct (#14563)
Fix regression aggregate failed muti distinct
2022-11-25 10:58:10 +08:00
225e4981ed [feature](selectdb-cloud) Fix leak in VCollectorIterator (#962) (#14549)
`VCollectIterator::build_heap()` leaks memory when there is a `VCollectIterator::LevelIterator::init()` fails.
2022-11-25 10:25:24 +08:00
9103ded1dd [improvement](join)optimize sharing hash table for broadcast join (#14371)
This PR is to make sharing hash table for broadcast more robust:

Add a session variable to enable/disable this function.
Do not block the hash join node's close function.
Use shared pointer to share hash table and runtime filter in broadcast join nodes.
The Hash join node that doesn't need to build the hash table will close the right child without reading any data(the child will close the corresponding sender).
2022-11-24 21:06:44 +08:00
bc699511d0 [Fix](array-function) fix array_distinct null values (#14544)
in the previous the result is:
```
mysql> select array_distinct([1,1,3,3,null, null, null]);
+-----------------------------------------------------+
| array_distinct(ARRAY(1, 1, 3, 3, NULL, NULL, NULL)) |
+-----------------------------------------------------+
| [1, 3, NULL, NULL, NULL]                            |
+-----------------------------------------------------+
1 row in set (0.00 sec)
```

after this fix, the result becomes:
```
mysql> select array_distinct([1,1,3,3,null, null, null]);
+-----------------------------------------------------+
| array_distinct(ARRAY(1, 1, 3, 3, NULL, NULL, NULL)) |
+-----------------------------------------------------+
| [1, 3, NULL]                                        |
+-----------------------------------------------------+
1 row in set (0.00 sec)
```
2022-11-24 19:07:28 +08:00
ac46922433 [fix](ut) Fix failures for BE UT macOS (#14543) 2022-11-24 17:39:37 +08:00