Commit Graph

883 Commits

Author SHA1 Message Date
1a035e2073 [fix](profile)(AggNode) fix the GetResultsTime is always zero (#14366)
add scoped_timer in _serialize_with_serialized_key_result
2022-11-17 22:30:21 +08:00
50bfd99b59 [feature](join) support nested loop semi/anti join (#14227) 2022-11-17 22:20:08 +08:00
d5af4f6558 [Neried](Profile) Add projection timer for neried (#14286) 2022-11-17 22:17:55 +08:00
6da2948283 [feature-wip](multi-catalog) support iceberg v2(step 1) (#13867)
Support position delete(part of).
2022-11-17 17:56:48 +08:00
7182f14645 [improvement][fix](multi-catalog) speed up list partition prune (#14268)
In previous implementation, when doing list partition prune, we need to generation `rangeToId`
every time we doing prune.
But `rangeToId` is actually a static data that should be create-once-use-every-where.

So for hive partition, I created the `rangeToId` and all other necessary data structures for partition prunning
in partition cache, so that we can use it directly.

In my test, the cost of partition prune for 10000 partitions reduce from 8s -> 0.2s.

Aslo add "partition" info in explain string for hive table.
```
|   0:VEXTERNAL_FILE_SCAN_NODE                           |
|      predicates: `nation` = '0024c95b'                 |
|      inputSplitNum=1, totalFileSize=4750, scanRanges=1 |
|      partition=1/10000                                 |
|      numNodes=1                                        |
|      limit: 10                                         |
```

Bug fix:
1. Fix bug that es scan node can not filter data
2. Fix bug that query es with predicate like `where substring(test2,2) = "ext2";` will fail at planner phase.
`Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef`

TODO:
1. Some problem when quering es version 8: ` Unexpected exception: Index: 0, Size: 0`, will be fixed later.
2022-11-17 08:30:03 +08:00
20634ab7e3 [feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264)
PR https://github.com/apache/doris/pull/13917 has supported lazy read for non-predicate columns in ParquetReader, 
but can't trigger lazy read when predicate columns are partition or missing columns.
This PR support such case, and fill partition and missing columns in `FileReader`.
2022-11-16 08:43:11 +08:00
70cc725649 [Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209)
* [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions

* update add to stringRef
2022-11-15 16:38:38 +08:00
5badd70db2 [fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196) 2022-11-15 16:06:59 +08:00
6d2e6d85d3 [enhancement](be)release memory in Node's close() method (#14258)
* [enhancement](be)release memory in Node's close() method

* format code
2022-11-15 15:59:23 +08:00
f86886f8f5 [Feature](function) Support array_compact function (#14141) 2022-11-15 14:24:37 +08:00
6cc5ae077e [Improvement](Sequence function) Capitalize const variables (#14270) 2022-11-15 10:41:53 +08:00
215a4c6e02 [Bug](BHJ) Fix wrong result when use broadcast hash join for naaj (#14253) 2022-11-15 09:40:00 +08:00
cffdeff4ec [fix](memory) Fix memory leak by calling boost::stacktrace (#14269)
boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace.
The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace.
refer to:
boostorg/stacktrace#118
boostorg/stacktrace#111
2022-11-15 08:58:57 +08:00
93e5d8e660 [Vectorized](function) support bitmap_from_array function (#14259) 2022-11-15 01:55:51 +08:00
fc70179acb [multi-catalog](fix) the eof of lazy read columns may be not equal to the eof of predicate columns (#14212)
Fix three bugs:
1. The EOF of lazy read columns may be not equal to the EOF of predicate columns.
(for example: If the predicate column has 3 pages, with 400 rows for each, but the last page
is filtered by page index. When batch_size=992, the EOF of predicate column is true.
However, we should set batch_size=800 for lazy read column, so the EOF of lazy read column may be false.)
2. The array column does not count the number of nulls
3. Generate wrong NullMap for array column
2022-11-14 14:37:21 +08:00
7bb3792d51 [chore](build) Split the compliation units to build them in parallel (#14232) 2022-11-14 10:57:10 +08:00
d55faa7f6a [feature](remote)Only query can use local cache when reading remote files. (#13865)
When calling select on remote files, download cache files to local disk.
When calling alter table on remote files, read files directly from remote storage. So if tablet is too large, it will not take up too many local disk when creating local cache file.
2022-11-14 10:30:15 +08:00
139c4a77f1 [enhancement](be)close ExecNode ASAP to release resource earlier (#14203) 2022-11-14 09:41:35 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
376b4fda9f [fix](scankey) fix extended scan key errors. (#14200)
Issue Number: close #14199
2022-11-12 20:44:09 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
43490a33a5 [feature-array](array-type) Add array function array_with_constant (#14115)
Return array of constants with length num.

```
mysql> select array_with_constant(4, 1223);
+------------------------------+
| array_with_constant(4, 1223) |
+------------------------------+
| [1223, 1223, 1223, 1223]     |
+------------------------------+
1 row in set (0.01 sec)
```
co-authored-by @eldenmoon
2022-11-11 22:08:43 +08:00
0ba13af8ff [feature](running_difference) support running_difference function (#13737) 2022-11-11 21:22:56 +08:00
43f80e2633 [enhancement](load) Increase batch size of node channel to improve import performance (#13912) 2022-11-11 18:05:36 +08:00
fe2944d56d [Bug](nljoin) Keep compatibility for nljoin (#14182) 2022-11-11 15:54:55 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
118a7dff07 [chore](build) Optimize the compilation time (#14170)
Currently, it takes too much time to build BE from source in workflow environments (P0/P1) which affects the efficiency of daily development.

We can measure the time by executing the following command.

time EXTRA_CXX_FLAGS='-O3' BUILD_TYPE=ASAN ./build.sh --be --fe --clean -j "$(nproc)"
This PR optimizes the compilation time by exploiting the following methods.

Reduce the codegen by removing some useless std::visit.
Disable the optimization for some template functions which are instantiated by std::visit conditionally (except for the RELEASE build).
2022-11-11 12:09:54 +08:00
883dfa38ab [fix](decimal) change log fatal to log warning to avoid code dump on decimal type (#14150) 2022-11-11 11:22:41 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) useing config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
1ef85ae1f2 [Improvement](join) Support nested loop outer join (#13965) 2022-11-10 19:50:46 +08:00
6bd5378f66 [feature-wip](multi-catalog) lazy read for ParquetReader (#13917)
Read predicate columns firstly, and use VExprContext(push-down predicates)
to generate the select vector, which is then applied to read the non-predicate columns.
The data in non-predicate columns may be skipped by select vector, so the value-decode-time can be reduced.
If a whole page can be skipped, the decompress-time can also be reduced.
2022-11-10 16:56:14 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
a73f4dfdc1 [fix](memtracker) Fix scanner thread ending after fragment thread causing mem tracker null pointer #14143 2022-11-10 15:42:53 +08:00
90bfd87660 [feature](function) add new function uuid() (#14092) 2022-11-10 14:55:41 +08:00
184cee2d2b [Bug](outfile) Fix wrong decimal format for ORC (#14124) 2022-11-10 11:01:30 +08:00
43eb946543 [feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130
S3 table valued function supports parquet/orc/json file format.
For example: parquet format
2022-11-10 10:33:12 +08:00
10df61b5bf [improvement](join) Share hash table in fragments for broadcast join (#13921) 2022-11-10 09:48:34 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
Pxl
794a551b0f [Enhancement][fix](profile)() modify some profiles (#14074)
1. add RemainedDownPredicates
2. fix core dump when _scan_ranges is empty
3. fix invalid memory access on vLiteral's debug_string()
4. enlarge mv test wait time
2022-11-09 21:59:28 +08:00
322ac5cf89 [refractor](array) refractor DataTypeArray from_string (#13905)
refractor DataTypeArray from_string, make it more clear;
support ',' and ']' inside string element, for example: ['hello,,,', 'world][]']
support empty elements, such as [,] ==> [0,0]
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 16:58:08 +08:00
f912d4e392 [fix](compile) fix compile error #14103
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 14:10:06 +08:00
e692636b4f [performance-wip] (vectorization) Opt HashJoin Performance (#12390) 2022-11-09 14:07:49 +08:00
a3c5fa8c01 [Compile](join) Boost compiling and linking (#14081) 2022-11-09 11:27:46 +08:00
55ca810445 [fix](Vectorized)fix json_object and json_array function return wrong result on vectorized engine (#13775)
Issue Number: close #13598
2022-11-09 11:26:55 +08:00
aec214b4b0 [bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061)
* call set_decimalv2_type when cloning ColumnDecimal

* clang format
2022-11-09 11:23:43 +08:00
291fa499e9 [fix](JSON) Fail to parse JSONPath (libc++) (#13941) 2022-11-09 08:58:01 +08:00
cd8f0713ea [refactor](new-scan) remove old vectorized scan node (#14029) 2022-11-09 08:39:20 +08:00
826cfdaf93 [feature](information_schema) add backends information_schema table (#13086) 2022-11-08 22:15:10 +08:00
Pxl
ae3c513d74 use extern template to date_time_add (#13970) 2022-11-08 22:11:41 +08:00
c2a01e84b4 [feature-wip](multi-catalog) fix page index filter bug (#14015)
Fix page index filter not take effect when multiple columns
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-11-08 12:10:12 +08:00