Commit Graph

5948 Commits

Author SHA1 Message Date
12c4d1f4dd [feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808) 2022-08-17 10:56:14 +08:00
c3e6a841c1 [feature-wip](unique-key-merge-on-write) fix that sort segments by segment id in descending order (#11811) 2022-08-17 10:54:30 +08:00
3a49156e30 [performance] (vectorization)optimize In Expr (#11826)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-08-17 10:46:37 +08:00
c715209a7e [refactor](dpp) remove original dpp writer (#11838)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-17 10:42:29 +08:00
3e13b7d2c2 [Bugfix](light-shema-change) fix _finish_clone dead lock (#11823)
engine_clone_task.cpp calls tablet->tablet_schema() to create the rowset, but that method needs a lock that is already held at engine_clone_task.cpp:514, which causes a deadlock. The code originally used cloned_tablet_meta->tablet_schema(); this was changed in #11131. Revert to using cloned_tablet_meta->tablet_schema().
2022-08-17 09:10:08 +08:00
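A minimal Python sketch of the self-deadlock pattern described in this commit, not the Doris code itself. The `Tablet` class, the `finish_clone_*` functions and the timeout are hypothetical stand-ins; the timeout exists only so that the sketch terminates instead of hanging.

```python
import threading


class Tablet:
    """Hypothetical stand-in for a tablet guarded by a non-reentrant lock."""

    def __init__(self, schema):
        self._lock = threading.Lock()
        self._schema = schema

    def tablet_schema(self, timeout=0.1):
        # The getter takes the lock itself. If the caller already holds it,
        # a plain blocking acquire would never return; the timeout turns
        # that hang into an error for demonstration purposes.
        if not self._lock.acquire(timeout=timeout):
            raise RuntimeError("would deadlock: lock already held by caller")
        try:
            return self._schema
        finally:
            self._lock.release()


def finish_clone_buggy(tablet):
    # Holds the lock, then calls a getter that needs the same lock.
    with tablet._lock:
        return tablet.tablet_schema()


def finish_clone_fixed(tablet, cloned_meta_schema):
    # Reads the schema from the cloned meta, which needs no tablet lock.
    with tablet._lock:
        return cloned_meta_schema
```

Calling `finish_clone_buggy(Tablet("s"))` raises `RuntimeError` in this sketch, while `finish_clone_fixed` returns the cloned schema without touching the lock a second time.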
a07e153419 [Feature](nereids)support view and nested view (#11589)
Support views in queries
and add a rewrite rule: merge consecutive projects.
The rule merges consecutive projects into a single project to improve efficiency.
2022-08-16 19:24:01 +08:00
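The idea of merging consecutive projects can be sketched as follows. This is an illustration in Python under simplifying assumptions, not the Nereids rule: `Scan`, `Project`, `substitute` and `merge_projects` are hypothetical names, and expressions are modelled as column names (strings), integer literals, or `(op, arg, ...)` tuples.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    table: str


@dataclass
class Project:
    exprs: dict    # output column name -> expression
    child: object  # Scan or Project


def substitute(expr, bindings):
    """Replace column references in expr with the child's defining expressions."""
    if isinstance(expr, str):          # a column reference
        return bindings.get(expr, expr)
    if isinstance(expr, tuple):        # (operator, arg, arg, ...)
        op, *args = expr
        return (op, *[substitute(a, bindings) for a in args])
    return expr                        # a literal


def merge_projects(plan):
    """Collapse Project(Project(x)) into one Project, bottom-up."""
    if not isinstance(plan, Project):
        return plan
    child = merge_projects(plan.child)
    if isinstance(child, Project):
        merged = {name: substitute(e, child.exprs)
                  for name, e in plan.exprs.items()}
        return Project(merged, child.child)
    return Project(plan.exprs, child)
```

For example, a project computing `c = b + 1` over a project computing `b = a * 2` becomes a single project computing `c = (a * 2) + 1` directly over the scan.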
fadc78c6cf [fix](str_to_date) str_to_date support format without leading zero (#11817)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-16 18:23:16 +08:00
30175010c7 Fix nullptr in perform_remote_tablet_gc (#11820) 2022-08-16 16:50:21 +08:00
f39f57636b [feature-wip](parquet-reader) update column read model and add page index (#11601) 2022-08-16 15:04:07 +08:00
01383c3217 [Enhancement](stream-load-json) using simdjson to parse json (#11665)
Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front end called simdjson::ondemand, which parses JSON when fields are accessed and can strip the field token from the original document. Using this feature, we can reduce the cost of string copies. For example, _write_data_to_column converts everything to a string literal with sprintf, and the flame graph shows a hotspot in this function; simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need. Second, in _set_column_value we can iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1"). The third optimization is the at_pointer interface that simdjson provides, which gets a JSON field directly from the original document.
2022-08-16 14:49:50 +08:00
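To illustrate what a pointer-style lookup such as at_pointer does, here is a small Python sketch that resolves an RFC 6901 JSON Pointer against a document parsed with the standard `json` module. It is a stand-in for the concept only: it does not use simdjson, and the `at_pointer` function below is a hypothetical helper written for this example.

```python
import json


def at_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer such as "/k1/0/name"."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape "~1" to "/" first, then "~0" to "~", as RFC 6901 requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


doc = json.loads('{"k1": [{"name": "a"}, {"name": "b"}], "x/y": 3}')
name = at_pointer(doc, "/k1/1/name")
```

The pointer names the path to a nested field in one string, so the caller does not have to walk the intermediate objects by hand.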
f2292a3b1d [Enhancement](array-type) enable_array_type flag update (#11785)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-16 14:41:57 +08:00
fecfdd78bf [enhancement](status) Fix Status related macros to enable RVO or move ctor (#11753) 2022-08-16 14:40:35 +08:00
3e10be4ba7 [remote-udaf](sample) add some java demo (#11752) 2022-08-16 14:37:34 +08:00
cf45151a66 [Fix](audit) do not use duplicate query id in fe.audit.log (#11746)
The original logic in ConnectProcessor.java might produce duplicate query IDs for different query statements in fe.audit.log.
2022-08-16 14:37:16 +08:00
c124470408 [enhancement](memory) Fix too much cache leads to less memory available for queries (#11751)
Disable Chunk Allocator in the vectorized Allocator to reduce the amount of cached memory.

For highly concurrent queries, using Chunk Allocator with the vectorized Allocator can reduce the impact of the gperftools tcmalloc central lock.

jemalloc and Google tcmalloc have a core cache, so Chunk Allocator may no longer be needed after gperftools tcmalloc is replaced.
2022-08-16 14:35:57 +08:00
1e61fdf242 [remote-udaf](sample) add some c++ demo (#11761) 2022-08-16 14:32:04 +08:00
4be6e70f1c [fix](query) fix orderby keys limit return less or no result (#11757)
The bug is caused by using _num_rows_read for the limit check. _num_rows_read counts the rows read from storage, but some of those rows may be removed by filter_block for the WHERE predicate.

Add _num_rows_return, which counts the rows left after filter_block applies the WHERE predicate, to track the rows actually returned.
2022-08-16 14:31:47 +08:00
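The difference between the two counters can be shown with a small Python sketch. It is a simplified model rather than the Doris reader: `scan_with_limit` and its `check_rows_read` flag are hypothetical, and blocks are plain lists of rows.

```python
def scan_with_limit(blocks, predicate, limit, check_rows_read):
    """Read blocks, filter them, and stop once the limit check passes.

    check_rows_read=True models the bug (limit checked against rows read);
    check_rows_read=False models the fix (limit checked against rows kept).
    """
    out = []
    num_rows_read = 0      # rows read from storage, before filtering
    num_rows_return = 0    # rows that survive the WHERE filter
    for block in blocks:
        num_rows_read += len(block)
        kept = [row for row in block if predicate(row)]
        out.extend(kept)
        num_rows_return += len(kept)
        counter = num_rows_read if check_rows_read else num_rows_return
        if counter >= limit:
            break
    return out[:limit]


blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
buggy = scan_with_limit(blocks, lambda r: r >= 5, 2, check_rows_read=True)
fixed = scan_with_limit(blocks, lambda r: r >= 5, 2, check_rows_read=False)
```

With the buggy check the scan stops after the first block, where every row is filtered out, so `buggy` is empty; with the fixed check the scan continues and `fixed` is `[5, 6]`.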
7d836cf0c7 [fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771)
The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook.

The two values are not equal, so the mem_consumption() == _mem_table->memory_usage branch condition fails.
2022-08-16 14:30:45 +08:00
340ee6af6a (fix)[regression-test] add qt_having1 (#11800) 2022-08-16 14:29:37 +08:00
dc18def456 [regression-test](bitmap) Add regression case for some bitmap funcations (#11783)
Co-authored-by: smallhibiscus <844981280>
2022-08-16 14:23:59 +08:00
2a1803c646 [enhancement](memtracker) Optimize query memory accuracy (#11740)
Currently, only the virtual memory used by a query can be tracked through the tcmalloc hook. When allocated memory is not fully used, the recorded virtual memory is larger than the physical memory.

At present, this is mainly because PODArray does not memset the memory to 0 when allocating it, and blocks allocated through PODArray in places such as VOlapScanNode::_free_blocks are usually kept for memory reuse and are not fully used.
2022-08-16 14:23:28 +08:00
573588693c [bugfix](load) get max versio in read lock (#11806)
Introduced by #11195. Getting the max version from the tablet meta should be done under a read lock during multi-threaded load.
2022-08-16 12:25:29 +08:00
288b440b14 [improvement](vectorized) Improve count distinct performance by using fastunion (#11516)
Tests on real user data show a 10-40% performance improvement.
2022-08-16 12:18:46 +08:00
Pxl
7e5ec6f817 remove ant-design/compatible (#11789) 2022-08-16 12:02:49 +08:00
56574b5948 [refactor] rename data source to catalog (#11243)
In #10702, the origin Catalog class has been renamed to Env.
Now we can rename the datasource to catalog.
2022-08-16 11:45:49 +08:00
d2bb3ad08e [fix](memtracker) Fix core in logout task mem tracker (#11797) 2022-08-16 11:28:06 +08:00
decffae032 fix ALTER SYSTEM docs (#11780)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-08-16 10:52:24 +08:00
8a3ee91bb5 [remote-udaf](sample) add some python demo (#11760) 2022-08-16 09:26:36 +08:00
f3f1bbc48c [fix](agg)disallow group by bitmap or hll data type (#11782)
2022-08-16 09:25:02 +08:00
d37cf0a41b [regression-test](p0) add p0 test cases (#11624)
Add p0 test cases, including:
aggregate
join
union
order by
group by
keyword
arithmetic operators
logical operators
case function
coalesce
between
in
like
limit
where
regexp
window function
runtime filter
schema change
2022-08-15 23:12:07 +08:00
ce10151273 [docs](fix)fix some error path (#11741) 2022-08-15 21:43:56 +08:00
d2d4423c88 [feature](schema change) Light schema change support rollup (#11494)
1. Move max colUniqueId from OlapTable to IndexMeta.
2. Add updateSlotUniqueId.
2022-08-15 21:39:27 +08:00
5104982614 [enhancement](tracing) append the profile counter to trace. (#11458)
1. append the profile counter and infos to span attributes.
2. output traceid to audit log.
2022-08-15 21:36:38 +08:00
8f98357c0b [fix](array-type) disable cast function to array type on origin exec engine. (#11602)
This commit disables casting to array type on the original exec engine, except for casting varchar to array type.
2022-08-15 21:30:56 +08:00
fef37990a3 [fix](array-type) Fix incorrect in function-set for array type (#11585)
FunctionSet.java contains some incorrect logic that may cause problems when array functions are invoked.
2022-08-15 21:29:57 +08:00
71df82696d [fix](schema change) fix memory exceeded when schema change (#11748)
In row-mode schema change, the job sometimes fails because the memory limit is exceeded.
When the remaining memory is enough for sorting but not enough for the next block,
the code does not flush row_block_arr, whose data is held in memory, and continues to allocate the next block,
so the allocation fails and the job returns directly.
Now, if memory for a block cannot be allocated, flush row_block_arr and
try again, unless row_block_arr is empty.
2022-08-15 17:57:39 +08:00
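The flush-and-retry behaviour described in this commit can be sketched in Python as follows. This is a simplified model under stated assumptions, not the schema change code: `Budget`, `MemoryExceeded` and `convert` are hypothetical, block sizes are plain integers, and the memory reserved for sorting is left out.

```python
class MemoryExceeded(Exception):
    pass


class Budget:
    """Hypothetical memory budget with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, size):
        if self.used + size > self.capacity:
            raise MemoryExceeded(size)
        self.used += size

    def free(self, size):
        self.used -= size


def convert(block_sizes, budget):
    """Buffer blocks in memory; on allocation failure, flush and retry."""
    buffered = []  # plays the role of row_block_arr
    flushes = []   # records what each flush wrote out

    def flush():
        flushes.append(list(buffered))
        budget.free(sum(buffered))
        buffered.clear()

    for size in block_sizes:
        while True:
            try:
                budget.alloc(size)
                break
            except MemoryExceeded:
                if not buffered:
                    raise  # nothing left to flush, so give up
                flush()    # release buffered blocks, then try again
        buffered.append(size)
    if buffered:
        flush()
    return flushes


result = convert([4, 4, 4], Budget(10))
```

Here the third block does not fit, so the first two are flushed and the allocation is retried; a single block larger than the whole budget still fails because there is nothing to flush.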
0f75bd0e38 [fix](delete) fix query result error after delete (#11754)
convert dictionary code for delete predicates.
2022-08-15 17:52:03 +08:00
0b9bfd15b7 [feature-wip](parquet-reader) parquet physical type to doris logical type (#11769)
Two improvements have been added:
1. Translate parquet physical type into doris logical type.
2. Decode parquet column chunk into doris ColumnPtr, and add unit tests to show how to use related API.
2022-08-15 16:08:11 +08:00
600254855c [doc] fix typos in readme.md (#11776) 2022-08-15 14:31:32 +08:00
910d51c76f [fix](update) Fix where clause is not reanalyzed after rewrite (#11723) 2022-08-15 13:24:57 +08:00
805c13aaa1 [fix](backup) fix backup restore raise Storage backend not initialized. error (#11736)
2022-08-15 13:24:38 +08:00
1e6b8cd1a9 [feature](nereids): SimplifyCastRule (#11630)
Remove redundant cast like
```
cast(1 as int) -> 1
```
2022-08-15 12:41:36 +08:00
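A redundant-cast rewrite of this kind can be sketched in a few lines of Python. This is an illustration only, not the Nereids SimplifyCastRule: the `Literal` and `Cast` classes and the string type names are hypothetical, and the sketch treats a cast as redundant only when the child already has exactly the target type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: object
    type: str


@dataclass(frozen=True)
class Cast:
    child: object
    target_type: str


def expr_type(expr):
    """Type produced by an expression in this toy model."""
    return expr.target_type if isinstance(expr, Cast) else expr.type


def simplify_cast(expr):
    """Drop a cast whose child already has the target type."""
    if not isinstance(expr, Cast):
        return expr
    child = simplify_cast(expr.child)  # simplify nested casts first
    if expr_type(child) == expr.target_type:
        return child
    return Cast(child, expr.target_type)


simplified = simplify_cast(Cast(Literal(1, "int"), "int"))
```

In this model `cast(1 as int)` simplifies to the literal `1`, while a cast to a different type such as `bigint` is kept.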
74b0d0da88 [chore](regression) Add badges for jenkins on home page (#11727)
Add badges showing the Jenkins daily test results to the Doris home page.
2022-08-15 12:37:48 +08:00
ab9529f6b5 [enhancement](array-type) support export files in 'select into outfile' (#11703)
Support exporting array types in 'select into outfile'.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-15 12:34:31 +08:00
66116b78d1 fix multJoin bug (#11772)
In the MultiJoin transform, if there is no join condition, we should generate a cross join instead of an inner join.
2022-08-15 11:59:09 +08:00
8c8f48c4c2 [feature-wip](array-type) add the array_join function (#11406)
Add the array_join function.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-15 11:43:17 +08:00
77e241cbb0 [refactor](date) Use uint32 as predicate type for date type (#11708)
2022-08-15 11:12:33 +08:00
073e46097f [doc](community) modify verify release doc (#11766)
2022-08-15 09:59:17 +08:00
6d56755336 update english doc (#11759)
2022-08-14 07:42:50 +08:00
0333261a75 [typo](doc) Fix sidebar version (#11764)
2022-08-14 07:41:16 +08:00