Commit Graph

2117 Commits

Author SHA1 Message Date
c8d303a82c [bugfix] Fix BE core about vectorized join build thread memtracker switch, and FileStat duplicate 2022-05-31 19:12:42 +08:00
Pxl
fa50b63cee fix core dump on vcase_expr::close (#9875) 2022-05-31 15:45:39 +08:00
0cba6b7d95 [Bug][Fix] One Rowset have same key output in unique table (#9858)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-31 12:29:16 +08:00
7199102d7c [Opt][VecLoad] Opt the vec stream load performance (#9772)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-31 11:53:32 +08:00
7b55d4cb88 [BUG] return NULL for invalid date value (#9862) 2022-05-30 21:35:41 +08:00
85f525e991 [Bugfix(Vec)] Close result_sink properly (#9849)
Close result_sink properly so that error code is reported and
expr_context is always closed.
2022-05-30 19:03:33 +08:00
f377c26bf7 [refactor][be] Optimize headers (#9708) 2022-05-30 16:12:10 +08:00
4af2493c42 [Improvement] optimize scannode concurrency query performance in vectorized engine. (#9792) 2022-05-30 16:04:40 +08:00
7b98dd438d [feature](function) Add nvl function (#9726) 2022-05-30 09:43:00 +08:00
0683181fef [API changed](parser) Remove merge join syntax (#9795)
Remove merge join sql and merge join node
2022-05-30 09:04:21 +08:00
a96b41db7a [Improvement] Simplify expressions for _vconjunct_ctx_ptr (#9816) 2022-05-29 23:05:21 +08:00
63aab5ee5d [Bugfix(Vec)] Fix some memory leak issues (#9824) 2022-05-29 23:04:11 +08:00
1aeb16d153 [improvement](load) reduce useless err_msg format in VOlapTableSink send (#9531) 2022-05-29 16:02:57 +08:00
9fe3827239 [fix](ut) fix BE ut (#9831)
introduced from #8923, the github checks has some problem that failed to check BE ut in #8923
2022-05-29 12:25:41 +08:00
Pxl
f33ef32d92 [Bug] [Bitmap] change to_bitmap to always_not_nullable (#9716) 2022-05-28 17:33:55 +08:00
4d1e926b6c [feature][config] introduce a new BE config storage_page_cache_shard_size (#9821)
Co-authored-by: gaodayue <gaodayue@bytedance.com>
2022-05-28 10:17:09 +08:00
efdb3b79a5 [feature] add zstd compression codec (#9747)
ZSTD compression is fast with high compression ratio. It can be used to archive higher compression ratio
than default Lz4f codec for storing cost sensitive data such as logs.

Compared to Lz4f codec, we see zstd codec get 35% compressed size off, 30% faster at first time read without OS page 
cache, 40% slower at second time read with OS page cache in the following comparison test.

test data: 25GB text log, 110 million rows
test table: test_table(ts varchar(30), log string)
test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table
be.conf: disable_storage_page_cache = true
set this config to disable doris page cache to avoid all data cached in memory for test real decompression speed.
test result

master branch with lz4f codec result: 
- compressed size 4.3G
- SQL first exec time(read data from disk + decompress + little computation) : 18.3s
- SQL second exec time(read data from OS pagecache + decompress + little computation) : 2.4s

this branch with zstd codec (hardcode enable it) result:
- compressed size: 2.8G
- SQL first exec time: 12.8s
- SQL second exec time: 3.4s
2022-05-27 21:56:18 +08:00
b2c2cdb122 [feature] Support compression prop (#8923) 2022-05-27 21:52:05 +08:00
af2cfa2db4 [fix] Fix bug of bloom filter hash value calculation error (#9802)
* Fix bug of bloom filter hash value calculation error

* fix code style
2022-05-27 20:44:26 +08:00
cbbda7857b [feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541) 2022-05-26 21:39:12 +08:00
Pxl
13c1d20426 [Bug] [Vectorized] add padding when load char type data (#9734) 2022-05-26 16:51:01 +08:00
9236c2efc9 [improvement] Show detail status code string for be http api (#9771)
1. move to_json method to common/status
2. modify related usage in http folder
2022-05-26 15:09:21 +08:00
f4dd3bf013 [bugfix] fix memleak in olapscannode(#9736) 2022-05-26 15:06:54 +08:00
24631915ed [bugfix] fix correctness for vectorized compaction (#9773) 2022-05-26 15:05:50 +08:00
cd99c24844 [Improvement] remove unused code in vectorized compaction (#9774) 2022-05-26 15:05:27 +08:00
2a11a4ab99 [feature-wip][array-type] Support more sub types. (#9466)
Please refer to #9465
2022-05-26 08:41:34 +08:00
73e31a2179 [stream-load-vec]: memtable flush only if necessary after aggregated (#9459)
Co-authored-by: weixiang <weixiang06@meituan.com>
2022-05-25 21:12:24 +08:00
8470543144 [Improvement] fix typo (#9743) 2022-05-25 19:29:01 +08:00
f5bef328fe [fix] disable transfer data large than 2GB by brpc (#9770)
because of brpc and protobuf cannot transfer data large than 2GB, if large than 2GB will overflow, so add a check before send
2022-05-25 18:41:13 +08:00
2725127421 [fix] group by with two NULL rows after left join (#9688)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-05-25 16:43:55 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix Lru Cache MemTracker consumption value is negative.
2. Fix compaction Cache MemTracker has no track.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
90e8cda5f2 [Enhancement](Vectorized)build hash table with new thread, as non-vec… (#9290)
* [Enhancement][Vectorized]build hash table with new thread, as non-vectorized past do

edit after comments

* format code with clang format

Co-authored-by: lidongyang <dongyang.li@rateup.com.cn>
Co-authored-by: stephen <hello-stephen@qq.com>
2022-05-24 10:23:15 +08:00
6353539ef7 [bugfix]teach BufferedBlockMgr2 track memory right (#9722)
The problem was introduced by e2d3d0134eee5d50b6619fd9194a2e5f9cb557dc.
2022-05-24 10:18:51 +08:00
8b7bb2d07c [bugfix]fix column reader compress codec unsafe problem (#9741)
by moving codec from shared reader to unshared iterator
2022-05-23 20:25:49 +08:00
5039ec4570 [vec][opt] opt hash join build resize hash table before insert data (#9735)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-23 15:13:57 +08:00
500c36717d [Bug-Fix][Vectorized] Full join return error result (#9690)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-23 13:29:37 +08:00
c13a6a1d8a [fix] NullPredicate should implement evaluate_vec (#9689)
select column from table where column is null
2022-05-22 21:29:53 +08:00
75b3707a28 [refactor](load) add tablet errors when close_wait return error (#9619) 2022-05-22 21:27:42 +08:00
b3a2a92bf5 [deps] libhdfs3 build enable kerberos support (#9524)
Currently, the libhdfs3 library integrated by doris BE does not support accessing the cluster with kerberos authentication 
enabled, and found that kerberos-related dependencies(gsasl and krb5) were not added when build libhdfs3.

so, this pr will enable kerberos support and rebuild libhdfs3 with dependencies gsasl and krb5:

- gsasl version: 1.8.0
- krb5 version: 1.19
2022-05-22 20:58:19 +08:00
31e40191a8 [Refactor] add vpre_filter_expr for vectorized to improve performance (#9508) 2022-05-22 11:45:57 +08:00
61a60d1dcc [code style] minor update for code style (#9695) 2022-05-20 11:47:49 +08:00
8fa677b59c [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666)
* [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner
1. fix bug of vjson scanner not support `range_from_file_path`
2. fix bug of vjson/vbrocker scanner core dump by src/dest slot nullable is different
3. fix bug of vparquest filter_block reference of column in not 1
4. refactor code to simple all the code

It only changed vectorized load, not original row based load.

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 11:43:03 +08:00
6f61af7682 [Vectorized][java-udf] add datetime&&largeint&&decimal type to java-udf (#9440) 2022-05-20 10:26:09 +08:00
5fa6e892be [fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190)
Hive and trino/presto would automatically trim the trailing spaces but Doris doesn't.
This would cause different query result with hive.

Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, when reading csv from broker scan node, it will trim the tailing space of the column
2022-05-20 09:55:13 +08:00
defdae1e7d [improvement](stream-load) adjust read unit of http to optimize stream load (#9154) 2022-05-20 09:52:36 +08:00
2c79d223e4 [refactor][rowset]move rowset writer to a single place (#9368) 2022-05-19 23:57:02 +08:00
ef65f484df [Enhancement] improve parquet reader via arrow's prefetch and multi thread (#9472)
* add ArrowReaderProperties to parquet::arrow::FileReader

* support perfecth batch
2022-05-19 23:52:01 +08:00
Pxl
6951c42d5c [Bug][Vectorized] fix schema change add varchar type column default value get wrong result (#9523) 2022-05-19 23:38:57 +08:00
c09858671d [improvement][performance] improve lru cache resize performance and memory usage (#9521) 2022-05-19 23:37:59 +08:00
0f9ef26576 [Bug] Fix timestamp_diff issue when timeunit is year and month (#9574) 2022-05-19 21:24:43 +08:00