
10084 Commits

SHA1 Message Date
0c95d760fe [fix](fixed_hashtable) Fix the incorrect implementation of the copy constructor (#18921) 2023-04-24 08:36:52 +08:00
4ba6c8b6ce [community](collaborator) add more collaborators (#18976) 2023-04-24 08:30:20 +08:00
b4282641c1 [typo](doc) Fixed typos in ADMIN-SHOW-CONFIG.md (#18969)
* [typo](doc) Fixed typos in ADMIN-SHOW-CONFIG.md

* Update ADMIN-SHOW-CONFIG.md
2023-04-24 08:29:55 +08:00
e4f058bad5 Fix some text errors (#18968) 2023-04-24 08:29:45 +08:00
5c31a0867c [typo](doc) Fixed typos in OUTFILE.md (#18967) 2023-04-24 08:29:35 +08:00
27b8227cb5 [typo](docs) Optimize the installation and deployment directory structure (#18966) 2023-04-24 08:29:24 +08:00
07ea350201 [Fix](inverted index) fix memory leak when creating bkd reader (#18914)
The function compoundReader->openInput is called three times; if any of these calls fails, an error is logged and the function returns early.
If one or two of the calls succeed but a later one fails, the memory already allocated for the IndexInput objects may never be freed.

To fix this, use std::unique_ptr to manage the IndexInput objects.
Their memory is then released automatically when the pointers go out of scope, including on early return.
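A minimal C++ sketch of this pattern, assuming hypothetical `IndexInput` and `open_input` stand-ins (they are not the CLucene or Doris API, and the file names are illustrative): each opened input is owned by a `std::unique_ptr`, so an early return after a failed open releases whatever was already opened.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an index input handle; counts live instances
// so the example can show that nothing leaks on early return.
struct IndexInput {
    static inline int live = 0;
    explicit IndexInput(std::string n) : name(std::move(n)) { ++live; }
    ~IndexInput() { --live; }
    std::string name;
};

// Hypothetical opener: fails (returns nullptr) for the file named in `fail_on`.
std::unique_ptr<IndexInput> open_input(const std::string& name, const std::string& fail_on) {
    if (name == fail_on) return nullptr;
    return std::make_unique<IndexInput>(name);
}

// Opens three inputs; on any failure it returns early, and every input that
// was already opened is freed by its unique_ptr destructor.
bool open_bkd_inputs(const std::string& fail_on) {
    auto data = open_input("bkd.data", fail_on);
    if (!data) return false;
    auto meta = open_input("bkd.meta", fail_on);
    if (!meta) return false;   // `data` is released here automatically
    auto index = open_input("bkd.index", fail_on);
    if (!index) return false;  // `data` and `meta` are released here
    // ... a real reader would take ownership of the inputs via std::move ...
    return true;
}
```

The live-instance counter exists only so the sketch can demonstrate that no input outlives the function, whichever open fails.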
2023-04-23 23:21:44 +08:00
c3baa65de3 [feature](io) enable s3 file writer with concurrent multipart uploading (#17585)
Formerly, S3FileWriter had to fill a buffer with 5MB or more of data and upload it as one part, and only after that work was done could it process incoming data, which was blocking and inefficient. This PR introduces a buffer pool: incoming data is written into an in-memory buffer immediately if a free buffer is available, and the buffer is then uploaded to S3.
This PR does not yet handle the case where no free buffer is available gracefully; I'll leave that as future work.
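A simplified sketch of the buffer-pool idea, assuming hypothetical `BufferPool` and `SketchS3Writer` classes (not Doris's `S3FileWriter` API). The actual `UploadPart` request is replaced by a comment, and sizes are in bytes so the example stays small (real S3 parts other than the last must be at least 5 MiB). Appends fill a pooled buffer, each full buffer is handed to a background `std::async` task, and that task returns the buffer to the pool when its "upload" finishes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pool of reusable in-memory buffers.
class BufferPool {
public:
    BufferPool(size_t count, size_t buffer_size) {
        for (size_t i = 0; i < count; ++i) {
            auto buf = std::make_unique<std::string>();
            buf->reserve(buffer_size);
            free_.push_back(std::move(buf));
        }
    }
    // Returns nullptr when no buffer is free (left as future work in the PR).
    std::unique_ptr<std::string> try_acquire() {
        std::lock_guard<std::mutex> lock(mu_);
        if (free_.empty()) return nullptr;
        auto buf = std::move(free_.back());
        free_.pop_back();
        return buf;
    }
    void release(std::unique_ptr<std::string> buf) {
        buf->clear();
        std::lock_guard<std::mutex> lock(mu_);
        free_.push_back(std::move(buf));
    }

private:
    std::mutex mu_;
    std::vector<std::unique_ptr<std::string>> free_;
};

// Hypothetical writer: each full buffer becomes one part, uploaded on a
// background task while the caller keeps appending into a fresh buffer.
class SketchS3Writer {
public:
    SketchS3Writer(BufferPool& pool, size_t part_size) : pool_(pool), part_size_(part_size) {
        assert(part_size > 0);
    }
    // Returns false if no free buffer is available.
    bool append(const char* data, size_t len) {
        while (len > 0) {
            if (!cur_) {
                cur_ = pool_.try_acquire();
                if (!cur_) return false;
            }
            size_t n = std::min(len, part_size_ - cur_->size());
            cur_->append(data, n);
            data += n;
            len -= n;
            if (cur_->size() == part_size_) submit();
        }
        return true;
    }
    // Submits the last (possibly smaller) part, waits for all uploads and
    // returns the part sizes in part-number order.
    std::vector<size_t> close() {
        if (cur_ && !cur_->empty()) {
            submit();
        } else if (cur_) {
            pool_.release(std::move(cur_));
        }
        std::vector<size_t> sizes;
        for (auto& f : pending_) sizes.push_back(f.get());
        pending_.clear();
        return sizes;
    }

private:
    void submit() {
        pending_.push_back(std::async(std::launch::async,
            [this, buf = std::move(cur_)]() mutable {
                size_t n = buf->size();  // a real writer would send an UploadPart request here
                pool_.release(std::move(buf));
                return n;
            }));
    }

    BufferPool& pool_;
    size_t part_size_;
    std::unique_ptr<std::string> cur_;
    std::vector<std::future<size_t>> pending_;
};
```

As in the PR, the sketch does not handle pool exhaustion gracefully: `append` simply returns `false` when no free buffer is available.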
2023-04-23 23:19:44 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
* [refactor](query context) rename query fragments context to query context and make query context safe

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
29fdf1fb7e [typo](docs) add enable_ssl config doc (#18961) 2023-04-23 22:27:28 +08:00
1e7ef35741 [fix](Nereids) two phase read for topn only supports simple cases (#18955)
1. topn must have a merge node
2. topn must be the top node of the plan
2023-04-23 21:32:23 +08:00
45d0f53529 [Regression-test](Export) add regression test for export #18897 2023-04-23 19:43:22 +08:00
a9ac930e5f [Fix](multi-catalogs) Fix jdbc regression tests. (#18927)
- Fix `test_show_where` result.
- Remove `enable_decimal_conversion = true` in `test_mysql_jdbc_catalog`.
- Remove `test_show_create_catalog`.
2023-04-23 19:42:13 +08:00
25e8c71943 [test](fix) fix postgresql test (#18900)
* [test](fix) fix postgresql test

* fix
2023-04-23 18:41:41 +08:00
2c776584e5 [doc](releasenote) release 1.2.4 (#18934)
* release 1.2.4

* Update README.md

* Update sidebars.json
2023-04-23 16:04:25 +08:00
0da2cf270a [improvement](fetch data) Merge result into batch to reduce rpc times (#17828) 2023-04-23 15:07:28 +08:00
63e8fb7300 [chore](regression) Add 'sync' after stream_load in some cases (#18945) 2023-04-23 14:39:33 +08:00
166bed11d4 [Enhancement](auth) Forbid logging in to Doris from 127.0.0.1 without a password (#18816)
* forbid login from 127.0.0.1 without a password

* add localhost limit

* rename
2023-04-23 13:56:31 +08:00
61b44108e2 [bugfix](asan) fix possible asan check bug in exception to string (#18936)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 12:26:36 +08:00
29f502380c [opt](FileReader) merge small IO to optimize read performance (#18796)
Add `MergeRangeFileReader` to merge small IO and optimize Parquet and ORC read performance.

`MergeRangeFileReader` is a FileReader that efficiently supports random access in formats like Parquet and ORC.
To merge small IO in Parquet and ORC, the random access ranges must be generated when the reader is created.
The random access ranges are a list of ranges ordered by offset.
The ranges must be read sequentially: a range can be skipped, but it cannot be read repeatedly.
When calling read_at, if the start offset is located in a random access range, the slice must not span two ranges.

For example, in Parquet, the random access ranges are the column offsets in a row group.
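The access rules above can be expressed as a few checks. This is a hypothetical sketch (`PrefetchRange`, `ranges_are_valid`, `read_is_allowed` and `RangeCursor` are illustrative names, not Doris code), assuming half-open `[start, end)` byte ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open byte range [start, end) inside the file.
struct PrefetchRange {
    int64_t start;
    int64_t end;
};

// Ranges must be non-empty, ordered by offset and non-overlapping.
bool ranges_are_valid(const std::vector<PrefetchRange>& ranges) {
    for (size_t i = 0; i < ranges.size(); ++i) {
        if (ranges[i].start >= ranges[i].end) return false;
        if (i > 0 && ranges[i].start < ranges[i - 1].end) return false;
    }
    return true;
}

// A read_at(offset, len) that starts inside a range must also end inside it,
// i.e. the slice may not span two ranges. Reads that start outside every
// range are not restricted by this rule.
bool read_is_allowed(const std::vector<PrefetchRange>& ranges, int64_t offset, int64_t len) {
    for (const PrefetchRange& r : ranges) {
        if (offset >= r.start && offset < r.end) return offset + len <= r.end;
    }
    return true;
}

// Sequential-access rule: a range may be skipped but never revisited.
class RangeCursor {
public:
    explicit RangeCursor(size_t count) : count_(count) {}
    bool visit(size_t idx) {
        if (idx < next_ || idx >= count_) return false;
        next_ = idx + 1;
        return true;
    }

private:
    size_t count_;
    size_t next_ = 0;
};
```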

When reading at offset, if [offset, offset + 8MB) contains many random access ranges,
the reader reads the data in [offset, offset + 8MB) as a whole and copies the data of the random access ranges into small
buffers (called boxes; 1MB each by default, 64MB in total). A box can be shared by many ranges
and uses a reference counter to record how many ranges are cached in it. When the reference counter drops to zero,
the box can be released or reused by other ranges. When no empty box is available for a new read operation,
the read is performed directly.
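A simplified planning sketch of the merge step, assuming a hypothetical `plan_merged_reads` function and a `min_ranges` threshold standing in for "many" (the PR does not state the exact rule). Unlike the described reader, which reads the whole `[offset, offset + 8MB)` window, this sketch trims each merged read to the end of the last range it covers, and the box and reference-counting layer is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open byte range [start, end).
struct ByteRange {
    int64_t start;
    int64_t end;
};

// Groups offset-ordered ranges into physical reads. Starting at the first
// unread range, every range that ends within [start, start + window) is
// served by one merged read; if fewer than `min_ranges` ranges fit, the
// first range is read on its own instead.
std::vector<ByteRange> plan_merged_reads(const std::vector<ByteRange>& ranges,
                                         int64_t window, size_t min_ranges) {
    assert(min_ranges >= 1);  // guarantees each merged read covers at least one range
    std::vector<ByteRange> reads;
    size_t i = 0;
    while (i < ranges.size()) {
        const int64_t limit = ranges[i].start + window;
        size_t j = i;
        while (j < ranges.size() && ranges[j].end <= limit) ++j;
        if (j - i >= min_ranges) {
            reads.push_back({ranges[i].start, ranges[j - 1].end});  // merged IO
            i = j;
        } else {
            reads.push_back(ranges[i]);  // too few ranges in the window: read directly
            ++i;
        }
    }
    return reads;
}
```

With a 64-byte window and `min_ranges = 2`, the ranges `[0,10) [10,30) [35,40) [100,200)` become two reads, `[0,40)` and `[100,200)`, which mirrors how many small request IOs collapse into fewer merged IOs.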

## Effects
The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s.
The profile of Query 24:
```
 VFILE_SCAN_NODE (id=0):(Active: 8s344ms, % non-child: 83.06%)
    - FileReadBytes: 534.46 MB
    - FileReadCalls: 1.031K (1031)
    - FileReadTime: 28s801ms
    - GetNextTime: 8s304ms
    - MaxScannerThreadNum: 12
    - MergedSmallIO: 0ns
        - CopyTime: 157.774ms
        - MergedBytes: 549.91 MB
        - MergedIO: 94
        - ReadTime: 28s642ms
        - RequestBytes: 507.96 MB
        - RequestIO: 1.001K (1001)
    - NumScanners: 18
```
1001 request IOs have been merged into 94 IOs.

## Remaining problems
1. Add p2 regression test in next PR
2. Profiles are scattered in various codes and will be refactored in the next PR
3. Support ORC reader
2023-04-23 10:51:38 +08:00
b81b470d4f [fix](planner) fix pr "using crchash replace murmurhash in the runtime filter" (#18759) 2023-04-23 10:33:35 +08:00
9756be6bf0 [improvement](stream-load) use vector instead of skiplist when insert dup keys (#18686) 2023-04-23 09:40:09 +08:00
e7ad536a71 [script](download) add 1.2.4 download script (#18932) 2023-04-23 07:40:19 +08:00
bc379eebed [doc](show-rollup) delete SHOW-ROLLUP doc. (#18924)
Co-authored-by: smallhibiscus <844981280>
2023-04-22 23:39:24 +08:00
e44aad2b86 [typo](docs) add new notes on the Doris Flink connector (#18930) 2023-04-22 23:38:48 +08:00
34ce946f5b [tools](profile) add script file to get all tree profiles of a query (#18587)
Add a tool script that outputs the query profiles of all fragment instances in tree form.
2023-04-22 22:10:57 +08:00
fd905b66b0 [refactor](jdbc) close datasource if there is no need to maintain the cache (#18724)
After PR #18670, JVM parameters can be used to initialize the JDBC datasource.
When JDBC_MIN_POOL=0 is set, the datasource can be closed immediately;
there is no need to wait for the recycling timer.
2023-04-22 22:07:34 +08:00
1ff2ccc6c5 [Fix](docker) Fix regression test docker issues. (#18928)
1. Fix data not being reset after PostgreSQL restarts.
2. Change 'docker-compose' to 'docker compose'.
2023-04-22 18:03:50 +08:00
1ffd34f6f1 [Refactor](type system) refactor interconversion between jsonb and column (#18819)
* refactor jsonb to column

* update

* fix format

* fixed

* fix file head for compile
2023-04-22 14:01:05 +08:00
814f12981d [feat](Nereids): validate Project list. (#18868) 2023-04-22 12:32:51 +08:00
c80dc91a78 [bugfix](memleak) UserFunctionCache may have memory leak during close (#18913)
* [bugfix](memleak) UserFunctionCache may have memory leak during close

* [bugfix](memleak) UserFunctionCache may have memory leak during close

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-22 10:15:51 +08:00
04d18eec59 [Improve](be) check max open files #18888 2023-04-22 08:42:43 +08:00
a49311b48e [typo](doc) Fixed typos in DROP-CATALOG.md (#18909) 2023-04-22 08:39:42 +08:00
13894ae790 [fix](jdbc catalog) Use default value if the user does not set the pool parameter in be.conf #18919 2023-04-22 08:39:26 +08:00
a1c05b5c13 [fix](compaction) fix potential null pointer dereference (#18915) 2023-04-22 08:38:32 +08:00
b75f4c97f3 [function](string) support char function (#18878)
* [function](string) support char function

* fix
2023-04-22 08:36:48 +08:00
de0e89d1b4 [feature](function) Modified cast as time to behave more like MySQL (#18565)
Because the underlying type of time was float64, `select cast("19:22:18" as time)` used to return a null value.
Results in the following:
2023-04-22 06:11:59 +08:00
24ee391a7e [bugfix](memoryleak) in-list leaks memory if the type is int (#18883)
* [bugfix](memoryleak) in-list leaks memory if the type is int


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-04-22 00:34:10 +08:00
5db0b66bd9 update doc (#18871)
Co-authored-by: wudi <>
2023-04-21 23:04:27 +08:00
6eea3d9e2d [Test](multi-catalog) Fix test_hive_parquet regression test order issue. (#18879)
Ordering by l_orderkey alone cannot guarantee a unique order.
2023-04-21 22:59:34 +08:00
d56fed345e [chore](doc) fix mv doc typo and cold heat separation (#18892) 2023-04-21 22:30:56 +08:00
313fab0802 [fix](mtmv) fix mtmv thread interruption issue (#18884) 2023-04-21 22:27:13 +08:00
425101bf53 [fix](test)Move broker test to p2. Move test data to cos in Beijing region (#18893)
Fix broker load p2 test case error.
1. Move test data from the COS Hong Kong region to the Beijing region.
2. Move broker load test to p2 group.
3. Fix error message mismatch.
2023-04-21 22:15:52 +08:00
f7651d8dfb (fix)[olap] in_memory=true is not supported now (#18731)
* (fix)[olap] cannot set in_memory=true now

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-21 21:55:37 +08:00
0ae3a6df7e [bug](bdbje) Add retry for reSetupBdbEnvironment() restore.execute() (#18777)
* In reSetupBdbEnvironment(), `restore.execute()` may throw a NullPointerException,
  so add a retry for `restore.execute()`
2023-04-21 20:58:42 +08:00
317d9ee152 [feat](Nereids): Simplify Agg GroupBy (#18887) 2023-04-21 18:57:15 +08:00
af20b2c95e [Bug](topn opt) Fix be crash when enable topn opt with larger thresho… (#18858)
topn opt should be initialized when it is updated
2023-04-21 17:45:00 +08:00
5706bef2b3 [feature](common) Add unexpected/result support (#18312)
* Add unexpected/result support

* Rename result.hpp -> result.h && Add NOLINT in expected.hpp

* Add NOLINT in result.h to avoid clang-tidy checker

* Rename result.h to expected.h

* Add Apache License for be/src/util/expected.hpp

* Disable clang-format in be util/expected.hpp
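For readers unfamiliar with the expected/result style of error handling, here is a minimal sketch built on `std::variant`. `Result`, `Unexpected` and `parse_non_negative` are hypothetical and do not reproduce the API of the `expected.hpp` header added in this PR; C++23 also offers `std::expected` in the standard library.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error wrapper, in the spirit of `unexpected<E>`.
template <typename E>
struct Unexpected {
    E error;
};

// Hypothetical minimal Result<T, E>: holds either a value or an error.
template <typename T, typename E>
class Result {
public:
    Result(T value) : data_(std::in_place_index<0>, std::move(value)) {}
    Result(Unexpected<E> err) : data_(std::in_place_index<1>, std::move(err.error)) {}
    bool has_value() const { return data_.index() == 0; }
    explicit operator bool() const { return has_value(); }
    const T& value() const { return std::get<0>(data_); }
    const E& error() const { return std::get<1>(data_); }

private:
    std::variant<T, E> data_;
};

// Example: parse a non-negative integer and report failures without
// exceptions (overflow is not checked in this sketch).
Result<int, std::string> parse_non_negative(const std::string& s) {
    if (s.empty()) return Unexpected<std::string>{"empty input"};
    int v = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return Unexpected<std::string>{"not a digit: " + std::string(1, c)};
        v = v * 10 + (c - '0');
    }
    return v;
}
```

The caller checks the result in a boolean context and then reads either `value()` or `error()`, so failures travel through ordinary return values instead of exceptions.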
2023-04-21 17:07:20 +08:00
c72a46f3df [Improvement](bitmap-filter) enable bitmap runtime filter in fuzzy mode. (#17621) 2023-04-21 16:00:13 +08:00
c82964a294 [typo](doc) Fixed typos in lateral-view.md (#18842) 2023-04-21 15:59:04 +08:00