Commit Graph

1548 Commits

Author SHA1 Message Date
380395a61f [doc](routineload)Common mistakes in adding routine load #13975 2022-11-05 19:17:33 +08:00
087488db3b [typo](doc) fixed spelling errors (#13974) 2022-11-05 15:40:55 +08:00
554f566217 [enhancement](compaction) introduce segment compaction (#12609) (#12866)
## Design

### Trigger

Every time a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, to avoid a recursion/queuing nightmare.

### Target Selection

We collect segments on every trigger. We skip big segments whose row num > M (e.g. 10000) because we get little benefit from compacting them compared to the effort. Hence, we only pick the "Longest Consecutive Small" segment group for actual compaction.
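A minimal sketch of this selection, assuming each segment's row count is known up front; the function name and threshold parameter are illustrative, not the actual Doris implementation:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Given the row count of each segment in a rowset, pick the longest consecutive
// run of "small" segments (row count <= threshold). Returns {start index, run length}.
std::pair<size_t, size_t> pick_longest_small_run(const std::vector<int64_t>& seg_rows,
                                                 int64_t small_row_threshold) {
    size_t best_start = 0, best_len = 0;
    size_t cur_start = 0, cur_len = 0;
    for (size_t i = 0; i < seg_rows.size(); ++i) {
        if (seg_rows[i] <= small_row_threshold) {
            if (cur_len == 0) cur_start = i;
            ++cur_len;
            if (cur_len > best_len) {
                best_len = cur_len;
                best_start = cur_start;
            }
        } else {
            cur_len = 0;  // a big segment breaks the consecutive run
        }
    }
    return {best_start, best_len};
}
```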

### Compaction Process

A new thread pool is introduced to do the job. We submit the above-mentioned "Longest Consecutive Small" segment group to the pool. Then the worker thread does the following:

- build a MergeIterator from the target segments
- create a new segment writer
- for each block read from the MergeIterator, the writer appends it (a sketch of this loop follows the list)
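The loop below is a minimal sketch of that worker-thread job, assuming simplified `MergeIterator`/`SegmentWriter` interfaces; `next_block`, `append_block`, and `finalize` are placeholders, not the actual Doris API:

```cpp
// Minimal sketch of the worker-thread loop. The template parameters stand in for
// the real Doris classes; only the shape of the loop matters here.
template <typename MergeIterator, typename SegmentWriter, typename Block>
void compact_segments(MergeIterator& merged_input, SegmentWriter& writer) {
    Block block;
    // Read merged, ordered blocks and append them into one new segment.
    while (merged_input.next_block(&block)) {
        writer.append_block(block);
    }
    writer.finalize();  // flush the compacted segment
}
```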

### SegID handling

SegID must remain consecutive after segment compaction. 

If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:

- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1 (a sketch of this renumbering follows the list)
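A rough sketch of the resulting segment names, assuming the compacted run starts at seg_0 as in the example above; the helper and file-name format are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the first `compacted_count` segments were merged into one
// file, and every segment after the run shifts down so segment IDs stay consecutive.
std::vector<std::string> final_segment_names(size_t compacted_count, size_t total_count) {
    std::vector<std::string> names;
    names.push_back("seg_0");  // the compacted output (e.g. the former "seg_0-3") becomes seg_0
    size_t next_id = 1;
    for (size_t old_id = compacted_count; old_id < total_count; ++old_id) {
        names.push_back("seg_" + std::to_string(next_id++));  // e.g. seg_4 -> seg_1
    }
    return names;
}
```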

It is worth noting that we should wait for in-flight segment compaction tasks to finish before building the rowset meta and committing this txn.
2022-11-04 14:12:51 +08:00
1b36843664 [doc](jsonb type)add documents for JSONB datatype (#13792) 2022-11-03 19:33:51 +08:00
6ff306b1ea [docs](round) complement round function documentation (#13838) 2022-11-03 14:30:49 +08:00
5fe3342aa3 [Vectorized](function) support bitmap_to_array function (#13926) 2022-11-03 14:29:28 +08:00
636bdffe62 [fix](doc) fix 404 link (#13908) 2022-11-03 08:46:47 +08:00
b83744d2f6 [feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766) 2022-11-02 23:15:57 +08:00
374303186c [Vectorized](function) support topn_array function (#13869) 2022-11-02 19:49:23 +08:00
d5becdb4a1 [fix](dynamic-partition) fix wrong check of replication num (#13755) 2022-11-02 12:55:33 +08:00
bd6070d9b3 [doc](spark-doris-connector)Add spark Doris connector to support streamload documentation #13834 2022-11-02 08:43:52 +08:00
wxy
3fc1b27c40 [docs](tablet-docs) fix the tablet-repair-and-balance.md document. (#13853)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-11-02 08:43:08 +08:00
8b3afd431e [improvement](memory) simplify memory config related to tcmalloc (#13781)
There are several configs related to tcmalloc, and users do not know how to configure them. Actually, users just want two modes, performance or compact: in performance mode, users want Doris to run queries and loads quickly, while in compact mode, users want Doris to run with less memory usage.

If we want to configure tcmalloc individually, we can use the environment variables supported by tcmalloc.
2022-11-01 21:45:19 +08:00
61c817f4cc [feature](syntax) support SELECT * EXCEPT (#13844)
* [feature](syntax) support SELECT * EXCEPT: add regression test
2022-11-01 19:41:25 +08:00
942611c185 Revert "[enhancement](compaction) opt compaction task producer and quick compaction (#13495)" (#13833)
This reverts commit 4f2ea0776ca3fe5315ab5ef7e00eefabfb5771a0.
2022-11-01 14:22:12 +08:00
4f2ea0776c [enhancement](compaction) opt compaction task producer and quick compaction (#13495)
1. Remove quick_compaction's rowset pick policy; call cumulative (cu) compaction when quick compaction is triggered.
2. Skip a tablet's compaction task when its compaction score is too small.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-31 12:24:05 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
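Since the commit says it follows Hive's masking convention, a minimal sketch of the default rule might look like the following; the helper name and details are illustrative, not the actual Doris implementation:

```cpp
#include <cctype>
#include <string>

// Default masking rule in Hive's convention: upper-case letters become 'X',
// lower-case letters become 'x', digits become 'n', everything else is kept as-is.
std::string mask_all(const std::string& s) {
    std::string out = s;
    for (char& c : out) {
        if (std::isupper(static_cast<unsigned char>(c)))      c = 'X';
        else if (std::islower(static_cast<unsigned char>(c))) c = 'x';
        else if (std::isdigit(static_cast<unsigned char>(c))) c = 'n';
    }
    return out;
}

// mask_first_n / mask_last_n apply the same rule to only the first or last n
// characters, e.g. mask_first_n("abc123", 4) -> "xxxn23" under these defaults.
```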
2022-10-28 10:43:56 +08:00
c108554f14 [function](date function) add new date function 'to_monday' #13707 2022-10-28 08:41:16 +08:00
5dd052d386 [Function](array) support array_range function (#13547)
* array_range with 3 impl

* [Function](array) support array_range function

* update

* update code
2022-10-28 08:40:24 +08:00
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
45b31506c7 [improvement](delete) support delete from partitioned table without partition specified (#13533)
Support DELETE from a partitioned table without a partition specified in the DELETE statement.

## Usage
If it is a partitioned table, you can specify a partition.
If none is specified, Doris will infer the partition from the given conditions.
There are two cases in which Doris cannot infer the partition from the conditions:
1) the conditions do not contain partition columns;
2) the operator on the partition column is `not in`.
When a delete on a partitioned table does not specify the partition,
or the partition cannot be inferred from the conditions,
the session variable `delete_without_partition` needs to be `true`
for the delete statement to be applied to all partitions.
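A rough, illustrative sketch of that inference rule (not the actual FE code; names and types are hypothetical):

```cpp
#include <set>
#include <string>
#include <vector>

// A delete condition on one column, e.g. {column = "dt", op = "="}.
struct Condition {
    std::string column;
    std::string op;  // "=", "in", "not in", ...
};

// Partitions can be inferred only when the conditions mention at least one
// partition column and no partition column is filtered with `not in`.
bool can_infer_partitions(const std::vector<Condition>& conditions,
                          const std::set<std::string>& partition_columns) {
    bool touches_partition_column = false;
    for (const auto& cond : conditions) {
        if (partition_columns.count(cond.column) == 0) continue;
        if (cond.op == "not in") return false;  // case 2: cannot infer
        touches_partition_column = true;
    }
    return touches_partition_column;  // case 1: no partition column -> cannot infer
}
```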

## Test case
A test case is added in `regression-test/suites/delete_p0/test_delete_from_partition.groovy`;
users can now delete from a partitioned table without specifying a partition.
2022-10-27 21:32:45 +08:00
578d956a6b [typo](doc):Correct spelling mistakes UDAF. (#13711) 2022-10-27 21:21:29 +08:00
2697f72d77 [Improvement][SET-PROPERTY] Support for set query_timeout property (#13444) 2022-10-27 10:03:39 +08:00
3e8cd0c669 [typo](doc) Add the description of json HDFS broker load (#13683)
Add instructions for HDFS broker load with JSON format files.
2022-10-27 09:36:57 +08:00
d2262bc8fb [docs]fix 404 (#13695)
[docs]fix 404
2022-10-27 08:49:36 +08:00
c5559877b4 [typo](docs)fix docs 404 link (#13677) 2022-10-26 14:56:47 +08:00
c709998faa [improvement][refactor](mysql) remove old mysql server and add keep alive option (#13663)
* [improvement][refactor](mysql) remove old mysql server and add keep alive option
2022-10-26 09:38:33 +08:00
9691db7918 [Enhancement](metrics) add more metrics (#11693)
* Add `AutoMappedMetric` to measure dynamic object.
* Add query instance and rpc metrics
* Add thrift rpc metrics
* Add txn metrics
* Reorganize metrics init routine.

Co-authored-by: 迟成 <chicheng@meituan.com>
2022-10-26 08:31:03 +08:00
17ba40f947 [feature-wip](CN Node)Support compute node (#13231)
Introduce the node role to Doris; table creation and the tablet scheduler will ensure that storage is assigned only to the storage-capable BE nodes.
2022-10-25 21:44:33 +08:00
235c105554 [feature-array](array-type) Add array function array_enumerate (#13612)
Add array function array_enumerate
2022-10-25 15:12:11 +08:00
f802fc37ff add date function 'last_day' (#13609) 2022-10-25 13:46:16 +08:00
87864e40bf [doc](random_sink) Add some doc content about random sink (#13577)
1. Add some doc content about random sink
2. Fix bug of showing missing rowsets info
2022-10-23 22:51:56 +08:00
6ef891870f improve the outfile doc (#13569) 2022-10-22 23:21:12 +08:00
413d2332ce [improvement](heartbeat) Add some relaxation strategies to reduce the failure probability of regression testing (#13568)
The regression test may occasionally fail because of heartbeat failures.
So I add 2 new FE configs to relax this limit:

1. `disable_backend_black_list`
    Set to true so that a Backend is not put on the black list even if we fail to send a task to it. Default is false.
2. `max_backend_heartbeat_failure_tolerance_count`
   Only if the number of heartbeat failures exceeds this config is the Backend set as dead. Default is 1.
2022-10-22 17:53:07 +08:00
847b80ebfa [test](jdbc) add jdbc and hive regression test (#13143)
1. Modify default behavior of `build.sh`
    The `BUILD_JAVA_UDF` option now defaults to ON, so a JVM is needed both for compilation and at runtime.

2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
   See `docker/thirdparties/docker-compose`.

3. Add some regression test cases for jdbc query on MySQL, PG and Hive Catalog
   The default is `false`; if set to `true`, you need to first start the Docker containers for MySQL/PG/Hive.

4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey
2022-10-21 15:29:27 +08:00
3ca8bfaf30 [Function](array) support array_difference function (#13440) 2022-10-21 10:57:37 +08:00
27d84eafc5 [feature](alter) support rename column for table with unique column id (#13410) 2022-10-21 08:45:34 +08:00
e62d3dd8e5 [opt](function) refactor extract_url to use StringValue (#13508)
Change extract_url to use StringValue in place of std::string, to speed it up.
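The gist of the change, sketched with a simplified stand-in for `StringValue` (the real Doris class has more members and helpers): a non-owning pointer + length view lets URL parts be sliced out of the input without allocating intermediate `std::string`s.

```cpp
#include <cstddef>

// Simplified stand-in for Doris's StringValue: a non-owning view of a byte range.
struct StringValueView {
    const char* ptr = nullptr;
    size_t len = 0;
};

// Return a view of the substring [start, start + count) without copying.
inline StringValueView slice(StringValueView s, size_t start, size_t count) {
    return {s.ptr + start, count};
}
```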
2022-10-21 08:33:39 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. Remove the FE config `enable_array_type`.
2. Limit the nested depth of array on the FE side.
3. Fix a bug where, when loading an array from Parquet, the decimal type was treated as bigint.
4. Fix loading an array from CSV (vec engine); handle null and "null".
5. Change the CSV array loading behavior: if the array string format in CSV is invalid, it is converted to null (see the sketch after this list).
6. Remove `check_array_format()`, because its logic is wrong and meaningless.
7. Add stream load CSV test cases and more Parquet broker load tests.
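A very rough sketch of the NULL fallback described in item 5; this is illustrative only, as the real parser validates far more than the outer brackets:

```cpp
#include <optional>
#include <string>

// If the textual form of a CSV array cell is not a well-formed array literal,
// the cell is loaded as NULL (nullopt) instead of failing the load.
std::optional<std::string> parse_csv_array_cell(const std::string& cell) {
    if (cell.size() < 2 || cell.front() != '[' || cell.back() != ']') {
        return std::nullopt;  // invalid array format -> NULL
    }
    return cell;  // hand a syntactically plausible literal to the real parser
}
```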
2022-10-20 15:52:31 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
bc08854a35 [doc](storage policy) add cold and hot separation docs (#13096) 2022-10-20 08:56:53 +08:00
697fa5f586 [Enhancement](profile) support configure the number of query profile (#13421)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-10-20 08:51:36 +08:00
eeb2b0acdb [doc][fix](multi-catalog) Add multi-catalog es doc (#13429)
1. Add multi-catalog ES doc.
2. Modify the ES unsigned_long mapping to largeint.
3. Add pre-check logic to getHost.
2022-10-19 16:00:13 +08:00
29b4d8dcad [typo](docs) fix some problem #13462 2022-10-19 15:42:17 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
a8fd76fe32 [Fix](docs) fix error description of LDAP_ADMIN_PASSWORD in the document (#13405)
co-author:@luozenglin
2022-10-18 18:53:10 +08:00
6d322f85ac [improvement](compaction) delete num based compaction policy (#13409)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-18 16:13:28 +08:00
dbf71ed3be [feature-wip](new-scan) Support stream load with csv in new scan framework (#13354)
1. Refactor the file reader creation in FileFactory, for simplicity.
    Previously, FileFactory had too many `create_file_reader` interfaces.
    They are now unified into two categories: the interfaces used by the previous BrokerScanNode,
    and the interfaces used by the new FileScanNode.
    The creation methods for readers that read a `StreamLoadPipe` are also separated from those for readers that read files.

2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode

3. Now, for the generic reader, the file reader is created inside the reader instead of being passed in from outside.

4. Add some test cases for CSV stream load; the behavior is the same as with the old broker scanner.
2022-10-17 23:33:41 +08:00
207f4e559e [feature](agg) support group_bitmap_xor agg function. (#13287)
support `group_bitmap_xor` agg function
2022-10-17 18:40:06 +08:00