Commit Graph

4481 Commits

Author SHA1 Message Date
bdf66731fa [doc] Add Flink Connector Sink Bitmap Type Doc (#9316) 2022-05-01 17:53:10 +08:00
239b6374ca [doc] fix new doc v2 bug (#9309)
change sql-references-v2 to sql-references
2022-05-01 17:51:34 +08:00
17f38cdcb1 [doc] add error doc for action_table.dat (#9263) 2022-05-01 17:50:48 +08:00
e3d7134962 fix compilation guide on Ubuntu (#9337) 2022-05-01 17:42:13 +08:00
784681f106 [FE Code Style][step 0]add github action to check incremental code in pr (#9328)
1. add rules to checkstyle
2. add github action to check incremental code in pr
2022-05-01 17:30:29 +08:00
4bd5d4f163 [feature-wip](statistics) step3: schedule the statistics tasks and update relevant info (#8860)
This pull request includes part of the statistics implementation (https://github.com/apache/incubator-doris/issues/6370). It will not affect any existing code, and users will not be able to create statistics jobs.

After the statistics statement is received and the collection job is divided into tasks, this step schedules the statistics tasks and updates the job information. It mainly includes the following:
- Create a thread pool to schedule a certain number of tasks, with the concurrency determined by the configuration `cbo_concurrency_statistics_task_num`.
- After a task is completed, update the information of the statistics job.
2022-05-01 11:34:08 +08:00
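A minimal sketch of the scheduling idea in #8860, assuming the pool size is simply the value of `cbo_concurrency_statistics_task_num` and that "updating the job" means recording each completed task. The class `StatisticsTaskSchedulerSketch` and its methods are hypothetical, not Doris FE code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: class and method names are illustrative, not Doris code.
class StatisticsTaskSchedulerSketch {
    // Stand-in for the value of cbo_concurrency_statistics_task_num.
    private final int concurrency;
    // Job-level bookkeeping updated as each task completes.
    private final AtomicInteger finishedTasks = new AtomicInteger();
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peakRunning = new AtomicInteger();

    StatisticsTaskSchedulerSketch(int concurrency) {
        this.concurrency = concurrency;
    }

    /** Runs every task on a fixed-size pool and returns how many finished. */
    int runAll(List<Runnable> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable task : tasks) {
                futures.add(pool.submit(() -> runAndRecord(task)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface task failures
            }
            return finishedTasks.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("statistics task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    int peakConcurrency() {
        return peakRunning.get();
    }

    private void runAndRecord(Runnable task) {
        int now = running.incrementAndGet();
        peakRunning.accumulateAndGet(now, Math::max);
        try {
            task.run();
        } finally {
            running.decrementAndGet();
            // "Update the job information": counts the task as completed,
            // whether it succeeded or failed.
            finishedTasks.incrementAndGet();
        }
    }
}
```

A fixed-size pool is the simplest way to cap concurrency at a configured number; the real scheduler may track per-task state rather than a single counter.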
84c2e4de29 [fix][doc]Modify readme link (#9334) 2022-04-30 16:55:43 +08:00
7431288c52 [github-action] Update regression required check in .asf.yaml (#9333)
Change `P0 regression (Doris P0 regression)` to `P0 regression (Doris P0 Regression)`
2022-04-30 15:23:34 +08:00
489581777f [fix](ut) Fix MarkDownParserTest (#9332) 2022-04-30 13:02:11 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- Run `sh build-support/clang-format.sh` to clang-format all C++ code
2022-04-29 16:14:22 +08:00
201cd207f9 [Enhancement][Vectorized] Improve hash table build efficiency (#9250)
1. MAP_POPULATE is missing for mmap in Allocator, because macro OS_LINUX is not defined in allocator.h;
2. MAP_POPULATE has no effect for mremap as it does for mmap, so zero-fill the enlarged memory range explicitly to pre-fault the pages
2022-04-29 14:26:33 +08:00
ce7905e983 [fix](vectorized) Query get wrong result when ColumnDict concurrent predicate eval (#9270) 2022-04-29 11:45:04 +08:00
420cc2c3d8 [fix](help-doc) fix format of all sql-manual doc (#9306) 2022-04-29 11:42:02 +08:00
2c81624765 [Features]Add dbt doris adapter (#9299)
* Add dbt doris adapter

* Add licence header to each file

* Fix licence header
2022-04-29 11:40:29 +08:00
c132abd2bd (Refactor)[Statistics] Fix lock risks in Statistics Job (#9256)
* (Refactor)[Statistics] Fix lock risks in Statistics Job

1. Remove lock nesting between job and task
2. Solve the deadlock problem during job update
3. Avoid printing the log while holding the lock

* Add log
2022-04-29 10:46:24 +08:00
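A minimal sketch of two of the rules listed in #9256, assuming a job lock and a per-task lock: the task is updated under its own lock, the job under the job lock, the two are never held together, and the log message is emitted only after the job lock is released. All names are hypothetical, and the `List` stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: no job/task lock nesting, no logging under a lock.
class StatisticsJobLockSketch {
    enum TaskState { PENDING, FINISHED }

    static final class Task {
        private final ReentrantLock lock = new ReentrantLock();
        private TaskState state = TaskState.PENDING;

        void setState(TaskState newState) {
            lock.lock();
            try {
                state = newState;
            } finally {
                lock.unlock();
            }
        }

        TaskState getState() {
            lock.lock();
            try {
                return state;
            } finally {
                lock.unlock();
            }
        }
    }

    private final ReentrantLock jobLock = new ReentrantLock();
    private int finishedTaskCount = 0;
    // Stand-in for a real logger; thread-safe on its own.
    private final List<String> logLines = Collections.synchronizedList(new ArrayList<>());

    void onTaskFinished(Task task) {
        // 1. Update the task under the task lock only; it is released on return.
        task.setState(TaskState.FINISHED);

        // 2. Update the job under the job lock only; both locks are never held together.
        String message;
        jobLock.lock();
        try {
            finishedTaskCount++;
            message = "job progress: " + finishedTaskCount + " tasks finished";
        } finally {
            jobLock.unlock();
        }

        // 3. Log after the job lock has been released.
        logLines.add(message);
    }

    int finishedTaskCount() {
        jobLock.lock();
        try {
            return finishedTaskCount;
        } finally {
            jobLock.unlock();
        }
    }

    List<String> logLines() {
        synchronized (logLines) {
            return new ArrayList<>(logLines);
        }
    }
}
```

Because no thread ever waits for one lock while holding the other, the lock-ordering cycle that causes this kind of deadlock cannot form.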
2fa19113ab [fix](profile) Short-circuit and del predicate filter rows are not counted on vectorized exec (#9268) 2022-04-29 10:45:48 +08:00
cbfb4a3115 [fix](materialized-view) fix bug that can not create mv for list partitioned table (#9281) 2022-04-29 10:45:09 +08:00
c077fafe76 [test](storage)duplicate model storage layer regression-test (#9285)
* duplicate model storage layer regression-test

* use insert values for batch

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-04-29 10:42:09 +08:00
9ef09b8354 [feature](statistics) Statistics derivation.Step 1:ScanNode implement… (#8947)
* [feature](statistics) Statistics derivation.Step 1:ScanNode implementation

Co-authored-by: jianghaochen <jianghaochen@meituan.com>
2022-04-29 10:41:12 +08:00
93a41b2625 [refactor][routineload] Remove unused client object from routine load (#9223) 2022-04-29 10:40:07 +08:00
d330bc3806 [Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280)
Implement vectorized stream load.
Added fe configuration option `enable_vectorized_load` to enable vectorized stream load.

    Co-authored-by: tengjp@outlook.com
    Co-authored-by: mrhhsg@gmail.com
    Co-authored-by: minghong.zhou@163.com
    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
2022-04-29 09:50:51 +08:00
74a482ca7f [fix] fix docs build bug (#9293)
After PR #9272, `docs/build_help_zip.sh` fails to run.
This PR fixes the issue.
The help module still has some parsing problems, which I will fix in the next PR.

This CL mainly changes:
1. fix `docs/build_help_zip.sh` error
2. rename `sql-reference-v2` to `sql-reference`
3. modify build extension github action to run `docs/build_help_zip.sh`
2022-04-28 22:19:04 +08:00
48222f1fb0 [fix](storage)bloom filter support ColumnDict (#9167)
bloom filter support ColumnDict(#9167)
2022-04-28 20:03:26 +08:00
267e8b67c2 [refactor][doc]The new version of the document is online (#9272)
replace the `docs/` with `new-docs/`
2022-04-28 15:22:34 +08:00
1378e7e05f (Refactor)[Planner] Remove merge node (#9251) 2022-04-28 15:05:35 +08:00
2c0bccef24 [feature-wip](global-dict) global dict thrift definition (#9243)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-04-28 14:42:41 +08:00
b6b6e17eb7 [chore] (workflow)add sonarcloud workflow to check code quality and security (#9252) 2022-04-28 11:09:56 +08:00
0b6758cacd [fix](checkpoint) fix checkpoint failure when reloading new image (#9262)
Introduced from #9011
2022-04-28 09:47:16 +08:00
5cbb4a2317 [Improvement](docs) Update EN doc (#9228) 2022-04-27 23:22:38 +08:00
2ec0b98787 [fix](routine-load) Fix bug that new coming routine load tasks are rejected all the time and report TOO_MANY_TASK error (#9164)
```
CREATE ROUTINE LOAD iaas.dws_nat ON dws_nat
WITH APPEND PROPERTIES (
"desired_concurrent_number"="2",
"max_batch_interval" = "20",
"max_batch_rows" = "400000",
"max_batch_size" = "314572800",
"format" = "json",
"max_error_number" = "0"
)
FROM KAFKA (
"kafka_broker_list" = "xxxx:xxxx",
"kafka_topic" = "nat_nsq",
"property.kafka_default_offsets" = "2022-04-19 13:20:00"
);
```

In the create statement example above, the user did not specify custom partitions,
so (1) FE gets all Kafka partitions from the server in the routine load scheduler.
The user also set the default offset by datetime,
so (2) FE gets the Kafka offsets by time from the server in the routine load scheduler.

When (1) succeeds but (2) fails, the progress of this routine load may not contain any partitions or offsets.
Moreover, since newCurrentKafkaPartition, which is fetched from the Kafka server, may always equal currentKafkaPartitions,
the wrong progress is never updated.
2022-04-27 23:21:17 +08:00
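A hypothetical sketch of the failure condition described in #9164: if progress is refreshed only when the partition list changes, an empty progress left behind by a failed offset lookup is never repaired. The second method shows one possible guard (also refresh when a partition has no recorded offset). It is an illustration, not the actual change made in the PR, and all names are assumptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and the guard are illustrative, not the #9164 fix.
class RoutineLoadProgressSketch {
    /**
     * Mirrors the failure mode: refresh only when the partition list changes,
     * so an empty progress is never repaired.
     */
    static boolean needRefreshPartitionOnly(List<Integer> currentKafkaPartitions,
                                            List<Integer> newCurrentKafkaPartitions) {
        return !currentKafkaPartitions.equals(newCurrentKafkaPartitions);
    }

    /** Also refreshes when some partition has no recorded offset in the progress. */
    static boolean needRefreshWithProgressCheck(List<Integer> currentKafkaPartitions,
                                                List<Integer> newCurrentKafkaPartitions,
                                                Map<Integer, Long> progress) {
        if (!currentKafkaPartitions.equals(newCurrentKafkaPartitions)) {
            return true;
        }
        for (Integer partition : newCurrentKafkaPartitions) {
            if (!progress.containsKey(partition)) {
                return true; // partition known but no offset: progress is incomplete
            }
        }
        return false;
    }
}
```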
5a7e46fe7b [fix](planner) fix non-equal out join is not supported (#9156) 2022-04-27 23:19:13 +08:00
26bc462e1c [feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query (#9145)
1. fix track bthread
- Bthread is a high-performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, which may run on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS).
- This caused the pthread TLS MemTracker to get mixed up when it was switched during a brpc server response. So in the brpc server response, the MemTracker is now saved in bthread TLS instead of pthread TLS.
Ref: 731730da85/docs/en/server.md (bthread-local)

2. fix track vectorized query
- Added tracking of mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
2022-04-27 20:34:02 +08:00
dfbeeccd47 [feature-wip](statistics) step2: schedule the statistics job and generate executable tasks (#8859)
This pull request includes part of the statistics implementation (https://github.com/apache/incubator-doris/issues/6370). It will not affect any existing code, and users will not be able to create statistics jobs.

After the statistics collection statement is received, a job is generated. This step divides the statistics collection job according to the following statistics categories:
table:
- `row_count`: the table row count is critical for estimating the cardinality and memory usage of scan nodes.
- `data_size`: the table size; not applicable to the CBO, mainly used to monitor and manage table size.
column:
- `num_distinct_value`: used to determine the selectivity of an equality expression.
- `min`: the minimum value.
- `max`: the maximum value.
- `num_nulls`: the number of nulls.
- `avg_col_len`: the average length of the column in bytes, used to estimate memory and network IO.
- `max_col_len`: the maximum length of the column in bytes, used to estimate memory and network IO.

After the job is divided, the statistics tasks are generated.
2022-04-27 11:05:43 +08:00
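To make the column categories from #8859 concrete, here is a minimal sketch that computes them for a single in-memory string column. Several semantics are assumptions, not taken from Doris: nulls are excluded from `num_distinct_value`, `min`/`max`, and the length statistics; `min`/`max` use `String.compareTo` ordering; lengths are UTF-8 byte counts. `ColumnStatsSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; null handling and ordering are assumptions (see above).
final class ColumnStatsSketch {
    final long numDistinctValue;
    final String min;
    final String max;
    final long numNulls;
    final double avgColLen; // bytes, averaged over non-null values
    final int maxColLen;    // bytes

    private ColumnStatsSketch(long numDistinctValue, String min, String max,
                              long numNulls, double avgColLen, int maxColLen) {
        this.numDistinctValue = numDistinctValue;
        this.min = min;
        this.max = max;
        this.numNulls = numNulls;
        this.avgColLen = avgColLen;
        this.maxColLen = maxColLen;
    }

    static ColumnStatsSketch compute(List<String> column) {
        Set<String> distinct = new HashSet<>();
        String min = null;
        String max = null;
        long numNulls = 0;
        long nonNullCount = 0;
        long totalBytes = 0;
        int maxBytes = 0;
        for (String value : column) {
            if (value == null) {
                numNulls++;
                continue;
            }
            nonNullCount++;
            distinct.add(value);
            if (min == null || value.compareTo(min) < 0) min = value;
            if (max == null || value.compareTo(max) > 0) max = value;
            int bytes = value.getBytes(StandardCharsets.UTF_8).length;
            totalBytes += bytes;
            maxBytes = Math.max(maxBytes, bytes);
        }
        double avg = nonNullCount == 0 ? 0.0 : (double) totalBytes / nonNullCount;
        return new ColumnStatsSketch(distinct.size(), min, max, numNulls, avg, maxBytes);
    }
}
```

For the column `["a", "bb", null, "a"]` this yields 2 distinct values, 1 null, min `"a"`, max `"bb"`, an average length of 4/3 bytes, and a maximum length of 2 bytes.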
4f19fe81ec remove some unused code (#9240) 2022-04-27 11:04:16 +08:00
923c38398f [test] reset default port in regression test conf (#9246)
Co-authored-by: morningman <chenmingyu@baidu.com>
2022-04-27 11:02:32 +08:00
597115c305 [feature] add SHOW TABLET STORAGE FORMAT stmt (#9037)
Use this statement to show the storage format of tablets on each BE. If `verbose` is set,
    it shows the storage format of every tablet.
    e.g.
    ```
    MySQL [(none)]> admin show tablet storage format;
    +-----------+---------+---------+
    | BackendId | V1Count | V2Count |
    +-----------+---------+---------+
    | 10002     | 0       | 2867    |
    +-----------+---------+---------+
    1 row in set (0.003 sec)
    MySQL [test_query_qa]> admin show tablet storage format verbose;
    +-----------+----------+---------------+
    | BackendId | TabletId | StorageFormat |
    +-----------+----------+---------------+
    | 10002     | 39227    | V2            |
    | 10002     | 39221    | V2            |
    | 10002     | 39215    | V2            |
    | 10002     | 39199    | V2            |
    +-----------+----------+---------------+
    4 rows in set (0.034 sec)
    ```
    Also add storage format information to the `show full tables` statement.
    ```
    MySQL [test_query_qa]> show full tables;
    +-------------------------+------------+---------------+
    | Tables_in_test_query_qa | Table_type | StorageFormat |
    +-------------------------+------------+---------------+
    | bigtable                | BASE TABLE | V2            |
    | test_dup                | BASE TABLE | V2            |
    | test                    | BASE TABLE | V2            |
    | baseall                 | BASE TABLE | V2            |
    | test_string             | BASE TABLE | V2            |
    +-------------------------+------------+---------------+
    5 rows in set (0.002 sec)
    ```
2022-04-27 10:53:43 +08:00
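The two forms of `admin show tablet storage format` are related: the non-verbose form is a per-backend count of the verbose rows. A minimal sketch of that aggregation, assuming the only formats are the strings `V1` and `V2`; `TabletStorageFormatSketch` and `TabletRow` are hypothetical names, not Doris FE classes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative, not Doris FE classes.
final class TabletStorageFormatSketch {
    /** One row of the verbose output: BackendId, TabletId, StorageFormat. */
    static final class TabletRow {
        final long backendId;
        final long tabletId;
        final String storageFormat;

        TabletRow(long backendId, long tabletId, String storageFormat) {
            this.backendId = backendId;
            this.tabletId = tabletId;
            this.storageFormat = storageFormat;
        }
    }

    /** Folds verbose rows into backendId -> {V1Count, V2Count}. */
    static Map<Long, long[]> summarize(List<TabletRow> rows) {
        Map<Long, long[]> counts = new TreeMap<>(); // sorted by BackendId
        for (TabletRow row : rows) {
            long[] c = counts.computeIfAbsent(row.backendId, id -> new long[2]);
            if ("V1".equals(row.storageFormat)) {
                c[0]++;
            } else if ("V2".equals(row.storageFormat)) {
                c[1]++;
            }
        }
        return counts;
    }
}
```

Fed the four verbose rows shown above, this yields `V1Count = 0` and `V2Count = 4` for backend 10002.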
c1ae1a0fa2 remove gensrc/proto/palo_internal_service.proto, which was removed in #6341 and added back by mistake in #6329 (#9233) 2022-04-27 08:25:01 +08:00
b406684486 Modify incorrect comments in ShowExecutor (#9232)
Fixed some incorrect comments in ShowExecutor
2022-04-26 19:10:49 +08:00
7076ba40ed [infra] Adjust .asf.yaml spacing to make it parse properly 2022-04-26 10:46:58 +02:00
87fc46f84c update comments in run-be-ut.sh (#9092) 2022-04-26 12:48:35 +08:00
47a59c7fe6 [fix](OlapScanner)fix bitmap or hll's OOM when loading too many unqualified data (#9205) 2022-04-26 10:25:56 +08:00
a20cf1e03e [typo](annotation): fix typo in ldap.conf (#9200) 2022-04-26 10:25:07 +08:00
Pxl
951c2a90eb [fix](Lateral-View)(Vectorized) core dump on lateral-view with nullable column (#9191) 2022-04-26 10:24:11 +08:00
da4e7ec6c2 [refactor](doc)Cluster upgrade adds metadata backup (#9189) 2022-04-26 10:22:07 +08:00
Pxl
e772163b98 [fix](script) meet error on start_fe.sh(#9187)
start_fe.sh: line 174: [: -eq: unary operator expected
2022-04-26 10:21:03 +08:00
555cc0dfce [fix] fix sequence bug in non-vec mode (#9184) 2022-04-26 10:15:59 +08:00
7cfebd05fd [fix](hierarchical-storage) Fix bug that storage medium property change back to SSD (#9158)
1. fix bug described in #9159
2. fix a `fill_tuple` bug introduced from #9173
2022-04-26 10:15:19 +08:00
62b38d7a75 [fix](spark load) fix getHashValue of string type is always zero in spark load. (#9136)
Buffer flip was used incorrectly.
When the hash key is of string type, the hash value is always zero.
The reason is that the buffer for the string type is obtained by wrap, which does not need a flip.
If it is flipped anyway, the buffer's read limit becomes zero.
2022-04-26 10:14:21 +08:00
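The #9136 bug can be reproduced with plain `java.nio.ByteBuffer`: `wrap()` returns a buffer with position 0 and limit equal to the array length, so it is already ready to read, and calling `flip()` sets the limit to the current position (0), leaving nothing to read. In this sketch `CRC32` is only a stand-in hash (the hash function spark load actually uses is not shown here), and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch; CRC32 stands in for the real hash function.
final class WrapFlipSketch {
    /** Buggy: flip() after wrap() sets limit = position = 0, so no bytes are hashed. */
    static long buggyHash(String key) {
        ByteBuffer buf = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
        buf.flip(); // limit becomes 0: buf.remaining() == 0
        return hashRemaining(buf);
    }

    /** Correct: wrap() already yields position = 0, limit = length. */
    static long correctHash(String key) {
        ByteBuffer buf = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
        return hashRemaining(buf);
    }

    private static long hashRemaining(ByteBuffer buf) {
        CRC32 crc = new CRC32();
        crc.update(buf); // consumes buf.remaining() bytes
        return crc.getValue(); // 0 when no bytes were consumed
    }
}
```

With the extra `flip()`, every string key hashes to the same value (0 for CRC32), which matches the symptom described in the commit.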
88115ffcb3 [feature-wip](array-type) ArrayFileColumnIterator bug fix (#9114) 2022-04-26 09:35:46 +08:00
cdd1b6d6dd [fix](function) fix lag/lead function return invalid data (#9076) 2022-04-26 09:34:46 +08:00