Commit Graph

5941 Commits

Author SHA1 Message Date
60fddd56e7 [feature-wip](unique-key-merge-on-write) opt lock and only save valid delete_bitmap (#11953)
1. use rlock in most logic instead of wrlock
2. filter stale rowset's delete bitmap in save meta
3. add a delete_bitmap lock to handle compaction and publish_txn conflict

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-23 14:43:40 +08:00
30a13c8141 [Bug](error code) fix db access error code msg #11962
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-08-23 14:15:58 +08:00
bc28b7eb4f [fix](error-code) prompt error when MySQL client login password is incorrect #11973 2022-08-23 09:11:09 +08:00
05da3d947f [feature-wip](new-scan) add scanner scheduling framework (#11582)
Doris currently has many types of ScanNodes, and most of their logic is the same, including:

- Runtime filter
- Predicate pushdown
- Scanner generation and scheduling

This PR therefore unifies the common logic of all ScanNodes.
Each data source only needs to implement its own Scanner for data access,
so future scan optimizations apply to the scans of all data sources
and code duplication is reduced.

This PR mainly adds four new classes:

VScanner
The parent class of all Scanners. Subclasses inherit from it to implement specific data access methods.

VScanNode
The unified ScanNode. It is responsible for the common logic, including RuntimeFilter, predicate pushdown, and Scanner generation and scheduling.

ScannerContext
ScannerContext records the execution status of the group of Scanners
that corresponds to a ScanNode, including how many scanners are being scheduled.
It also maintains a producer-consumer block queue between the scanners and the scan node.

ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules one ScannerContext at a time,
and submits the Scanners to the scanner thread pool for data scanning.

ScannerScheduler
Responsible for all Scanner scheduling tasks in a unified way.
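
The relationship between the four classes described above can be sketched in a few lines of Python. This is only an illustration of the producer-consumer idea, not the Doris C++ implementation: the method names, the queue size, the thread count, and the `_DONE` sentinel are all assumptions made for the example, and a "scanner" here is just a callable that yields blocks.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # hypothetical sentinel: every scanner in the context has finished


class ScannerContext:
    """Tracks a group of scanners and the block queue they feed (illustrative only)."""

    def __init__(self, scanners, max_queued_blocks=8):
        self.scanners = list(scanners)
        # Bounded queue: scanners (producers) block when the scan node (consumer) lags.
        self.blocks = queue.Queue(maxsize=max_queued_blocks)
        self._running = len(self.scanners)
        self._lock = threading.Lock()
        if self._running == 0:
            self.blocks.put(_DONE)

    def scanner_finished(self):
        """Record that one scanner is done; signal the consumer after the last one."""
        with self._lock:
            self._running -= 1
            last = self._running == 0
        if last:
            self.blocks.put(_DONE)


class ScannerScheduler:
    """Submits the scanners of a context to a shared thread pool."""

    def __init__(self, num_threads=4):
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def submit(self, ctx):
        for scanner in ctx.scanners:
            self._pool.submit(self._run, ctx, scanner)

    @staticmethod
    def _run(ctx, scanner):
        try:
            for block in scanner():
                ctx.blocks.put(block)
        finally:
            # Always decrement, even if the scanner raised.
            ctx.scanner_finished()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def scan_node_read_all(ctx):
    """Consumer side: drain blocks until every scanner has finished."""
    out = []
    while True:
        block = ctx.blocks.get()
        if block is _DONE:
            return out
        out.append(block)
```

A scan node would create one context per query fragment, hand it to the scheduler, and then read blocks from the queue. The bounded queue gives simple backpressure between scanners and the scan node.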

Test:
This work is still in progress and is disabled by default.
I tested it with JMeter at a concurrency of 50, but the scanner currently just returns without data.
The QPS can reach about 9000.
I can't compare it with the original implementation because no data is read for now. I will test it when the new OLAP scanner is ready.
Co-authored-by: morningman <morningman@apache.org>
2022-08-23 08:45:18 +08:00
38c751e5eb [github](checks) change the requirement of github checks (#11978) 2022-08-23 00:01:05 +08:00
def6f5568e [feature](nereids): enable exploration job (#11867)
Enable the exploration job and fix related problems.

Correct the join reorder.
2022-08-22 23:38:17 +08:00
caec862d91 [feature](Nereids)add type coercion rule for nereids (#11802)
- add an interface ExpectsInputTypes to Expression
- add an interface ImplicitCastInputTypes to Expression
- add an Expression rewrite rule for type coercion
- add a Check Analysis rule to check whether the Plan is semantically correct

If an Expression implements ImplicitCastInputTypes, the type coercion rule automatically rewrites its children, casting them to the most suitable type.
If an Expression implements ExpectsInputTypes, Check Analysis checks whether its children's types are accepted by the expected input types.
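
As a rough illustration of the two steps described above, the following Python sketch first rewrites children with casts and then checks the result. It is not the Nereids Java code: the `Expr` class, the `WIDENING` order, and the rule that a child is cast to the declared expected type are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical widening order: a type may be implicitly cast to any later type.
WIDENING = ["TINYINT", "INT", "BIGINT", "DOUBLE"]


@dataclass
class Expr:
    name: str
    data_type: str
    children: List["Expr"] = field(default_factory=list)
    expected_input_types: Tuple[str, ...] = ()  # stands in for ExpectsInputTypes
    implicit_cast: bool = False                 # stands in for ImplicitCastInputTypes


def can_widen(src: str, dst: str) -> bool:
    return (src in WIDENING and dst in WIDENING
            and WIDENING.index(src) <= WIDENING.index(dst))


def coerce(expr: Expr) -> Expr:
    """Rewrite rule: wrap children in casts where the expression allows it."""
    children = [coerce(c) for c in expr.children]
    if expr.implicit_cast:
        for i, child in enumerate(children):
            if i >= len(expr.expected_input_types):
                break
            expected = expr.expected_input_types[i]
            if child.data_type != expected and can_widen(child.data_type, expected):
                children[i] = Expr("cast", expected, [child])
    return Expr(expr.name, expr.data_type, children,
                expr.expected_input_types, expr.implicit_cast)


def check(expr: Expr) -> None:
    """Analysis check: every child must match the expected input type."""
    for child in expr.children:
        check(child)
    for child, expected in zip(expr.children, expr.expected_input_types):
        if child.data_type != expected:
            raise TypeError(
                f"{expr.name}: expected {expected}, got {child.data_type}")
```

Running `coerce` before `check` means an expression such as `add(INT, BIGINT)` passes the check after its first child has been wrapped in a cast to `BIGINT`.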
2022-08-22 23:06:02 +08:00
b55195bd80 [FixAssist](compaction) add DCHECK in BlockReader::_unique_key_next_block to reason problem (#11951) 2022-08-22 22:33:31 +08:00
68e2b3db44 [regression](rollup) Modify test case (#11960) 2022-08-22 19:18:35 +08:00
c22d097b59 [improvement](compress) Support compress/decompress block with lz4 (#11955) 2022-08-22 17:35:43 +08:00
0c5b4ecc7c [fix](agg)repeat node shouldn't change slot's nullable property of agg node (#11859) 2022-08-22 16:28:45 +08:00
0b33824eef [fix][Vectorized] Fix nullptr deref in data sink (#11473)
brpc cache may return nullptr.
2022-08-22 11:44:55 +08:00
92cef580f3 [enhancement](memory) Reduce virtual memory used by PaddedPODArray (#11816) 2022-08-22 11:33:07 +08:00
26deebccb8 [improvement](config)Enable insert strict (#11866) 2022-08-22 11:32:17 +08:00
6d925054de [feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845)
1. Spark can set the timestamp precision by the following configuration:
spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS
DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision.
2. With DECIMAL V2, the BE saves the value as decimal128 and keeps the precision of the decimal as (precision=27, scale=9). DECIMAL V3 can maintain the correct precision of the decimal.
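
To make the precision difference in point 1 concrete, here is a small Python sketch that decodes a Parquet INT96 timestamp and reduces it to second or microsecond precision. The INT96 layout used (8 bytes of nanoseconds within the day followed by a 4-byte Julian day, little-endian) is the common Parquet convention; the function names are hypothetical, and truncation rather than rounding is an assumption, not something the commit message states.

```python
import struct
from datetime import datetime, timedelta

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 value into a naive datetime (microsecond precision)."""
    if len(raw) != 12:
        raise ValueError("INT96 timestamps are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes hold microseconds, so sub-microsecond digits are dropped here.
    return datetime(1970, 1, 1) + timedelta(days=days,
                                            microseconds=nanos_of_day // 1000)


def to_datetime_v1(ts: datetime) -> datetime:
    """Second precision, as described for DATETIME V1 (truncation assumed)."""
    return ts.replace(microsecond=0)


def to_datetime_v2(ts: datetime) -> datetime:
    """Microsecond precision, as described for DATETIME V2."""
    return ts
```

A value written with nanosecond precision therefore keeps six fractional digits under DATETIME V2 and none under DATETIME V1.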
2022-08-22 10:15:35 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00
915d8989c5 [feature](spark-load)Spark load supports string type data import (#11927) 2022-08-22 08:56:59 +08:00
b1fd701493 [fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947) 2022-08-22 08:56:05 +08:00
83ea4ea984 [refractor](bitmap) bitmap serialize and deserialize refractor (#11921)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-22 08:52:20 +08:00
5eb5444476 [fix](memtracker) Remove useless memory exceed check #11939 2022-08-22 08:40:19 +08:00
19496ef9a0 [improve](nereids): remove FakeJoin.java (#11946) 2022-08-22 08:28:20 +08:00
adfef85c0c [improve](fe): use Pair.of to replace new Pair<>() (#11945) 2022-08-22 08:27:40 +08:00
Pxl
192cdd4d76 [Bug](cast) change binary predicate finally cast to varchar (#11796) 2022-08-21 10:13:47 +08:00
25b427d0c6 [Bugfix](inpredicate) fix in predicate in group by clause may cause NPE (#11886)
* [bug](inpredicate) fix in predicate in group by clause may cause NPE
2022-08-21 10:03:30 +08:00
161d134270 [bugfix](load) fix cancel load stmt cannot recognize key words in upper case (#11906) 2022-08-21 10:03:10 +08:00
c2efa9c3b5 [refactor](planner): refactor equals code in Catalog dir. (#11903) 2022-08-21 10:01:57 +08:00
d4749c2652 [extension](mysql-to-doris) add odbc conf and some fix (#11692) 2022-08-20 18:27:48 +08:00
982c5f06b5 [fix](build) Resolve the conflicts when building be with java-udf (#11938) 2022-08-20 18:24:32 +08:00
28dba65d74 Update basic-summary.md (#11889)
Update basic-summary
2022-08-20 11:40:50 +08:00
23c9de5f85 [README](fix) update pictures links in README.md (#11891) 2022-08-19 21:32:48 +08:00
5c8ea147b1 [Bugfix](FE) fix npe issue when exec 'show tablets' #11896 2022-08-19 21:31:58 +08:00
d83c11a032 [regression](datev2) add schema change cases for datev2/datetimev2 (#11924) 2022-08-19 21:29:24 +08:00
b4101d46f0 [fix](workflow) Fix the errors when using sh to run shell scripts (#11898) 2022-08-19 21:28:52 +08:00
ffe7af49c8 [fix](array-type) run 'show create table' return null (#11912)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-19 21:28:15 +08:00
6ca90afbfa [Bugfix](datetime) fix DateLiteral range check is no longer valid (#11917)
* [Bugfix](datetime) fix DateLiteral range check is no longer valid
2022-08-19 21:27:32 +08:00
be7a38e170 [refactor](planner): refactor and replace use NIO (#11645)
* [refactor](planner): refactor equals code in Catalog dir.
2022-08-19 21:26:39 +08:00
Pxl
64dc3b360f [Bug](function) fix dcheck fail on close vexpr ctx (#11908) 2022-08-19 19:11:10 +08:00
c82d7687b4 [Bug](shell) fix wrong condition expression in script (#11913)
* [Bug](shell) fix wrong condition expression in script
2022-08-19 19:10:11 +08:00
f66e42f848 [optimization](array-type) support the decimal/datetime as the nest type of array in print_value (#11784)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-19 17:59:09 +08:00
e63c83e8e1 [fix](script) Support starting BE without Java environment (#11910) 2022-08-19 17:58:40 +08:00
0728f08c65 [improvement](dependency) Use release mode to build cctz. Improve performance. 2022-08-19 17:56:52 +08:00
Pxl
d1e97be37e [doc](developer-guide) add vscode-lldb usage to developer guide (#11923)
* add codelldb usage

* update
2022-08-19 17:56:17 +08:00
67ad8a81eb [docs](update)udaf now supported (#11929) 2022-08-19 17:56:02 +08:00
1a8a889d56 [refactor](planner): improve enfocer job. (#11922)
- handle enforce distribution when meet sort.
- calculate stats in enforcer job
- refactor calculate stats.
2022-08-19 17:55:43 +08:00
788114c89c [docs](fix) updata pictures licks (#11890)
Update picture links
2022-08-19 16:35:50 +08:00
a98d808080 [Chore](benchmark) Fix benchmark scripts, cover case that $PASSWORD not empty (#11486)
Fix benchmark scripts, cover case that $PASSWORD not empty
2022-08-19 15:40:18 +08:00
1b0b5b5f09 [Enhancement](load) add hidden_columns in stream load param (#11625)
Stream load ignores invisible columns if no columns HTTP header is
specified, but in some cases the user cannot list all columns because the
columns change frequently.
Add a hidden_columns header to support importing hidden columns. The user can
set hidden_columns to a column such as __DORIS_DELETE_SIGN__ and add this column
to the stream load data so that the row can be deleted.
For example:
curl -u root -v --location-trusted \
  -H "hidden_columns: __DORIS_DELETE_SIGN__" \
  -H "format: json" \
  -H "strip_outer_array: true" \
  -H "jsonpaths: [\"$.id\", \"$.name\",\"$.__DORIS_DELETE_SIGN__\"]" \
  -T 1.json \
  http://{beip}:{be_port}/api/test/test1/_stream_load
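
The same request can be assembled programmatically. The Python sketch below only builds the request object and does not send it; the helper name, the host and port arguments, and the idea that a delete-sign value of 1 marks a row for deletion are assumptions for illustration. Unlike `curl --location-trusted`, plain `urllib` does not forward credentials across redirects, so a real client would need to handle that separately.

```python
import base64
import json
import urllib.request


def build_stream_load_request(be_host, be_port, db, table, rows,
                              user="root", password=""):
    """Build (but do not send) a stream load PUT request with hidden_columns set."""
    url = f"http://{be_host}:{be_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "format": "json",
        "strip_outer_array": "true",
        # Tell stream load that the data carries this hidden column.
        "hidden_columns": "__DORIS_DELETE_SIGN__",
        "jsonpaths": json.dumps(["$.id", "$.name", "$.__DORIS_DELETE_SIGN__"]),
    }
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Example payload: the second row is assumed to be marked for deletion.
example_rows = [
    {"id": 1, "name": "kept", "__DORIS_DELETE_SIGN__": 0},
    {"id": 2, "name": "removed", "__DORIS_DELETE_SIGN__": 1},
]
```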

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-19 14:57:11 +08:00
01bd7f224b [bugifx](compaction) fix filter_delete if schema has sequence column (#11909)
The bug was introduced in #11721, which uses the last column as the delete sign;
that is wrong if a sequence column exists.
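
The pitfall can be shown with a short Python sketch. It is not the actual fix in the BE code: the helper names are hypothetical, the sequence column name `__DORIS_SEQUENCE_COL__` and its position after the delete sign are assumptions, and looking the column up by name is just one way to avoid depending on its position.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def delete_sign_index(column_names):
    """Find the delete sign column by name instead of assuming it is last."""
    try:
        return column_names.index(DELETE_SIGN)
    except ValueError:
        return -1


def filter_deleted(rows, column_names):
    """Keep only rows whose delete sign is 0; keep everything if there is no sign."""
    idx = delete_sign_index(column_names)
    if idx < 0:
        return list(rows)
    return [row for row in rows if row[idx] == 0]


# With a sequence column present, the last column is no longer the delete sign.
columns = ["k", "v", DELETE_SIGN, "__DORIS_SEQUENCE_COL__"]
rows = [(1, "a", 0, 5), (2, "b", 1, 6)]
```

Here `filter_deleted(rows, columns)` drops the second row, whereas reading the last column as the delete sign would have inspected the sequence values 5 and 6 instead.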

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-19 14:56:06 +08:00
1f9eec5462 [Regression](datev2) Add test cases for datev2/datetimev2 (#11831) 2022-08-19 10:57:55 +08:00
Pxl
089fe01aea [Feature](vectorized alter table) set vectorized alter table to default open (#11897) 2022-08-19 10:57:00 +08:00