Commit Graph

377 Commits

Author SHA1 Message Date
0393c9b3b9 [Optimize] Support send batch parallelism for olap table sink (#6397)
* Support send batch parallelism for olap table sink

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-30 11:03:09 +08:00
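A minimal usage sketch for the entry above. The session variable name `send_batch_parallelism` is an assumption based on the feature name, not confirmed by this commit message; the table names are illustrative.
```
-- Assumed session variable name; adjust to whatever #6397 actually introduces.
SET send_batch_parallelism = 4;
-- Subsequent loads through the olap table sink (e.g. INSERT INTO ... SELECT)
-- may then send batches to the sink in parallel.
INSERT INTO example_target_tbl SELECT * FROM example_source_tbl;
```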
3f2fdd236f Add scan thread token (#6443) 2021-08-27 10:56:17 +08:00
fa290383dc [Doc] Modify README to add some statistical indicators (#6486)
1. Add license, total lines, and release badges.
2. Add monthly active contributor and contributor growth graphs.
3. Fix a pom.xml bug.
4. Modify some routine load logs on the BE side.
2021-08-25 09:36:26 +08:00
7e30b28f3a [Optimize] Speed up converting the data of other types to string in mysql_result_writer (#6384)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-24 22:30:58 +08:00
146060dfc0 [Bug] Fix possible result_writer coredump (#6482)
Fix a possible coredump in result_writer by letting BufferControlBlock own the memory.
2021-08-22 22:04:00 +08:00
fa382f8602 [Bug][MemLimit] Modify the memory limit of storage page cache (#6451)
This CL mainly changes:

1. the `storage_page_cache_limit` is based on config `mem_limit`

    the default is 20% of `mem_limit`. 

2. the `buffer_pool_limit` is based on config `mem_limit`

    the default is 20% of `mem_limit`. 

3. the `buffer_pool_clean_pages_limit` is based on config `buffer_pool_limit`

    the default is 50% of `buffer_pool_limit`

4. Fix some display bugs of the LRU cache hit ratio and usage ratio.
5. Fix a create-view bug where `notEvalNondeterministicFunction` was not reset after analysis.
2021-08-19 14:16:53 +08:00
0c5c3f7d87 Fixed the problem that there may be redundant retries when the query result export fails (#6436) 2021-08-18 09:06:02 +08:00
8738ce380b Add long text type STRING, with a maximum length of 2GB. Usage is similar to VARCHAR, but performance is not guaranteed when storing extremely long data (#6391) 2021-08-18 09:05:40 +08:00
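A minimal sketch of declaring the new STRING type described in the entry above, assuming it can be used like other column types in a duplicate-key table (table, column, and property values are illustrative):
```
CREATE TABLE IF NOT EXISTS example_tbl (
    id INT,
    content STRING  -- long text, up to 2GB; performance is not guaranteed for extremely long values
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");
```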
2030c44dba [Log] Modify some log level on BE side (#6381) 2021-08-14 10:25:45 +08:00
9216735cfa [New Feature] Support Vectorized Execution Engine Interface For Doris (#6329)
1. FE vectorized plan code
2. Function register vec function
3. Diff function nullable type
4. New third-party code and new Thrift structs
2021-08-11 14:54:06 +08:00
612684fb2e [DOC] Add a profile counter of local exchange send bytes (#6372)
Add a profile counter of local exchange send bytes: LocalBytesSent
2021-08-07 21:32:44 +08:00
d1007afe80 Use fmt and std::from_chars to make integer-to-string and string-to-integer conversion more efficient (#6361)
* [Optimize] Optimize the speed of converting integer to string

* Use fmt and std::from_chars to make integer-to-string and string-to-integer conversion more efficient

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-04 10:55:19 +08:00
2c208e932b [Bug][RoutineLoad] Avoid TOO_MANY_TASKS error (#6342)
Use `commitAsync` instead of `commitSync` to commit offsets to Kafka, since `commitSync` may block for a long time.
Also assign a group.id to a routine load job if the user does not specify the "property.group.id" property, so that all
consumers of the job use the same group.id instead of a random id for each consume task.
2021-08-03 11:59:06 +08:00
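A hedged sketch of pinning "property.group.id" explicitly when creating a routine load job, as described in the entry above; the job, database, table, and broker names are illustrative:
```
CREATE ROUTINE LOAD example_db.example_job ON example_tbl
PROPERTIES ("desired_concurrent_number" = "1")
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    -- If this property is omitted, the job now assigns one shared group.id itself
    -- instead of a random id per consume task.
    "property.group.id" = "doris_example_group"
);
```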
cf1fcdd614 fix BE coredump in UserFunctionCache (#6331)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-07-30 09:24:30 +08:00
6597a338dc [Feature] Support config max length of zone map index (#6293) 2021-07-30 09:23:11 +08:00
5a7237062f Remove palo_internal_service.proto and PInternalService from the code base because they are no longer used (#6341) 2021-07-30 09:22:50 +08:00
776df2effc [BUG][stack-buffer-overflow] Fix overflow while calculating hash code in ArrayType and fix some warnings 2021-07-27 13:41:00 +08:00
02a00cdf35 [Bug] Fix the bug in from_date_format_str function (#6273) 2021-07-21 12:31:37 +08:00
7592f52d2e [Feature][Insert] Add transaction for the operation of insert #6244 (#6245)
## Proposed changes
Add transaction for the operation of insert. It costs much less time than non-transactional inserts (roughly 1/1000 of the time) when you want to insert a large number of rows.
### Syntax

```
BEGIN [ WITH LABEL label];
INSERT INTO table_name ...
[COMMIT | ROLLBACK];
```

### Example
commit a transaction:
```
begin;
insert into Tbl values(11, 22, 33);
commit;
```
rollback a transaction:
```
begin;
insert into Tbl values(11, 22, 33);
rollback;
```
commit a transaction with label:
```
begin with label test_label;
insert into Tbl values(11, 22, 33);
commit;
```

### Description
```
begin:  begin a transaction; subsequent inserts will execute in the transaction until commit/rollback;
commit:  commit the transaction; the data in the transaction will be inserted into the table;
rollback:  abort the transaction; nothing will be inserted into the table;
```
### The main realization principle:
```
1. begin a transaction in the session; the following SQL statements are executed in the transaction;
2. each insert SQL statement is parsed to get the database name and table name, which are used to select a BE and create a pipe to accept data;
3. all inserted values are sent to the BE and written into the pipe;
4. a thread reads the data from the pipe and writes it to disk;
5. commit completes the transaction and makes the data visible;
6. rollback aborts the transaction
```

### Some restrictions on the use of this syntax
1. Only ```insert``` can be called in a transaction.
2. If an error occurs, ```commit``` will not succeed; the transaction will be rolled back directly.
3. By default, if some inserts in the transaction are invalid, ```commit``` will still insert the other, correct data into the table.
4. If you need ```commit``` to fail when any insert in the transaction is invalid, execute ```set enable_insert_strict = true``` before ```begin```.
2021-07-21 10:54:11 +08:00
a1a37c8cba [Feature] Support calc constant expr by BE (#6233)
At present, some constant expression calculations are implemented on the FE side,
but they are incomplete, and for some expressions the result cannot be made completely
consistent with the value calculated by the BE (such as some of the time functions).

Therefore, we provide a way to pass all the constants in SQL to the BE for calculation
before the SQL is analyzed and planned. This also solves the problem that some
complex constant calculations issued by BI tools cannot be processed on the FE side.

The behavior is controlled by the session variable enable_fold_constant_by_be,
which is disabled by default (see the sketch after this entry).
2021-07-19 10:25:53 +08:00
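A minimal sketch of turning on the behavior described above via the session variable named in the commit message; the query itself is illustrative:
```
SET enable_fold_constant_by_be = true;
-- With the variable enabled, constant expressions like the one below are sent to the BE
-- for evaluation before the SQL is analyzed and planned.
SELECT DATE_ADD('2021-07-19 00:00:00', INTERVAL 1 DAY);
```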
fae3eff2e6 [Bug] Fix the bug of cast string to datetime return not null (#6228) 2021-07-17 10:55:08 +08:00
8de09cbd21 [Bug-fix] Decimal Divide, Mod Zero Result Should be NULL. (#6051) 2021-07-17 10:43:06 +08:00
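An illustrative check of the behavior fixed in the entry above, assuming decimal division and modulo by zero should now yield NULL (the literals are arbitrary):
```
-- Both expressions are expected to return NULL after this fix.
SELECT CAST(1.5 AS DECIMAL(9, 3)) / 0, CAST(1.5 AS DECIMAL(9, 3)) % 0;
```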
bf5db6eefe [BUG][Timeout][QueryLeak] Fixed memory not released in time (#6221)
* Revert "[Optimize] Put _Tuple_ptrs into mempool when RowBatch is initialized (#6036)"

This reverts commit f254870aeb18752a786586ef5d7ccf952b97f895.

* [BUG][Timeout][QueryLeak] Fixed memory not released in time; fixed core dump in bloom filter
2021-07-16 12:32:10 +08:00
19cd42ccbd [BUG] avoid std::function copy in client cache (#6186)
* [BUG] avoid std::function copy in client cache

* Refactor ClientFactory Name
2021-07-16 09:20:28 +08:00
ed3ff470ce [ARRAY] Support array type load and select, not including access by index (#5980)
This is part of the array type support and has not been fully completed.
The following functions are implemented:
1. FE array type support and implementation of array functions, including array syntax analysis and planning
2. Support importing array type data through insert into
3. Support selecting array type data
4. The array type is only supported on the value columns of the duplicate table

This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979. A usage sketch follows this entry.
2021-07-13 14:02:39 +08:00
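A hedged sketch of the load-and-select path described in the entry above, assuming array columns may only appear as value columns of a duplicate-key table and that the `[...]` literal syntax is accepted (names and values are illustrative):
```
CREATE TABLE IF NOT EXISTS example_array_tbl (
    id INT,
    tags ARRAY<INT>  -- array columns only as value columns of a duplicate table
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");

INSERT INTO example_array_tbl VALUES (1, [1, 2, 3]);
-- Selecting the whole array works; access by index (e.g. tags[1]) is not supported yet.
SELECT id, tags FROM example_array_tbl;
```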
290a844e04 [optimize] Optimize bloomfilter performance (#6180)
Refactor the runtime filter bloom filter and eliminate some virtual function calls, yielding a performance improvement of about 5%.
Introduce a block bloom filter; the AVX version gains about a 40% performance improvement.
Before: bloom filter size: default, about 20 million items cost about 1s400ms.
After: bloom filter size: 524288, about 20 million items cost about 400ms.
2021-07-10 10:12:12 +08:00
739c0268ff [refactor] Remove decimal v1 related code from code base (#6079)
Remove all DECIMAL V1 type code; this is part of #6073.
2021-07-07 10:26:32 +08:00
149def9e42 [Feature] Support RuntimeFilter in Doris (BE Implement) (#6077)
1. Support in/bloomfilter/minmax filters
2. Support broadcast/shuffle/bucket shuffle/colocate join
3. Optimize memory use and CPU cache misses while building the runtime filter
4. Optimize memory use in left semi join (works well on TPC-DS query 95)
2021-07-04 20:59:05 +08:00
f254870aeb [Optimize] Put _Tuple_ptrs into mempool when RowBatch is initialized (#6036) 2021-06-30 09:27:53 +08:00
6651d3bf2a SIMD instructions speed up the storage layer (#6089)
* SIMD instructions speed up the storage layer

* 1. Add DCHECK in power-of-2 int32
2. Change vector to array to reduce the cost
2021-06-25 11:04:32 +08:00
1999a0c26b [optimization] Enable gcc strict-aliasing optimization (#6034)
* Enable gcc strict-aliasing optimization

* Use -Werror=strict-aliasing
2021-06-18 11:39:24 +08:00
d57c2344e1 [MemTracker] Refactored the hierarchical structure of memtracker (#5956)
To avoid showing too many MemTrackers on BE web pages,
MemTracker now has 3 levels: OVERVIEW, TASK and VERBOSE.

OVERVIEW is mainly used for the main memory-consuming modules such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, or compaction task.
VERBOSE is used for other, more detailed MemTrackers.
2021-06-16 09:44:24 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
8b4721c941 [Bug] Fix kafka consumer reuse bug (#6007)
When judging whether a consumer can be reused, it is necessary to check whether the parameter contents are equal.
2021-06-16 09:39:05 +08:00
d33a6d1b98 [Function] Support date function: yearweek(), week(), makedate(). (#6000) 2021-06-10 17:38:25 +08:00
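An illustrative call of the newly supported date functions from the entry above, assuming MySQL-compatible signatures (the arguments are arbitrary):
```
-- yearweek(date), week(date), makedate(year, day_of_year)
SELECT YEARWEEK('2021-06-10'), WEEK('2021-06-10'), MAKEDATE(2021, 161);
```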
e245aee33e [Feature] Select outfile support parquet format (#5938)
`SELECT ... INTO OUTFILE` currently only supports exporting data in CSV format.
This patch extends the feature to support the Parquet format.

Usage:
Local file:
```
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES 
("schema"="required,int32,siteid;", "parquet.compression"="snappy");
```

Broker file:
```
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET
PROPERTIES ( 
"broker.name" = "hdfs_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "test",
"broker.kerberos_keytab_content" = "base64" ,
"schema"="required,int32,siteid;"
);
```

The `schema` field is required; it defines the schema of the Parquet file.
Properties prefixed with `parquet.` are Parquet file properties, such as compression, version, and enable_dictionary.
2021-06-10 17:34:01 +08:00
63c99eb4cb [Cache][Enhancement] Assure sql cache only one version (#5793)
For PR #5792. This patch adds a new param `cache type` to distinguish the SQL cache and the partition cache.
When updating the SQL cache, we make sure one SQL key has only one cache version.
2021-05-28 13:45:47 +08:00
f4ebac0210 [BUG] BE core when FE get_stream_load_record (#5913) 2021-05-27 22:06:26 +08:00
0f4a39f82d [LOG] Hide stack info of memory limit exceeded errors in the log (#5896)
If a query exceeds its memory limit, detailed info about where the memory was exceeded is required.
However, it is not necessary to return the entire query stack to the end user.
The query stack only needs to be printed in the BE log.
2021-05-27 22:04:17 +08:00
d6076af938 [BUG] fix BE coredump if result sink prepare failed (#5899) 2021-05-26 10:02:55 +08:00
1ec615c562 [BUG] Fixed some uninitialized variables (#5850)
Fixed some potential bugs caused by uninitialized variables
2021-05-25 10:34:35 +08:00
07ad038870 [Feature][RoutineLoad] Support consuming Kafka from a point in time (#5832)
When creating a Kafka routine load, support starting consumption from a specified point in time instead of a specific offset.
eg:
```
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "property.kafka_default_offsets" = "2021-10-10 11:00:00"
);

or

FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2",
    "kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00"
);
```

This PR also refactors how properties are analyzed when creating or altering
routine load jobs, unifying the analysis process in the `RoutineLoadDataSourceProperties` class.
2021-05-22 23:37:53 +08:00
1a81b9e160 [MemTracker] Some enhancements to MemTracker (#5783)
1. Give some MemTrackers a reasonable parent MemTracker instead of the root tracker.
2. Make each MemTracker easy to trace.
3. Add a show level to MemTracker to reduce the number of MemTrackers shown on the web page and provide a way to control how many trackers are shown.
2021-05-19 09:27:50 +08:00
17cd32ffee [BUG] Fixed uninitialized variables in compaction (#5828) 2021-05-18 12:13:58 +08:00
9d25bfe980 [Bug] Fix bug that database not found when replaying batch transaction remove log (#5815)
* [Bug] Fix bug that database not found when replaying batch transaction remove log

[GlobalTransactionMgr.replayBatchRemoveTransactions():353] replay batch remove transactions failed. db 0
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = databaseTransactionMgr[0] does not exist
        at org.apache.doris.transaction.GlobalTransactionMgr.getDatabaseTransactionMgr(GlobalTransactionMgr.java:84) ~[palo-fe.jar:3.4.0]
        at org.apache.doris.transaction.GlobalTransactionMgr.replayBatchRemoveTransactions(GlobalTransactionMgr.java:350) [palo-fe.jar:3.4.0]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:601) [palo-fe.jar:3.4.0]
        at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2452) [palo-fe.jar:3.4.0]
        at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:101) [palo-fe.jar:3.4.0]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]

The id of the information_schema database is 0, and it has no txns at all.
2021-05-17 11:50:46 +08:00
01a45e8691 Add a read buffer when using the S3 reader (#5791) 2021-05-17 11:46:38 +08:00
0c83e43a67 [Optimize] Optimize profile lock conflict and view profile while query is executing (#5762)
1. Reduce lock conflicts in RuntimeProfile on the BE side;
2. Allow viewing the query profile while the query is executing;
3. Reduce wait time for `show proc '/current_queries'` (see the sketch after this entry).
2021-05-13 22:33:26 +08:00
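A minimal sketch of the statement referenced in item 3 above; no extra parameters are assumed:
```
-- Lists queries that are currently executing; item 3 above reduces how long this waits on locks.
SHOW PROC '/current_queries';
```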
b686205b97 [Optimize] Reduce lock conflicts in ThreadResourceMgr of be (#5772)
Removed some useless code that caused lock conflicts in ThreadResourceMgr of be.
2021-05-12 10:59:53 +08:00
d7d50f7ffa [Optimize] Speed up the bulk data load to ODBC table. (#5765)
1. Batch Insert
2. Use fmt to replace stringstream
3. Add some profile of ODBC_TABLE_SINK
2021-05-12 10:58:52 +08:00
bd88309346 [Refactor] Fix warnings in gcc8+, fix warnings from brpc, s2 (#5763)
Fix warnings from brpc, S2
Fix -Warray-bounds warnings
2021-05-12 10:38:46 +08:00