Commit Graph

383 Commits

Author SHA1 Message Date
d1007afe80 Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361)
* [Optimize] optimize the speed of converting integer to string

* Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-04 10:55:19 +08:00
16bc5fa585 [Bug] fix violating C/C++ aliasing rules cause a error hash value in decimal value (#6348)
In RuntimeFilter BloomFilter, decimal column will got a wrong hash value because violating  aliasing rules
decimal12_t decimal = { 12, 12 };
murmurhash3(decimal) in bloom filter: 2167721464
expect: 4203026776
2021-08-03 12:00:03 +08:00
f26e3408b2 [Profile] Support show load profile for broker load job (#6214)
1.
Add new statement:
`SHOW LOAD PROFILE "xxx";`

2.
Improve the read performance of orc scanner
2021-07-27 13:37:34 +08:00
7592f52d2e [Feature][Insert] Add transaction for the operation of insert #6244 (#6245)
## Proposed changes
Add transaction for the operation of insert. It will cost less time than non-transaction(it will cost 1/1000 time) when you want to insert a amount of rows.
### Syntax

```
BEGIN [ WITH LABEL label];
INSERT INTO table_name ...
[COMMIT | ROLLBACK];
```

### Example
commit a transaction:
```
begin;
insert into Tbl values(11, 22, 33);
commit;
```
rollback a transaction:
```
begin;
insert into Tbl values(11, 22, 33);
rollback;
```
commit a transaction with label:
```
begin with label test_label;
insert into Tbl values(11, 22, 33);
commit;
```

### Description
```
begin:  begin a transaction, the next insert will execute in the transaction until commit/rollback;
commit:  commit the transaction, the data in the transaction will be inserted into the table;
rollback:  abort the transaction, nothing will be inserted into the table;
```
### The main realization principle:
```
1. begin a transaction in the session. next sql is executed in the transaction;
2. insert sql will be parser and get the database name and table name, they will be used to select a be and create a pipe to accept data;
3. all inserted values will be sent to the be and write into the pipe;
4. a thread will get the data from the pipe, then write them to disk;
5. commit will complete this transaction and make these data visible;
6. rollback will abort this transaction
```

### Some restrictions on the use of update syntax.
1. Only ```insert``` can be called in a transaction.
2. If something error happened, ```commit``` will not succeed, it will ```rollback``` directly;
3. By default, if part of insert in the transaction is invalid, ```commit``` will only insert the other correct data into the table.
4. If you need ```commit``` return failed when any insert in the transaction is invalid, you need execute ```set enable_insert_strict = true``` before ```begin```.
2021-07-21 10:54:11 +08:00
b53ff15ef2 [Config] set spark load and odbc table feature enable by default (#6212)
1. Also use BufferedReader to speed up orc reader
2021-07-18 22:15:13 +08:00
7e77b5ed7f [Optimize] Using custom conf dir to save log config of Spring (#6205)
The log4j-config.xml will be generated at startup of FE and also when modifying FE config.
But in some deploy environment such as k8s, the conf dir is not writable.

So change the dir of log4j-config.xml to Config.custom_conf_dir.

Also fix some small bugs:

1. Typo "less then" -> "less than"
2. Duplicated `exec_mem_limit` showed in SHOW ROUTINE LOAD
3. Allow MAXVALUE in single partition column table.
4. Add IP info for "intolerate index channel failure" msg.

Change-Id: Ib4e1182084219c41eae44d3a28110c0315fdbd7d

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-07-15 11:13:51 +08:00
ed3ff470ce [ARRAY] Support array type load and select not include access by index (#5980)
This is part of the array type support and has not been fully completed. 
The following functions are implemented
1. fe array type support and implementation of array function, support array syntax analysis and planning
2. Support import array type data through insert into
3. Support select array type data
4. Only the array type is supported on the value lie of the duplicate table

this pr merge some code from #4655 #4650 #4644 #4643 #4623 #2979
2021-07-13 14:02:39 +08:00
dbfe8e4753 [enhancement] Optimize load CSV file memory allocate (#6174)
Optimize load CSV file memory allocate, avoid frequent allocation,
may reduce the load time by 40%-50% when large column numbers
2021-07-12 09:58:45 +08:00
290a844e04 [optimize] Optimize bloomfilter performance (#6180)
refactor runtime filter bloomfilter and eliminate some virtual function calls which obtained a performance improvement of about 5%
import block bloom filter, for avx version obtained 40% performance improvement
before: bloomfilter size:default, about 2000W item cost about 1s400ms
after: bloomfilter size:524288, about 2000W item cost about 400ms
2021-07-10 10:12:12 +08:00
01bef4b40d [Load] Add "LOAD WITH HDFS" model, and make hdfs_reader support hdfs ha (#6161)
Support load data from HDFS by using `LOAD WITH HDFS` syntax and read data directly via libhdfs3
2021-07-10 10:11:52 +08:00
198ba78595 [Feature] Add update time to show table status (#6117)
Add update time to show table status

```
MySQL [test_query_qa]> show table status;
+----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+
| Name     | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Collation | Checksum | Create_options | Comment |
+----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+
| bigtable | Doris  |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | 2021-06-29 17:09:28 | 2021-06-29 17:17:28 | 1970-01-01 07:59:59 | utf-8     |     NULL | NULL           | OLAP    |
| test     | Doris  |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | 2021-06-29 17:09:26 | 2021-06-29 17:17:28 | 1970-01-01 07:59:59 | utf-8     |     NULL | NULL           | OLAP    |
| baseall  | Doris  |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | 2021-06-29 17:09:26 | 2021-06-29 17:17:26 | 1970-01-01 07:59:59 | utf-8     |     NULL | NULL           | OLAP    |
+----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+
3 rows in set (0.002 sec)
```
2021-07-07 10:27:14 +08:00
739c0268ff [refactor] Remove decimal v1 related code from code base (#6079)
remove ALL DECIMAL V1 type code , this is a part of #6073
2021-07-07 10:26:32 +08:00
149def9e42 [Feature] Support RuntimeFilter in Doris (BE Implement) (#6077)
1. support in/bloomfilter/minmax
2. support broadcast/shuffle/bucket shuffle/colocate join
3. opt memory use and cpu cache miss while build runtime filter
4. opt memory use in left semi join (works well on tpcds-95)
2021-07-04 20:59:05 +08:00
c8899ee5bd [Build][ARM] Fix some compilation problems on ARM64 (#6076)
1. Disable libhdfs3 on ARM, because it doesn't support ARM now.
2. Add compilation doc for ARM64
2021-06-23 09:38:16 +08:00
1999a0c26b [optimization] open gcc strict-aliasing optimization (#6034)
* open gcc strict-aliasing optimization

* use -Werror=strick-alias
2021-06-18 11:39:24 +08:00
9f52f4f9e5 fix stream load error msg missing (#6050)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-06-18 09:21:12 +08:00
d57c2344e1 [MemTracker] Refactored the hierarchical structure of memtracker (#5956)
To avoid showing too many memtracker on BE web pages.
The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE.

OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
2021-06-16 09:44:24 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
e245aee33e [Feature] Select outfile support parquet format (#5938)
`Select outfile into` currently only supports to export data with CSV format.
This patch extends the feature to supports parquet format.

Usage:
LocaFile:
```
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES 
("schema"="required,int32,siteid;", "parquet.compression"="snappy");
```

BrokerFile:
```
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET
PROPERTIES ( 
"broker.name" = "hdfs_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "test",
"broker.kerberos_keytab_content" = "base64" ,
"schema"="required,int32,siteid;"
);
```

Field `schema` is required, which defines the schema of a parquet file.
Prefix `parquet.` is the parquet file properties, like compression, version, enable_dictionary.
2021-06-10 17:34:01 +08:00
d9c128b744 [BrokerLoad] Support read properties for broker load when read data (#5845)
* [BrokerLoad] support read properties for broker load when read data

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-06-09 14:59:55 +08:00
ba868c610f [Optimize] Optimize some tablet scheduling logic (#5926)
1. The partitions set by the admin repair command are prioritized
   to ensure that the tablets of these partitions can be repaired as soon as possible.

2. Add an FE metric "query_begin" to monitor the number of queries submitted to the Doris.
2021-05-30 23:08:59 +08:00
ba38973209 use virtual hosted-style request to access object store (#5894)
* use virtual hosted-style access request object store
2021-05-27 15:52:07 +08:00
1ec615c562 [BUG] Fixed some uninitialized variables (#5850)
Fixed some potential bugs caused by uninitialized variables
2021-05-25 10:34:35 +08:00
63662194ab [BUG] Fix Stream Load cost too much memory (#5875) 2021-05-25 10:34:10 +08:00
591d391bbc [Bug] Fix bug that the buffered reader may read at wrong position. (#5847)
The buffered reader's _cur_offset should be initialized as same as the inner file reader's,
to make sure that the reader will start to read at rignt position.
2021-05-22 23:38:10 +08:00
1a81b9e160 [MemTracker] Some enchance of MemTracker (#5783)
1 Make some MemTracker have reasonable parent MemTracker not the root tracker
2 Make each MemTracker can be easily to trace.
3 Add show level of MemTracker to reduce the MemTracker show in the web page to have a way to control show how many tracker in web page.
2021-05-19 09:27:50 +08:00
5748241dab [Bug-fix] When query cancel, transfer_thread does not continue to schedule scanner_thread (#5768)
The cause of the problem is that after query cancel, OlapScanNode::transfer_thread still continues to schedule 
OlapScanNode::scanner_thread until all tasks are scheduled.

Although each task does not scan data and exits quickly, it still consumes a lot of resources.
(Guess)This may be the cause of the BUG (#5767) causing the I/O to be full.

So after query cancel, immediately exit the scheduling loop in transfer_thread, and after waiting for
the end of all scanner_threads, transfer_thread will also exit.
2021-05-19 09:26:58 +08:00
add8c4bb74 [Load] Support reading multi-line json objects for JsonScanner (#5774)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-05-18 15:44:45 +08:00
01a45e8691 add read buffer when use s3 reader (#5791) 2021-05-17 11:46:38 +08:00
d7d50f7ffa [Optimize] Speed up the bulk data load to ODBC table. (#5765)
1. Batch Insert
2. Use fmt to repalce stringstream
3. Add some profile of ODBC_TABLE_SINK
2021-05-12 10:58:52 +08:00
98e80aa65e [refactor] Replace boost::function with std::function (#5700)
Replace boost::function with std::function
2021-05-09 22:00:48 +08:00
efd51b47e5 [Bug] Fix some little bugs in FE (#5758)
1. Fix NPE in ReplicasProcNode when backend does not exist
2. Forbid the create table like statement to specify the view.
3. Check self ip when starting FE to see if it use the origin ip.
4. Modify the error msg of tablet sink to show more detail errors.
2021-05-08 10:56:10 +08:00
6ad1bf7d7e [Bug] Fix dead lock in olap scan node and refactor some code in FE profile (#5713)
* [Bug] Fix dead lock in olap scan node and refactor some code in FE profile

* Add some comment
2021-04-30 10:12:18 +08:00
de87f4ae84 [Feature] Add list partition support (#5529)
Add list partition support
2021-04-24 17:42:27 +08:00
29a3fa1084 [Feature] Support read data with format of parquet from hdfs, using libhdfs3 (#5686)
Add new lib, Backend can read data from hdfs without broker,
this patch include libhdfs3.a which can read file on hdfs.
This patch will make reading the data from hdfs with parquet possible.
By this, we will support more format of file on hdfs in the future,
and we will support other metadata in the future.
2021-04-24 17:41:48 +08:00
ec29322c10 [Bug] Avoid waiting too long when rpc is slow. (#5669)
Total execution time should not longer than stream load timeout.
2021-04-23 09:46:40 +08:00
a803ceea86 [refactor] Remove boost mutex, use std::mutex instead (#5684)
* Remove boost mutex, use std::mutex instead

* replace shared_mutex
2021-04-22 11:29:36 +08:00
c4cc681d14 remove boost_foreach, using c++ foreach instead (#5611) 2021-04-15 10:52:29 +08:00
50ffae44b1 [BUG] Fix bug that Unique/AGG key will read all key columns when there are two rowsets (#5632) 2021-04-14 00:12:05 +08:00
75db273b93 [Doris On ES][WIP] Support external ES table with SSL secured and configurable node sniffing (#5325)
Support external ES  table with `SSL` secured and configurable node sniffing
2021-04-12 11:23:49 +08:00
9c7d8d2e98 [Bug] Fix bug that isPreAggregation is incorrectly set (#5608)
1. The MaterializedViewSelector should be reset for each scan node
2. On the BE side, columns with delete conditions must be added to the return column.
2021-04-09 14:13:06 +08:00
40f53ac71f fix bitmap unit test failed (#5610) 2021-04-08 10:25:59 +08:00
ad67dd34a0 update gcc to gcc 10 and support c++17 (#5394)
* update gcc to gcc 10 and support c++17
    update brpc to 0.9.7
    update boost to 1.73
    remove third-party boost 1.54 for mysql

* update cmake version

* ignore jdk version

* remove unused patch

* avoid use SYS_getrandom call
2021-03-25 09:30:38 +08:00
cef3cbc53a [Bug] Fix bug that the last column may be null when using multibytes separator (#5534) 2021-03-23 09:35:30 +08:00
a91888a68b [BUG] fix memory limit failure and optimize memory usage in join stage (#5514)
This patch works well on tpcds-1T query-24
2021-03-21 11:32:51 +08:00
8343abaad1 [Feature] Local Exechange (#5470)
Avoid network transmission when the data stream sender node and destination exchange node
are in same BE, to improve performance and save CPU.
2021-03-21 11:25:33 +08:00
19b3a950de [ODBC] change SQL_DRIVER_COMPLETE_REQUIRED to SQL_DRIVER_NOPROMPT make mysql connect err clear (#5538) 2021-03-21 11:20:25 +08:00
a1bce25677 [BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491) 2021-03-17 09:27:05 +08:00
1100a0f3a0 [Profile] Add more timer for scan thread (#5511)
1.
Add timer to count the time the transfer thread waits for the scaner thread to return rowbatch.
2.
Add timer to count the time that the scanner thread waits for the available worker threads in the thread pool.

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-03-15 10:07:11 +08:00
689602e686 [Enhancement] Support Pallralel Merge In Exchange Node (#5468)
Support Parallel Merge In Exchange Node
2021-03-11 22:34:18 +08:00