Commit Graph

373 Commits

Author SHA1 Message Date
198ba78595 [Feature] Add update time to show table status (#6117)
Add update time to show table status

```
MySQL [test_query_qa]> show table status;
+----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+
| Name     | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Collation | Checksum | Create_options | Comment |
+----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+
| bigtable | Doris  |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | 2021-06-29 17:09:28 | 2021-06-29 17:17:28 | 1970-01-01 07:59:59 | utf-8     |     NULL | NULL           | OLAP    |
| test     | Doris  |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | 2021-06-29 17:09:26 | 2021-06-29 17:17:28 | 1970-01-01 07:59:59 | utf-8     |     NULL | NULL           | OLAP    |
| baseall  | Doris  |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | 2021-06-29 17:09:26 | 2021-06-29 17:17:26 | 1970-01-01 07:59:59 | utf-8     |     NULL | NULL           | OLAP    |
+----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+
3 rows in set (0.002 sec)
```
2021-07-07 10:27:14 +08:00
739c0268ff [refactor] Remove decimal v1 related code from code base (#6079)
remove ALL DECIMAL V1 type code , this is a part of #6073
2021-07-07 10:26:32 +08:00
149def9e42 [Feature] Support RuntimeFilter in Doris (BE Implement) (#6077)
1. support in/bloomfilter/minmax
2. support broadcast/shuffle/bucket shuffle/colocate join
3. opt memory use and cpu cache miss while build runtime filter
4. opt memory use in left semi join (works well on tpcds-95)
2021-07-04 20:59:05 +08:00
c8899ee5bd [Build][ARM] Fix some compilation problems on ARM64 (#6076)
1. Disable libhdfs3 on ARM, because it doesn't support ARM now.
2. Add compilation doc for ARM64
2021-06-23 09:38:16 +08:00
1999a0c26b [optimization] open gcc strict-aliasing optimization (#6034)
* open gcc strict-aliasing optimization

* use -Werror=strick-alias
2021-06-18 11:39:24 +08:00
9f52f4f9e5 fix stream load error msg missing (#6050)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-06-18 09:21:12 +08:00
d57c2344e1 [MemTracker] Refactored the hierarchical structure of memtracker (#5956)
To avoid showing too many memtracker on BE web pages.
The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE.

OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
2021-06-16 09:44:24 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
e245aee33e [Feature] Select outfile support parquet format (#5938)
`Select outfile into` currently only supports to export data with CSV format.
This patch extends the feature to supports parquet format.

Usage:
LocaFile:
```
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES 
("schema"="required,int32,siteid;", "parquet.compression"="snappy");
```

BrokerFile:
```
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET
PROPERTIES ( 
"broker.name" = "hdfs_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "test",
"broker.kerberos_keytab_content" = "base64" ,
"schema"="required,int32,siteid;"
);
```

Field `schema` is required, which defines the schema of a parquet file.
Prefix `parquet.` is the parquet file properties, like compression, version, enable_dictionary.
2021-06-10 17:34:01 +08:00
d9c128b744 [BrokerLoad] Support read properties for broker load when read data (#5845)
* [BrokerLoad] support read properties for broker load when read data

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-06-09 14:59:55 +08:00
ba868c610f [Optimize] Optimize some tablet scheduling logic (#5926)
1. The partitions set by the admin repair command are prioritized
   to ensure that the tablets of these partitions can be repaired as soon as possible.

2. Add an FE metric "query_begin" to monitor the number of queries submitted to the Doris.
2021-05-30 23:08:59 +08:00
ba38973209 use virtual hosted-style request to access object store (#5894)
* use virtual hosted-style access request object store
2021-05-27 15:52:07 +08:00
1ec615c562 [BUG] Fixed some uninitialized variables (#5850)
Fixed some potential bugs caused by uninitialized variables
2021-05-25 10:34:35 +08:00
63662194ab [BUG] Fix Stream Load cost too much memory (#5875) 2021-05-25 10:34:10 +08:00
591d391bbc [Bug] Fix bug that the buffered reader may read at wrong position. (#5847)
The buffered reader's _cur_offset should be initialized as same as the inner file reader's,
to make sure that the reader will start to read at rignt position.
2021-05-22 23:38:10 +08:00
1a81b9e160 [MemTracker] Some enchance of MemTracker (#5783)
1 Make some MemTracker have reasonable parent MemTracker not the root tracker
2 Make each MemTracker can be easily to trace.
3 Add show level of MemTracker to reduce the MemTracker show in the web page to have a way to control show how many tracker in web page.
2021-05-19 09:27:50 +08:00
5748241dab [Bug-fix] When query cancel, transfer_thread does not continue to schedule scanner_thread (#5768)
The cause of the problem is that after query cancel, OlapScanNode::transfer_thread still continues to schedule 
OlapScanNode::scanner_thread until all tasks are scheduled.

Although each task does not scan data and exits quickly, it still consumes a lot of resources.
(Guess)This may be the cause of the BUG (#5767) causing the I/O to be full.

So after query cancel, immediately exit the scheduling loop in transfer_thread, and after waiting for
the end of all scanner_threads, transfer_thread will also exit.
2021-05-19 09:26:58 +08:00
add8c4bb74 [Load] Support reading multi-line json objects for JsonScanner (#5774)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-05-18 15:44:45 +08:00
01a45e8691 add read buffer when use s3 reader (#5791) 2021-05-17 11:46:38 +08:00
d7d50f7ffa [Optimize] Speed up the bulk data load to ODBC table. (#5765)
1. Batch Insert
2. Use fmt to repalce stringstream
3. Add some profile of ODBC_TABLE_SINK
2021-05-12 10:58:52 +08:00
98e80aa65e [refactor] Replace boost::function with std::function (#5700)
Replace boost::function with std::function
2021-05-09 22:00:48 +08:00
efd51b47e5 [Bug] Fix some little bugs in FE (#5758)
1. Fix NPE in ReplicasProcNode when backend does not exist
2. Forbid the create table like statement to specify the view.
3. Check self ip when starting FE to see if it use the origin ip.
4. Modify the error msg of tablet sink to show more detail errors.
2021-05-08 10:56:10 +08:00
6ad1bf7d7e [Bug] Fix dead lock in olap scan node and refactor some code in FE profile (#5713)
* [Bug] Fix dead lock in olap scan node and refactor some code in FE profile

* Add some comment
2021-04-30 10:12:18 +08:00
de87f4ae84 [Feature] Add list partition support (#5529)
Add list partition support
2021-04-24 17:42:27 +08:00
29a3fa1084 [Feature] Support read data with format of parquet from hdfs, using libhdfs3 (#5686)
Add new lib, Backend can read data from hdfs without broker,
this patch include libhdfs3.a which can read file on hdfs.
This patch will make reading the data from hdfs with parquet possible.
By this, we will support more format of file on hdfs in the future,
and we will support other metadata in the future.
2021-04-24 17:41:48 +08:00
ec29322c10 [Bug] Avoid waiting too long when rpc is slow. (#5669)
Total execution time should not longer than stream load timeout.
2021-04-23 09:46:40 +08:00
a803ceea86 [refactor] Remove boost mutex, use std::mutex instead (#5684)
* Remove boost mutex, use std::mutex instead

* replace shared_mutex
2021-04-22 11:29:36 +08:00
c4cc681d14 remove boost_foreach, using c++ foreach instead (#5611) 2021-04-15 10:52:29 +08:00
50ffae44b1 [BUG] Fix bug that Unique/AGG key will read all key columns when there are two rowsets (#5632) 2021-04-14 00:12:05 +08:00
75db273b93 [Doris On ES][WIP] Support external ES table with SSL secured and configurable node sniffing (#5325)
Support external ES  table with `SSL` secured and configurable node sniffing
2021-04-12 11:23:49 +08:00
9c7d8d2e98 [Bug] Fix bug that isPreAggregation is incorrectly set (#5608)
1. The MaterializedViewSelector should be reset for each scan node
2. On the BE side, columns with delete conditions must be added to the return column.
2021-04-09 14:13:06 +08:00
40f53ac71f fix bitmap unit test failed (#5610) 2021-04-08 10:25:59 +08:00
ad67dd34a0 update gcc to gcc 10 and support c++17 (#5394)
* update gcc to gcc 10 and support c++17
    update brpc to 0.9.7
    update boost to 1.73
    remove third-party boost 1.54 for mysql

* update cmake version

* ignore jdk version

* remove unused patch

* avoid use SYS_getrandom call
2021-03-25 09:30:38 +08:00
cef3cbc53a [Bug] Fix bug that the last column may be null when using multibytes separator (#5534) 2021-03-23 09:35:30 +08:00
a91888a68b [BUG] fix memory limit failure and optimize memory usage in join stage (#5514)
This patch works well on tpcds-1T query-24
2021-03-21 11:32:51 +08:00
8343abaad1 [Feature] Local Exechange (#5470)
Avoid network transmission when the data stream sender node and destination exchange node
are in same BE, to improve performance and save CPU.
2021-03-21 11:25:33 +08:00
19b3a950de [ODBC] change SQL_DRIVER_COMPLETE_REQUIRED to SQL_DRIVER_NOPROMPT make mysql connect err clear (#5538) 2021-03-21 11:20:25 +08:00
a1bce25677 [BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491) 2021-03-17 09:27:05 +08:00
1100a0f3a0 [Profile] Add more timer for scan thread (#5511)
1.
Add timer to count the time the transfer thread waits for the scaner thread to return rowbatch.
2.
Add timer to count the time that the scanner thread waits for the available worker threads in the thread pool.

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-03-15 10:07:11 +08:00
689602e686 [Enhancement] Support Pallralel Merge In Exchange Node (#5468)
Support Parallel Merge In Exchange Node
2021-03-11 22:34:18 +08:00
0131c33966 [Enhance] Improve the readability of memtrackers' name (#5455)
Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker
2021-03-11 22:33:31 +08:00
7a8fbe5db8 [internal] [doris-1084] support compressed csv file in stream load (#5463) 2021-03-11 10:53:05 +08:00
e023ef5404 [Load] Support multi bytes LineDelimiter and ColumnSeparator (#5462)
* [Internal][Support Multibytes Separator] doris-1079
support multi bytes LineDelimiter and ColumnSeparator
2021-03-09 09:35:39 +08:00
805f98e0f9 [Bug] Set dest tuple to null when src_tuple is NULL. (#5431) 2021-03-04 22:26:05 +08:00
4e1b6b3eef [ODBC] Let the type conversion of the fail in query in ODBC of MySQL table to prompt the information of the column (#5422)
Let the type conversion of the fail in query in ODBC of MySQL table to prompt the information of the column
2021-03-04 22:23:37 +08:00
6dcc1b0a55 [Doris on ES] Fix query failed when ES field value is null (#5363)
* Update fe-idea-dev.md

use `brew install thrift@0.9` to install thrift 0.9.3.1
`brew edit thrift090 | head` shows thrift@0.9 uses thrift 0.9.3.1

* [Refactor] Remove the unnecessary if statement

Future<?> submit(Runnable task)
Submits a Runnable task for execution and returns a Future representing that task. The Future's get method will return null upon successful completion.

* Fix null type

* add comment

Co-authored-by: tanhao <tanhao.0902@bytedance.com>
2021-02-23 10:42:25 +08:00
6ede4c6ec1 [Feature] Support backup,restore,load,export directly connect to s3 (#5399)
* [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol

* Internal][S3DirectAccess] Support backup,restore,load,export directlyconnect to s3
1. Support load and export data from/to s3 directly.
2. Add a config to auto convert broker access to s3 acces when available

Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7

(cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201)

* [Internal][S3DirectAccess] File path glob compatible with broker

Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68
(cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f)

* [internal] [doris-1008] fix log4j class not found

Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3
(cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232)

* add poms

Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
2021-02-22 16:07:56 +08:00
7eae3e280a [optimization] use inline optimize ExprContext::get_value (#5385) 2021-02-16 22:35:14 +08:00
51ccd44865 [Load Parallel][3/3] Support parallel delta writer (#5369)
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity problem, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.

This CL modifies the related locks so that LoadChannel can process these requests in parallel.

In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been
increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min.

Also modify the profile of load job.
2021-02-07 22:42:18 +08:00
462efeaf39 [Performance Optimization and Refactor] (#5358) (#5364)
1. Add BlockColumnPredicate support OR and AND column predicate in RowBlockV2
2. Support evaluate vectorization delete predicate in storage engine not in Reader in SegmentV2
2021-02-07 22:41:33 +08:00