doris

Author	SHA1	Message	Date
caiconghui	d1007afe80	Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361 ) * [Optimize] optimize the speed of converting integer to string * Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-04 10:55:19 +08:00
stdpain	16bc5fa585	[Bug] fix violation of C/C++ aliasing rules causing a wrong hash value for decimal values (#6348 ) In RuntimeFilter BloomFilter, a decimal column will get a wrong hash value because of the aliasing-rule violation decimal12_t decimal = { 12, 12 }; murmurhash3(decimal) in bloom filter: 2167721464 expect: 4203026776	2021-08-03 12:00:03 +08:00
Mingyu Chen	f26e3408b2	[Profile] Support show load profile for broker load job (#6214 ) 1. Add new statement: `SHOW LOAD PROFILE "xxx";` 2. Improve the read performance of orc scanner	2021-07-27 13:37:34 +08:00
pengxiangyu	7592f52d2e	[Feature][Insert] Add transaction for the operation of insert #6244 (#6245 ) ## Proposed changes Add transaction support for insert operations. When inserting a large number of rows, it costs much less time than non-transactional inserts (about 1/1000 of the time). ### Syntax ``` BEGIN [ WITH LABEL label]; INSERT INTO table_name ... [COMMIT \| ROLLBACK]; ``` ### Example commit a transaction: ``` begin; insert into Tbl values(11, 22, 33); commit; ``` rollback a transaction: ``` begin; insert into Tbl values(11, 22, 33); rollback; ``` commit a transaction with label: ``` begin with label test_label; insert into Tbl values(11, 22, 33); commit; ``` ### Description ``` begin: begin a transaction, the next insert will execute in the transaction until commit/rollback; commit: commit the transaction, the data in the transaction will be inserted into the table; rollback: abort the transaction, nothing will be inserted into the table; ``` ### Main implementation principle: ``` 1. begin a transaction in the session. the next sql is executed in the transaction; 2. the insert sql is parsed to get the database name and table name, which are used to select a be and create a pipe to accept data; 3. all inserted values are sent to the be and written into the pipe; 4. a thread gets the data from the pipe, then writes it to disk; 5. commit completes the transaction and makes the data visible; 6. rollback aborts the transaction ``` ### Some restrictions on the use of update syntax. 1. Only ```insert``` can be called in a transaction. 2. If an error occurs, ```commit``` will not succeed; the transaction will ```rollback``` directly. 3. By default, if part of the inserts in the transaction is invalid, ```commit``` will only insert the other, correct data into the table. 4. If you need ```commit``` to fail when any insert in the transaction is invalid, execute ```set enable_insert_strict = true``` before ```begin```.	2021-07-21 10:54:11 +08:00
Mingyu Chen	b53ff15ef2	[Config] set spark load and odbc table feature enable by default (#6212 ) 1. Also use BufferedReader to speed up orc reader	2021-07-18 22:15:13 +08:00
Mingyu Chen	7e77b5ed7f	[Optimize] Using custom conf dir to save log config of Spring (#6205 ) The log4j-config.xml will be generated at startup of FE and also when modifying FE config. But in some deploy environment such as k8s, the conf dir is not writable. So change the dir of log4j-config.xml to Config.custom_conf_dir. Also fix some small bugs: 1. Typo "less then" -> "less than" 2. Duplicated `exec_mem_limit` showed in SHOW ROUTINE LOAD 3. Allow MAXVALUE in single partition column table. 4. Add IP info for "intolerate index channel failure" msg. Change-Id: Ib4e1182084219c41eae44d3a28110c0315fdbd7d Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-07-15 11:13:51 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select, not including access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented: 1. fe array type support and implementation of array function, support array syntax analysis and planning 2. Support importing array type data through insert into 3. Support selecting array type data 4. The array type is supported only on the value columns of the duplicate table. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
Zhengguo Yang	dbfe8e4753	[enhancement] Optimize load CSV file memory allocate (#6174 ) Optimize load CSV file memory allocate, avoid frequent allocation, may reduce the load time by 40%-50% when large column numbers	2021-07-12 09:58:45 +08:00
stdpain	290a844e04	[optimize] Optimize bloomfilter performance (#6180 ) refactor the runtime filter bloomfilter and eliminate some virtual function calls, which gives a performance improvement of about 5%; introduce a block bloom filter, whose avx version gives a 40% performance improvement. before: bloomfilter size:default, about 2000W (20 million) items cost about 1s400ms after: bloomfilter size:524288, about 2000W (20 million) items cost about 400ms	2021-07-10 10:12:12 +08:00
pengxiangyu	01bef4b40d	[Load] Add "LOAD WITH HDFS" model, and make hdfs_reader support hdfs ha (#6161 ) Support load data from HDFS by using `LOAD WITH HDFS` syntax and read data directly via libhdfs3	2021-07-10 10:11:52 +08:00
Zhengguo Yang	198ba78595	[Feature] Add update time to show table status (#6117 ) Add update time to show table status ``` MySQL [test_query_qa]> show table status; +----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+ \| Name \| Engine \| Version \| Row_format \| Rows \| Avg_row_length \| Data_length \| Max_data_length \| Index_length \| Data_free \| Auto_increment \| Create_time \| Update_time \| Check_time \| Collation \| Checksum \| Create_options \| Comment \| +----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+ \| bigtable \| Doris \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| 2021-06-29 17:09:28 \| 2021-06-29 17:17:28 \| 1970-01-01 07:59:59 \| utf-8 \| NULL \| NULL \| OLAP \| \| test \| Doris \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| 2021-06-29 17:09:26 \| 2021-06-29 17:17:28 \| 1970-01-01 07:59:59 \| utf-8 \| NULL \| NULL \| OLAP \| \| baseall \| Doris \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| 2021-06-29 17:09:26 \| 2021-06-29 17:17:26 \| 1970-01-01 07:59:59 \| utf-8 \| NULL \| NULL \| OLAP \| +----------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------+----------+----------------+---------+ 3 rows in set (0.002 sec) ```	2021-07-07 10:27:14 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) remove ALL DECIMAL V1 type code, this is a part of #6073	2021-07-07 10:26:32 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implement) (#6077 ) 1. support in/bloomfilter/minmax 2. support broadcast/shuffle/bucket shuffle/colocate join 3. opt memory use and cpu cache miss while build runtime filter 4. opt memory use in left semi join (works well on tpcds-95)	2021-07-04 20:59:05 +08:00
Mingyu Chen	c8899ee5bd	[Build][ARM] Fix some compilation problems on ARM64 (#6076 ) 1. Disable libhdfs3 on ARM, because it doesn't support ARM now. 2. Add compilation doc for ARM64	2021-06-23 09:38:16 +08:00
stdpain	1999a0c26b	[optimization] open gcc strict-aliasing optimization (#6034 ) * open gcc strict-aliasing optimization * use -Werror=strict-aliasing	2021-06-18 11:39:24 +08:00
weizuo93	9f52f4f9e5	fix stream load error msg missing (#6050 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-06-18 09:21:12 +08:00
Mingyu Chen	d57c2344e1	[MemTracker] Refactored the hierarchical structure of memtracker (#5956 ) To avoid showing too many memtracker on BE web pages. The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE. OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata. TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task. VERBOSE is used for other more detailed memtrackers.	2021-06-16 09:44:24 +08:00
stdpain	bde60280b8	[Optimize] use string_view instead of std::string in string function (#6010 )	2021-06-16 09:40:13 +08:00
xinghuayu007	e245aee33e	[Feature] Select outfile support parquet format (#5938 ) `Select outfile into` currently only supports exporting data in CSV format. This patch extends the feature to support the parquet format. Usage: LocalFile: ``` SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES ("schema"="required,int32,siteid;", "parquet.compression"="snappy"); ``` BrokerFile: ``` SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET PROPERTIES ( "broker.name" = "hdfs_broker", "broker.hadoop.security.authentication" = "kerberos", "broker.kerberos_principal" = "test", "broker.kerberos_keytab_content" = "base64" , "schema"="required,int32,siteid;" ); ``` Field `schema` is required; it defines the schema of a parquet file. Properties with the prefix `parquet.` are parquet file properties, like compression, version, enable_dictionary.	2021-06-10 17:34:01 +08:00
caiconghui	d9c128b744	[BrokerLoad] Support read properties for broker load when read data (#5845 ) * [BrokerLoad] support read properties for broker load when read data Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-06-09 14:59:55 +08:00
Mingyu Chen	ba868c610f	[Optimize] Optimize some tablet scheduling logic (#5926 ) 1. The partitions set by the admin repair command are prioritized to ensure that the tablets of these partitions can be repaired as soon as possible. 2. Add an FE metric "query_begin" to monitor the number of queries submitted to the Doris.	2021-05-30 23:08:59 +08:00
Zhengguo Yang	ba38973209	use virtual hosted-style request to access object store (#5894 ) * use virtual hosted-style access request object store	2021-05-27 15:52:07 +08:00
stdpain	1ec615c562	[BUG] Fixed some uninitialized variables (#5850 ) Fixed some potential bugs caused by uninitialized variables	2021-05-25 10:34:35 +08:00
stdpain	63662194ab	[BUG] Fix Stream Load cost too much memory (#5875 )	2021-05-25 10:34:10 +08:00
Mingyu Chen	591d391bbc	[Bug] Fix bug that the buffered reader may read at wrong position. (#5847 ) The buffered reader's _cur_offset should be initialized to the same value as the inner file reader's, to make sure that the reader starts reading at the right position.	2021-05-22 23:38:10 +08:00
HappenLee	1a81b9e160	[MemTracker] Some enhancements of MemTracker (#5783 ) 1. Give some MemTrackers a reasonable parent MemTracker instead of the root tracker 2. Make each MemTracker easy to trace. 3. Add a show level for MemTracker to control how many trackers are shown on the web page.	2021-05-19 09:27:50 +08:00
Xinyi Zou	5748241dab	[Bug-fix] When query cancel, transfer_thread does not continue to schedule scanner_thread (#5768 ) The cause of the problem is that after query cancel, OlapScanNode::transfer_thread still continues to schedule OlapScanNode::scanner_thread until all tasks are scheduled. Although each task does not scan data and exits quickly, it still consumes a lot of resources. (Guess)This may be the cause of the BUG (#5767) causing the I/O to be full. So after query cancel, immediately exit the scheduling loop in transfer_thread, and after waiting for the end of all scanner_threads, transfer_thread will also exit.	2021-05-19 09:26:58 +08:00
caiconghui	add8c4bb74	[Load] Support reading multi-line json objects for JsonScanner (#5774 ) Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-05-18 15:44:45 +08:00
Zhengguo Yang	01a45e8691	add read buffer when use s3 reader (#5791 )	2021-05-17 11:46:38 +08:00
HappenLee	d7d50f7ffa	[Optimize] Speed up the bulk data load to ODBC table. (#5765 ) 1. Batch Insert 2. Use fmt to replace stringstream 3. Add some profile of ODBC_TABLE_SINK	2021-05-12 10:58:52 +08:00
Zhengguo Yang	98e80aa65e	[refactor] Replace boost::function with std::function (#5700 ) Replace boost::function with std::function	2021-05-09 22:00:48 +08:00
xxiao2018	efd51b47e5	[Bug] Fix some little bugs in FE (#5758 ) 1. Fix NPE in ReplicasProcNode when backend does not exist 2. Forbid the create table like statement to specify the view. 3. Check self ip when starting FE to see if it use the origin ip. 4. Modify the error msg of tablet sink to show more detail errors.	2021-05-08 10:56:10 +08:00
HappenLee	6ad1bf7d7e	[Bug] Fix dead lock in olap scan node and refactor some code in FE profile (#5713 ) * [Bug] Fix dead lock in olap scan node and refactor some code in FE profile * Add some comment	2021-04-30 10:12:18 +08:00
qiye	de87f4ae84	[Feature] Add list partition support (#5529 ) Add list partition support	2021-04-24 17:42:27 +08:00
pengxiangyu	29a3fa1084	[Feature] Support read data with format of parquet from hdfs, using libhdfs3 (#5686 ) Add a new lib so that Backend can read data from hdfs without broker; this patch includes libhdfs3.a, which can read files on hdfs. This patch makes it possible to read parquet data from hdfs. With this, we will support more file formats on hdfs in the future, and we will support other metadata in the future.	2021-04-24 17:41:48 +08:00
Lijia Liu	ec29322c10	[Bug] Avoid waiting too long when rpc is slow. (#5669 ) Total execution time should not longer than stream load timeout.	2021-04-23 09:46:40 +08:00
Zhengguo Yang	a803ceea86	[refactor] Remove boost mutex, use std::mutex instead (#5684 ) * Remove boost mutex, use std::mutex instead * replace shared_mutex	2021-04-22 11:29:36 +08:00
Zhengguo Yang	c4cc681d14	remove boost_foreach, using c++ foreach instead (#5611 )	2021-04-15 10:52:29 +08:00
stdpain	50ffae44b1	[BUG] Fix bug that Unique/AGG key will read all key columns when there are two rowsets (#5632 )	2021-04-14 00:12:05 +08:00
Stalary	75db273b93	[Doris On ES][WIP] Support external ES table with `SSL` secured and configurable node sniffing (#5325 ) Support external ES table with `SSL` secured and configurable node sniffing	2021-04-12 11:23:49 +08:00
Mingyu Chen	9c7d8d2e98	[Bug] Fix bug that isPreAggregation is incorrectly set (#5608 ) 1. The MaterializedViewSelector should be reset for each scan node 2. On the BE side, columns with delete conditions must be added to the return column.	2021-04-09 14:13:06 +08:00
Zhengguo Yang	40f53ac71f	fix bitmap unit test failed (#5610 )	2021-04-08 10:25:59 +08:00
stdpain	ad67dd34a0	update gcc to gcc 10 and support c++17 (#5394 ) * update gcc to gcc 10 and support c++17 update brpc to 0.9.7 update boost to 1.73 remove third-party boost 1.54 for mysql * update cmake version * ignore jdk version * remove unused patch * avoid use SYS_getrandom call	2021-03-25 09:30:38 +08:00
Mingyu Chen	cef3cbc53a	[Bug] Fix bug that the last column may be null when using multibytes separator (#5534 )	2021-03-23 09:35:30 +08:00
stdpain	a91888a68b	[BUG] fix memory limit failure and optimize memory usage in join stage (#5514 ) This patch works well on tpcds-1T query-24	2021-03-21 11:32:51 +08:00
stdpain	8343abaad1	[Feature] Local Exchange (#5470 ) Avoid network transmission when the data stream sender node and destination exchange node are in the same BE, to improve performance and save CPU.	2021-03-21 11:25:33 +08:00
HappenLee	19b3a950de	[ODBC] change SQL_DRIVER_COMPLETE_REQUIRED to SQL_DRIVER_NOPROMPT make mysql connect err clear (#5538 )	2021-03-21 11:20:25 +08:00
stdpain	a1bce25677	[BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491 )	2021-03-17 09:27:05 +08:00
xxiao2018	1100a0f3a0	[Profile] Add more timer for scan thread (#5511 ) 1. Add timer to count the time the transfer thread waits for the scanner thread to return rowbatch. 2. Add timer to count the time that the scanner thread waits for the available worker threads in the thread pool. Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-03-15 10:07:11 +08:00
HappenLee	689602e686	[Enhancement] Support Parallel Merge In Exchange Node (#5468 ) Support Parallel Merge In Exchange Node	2021-03-11 22:34:18 +08:00

383 Commits
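
Note on d1007afe80 (#6361): the commit message says integer/string conversion was moved to fmt and `std::from_chars`. The following is only a minimal sketch of that kind of conversion, not the code from the commit. The helper names `parse_int64` and `format_int64` are hypothetical, and `std::to_chars` from the standard library stands in for fmt so that the example needs no third-party dependency.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper (not from the Doris code base): parse the whole
// input as a base-10 int64. Returns std::nullopt on empty input,
// trailing garbage, or overflow. std::from_chars does not allocate,
// does not consult the locale, and does not skip leading whitespace.
std::optional<std::int64_t> parse_int64(std::string_view text) {
    std::int64_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: format an int64 into a stack buffer with
// std::to_chars (used here in place of fmt). 24 bytes is enough for
// the 20 characters of the smallest int64 value.
std::string format_int64(std::int64_t value) {
    char buf[24];
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), value);
    if (ec != std::errc()) {
        return {};
    }
    return std::string(buf, ptr);
}
```

With these helpers, `parse_int64("12345")` yields 12345, `parse_int64("12x")` yields an empty optional, and `format_int64(-42)` yields `"-42"`. Rejecting partially parsed input is a design choice made for this sketch; a caller that wants prefix parsing could return `ptr` instead.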