doris

Author	SHA1	Message	Date
worker24h	a6d3099a68	Fix bug: localtime is not thread-safe, so use localtime_r instead (#1614 )	2019-08-08 22:00:43 +08:00
kangkaisen	f4ad2381e6	Fix error DCHECK for partition_columns (#1606 )	2019-08-08 16:29:08 +08:00
worker24h	dc4a5e6c10	Support Decimal type when loading Parquet files (#1595 )	2019-08-07 19:52:23 +08:00
HangyuanLiu	9402456f5b	Fix parquet directory have empty file (#1593 )	2019-08-07 15:08:22 +08:00
Mingyu Chen	93a3577baa	Support multi partition column when creating table (#1574 ) When creating a table with the OLAP engine, users can specify multiple partition columns, e.g.: PARTITION BY RANGE(`date`, `id`) ( PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"), PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"), PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01") ) Note that loading via a Hadoop cluster does not support tables with multiple partition columns.	2019-08-05 16:16:43 +08:00
EmmyMiao87	9128af6499	Broker load hangs when RPC fails (#1567 ) Broker load hangs on the broker reader when the thrift request between the broker and BE fails.	2019-07-31 19:03:38 +08:00
ZHAO Chun	c5edf9dae0	Unify Field and ColumnSchema in Storage (#1561 ) Currently, we have Field and ColumnSchema to access column data in a row. These two classes are mostly the same, so we should unify them into one class. Field currently holds offset information, which is a row attribute, so we remove offset from Field. RowCursor currently contains some logic that belongs to Schema, so this patch adds a Schema attribute to RowCursor to keep RowCursor simple. After this change, only Schema will handle Field/ColumnSchema. Some logic is extracted from RowCursor to be/src/olap/row.h, so the same logic can handle different types of row. Each type of row has the same function to get a Cell of that row. A cell represents a column's content with a null indicator.	2019-07-30 14:01:57 +08:00
Mingyu Chen	97718a35a2	Do not get file size in Broker openReader() method (#1560 ) The file size is already obtained when listing files. Getting the file size again in openReader() is unnecessary and inefficient.	2019-07-29 23:05:01 +08:00
Mingyu Chen	0694b6a6fa	Fix bugs of Broker load (#1546 ) Use the same UUID as the query ID and load ID of a load execution plan. Each load execution plan has a load ID, and as a plan, it also has a query ID. Using the same UUID for both makes the load process easier to trace. Change the load ID when retrying a load execution plan. When a load execution plan is retried, the load ID should be changed, otherwise BE cannot distinguish the old and new load requests. Cancel the running loading task when cancelling the broker load. When a user cancels a broker load, the running loading task should also be cancelled, or it may occupy the worker thread for a long time. Remove the unnecessary query reports when running a load execution plan. Only the last query report is needed. Add a new BE config tablet_writer_rpc_timeout_sec. It is used for the RPC of the tablet sink. The default is 600 seconds, which is long enough for flushing about 6GB of data. The long timeout reduces the possibility of encountering a "fail to send batch" error when loading. Use streaming_load_max_mb instead of mini_load_max_mb in BE config. Add more logs to make a broker load process easier to trace.	2019-07-27 20:17:05 +08:00
Mingyu Chen	a88b55e649	Add more logs and metrics to trace the broker load process (#1530 ) The operator wants to know when the job is scheduled as PENDING and LOADING, and how long it takes to finish these sub-states. Also add 2 metrics on BE to monitor the memtable's flush time: `memtable_flush_total` and `memtable_flush_duration_us`.	2019-07-23 21:42:44 +08:00
Mingyu Chen	69040572fb	Use different ID instead of table ID for base index of an OLAP table (#1524 )	2019-07-23 15:48:45 +08:00
Mingyu Chen	6c1f95c3a0	Fix bug that BE may crash when closing OlapTableSink (#1507 ) The `_profile` in OlapTableSink may not be initialized if the `prepare()` method is not called. So when closing the OlapTableSink, we should check whether `_profile` is initialized.	2019-07-19 10:30:44 +08:00
Mingyu Chen	4e043e66e2	Modify the result json format of mini load (#1487 ) Mini load now uses the stream load framework, but we should keep the mini load return behavior and result json format the same as before. So a PUBLISH_TIMEOUT error should be treated as OK in mini load. Also add 2 counters to the OlapTableSink profile: SerializeBatchTime: time spent serializing all row batches. WaitInFlightPacketTime: time spent waiting for the last sent packet.	2019-07-16 19:15:41 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch modifies all of a Backend's data, which makes restarting BE take a very long time. To avoid disrupting your production environment, upgrade backends one by one. 1. Refactor BE to clarify the structure of the code. 2. Use a unique id to identify a rowset. Naming a rowset with tablet_id and version leads to many conflicts among compaction, clone, and restore. 3. Extract a rowset interface to encapsulate rowsets with different formats.	2019-07-15 21:18:22 +08:00
WingC	ae6f2d99c5	Fix bug when use SELECT * FROM TABLE LIMIT 1 (#1469 )	2019-07-13 23:57:14 +08:00
worker24h	aff1559c4d	Fix bug: BE crashes if the Doris table has fewer columns than the Parquet file (#1464 )	2019-07-12 15:23:13 +08:00
HangyuanLiu	b9c79d4b1b	Fix importing non-parquet format file causing be crash (#1454 )	2019-07-11 16:04:36 +08:00
Mingyu Chen	51c92a0bec	Validate the UTF-8 encoding of loading data (#1457 ) Currently, Doris only supports UTF-8 encoded data. All data is shown to the user in UTF-8 format, so if data loaded into Doris is not UTF-8 encoded, the user will see garbled data when querying. I introduce a fast UTF-8 validator from https://github.com/lemire/fastvalidate-utf-8 This validator is highly optimized: it takes only about 0.7 CPU cycles per byte to validate a 64k string. In a test loading 1GB of data into Doris, the validator had no impact on performance.	2019-07-11 09:46:38 +08:00
chenhao	615c979727	Fix bug that BE crashes when inserting null value to non-nullable columns (#1447 )	2019-07-10 09:20:09 +08:00
worker24h	7eab12a40e	Support reading Parquet file when loading data (#1173 )	2019-07-01 18:39:27 +08:00
kangkaisen	1422414e43	Add varchar column name to stream load error msg (#1366 )	2019-06-24 14:52:59 +08:00
chenhao	687d57be66	Fix bug that query statistics in audit log are wrong (#1354 )	2019-06-21 19:16:05 +08:00
ZHAO Chun	ba44249f80	Remove unused code (#1320 )	2019-06-15 20:41:48 +08:00
ZHAO Chun	30028bc35b	Deny specify partition for unpartitioned table (#1319 )	2019-06-15 18:19:56 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
EmmyMiao87	53062122ea	Change strategy of incorrect data (#1255 ) This change adds a load property named strict_mode, which is used to reject incorrect data. When it is set to false, incorrect data is loaded as NULL, just like before. When it is set to true, incorrect data belonging to a column without an expr is filtered out. strict_mode is now supported in broker load v2. It will be supported in stream load later.	2019-06-10 20:39:45 +08:00
Mingyu Chen	e4e04e8203	Make LZO support optional (#1263 )	2019-06-07 22:26:54 +08:00
ZHAO Chun	934ca2481a	Make MySQL support optional (#1248 )	2019-06-05 12:28:15 +08:00
ZHAO Chun	9f5f44ec48	Reduce memory RowBlock needed (#1238 ) Previously, RowBlock reserved memory for all columns in the schema, even those not queried, which caused poor performance when querying wide tables. With this patch, RowBlock reserves memory only for the needed columns. In one case, this reduced ConvertBatchTime from 10s to 60ms when querying a wide table with 178 columns. #1236	2019-06-04 12:58:41 +08:00
EmmyMiao87	85b4619d54	Change insert into to streaming (#1191 ) An insert into with the non-streaming hint now uses the streaming plan, which is the same as the plan of a stream insert. It also records the load info and returns the label of the insert stmt. Partitions are supported in the insert into stmt: only results that fall in the target partitions are loaded. The introduction and examples have been changed, especially for non-streaming insert. Also, the partition_names param is added to the sql syntax to declare the target partitions in the target table. Change META_VERSION to 50	2019-05-23 20:53:30 +08:00
Mingyu Chen	02f36c23ed	Set tablet as bad when loading index failed (#1146 ) A bad tablet will be reported to FE and handled there. Also add a config auto_recover_index_loading_failure to control how index loading failures are processed.	2019-05-13 10:22:04 +08:00
EmmyMiao87	79ab7f4413	Change label of broker load txn (#1134 ) * Change label of broker load txn 1. put broker load label into txn label 2. fix the bug of `label is already used` 3. fix partition error of new broker load * Fix count error in mini load and broker load. Three params (num_rows_load_total, num_rows_load_filtered, num_rows_load_unselected) are used to count dpp.norm.ALL and dpp.abnorm.ALL. num_rows_load_total is the number of rows in the source file. num_rows_load_unselected is the number of rows in num_rows_load_total that do not satisfy the where conjuncts. num_rows_load_filtered is the number of rows among (num_rows_load_total - num_rows_load_unselected) whose quality is not good enough.	2019-05-10 16:53:46 +08:00
Mingyu Chen	afa3aa9069	Add some pre-calculated metrics (#1079 ) 1. max io util of disks 2. max network send/receive bytes rate of all network devices 3. base/cumulative compaction request counter and failure counter	2019-04-30 11:12:23 +08:00
Mingyu Chen	310a375aec	Fix bug that null value is not correctly handled when loading data (#1070 ) When the partition column's value is NULL, it should be loaded into the partition which includes MIN VALUE	2019-04-29 13:55:28 +08:00
lide	9c82d41981	Support Doris query ES by HTTP way (#925 )	2019-04-28 17:14:44 +08:00
Mingyu Chen	b7b66527ce	Fix some load bugs (#961 ) 1. Use load job's timeout as its txn timeout 2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN stmt	2019-04-28 10:33:50 +08:00
Mingyu Chen	2b4d02b2fa	Add error load log url for routine load job (#938 )	2019-04-28 10:33:50 +08:00
Mingyu Chen	9d08be3c5f	Add metrics for routine load (#795 ) * Add metrics for routine load * limit the max number of routine load tasks in backend to 10 * Fix bug that some partitions will not be assigned	2019-04-28 10:33:50 +08:00
Mingyu Chen	8474061d63	Add some logs (#711 )	2019-04-28 10:33:50 +08:00
Mingyu Chen	0820a29b8d	Implement the routine load process of Kafka on Backend (#671 )	2019-04-28 10:33:50 +08:00
Mingyu Chen	da308da17c	Fix bug that empty stream load return unexpected error msg (#1052 )	2019-04-28 09:36:19 +08:00
chenhao	c0fbc84381	Fix bug that ScanBytes is when collect executing query's infos (#869 )	2019-04-03 18:27:50 +08:00
yiguolei	348c61c69f	Fix doris on es bug (#826 ) * Get in pred from hybridset * ignore new_filter_in when push down * Ignore cast case in to_ext_literal	2019-03-28 12:54:17 +08:00
yiguolei	f4a63b29d8	Fix doris on es bug (#791 )	2019-03-22 19:03:27 +08:00
lide	c34b306b4f	Decimal optimize branch #695 (#727 )	2019-03-22 17:22:16 +08:00
ZHAO Chun	11307b23c8	Fix bug: stream load ignore last line with no-newline (#785 ) #783	2019-03-21 19:18:22 +08:00
Salieri1969	7965a7129a	Add esquery function (#652 )	2019-03-08 09:27:41 +08:00
chenhao	397747af2c	Fix bug that push down the predicates past AggregateNode (#658 )	2019-02-26 10:55:14 +08:00
Mingyu Chen	aba1b9e5d6	Reopen the thrift client when got exception (#610 ) To avoid broken connection being reused.	2019-01-31 16:54:49 +08:00
Mingyu Chen	af445b6cc2	Optimize something (#607 ) 1. Unify the thrift rpc timeout from BE to FE. Add a BE config 'thrift_rpc_timeout_ms', default is 5000 2. Add hostname in "show proc '/frontends';" stmt result. 3. Fix a lock order bug in Load.java	2019-01-31 13:30:45 +08:00

97 Commits