Commit Graph

116 Commits

SHA1 Message Date
3f22238012 Add check for to_bitmap function argument (#1747) 2019-09-05 18:11:38 +08:00
0dc0dadad1 Reduce unnecessary memory allocation and copy in OlapScanNode (#1742) 2019-09-04 21:05:12 +08:00
9f5e5717d4 Unify the msg of 'Memory exceed limit' (#1737)
The new message format for limit exceeded is: "Memory exceed limit. %msg, Backend:%ip, fragment:%id Used:% , Limit:%. xxx".
This commit unifies the 'Memory exceed limit' message across call sites such as check_query_state, RETURN_IF_LIMIT_EXCEEDED and LIMIT_EXCEEDED.
2019-09-03 10:42:16 +08:00
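For illustration, a single helper that renders the unified message might look like the following minimal sketch; the function name, parameters, and formatting call are assumptions, not the actual Doris code.

```
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative helper: format the "Memory exceed limit" message in one place so
// every call site (e.g. check_query_state, RETURN_IF_LIMIT_EXCEEDED) produces
// the same wording. Names and parameters are assumptions.
std::string format_mem_limit_exceeded(const std::string& msg,
                                      const std::string& backend_ip,
                                      const std::string& fragment_id,
                                      int64_t used_bytes,
                                      int64_t limit_bytes) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "Memory exceed limit. %s, Backend:%s, fragment:%s Used:%lld, Limit:%lld.",
                  msg.c_str(), backend_ip.c_str(), fragment_id.c_str(),
                  (long long)used_bytes, (long long)limit_bytes);
    return std::string(buf);
}
```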
8034d83e20 Add scroll keepalive and http timeout configuration (#1731) 2019-09-02 19:04:30 +08:00
81ca3e3abf Free olap scanner out of lock (#1733)
Close the scanner outside of OlapScanner's batch lock; closing it while
holding the lock makes all scanners wait for one scanner to finish.
2019-09-02 16:49:28 +08:00
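A simplified sketch of this pattern, with illustrative class and member names rather than the real OlapScanNode/OlapScanner code: finished scanners are moved out while holding the lock and closed only after it is released.

```
#include <memory>
#include <mutex>
#include <vector>

// Illustrative scanner type; the real OlapScanner does more work in close().
struct Scanner {
    void close() { /* release storage resources, possibly slow */ }
};

struct ScanNodeSketch {
    std::mutex batch_lock;                       // protects the shared batch state
    std::vector<std::unique_ptr<Scanner>> done;  // scanners that have finished

    void release_finished_scanners() {
        std::vector<std::unique_ptr<Scanner>> to_close;
        {
            std::lock_guard<std::mutex> l(batch_lock);
            to_close.swap(done);  // only move pointers while holding the lock
        }
        // Close outside the lock so other scanners are not blocked by a slow close().
        for (auto& s : to_close) {
            s->close();
        }
    }
};
```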
3a33f3d350 Make bitmap_union agg column support insert into and broker load (#1721) 2019-08-30 14:44:51 +08:00
c6dfe83b6d Add particular log info for Doris on ES (#1711) 2019-08-27 22:16:28 +08:00
dc2d49fe07 Make StringValue's memory layout the same as Slice's (#1712)
In our storage engine's code, we cast StringValue to Slice. Because
their memory layouts differ, this may cause the BE process to crash.

This patch makes their memory layouts the same to resolve the problem
temporarily. We should improve it some day.
2019-08-27 22:15:46 +08:00
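The idea can be illustrated schematically as below; the field names and the static_assert checks are illustrative, not the exact Doris definitions.

```
#include <cstddef>

// Schematic layouts only; the real StringValue/Slice have more members and methods.
struct Slice {
    char*  data;
    size_t size;
};

struct StringValue {
    char*  ptr;   // same offset and width as Slice::data
    size_t len;   // same offset and width as Slice::size
};

static_assert(sizeof(Slice) == sizeof(StringValue),
              "casting between the two is only safe if the layouts match");
static_assert(offsetof(Slice, data) == offsetof(StringValue, ptr), "pointer first");
static_assert(offsetof(Slice, size) == offsetof(StringValue, len), "length second");
```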
a1b92768dd Add loaded rows to the SHOW LOAD result (#1686)
Loaded rows is updated periodically by the query report, so the
user can see whether a load job is still running or blocked.
2019-08-27 14:13:47 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
4449316d85 Add error msg when memory limit exceeded (#1685) 2019-08-23 11:13:01 +08:00
0a27ef030b Reduce the number of partition info in BrokerScanNode param (#1675)
If the user has already set target partitions to load, we should put only those
partitions' info into the BrokerScanNode param instead of adding all partitions' info;
otherwise the RPC packet becomes too large.
2019-08-20 19:30:57 +08:00
cd2b8373c2 Fix Stream load double NumberTotalRows (#1664) 2019-08-19 12:23:43 +08:00
ba6d728f26 Enable parsing columns from file path for Broker Load (#1582) (#1635)
Currently, we do not support parsing encoded/compressed columns from the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.

This patch enables parsing columns from the file path, as in Spark's Partition Discovery.

This patch parses the partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so the broker reader on BE can read the value of the specified partition column.
2019-08-19 09:39:21 +08:00
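A rough sketch of extracting a `k1=1` style value from a file path, as a generic illustration only (the real parsing lives in BrokerScanNode.java and the value is carried in TBrokerRangeDesc):

```
#include <optional>
#include <string>

// Extract the value of a partition column encoded in the path as ".../<col>=<value>/...".
// Generic illustration only, not the Doris implementation.
std::optional<std::string> column_from_path(const std::string& path,
                                            const std::string& column) {
    const std::string key = "/" + column + "=";
    const size_t start = path.find(key);
    if (start == std::string::npos) {
        return std::nullopt;  // column not encoded in this path
    }
    const size_t value_begin = start + key.size();
    const size_t value_end = path.find('/', value_begin);
    return path.substr(value_begin, value_end == std::string::npos
                                        ? std::string::npos
                                        : value_end - value_begin);
}

// column_from_path("/path/to/dir/k1=1/xxx.csv", "k1") -> "1"
```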
6d73658207 Support checking error data row when doing INSERT (#1597)
If strict mode is true and at least one row is filtered, the insert operation will fail and a URL will be given for retrieving the error rows.

```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```

If all rows are good, insert will return OK with affected rows:

```
Query OK, 1 row affected (0.26 sec)
```

If strict mode is false and at least one row is good, the insert operation will return OK with affected rows and warnings. If there are error rows, a label will also be returned:

```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
2019-08-16 21:40:29 +08:00
57a1a718c7 Print logs when parsing the scroll result fails (#1661) 2019-08-16 17:48:23 +08:00
85e89b79d5 Print src tuple in error_sample file (#1641)
The src tuple could not be printed to the error_sample file when the value was filtered by strict mode.
This commit fixes the issue.
2019-08-14 19:58:09 +08:00
69af50aa8c Time zone related BE function (#1598)
Details can be found in time-zone.md document
2019-08-12 20:57:59 +08:00
e3348c46a9 Expose data pruned-filter-scan ability (#1527) 2019-08-11 12:59:24 +08:00
a6d3099a68 Fix bug: localtime is not thread-safe; changed to localtime_r. (#1614) 2019-08-08 22:00:43 +08:00
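For reference, a minimal usage sketch of the thread-safe variant, which writes into a caller-provided buffer instead of shared static storage (not the patched Doris code itself):

```
#include <ctime>

// localtime() writes into shared static storage and is not thread-safe;
// localtime_r() fills a caller-provided struct tm instead (POSIX).
bool to_local_tm(time_t ts, struct tm* out) {
    return localtime_r(&ts, out) != nullptr;  // safe to call from multiple threads
}
```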
f4ad2381e6 Fix error DCHECK for partition_columns (#1606) 2019-08-08 16:29:08 +08:00
dc4a5e6c10 Support Decimal Type when load Parquet File (#1595) 2019-08-07 19:52:23 +08:00
9402456f5b Fix parquet directory have empty file (#1593) 2019-08-07 15:08:22 +08:00
93a3577baa Support multi partition column when creating table (#1574)
When creating a table with the OLAP engine, the user can specify multiple partition columns,
e.g.:

PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
)

Note that loading via a Hadoop cluster does not support tables with multiple partition columns.
2019-08-05 16:16:43 +08:00
9128af6499 Broker load hang when rpc failed (#1567)
Broker load hangs on the broker reader when the thrift request between the broker and BE fails.
2019-07-31 19:03:38 +08:00
c5edf9dae0 Unify Field and ColumnSchema in Storage (#1561)
Currently, we have Field and ColumnSchema to access column data in a
row. These two classes are mostly the same, so we should unify them into
one class. Field currently has offset information, which is a row attribute,
so we remove the offset from Field.

RowCursor has some logic that belongs to Schema, so in this patch I
add a Schema attribute to RowCursor to keep RowCursor simple. After this
change, only Schema will handle Field/ColumnSchema.

I extract some logic from RowCursor into be/src/olap/row.h, so the
same logic can handle different types of rows. Each row type has the
same function to get a Cell of the row. A Cell represents a column's
content together with a null indicator.
2019-07-30 14:01:57 +08:00
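A rough sketch of the Cell idea described above; the names and members are illustrative, not the actual be/src/olap/row.h code.

```
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: a Cell couples a column's content with its null indicator.
struct Cell {
    bool        is_null;
    const void* data;   // points into the row's memory for this column
};

// Any row type that can produce a Cell per column id can be handled by the
// same generic routines (compare, copy, hash, ...).
struct SimpleRow {
    std::vector<uint8_t>     null_bits;
    std::vector<const void*> column_ptrs;

    Cell cell(size_t column_id) const {
        return Cell{null_bits[column_id] != 0, column_ptrs[column_id]};
    }
};

// A generic helper templated on the row type, in the spirit of row.h.
template <typename RowType>
bool is_null(const RowType& row, size_t column_id) {
    return row.cell(column_id).is_null;
}
```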
97718a35a2 Do not get file size in Broker openReader() method (#1560)
The file size is already obtained when listing files.
Getting the file size again in openReader() is unnecessary and inefficient.
2019-07-29 23:05:01 +08:00
0694b6a6fa Fix bugs of Broker load (#1546)
Use the same UUID as the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID and, as a plan, also a query ID.
Using the same UUID for both makes it easier to trace the load process.

Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When the user cancels a broker load, the running loading task should also be cancelled, or
it may occupy the worker thread for a long time.

Remove the unnecessary query report when doing load execution plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPC of the tablet sink. The default is 600 seconds, which is long enough to flush
about 6GB of data. The long timeout reduces the possibility of encountering a 'fail to send batch' error when loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs for tracing a broker load process easily.
2019-07-27 20:17:05 +08:00
a88b55e649 Add more logs and metrics to trace the broker load process (#1530)
The operator wants to know when the job is scheduled as PENDING
and LOADING, and how long it takes to finish these sub-states.

Also add 2 metrics on BE to monitor the memtable flush time:
`memtable_flush_total` and `memtable_flush_duration_us`.
2019-07-23 21:42:44 +08:00
69040572fb Use different ID instead of table ID for base index of an OLAP table (#1524) 2019-07-23 15:48:45 +08:00
6c1f95c3a0 Fix bug that BE may crash when closing OlapTableSink (#1507)
The `_profile` in OlapTableSink may not be initialized if the `prepare()`
method has not been called. So when closing the OlapTableSink, we should
check whether `_profile` is initialized.
2019-07-19 10:30:44 +08:00
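The guard described above amounts to a null check before touching `_profile` in close(); a minimal sketch under that assumption (not the actual OlapTableSink code):

```
#include <memory>

// Minimal sketch: close() must tolerate the case where prepare() never ran
// and _profile was never created.
class SinkSketch {
public:
    void prepare() { _profile = std::make_unique<int>(0); }  // stand-in for the profile object
    void close() {
        if (_profile) {  // guard: prepare() may not have been called
            // ... flush counters into *_profile ...
        }
    }
private:
    std::unique_ptr<int> _profile;  // stays null until prepare() runs
};
```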
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load now uses the stream load framework, but we should keep the
mini load return behavior and result JSON format the same as before.
So the PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters to the OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
2019-07-16 19:15:41 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all Backends' data,
and it will take a very long time to restart a BE.
So to avoid disrupting your production environment,
you should upgrade the Backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use a unique ID to identify a rowset.
   Naming a rowset with tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
ae6f2d99c5 Fix bug when use SELECT * FROM TABLE LIMIT 1 (#1469) 2019-07-13 23:57:14 +08:00
aff1559c4d Fix bug: if the Doris table has fewer columns than the Parquet file, BE will crash (#1464) 2019-07-12 15:23:13 +08:00
b9c79d4b1b Fix importing a non-Parquet format file causing BE crash (#1454) 2019-07-11 16:04:36 +08:00
51c92a0bec Validate the UTF-8 encode of loading data (#1457)
Currently, Doris only supports UTF-8 encoded data. All data is
shown to the user in UTF-8 format. So if the data loaded into Doris is not
UTF-8 encoded, the user will see garbled data when querying.

I introduce a fast UTF-8 validator from

    https://github.com/lemire/fastvalidate-utf-8

This validator is highly optimized: it takes only 0.7 CPU cycles
to validate a 64k string. And in a test loading 1GB of data into Doris, the
validator had no impact on performance.
2019-07-11 09:46:38 +08:00
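The imported validator is SIMD-based; for illustration only, a plain scalar check of the same UTF-8 rules might look like this (not the lemire code or the Doris integration):

```
#include <cstddef>
#include <cstdint>

// Scalar illustration of UTF-8 validation: rejects bad leading bytes, truncated
// sequences, overlong encodings, surrogates, and out-of-range code points.
bool is_valid_utf8(const uint8_t* s, size_t len) {
    size_t i = 0;
    while (i < len) {
        const uint8_t b = s[i];
        size_t n = 0;    // number of continuation bytes expected
        uint32_t cp = 0; // decoded code point
        if (b < 0x80)               { i += 1; continue; }      // ASCII
        else if ((b >> 5) == 0x6)   { n = 1; cp = b & 0x1F; }  // 110xxxxx
        else if ((b >> 4) == 0xE)   { n = 2; cp = b & 0x0F; }  // 1110xxxx
        else if ((b >> 3) == 0x1E)  { n = 3; cp = b & 0x07; }  // 11110xxx
        else return false;                                     // invalid leading byte
        if (i + n >= len) return false;                        // truncated sequence
        for (size_t k = 1; k <= n; ++k) {
            if ((s[i + k] >> 6) != 0x2) return false;          // not a continuation byte
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000)) return false;            // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += n + 1;
    }
    return true;
}
```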
615c979727 Fix bug that BE crashes when inserting null value to non-nullable columns (#1447) 2019-07-10 09:20:09 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
1422414e43 Add varchar column name to stream load error msg (#1366) 2019-06-24 14:52:59 +08:00
687d57be66 Fix bug that query statistics in audit log are wrong (#1354) 2019-06-21 19:16:05 +08:00
ba44249f80 Remove unused code (#1320) 2019-06-15 20:41:48 +08:00
30028bc35b Deny specifying partitions for an unpartitioned table (#1319) 2019-06-15 18:19:56 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
53062122ea Change strategy of incorrect data (#1255)
This change adds a load property named strict_mode which is used to reject incorrect data.
When it is set to false, incorrect data is loaded as NULL, just like before.
When it is set to true, incorrect data that belongs to a column without an expr is filtered.
strict_mode is supported in broker load v2 now; it will be supported in stream load later.
2019-06-10 20:39:45 +08:00
e4e04e8203 Make LZO support optional (#1263) 2019-06-07 22:26:54 +08:00
934ca2481a Make MySQL support optional (#1248) 2019-06-05 12:28:15 +08:00
9f5f44ec48 Reduce memory RowBlock needed (#1238)
Before, RowBlock reserved memory for all columns in the schema, even those
not queried, which caused bad performance when querying a wide
table.

In this patch, RowBlock reserves memory only for the needed columns. In one
case, this reduced ConvertBatchTime from 10s to 60ms when querying a wide
table with 178 columns.

 #1236
2019-06-04 12:58:41 +08:00
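A simplified sketch of the idea, with illustrative types only (not the actual RowBlock): buffers are reserved just for the column ids the query needs.

```
#include <cstddef>
#include <map>
#include <vector>

// Illustrative only: reserve per-column buffers just for the columns a query needs,
// instead of for every column in a possibly very wide schema.
struct NarrowBlockSketch {
    std::map<size_t, std::vector<char>> buffers;  // column id -> column buffer

    void reserve(const std::vector<size_t>& needed_column_ids,
                 const std::vector<size_t>& column_byte_widths,
                 size_t rows) {
        for (size_t cid : needed_column_ids) {
            buffers[cid].resize(column_byte_widths[cid] * rows);
        }
    }
};
```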
85b4619d54 Change insert into to streaming (#1191)
INSERT INTO without the streaming hint now uses the streaming plan, which is the same as the plan of a streaming insert.
It also records the load info and returns the label of the insert stmt.
Partitions are supported in the INSERT INTO stmt; rows that match the target partitions will be loaded.
The introduction and examples have been updated, especially for non-streaming insert.
Also, a partition_names param is added to the SQL syntax to declare the target partitions in the target table.

Change META_VERSION to 50
2019-05-23 20:53:30 +08:00
02f36c23ed Set tablet as bad when loading index failed (#1146)
The bad tablet will be reported to FE and handled there.

Also add a config auto_recover_index_loading_failure to control index loading failure processing.
2019-05-13 10:22:04 +08:00