doris

Author	SHA1	Message	Date
lichaoyong	9ee1704859	[util] Import util tools from KUDU (#2905 ) 1. MonoTime/MonoDelta MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point. MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances. 2. CountDownLatch This is a C++ implementation of the Java CountDownLatch	2020-02-14 18:01:16 +08:00
LingBin	3c539aac54	[Refactor] Some tiny refactor on streaming-load related code (#2891 ) Mainly contains the following modifications: 1. Use `std::unique_ptr` to replace some naked pointers 2. Change some methods from member methods to local static functions 3. Make private some methods that do not need to be public 4. Some formatting changes, such as wrapping lines that are too long 5. Remove some useless variables 6. Add or modify some comments for easier understanding No functional changes in this patch.	2020-02-13 10:42:52 +08:00
Dayue Gao	7037754978	Fix a bug that TabletsChannel may be written after cancel (#2870 ) TabletsChannel may be written after cancelation, leading to a core dump at DeltaWriter::write. We should check the state of TabletsChannel at the beginning of each operation.	2020-02-10 14:49:00 +08:00
Youngwb	1550401d4b	Support param exec_mem_limit for spark-doris-connector (#2775 )	2020-01-18 00:14:39 +08:00
HangyuanLiu	0ddca59d36	Add timestampadd/timestampdiff function (#2725 )	2020-01-15 21:47:07 +08:00
kangkaisen	7d2610d091	Change bitmap functions return type to BITMAP (#2690 )	2020-01-07 19:27:21 +08:00
Mingyu Chen	9c90b09a3f	[Alter Table] No need to check whether table is stable when doing some kinds of alter operation (#2617 ) Not all alter table operations require the table to be stable, for example rename and metadata modification.	2020-01-02 20:51:23 +08:00
Youngwb	feda66f99f	Spark return error to users when spark on doris query failed (#2531 )	2019-12-30 21:58:13 +08:00
Mingyu Chen	a511042397	[Export] Forget to set timeout for export job (#2516 )	2019-12-23 18:14:41 +08:00
Youngwb	48f559600f	Fix bug when spark on doris run long time (#2485 )	2019-12-18 13:08:21 +08:00
kangpinghuang	c07f37d78c	[Segment V2] Add a control framework between FE and BE through heartbeat #2247 (#2364 ) The control framework is implemented through heartbeat message. Use uint64_t as flags to control different functions. Now add a flag to set the default rowset type to beta.	2019-12-12 12:18:32 +08:00
Dayue Gao	83b5455be5	[Load] Fix several races in stream load that could cause BE crash (#2414 ) This CL fixes the following problems: 1. check whether TabletsChannel has been closed/cancelled in `reduce_mem_usage` to avoid using a closed DeltaWriter 2. make `FlushHandle.wait` wait for all submitted tasks to finish so that memtable is deallocated before its delta writer 3. make `~MemTracker()` release its consumption bytes to accommodate situations in aggregate_func.h where bitmap and hll call `MemTracker::consume` without a corresponding `MemTracker::release`, which causes the consumption of the root tracker to never drop to zero	2019-12-10 21:59:05 +08:00
Mingyu Chen	a3b7cf484b	Set the load channel's timeout to be the same as the load job's timeout (#2405 ) [Load] When performing a long-running load job, the following error may occur and cause the load to fail: load channel manager add batch with unknown load id: xxx One cause of this error is that Doris opened an unrelated channel during the load process. This channel does not receive any data during the entire load process, so it is released after a fixed timeout. After the entire load job is completed, Doris tries to close all open channels. When it tries to close this channel, it finds that the channel no longer exists and reports an error. This CL passes the timeout of the load job to the load channel, so that the load channel's timeout is the same as the load job's.	2019-12-06 21:51:00 +08:00
LingBin	c5f7f7e0f4	Check the return status of `_flush_memtable_async()` (#2332 ) This commit also contains some adjustments of the forward declaration	2019-11-29 21:05:17 +08:00
Mingyu Chen	a2d7c42042	Add a variable to specifically limit the memory usage of the load part in the insert operation (#2305 ) This variable is mainly for INSERT operation, because INSERT operation has both query and load part. Using only the exec_mem_limit variable does not make a good distinction of memory limit between the two parts.	2019-11-28 13:03:11 +08:00
LingBin	569d0bb3af	Replace all remaining boost::split() with strings::split() (#2302 )	2019-11-26 22:22:14 +08:00
Mingyu Chen	d5aeb9a6b7	Add document for session variables. (#2284 ) Also make the variable effective in current session when setting it globally.	2019-11-24 22:47:05 +08:00
ZHAO Chun	e98bbb5bc5	Refactor clone task (#2285 ) In the previous implementation, the clone task would continue downloading files even if an error happened, which may cause unexpected problems. This change list refactors it so that when an error happens, the clone task fails entirely and tries to clone from another remote source. Besides the above change, I call FileUtils::remove_all and create_dir instead of the boost ones, which may throw exceptions. What's more, AgentMasterClient is replaced with ThriftRpcHelper; with this change connections can be reused.	2019-11-24 22:36:10 +08:00
kangkaisen	9247da9bcc	Fix deregister_recvr no cancel_stream bug (#2286 )	2019-11-24 20:13:08 +08:00
kangkaisen	885019a75b	Make DataStreamRecvr cancel_stream out of lock (#2281 )	2019-11-23 16:52:49 +08:00
ZHAO Chun	3dcb8c991c	Make RowBatch compatible with old version (#2190 ) The len field of StringValue was changed from int to int64. This causes an invalid StringValue length when deserializing a RowBatch sent from Doris 0.10, which then leads to a memory allocation failure and makes the BE crash.	2019-11-13 23:26:26 +08:00
ZHAO Chun	89dc461f91	Fix UT and remove unused code (#2160 )	2019-11-08 08:47:48 +08:00
Yunfeng,Wu	188d97c215	Add null bit verification for row_batch transformation (#2139 )	2019-11-07 14:05:23 +08:00
Mingyu Chen	45df6aae08	Fix some routine load bugs (#2093 ) Mainly fix the following issues: 1. A null pointer exception is raised when a database or table is dropped. The expected behavior is that the routine load job is stopped. 2. Memory leaks. Batch routine load task submissions are no longer performed, and modifications are submitted separately for each task. 3. Unreasonable task timeout. Routine load tasks should not be queued in the BE thread pool for execution. A task sent to the BE should be executed immediately; otherwise the task in the FE will time out first, eventually leading to constant timeouts for all subsequent tasks. 4. Every routine load job should be scheduled once it is submitted, without waiting for an available BE slot. Otherwise, later submitted jobs may never be scheduled.	2019-10-31 21:53:03 +08:00
kangkaisen	95a3b4ccfe	Add object type (#1948 ) Add a new type: Object. Currently, it is mainly for complex aggregate metrics (HLL, Bitmap). The Object type has the following constraints: 1. The Object type cannot be used as a key column type 2. The Object type doesn't support any indices (BloomFilter, short key, zone map, invert index) 3. The Object type doesn't support filter and group by. In the implementation, the Object type reuses StringValue and StringVal because, in the storage engine, the Object type is binary: it has a pointer and a length.	2019-10-31 21:42:58 +08:00
Yunfeng,Wu	f53f188c5d	Add arrow IPC serialization for Doris-Spark-Connector (#2013 )	2019-10-31 10:32:06 +08:00
kangpinghuang	6b4ef34162	fix AlphaRowsetTest by remove StorageEngine #2078 (#2091 )	2019-10-30 19:39:41 +08:00
Mingyu Chen	c3b5046940	Fix bug of invalid stream load task rollback (#1999 ) If a stream load is committed with result PUBLISH_TIMEOUT, the transaction should not be rolled back; the message should only be returned to the user.	2019-10-17 21:08:29 +08:00
Mingyu Chen	ee5b79ac2b	Fix bug that memtable should be destroyed before finishing the load process (#1983 ) The parent mem tracker may be released before the child mem tracker visits it, which causes a segfault.	2019-10-15 22:46:19 +08:00
Mingyu Chen	62acf5d098	Limit the memory usage of Loading process (#1954 )	2019-10-15 09:26:20 +08:00
ZHAO Chun	f130bd3e7b	Use Env function to operate directory (#1980 ) Env now unifies all environment operations, such as file operations. However, some of our old functions don't leverage it. This change makes FileUtils::scan_dir use Env's function.	2019-10-15 09:25:12 +08:00
yiguolei	f852f50acb	Improve unique id performance (#1911 ) Remove the default constructor for UniqueID. Add a gen_uid method in UniqueId; users who need to generate a new uid should call this API explicitly. Reuse the boost random generator instead of creating a new one every time.	2019-09-29 18:20:02 +08:00
kangkaisen	d3a445ee09	Fix memory_scratch_sink_test in debug mode (#1906 )	2019-09-28 10:33:24 +08:00
kangkaisen	cafb9f1e62	Replace Arena with MemPool first step (#1899 )	2019-09-28 01:12:22 +08:00
Mingyu Chen	e67b398916	Fix bug that backup may create an empty file on remote storage. (#1869 ) Sometimes the broker writer fails to close, but we do not handle this failure. This may create an empty file on remote storage that is treated as normal. Also improve usability: 1. Get the latest 2000 transactions instead of the earliest. 2. Show the backend on which download and upload tasks are being executed.	2019-09-28 00:11:43 +08:00
yiguolei	2f0808137a	Refactor FrontendHelper (#1888 )	2019-09-27 13:21:14 +08:00
kangkaisen	b246d93128	Avoid SerDe for aggregation query with object pool (#1854 )	2019-09-26 13:51:13 +08:00
Mingyu Chen	c643cbd30c	Optimize the load performance for large file (#1798 ) The current load process is: Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed: 1. Insert tuples into different memtables according to tablet ID. 2. When the memtable size reaches the threshold, write it to disk. The above operations are equivalent to single-threaded execution for a single load task. In fact, memtable insertion and memtable flush can be executed concurrently. Performing these operations in a single thread delays memtable insertion whenever disk writing is slow. In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads. By default, each data directory uses two worker threads for flush, which can be modified by the BE parameter flush_thread_num_per_store. DeltaWriter pushes the full memtable to MemTableFlushExecutor for the flush operation and generates a new memtable for receiving new data. This design can improve the performance of loading large files. In single-host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.	2019-09-25 13:49:32 +08:00
ZHAO Chun	abd27dfcca	Remove unused debug (#1836 )	2019-09-20 09:31:56 +08:00
ZHAO Chun	aaabf97471	Split channel close operation into two phase (#1830 ) In this change, channel close is split into two phases, so channels can be closed in parallel, which can make queries faster.	2019-09-19 18:14:30 +08:00
ZHAO Chun	17e52a4bac	Improve LRUCache to get better performance (#1826 ) In this CL, I move the entry's deleter out of LRUCache's mutex block, which lets others access the cache without waiting for cache entries to be freed.	2019-09-19 17:37:02 +08:00
EmmyMiao87	054a3f48bc	Add where expr in broker load (#1812 ) The where predicate in broker load is responsible for filtering transformed data. The docs of help and operator have been changed.	2019-09-17 11:32:40 +08:00
ZHAO Chun	11eafe524f	Add ChunkAllocator to accelerate chunk allocation (#1792 ) I add ChunkAllocator in this CL to put unused memory chunks into a chunk pool rather than return them to the system allocator. For now, only MemPool's chunk allocation and free are changed to use it. Two configurations are introduced too. 'chunk_reserved_bytes_limit' is the limit on how many bytes this chunk pool can reserve in total; its default value is 2147483648 (2GB). 'use_mmap_allocate_chunk' controls whether chunks are allocated via mmap; its default value is false. In my test case with the default configuration, a simple query like "select * from table limit 10", this can improve throughput from 280 QPS to 650 QPS. When I set 'chunk_reserved_bytes_limit' to 0, which disables the pool, the throughput is the same as the original.	2019-09-13 08:27:24 +08:00
Mingyu Chen	9aa2045987	Refactor alter job (#1695 )	2019-09-12 16:31:29 +08:00
kangkaisen	5a12a1d7df	Fix compile error (#1780 )	2019-09-10 23:48:42 +08:00
HangyuanLiu	235cdb0ecd	Commit kafka offset (#1734 ) Commit kafka offset in routine load. Kafka decides whether to delete data based on whether all consumer groups have committed their offsets. If offsets are not committed, the kafka server disk may fill up.	2019-09-10 14:27:06 +08:00
Mingyu Chen	044489b92f	Optimize some kinds of load jobs (#1762 ) 1. Support specifying label to Insert Into stmt. INSERT INTO tbl1 WITH LABEL label1 ...; 2. Return the job's state corresponding to the existing label in the result of stream load. ... "Status": "Label Already Exists", "ExistingJobStatus": "FINISHED" ... 3. Return the recent 2000 transactions in SHOW PROC '/transactions'	2019-09-09 22:11:12 +08:00
EmmyMiao87	9f5e5717d4	Unify the msg of 'Memory exceed limit' (#1737 ) The new msg of limit exceed: "Memory exceed limit. %msg, Backend:%ip, fragment:%id Used:% , Limit:%. xxx". This commit unifies the msg of 'Memory exceed limit' such as check_query_state, RETURN_IF_LIMIT_EXCEEDED and LIMIT_EXCEEDED.	2019-09-03 10:42:16 +08:00
ZHAO Chun	b4f6f755f1	Add exchange in MemPool to reduce alloc/free operation (#1732 ) Reuse allocated chunks during storage read operations.	2019-09-02 19:29:30 +08:00
Mingyu Chen	76987275b9	Fix result of unix_timestamp() (#1727 )	2019-08-30 21:39:16 +08:00
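
Commit 9ee1704859 (#2905 ) describes CountDownLatch as a C++ implementation of the Java CountDownLatch, alongside monotonic time utilities. The following is only a minimal sketch of that idea, not the code imported from Kudu: the names `SimpleCountDownLatch` and `run_workers` are hypothetical, and `std::chrono::steady_clock` is assumed here as a stand-in for a monotonic clock.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of a count-down latch; not the Kudu/Doris class.
class SimpleCountDownLatch {
public:
    explicit SimpleCountDownLatch(uint64_t count) : count_(count) {}

    // Decrement the count by one; wake all waiters when it reaches zero.
    // Calling this when the count is already zero is a no-op.
    void count_down() {
        std::lock_guard<std::mutex> lock(mu_);
        if (count_ == 0) return;
        if (--count_ == 0) cv_.notify_all();
    }

    // Block until the count reaches zero.
    void wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

    // Block until the count reaches zero or the timeout elapses.
    // Returns true if the count reached zero, false on timeout.
    bool wait_for(std::chrono::steady_clock::duration timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        return cv_.wait_for(lock, timeout, [this] { return count_ == 0; });
    }

    uint64_t count() const {
        std::lock_guard<std::mutex> lock(mu_);
        return count_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    uint64_t count_;
};

// Start n worker threads that each count down once, wait for all of them
// through the latch, and return the remaining count.
uint64_t run_workers(int n) {
    assert(n >= 0);
    SimpleCountDownLatch latch(static_cast<uint64_t>(n));
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&latch] { latch.count_down(); });
    }
    latch.wait();
    for (auto& t : workers) t.join();
    return latch.count();
}
```

A caller would typically construct the latch with the number of pending tasks, hand it to the workers, and call `wait()` or `wait_for()` on the coordinating thread. The timed variant returns `false` when the count is still nonzero at the deadline.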


184 Commits