doris

Author	SHA1	Message	Date
Mingyu Chen	521fb15a9b	[Bug] Fix some memory bugs (#6699 ) 1. Fix a memory leak in `collect_iterator.cpp` (Fix #6700) 2. Add a new BE config `max_segment_num_per_rowset` to limit the num of segment in new rowset.(Fix #6701) 3. Make the error msg of stream load more friendly.	2021-09-22 12:30:14 +08:00
Mingyu Chen	3f2fdd236f	Add scan thread token (#6443 )	2021-08-27 10:56:17 +08:00
caiconghui	7e30b28f3a	[Optimize] Speed up converting the data of other types to string in mysql_result_writer (#6384 ) Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-24 22:30:58 +08:00
HappenLee	9216735cfa	[New Feature] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
caiconghui	d1007afe80	Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361 ) * [Optimize] optimize the speed of converting integer to string * Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-04 10:55:19 +08:00
HappenLee	02a00cdf35	[Bug] Fix the bug in `from_date_format_str` function (#6273 )	2021-07-21 12:31:37 +08:00
HappenLee	fae3eff2e6	[Bug] Fix the bug of cast string to datetime return not null (#6228 )	2021-07-17 10:55:08 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented: 1. FE array type support and implementation of array function, support array syntax analysis and planning 2. Support import array type data through insert into 3. Support select array type data 4. Only the array type is supported on the value column of the duplicate table. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) Remove all DECIMAL V1 type code; this is a part of #6073	2021-07-07 10:26:32 +08:00
stdpain	1999a0c26b	[optimization] open gcc strict-aliasing optimization (#6034 ) * open gcc strict-aliasing optimization * use -Werror=strict-aliasing	2021-06-18 11:39:24 +08:00
xinghuayu007	63c99eb4cb	[Cache][Enhancement] Assure sql cache only one version (#5793 ) For PR #5792. This patch adds a new param `cache type` to distinguish sql cache from partition cache. When updating the sql cache, we ensure that one sql key has only one cached version.	2021-05-28 13:45:47 +08:00
HappenLee	d0462f4383	[Bug] Fix Backend UT Problem (#5784 ) (#5785 ) 1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC 2. warning: the use of `tmpnam' is dangerous, better use `mkstemp' 3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.	2021-05-17 11:51:59 +08:00
Zhengguo Yang	01a45e8691	add read buffer when use s3 reader (#5791 )	2021-05-17 11:46:38 +08:00
luozenglin	b686205b97	[Optimize] Reduce lock conflicts in ThreadResourceMgr of be (#5772 ) Removed some useless code that caused lock conflicts in ThreadResourceMgr of be.	2021-05-12 10:59:53 +08:00
Zhengguo Yang	98e80aa65e	[refactor] Replace boost::function with std::function (#5700 ) Replace boost::function with std::function	2021-05-09 22:00:48 +08:00
Zhengguo Yang	a803ceea86	[refactor] Remove boost mutex, use std::mutex instead (#5684 ) * Remove boost mutex, use std::mutex instead * replace shared_mutex	2021-04-22 11:29:36 +08:00
Zhengguo Yang	c4cc681d14	remove boost_foreach, using c++ foreach instead (#5611 )	2021-04-15 10:52:29 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579 ) * use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
stdpain	c9a25aa29e	[UT] fix memory tracker ut (#5501 ) * [UT] fix memory tracker ut * Update mem_limit_test.cpp	2021-03-12 13:45:04 +08:00
Yingchun Lai	0131c33966	[Enhance] Improve the readability of memtrackers' name (#5455 ) Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker	2021-03-11 22:33:31 +08:00
Yingchun Lai	c38a1c799f	[Config] Support config validating when BE bootstrap and update BE's config by API (#5379 ) Some invalid config values may cause BE to behave unexpectedly. This patch aims to support config validation when BE bootstraps and when BE's config is updated by API, so that invalid values are rejected. This is work to accomplish PR #4423	2021-03-04 22:21:49 +08:00
Mingyu Chen	51ccd44865	[Load Parallel][3/3] Support parallel delta writer (#5369 ) In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel, and because of the lock granularity problem, LoadChannel could only process these requests serially, which made it impossible to make full use of cluster resources. This CL modifies the related locks so that LoadChannel can process these requests in parallel. In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min. Also modify the profile of load job.	2021-02-07 22:42:18 +08:00
wyb	128752b4f9	[Routine load] Fix kafka load too many task bug (#5327 )	2021-02-03 13:23:30 +08:00
Yingchun Lai	11c0aafa5c	[UT] Speed up BE unit test (#5131 ) There are some long loops and sleeps in unit tests, so it costs a very long time to run all unit tests, especially in TSAN mode. This patch speeds up unit tests by shortening long loops and sleeps; in my environment all unit tests finished in 1 minute. It's useful for basic functional unit tests. You can switch to run in this mode by adding a new environment variable 'DORIS_ALLOW_SLOW_TESTS'. For example, you can set: export DORIS_ALLOW_SLOW_TESTS=1 and also you can disable it by setting: export DORIS_ALLOW_SLOW_TESTS=0	2020-12-27 22:19:56 +08:00
HuangWei	85076b5678	[UT] fix test_env & add a sample (#5085 ) Easily create tests.	2020-12-27 22:14:30 +08:00
xinghuayu007	9ddf434f6b	[Bug-Fix] Fix partition cache match bug (#5060 ) When partitions are not cached contiguously, a range query may fail. For example, if partition keys 20201011 and 20201013 are cached but the range query is between 20201011 and 20201013, the query will not hit the cache. issue:#5059	2020-12-19 11:17:44 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
sduzh	10e1e29711	Remove header file common/names.h (#4945 )	2020-11-26 17:00:48 +08:00
HappenLee	2682712349	[Bug] Fix be ut compile failed and core in delta_writer_test when ulimit < 60000. (#4941 )	2020-11-24 22:21:19 +08:00
Mingyu Chen	f1b57c4418	[Optimize] Avoid repeated sending of common components in Fragments (#4904 ) This CL mainly changes: 1. Avoid repeated sending of common components in Fragments In the previous implementation, a query may generate multiple Fragments, these Fragments contain some common information, such as DescriptorTable. Fragment will be sent to BE in a certain order, so these public information will be sent repeatedly and generated repeatedly on the BE side. In some complex SQL, these public information may be very large, thereby increasing the execution time of Fragment. So I improved this. For multiple Fragments sent to the same BE, only the first Fragment will carry these public information, and it will be cached on the BE side, and subsequent Fragments no longer need to carry this information. In the local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second. 2. Add the time-consuming part of FE logic in Profile Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.	2020-11-22 20:38:05 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
HaiBo Li	5199a17a4b	[cache][be]Fix the bug of cross-border access cache (#4639 ) * When different partitions of the table are updated frequently, the partition key list of the cache is discontinuous, and the partition key in the request cannot hit the key list in the cache, resulting in an out-of-bounds access, and the BE will crash. * Add some unit test cases, including test cases that fail to hit the boundary value of the cache	2020-09-28 13:35:52 +08:00
HaiBo Li	5f43fb3bde	[Cache][BE] LRU cache for sql/partition cache #2581 (#4005 ) 1. Find the cache node by SQL Key, then find the corresponding partition data by Partition Key, and then decide whether to hit Cache by LastVersion and LastVersionTime 2. Refers to the classic cache algorithm LRU, which is the least recently used algorithm, using a three-layer data structure to achieve 3. The Cache elimination algorithm is implemented by ensuring the range of the partition as much as possible, to avoid the situation of partition discontinuity, which will reduce the hit rate of the Cache partition, 4. Use the two thresholds of maximum memory and elastic memory to control to avoid frequent elimination of data	2020-09-20 20:50:51 +08:00
qiye	065b979f35	[Bug] behavior of function str_to_date() and date_format() on BE and FE is inconsistent (#4612 ) 1. add date range check in `DateLiteral` for `FEFunctions` 2. `select str_to_date(202009,'%Y%m')` and `select str_to_date(str,'%Y%m') from tb where tb.str = '202009'` will return same output `2020-09-00`. 3. add support of zero-date to function `str_to_date()`,`date_format()` 4. fix FE can calculate negative value bug, eg: `select str_to_date('-2020', '%Y')` will return `NULL` instead of date value. current behavior is same as MySQL without sql_mode `NO_ZERO_IN_DATE` and `NO_ZERO_DATE`. current behavior ``` mysql> select siteid,str_to_date(siteid,'%Y%m%d') from table2 order by siteid; +------------+---------------------------------+ \| siteid \| str_to_date(`siteid`, '%Y%m%d') \| +------------+---------------------------------+ \| 1 \| 2001-00-00 \| \| 2 \| 2002-00-00 \| \| 2 \| 2002-00-00 \| \| 3 \| 2003-00-00 \| \| 4 \| 2004-00-00 \| \| 5 \| 2005-00-00 \| \| 20 \| 2020-00-00 \| \| 202 \| 0202-00-00 \| \| 2020 \| 2020-00-00 \| \| 20209 \| 2020-09-00 \| \| 202008 \| 2020-08-00 \| \| 202009 \| 2020-09-00 \| \| 2020009 \| 2020-00-09 \| \| 20200009 \| 2020-00-09 \| \| 20201309 \| NULL \| \| 2020090909 \| 2020-09-09 \| +------------+---------------------------------+ mysql> select str_to_date('2','%Y%m%d'),str_to_date('20','%Y%m%d'),str_to_date('202','%Y%m%d'),str_to_date('2020','%Y%m%d'),str_to_date('20209','%Y%m%d'),str_to_date('202009','%Y%m%d'),str_to_date('2020099','%Y%m%d'),str_to_date('20200909','%Y%m%d'),str_to_date('2020090909','%Y%m%d'),str_to_date('2020009','%Y%m%d'),str_to_date('20200009','%Y%m%d'),str_to_date('20201309','%Y%m%d'); 
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+ \| str_to_date('2', '%Y%m%d') \| str_to_date('20', '%Y%m%d') \| str_to_date('202', '%Y%m%d') \| str_to_date('2020', '%Y%m%d') \| str_to_date('20209', '%Y%m%d') \| str_to_date('202009', '%Y%m%d') \| str_to_date('2020099', '%Y%m%d') \| str_to_date('20200909', '%Y%m%d') \| str_to_date('2020090909', '%Y%m%d') \| str_to_date('2020009', '%Y%m%d') \| str_to_date('20200009', '%Y%m%d') \| str_to_date('20201309', '%Y%m%d') \| +----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+ \| 2002-00-00 \| 2020-00-00 \| 0202-00-00 \| 2020-00-00 \| 2020-09-00 \| 2020-09-00 \| 2020-09-09 \| 2020-09-09 \| 2020-09-09 \| 2020-00-09 \| 2020-00-09 \| NULL \| +----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+ ```	2020-09-17 10:10:19 +08:00
Yingchun Lai	b780df697a	[refactor] Optimize threads usage mode in BE (#4440 ) BE cannot exit gracefully because some threads are running in endless loops. This patch makes the following optimizations: - Use the well encapsulated Thread and ThreadPool instead of std::thread and std::vector<std::thread> - Use CountDownLatch in thread's loop condition to avoid endless loop - Introduce a new class Daemon for daemon works, like tcmalloc_gc, memory_maintenance and calculate_metrics - Decouple statistics type TaskWorkerPool and StorageEngine notification by submitting tasks to TaskWorkerPool's queue - Reorder objects' stop and destruction in main(), i.e. stop network services first, then internal services - Use libevent in pthreads mode, by calling evthread_use_pthreads(), so that EvHttpServer can exit gracefully with multiple threads - Call brpc::Server's Stop() and ClearServices() explicitly	2020-09-06 20:19:14 +08:00
Mingyu Chen	5166a6c6bc	[Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495 ) Main CL: 1. Copy the code from BE to implement the `str_to_date()` function in FE. 2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.	2020-09-03 17:16:19 +08:00
ZhangYu0123	97d963468a	[Code Cleanup] Template nest convert to c++11 syntax and style (#4442 )	2020-08-26 10:51:52 +08:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115 ) Redesign metrics to 3 layers: MetricRegistry - MetricEntity - Metrics MetricRegistry : the register center MetricEntity : the entity registered on MetricRegistry. Generally, several MetricEntities can be registered on a MetricRegistry; each MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift clients and servers, and so on. Metric : metrics of an entity, such as fragment_requests_total on the server entity, disk_bytes_read on a disk_device entity, thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. MetricPrototype is a global variable and can be shared by the same metrics across different MetricEntities.	2020-08-08 11:23:01 +08:00
HuangWei	bfb8c654c1	[Bug] Fix UT bug after making MemTracker shared (#4243 ) after making MemTracker shared(#4135), some code haven't been fixed, and add some useless ut back to build. Fixed in this pr.	2020-08-04 17:52:11 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumption on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker; it's easy to ensure that the RowBatch & MemPool destructors execute before MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calculate consumption, so RuntimeProfile's counter needs to be shared too. We add a shared counter pool to store the shared counter and don't change other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers track little memory and cost too much time, you could make them orphans; then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
HuangWei	9b0ad66b78	[runtime] Replace the thread pool in FragmentMgr (#4057 )	2020-07-15 10:03:48 +08:00
Mingyu Chen	1bfb105ec1	[Bug] Fix bug that routine load task throw exception when calling afterVisible() (#3979 )	2020-07-01 09:22:33 +08:00
Mingyu Chen	51367abce7	[Bug] Fix bug that BE crash when doing Insert Operation (#3872 ) Mainly change: 1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`. 2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of exception. 3. Protect the `_status` in RuntimeState with lock 4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be deconstructed after the object pool. 5. Remove the unused `ObjectPool` param in RuntimeProfile constructor. If I don't remove it, RuntimeProfile will depends on the `_obj_pool` in RuntimeProfile.	2020-06-19 17:09:04 +08:00
Mingyu Chen	3ffc447b38	[OUTFILE] Support `INTO OUTFILE` to export query result (#3584 ) This CL mainly changes: 1. Support `SELECT INTO OUTFILE` command. 2. Support export query result to a file via Broker. 3. Support CSV export format with specified column separator and line delimiter.	2020-05-25 21:24:56 +08:00
yangzhg	6788cacb94	Fix unit test failed (#3642 ) Fix some unit tests that failed due to glog. This may be because we changed the ut build dir and the log path does not exist in the new build dir, so we change the log from file to stdout	2020-05-25 18:55:19 +08:00
Binglin Chang	63fecc7954	Remove unused ColumnType (#3532 )	2020-05-11 18:57:47 +08:00
Yingchun Lai	b576e54fe6	[ASAN] Fix some address problems detected by ASAN (#3495 ) LSAN detected errors have been fixed by a prior patch (#3326), but there are still some ASAN detected errors. This patch tries to fix these errors to make Doris BE more robust. Then we can add a CI run in LSAN/ASAN mode to detect memory errors as early as possible.	2020-05-11 10:30:45 +08:00
Yingchun Lai	e2c3c84e8d	[ut] disable background scan context gc to speed up unit test (#3524 ) Each test case in ExternalScanContextMgrTest may cost 1 minute, which is too long; we'd better disable background scan context gc to speed up unit test.	2020-05-09 09:01:05 +08:00
Yingchun Lai	b58b1b3953	[metrics] Make DorisMetrics to be a real singleton (#3417 )	2020-05-04 09:20:53 +08:00
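
Commit d1007afe80 above mentions using fmt and std::from_chars for integer/string conversion. The following is only a minimal sketch of that idea, not the Doris code: the helper names `parse_int` and `int_to_string` are hypothetical, and the standard `std::to_chars` is swapped in for fmt so the example needs only the C++17 standard library.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper (not from Doris): parse the whole input as a base-10 int.
// Returns std::nullopt on empty input, trailing garbage, or overflow.
std::optional<int> parse_int(std::string_view s) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper (not from Doris): format an integer without locale
// handling or stream overhead. std::to_chars stands in for fmt here.
std::string int_to_string(long long v) {
    char buf[24];  // enough for the sign plus 19 digits of a 64-bit value
    auto res = std::to_chars(buf, buf + sizeof(buf), v);
    return std::string(buf, res.ptr);
}
```

Both functions avoid locale lookups and heap-allocating streams, which is the usual reason to prefer them over `std::stoi` or `std::stringstream` on a hot path.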


108 Commits
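
Item 4 of commit 51367abce7 relies on a C++ rule: non-static data members are destroyed in reverse order of declaration, so a member declared earlier outlives the ones declared after it. The sketch below illustrates only that rule; `Tracked`, `State`, and `destroy_order` are hypothetical names and do not reproduce the real `RuntimeState` class.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which Tracked objects are destroyed.
std::vector<std::string>& destruction_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a member object; logs its name on destruction.
struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {}
    ~Tracked() { destruction_log().push_back(name); }
};

// Hypothetical stand-in for a state object holding two members.
struct State {
    Tracked profile{"profile"};    // declared first, so destroyed last
    Tracked obj_pool{"obj_pool"};  // declared last, so destroyed first
};

// Builds and destroys a State, then returns the observed destruction order.
std::vector<std::string> destroy_order() {
    destruction_log().clear();
    { State s; }
    return destruction_log();
}
```

Calling `destroy_order()` yields `obj_pool` before `profile`, which is why moving a member's declaration ahead of another is enough to make it outlive that other member.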