doris

Author	SHA1	Message	Date
Skysheepwang	6c098e45fc	[Optimize][Cache]Implementation of Separated Page Cache (#5008 ) #4995 Implementation of Separated Page Cache - Add config "index_page_cache_ratio" to set the ratio of capacity of index page cache - Change the member of StoragePageCache to maintain two types of cache - Change the interface of StoragePageCache for selecting the type of cache - Change the usage of page cache in read_and_decompress_page in page_io.cpp - add page type as argument - check if current page type is available in StoragePageCache (cover the situation of ratio == 0 or 1) - Add type as argument in the callers of read_and_decompress_page - Change Unit Test	2021-01-04 12:19:24 +08:00
Skysheepwang	0d3564c2e1	[Feature] Implementation of histogram metric (#5148 ) #5146 Add histogram metrics into util/metrics.h. The data structure of histogram is implemented in util/histogram.h, which could also be used in other situations that need a histogram. Unit tests added as well.	2021-01-04 09:32:46 +08:00
HappenLee	5807413ad0	[UT] Add UT for column predicate of ColumnBlock (#5123 ) Add UT for column predicate of ColumnBlock	2021-01-04 09:29:30 +08:00
HappenLee	f2cf8d2c5e	[Bug-Fix] Fix the bug of `PERCENTILE_APPROX` return error result `nan` and add `PERCENTILE_APPROX` UT (#5172 )	2021-01-03 15:45:22 +08:00
HappenLee	9e19b6b133	[Performance Improve] Push Down _conjunct of 'A is NULL' and 'B is not NULL' to Storage Engine. (#5092 ) This patch mainly does the following: - Support #5086 - Refactor ColumnRangeValue to support containing null	2021-01-03 15:45:07 +08:00
HuangWei	5e1a80bb22	[UT][Bug] fix LOOP_LESS_OR_MORE (#5157 ) This bug was introduced by #5131. When AllowSlowTests() is true, we should loop more.	2020-12-29 09:48:19 +08:00
Yingchun Lai	11c0aafa5c	[UT] Speed up BE unit test (#5131 ) There are some long loops and sleeps in unit tests, so it costs a very long time to run all unit tests, especially in TSAN mode. This patch speeds up unit tests by shortening long loops and sleeps; in my environment all unit tests finished in 1 minute. It's useful for basic functional unit tests. You can switch the run mode by setting the new environment variable 'DORIS_ALLOW_SLOW_TESTS'. For example, you can set: export DORIS_ALLOW_SLOW_TESTS=1 and also you can disable it by setting: export DORIS_ALLOW_SLOW_TESTS=0	2020-12-27 22:19:56 +08:00
HuangWei	85076b5678	[UT] fix test_env & add a sample (#5085 ) Easily create tests.	2020-12-27 22:14:30 +08:00
xinghuayu007	9ddf434f6b	[Bug-Fix] Fix partition cache match bug (#5060 ) When partitions are not cached contiguously, a range query may fail. For example, partition keys 20201011 and 20201013 are cached, but when the range query is between 20201011 and 20201013, the query will not hit the cache. issue:#5059	2020-12-19 11:17:44 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a udaf for approximate topn using the Space-Saving algorithm. At present, we can only calculate the frequent items and their frequencies in a certain column, based on which we can implement similar topN functions supported by Kylin in the future. I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million lines and follows the Zipfian distribution, where Element cardinality represents the data cardinality, and 20X, 50X, ... represent a space_expand_rate of 20, 50, ..., which is used to set the counter number in the space-saving algorithm ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6, 1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
HuangWei	49f26f4413	[UT] cleanup storage engine creation in tablet_mgr_test etc (#5077 ) The tests mistakenly use the string '_engine_data_path' as the path; actually the storage engine is not opened, so option/path is needless. Clean it up to avoid any doubt about the file path management.	2020-12-15 09:30:32 +08:00
Yingchun Lai	49f7eb69bf	[Refactor] Refactor DeleteHandler and Cond module (2nd) (#5030 ) * [Refactor] Refactor DeleteHandler and Cond module (#4925) This patch mainly does the following refactors: - Use int64_t instead of int32_t for 'version' in DeleteHandler - Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments - Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid - Use range loop to simplify code - Reduce some compare operations in Cond::del_eval - Improve some branch predictions in Reader - Fix and improve some unit tests	2020-12-08 10:01:18 +08:00
Zhengguo Yang	b9dabc3b5b	[Enhance] Push down predicate on value column of unique table to base rowset (#5022 )	2020-12-06 08:50:37 +08:00
HappenLee	b954dfd82d	[Bug] Fix the bug that json load of Largeint and Decimal failed. (#4983 ) Use the json load param "num_as_string" to enable the flag kParseNumbersAsStringsFlag when parsing json data.	2020-12-06 08:49:30 +08:00
Mingyu Chen	c440aa07d1	Revert "[Refactor] Refactor DeleteHandler and Cond module (#4925 )" (#5028 ) This reverts commit 9c9992e0aa28ee85364eebf86a6675f1073e08fb. Co-authored-by: morningman <chenmingyu@baidu.com>	2020-12-05 21:39:49 +08:00
Yingchun Lai	9c9992e0aa	[Refactor] Refactor DeleteHandler and Cond module (#4925 ) This patch mainly does the following refactors: - Use int64_t instead of int32_t for 'version' in DeleteHandler - Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments - Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid - Use range loop to simplify code - Reduce some compare operations in Cond::del_eval - Improve some branch predictions in Reader - Fix and improve some unit tests	2020-12-04 12:13:30 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
sduzh	10e1e29711	Remove header file common/names.h (#4945 )	2020-11-26 17:00:48 +08:00
HappenLee	2682712349	[Bug] Fix BE UT compile failure and core dump in delta_writer_test when ulimit < 60000. (#4941 )	2020-11-24 22:21:19 +08:00
Mingyu Chen	f1b57c4418	[Optimize] Avoid repeated sending of common components in Fragments (#4904 ) This CL mainly changes: 1. Avoid repeated sending of common components in Fragments In the previous implementation, a query may generate multiple Fragments, these Fragments contain some common information, such as DescriptorTable. Fragment will be sent to BE in a certain order, so these public information will be sent repeatedly and generated repeatedly on the BE side. In some complex SQL, these public information may be very large, thereby increasing the execution time of Fragment. So I improved this. For multiple Fragments sent to the same BE, only the first Fragment will carry these public information, and it will be cached on the BE side, and subsequent Fragments no longer need to carry this information. In the local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second. 2. Add the time-consuming part of FE logic in Profile Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.	2020-11-22 20:38:05 +08:00
Lijia Liu	d1a7f1d2c6	Fix column_reader_writer_test UT (#4924 )	2020-11-20 09:47:01 +08:00
Lijia Liu	b48c768dc7	[ComplexType] Restructure storage type to support complex types extension (#4905 ) This CL includes: * Change the column metadata to a tree structure. * Refactor the segment_v2.ColumnReader and segment_v2.ColumnWriter to support complex type. * Implements the reading and writing of array type.	2020-11-16 21:59:41 +08:00
Yingchun Lai	f40868a480	[Optimize] Improve LRU cache's performance (#4781 ) When LRUCache inserts and evicts a large number of entries, there are frequent calls of HandleTable::remove(e->key, e->hash), which looks up the entry in the hash table. Now that we know the entry to remove 'e', we can remove it directly from the hash table's collision list if it's a doubly linked list. This patch refactors the collision list into a doubly linked list; the simple benchmark CacheTest.SimpleBenchmark shows that the time cost is reduced by about 18% in my test environment.	2020-11-06 10:56:27 +08:00
Mingyu Chen	f239f44b37	[Compaction][Bug-Fix] Fix bug that meta lock need to be held when calculating compaction score (#4829 ) * [Compaction][Bug] Fix bug that meta lock need to be held when calculating compaction score * fix Co-authored-by: morningman <chenmingyu@baidu.com>	2020-11-05 20:29:01 +08:00
Yingchun Lai	d1c2b3ed0d	[Optimize] Add an unordered_map for TabletSchema to speed up column name lookup (#4779 ) Reduce column name lookup for TabletSchema and Tablet from O(N) to O(1).	2020-11-03 19:53:44 +08:00
Mingyu Chen	bfdb15c730	[Bug] Fix some date functions to make their result same as MySQL (#4786 ) dayofweek, dayofmonth, dayofyear, weekofyear, timediff Also fix ut compilation problem	2020-10-27 12:52:44 +08:00
Yingchun Lai	6cbefd5621	[LRUCache] Expose LRU Cache status to metrics (#4688 ) Expose LRU Cache status to metrics would be helpful to diagnose problems like high usage, low hit rate.	2020-10-22 21:37:02 +08:00
Mingyu Chen	588e5bee47	[Bug] Fix bug of cumulative compaction and deletion of stale version (#4593 ) When selecting candidate rowsets to do the cumulative compaction, some rowsets may not be selected because the protection time has not expired. Therefore, we need to find the current longest continuous version path in the candidate rowsets.	2020-10-21 10:03:55 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Yingchun Lai	45fa67aa71	[Refactor] Remove objects which are only used for unit test (#4751 ) We create some objects which are only used for unit tests; this is not necessary, and it may create duplicate instances for some classes. This patch removes unnecessary instances of class BlockManager and StoragePageCache.	2020-10-18 21:37:12 +08:00
Yingchun Lai	3438a746ac	[Typo] Fix typo in metrics macros (#4739 ) Just fix typo. Rename DEFINE_GAUGE_METRIC_PROTOTYPE_5ARG(name, unit) to DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) Rename DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) which defines core metrics to DEFINE_GAUGE_CORE_METRIC_PROTOTYPE_2ARG(name, unit)	2020-10-15 19:56:43 +08:00
HappenLee	c00a5cb543	[Bug] Fix the core problem of function `split_part` and add the UT of core case (#4721 ) issue:#4720	2020-10-13 10:09:39 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Zhengguo Yang	98e71a8b9f	[Bug][Function] Fix rand() function return same value (#4709 ) fix rand function return same value when no parameter	2020-10-11 15:40:38 +08:00
ccoffline	f3cdf167d1	[Feature] Add time_round builtin functions (#4640 ) #4619 Add time_round functions that provide `time_floor` & `time_ceil` at each time unit. Fix two related bugs. - #4618 - Fix `struct TimeInterval` to use `int64_t` instead of `int32_t`, in case the second diff overflows	2020-10-09 16:05:51 +08:00
Yingchun Lai	b1853caeed	[UDF] Improve performance of function money_format (#4672 ) Use a static local variable instead of creating it on every call. Time cost of the newly added unit benchmark test could be reduced from about 60 seconds to 10 seconds.	2020-09-28 13:39:41 +08:00
HaiBo Li	5199a17a4b	[cache][be]Fix the bug of cross-border access cache (#4639 ) * When different partitions of the table are updated frequently, the partition key list of the cache is discontinuous, and the partition key in the request cannot hit the key list in the cache, resulting in an out-of-bounds access, and the BE will crash. * Add some unit test cases, including cases that fail to hit the boundary value of the cache	2020-09-28 13:35:52 +08:00
Yingchun Lai	2a637f848d	[Refactor] Remove meaningless return value of RowBlock::init (#4627 ) Simplify some code, mainly remove meaningless return value of RowBlock::init.	2020-09-20 20:57:00 +08:00
HaiBo Li	5f43fb3bde	[Cache][BE] LRU cache for sql/partition cache #2581 (#4005 ) 1. Find the cache node by SQL Key, then find the corresponding partition data by Partition Key, and then decide whether to hit Cache by LastVersion and LastVersionTime 2. Refers to the classic cache algorithm LRU (least recently used), implemented with a three-layer data structure 3. The Cache elimination algorithm preserves the range of the partition as much as possible, to avoid partition discontinuity, which would reduce the hit rate of the Cache partition 4. Use the two thresholds of maximum memory and elastic memory to avoid frequent elimination of data	2020-09-20 20:50:51 +08:00
qiye	065b979f35	[Bug] behavior of function str_to_date() and date_format() on BE and FE is inconsistent (#4612 ) 1. add date range check in `DateLiteral` for `FEFunctions` 2. `select str_to_date(202009,'%Y%m')` and `select str_to_date(str,'%Y%m') from tb where tb.str = '202009'` will return same output `2020-09-00`. 3. add support of zero-date to function `str_to_date()`,`date_format()` 4. fix FE can calculate negative value bug, eg: `select str_to_date('-2020', '%Y')` will return `NULL` instead of date value. current behavior is same as MySQL without sql_mode `NO_ZERO_IN_DATE` and `NO_ZERO_DATE`. current behavior ``` mysql> select siteid,str_to_date(siteid,'%Y%m%d') from table2 order by siteid; +------------+---------------------------------+ \| siteid \| str_to_date(`siteid`, '%Y%m%d') \| +------------+---------------------------------+ \| 1 \| 2001-00-00 \| \| 2 \| 2002-00-00 \| \| 2 \| 2002-00-00 \| \| 3 \| 2003-00-00 \| \| 4 \| 2004-00-00 \| \| 5 \| 2005-00-00 \| \| 20 \| 2020-00-00 \| \| 202 \| 0202-00-00 \| \| 2020 \| 2020-00-00 \| \| 20209 \| 2020-09-00 \| \| 202008 \| 2020-08-00 \| \| 202009 \| 2020-09-00 \| \| 2020009 \| 2020-00-09 \| \| 20200009 \| 2020-00-09 \| \| 20201309 \| NULL \| \| 2020090909 \| 2020-09-09 \| +------------+---------------------------------+ mysql> select str_to_date('2','%Y%m%d'),str_to_date('20','%Y%m%d'),str_to_date('202','%Y%m%d'),str_to_date('2020','%Y%m%d'),str_to_date('20209','%Y%m%d'),str_to_date('202009','%Y%m%d'),str_to_date('2020099','%Y%m%d'),str_to_date('20200909','%Y%m%d'),str_to_date('2020090909','%Y%m%d'),str_to_date('2020009','%Y%m%d'),str_to_date('20200009','%Y%m%d'),str_to_date('20201309','%Y%m%d'); 
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+ \| str_to_date('2', '%Y%m%d') \| str_to_date('20', '%Y%m%d') \| str_to_date('202', '%Y%m%d') \| str_to_date('2020', '%Y%m%d') \| str_to_date('20209', '%Y%m%d') \| str_to_date('202009', '%Y%m%d') \| str_to_date('2020099', '%Y%m%d') \| str_to_date('20200909', '%Y%m%d') \| str_to_date('2020090909', '%Y%m%d') \| str_to_date('2020009', '%Y%m%d') \| str_to_date('20200009', '%Y%m%d') \| str_to_date('20201309', '%Y%m%d') \| +----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+ \| 2002-00-00 \| 2020-00-00 \| 0202-00-00 \| 2020-00-00 \| 2020-09-00 \| 2020-09-00 \| 2020-09-09 \| 2020-09-09 \| 2020-09-09 \| 2020-00-09 \| 2020-00-09 \| NULL \| +----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+ ```	2020-09-17 10:10:19 +08:00
Yingchun Lai	64ebea2e43	[Feature] Support gzip compression for http response (#4533 ) After tablet level metrics is supported, the http metrics API may respond with a very large body when a BE holds a large number of tablets, and cause heavy network traffic. This patch introduces http content compression to reduce network traffic.	2020-09-06 20:30:12 +08:00
Yingchun Lai	b780df697a	[refactor] Optimize threads usage mode in BE (#4440 ) BE cannot exit gracefully because some threads are running in endless loops. This patch does the following optimization: - Use the well encapsulated Thread and ThreadPool instead of std::thread and std::vector<std::thread> - Use CountDownLatch in thread's loop condition to avoid endless loop - Introduce a new class Daemon for daemon works, like tcmalloc_gc, memory_maintenance and calculate_metrics - Decouple statistics type TaskWorkerPool and StorageEngine notification by submitting tasks to TaskWorkerPool's queue - Reorder objects' stop and destruction in main(), i.e. stop network services at first, then internal services - Use libevent in pthreads mode, by calling evthread_use_pthreads(), then EvHttpServer can exit gracefully in multi-threads - Call brpc::Server's Stop() and ClearServices() explicitly	2020-09-06 20:19:14 +08:00
Youngwb	068707484d	Support sequence column for UNIQUE_KEYS Table (#4256 ) * add sequence col Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>	2020-09-04 10:10:17 +08:00
Mingyu Chen	5166a6c6bc	[Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495 ) Main CL: 1. Copy the code from BE to implement the `str_to_date()` function in FE. 2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.	2020-09-03 17:16:19 +08:00
xinghuayu007	1a30bcbf36	[SQL Function][Bug] Fix parse_url() bug (#4429 ) The parameter 'part' of the parse_url function does not support lower case, and the protocol is not parsed correctly. This function also does not support parsing 'port'. This PR tries to make the parse_url function case insensitive and support parsing 'port'. The issue: #4451	2020-09-03 17:06:09 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428 ) Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablets or hot written tablets, but we have no insight about tablets in the cluster. This patch introduces tablet level metrics to help achieve this goal, and now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`. However, one BE may hold hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and tablet level metrics are not returned by default.	2020-09-02 10:39:41 +08:00
ZhangYu0123	123237afb7	[Compaction] Persist stale rowsets meta (#4454 ) Persist stale rowsets meta. When BE reboots, stale rowsets meta can be restored and the stale version remains readable before stale gc time. ISSUE: #4453	2020-08-30 21:05:48 +08:00
HangyuanLiu	ad738fa198	Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388 ) In the process of historical data transformation of materialized views, the transformation may fail due to data quality. Add an error status code: OLAP_ERR_DATE_QUALITY_ERR to determine if a data problem is causing the failure #3344	2020-08-27 17:52:53 +08:00
ZhangYu0123	97d963468a	[Code Cleanup] Template nest convert to c++11 syntax and style (#4442 )	2020-08-26 10:51:52 +08:00
Mingyu Chen	67b842ce04	[License] Organize and modify the license of the code (#4371 ) 1. Disable the MySQL client and LZO library by default when building Doris. MySQL client library is used for MySQL external table feature. This feature will be replaced by the new ODBC external table soon. LZO library is used to compress/decompress data of some old data format of Doris, which is no longer used. 2. Add missing license to some files. 3. All non-Apache-License code is explained in the NOTICE file and the corresponding license is declared. 4. Remove the js source code from webroot, it will be downloaded as thirdparty	2020-08-24 21:51:55 +08:00
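
Note on the Topn udaf commit (#4803 ) listed above: its message describes approximate top-N with the Space-Saving algorithm, where space_expand_rate sets the number of counters. The following is a minimal Python sketch of that general technique, not the Doris implementation. The function names `space_saving` and `top_n`, and the rule "counters = n * space_expand_rate", are assumptions made for illustration only.

```python
def space_saving(stream, num_counters):
    """Approximate item frequencies with at most num_counters counters.

    Generic Space-Saving sketch (illustrative, not Doris code).
    """
    counts = {}
    for item in stream:
        if item in counts:
            # Item is already monitored: just bump its counter.
            counts[item] += 1
        elif len(counts) < num_counters:
            # A free counter is available: start monitoring the item.
            counts[item] = 1
        else:
            # No free counter: evict the item with the smallest count and
            # let the newcomer inherit that count plus one. This may
            # overestimate a frequency but never underestimates it.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


def top_n(stream, n, space_expand_rate=20):
    """Return the n most frequent items as (item, estimated_count) pairs.

    Assumption: the counter budget is n * space_expand_rate.
    """
    counts = space_saving(stream, n * space_expand_rate)
    # Sort by descending count, breaking ties by item for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

With a counter budget at least as large as the number of distinct items, the counts are exact; with a smaller budget, evictions trade accuracy for bounded memory, which is consistent with the accuracy dropping below 100% only at the highest cardinality in the table quoted in that commit.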

399 Commits