doris

Author	SHA1	Message	Date
Youngwb	068707484d	Support sequence column for UNIQUE_KEYS Table (#4256 ) * add sequence col Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>	2020-09-04 10:10:17 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428) Sometimes we want to detect the hotspots of a cluster, for example hot scanned tablets or hot written tablets, but we have no insight into the tablets in the cluster. This patch introduces tablet level metrics to help achieve this goal. It currently supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`. However, one BE may hold hundreds of thousands of tablets, so I add a parameter to the metrics HTTP request and do not return tablet level metrics by default.	2020-09-02 10:39:41 +08:00
ZhangYu0123	123237afb7	[Compaction] Persist stale rowsets meta (#4454) Persist stale rowsets meta. When BE reboots, the stale rowsets meta can be restored, and the stale versions remain readable until the stale gc time. ISSUE: #4453	2020-08-30 21:05:48 +08:00
HangyuanLiu	ad738fa198	Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388) During the historical data transformation of materialized views, the transformation may fail due to data quality. Add an error status code, OLAP_ERR_DATE_QUALITY_ERR, to determine whether a data problem is causing the failure. #3344	2020-08-27 17:52:53 +08:00
ZhangYu0123	97d963468a	[Code Cleanup] Template nest convert to c++11 syntax and style (#4442 )	2020-08-26 10:51:52 +08:00
Mingyu Chen	67b842ce04	[License] Organize and modify the license of the code (#4371) 1. Disable the MySQL client and LZO library by default when building Doris. The MySQL client library is used for the MySQL external table feature. This feature will be replaced by the new ODBC external table soon. The LZO library is used to compress/decompress data of some old data formats of Doris, which are no longer used. 2. Add missing licenses to some files. 3. All non-Apache-License code is explained in the NOTICE file and the corresponding license is declared. 4. Remove the js source code from webroot; it will be downloaded as thirdparty.	2020-08-24 21:51:55 +08:00
Zhengguo Yang	d61c10b761	[Delete] Support batch delete [part 1] (#4310 ) * Implements the grammar of the batch delete #4051 * Process create, alter table when table has delete sign column * Support the syntax for enabling the delete column * Automatically filtered deleted data in the select statement. * Automatically add delete sign when create rollup table TODO: * Optimize the reading and compaction logic on the be side, so that the data marked as deleted will be completely deleted during base compaction	2020-08-21 22:57:16 +08:00
ZhangYu0123	a7422ee142	[UT][Bug-Fix] Resolve UT memory leak problem (#4406 ) Fix ut memory leak on Fix #4164	2020-08-21 10:41:54 +08:00
ZhangYu0123	dc3ed1c525	[Compaction] Compaction rules optimization (#4212) Compaction rules optimization; see #4164 for the detailed problem description and design. This PR commits 2 functions: (1) make the cumulative policy configurable, and implement the original policy. (2) implement the universal policy, the optimized version in #4164.	2020-08-19 09:34:13 +08:00
ZhangYu0123	11ec7bbe24	[Bug] Add LargeInt cast to Date and Datetime, add timezone to stale_version_path_json_doc (#4321) (1) Add LargeInt cast to date and datetime, see #3864. LargeInt can now be cast to date and datetime. This fixes the error: Unable to find _ZN5doris13CastFunctions16cast_to_date_valEPN9doris_udf15FunctionContextERKNS1_11LargeIntValE (2) Add local timezone info to the stale_version_path_json_doc REST API. Add timezone to the "last create time" field: { "path id": "1", "last create time": "1970-01-01 10:46:40 +0800", "path list": "1 -> [2-3] -> [4-5]" }, and add timezone to the test unix, see #4121.	2020-08-13 23:38:30 +08:00
Zhengguo Yang	10e3fc2778	[BUG] Fix abs function cannot handle bigint or bigger data type (#4326 )	2020-08-12 20:58:35 +08:00
Mingyu Chen	912547260a	[UnitTest] Refactor BE unit test script (#4266 ) 1. Rename run-ut.sh to run-be-ut.sh 2. Find all test files from build dir instead of declaring separately in the script 3. Add gtest output to collect the result of unit test.	2020-08-11 10:23:51 +08:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115) Redesign metrics to 3 layers: MetricRegistry - MetricEntity - Metrics. MetricRegistry: the register center. MetricEntity: the entity registered on MetricRegistry. Generally, several MetricEntities can be registered on one MetricRegistry; each MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift clients and servers, and so on. Metric: a metric of an entity, such as fragment_requests_total on the server entity, disk_bytes_read on a disk_device entity, thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. MetricPrototype is a global variable and can be shared by the same metrics across different MetricEntities.	2020-08-08 11:23:01 +08:00
caiconghui	eefad13107	[Feature] Support InPredicate in delete statement (#4006 ) This PR is to add inPredicate support to delete statement, and add max_allowed_in_element_num_of_delete variable to limit element num of InPredicate in delete statement.	2020-08-06 23:19:40 +08:00
ZhangYu0123	16c89c7d56	[BUG] Fix remove expired stale rowset path order error (#4214) Fix the order error when deleting stale rowset paths. This bug leads to inconsistent stale rowset versions. #4213	2020-08-01 17:44:39 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135) We make all MemTrackers shared, in order to show real-time MemTracker consumption on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker; it's easy to ensure that the RowBatch & MemPool destructors execute before the MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calculate consumption, so RuntimeProfile's counter needs to be shared too. We add a shared counter pool to store the shared counters, and don't change other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers consume little memory but cost too much time, you can make them orphans; then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
HuangWei	46c8c250a6	[Bug] fix use-after-poison bug in ut schema_change_test (#4118 ) Using slice->data to create HyperLogLog, it will exec HyperLogLog(Slice(const char)). Then Slice(const char) will use strlen(data) to calc the size. But the slice in this unit test isn't a C-string. Need to use Slice.	2020-07-22 09:33:41 +08:00
ZhangYu0123	03cf9b2a24	[Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039) Related issue #4017, main changes as follows: 1. Add expired_snapshot_rs_version_map, _expired_snapshot_rs_metas, 2. Add VersionedRowsetTracker to record compacted path versions 3. Record the path version when rowsets are compacted 4. In the gc process, add expired snapshot rowsets to the unused set for removal.	2020-07-19 22:03:59 +08:00
HangyuanLiu	5032b7fe7a	Support materialized view schema change in bitmap hll and count field [#3739] (#3873) + Build the materialized view function for schema_change here based on defineExpr. + This is a trick because the current storage layer does not support expression evaluation. + count distinct materialized view will set mv_expr with to_bitmap or hll_hash. + count materialized view will set mv_expr with count. + Support regenerating historical data when a new materialized view is created in BE. + Support to_bitmap function + Support hll_hash function + Support count(field) function For #3344	2020-07-16 10:45:15 +08:00
lichaoyong	93a0b47d22	Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762 )" (#3931 ) This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.	2020-06-24 10:13:45 +08:00
xy720	f189a2e7b8	[Spark load][Be 1/1] Be handle push task (#3742) 1. Add a PushBrokerReader in push_handle.cpp. 2. PushBrokerReader wraps the ParquetScanner to support reading data from parquet format files through the broker.	2020-06-22 19:57:58 +08:00
wangbo	8cd36f1c5d	[Spark Load] Support java version hyperloglog (#3320 ) mainly used for Spark Load process to calculate approximate deduplication value and then serialize to parquet file. Try to keep the same calculation semantic with be's C++ version	2020-06-21 09:37:05 +08:00
Binglin Chang	ca96ea3056	[Memory Engine] MemTablet creation and compatibility handling in BE (#3762 )	2020-06-18 09:56:07 +08:00
Mingyu Chen	0224d49842	[Fix][Bug] Fix compile bug (#3888 ) Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2020-06-16 18:42:04 +08:00
HuangWei	8caedadb67	use scoped_refptr to new HashIndex (#3818 )	2020-06-10 23:47:10 +08:00
Yingchun Lai	e4dc2ec440	[StorageEngine] Make StorageEngine::open return more detailed info (#3761) StorageEngine::open just returns a very vague status when it fails, so we have to check logs to find the root cause, and it's not convenient to check logs if we run unit tests in CI dockers. It would be better to return more detailed failure info that points out the root cause; for example, it may return an error status with the message "file descriptors limit is too small".	2020-06-07 10:21:33 +08:00
Yingchun Lai	3b6a781862	[Bug] Fix a bug that tablet's _preferred_rowset_type may be modified to BETA_ROWSET after cloned (#3750) TabletMeta's _preferred_rowset_type is not initialized when the object is constructed and may hold a random value. This field is not updated when creating an ALPHA_ROWSET tablet, and it is not serialized into pb in this case. So when cloning an ALPHA_ROWSET tablet from another BE, the newly created local tablet's _preferred_rowset_type field may randomly be BETA_ROWSET and cannot be overwritten after the clone; new input rows will then be written in BETA_ROWSET format, which is not what we expect. This patch fixes the bug by giving _preferred_rowset_type a default value and updating this field when creating any type of tablet, and adds a unit test and the related equality operator overloads.	2020-06-06 11:36:28 +08:00
Binglin Chang	70aa9d6ca8	[Memory Engine] Add MemTabletScan (#3734 )	2020-06-03 15:42:38 +08:00
Binglin Chang	7524c5ef63	[Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch (#3637 )	2020-05-30 10:33:10 +08:00
Binglin Chang	c967eaf496	[Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668 )	2020-05-29 20:20:44 +08:00
yangzhg	6788cacb94	Fix unit test failed (#3642) Fix some unit tests that failed due to glog. This may be because we changed the ut build dir and the log path does not exist in the new build dir, so we change the log output from file to stdout.	2020-05-25 18:55:19 +08:00
Binglin Chang	12ebd5d82b	Remove some outdated tests (#3672)	2020-05-25 09:23:56 +08:00
Binglin Chang	c54cb4b14e	[Memory Engine] Add column reader/writer (#3580 )	2020-05-20 11:09:30 +08:00
yangzhg	123e1394b1	[Delete] Allow delete duplicated non-key column using delete from (#3424 )	2020-05-15 09:26:36 +08:00
Binglin Chang	a7cfafe076	[Memory Engine] add core column related classes (#3508 ) add core column related classes	2020-05-13 16:30:32 +08:00
Yingchun Lai	b576e54fe6	[ASAN] Fix some address problems detected by ASAN (#3495) Errors detected by LSAN have been fixed by a prior patch (#3326), but there are still some errors detected by ASAN. This patch tries to fix these errors to make Doris BE more robust. Then we can add a CI run in LSAN/ASAN mode to detect memory errors as early as possible.	2020-05-11 10:30:45 +08:00
Binglin Chang	7399997433	[Memory Engine] Add hash index implementation (#3462 )	2020-05-06 23:37:25 +08:00
Yingchun Lai	b58b1b3953	[metrics] Make DorisMetrics to be a real singleton (#3417 )	2020-05-04 09:20:53 +08:00
Yingchun Lai	72f3082358	[Metrics] Add some metrics for container size in BE (#3246) We can observe the workload of BE, and it is also a way to check whether there is any problem in BE, such as a container growing too large and leading to OOM. This patch adds the following metrics (name: description): `rowset_count_generated_and_in_use`: The total count of rowset id generated and in use since BE last start; `unused_rowsets_count`: The total count of unused rowset waiting to be GC; `broker_count`: The total count of brokers in management; `data_stream_receiver_count`: The total count of data stream receivers in management; `fragment_endpoint_count`: The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count; `active_scan_context_count`: The total count of active scan contexts; `plan_fragment_count`: The total count of plan fragments in executing; `load_channel_count`: The total count of load channels in management; `result_buffer_block_count`: The total count of result buffer blocks for queries, each block has a limited queue size (default 1024); `result_block_queue_count`: The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count); `routine_load_task_count`: The total count of routine load tasks in executing; `small_file_cache_count`: The total count of cached small files' digest info; `stream_load_pipe_count`: The total count of stream load pipes, each pipe has a limited buffer size (default 1M); `tablet_writer_count`: The total count of tablet writers; `brpc_endpoint_stub_count`: The total count of brpc endpoints	2020-04-25 16:13:39 +08:00
Yingchun Lai	4a7a88ede1	[LSAN] Fix some memory leak detected by LSAN (#3326 )	2020-04-22 22:59:44 +08:00
Yingchun Lai	22e90f7260	[SegmentV2] Fix bloom filter bits buffer not initialize as 0 (#3372 )	2020-04-22 19:50:05 +08:00
lichaoyong	3086790e06	Fix bug when using ZoneMap/BloomFilter on columns with REPLACE/REPLACE_IF_NOT_NULL (#3288) Currently, a column with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter when the rowset is a base rowset (its version starts with zero). This was intended as an optimization, but in some cases it produces wrong results. create table test( k1 int, v1 int replace, v2 int sum ); Suppose there are two records in two different versions: 1 2 2 on version [0-10] and 1 3 1 on version 11. The query select * from test where k1 = 1 and v1 = 3; returns 1 3 1, which is wrong because the first record was filtered out. The correct answer is 1 3 3, since v2 should be summed. Removing this optimization is necessary to make the result correct.	2020-04-10 10:22:21 +08:00
Yingchun Lai	f39c8b156d	[refactor] A small refactor on class DataDir (#3276 ) main refactor points are: - Use a single get_absolute_tablet_path function instead of 3 independent functions - Remove meaningless return value of register_tablet and deregister_tablet - Some typo and format	2020-04-10 00:32:22 +08:00
caiconghui	a5703ef114	[Performance] Support sharding txn_map_lock into more small map locks to make good performance for txn manage task (#3222 ) This PR is to enhance the performance for txn manage task, when there are so many txn in BE, the only one txn_map_lock and additional _txn_locks may cause poor performance, and now we remove the additional _txn_locks and split the txn_map_lock into many small locks.	2020-04-09 22:35:15 +08:00
Yingchun Lai	8fc284d593	[config] Support to modify configs when BE is running without restarting (#3264 ) In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE. This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag' when BE is running without restarting it. You can update a single config once by BE's http API: `be_host:be_http_port/api/update_config?config_name=new_value`	2020-04-08 11:17:47 +08:00
lichaoyong	d2307c719c	Fix be unit test error (#3259 )	2020-04-03 15:02:49 +08:00
lichaoyong	a86161f6ce	[Bug]Fix compile error (#3257 )	2020-04-03 13:38:44 +08:00
Dayue Gao	8a2eb8fbcf	[Bug][segment_v2] Fix a bug that NullBitmapBuilder is not reset when data page doesn't have null (#3240 ) This CL fixes a bug that could cause wrong answer for beta rowset with nullable column. The root cause is that NullBitmapBuilder is not reset when the current page doesn't contain NULL, which leads to wrong null map to be written for the next page. Added a test case to reproduce the problem.	2020-04-01 18:39:04 +08:00
Yingchun Lai	cc31bf9cf9	[rowset id] A little improvement of rowset id generator (#3203) The main optimization points: 1. Use std::unordered_set instead of std::set, and use RowsetId.hi as RowsetId's hash value. 2. Minimize the scope of SpinLock in UniqueRowsetIdGenerator. Profile comparison: run UniqueRowsetIdGeneratorTest.GenerateIdBenchmark 10 times (old version | new version): 6s962ms | 3s647ms; 6s139ms | 3s393ms; 6s234ms | 3s686ms; 6s060ms | 3s447ms; 5s966ms | 4s127ms; 5s786ms | 3s994ms; 5s778ms | 4s072ms; 6s193ms | 4s082ms; 6s159ms | 3s560ms; 5s591ms | 3s654ms	2020-03-26 20:24:26 +08:00
Mingyu Chen	8aa8b8c96d	[Code Refactor] Using block manager to unify the data file access. (#3189 ) Earlier we introduced `BlockManager` to separate data access logic from underlying file read and write logic. This CL further unifies all `SegmentV2` data access to the `BlockManager`, removes the previous `FileManager` class, and move the file cache to the `FileBlockManager`. There are no logical changes to this CL. After this CL, all user table data is read through the `WritableBlock` and `ReadableBlock` returned by the `BlockManager`, and no file operations are performed directly.	2020-03-25 20:39:07 +08:00
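
The wrong-result scenario described in commit 3086790e06 (#3288) can be illustrated with a small toy model. This is only a sketch in Python, not Doris code: the helper names `aggregate` and `matches`, and the lists `base_rowset` and `delta_rowset`, are hypothetical and exist only for this illustration. It assumes rows are merged per key with v1 using REPLACE (newest version wins) and v2 using SUM, as in the `test` table from that commit message.

```python
def aggregate(rows):
    """Merge rows sharing the same k1; rows must be ordered oldest to newest.

    v1 follows REPLACE semantics (the newest value wins),
    v2 follows SUM semantics (values are added up).
    """
    merged = {}
    for k1, v1, v2 in rows:
        if k1 in merged:
            _, previous_v2 = merged[k1]
            merged[k1] = (v1, previous_v2 + v2)
        else:
            merged[k1] = (v1, v2)
    return [(k1, v1, v2) for k1, (v1, v2) in merged.items()]


def matches(row):
    """Predicate of the example query: k1 = 1 and v1 = 3."""
    k1, v1, _ = row
    return k1 == 1 and v1 == 3


# Hypothetical stand-ins for the two versions in the commit message.
base_rowset = [(1, 2, 2)]   # version [0-10]
delta_rowset = [(1, 3, 1)]  # version 11

# Filtering the base rowset on the REPLACE column before merging drops
# (1, 2, 2), so its v2 contribution is lost.
filtered_early = [
    row
    for row in aggregate([r for r in base_rowset if matches(r)] + delta_rowset)
    if matches(row)
]

# Merging all versions first and filtering afterwards keeps the sum intact.
filtered_late = [row for row in aggregate(base_rowset + delta_rowset) if matches(row)]

print(filtered_early)
print(filtered_late)
```

In this model the early filter yields the row 1 3 1 and the late filter yields 1 3 3, matching the wrong and correct answers quoted in the commit message.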

192 Commits