Commit Graph

1713 Commits

Author SHA1 Message Date
9c12060db3 [Compile] Fix FE compile problem (#7029)
Co-authored-by: morningman <chenmingyu@baidu.com>
2021-11-08 10:35:49 +08:00
ca8268f1c9 [Feature] Extend logger interface, support structured log output (#6600)
Support structured logging.
2021-11-07 17:39:53 +08:00
3dd55701ba [Config] Support custom config handler (#6577)
Support custom config handler callback and types.
2021-11-07 17:39:24 +08:00
974a894688 Update Spring version to fix CVE-2020-5421 (#7023) 2021-11-06 13:29:24 +08:00
3cef2fb0a8 Union stmt support 'OutFileClause' (#7026)
The union (set operation) stmt also needs to analyze 'OutFileClause'.

Whether the fragment is colocate only needs to be checked against the plan nodes belonging to this fragment.
2021-11-06 13:28:52 +08:00
5ca271299a [refactor] set forward_to_master true by default (#7017)
* set forward_to_master true by default

* Update docs/zh-CN/administrator-guide/variables.md
2021-11-06 13:27:26 +08:00
760fc02bfe Added brpc stub cache check and reset API, used to test whether the brpc stub cache is available and to reset it (#6916)
Also add a config used for auto checking and resetting the brpc stub.
2021-11-05 09:45:37 +08:00
9171859c38 fix issue for JournalEntity (#7005)
Fix an incorrect log class in JournalEntity.java.
2021-11-05 09:45:10 +08:00
995fa992f7 Fix hadoop load failed when enable batch delete in unique table (#6996) 2021-11-05 09:43:28 +08:00
29838f07da [HTTP][API] Add backends info API for spark/flink connector (#6984)
Doris should provide an HTTP API that returns the backends list, so that connectors can submit stream loads.
It does no privilege checking, which lets common users use it.
2021-11-05 09:43:06 +08:00
2351c421b4 Revert "[HTTP][API] Add Backend By Rest API (#6999)" (#7004)
This reverts commit f509e936573f8d6fdaf4de036bc3c6abef26a182.
2021-11-04 10:25:09 +08:00
d268d17f2a Fix the SQL execution error caused by tablet not being found due to Colocate join (#7002)
* Fix a bug where SQL execution sometimes failed because a tablet could not be found
2021-11-04 09:21:52 +08:00
f509e93657 [HTTP][API] Add Backend By Rest API (#6999)
* [HTTP][API] add backend rest api

* [HTTP][API] add backends rest api

* change api response

Co-authored-by: wudi <wud3@shuhaisc.com>
2021-11-04 09:21:07 +08:00
9c24334956 [BUG][Schedule] Fix getMixLoadScore error. (#6975) 2021-11-02 16:36:05 +08:00
019e60e7bc [BUG] fix Calc capacityCoefficient mistake #6898 (#6899)
fix #6898
2021-11-02 16:32:44 +08:00
db1c281be5 [Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947)
## Case

In the load process, each tablet has a memtable to buffer the incoming data.
If the data in a memtable grows larger than 100MB, it is flushed to disk as a `segment` file, and then
a new memtable is created to buffer the following data.

Assume a table with N buckets (tablets), so the max size of all memtables will be `N * 100MB`.
If N is large, this costs too much memory.

So, to limit memory, when the size of all memtables reaches a threshold (2GB by default), Doris will
try to flush all current memtables to disk (even if their sizes have not reached 100MB).

As a result, each memtable is flushed when its size reaches `2GB/N`, which may be much smaller
than 100MB, resulting in too many small segment files.

## Solution

When deciding to flush memtables to reduce memory consumption, do NOT flush all of them, but only part.
For example, there are 50 tablets (with 50 memtables) and a memory limit of 1GB. When each memtable reaches
20MB, the total size reaches 1GB, and a flush occurs.

If I only flush 25 of the 50 memtables, then the next time the total size reaches 1GB, there will be 25 memtables
of size 10MB and another 25 of size 30MB. So I can flush those 30MB memtables, which are larger
than 20MB.

The main idea is to introduce some jitter during flush so that memtable sizes become slightly uneven, ensuring that a flush is only triggered when a memtable is large enough.

In my test, loading a table with 48 buckets and a 2GB mem limit: in the previous version, the average memtable size was 44MB;
after this modification, the average size is 82MB.
2021-11-01 10:51:50 +08:00
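The partial-flush idea above can be sketched in plain Python. This is an illustrative policy (flush the largest memtables until usage drops below half the limit), not Doris's actual selection code; all names here are hypothetical.

```python
def pick_memtables_to_flush(sizes, mem_limit):
    """Return indices of memtables to flush, largest first,
    until total usage drops below half the limit."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    total = sum(sizes)
    picked = []
    for i in order:
        if total <= mem_limit / 2:
            break
        picked.append(i)
        total -= sizes[i]
    return picked

# The example from the text: 50 memtables of 20MB against a 1GB limit.
# Only about half are flushed; the survivors keep growing and later
# flush as larger segments.
flushed = pick_memtables_to_flush([20] * 50, mem_limit=1024)
print(len(flushed))  # prints 25, not 50
```

Flushing only part of the memtables is what creates the size unevenness the commit relies on.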
e8cabfff27 [S3] Support path style endpoint (#6962)
Add a use_path_style property for S3
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property
Fix some S3 URI bugs
Add some logs for tracing load process.
2021-11-01 10:48:10 +08:00
f47919136a [Bug] Fix failure to stop sync job (#6950) 2021-10-30 18:17:15 +08:00
c3b133bdb3 [Refactor] Refactor the reader code (#6866)
1. Remove useless, redundant code logic
2. Change the reader to an interface and add a tuple reader to simplify the reader's structure
2021-10-30 18:15:28 +08:00
4170aabf83 [Optimize] optimize some session variable and profile (#6920)
1. optimize error message when using batch delete
2. rename session variable is_report_success to enable_profile
3. add table name to OlapScanner profile
2021-10-27 18:03:12 +08:00
a4a7e642b4 [Enhance] Add BackendHbResponse info (#6929)
When a BE has an exception, the FE doesn't log the BackendHbResponse info, so we can't know which BE has the exception.

The exception log is:
`WARN (heartbeat mgr|31) [HeartbeatMgr.runAfterCatalogReady():141] get bad heartbeat response: type: BACKEND, status: BAD, msg: java.net.ConnectException: Connection refused (Connection refused)`

So we need to add toString(); then the FE can log the BackendHbResponse info.
2021-10-27 09:56:07 +08:00
00fe9deaeb [Benchmark] Add star schema benchmark tools (#6925)
This CL mainly changes:

1. Add star schema benchmark tools in `tools/ssb-tools`, so users can easily load and test with the SSB data set.
2. Disable the segment cache for some read scenarios such as compaction and alter operations. (Fix #6924)
3. Fix a bug that `max_segment_num_per_rowset` won't work. (Fix #6926)
4. Enable `enable_batch_delete_by_default` by default.
2021-10-27 09:55:36 +08:00
29a4ff4bbe [Cache][Bug] Correct update cache timeout unit (#6888)
Previously, the FE updated the cache using MICROSECONDS as the TimeUnit.
Replace it with MILLISECONDS.
2021-10-27 09:53:58 +08:00
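The factor involved in this kind of unit mix-up is easy to demonstrate; a minimal illustration (not Doris code):

```python
from datetime import timedelta

# A value intended as milliseconds but scheduled with a microseconds
# TimeUnit expires 1000x too early.
intended = timedelta(milliseconds=300)
mistaken = timedelta(microseconds=300)  # same number, wrong unit
print(intended / mistaken)  # prints 1000.0
```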
adb6bfdf74 [Bug] Fix bug that truncate table may change the storage medium property (#6905) 2021-10-25 10:07:27 +08:00
ed7a873a44 [Memory Usage] Implement segment lru cache to save memory of BE (#6829) 2021-10-25 10:07:15 +08:00
2d298143cc [Bug] Fix bug of decommission (#6826) 2021-10-25 10:07:04 +08:00
469b05d708 [Cache][Bug] Fix sql_key of getting and updating Cache is inconsistent (#6903)
Fix #6735
2021-10-23 16:54:00 +08:00
7b50409ada [Bug][Binlog] Fix the number of versions may exceed the limit during data synchronization (#6889)
Bug detail: #6887 

To solve this problem, the commit of a transaction must meet any of the following conditions, to avoid committing too frequently:

1. The current accumulated event quantity is greater than the `min_sync_commit_size`.
2. The current accumulated data size is greater than the `min_bytes_sync_commit`.

In addition, when the accumulated data size exceeds `max_bytes_sync_commit`, the transaction needs to be committed immediately.

Before:
![a5e0a2ba01ec4935144253fe0a364af7](https://user-images.githubusercontent.com/22125576/137933545-77018e89-fa2e-4d45-ae5d-84638cc0506a.png)

After:
![4577ec53afa47452c847bd01fa7db56c](https://user-images.githubusercontent.com/22125576/137933592-146bef90-1346-47e4-996e-4f30a25d73bc.png)
2021-10-23 16:47:32 +08:00
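The commit conditions described above can be sketched as a small predicate; the config names come from the commit text, while the function shape is an assumption:

```python
def should_commit(event_count, byte_count,
                  min_sync_commit_size, min_bytes_sync_commit,
                  max_bytes_sync_commit):
    """Decide whether the sync job should commit its transaction now."""
    # Hard cap: commit immediately once accumulated data exceeds the max.
    if byte_count > max_bytes_sync_commit:
        return True
    # Otherwise commit only when either soft threshold is exceeded,
    # which throttles overly frequent commits.
    return (event_count > min_sync_commit_size
            or byte_count > min_bytes_sync_commit)
```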
Wei
696790b397 [Refactor] remove unused code (#6879) 2021-10-23 16:47:10 +08:00
a8e3a74ac6 [Bug] Fix bug to reject request with no SQL in TableQueryPlanAction (#6843)
String.valueOf() returns the string "null" for null input, in which case requests with no SQL
would be accepted by TableQueryPlanAction unexpectedly, with potential risk.
2021-10-23 16:46:24 +08:00
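The same pitfall exists in Python, where `str(None)` yields the non-empty string `'None'`. A hedged sketch of the kind of guard the fix implies (names are hypothetical, not the actual TableQueryPlanAction code):

```python
def extract_sql(params):
    """Reject requests whose 'sql' field is missing, instead of
    stringifying None into a bogus non-empty statement."""
    sql = params.get("sql")
    if sql is None:
        raise ValueError("missing sql")
    return str(sql)

# str(None) == 'None', so stringify-then-check would let the request through.
print(str(None))  # prints None
```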
149ce9ecf4 [Bug][Memory Leak] Fix the issue of Catalog instance leakage (#6895)
The Checkpoint Catalog instance may be incorrectly stored in MetricRepo, causing memory leaks
2021-10-23 16:44:51 +08:00
3267455eca Replace replica_allocation with replication_allocation (#6870)
Fix #6869
2021-10-20 15:32:35 +08:00
bc069eac8b [BUG] fix bug for schema change (#6839)
Commit #6791 has an error:
when you only change the order of columns, an error will appear.
2021-10-17 22:53:28 +08:00
4cc01892f6 [SQL Cache] Add all view stmt as the suffix of cache sqlkey (#6832)
Use all view stmts as the cache sqlkey suffix, so that when a view is modified, the cache can recognize the change.
2021-10-16 21:56:24 +08:00
8baded8a0e [BUG] Key is 'True',Extra is 'NONE' when add rollup for DUP table (#6763)
fix #6762

The result displayed after the fix:

```
mysql> desc test2 all;
+-----------+---------------+-------+---------------+------+-------+---------+-------+---------+
| IndexName | IndexKeysType | Field | Type          | Null | Key   | Default | Extra | Visible |
+-----------+---------------+-------+---------------+------+-------+---------+-------+---------+
| test2     | DUP_KEYS      | a     | BIGINT        | Yes  | true  | NULL    |       | true    |
|           |               | b     | BIGINT        | Yes  | true  | NULL    |       | true    |
|           |               | c     | BIGINT        | Yes  | false | NULL    | NONE  | true    |
|           |               | d     | BIGINT        | Yes  | false | NULL    | NONE  | true    |
|           |               | e     | VARCHAR(1024) | Yes  | false | NULL    | NONE  | true    |
|           |               | f     | VARCHAR(1024) | Yes  | false | NULL    | NONE  | true    |
|           |               |       |               |      |       |         |       |         |
| r1        | DUP_KEYS      | c     | BIGINT        | Yes  | true  | NULL    |       | true    |
|           |               | e     | VARCHAR(1024) | Yes  | true  | NULL    |       | true    |
|           |               | a     | BIGINT        | Yes  | false | NULL    | NONE  | true    |
+-----------+---------------+-------+---------------+------+-------+---------+-------+---------+

```
2021-10-16 21:54:00 +08:00
24d38614a0 [Dependency] Upgrade thirdparty libs (#6766)
Upgrade the following dependencies:

libevent -> 2.1.12
OpenSSL 1.0.2k -> 1.1.1l
thrift 0.9.3 -> 0.13.0
protobuf 3.5.1 -> 3.14.0
gflags 2.2.0 -> 2.2.2
glog 0.3.3 -> 0.4.0
googletest 1.8.0 -> 1.10.0
snappy 1.1.7 -> 1.1.8
gperftools 2.7 -> 2.9.1
lz4 1.7.5 -> 1.9.3
curl 7.54.1 -> 7.79.0
re2 2017-05-01 -> 2021-02-02
zstd 1.3.7 -> 1.5.0
brotli 1.0.7 -> 1.0.9
flatbuffers 1.10.0 -> 2.0.0
apache-arrow 0.15.1 -> 5.0.0
CRoaring 0.2.60 -> 0.3.4
orc 1.5.8 -> 1.6.6
libdivide 4.0.0 -> 5.0
brpc 0.97 -> 1.0.0-rc02
librdkafka 1.7.0 -> 1.8.0

After this PR, compiling Doris requires build-env:1.4.0.
2021-10-15 13:03:04 +08:00
fcd15edbf9 [Export] Support export job with label (#6835)
```
EXPORT TABLE xxx
...
PROPERTIES
(
    "label" = "mylabel",
    ...
);
```

Then the user can use the label to get job info via the SHOW EXPORT stmt:
```
show export from db where label="mylabel";
```

For compatibility, if not specified, a random label will be used. For history jobs, the label will be "export_job_id".

Unlike the LOAD stmt, we specify the label in `properties` here because this does not cause grammatical conflicts,
and there is no need to modify the meta version of the metadata.
2021-10-15 10:18:11 +08:00
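The label fallback described above might look like this; a sketch under assumed names (the commit does not specify the actual random-label format, so the one below is illustrative):

```python
import uuid

def export_label(properties):
    """Use the user-specified label from PROPERTIES; otherwise
    fall back to a generated one for compatibility."""
    return properties.get("label") or "export_" + uuid.uuid4().hex
```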
05e59f6487 [Improvement] fix typo (#6831) 2021-10-14 14:22:53 +08:00
Wei
80c4007a34 [Refactor] Refactor ConnectProcessor's code doc and unused code (#6823)
Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>
2021-10-13 11:44:29 +08:00
3ff63fef99 [Bug] Select outfile failed after query stmt being rewritten. (#6816) 2021-10-13 11:44:01 +08:00
5d3ebad836 modify comments of DropPartitionClause (#6812)
Co-authored-by: qzsee <shizhiqiang03@meituan.com>
2021-10-13 11:40:19 +08:00
cfeb515f5e [Enhancement] add spark load config spark_load_checker_interval_second (#6809)
The interval in seconds for spark load to check and update ETL job status; 60 seconds by default.

Co-authored-by: weixiang <weixiang06@meituan.com>
2021-10-13 11:39:51 +08:00
5ef3f59928 [Optimize][RoutineLoad] Avoid sending tasks if there is no data to be consumed (#6805)
1. Avoid sending tasks if there is no data to be consumed,
by fetching the latest offset of each partition before sending tasks. (Fix #6803)

2. Add a preCheckNeedSchedule phase in update() of routine load,
to avoid holding the job's write lock for a long time when getting all Kafka partitions from the Kafka server.

3. Upgrade librdkafka to 1.7.0 to fix a "Local: Unknown partition" bug.
See edenhill/librdkafka#3295 ("offsetsForTimes fails with 'Local: Unknown partition'").

4. Avoid unnecessary storage migration tasks if the BE does not have that storage medium.
(Fix #6804)
2021-10-13 11:39:01 +08:00
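Point 1 above (skip task dispatch when there is nothing to consume) boils down to an offset comparison; a sketch with illustrative names, not the actual RoutineLoad code:

```python
def partitions_with_new_data(latest_offsets, consumed_offsets):
    """Return the partitions whose latest offset is ahead of what the
    job has already consumed; only these need a new task."""
    return [p for p, latest in latest_offsets.items()
            if latest > consumed_offsets.get(p, 0)]

# Only p1 has unconsumed data, so only p1 would get a task.
print(partitions_with_new_data({"p0": 5, "p1": 3}, {"p0": 5, "p1": 1}))
```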
Wei
e547e77f86 Fix variable dbName that is never used (#6802)
Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>
2021-10-13 11:37:58 +08:00
32f6dec80f Fix duplicate-key table cannot do schema change (#6791)
Co-authored-by: qzsee <shizhiqiang03@meituan.com>
2021-10-13 11:37:39 +08:00
6a058792af [Feature][Step1] Support lateral view FE part (#6745)
* [Feature] Support lateral view

The syntax:
```
select k1, e1 from test lateral view explode_split(k1, ",") tmp as e1;
```
```explode_split``` is a special function of Doris,
which splits a string column by the specified separator string
and then converts each row into multiple rows.
It combines string splitting with a table function,
and its behavior is equivalent to Hive's ```explode(split(string, string))```.

The implementation:
A table function operator is added to handle the syntax of the lateral view separately.
The query plan is as follows:
```
MySQL [test]> explain select k1, e1 from test_explode lateral view explode_split (k2, ",") tmp as e1;
+---------------------------------------------------------------------------+
| Explain String                                                            |
+---------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                           |
|  OUTPUT EXPRS:`k1` | `e1`                                                 |
|                                                                           |
|   RESULT SINK                                                             |
|                                                                           |
|   1:TABLE FUNCTION NODE                                                   |
|   |  table function: explode_split(`k2`, ',')                             |
|   |                                                                       |
|   0:OlapScanNode                                                          |
|      TABLE: test_explode                                                  |
+---------------------------------------------------------------------------+
```

* Add ut

* Add multi table function node

* Add session variables 'enable_lateral_view'

* Fix ut
2021-10-13 11:37:12 +08:00
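The row-to-rows semantics of `explode_split` can be sketched in plain Python; this illustrates the behavior only, not the engine implementation, and the `e1` name mirrors the alias used in the example above:

```python
def explode_split(rows, col, sep):
    """Fan each input row out into one output row per split element,
    binding the element to a new column 'e1'."""
    for row in rows:
        for part in row[col].split(sep):
            out = dict(row)   # keep the original columns (the lateral view)
            out["e1"] = part  # add the exploded element
            yield out

# One input row with k2 = "a,b,c" expands into three output rows.
result = list(explode_split([{"k1": 1, "k2": "a,b,c"}], "k2", ","))
print(len(result))  # prints 3
```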
630e273d94 use segmentV2 as default storage format for old tables using storage format 'DEFAULT' (#6807) 2021-10-13 11:34:40 +08:00
a6e905eae9 [Revert] "[Bug] When using view, make toSql method generates the final sql (#6736)" (#6793)
This reverts part of commit 11ec38dd6fd9f86632d83c47bd9d8bc05db69a2b (#6736),
because it causes the view query problem described in #6792.

The following bug fix is kept:
1. Fix the problem that the WITH statement cannot be printed when UNION is included in the SQL
2021-10-11 10:29:50 +08:00
ea17682d1f [Typo] Correct misspellings in SparkDpp (#6789)
2021-10-10 23:07:39 +08:00
Wei
979df5635f [Code Refactor] Remove unnecessary return statement (#6786)
Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>
2021-10-10 23:07:18 +08:00