doris

Author	SHA1	Message	Date
Amos Bird	800a36343a	[chore] Prolog of hermetic build with GCC 11 and Clang 13. (#7712 ) Prepare to generate hermetic build using GCC 11 and Clang 13. The ideal toolchain would be ldb toolchain generated by [ldb_toolchain_gen.sh](https://github.com/amosbird/ldb_toolchain_gen/releases/download/v0.3/ldb_toolchain_gen.sh) To kick off a clang build, set `DORIS_TOOLCHAIN=clang` before running any build scripts.	2022-01-21 12:12:04 +08:00
weizuo93	ed39ff1500	[feature](compaction) Support triggering compaction for a specific partition manually (#7521 ) Add statement to trigger cumulative or base compaction for a specified partition.	2022-01-21 09:27:06 +08:00
jiafeng.zhang	4c17c370e7	[Doc]Modify the audit log plugin to record the SQL statement field type as String (#7798 ) Modify the audit log plugin to record the SQL statement field type as String	2022-01-20 21:36:02 +08:00
jiafeng.zhang	4768dd4efa	[docs] Documentation corrections (#7787 )	2022-01-20 16:18:48 +08:00
Mingyu Chen	3494c8973b	[improvement](colocation) Add a new config to delay the relocation of colocation group (#7656 ) 1. Add a new FE config `colocate_group_relocate_delay_second` The relocation of a colocation group may involve a large number of tablets moving within the cluster. Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible. Relocation usually occurs after a BE node goes offline or goes down. This config is used to delay the determination of BE node unavailability. The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group will not be triggered. 2. Change the priority of colocate tablet repair and balance task from HIGH to NORMAL 3. Add a new FE config allow_replica_on_same_host If set to true, when creating table, Doris will allow to locate replicas of a tablet on same host. And also the tablet repair and balance will be disabled. This is only for local test, so that we can deploy multi BE on same host and create table with multi replicas.	2022-01-18 10:26:36 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
jakevin	ebc27a40d7	[docs] Split the FAQ And Revert auto-label action (#7770 )	2022-01-17 10:34:56 +08:00
xy720	e80c34b6fe	[docs][typo] fix some typos in documents (#7769 )	2022-01-16 10:43:42 +08:00
Mingyu Chen	5f8d91257b	[improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754 ) If an load task has a relatively short timeout, then we need to ensure that each RPC of this task does not get blocked for a long time. And an RPC is usually blocked for two reasons. 1. handling "memory exceeds limit" in the RPC If the system finds that the memory occupied by the load exceeds the threshold, it will select the load channel that occupies the most memory and flush the memtable in it. this operation is done in the RPC, which may be more time consuming. 2. close the load channel When the load channel receives the last batch, it will end the task. It will wait for all memtables flushes to finish synchronously. This process is also time consuming. Therefore, this PR solves this problem by. 1. Use timeout to determine whether it is a high-priority load task If the timeout of an load task is relatively short, then we mark it as a high-priority task. 2. not processing "memory exceeds limit" for high priority tasks 3. use a separate flush thread to flush memtable for high priority tasks.	2022-01-16 10:41:31 +08:00
shee	8b7d7e4dac	[improvement] create/drop index support if [not] exist (#7748 ) create or drop index clause support if [not] exist	2022-01-16 10:40:44 +08:00
Universe	5b0f11b665	[feature](mysql-compatibility)(function) add `WEEKDAY` function (#7673 ) `WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday. `DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday. Doris only have `DAYOFWEEK` function, so I add `WEEKDAY` function. Thanks for the following materials: - https://github.com/apache/incubator-doris/pull/6982/files - https://www.bilibili.com/video/BV1V44y1Y7Ro	2022-01-16 10:39:21 +08:00
jakevin	be43316f20	[docs] add doc for community feedback and fix CI (#7759 ) add doc for community feedback and fix CI	2022-01-14 22:19:28 +08:00
jiafeng.zhang	6188ab20df	[docs](faq) add multiple FE WEB UI login issues (#7654 )	2022-01-14 09:26:39 +08:00
luzhijing	ccb6c6ac2e	[docs] update seatunnel.md (#7731 ) correct some wrongly written words and update document format	2022-01-13 15:31:17 +08:00
jiafeng.zhang	db2649525f	[docs](website) Add Database ODBC version correspondence (#7675 )	2022-01-13 15:28:02 +08:00
Mingyu Chen	a034c20d16	[fix](website) Add trademarks footer on official website (#7696 )	2022-01-11 15:07:56 +08:00
qiye	2de79832fc	[docs](hive)(function) fix Hive type error and optimize alias function example (#7694 ) 1. fix Hive type error 2. optimize alias function example	2022-01-11 15:07:32 +08:00
luzhijing	1b2acb6acd	[docs] update the document format (#7689 )	2022-01-11 15:06:42 +08:00
Adonis Ling	2cf574dc01	[docs] Improve instructions for the configuration of BE. (#7620 )	2022-01-11 15:06:05 +08:00
caiconghui	83f6eef506	[improvement](routine-load) Make routine load work with old kafka version (#7630 ) Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2022-01-10 17:30:24 +08:00
wangyongfeng	68c87de69e	[fix](website) fix CaseList component bug (#7683 )	2022-01-10 14:46:05 +08:00
weajun	3a8a85b739	[Optimize][Extension] optimize extension datax doriswriter，Remove import doris via csv in Dataxwriter, only support via json (#7568 ) * 1.Remove import doris via csv in Dataxwriter, only support via json; 2.Format Dataxwriter code; 3.Optimize exception handling and reduce multiple output of exception logs; 4.Update the dataxwriter's documentation; * Delete DorisCsvCodec.java delete unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java * 1.remove `format` config key; 2.Optimize serialization code in DorisJsonCodec class	2022-01-09 13:27:52 +08:00
Zhengguo Yang	ad35067a2a	[chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616 )	2022-01-06 23:23:33 +08:00
924060929	563545475e	[Optimize](Runtime Filter) Support merge in runtime filter(#7546 ) (#7547 ) Support merge IN predicate when exist remote target(e.g. shuffle hash join). Remote the code that IN predicate implicit conversion to Bloom filter then exist remote target. Close related #7546	2022-01-06 19:08:35 +08:00
Zhengguo Yang	738d2d2e07	[refactor] update parent pom version and optimize build scripts (#7548 )	2022-01-05 10:45:11 +08:00
Zhengguo Yang	bf4a867e85	[improvement](tablet-repair) add a config repair_slow_replica (#7423 ) Add a new FE config `repair_slow_replica` when this config is true, Doris will try to delete the replica with the largest number of versions, and then rebalance the replica. Usually, when the number of versions of a certain replica is much higher then that of other replicas, there are some problems with the current be's compilation. Migrating to other machines can typically solve this problem.	2022-01-04 10:28:14 +08:00
Henry2SS	6657524c51	[feature](sql-block-rule) add partition_num, tablet_num, cardinality in SqlBlockRule to block big/slow sql (#7403 ) Add partitionNum, tabletNum, cardinality in SqlBlockRule to block large/slow sql. 1. set partitionNum, tabletNum, cardinality as limitations to block sqls 2. compatible with lower version 3. add unit tests 4. add docs	2022-01-04 09:59:41 +08:00
xuzifu666	a60d86c1e1	[improvement](broker) add disable cache config for broker (#7506 )	2021-12-31 16:48:55 +08:00
Mingyu Chen	b2c5f25ef4	[docs] add more faq and FE debugging method (#7422 ) 1. Add more faq and FE debugging method. 2. Add security document.	2021-12-31 09:55:04 +08:00
zhengshiJ	723ee84a66	[feature] (planner) InferPredicate (#7096 ) This pr is for #7096 , which is add a rewrite rule for infer predicate. For example: origin stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.i=t3.id and t2.id = 1 rewrite stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.i=t3.id and t2.id = 1 and t1.id=1 and t3.id=1 + Add a switch enable_infer_predicate to control whether to perform predicate expansion. + Register a new rule InferFiltersrule and add it to GlobalState. + Traverse Conjunct to construct on/where equivalence connection, numerical connection and isNullPredicate. + Infer all equivalence connections + Construct additional numerical connections and isNullPredicate	2021-12-30 13:24:30 +08:00
pengxiangyu	dc9cd34047	[docs] Add user manual for hdfs load and transaction. (#7497 )	2021-12-30 10:22:48 +08:00
Mingyu Chen	a2d6e6e06f	[improvement](config) Modify default value of some brpc config (#7493 ) 1. Change `brpc_socket_max_unwritten_bytes` to 1GB This can make the system more fault-tolerant. Especially in the case of high system load, try to reduce EOVERCROWDED errors. 2. Change `brpc_max_body_size` to 3GB To handle some large object such as bitmap or string.	2021-12-28 16:47:53 +08:00
Zhengguo Yang	07e2acb2f3	[feature] Suport national secret (national commercial password) algorithm SM3/SM4 (#7464 ) SM3 is password hash algorithm SM4 is a block cipher used to replace DES / AES and other international algorithms.	2021-12-28 10:39:54 +08:00
wangyongfeng	6e052f4ede	[Doc][Website] blogs are sorted by date (#7491 ) * blogs are sorted by date Co-authored-by: 943155336 <wangyongfeng> Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>	2021-12-27 14:30:08 +08:00
jiafeng.zhang	80587e7ac2	[improvement](spark-connector)(flink-connector) Modify the max num of batch written by Spark/Flink connector each time. (#7485 ) Increase the default batch size and flush interval	2021-12-26 11:13:47 +08:00
Lijia Liu	ca97535491	[docs](executor) correct some be error code (#7460 ) correct some be error code in doc.	2021-12-26 11:06:54 +08:00
zhangstar333	0c154733e0	[feature](function) support bitmap_union/intersect have more columns parameters (#7379 ) support multi bitmap parameter for all bitmap aggregation function	2021-12-26 11:03:20 +08:00
Heng Zhao	43ed54faa1	[docs] The name of hidden column is incorrect in batch-delete-manual.md(#7465 ) (#7466 )	2021-12-24 21:30:57 +08:00
jakevin	c596b0362c	[docs](docker) Add document of docker dev (#7447 ) Add development document using docker	2021-12-24 21:27:39 +08:00
xtr_1993	889e33d53d	[docs](seatunnel) Seatunnel Supports Doris connector (#7453 )	2021-12-22 23:29:02 +08:00
Mingyu Chen	2ab3a66e7a	[docs][community] Remove articles (#7449 ) The articles will be moved to https://github.com/apache/incubator-doris-website And I will modify the READ of incubator-doris-website later	2021-12-21 18:50:09 +08:00
jiafeng.zhang	695eca8cbc	[docs] add bloomfilter index doc (#7318 ) * add bloomfilter index doc	2021-12-21 11:05:20 +08:00
Henry2SS	998489ac50	[fix](sql-block-rule) move sql block rule check from ConnectProcessor to StmtExecutor (#7407 ) SqlBlockRule should block only query stmt. And exclude explain stmt.	2021-12-21 10:25:09 +08:00
yjant	e74e55d2a4	[docs] Fix typos (#7404 ) There are a few typos in the document, which have been corrected by me	2021-12-19 18:31:35 +08:00
Mingyu Chen	e9536a8cf1	[deps](cyrus_sasl) Add -fPIC for cyrus_sasl (#7408 )	2021-12-17 13:11:25 +08:00
caiconghui	06c38ce46e	[enhancement] Make concurrent_number for routine load task can be larger than be num (#7386 ) * [enhancement] Make concurrent_number for routine load task can be larger than be num Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2021-12-17 11:04:29 +08:00
Mingyu Chen	0499b2211b	[feat](lateral-view) Support execution of lateral view stmt (#7255 ) 1. Add table function node 2. Add 3 table functions: explode_split, explode_bitmap and explode_json_array	2021-12-16 10:46:15 +08:00
Heng Zhao	5fed8a94ae	[docs](flink-connector) Add instructions for flink doris connector (#7384 )	2021-12-16 10:43:21 +08:00
wangyongfeng	6dd312b21e	[docs](website) develop the caseList component (#7402 ) Remove user cases to a submenu	2021-12-16 10:41:11 +08:00
Mingyu Chen	2b90967c4c	[fix][refactor](broker load) refactor the scheduling logic of broker load (#7371 ) 1. Refactor the scheduling logic of broker load. Details see #7367 2. Fix bug that loadedBytes in SHOW LOAD result is wrong. 3. Cancel the thread of LoadTimeoutChecker Now for PENDING load jobs, there will be no timeout. And the timeout of a load job start when pending load task is scheduled. 4. Fix a bug that the loading task is never submitted to the pool. The logic of BlockedPolicy is wrong. We should make sure the task is submitted to the pool, or the RejectedExecutionException should be thrown. 5. Now the transaction of a load job will begin in pending task, instead of when submitting the job.	2021-12-16 10:39:22 +08:00

... 36 37 38 39 40 ...

2730 Commits