doris

Author	SHA1	Message	Date
Mingyu Chen	0e79f6908b	[CodeRefactor] Modify FE modules (#4146 ) This CL mainly changes: 1. Add 2 new FE modules 1. fe-common save all common classes for other modules, currently only `jmockit` 2. spark-dpp The Spark DPP application for Spark Load. And I removed all dpp related classes to this module, including unit tests. 2. Change the `build.sh` Add a new param `--spark-dpp` to compile the `spark-dpp` alone. And `--fe` will compile all FE modules. the output of `spark-dpp` module is `spark-dpp-1.0.0-jar-with-dependencies.jar`, and it will be installed to `output/fe/spark-dpp/`. 3. Modify some bugs of spark load	2020-07-29 16:18:05 +08:00
caiconghui	1b3af783e6	[Plugin] Add properties grammar in InstallPluginStmt (#4173 ) This PR is to support grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)] user can set md5sum="xxxxxxx", so we don't need to provide a md5 uri.	2020-07-29 15:02:31 +08:00
Mingyu Chen	83a751497e	[Bug][Socket Leak] Fix bug that Mysql NIO server is leaking sockets (#4192 ) When using mysql nio server, if the mysql handshake protocol fails, we need to actively close the channel to prevent socket leakage.	2020-07-29 15:01:27 +08:00
wutiangan	59676a1117	[BUG] fix 4149, add sessionVariable to choose broadcastjoin first when cardinality cannot be estimated (#4150 )	2020-07-29 12:28:52 +08:00
Lijia Liu	79b4f92cb7	Rewrite GroupByClause.oriGroupingExprs (#4197 )	2020-07-29 12:27:15 +08:00
xy720	841f9cd07b	[Bug][SparkLoad] Divide the upload in spark repository into two steps (#4195 ) When Fe uploads the spark archive, the broker may fail to write the file, resulting in the bad file being uploaded to the repository. Therefore, in order to prevent spark from reading bad files, we need to divide the upload into two steps. The first step is to upload the file, and the second step is to rename the file with MD5 value.	2020-07-28 16:24:07 +08:00
caiconghui	150f8e0e2b	Support check committed txns before catalog drop meta, like db, table, partition etc (#4029 ) This PR is to ensure that dropped db , table or partition can be with normal state after recovered by user. Commited txns can not be aborted, because the partitions's commited versions have been changed, and some tablets may already have new visible versions. If user just don't want the meta(db, table or partition) anymore, just use drop force instead of drop to skip committed txn check.	2020-07-28 15:18:52 +08:00
Zhengguo Yang	90eaa514ba	[SQL][JCUP]Reduce conflict of sql_parser.cup (#4177 ) Fix the Shift/Reduce conflict in cup file #4176	2020-07-28 10:04:50 +08:00
caiconghui	94ac0f43dc	Use LongAdder or volatile long to replace AtomicLong in some scenarios (#4131 ) This PR is to use LongAdder or volatile long to replace AtomicLong in some scenarios. In the statistical summation scenario, LongAdder(introduced by jdk1.8) has better performance than AtomicLong in high concurrency update scenario. And if we just want to keep get and set operation for variable to be atomic, just add volatile at the front of the variable is enough, use AtomicLong is a little heavy. NOTE: LongAdder is usually preferable to AtomicLong when multiple threads update a common sum that is used for purposes such as collecting statistics, not for fine-grained synchronization control, such as auto-incremental ids.	2020-07-27 15:48:35 +08:00
xy720	f2c9e1e534	[Spark Load]Create spark load's repository in HDFS for dependencies (#4163 ) ### Resume When users use spark load, they have to upload the dependent jars to hdfs every time. This cl will add a self-generated repository under working_dir folder in hdfs for saving dependecies of spark dpp programe and spark platform. Note that, the dependcies we upload to repository include: 1、`spark-dpp.jar` 2、`spark2x.zip` 1 is the dpp library which built with spark-dpp submodule. See details about spark-dpp submodule in pr #4146 . 2 is the spark2.x.x platform library which contains all jars in $SPARK_HOME/jars The repository structure will be like this: ``` __spark_repository__/ \|-__archive_1_0_0/ \| \|-__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp.jar \| \|-__lib_7670c29daf535efe3c9b923f778f61fc_spark-2x.zip \|-__archive_2_2_0/ \| \|-__lib_64d5696f99c379af2bee28c1c84271d5_spark-dpp.jar \| \|-__lib_1bbb74bb6b264a270bc7fca3e964160f_spark-2x.zip \|-__archive_3_2_0/ \| \|-... ``` The followinng conditions will force fe to upload dependencies: 1、When fe find its dppVersion is absent in repository. 2、The MD5 value of remote file does not match the local file. Before Fe uploads the dependencies, it will create an archive directory with name `__archive_{dppVersion}` under the repository.	2020-07-27 01:48:41 +00:00
HaiBo Li	ed8cb6a002	[Feature][Meta]Update/Read/Write VisibleVersionTime for Partition#4076 (#4086 ) #4076 1. The visibleVersionTime is updated when insert data to partition 2. GlobalTransactionMgr call partition.updateVisibleVersionAndVersionHash(version, versionHash) when fe is restarted 3. If fe restart, VisibleVersionTime may be changed, but the changed value is newer than the old value	2020-07-26 21:20:55 +08:00
caiconghui	1f7009354a	[Bug] Add db read lock when processing unfinished publish task (#4178 ) This Bug was introduced by PR #4053, here should add db read lock when processing unfinished publish task.	2020-07-26 20:15:14 +08:00
caiconghui	911eb04594	[Bug][UpdateDataQuota] Skip update used data quota for information_schema db and fix bug for wrong time interval for UpdateDbUsedDataQuotaDaemon (#4175 ) This PR is to skip update used data quota for information_schema db, and fix bug for wrong time interval for UpdateDbUsedDataQuotaDaemon.	2020-07-26 20:14:03 +08:00
HangyuanLiu	4d828d2411	Fix recover database not in "show databases" (#4170 )	2020-07-25 10:04:35 +08:00
caiconghui	4608f9786e	Support checking database used data quota when data load job begin a new txn (#3955 ) Now, we only check database used data quota when create or alter table, or in some old type load job, but not for routine load job and stream load job. This PR provide a uniform solution to check db used data quota when data load job begin a new txn.	2020-07-24 10:03:43 +08:00
caiconghui	28f4d30542	Optimize the logic of processing unfinishedTask when transaction is publishTimeout (#4053 ) This PR is to optimize the logic of processing unfinishedTask when transaction is publishTimeout, we find errorReplica by "db -> table -> partition -> index -> tablet(backendId) -> replica" path.	2020-07-24 09:59:01 +08:00
caiconghui	2334f5d997	Fix some problem related with publish version task (#4089 ) This PR is mainly do following three things: 1. Add thread name in fe log to make trace problem more easy. 2. Add agent_task_resend_wait_time_ms config to escape sending duplicate agent task to be. 3. Skip to continue to update replica version when new version is lower than replica version in fe.	2020-07-23 20:06:02 +08:00
Mingyu Chen	d66609de85	[Code Structure] Move the code file to the right place (#4154 ) IsNullPredicateTest.java is not is right place	2020-07-23 15:49:52 +08:00
Lijia Liu	e4f5a2936b	[TabletRepair] Delete bad replicas when no BE can be used to create new replica When there is no available BE for relocating replicas, delete the bad replica first.	2020-07-22 22:42:31 +08:00
Zhengguo Yang	31a6c43a69	[Bug][Alter] Fix boolean support (#4123 ) Fixes #4122 * add type check when add bloom filter index on boolean column. * support add boolean column.	2020-07-22 22:38:55 +08:00
HangyuanLiu	5c4bba107e	[Bug] Fix isnull(null) analyze error (#4094 )	2020-07-22 20:04:39 +08:00
Mingyu Chen	ad17afef91	[CodeRefactor] #4098 Make FE multi module (#4099 ) This PR change the FE code structure to maven multi module structure. See ISSUE: #4098 for more info, such as How to resolve conflicts.	2020-07-21 12:42:42 +08:00
EmmyMiao87	2de4f2471b	[MV] Add framework of mv selector (#4014 ) This commit mainly supports creating bitmap_union, hll_union, and count materialized views. * The main changes are as follows: 1. When creating a materialized view, doris judge the semantic analysis of the newly supported aggregate function. Only bitmap_union(to_bitmap(column)), hll_union(hll_hash(column)) and count(column) are supported. 2. Match the correct materialized view when querying. After the user sends the query, if there is a possibility of matching the materialized view, the query will be rewritten firstly. Such as: Table: k1 int, k2 int MV: k1 int, mv_bitmap_union_k2 bitmap mv_bitmap_union mv_bitmap_union = to_bitmap(k2) Query: select k1, count(distinct k2) from Table Found that there is a match between the materialized view column and the query column, the query is rewritten as: Rewritten query: select k1, bitmap_union_count(mv_bitmap_union_k2) from table Then when the materialized view is matched, it can be matched to the query materialized view table. Sometimes the rewritten query may not match any materialized view, which means that the rewriting failed. The query needs to be re-parsed and executed again.	2020-07-20 17:26:40 +08:00
lichaoyong	fbf7bd6a1d	[Bug] Change get load state interface (#4081 ) Now, the PathTrie will match wrong interface between /api/{db}/{table} and /api/{db}/{label}	2020-07-20 15:51:27 +08:00
Mingyu Chen	bb35de2ccb	[Bug][Alter] Cancel the alter job if database has been dropped (#4088 ) Cancel the alter job in WAITING_TXN state if database has been dropped Fix: #4087	2020-07-19 21:26:57 +08:00
HangyuanLiu	de3c4b198e	Support materialized view extend column in load and insert (#3677 ) This commit mainly supports load bitmap_union, hll_union, and count materialized views. The main changes are as follows: 1、insert stmt support load extend column 2、load stmt support load extend column Issue : #3344 Co-authored-by: HangyuanLiu <460660956@qq.com>	2020-07-17 10:43:36 +08:00
caiconghui	db50c19aad	[Thread Resource Leak] Fix thread resource leak after checkpoint catalog destroyed (#4049 ) This PR is mainly to fix thread resource leak, and then add some notice to use newDaemonScheduledThreadPool api in ThreadPoolManager.	2020-07-16 22:38:39 +08:00
Lijia Liu	1aec46b215	[Bug] Do not choose decommissioned BE in colocate balance For #4102 (#4103 ) When set `disable_colocate_balance` to false and set some BE to decommission, `Coloratebalancer#balanceGroup` will choose decommissioned BE to locate tablets, which is not right Fix #4102	2020-07-16 22:34:51 +08:00
HangyuanLiu	5032b7fe7a	Support materialized view schema change in bitmap hll and count field [#3739 ] (#3873 ) + Building the materialized view function for schema_change here based on defineExpr. + This is a trick because the current storage layer does not support expression evaluation. + count distinct materialized view will set mv_expr with to_bitmap or hll_hash. + count materialized view will set mv_expr with count. + Support to regenerate historical data when a new materialized view is created in BE。 + Support to_bitmap function + Support hll_hash function + Support count(field) function For #3344	2020-07-16 10:45:15 +08:00
Zhengguo Yang	78a1dea19d	Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit (#4063 ) Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit	2020-07-13 20:54:14 +08:00
EmmyMiao87	e435e6f9a8	[Bug][Planner]Fix bug of count() in MV selector (#4060 ) The output columns of query should be collected by all of tupleIds in BaseTableRef rather than the top tupleIds of query. The top tupleIds of count() is Agg tuple which does not expand the star. Fixed #4065	2020-07-13 20:53:10 +08:00
WingC	d7893f0fa7	[Bug]Fix some schema change not work right (#4009 ) [Bug]Fix some schema change not work right This CL mainly fix some schema change to varchar type not work right because forget to logic check && Add ConvertTypeResolver to add supported convert type in order to avoid forget logic check	2020-07-11 10:18:29 +08:00
Yunfeng,Wu	265c26f67d	[Doris On ES] Add docvalue limitation for doc_values scan and enable doc_values scan default (#4055 )	2020-07-10 18:37:36 +08:00
yangzhg	ebaa0c7137	[Bug][SQL]Fix predicate pushdown may incorrect when groupby with grouping sets (#4041 ) Fixes #4040 Fix predicate pushdown may incorrect when groupby with grouping sets	2020-07-09 21:49:37 +08:00
xy720	d2ab38a5e0	[Feature] Batch update partition's property in one command (#3981 ) Support following command. ``` alter table tbl_name modify partition (p1, p2, p3) set ("replication_num" = "3"); ```	2020-07-09 21:48:43 +08:00
caiconghui	5a27981e49	[Config] Add thrift_client_retry_interval_ms config in be for thrift client to avoid avalanche disaster in fe thrift server (#4022 ) This PR is mainly to add `thrift_client_retry_interval_ms` config in be for thrift client to avoid avalanche disaster in fe thrift server and fix some typo and some rpc setting problems at the same time.	2020-07-08 21:07:00 +08:00
wutiangan	fb0ecb70fd	[SQL]fix inline view join mysql choose shuffle join bug (#4048 ) fix #4047 #3886 has certain relevance to this case。 the sql : `bigtable t1 join mysqltable t2 join mysqltable t3 on t1.k1 = t3.k1` 1. after reorder: t1, t2, t3 2. choose join t1 with t2: t1 join t2 with no conditions, and Doris choose cross join 3. choose join (t1 join on t2) with t3: in old code, the t2 is mysqlTable, so the cardinality is zero, and "the cross join t1 with t2" 's cardinality is t1.cardinality multiply t2.cardinality, for t2 is mysql, so t2.cardinality is zero, and "the cross join t1 with t2" is zero. t3 is mysqltable, t3's cardinality is zero. If two tables need to be joined both are zero，we will choose the shuffle join So I change the mysql table ‘s cardinality from 0 to 1, the cross join's cardinality is not zero.	2020-07-08 20:56:24 +08:00
Mingyu Chen	7715a84d4d	[Config] Enable some features by default (#4031 ) Its time to enable some features by default. 1. Enable FE plugins by setting `plugin_enable=true` 2. Enable dynamic partition by setting `dynamic_partition_enable=true` 3. Enable nio mysql server by setting `mysql_service_nio_enabled=true` Also modify installation doc, add download link of MySQL client.	2020-07-08 09:59:10 +08:00
caiconghui	b7051d0971	[Config]Make it easier for users to find configuration items needed (#3957 ) This PR is to make config items ordered by key and support like predicate for admin show config stmt	2020-07-07 23:12:21 +08:00
Lijia Liu	1aa148da7f	[Bug]Fix mini load NPE (#4026 ) for #4025	2020-07-07 23:08:08 +08:00
yangzhg	5c42514a8f	[Bug][SQL]Fix except node child not order correctly (#4003 ) Fixes #3995 ## Why does it happen When SetOperations encounters that the previous node needs Aggregate, the timing of add AggregationNode is wrong. You should add AggregationNode first before add other children. ## Why doesn't intersect and union have this problem intersect and union conform to the commutation law, so it doesn't matter if the order is wrong ## Why this problem has not been tested before In the previous test case, not cover the previous node was not AggregationNode	2020-07-07 23:06:36 +08:00
Yunfeng,Wu	1cc9e1606f	[Doris On ES] Add UT test for all search phase (#4035 ) I forget push some UT test in this PR #4012. Also remove `_cluster/state` resource because DOE does not rely the full ES cluster state meta.	2020-07-07 23:05:02 +08:00
lichaoyong	c9a7c373a7	[Bug] Return actual json for ConnectionAction (#4016 )	2020-07-07 20:14:55 +08:00
Yunfeng,Wu	3ba38e3381	[Doris On ES][Refactor] refactor and enchanment ES sync meta logic (#4012 ) After PR #3454 was merged, we should refactor and reorganize some logic for long-term sustainable iteration for Doris On ES. To facilitate code review，I would divided into this work to multiple PRs (some other WIP work I also need to think carefully) This PR include: 1. introduce SearchContext for all state we needed 2. divide meta-sync logic into three phase 3. modify some logic processing 4. introduce version detect logic for future using	2020-07-07 09:04:05 +08:00
WingC	913b2caac4	[Dynamic Partition]Support set replication number (#3965 ) This CL mainly support set replication_num property in dynamic partition table if dynamic_partition.replication_num is not set, the value is the same as table's default replication_num.	2020-07-05 16:28:38 +08:00
WingC	fa338fb6d9	[Bug][Memroy Leak]Fix bug TransactionState is not clear from idToFinalStatusTransactionState (#4013 ) This CL includes: 1. Memory leak because transactionState is not removed. 2. Extracting the clear logic to method to avoid forget.	2020-07-05 16:27:41 +08:00
Mingyu Chen	ba120292ab	[ShowIndex] Make Show Index stmt act same as MySQL behavior (#4010 ) `SHOW INDEX FROM db2.tbl1 FROM db1;` will be same as `SHOW INDEX FROM db1.tbl1;`	2020-07-05 16:26:54 +08:00
WingC	1fc82cd6e4	[Code Cleanup]Use ThreadPoolManager to manage some native thread (#3997 ) Now, FE use ThreadPoolManager to manage and monitor all Thread, but there are still some threads are not managed. And FE use `Timer` class to do some scheduler task, but `Timer` class has some problem and is out of date, It should replace by ScheduledThreadPool.	2020-07-05 16:26:22 +08:00
WingC	7351f7c237	[Config]Allower use to config different thrift server model (#3986 ) Doris only support TThreadPoolServer model in thrift server, but the server model is not effective in some high concurrency scenario, so this PR introduced new config to allow user to choose different server model by their scenario. Add new FE config: `thrift_server_type`	2020-07-05 16:24:29 +08:00
wutiangan	f521507a46	[SQL] Explain verbose stmt to print tupleDesc/slotDesc information (#3970 )	2020-07-05 16:22:43 +08:00

... 93 94 95 96 97 ...

5755 Commits