Commit Graph

2187 Commits

237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
c98b411500 [Bug] Revert part of #4199 to avoid BE crash(#4269)
Revert “Change type of sum, min, max function column in mv”

This PR reverts #4199.
The daily test cored (crashed) after the type of the mv column was changed,
so I reverted the PR.
The daily core will be fixed in the future; after that, #4199 will be re-enabled.

Change-Id: Ie04fcfacfcd38480121addc5e454093d4ae75181
2020-08-06 19:06:00 +08:00
b62ff8508f Revert "[BUG] Using attachment strategy of brpc to send packet with big size. (#4237)" (#4267)
This reverts commit 120f30bcaec5ba8318ba1849b513b5d06d8df281.
2020-08-06 08:56:07 +00:00
173bc09833 [Alter]Analyze define expr before replay Rollup job (#4236)
The define expr should be analyzed after replaying the RollupJob.
The slot desc of the define expr is used to transform to thrift and send to the backend.
2020-08-05 21:47:18 +08:00
a4f3d43e15 fix version check bug (#4244)
Co-authored-by: gengjun <gengjun@dorisdb.com>
2020-08-05 21:45:36 +08:00
421828d52a [Doc] Fix format in doris_storage_optimization.md (#4250) 2020-08-05 21:45:03 +08:00
1b341601fe Generate java files using maven (#4133)
Generate generated-java files using maven instead of build.sh
2020-08-05 15:20:39 +08:00
120f30bcae [BUG] Using attachment strategy of brpc to send packet with big size. (#4237)
Use the attachment strategy of brpc to send packets with a big size.
brpc must serialize a packet before sending it.
If we send one batch with a big size, the connection may fail.
So we use the attachment strategy to bypass the problem and eliminate
the serialization cost.
2020-08-05 10:29:41 +08:00
bfb8c654c1 [Bug] Fix UT bug after making MemTracker shared (#4243)
After making MemTracker shared (#4135), some code hadn't been fixed,
and some useless UTs were added back to the build. Fixed in this PR.
2020-08-04 17:52:11 +08:00
3f31866169 [Bug][Load][Json] #4124 Load json format with stream load failed (#4217)
Stream load should read all the data completely before parsing the JSON.
Also add a new BE config `streaming_load_max_batch_read_mb`
to limit the data size when loading JSON data.

Fix the bug of loading an empty JSON array `[]`.

Add docs to explain certain cases of loading JSON-format data.

Fix: #4124
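The "read everything before parsing, bounded by a cap" step can be sketched in plain Java. This is a hypothetical helper, not the Doris BE code (which is C++); only the config name `streaming_load_max_batch_read_mb` comes from the commit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class JsonReadDemo {
    // Read the whole stream into memory before handing it to the JSON parser,
    // refusing payloads larger than maxBytes (analogous to the new BE config).
    public static byte[] readAll(InputStream in, long maxBytes) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                if (buf.size() + n > maxBytes) {
                    throw new IllegalStateException("json payload exceeds " + maxBytes + " bytes");
                }
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

Only after the full payload is buffered (including the empty-array case `[]`) does parsing start, which avoids parsing a truncated document.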
2020-08-04 12:55:53 +08:00
de7f83230c [Delete][Bug]fix decimal delete error (#4228)
Queries like DELETE FROM tbl WHERE decimal_key <= "123.456";
may fail randomly when the type of decimal_key is DECIMALV2,
because the precision and scale are not initialized.
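A minimal Java sketch of the underlying issue, using BigDecimal (illustrative only; the class and method names are made up, and the actual fix lives in Doris code): a decimal literal like "123.456" carries precision 6 and scale 3, and comparing it is only safe once those are derived rather than left uninitialized.

```java
import java.math.BigDecimal;

public class DecimalLiteralDemo {
    // Derive precision and scale from the literal itself instead of leaving
    // them at uninitialized defaults, which can truncate the value.
    public static int[] precisionAndScale(String literal) {
        BigDecimal v = new BigDecimal(literal);
        return new int[] { v.precision(), v.scale() };
    }

    // Compare key against literal with full precision preserved.
    public static boolean lessOrEqual(String key, String literal) {
        return new BigDecimal(key).compareTo(new BigDecimal(literal)) <= 0;
    }
}
```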
2020-08-02 22:10:56 +08:00
5caa347e86 [ColocateJoin] ColocateJoin support table join itself (#4230) (#4231)
If the left table and the right table are the same table, they naturally have a colocate relationship.
2020-08-02 22:05:45 +08:00
85e0a68783 [SQL][Bug] Fix multi predicate in correlation subquery analyze fail (#4211) 2020-08-02 22:05:23 +08:00
d64d65322b [Bug][DynamicPartition]Fix bug that modifying a dynamic partition property in a non-dynamic partition table will throw an Exception (#4127) 2020-08-02 22:03:57 +08:00
bdaef84a10 [FE] [HttpServer] Config netty param in HttpServer (#4225)
Now, if the length of a URL is longer than 4096 bytes, netty will refuse it.
The case can be reproduced by constructing a very long URL (longer than 4096 bytes).

Add 2 http server params:
1. http_max_line_length
2. http_max_header_size
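For example, the two params might be raised in fe.conf like this (the param names come from the commit; the values are illustrative):

```properties
# raise the allowed HTTP initial line (URL) length beyond the 4096-byte default
http_max_line_length = 8192
# raise the allowed total HTTP header size
http_max_header_size = 16384
```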
2020-08-01 17:59:01 +08:00
116d7ffa3c [SQL][Function] Add approx_count_distinct() function (#4221)
Add approx_count_distinct() function to replace the ndv() function
2020-08-01 17:54:19 +08:00
3ce6fc631e [BUG] Fix wrong result of querying with cast expr in where clause (#4219) 2020-08-01 17:46:39 +08:00
16c89c7d56 [BUG]Fix remove expired stale rowset path order error (#4214)
The stale rowset paths were deleted in the wrong order. This bug leads to inconsistent stale rowset versions. #4213
2020-08-01 17:44:39 +08:00
c32ddce0b5 [SQL][BUG]Fix window function with limit zero bug (#4207) 2020-08-01 17:43:47 +08:00
25f3420855 [MaterializedView] Change type of sum, min, max function column in mv (#4199)
If the agg function is sum, the type of the mv column will be bigint.
The only exception is that if the base column is largeint, the type of the mv column will be largeint.

If the agg function is min or max, the type of the mv column will be the same as the type of the base column.
For example, the type of the mv column is smallint when the base column is smallint and the agg function is min.
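The mapping described above can be sketched as a small hypothetical helper (not the actual FE code; class and method names are made up):

```java
public class MvColumnType {
    // Map (agg function, base column type) to the mv column type per the
    // rule in the commit: sum widens to bigint (largeint stays largeint),
    // min/max keep the base column's type.
    public static String mvType(String aggFunc, String baseType) {
        if (aggFunc.equals("sum")) {
            return baseType.equals("largeint") ? "largeint" : "bigint";
        }
        // min / max: same type as the base column
        return baseType;
    }
}
```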
2020-08-01 17:43:23 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show real-time MemTracker consumption on the web.
As follows:
1. Nearly all MemTracker raw pointers -> shared_ptr.
2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent).
3. RowBatch & MemPool still use raw pointers to MemTracker; it's easy to ensure the RowBatch & MemPool
     destructors execute before the MemTracker's destructor, so we don't change that code.
4. A MemTracker can use a RuntimeProfile counter to calculate consumption, so the RuntimeProfile counter needs
    to be shared too. We add a shared counter pool to store the shared counters and don't change the other counters of RuntimeProfile.
Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find a shared MemTracker with little memory consumption that is too time-consuming, you can make it an orphan; then it's fine to use the raw pointer.
2020-07-31 21:57:21 +08:00
f412f99511 [Bug][ColocateJoin] Make a wrong choice of colocate join (#4216)
If table1 and table2 are colocated using columns k1 and k2,
a query must contain all of k1 and k2 in its join predicates to apply the colocation algorithm.
A query like select * from table1 inner join table2 where t1.k1 = t2.k1 cannot use colocate join.
We add this rule to avoid the problem.
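The added rule amounts to a coverage check: the equi-join columns must cover every distribution (bucketing) column. A hypothetical sketch (not the Doris planner code):

```java
import java.util.List;
import java.util.Set;

public class ColocateRule {
    // Colocate join is legal only if every distribution column of the
    // colocate group appears among the query's equi-join columns.
    public static boolean canColocate(List<String> distributionCols, Set<String> equiJoinCols) {
        return equiJoinCols.containsAll(distributionCols);
    }
}
```

For the example in the commit, joining only on k1 fails the check, so the planner falls back to a non-colocate join.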
2020-07-31 15:18:00 +08:00
1ebd156b99 [Feature]Add fetch/update/clear proto of fe&be for cache (#4190) 2020-07-31 13:23:24 +08:00
e6059341e8 [Spill To Disk][2/6] Add some runtime functions and change some functions that will be called in the future (#4152) 2020-07-30 14:21:48 +08:00
b4cb8fb9b2 [Feature][Cache]Add interface, metric, variable and config for query cache (#4159) 2020-07-30 11:24:20 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bug
1. Add `json_root` for nested json data.
2. Remove `_jmap` to make the logic reasonable.
2020-07-30 10:36:34 +08:00
594e53ec92 [Spill To Disk][1/6] The adjustment of the basic BufferedBlockMgr includes the following change (#4151)
1. Add exec msg to BufferedBlockMgr for debug tuning.
2. Change the API of consuming memory; we will use it in HashTable in the future.
3. Fix a mistake in counting _unfullfilled_reserved_buffers in BufferedBlockMgr.
2020-07-30 10:29:12 +08:00
237271c764 [Bug] Fix fe meta version problem, make drop meta check code easy to read and add doc content for drop meta check (#4205)
This PR mainly does three things:
1. Fix the fe meta version bug introduced by #4029 when fixing the conflict with #4086
2. Make the drop check code easy to read
3. Add doc content for drop meta check
2020-07-30 09:54:20 +08:00
08403eed22 [Bug]#4181 Let linked-schema change work for BETA tablet (#4182)
Let linked-schema change work for BETA tablet
2020-07-30 09:51:18 +08:00
8a169981cf [Bug][TabletRepair] Fix bug that too many replicas generated when decommission BE (#4148)
Try to select a BE with an existing replica as the destination BE for the
REPLICA_RELOCATING clone task.
Fix #4147 

Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`
2020-07-30 09:46:33 +08:00
abeb25d2a9 Fix large int literal (#4168) 2020-07-30 00:53:50 +08:00
0e79f6908b [CodeRefactor] Modify FE modules (#4146)
This CL mainly changes:

1. Add 2 new FE modules

    1. fe-common

        saves all common classes for other modules; currently only `jmockit`
        
    2. spark-dpp

        The Spark DPP application for Spark Load. All dpp-related classes, including unit tests, were moved into this module.
        
2. Change the `build.sh`

    Add a new param `--spark-dpp` to compile the `spark-dpp` module alone, while `--fe` compiles all FE modules.
    
    The output of the `spark-dpp` module is `spark-dpp-1.0.0-jar-with-dependencies.jar`, and it will be installed to `output/fe/spark-dpp/`.

3. Modify some bugs of spark load
2020-07-29 16:18:05 +08:00
1b3af783e6 [Plugin] Add properties grammar in InstallPluginStmt (#4173)
This PR supports grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
A user can set md5sum="xxxxxxx" so we don't need to provide an md5 uri.
2020-07-29 15:02:31 +08:00
83a751497e [Bug][Socket Leak] Fix bug that Mysql NIO server is leaking sockets (#4192)
When using mysql nio server, if the mysql handshake protocol fails,
we need to actively close the channel to prevent socket leakage.
2020-07-29 15:01:27 +08:00
59676a1117 [BUG] fix 4149, add sessionVariable to choose broadcastjoin first when cardinality cannot be estimated (#4150) 2020-07-29 12:28:52 +08:00
79b4f92cb7 Rewrite GroupByClause.oriGroupingExprs (#4197) 2020-07-29 12:27:15 +08:00
f292d80266 [Bug][SchemaChange]Fix alter schema add key column bug in agg model (#4143)
Fix the bug "alter schema add key column bug in agg model when using LinkedSchemaChange policy",
the detail description #4142.
2020-07-28 20:53:04 +08:00
841f9cd07b [Bug][SparkLoad] Divide the upload in spark repository into two steps (#4195)
When FE uploads the spark archive, the broker may fail to write the file,
resulting in a bad file being uploaded to the repository.

Therefore, in order to prevent spark from reading bad files, we
divide the upload into two steps:
the first step uploads the file, and the second step renames the file with its MD5 value.
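The two-step scheme can be sketched in plain Java (a hypothetical helper, not the Doris broker code; the `__lib_{md5}_{name}` final name follows the repository layout described in the #4163 commit below):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TwoStepUpload {
    // Hex-encoded MD5 of the payload.
    public static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    // Step 1: write under a temporary name. Step 2: rename to the final name
    // carrying the MD5, so readers never observe a half-written "final" file.
    public static Path upload(Path dir, String name, byte[] data) {
        try {
            Path tmp = dir.resolve(name + ".tmp");
            Files.write(tmp, data);
            Path finalPath = dir.resolve("__lib_" + md5Hex(data) + "_" + name);
            Files.move(tmp, finalPath);
            return finalPath;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the rename is the last step, a reader that finds a file under the final name knows the write completed, and the embedded MD5 lets it verify the content.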
2020-07-28 16:24:07 +08:00
150f8e0e2b Support check committed txns before catalog drop meta, like db, table, partition etc (#4029)
This PR ensures that a dropped db, table, or partition is in a normal state after being recovered by the user. Committed txns cannot be aborted, because the partitions' committed versions have been changed, and some tablets may already have new visible versions. If the user simply doesn't want the meta (db, table, or partition) anymore, use drop force instead of drop to skip the committed txn check.
2020-07-28 15:18:52 +08:00
90eaa514ba [SQL][JCUP]Reduce conflict of sql_parser.cup (#4177)
Fix the Shift/Reduce conflict in cup file #4176
2020-07-28 10:04:50 +08:00
a2b53b8ddd [Profile] Add transfer destinations detail to profile (#4161)
Add transfer destinations detail to profile
2020-07-27 23:37:50 +08:00
50e6a2c8a0 [SQL][Function] Fix from/to_base64 may return incorrect value (#4183)
from/to_base64 may return an incorrect value when the value is null #4130
Remove the duplicated base64 code.
Fix the wrong base64-encoded string length, which caused a memory error.
2020-07-27 22:55:05 +08:00
9e5ca697f3 [Doc] Fix typo for stream load content in basic-usage.md (#4185) 2020-07-27 16:50:15 +08:00
94ac0f43dc Use LongAdder or volatile long to replace AtomicLong in some scenarios (#4131)
This PR uses LongAdder or volatile long to replace AtomicLong in some scenarios.
In the statistical summation scenario, LongAdder (introduced in JDK 1.8) performs better than AtomicLong under high-concurrency updates. And if we just want the get and set operations on a variable to be atomic, adding volatile in front of the variable is enough; using AtomicLong is a little heavy.
NOTE: LongAdder is usually preferable to AtomicLong when multiple threads update a common sum that is used for purposes such as collecting statistics, not for fine-grained synchronization control such as auto-incremental ids.
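A minimal JDK-only illustration of the statistics-counter case (not Doris code; class and method names are made up): many threads increment, the total is read rarely.

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    // Each thread increments a shared LongAdder; under contention the updates
    // hit separate internal cells, and sum() folds them only when read.
    public static long sumWithAdder(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    adder.increment();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return adder.sum();
    }
}
```

An AtomicLong would give the same total here, but every increment would contend on one CAS loop; LongAdder trades a slightly more expensive read for cheap concurrent writes, which is exactly the statistics scenario the note describes.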
2020-07-27 15:48:35 +08:00
f2c9e1e534 [Spark Load]Create spark load's repository in HDFS for dependencies (#4163)
### Summary
When users use spark load, they have to upload the dependent jars to HDFS every time.
This CL adds a self-generated repository under the working_dir folder in HDFS for saving the dependencies of the spark dpp program and the spark platform.
Note that the dependencies we upload to the repository include:
1. `spark-dpp.jar`
2. `spark2x.zip`
1 is the dpp library built with the spark-dpp submodule. See details about the spark-dpp submodule in PR #4146.
2 is the spark 2.x.x platform library, which contains all jars in $SPARK_HOME/jars

**The repository structure** will be like this:

```
__spark_repository__/
    |-__archive_1_0_0/
    |        |-__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp.jar
    |        |-__lib_7670c29daf535efe3c9b923f778f61fc_spark-2x.zip
    |-__archive_2_2_0/
    |        |-__lib_64d5696f99c379af2bee28c1c84271d5_spark-dpp.jar
    |        |-__lib_1bbb74bb6b264a270bc7fca3e964160f_spark-2x.zip
    |-__archive_3_2_0/
    |        |-...
```

The following conditions will force FE to upload the dependencies:
1. FE finds its dppVersion is absent in the repository.
2. The MD5 value of the remote file does not match the local file.
Before FE uploads the dependencies, it creates an archive directory named `__archive_{dppVersion}` under the repository.
2020-07-27 01:48:41 +00:00
ed8cb6a002 [Feature][Meta]Update/Read/Write VisibleVersionTime for Partition#4076 (#4086)
#4076 
1. The visibleVersionTime is updated when data is inserted into a partition.
2. GlobalTransactionMgr calls partition.updateVisibleVersionAndVersionHash(version, versionHash) when FE is restarted.
3. If FE restarts, the VisibleVersionTime may change, but the changed value is newer than the old value.
2020-07-26 21:20:55 +08:00
1f7009354a [Bug] Add db read lock when processing unfinished publish task (#4178)
This bug was introduced by PR #4053; a db read lock should be added
when processing unfinished publish tasks.
2020-07-26 20:15:14 +08:00
911eb04594 [Bug][UpdateDataQuota] Skip update used data quota for information_schema db and fix bug for wrong time interval for UpdateDbUsedDataQuotaDaemon (#4175)
This PR skips updating the used data quota for the information_schema db,
and fixes the wrong time interval for UpdateDbUsedDataQuotaDaemon.
2020-07-26 20:14:03 +08:00
4d828d2411 Fix recover database not in "show databases" (#4170) 2020-07-25 10:04:35 +08:00
b32500bda0 [Script] Restore build parallel config (#4166) 2020-07-24 21:30:56 +08:00