doris

Author	SHA1	Message	Date
EmmyMiao87	c873c8c162	[fix](lateral view)(subquery) Forbid direct AGG/SORT on lateral view (#7337) This PR mainly prohibits operations such as aggregation, sorting, and window functions on lateral views that contain subqueries. For example, the following query is no longer allowed: select min(e1) from (select c1 from table group by c1)tmp1 lateral view explode_split(c1, ",") tmp2 as e1 However, it can be rewritten as follows with the same result: select min(e1) from (select e1 from (select c1 from table group by c1)tmp1 lateral view explode_split(c1, ",") tmp2 as e1) tmp3 The problem occurs when the result of an inline view is fed into a lateral view and the outer query aggregates or sorts on non-table-function columns: the output slot ids of the table function node are then empty or missing columns. The root cause is that when the inner layer contains an inline view, the outer expressions must be mapped to the correct tuple through the substitute method according to the smap, instead of to the virtual tuple. However, the substitute method of a SlotRef cannot recurse into its own source exprs. E.g. SlotRef: c2 <source expr min(c1)> from agg tuple; smap: <c1, c3>; before: c2 <source expr min(c1)>; after: c2 <source expr min(c1)> (unchanged)	2021-12-16 15:42:39 +08:00
Mingyu Chen	0499b2211b	[feat](lateral-view) Support execution of lateral view stmt (#7255) 1. Add table function node 2. Add 3 table functions: explode_split, explode_bitmap and explode_json_array	2021-12-16 10:46:15 +08:00
Heng Zhao	5fed8a94ae	[docs](flink-connector) Add instructions for flink doris connector (#7384)	2021-12-16 10:43:21 +08:00
wangyongfeng	6dd312b21e	[docs](website) develop the caseList component (#7402) Move user cases to a submenu	2021-12-16 10:41:11 +08:00
Mingyu Chen	2b90967c4c	[fix][refactor](broker load) refactor the scheduling logic of broker load (#7371) 1. Refactor the scheduling logic of broker load. For details, see #7367 2. Fix a bug where loadedBytes in the SHOW LOAD result was wrong. 3. Remove the LoadTimeoutChecker thread. PENDING load jobs no longer time out; the timeout of a load job now starts when its pending load task is scheduled. 4. Fix a bug where the loading task was never submitted to the pool because the logic of BlockedPolicy was wrong. The task must either be submitted to the pool or cause a RejectedExecutionException to be thrown. 5. The transaction of a load job now begins in the pending task instead of when the job is submitted.	2021-12-16 10:39:22 +08:00
jiafeng.zhang	2e334d06da	[docs](sql-block-rule) modify the document of SQL block rule (#7370)	2021-12-16 10:38:54 +08:00
jakevin	6ede693839	[fix](insert) modify code logic of InsertStmt (#7360) Fix a NullPointerException thrown when entry is null.	2021-12-16 10:38:05 +08:00
HappenLee	4afdcdb939	[performance](reader) Optimize the unique reader to reduce unnecessary comparisons and function calls (#7348)	2021-12-16 10:36:43 +08:00
zhoubintao	85521944dd	[refactor](olap-scan-node) Refactor olap scannode (#7131) 1. Delete useless variables 2. Add the const modifier to read-only functions 3. Delete the empty destructor, which the compiler generates automatically; see the rule of three/five/zero: [https://en.cppreference.com/w/cpp/language/rule_of_three] 4. Use the override keyword (instead of the virtual keyword) on overriding virtual functions in subclasses. override lets the compiler check that the function really overrides a base-class function, which improves safety; this is why C++11 introduced override	2021-12-16 10:33:41 +08:00
wudi	549e849400	[improvement](flink-connector) DataSourceFunction supports parallel reading from Doris (#7232) The previous DataSourceFunction inherited from RichSourceFunction, so its parallelism was always 1 no matter how the Flink parallelism was set. It now inherits from RichParallelSourceFunction, and when Flink runs with a parallelism greater than 1, the Doris partitions are distributed across the parallel subtasks. For example, with dorisPartitions.size = 10 and flink.parallelism = 4, the tasks are split as follows: task0: dorisPartitions[0],[4],[8] task1: dorisPartitions[1],[5],[9] task2: dorisPartitions[2],[6] task3: dorisPartitions[3],[7]	2021-12-15 16:21:29 +08:00
Mingyu Chen	c8bc0cf523	[chore][community](github) Remove Travis and add GitHub Actions (#7380) 1. Remove Travis 2. Add GitHub Actions to build: 1. docs 2. fs_broker 3. flink/spark connectors	2021-12-15 13:27:37 +08:00
caiconghui	382351b0ee	[fix](ut) Fix FE UT failures, a BE UT memory leak, and the thirdparty build failure (#7377)	2021-12-15 11:00:20 +08:00
Zhengguo Yang	926540c561	[feature] Support returning bitmap/hll data in select statements (#7276) Support returning bitmap/hll data in select statements. This is available when show_object_data=true is set.	2021-12-15 09:48:27 +08:00
jiafeng.zhang	e64da03866	[deps](log4j) Upgrade log4j 2 to 2.16.0 (#7394) Upgrade log4j 2 to 2.16.0, as the Log4j project strongly recommends upgrading to this version	2021-12-14 15:57:16 +08:00
EmmyMiao87	d9c927fdc6	[improvement](log)(schema change) Add a clear memory description in the log (#7378) If memory exceeds the limit while the BE builds a materialized view or performs a schema change, a more detailed log message about the limit and the related configuration is printed.	2021-12-14 15:56:50 +08:00
HappenLee	4e02109926	[refactor][fix](constants-fold) Refactor the code of fold constant mgr and fix some undefined behavior and memory leaks (#7373) 1. Fix some memory leaks 2. Remove redundant and invalid code 3. Fix some buggy writes to reduce extra memory copies and return null pointers to string 4. Rename things to make the structure clearer	2021-12-14 15:53:56 +08:00
luzhijing	a6a584a2e7	[doc] update the compilation.md (#7350) Update compilation.md to explain the Docker image versions.	2021-12-14 15:52:40 +08:00
Dayue Gao	414c5a8b5a	[fix] LRUCache::prune_if may not remove all the entries matching the predicate (#7383) Co-authored-by: gaodayue <gaodayue@bytedance.com>	2021-12-13 21:09:47 +08:00
HB	ef2ea1806e	[docs] Improve the chapter on debugging FE in doc. (#7309) The chapter on debugging FE in the docs currently has some gaps. My colleagues and I ran into several pitfalls when setting up the debugging environment, so I want to improve this chapter based on that experience. The changes are explained below: 1. mkdir -p ./thirdparty/installed/bin Explanation: When I downloaded versions 0.14 and 0.15, there were no files under thirdparty, so I didn't know whether to create the directory myself or what else to do. In the end I created it myself. I think instructions are needed here. 2. Add a workaround for failures when installing thrift@0.13.0. Explanation: My colleagues and I could not find the installation package when running the install command, and eventually found a solution on GitHub. I added it so that other Mac users don't get stuck at this step. 3. Fix an error in the description of code generation. Explanation: I tried to debug FE before building the code and kept failing; IDEA reported that files could not be found. After consulting morningman in the WeChat group, I learned that after running `mvn install -DskipTests`, there is no need to run `mvn generate-sources`. The document said otherwise, so it needs to be corrected.	2021-12-13 16:26:45 +08:00
SleepyBear	e0889aee1e	[typo](load) correct a typo in the `EtlJobMgr::get_job_status` function (#7353)	2021-12-11 16:54:25 +08:00
GoGoWen	5745adb26c	[improvement](reader) optimize for single rowset reading (#7351) Read a single rowset without aggregation when reading all columns; otherwise use `_agg_key_next_row`	2021-12-11 16:53:56 +08:00
jiafeng.zhang	568f6611df	[deps](log4j) upgrade log4j (#7364) Upgrade log4j to 2.15.0	2021-12-10 23:19:11 +08:00
thinker	80c11da3df	[refactor] modify the implementation of Tuple & RowBatch (#7319) Code refactor: improve readability and avoid const_cast. 1. Make loops simpler and clearer by using range-based for loops, which are safer than the old loop style 2. Iterate over _row_desc.tuple_descriptors() using indexes only, instead of mixing indexes and iterators 3. Add a new function To cast_to(From from) that casts between two types through a union, replacing reinterpret_cast with a more readable cast 4. Avoid using the same variable name in nested loops, which is dangerous 5. Add the const keyword to member functions, following the CppCoreGuidelines	2021-12-09 22:36:37 +08:00
jakevin	ac739fec10	[refactor] modify the control flow code to improve code readability (#7302) The command handler code is not clear. Restructure its `if` and `else` branches to improve readability.	2021-12-09 22:35:46 +08:00
Mingyu Chen	db57c42c83	[improvement](compaction)(tablet repair) Add missing rowsets in compaction status url and support force dropping redundant replica (#7283) 1. Add missing rowsets in compaction status url 2. Add a new config `force_drop_redundant_replica` to force drop redundant replicas. 3. Fix FE UT	2021-12-09 22:34:57 +08:00
qiye	dc281ebc34	[fix](routine load) fix a bug where the image cannot be read when the keyword STREAM is used (#7323) issue #7322 1. Support `stream` as an identifier. 2. Optimize exception log output in `RoutineLoad`	2021-12-08 20:51:17 +08:00
jakevin	b080e797a1	[community](github) add more entries to the gitignore file (#7307) Ignore the `target` directory in samples/doris-demo/	2021-12-08 20:50:44 +08:00
jakevin	be0cf51eed	[docs] add java formatter in doc (#7306) There is no guidance on Java code formatting yet, so add it to the docs.	2021-12-08 20:49:45 +08:00
weizuo93	6f91741628	[Bug] Fix BE coredump when a manual compaction task is triggered (#7260) * fix compaction action bug Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-12-08 17:10:34 +08:00
Mingyu Chen	10ccadacce	[fix](forward) Avoid endless forward execution (#7335) Closes #7334 1. Fix the bug described in "[Bug] show frontends cause FE oom" #7334 2. Fix the wrong CurrentConnected field in the show frontends result. 3. Add more FAQ entries	2021-12-08 16:25:04 +08:00
EmmyMiao87	2ae9c41aa1	[fix](lateral view)(subquery) Fix column materialization error (#7330) Fix the problem that when the source column of the lateral view comes from an inline view, the column in the inline view is not materialized correctly. Also fix the problem that the correct output column cannot be projected when the source column of the lateral view comes from an inline view. Note that when a column in the query comes from an inline view, it must be converted from the (virtual) tuple to the real tuple during semantic analysis and planning.	2021-12-07 10:23:33 +08:00
MeiontheTop	868281f7cf	[docs] update data-model-rollup.md (#7321) Fix typo	2021-12-07 10:05:00 +08:00
jakevin	3b10002536	[community][typo](github) modify PR template (#7310) I found some small problems while reading the code, so this PR adds some small enhancements. 1. Modify the PR template. The current PR template is not simple and clear, so it is worth refactoring. 2. Some small changes (typos, formatting, ...)	2021-12-07 10:03:28 +08:00
Zeno Yang	5e32ae3c3f	[improvement](cache) Optimize sql cache (#7231) issue: #7230 When getting the latest update time of a table, compare only the partitions involved in this query rather than all partitions of the table. The goal is to improve the SqlCache hit rate.	2021-12-07 09:59:31 +08:00
Mingyu Chen	03ad8c1fe3	[fix](load) Fix a bug where show load may be blocked (#7254) When a broker load task fails, it may be retried while holding the LoadJob's write lock, by submitting a loading task to a thread pool. However, submitting a task to the thread pool may block for up to 60 seconds (depending on BlockPolicy), so the write lock may be held for too long.	2021-12-07 09:58:50 +08:00
Zhengguo Yang	62d12067aa	[feature](udf) make orthogonal bitmap udaf built-in functions (#7211) Move the orthogonal bitmap UDAFs into built-in functions and add three built-in bitmap functions: - orthogonal_bitmap_intersect - orthogonal_bitmap_intersect_count - orthogonal_bitmap_union_count	2021-12-07 09:57:26 +08:00
caiconghui	8660bf69ff	[fix](select join) Make selected slotRef nullable when slotRef is from a nullable tuple in an outer join sql block (#7290)	2021-12-06 16:17:10 +08:00
Mingyu Chen	164b27412c	[revert] "[improvement](bdbje) clean too many bdbje log (#7273)" (#7312) Reverts #7273 because EnvironmentConfig.RESERVED_DISK does not exist.	2021-12-06 11:32:45 +08:00
Zhengguo Yang	200210e708	[fix](ut) fix failed FE unit tests; they failed because MAX_PHYSICAL_PACKET_LENGTH is now fixed to 0xffffff	2021-12-06 11:13:01 +08:00
caiconghui	6e0664bdf8	[enhancement](audit) Enable the FE audit plugin to audit more information for queries (#7300)	2021-12-06 10:33:15 +08:00
caiconghui	bffc2836d7	[fix](show) Fix a bug where the AdminShowDataSkew operation may cause FE OOM (#7297)	2021-12-06 10:32:00 +08:00
thinker	f9be31d4bc	[refactor](rowbatch) make RowBatch better (#7286) 1. Add the const keyword to RowBatch's read-only member functions 2. Use member objects rather than member object pointers wherever possible	2021-12-06 10:31:43 +08:00
tianhui5	e080afa186	[typo] update comment of MasterDaemon (#7285) The comment of MasterDaemon is out of date and may mislead readers.	2021-12-06 10:30:48 +08:00
thinker	8a6528a2fb	[fix](executor) set the length of StringValue to 0 when it is null (#7284) The ptr and len of a tuple's String slots are not set properly on the send side, which may crash the receive side in some situations. Details: On the send side, RowBatch::serialize(PRowBatch* output_batch) packs the RowBatch by calling Tuple::deep_copy(). Only non-null String slots get proper ptr and len values; null String slots keep their original state, so ptr may point anywhere and len may hold an unexpected value. On the receive side, unpacking is done by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...), which converts each String slot's offset into a valid string_val->ptr whether or not the slot is null. However, some logic depends on string_val->len being 0, for example AggregateFuncTraits::init() and HyperLogLog::deserialize(), which return correctly if slice.size <= 0. If string_val->len is set to 0 on the send side, everything works; otherwise the server may crash. From a network communication point of view, it is the sender's responsibility to transmit correct data without making any assumption about how the receive side will use it.	2021-12-06 10:30:26 +08:00
wei zhao	19a3c393a9	[Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options in spark-connector (#7281) Add `sink.batch.size` and `sink.max-retries` options to `Doris Spark-connector`, consistent with the `flink-connector` options: `sink.batch.size` specifies the maximum number of rows in a single flush, and `sink.max-retries` specifies the number of retries after a write fails. E.g.: ```scala df.write.format("doris").option("sink.batch.size", 2048).option("sink.max-retries", 3).save() ```	2021-12-06 10:29:33 +08:00
dh-cloud	974ab9b90c	[improvement](bdbje) clean too many bdbje log (#7273) In an HA environment, JE retains a large number of reserved files, so the bdbje log becomes too large. Limit the size of reserved files, with a default of 1GB.	2021-12-06 10:28:36 +08:00
tinkerrrr	25b31e7d5e	[docs][typo] correct sql syntax in upgrade.md (#7271) Co-authored-by: 袁湘敏 <yuanxiangmin@corp.netease.com>	2021-12-06 10:28:01 +08:00
EmmyMiao87	4bfee42ba1	[feature-wip](lateral view) Support lateral view based on subquery (#7269) Support lateral views on the result columns of a subquery. For example: ``` select e1 from (select k2 as a from test_explode group by a) tmp1 lateral view explode_split(a, ",") tmp2 as e1; ``` The lateral view parses the inline view column and places the table function node above the subquery.	2021-12-06 10:26:36 +08:00
renzhimin7	27f494dad3	[docs][typo] Update fe_config.md (#7252) The int type should be 4 bytes and decimal should be 16 bytes	2021-12-06 10:25:28 +08:00
HappenLee	d3316ff567	[performance](function) Support SIMD in some string functions (#7236) Support SIMD in some string functions: ltrim, rtrim, trim, reverse, hex	2021-12-06 10:24:26 +08:00
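
Commit 549e849400 (#7232) splits Doris partitions across Flink subtasks in a round-robin pattern: with 10 partitions and parallelism 4, task0 gets 0, 4, 8, task1 gets 1, 5, 9, and so on, which is partition i going to subtask i mod parallelism. The Java sketch below reproduces that split on plain indexes. The class `PartitionAssignment` and method `assign` are hypothetical illustrations, not code from the Doris Flink connector, which works with its own partition objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the connector's code): round-robin assignment of
// partition indexes to parallel subtasks, mirroring the split in #7232.
public class PartitionAssignment {

    /** For each subtask in [0, parallelism), returns the partition indexes it reads. */
    public static List<List<Integer>> assign(int partitionCount, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<List<Integer>> tasks = new ArrayList<>();
        for (int t = 0; t < parallelism; t++) {
            tasks.add(new ArrayList<>());
        }
        // Partition i goes to subtask i % parallelism.
        for (int i = 0; i < partitionCount; i++) {
            tasks.get(i % parallelism).add(i);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // dorisPartitions.size = 10, flink.parallelism = 4
        List<List<Integer>> tasks = assign(10, 4);
        for (int t = 0; t < tasks.size(); t++) {
            System.out.println("task" + t + ": " + tasks.get(t));
        }
    }
}
```

Running `main` prints `task0: [0, 4, 8]`, `task1: [1, 5, 9]`, `task2: [2, 6]` and `task3: [3, 7]`. Inside a Flink `RichParallelSourceFunction`, a subtask would typically take its own list using `getRuntimeContext().getIndexOfThisSubtask()` and `getRuntimeContext().getNumberOfParallelSubtasks()`. Round-robin keeps the per-subtask partition counts within one of each other without needing partition sizes.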
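
Commits 2b90967c4c (#7371, item 4) and 03ad8c1fe3 (#7254) refer to a thread-pool policy (called BlockedPolicy / BlockPolicy there) under which a submitted task must either enter the pool or cause a RejectedExecutionException, and submission may block for a bounded time. One common way to get that behavior in Java is a `RejectedExecutionHandler` that waits on the executor queue's timed `offer`. The sketch below assumes that design; the class `BlockingWithTimeoutPolicy` is hypothetical and is not the actual Doris implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical rejection policy (not Doris code): wait up to timeoutMs for
// queue space, then throw instead of silently dropping the task.
public class BlockingWithTimeoutPolicy implements RejectedExecutionHandler {
    private final long timeoutMs;

    public BlockingWithTimeoutPolicy(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("executor is shut down");
        }
        try {
            // Timed offer returns false if no queue space frees up in time.
            boolean queued = executor.getQueue().offer(task, timeoutMs, TimeUnit.MILLISECONDS);
            if (!queued) {
                throw new RejectedExecutionException("queue still full after " + timeoutMs + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("interrupted while waiting for queue space", e);
        }
    }

    public static void main(String[] args) {
        // One worker and a queue of capacity 1, so a third task triggers the handler.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1), new BlockingWithTimeoutPolicy(100));
        Runnable slow = () -> {
            try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
        };
        pool.execute(slow); // runs on the worker
        pool.execute(slow); // waits in the queue
        try {
            pool.execute(slow); // queue stays full for 100 ms, so this is expected to throw
            System.out.println("submitted");
        } catch (RejectedExecutionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        pool.shutdownNow();
    }
}
```

Because the handler puts tasks straight into the queue, it relies on at least one core thread staying alive to drain it. It also blocks the submitting thread for up to `timeoutMs`. That is why calling `execute` while holding a lock, as described in #7254, can keep the lock held for the whole wait.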

3618 Commits