5948 Commits

Author SHA1 Message Date
5745adb26c [improvement](reader) optimize for single rowset reading (#7351)
Read a single rowset without doing aggregation when reading all columns;
otherwise, use `_agg_key_next_row`.
2021-12-11 16:53:56 +08:00
568f6611df [deps](log4j) upgrade log4j (#7364)
to 2.15.0
2021-12-10 23:19:11 +08:00
80c11da3df [refactor] modify the implements of Tuple & RowBatch (#7319)
Code refactor: improve readability and avoid const_cast.

1. Make loops simpler and clearer by using range-based for loops, which are safer than the old loop style.
2. Iterate over `_row_desc.tuple_descriptors()` by index only, instead of mixing index and iterator.
3. Add a new function `To cast_to(From from)` that uses union-based casting between two types to replace reinterpret_cast; the new cast is more readable.
4. Avoid reusing the same variable name in nested loops, which is dangerous.
5. Add the const keyword to member functions, following the CppCoreGuidelines.
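
A minimal sketch of what a `cast_to` helper like the one in item 3 could look like. This is not the code from the commit: the commit describes a union-based cast, while this sketch swaps in `std::memcpy`, because reading an inactive union member is undefined behavior in standard C++ (some compilers, such as GCC, document it as an extension). The `cast_to_demo` function is hypothetical and assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical sketch, not the Doris implementation: reinterpret the bytes of
// `from` as a value of type To. std::memcpy is used in place of a union.
template <typename To, typename From>
To cast_to(From from) {
    static_assert(sizeof(To) == sizeof(From), "To and From must have the same size");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Usage (assumes a 32-bit IEEE 754 float): 1.0f has the bit pattern 0x3f800000.
inline void cast_to_demo() {
    assert(cast_to<std::uint32_t>(1.0f) == 0x3f800000u);
}
```

The size and trivially-copyable checks run at compile time, so a mismatched cast fails to build instead of silently reading past an object.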
2021-12-09 22:36:37 +08:00
ac739fec10 [refactor] modify the control flow code to improve code readability (#7302)
The command handler code is currently unclear.
Restructure its `if` and `else` branches to improve readability.
2021-12-09 22:35:46 +08:00
db57c42c83 [improvement](compaction)(tablet repair) Add missing rowsets in compaction status url and support force dropping redundant replica (#7283)
1. Add missing rowsets to the compaction status URL.
2. Add a new config `force_drop_redundant_replica` to force dropping redundant replicas.
3. Fix FE unit tests.
2021-12-09 22:34:57 +08:00
dc281ebc34 [fix](routine load) fix bug that can not read image when using keyword STREAM (#7323)
issue #7322 

1. Support `stream` as an identifier.
2. Optimize exception log output in `RoutineLoad`
2021-12-08 20:51:17 +08:00
b080e797a1 [community](github) add more content of gitignore file (#7307)
Ignore the `target` file in samples/doris-demo/
2021-12-08 20:50:44 +08:00
be0cf51eed [docs] add java formatter in doc (#7306)
There is currently no guidance on Java formatting, so add it to the docs.
2021-12-08 20:49:45 +08:00
6f91741628 [Bug]Fix BE coredump when manual compaction task is triggered (#7260)
* fix compaction action bug

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-12-08 17:10:34 +08:00
10ccadacce [fix](forward) Avoid endless forward execution (#7335)
Close related #7334

1. Fix the bug described in "[Bug] show frontends cause FE oom" (#7334).
2. Fix the incorrect `CurrentConnected` field in the `show frontends` result.
3. Add more FAQ entries.
2021-12-08 16:25:04 +08:00
2ae9c41aa1 [fix](lateral view)(subquery) Fix column materialization error (#7330)
Fix the problem that when the source column of the lateral view comes from an inline view,
the column in the inline view cannot be materialized correctly.

At the same time, fix the problem that the correct output column cannot be projected
when the source column of the lateral view comes from an inline view.

Note that when a column in the query comes from an inline view,
it must be converted from the virtual tuple to the real tuple during semantic analysis and planning.
2021-12-07 10:23:33 +08:00
868281f7cf [docs] update data-model-rollup.md (#7321)
Fix typo
2021-12-07 10:05:00 +08:00
3b10002536 [community][typo](github) modify PR template (#7310)
I found some small problems while reading the code, so this change adds a few small enhancements.

1. Modify the PR template. The current PR template isn't simple or clear, so it is worth refactoring.
2. Some small changes (typos, formatting, ...)
2021-12-07 10:03:28 +08:00
5e32ae3c3f [improvement](cache) Optimize sql cache (#7231)
issue: #7230
When getting the latest update time of a table, only compare the partitions of this query,
not all partitions of a table.
The goal is to improve the SqlCache hit rate.
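
The idea above can be sketched as follows. This is an illustration only, not the FE code from the commit: the `PartitionUpdateTimes` map, the function names, and the rule that a cached result is reusable when no scanned partition changed after caching are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model: partition id -> last update time (epoch millis).
using PartitionUpdateTimes = std::unordered_map<std::int64_t, std::int64_t>;

// Latest update time among only the partitions the query scans.
std::int64_t latest_update_time(const PartitionUpdateTimes& table,
                                const std::vector<std::int64_t>& queried_partitions) {
    std::int64_t latest = 0;
    for (std::int64_t id : queried_partitions) {
        auto it = table.find(id);
        if (it != table.end()) latest = std::max(latest, it->second);
    }
    return latest;
}

// Assumption: a cached result is reusable if none of the scanned partitions
// changed after the result was cached.
bool cache_entry_usable(std::int64_t cached_at, const PartitionUpdateTimes& table,
                        const std::vector<std::int64_t>& queried_partitions) {
    return latest_update_time(table, queried_partitions) <= cached_at;
}

// Usage: partition 2 changed at t=500, but a query on partition 1 alone can
// still reuse a result cached at t=200.
inline void sql_cache_demo() {
    PartitionUpdateTimes table = {{1, 100}, {2, 500}};
    assert(cache_entry_usable(200, table, {1}));
    assert(!cache_entry_usable(200, table, {1, 2}));
}
```

Comparing against all partitions would invalidate the first query's cache entry as well, which is the lost hit the commit aims to avoid.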
2021-12-07 09:59:31 +08:00
03ad8c1fe3 [fix](load) Fix bug that show load may be blocked (#7254)
When a broker load task fails, it may be retried while holding the
LoadJob's write lock and submitting the loading task to a thread pool.

But submitting a task to the thread pool may block for up to 60 seconds
(depending on the BlockPolicy), so the write lock may be held for too long.
2021-12-07 09:58:50 +08:00
62d12067aa [feature](udf) make orthogonal bitmap udaf as build in functions (#7211)
Make the orthogonal bitmap UDAFs built-in functions.
Add three built-in bitmap functions:

- orthogonal_bitmap_intersect
- orthogonal_bitmap_intersect_count
- orthogonal_bitmap_union_count
2021-12-07 09:57:26 +08:00
8660bf69ff [fix](select join) Make selected slotRef nullable when slotRef is from nullable tuple in outer join sql block (#7290) 2021-12-06 16:17:10 +08:00
164b27412c [revert] "[improvement](bdbje) clean too many bdbje log (#7273)" (#7312)
Reverts #7273
Because there is no EnvironmentConfig.RESERVED_DISK.
2021-12-06 11:32:45 +08:00
200210e708 [fix] (ut) fix fe unit test failed, this is because we fix the MAX_PHYSICAL_PACKET_LENGTH to 0xffffff 2021-12-06 11:13:01 +08:00
6e0664bdf8 [enhancement](audit) Enable fe audit plugin to audit more infos for query (#7300) 2021-12-06 10:33:15 +08:00
bffc2836d7 [fix](show) Fix bug that AdminShowDataSkew operation may cause fe oom (#7297) 2021-12-06 10:32:00 +08:00
f9be31d4bc [refactor](rowbatch) make RowBatch better (#7286)
1. Add the const keyword to RowBatch's read-only member functions.
2. Prefer member objects over member object pointers wherever possible.
2021-12-06 10:31:43 +08:00
e080afa186 [typo] update comment of MasterDaemon (#7285)
The comment on MasterDaemon is out of date and may mislead readers.
2021-12-06 10:30:48 +08:00
8a6528a2fb [fix](executor) set the length of StringValue to 0 when it is null (#7284)
The ptr and len of a tuple's String Slot are not assigned appropriately on the send side, so the receive side may crash in some situations.

Detailed description:
On the send side, when RowBatch::serialize(PRowBatch* output_batch) is called to pack a RowBatch, Tuple::deep_copy()
is called. Only String Slots that are not null have ptr and len set to proper values; null String Slots keep their
original state, so ptr may point anywhere and len may hold an unexpected value.

On the receive side, unpacking is done by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...). In this
function, each String Slot's offset is converted to a valid string_val->ptr whether or not the String Slot is null.

But some business logic depends on string_val->len being 0. For example, AggregateFuncTraits::init() and
HyperLogLog::deserialize() return correctly if slice.size<=0. So if string_val->len is set to 0 on the send side,
everything works; otherwise the server may crash.

From a network communication viewpoint, the sender is responsible for transferring correct data and setting proper
values, and should not make any assumptions about how the receive side will use it.
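
The send-side rule described above can be sketched as follows. This is a simplified illustration, not the change made in the commit: `StringValue` here is a two-field stand-in for the real struct, and `normalize_for_send` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for Doris's StringValue; the real struct has more members.
struct StringValue {
    char* ptr;
    std::size_t len;
};

// Hypothetical helper: produce the value to serialize for one String Slot.
// A null slot is normalized to {nullptr, 0} instead of keeping stale bytes.
inline StringValue normalize_for_send(const StringValue& slot, bool is_null) {
    if (is_null) {
        return StringValue{nullptr, 0};
    }
    return slot;
}

// Usage: a null slot that still carries leftover state is sent with len 0.
inline void string_slot_demo() {
    char buf[] = "abc";
    StringValue stale{buf, 12345};  // leftover state of a null slot
    StringValue sent = normalize_for_send(stale, /*is_null=*/true);
    assert(sent.ptr == nullptr && sent.len == 0);
}
```

With the length forced to 0 on the sender, receivers that only check the size never read through a stale pointer.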
2021-12-06 10:30:26 +08:00
19a3c393a9 [Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options in spark-connector (#7281)
Add the `sink.batch.size` and `sink.max-retries` options to the Doris Spark connector,
consistent with the `flink-connector` options.
For example:
```scala
df.write
  .format("doris")
  // maximum number of lines in a single flush
  .option("sink.batch.size", 2048)
  // number of retries after a write fails
  .option("sink.max-retries", 3)
  .save()
```
2021-12-06 10:29:33 +08:00
974ab9b90c [improvement](bdbje) clean too many bdbje log (#7273)
In an HA environment, JE retains as many reserved files as possible,
so the bdbje log becomes too large.
Limit the size of the reserved files; the default is 1GB.
2021-12-06 10:28:36 +08:00
25b31e7d5e [docs][typo] correct sql syntax in upgrade.md (#7271)
correct sql syntax in upgrade.md
Co-authored-by: 袁湘敏 <yuanxiangmin@corp.netease.com>
2021-12-06 10:28:01 +08:00
4bfee42ba1 [feature-wip](lateral view) Support lateral view based on subquery (#7269)
Support lateral view of the result column in subquery.
For example:
  ```
  select e1 from (select k2 as a from test_explode group by a) tmp1
  lateral view explode_split(a, ",") tmp2 as e1;
  ```
The lateral view will parse the inline view column
and put the table function node above the subquery.
2021-12-06 10:26:36 +08:00
27f494dad3 [docs][typo] Update fe_config.md (#7252)
Int type should be 4 bytes and decimal should be 16 bytes
2021-12-06 10:25:28 +08:00
d3316ff567 [performance](function) Support SIMD function in some string function (#7236)
Support SIMD in some string functions: ltrim, rtrim, trim, reverse, hex
2021-12-06 10:24:26 +08:00
270bebe196 [chore](github) Add third-party GitHub Action as submodule to allow it to run (#7280)
Add the third-party GHA as a submodule so that it can run without having to be added to the allow list.
2021-12-04 19:43:30 +08:00
845f931098 [fix](select outfile) Remove optional properties check of hdfs storage (#7272) 2021-12-03 13:42:56 +08:00
92020e6e85 [deps](librdkafka) set --enable-sasl option in rdkafka build to enable plain password auth at routine load (#7251)
```
create routine load rd_001
on tb1
with append
COLUMNS(user_id, date)
properties (
"desired_concurrent_number" = "3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"max_error_number" = "100",
"format" = "json"
)
from KAFKA (
"kafka_broker_list" = "127.0.0.1:9092",
"kafka_topic" = "topic1",
"property.security.protocol" = "sasl_plaintext",
"property.sasl.mechanism" = "PLAIN",
"property.sasl.username" = "your-username",
"property.sasl.password" = "your-password",
"property.group.id" ="group1",
"property.client.id" = "client-1",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```
2021-12-02 11:44:37 +08:00
Wei
5f7c4f903f [refactor](log) Remove unused log instance creation (#7249) 2021-12-02 11:43:29 +08:00
f51448d60b [community](github) add enhancement.yml (#7242)
Add enhancement type of issue
2021-12-02 11:42:31 +08:00
fc9e502b51 [improvement](brpc)(config) Support transfer RowBatch in Controller Attachment (#7164)
When a RowBatch exceeds the maximum length allowed in the Protobuf Request,
transfer it in the Controller Attachment instead.
This avoids reaching the upper limit of the Protobuf Request length (2G),
and is expected to improve performance.
2021-12-02 11:41:38 +08:00
dd36ccc3bf [feature](storage-format) Z-Order Implement (#7149)
Support sorting data by Z-Order:

```
CREATE TABLE table2 (
siteid int(11) NULL DEFAULT "10" COMMENT "",
citycode int(11) NULL COMMENT "",
username varchar(32) NULL DEFAULT "" COMMENT "",
pv bigint(20) NULL DEFAULT "0" COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(siteid, citycode)
COMMENT "OLAP"
DISTRIBUTED BY HASH(siteid) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"data_sort.sort_type" = "ZORDER",
"data_sort.col_num" = "2",
"in_memory" = "false",
"storage_format" = "V2"
);
```
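
For readers unfamiliar with Z-Order, the sketch below shows the general bit-interleaving (Morton code) idea for two sort columns, matching `"data_sort.col_num" = "2"` above. It is not the encoding implemented in the commit: it assumes unsigned 32-bit column values, and the names `spread_bits` and `z_order_key` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Spread the 32 bits of v so that bit i moves to bit 2*i (odd bits become 0).
inline std::uint64_t spread_bits(std::uint32_t v) {
    std::uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8)) & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2)) & 0x3333333333333333ULL;
    x = (x | (x << 1)) & 0x5555555555555555ULL;
    return x;
}

// Interleave two column values: bits of `a` take the odd positions,
// bits of `b` take the even positions.
inline std::uint64_t z_order_key(std::uint32_t a, std::uint32_t b) {
    return (spread_bits(a) << 1) | spread_bits(b);
}

// Usage: sorting rows by z_order_key keeps rows that are close in both
// columns close together in the sort order.
inline void z_order_demo() {
    assert(z_order_key(0, 1) == 1);
    assert(z_order_key(1, 0) == 2);
    assert(z_order_key(3, 3) == 15);
}
```

Unlike a plain lexicographic sort on (siteid, citycode), the interleaved key gives both columns comparable weight, which tends to help filters on either column.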
2021-12-02 11:39:51 +08:00
d8ba6e3eb6 1. Fix an error when fetch string type field may cause malform packet error. (#7262)
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1,
   but it was set to 2^24 - 2 by mistake.
2. Fix bitmap_to_string, which may fail when the result is larger than 2G
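
The arithmetic behind item 1 can be sketched as follows, assuming the constant refers to the MySQL protocol's 3-byte packet length field. The names `kMaxPhysicalPacketLength` and `packet_count` are hypothetical and do not come from the FE code.

```cpp
#include <cassert>
#include <cstdint>

// A MySQL packet header stores the payload length in 3 bytes, so the largest
// physical packet payload is 2^24 - 1 bytes.
constexpr std::uint32_t kMaxPhysicalPacketLength = (1u << 24) - 1;
static_assert(kMaxPhysicalPacketLength == 0xffffff, "2^24 - 1 is 0xffffff");

// Number of physical packets needed for a payload. A payload of exactly
// kMaxPhysicalPacketLength bytes is followed by an empty packet.
inline std::uint64_t packet_count(std::uint64_t payload_len) {
    return payload_len / kMaxPhysicalPacketLength + 1;
}

// Usage: one byte below the limit fits in one packet; hitting the limit
// exactly requires a second (empty) packet.
inline void packet_demo() {
    assert(packet_count(0xfffffe) == 1);
    assert(packet_count(0xffffff) == 2);
}
```

If the sender splits at 2^24 - 2 while the client expects 2^24 - 1, the two sides disagree on where a large payload ends, which is consistent with the malformed packet error described above.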
2021-12-01 10:02:34 +08:00
fbab8afe24 [feature] Support disable query and load for backend to make Doris more robust and set default value to 1 for max_query_retry_time (#7155)
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "true");
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_load" = "true");
2021-11-30 22:08:32 +08:00
6c4aeab06f [fix](broker-load) BE may crash when using preceding filter in broker or routine load (#7193)
The broker scan node has two tuple descriptors:
one is the dest tuple and the other is the src tuple.
The src tuple is used to read the lines of the original file,
and the dest tuple is used to save the converted lines.
The preceding filter is executed on the src tuple, so the src tuple descriptor should be used
to initialize the filter expression.
2021-11-30 22:04:05 +08:00
904a32c758 [docs] fix 0.14 release date in download page (#7253)
The release date of 0.14 in download page is wrong
2021-11-30 15:00:36 +08:00
9b3c834396 [docs](release) Update download page to add release 0.15 (#7244)
Also modify some steps in release processing document
2021-11-29 16:06:32 +08:00
91a3150910 [fix](reader) Fix the bug that reader call _capture_rs_readers function twice (#7224) 2021-11-26 10:17:33 +08:00
baa5d6089f [fix](alter) Fix bug that partition column of a unique key table can be modified (#7217)
The partition columns can not be modified.
2021-11-26 10:16:01 +08:00
948a2a738d [performance] Improve DeltaWriter's performance. (#7216)
1. Support batch write for DeltaWriter.
2. Use mutex instead of SpinLock.
2021-11-26 10:15:27 +08:00
178fda593d [docs] Refine documents for commit message tags. (#7215) 2021-11-26 10:14:39 +08:00
52cd12a1f9 [fix](planner) fix preaggregation reason error (#7205)
This PR fixes #7204.
2021-11-26 10:13:53 +08:00
a1bf2878c0 [feat-opt](json-function) optimize get_json_xx function (#7157)
Avoid repeatedly parsing the JSON string if the first parameter of the function is constant.
2021-11-26 10:12:55 +08:00
70670b5a42 [feat-wip](lateral-iew) Pruning output slot of TableFunctionNode (#7148)
Once the lateral view function has been calculated,
the result is returned directly to the upper layer,
which causes a lot of memory copying and network transmission.
The reason is that the original column participating
in the lateral view is very likely to hold a very long value.
If Doris still retains this column after calculating the lateral view,
it needs to perform a memory copy.
However, in many cases the upper plan node does not need the original columns of the lateral view,
so columns should be pruned after the lateral view is calculated
to avoid useless memory copying and network transmission.
For example, the following query can prune the original column v1:

```select k1, e1 from table lateral view explode_split(v1, ",") tmp as e1;```

The `outputSlotIds` in TableFunctionNode is used to store the columns that should be retained after pruning.
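
A minimal sketch of the pruning step described above, for illustration only. The `Row` type (slot id to value) and the `prune_row` function are hypothetical; only the idea of keeping the slots listed in `outputSlotIds` comes from the commit message.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Row = std::map<int, std::string>;  // hypothetical: slot id -> value

// Keep only the slots that the upper plan node needs.
inline Row prune_row(const Row& row, const std::set<int>& output_slot_ids) {
    Row pruned;
    for (const auto& slot : row) {
        if (output_slot_ids.count(slot.first) > 0) {
            pruned.insert(slot);
        }
    }
    return pruned;
}

// Usage, mirroring the query above.
inline void prune_demo() {
    // Slots: 0 = k1, 1 = v1 (long source column), 2 = e1 (exploded value).
    Row row = {{0, "k1-value"}, {1, "a,b,c"}, {2, "a"}};
    Row out = prune_row(row, {0, 2});
    assert(out.size() == 2 && out.count(1) == 0);
}
```

Dropping slot 1 here stands in for skipping the copy of the long `v1` value for every exploded row.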

* Support scalar function in lateral view

The first child (child 0) of the explode_split function can be a scalar function,
such as concat(k1, ",", k2).

This PR mainly checks whether a lateral view with a function satisfies the following semantic requirements:
1. The columns in the function must all belong to the original table
2. The function must be a scalar function
2021-11-26 10:10:05 +08:00
Pxl
2445f10868 [fix](bitmap-function) fix core dump at some bitmap function (#7221) 2021-11-25 22:52:50 +08:00