
5457 Commits

Author SHA1 Message Date
d1dbe7bfc8 [fix](reader) fix leak in Level1Iterator (#23612)
_merge_next() and _normal_next() leak _cur_child when _cur_child->next()
returns failure.
2023-08-29 23:32:24 +08:00
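A leak like this typically happens when an early return on a failed `next()` skips the code that would otherwise store or free the current child. Below is a minimal C++ sketch of the RAII-style remedy. It assumes a simplified `Status` and a hypothetical `ChildIterator`, neither of which is the actual Doris class. It illustrates the ownership idea only and does not reproduce the real `_merge_next()` or `_normal_next()` code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an error status; not Doris's Status class.
struct Status {
    bool ok = true;
    static Status OK() { return {}; }
    static Status Error() { return {false}; }
};

// Hypothetical child iterator; counts live instances so a leak is observable.
struct ChildIterator {
    static inline int live = 0;  // C++17 inline static member
    bool fail;
    explicit ChildIterator(bool fail_on_next) : fail(fail_on_next) { ++live; }
    ~ChildIterator() { --live; }
    Status next() { return fail ? Status::Error() : Status::OK(); }
};

// Advances the current child and keeps it on success. Ownership is held by
// std::unique_ptr, so the early return on failure destroys the child
// instead of leaking it.
Status advance(std::unique_ptr<ChildIterator> cur,
               std::vector<std::unique_ptr<ChildIterator>>& children) {
    Status st = cur->next();
    if (!st.ok) {
        return st;  // `cur` is released automatically here
    }
    children.push_back(std::move(cur));
    return Status::OK();
}
```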
030df6db35 [fix](odbc) fix ODBC insert of string data into SQL Server (#23364) 2023-08-29 21:47:50 +08:00
1ac0ff0ea9 [feature](delete-predicate) support delete sub predicate v2 (#22442)
New structure for delete sub predicate.
Currently, a delete sub predicate is stored temporarily as a string-typed condition_str, and its fields are extracted from it with std::regex. This may cause a stack overflow when matching an extremely large string (a libc bug).

To avoid that problem, we now use a new PB structure to hold the delete sub predicate:

```protobuf
message DeleteSubPredicatePB {
    optional int32 column_unique_id = 1;
    optional string column_name = 2;
    optional string op = 3;
    optional string cond_value = 4;
}
```
Currently, both versions of the sub predicate are filled: queries use v2, while compaction still uses v1. When old rowset meta whose delete predicates have v1 sub predicates is read from PB, we attempt to convert them to v2. In addition, we will try to rewrite this meta with the new delete sub predicate.

This also prepares for using the column unique id to identify a column globally.
Identifying a column by its unique id rather than its name is vital for flexible schema change. The rewritten delete predicate will carry the column unique id.
2023-08-29 19:37:23 +08:00
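To illustrate the difference between v1 and v2, the sketch below parses a v1-style condition string once into the fields of `DeleteSubPredicatePB`. It uses a linear scan instead of `std::regex`. This is a hypothetical example with three assumptions. First, the v1 grammar is `<column><op><value>` with no spaces, for example `k1>=10`. Second, the operator set is assumed. Third, `DeleteSubPredicate` is a plain C++ stand-in for the protobuf message.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Plain C++ stand-in for DeleteSubPredicatePB (illustration only).
struct DeleteSubPredicate {
    int column_unique_id = -1;  // not present in the v1 string; attached later
    std::string column_name;
    std::string op;
    std::string cond_value;
};

// Parses an assumed v1 condition such as "k1>=10" by scanning for the first
// operator character, so no std::regex (and no deep recursion) is involved.
std::optional<DeleteSubPredicate> parse_v1(const std::string& cond) {
    const std::string::size_type pos = cond.find_first_of("!=<>");
    if (pos == std::string::npos || pos == 0) return std::nullopt;
    // A following '=' forms a two-character operator such as "!=", ">=", "<=".
    const bool two_chars = pos + 1 < cond.size() && cond[pos + 1] == '=';
    if (cond[pos] == '!' && !two_chars) return std::nullopt;  // lone '!' is invalid
    const std::string::size_type len = two_chars ? 2 : 1;
    DeleteSubPredicate p;
    p.column_name = cond.substr(0, pos);
    p.op = cond.substr(pos, len);
    p.cond_value = cond.substr(pos + len);
    return p;
}
```

Scanning for the first operator character, rather than trying operators in a fixed order, keeps operator-like characters inside the value (as in `city=a>b`) from being mistaken for the operator.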
94a8fa6bc9 [bug](function) fix explode_number function returning wrong rows (#23603)
Previously, the result of the explode_number function was random when called with a const value,
because _cur_size was reset and values therefore could not be inserted into the column.
2023-08-29 19:02:49 +08:00
82a4f114e4 [improvement](compaction) add an option to delete stale rowsets based on the _stale_rs_metas size during compaction (#23448) 2023-08-29 17:40:37 +08:00
1410a15a61 [fix](compaction) print the column name when a block's ColumnPtr is nullptr while getting block bytes (#23338) 2023-08-29 17:24:48 +08:00
0cece561f9 [refactor](segment iterator) replace std::map with std::vector in the iterator and stop relying on unique id to identify position (#23505) 2023-08-29 16:43:32 +08:00
f7a3d2778a [FIX](array) update array olapconvertor and support arrays of other complex types (#23489)
* update array olapconvertor and support arrays of other complex types

* update for inverted index
2023-08-29 16:18:11 +08:00
993659cd0b [FIX](serde) fix serde error handling (#23565) 2023-08-29 14:55:35 +08:00
97eb2b9172 [Fix](multi-catalog) Fix broker load reader and hdfs reader issue. (#23529)
Broker load with a broker sometimes throws 'Invalid orc post script length'.
HDFS queries sometimes throw 'Invalid orc post script length'.
2023-08-29 13:45:48 +08:00
7dcde4d529 [bug](decimal) Use max value as result on overflow (#23602)
* [bug](decimal) Use max value as result on overflow

* update
2023-08-29 13:26:25 +08:00
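Below is a minimal sketch of saturating on overflow. It uses multiplication of unscaled decimal values as the example operation. The sketch rests on these assumptions:

- Values are held in `int64_t` with precision up to 18.
- Scale handling is omitted.
- A negative result saturates to the negated max. The commit title only mentions the max value.
- Overflow is detected with the GCC/Clang builtin `__builtin_mul_overflow`.

```cpp
#include <cassert>
#include <cstdint>

// Largest unscaled value with `precision` decimal digits: 10^precision - 1
// (valid for precision 1..18 in int64_t).
int64_t max_unscaled(int precision) {
    int64_t v = 1;
    for (int i = 0; i < precision; ++i) v *= 10;
    return v - 1;
}

// Multiplies two unscaled decimal values. If the product overflows int64_t or
// exceeds the target precision, the result saturates to the max value
// (negated when the true result is negative, an assumption of this sketch).
int64_t saturating_mul(int64_t a, int64_t b, int precision) {
    const int64_t max_v = max_unscaled(precision);
    const bool negative = (a < 0) != (b < 0);
    int64_t r = 0;
    if (__builtin_mul_overflow(a, b, &r) || r > max_v || r < -max_v) {
        return negative ? -max_v : max_v;
    }
    return r;
}
```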
Pxl
7913354f78 add column number check for vsorted_run_merger (#23584) 2023-08-29 10:41:59 +08:00
0128dd42d9 [fix](regexp_extract_all) fix BE OOM when querying with regexp_extrac… (#23284) 2023-08-29 10:34:12 +08:00
da9eb79ac4 [Enhancement](Schema hash) Remove schema hash in tablet info (#23516) 2023-08-29 10:05:12 +08:00
d863cc3a12 [fix](move-memtable) fix tablets to commit (#23577) 2023-08-29 09:49:07 +08:00
9c65b7ab96 [improvement](column_reader) move load once to index reader to reduce memory footprint of column reader (#23537)
2023-08-29 09:34:27 +08:00
fbf8499999 [improvement](compaction) reduce memory usage in vertical compaction (#23388) 2023-08-28 21:54:21 +08:00
35a1404bbe [fix](load) add error handling when loading data dir (#23457) 2023-08-28 19:33:50 +08:00
392437008c [Improvement](ColumnReader) optimize memory usage of ColumnReader meta (#23528) 2023-08-28 17:57:59 +08:00
650cc25ea4 [fix](light-schema-change) fix schema consistency check failed (#23283) 2023-08-28 16:40:30 +08:00
29b94c4ed7 [pipeline](refactor) refine pipeline fragment context (#23478) 2023-08-28 15:55:02 +08:00
7e7cfd17bf [fix](tablet sink) check validity of tablet sink data (#23530)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2023-08-28 15:54:12 +08:00
Pxl
3049533e63 [Bug](materialized-view) fix core dump on create materialized view when different mv columns have the same reference base column (#23425)
* Remove redundant predicates on scan node

update

fix core dump on create materialized view when different mv columns have the same reference base column

Revert "update"

This reverts commit d9ef8dca123b281dc8f1c936ae5130267dff2964.

Revert "Remove redundant predicates on scan node"

This reverts commit f24931758163f59bfc47ee10509634ca97358676.

* update

* fix

* update

* update
2023-08-28 14:40:51 +08:00
28a2e71084 [pipelineX](refactor) refine codes (#23521)
* [pipelineX](refactor) refine codes

* update

* update
2023-08-28 14:38:07 +08:00
c05319b8eb [fix](agg) incorrect result of bitmap_agg and bitmap_union (#23558) 2023-08-28 14:22:19 +08:00
5be8d57f52 [fix](be-ut) fix ColumnFixedLenghtObjectTest on 32-bit systems (#23519) 2023-08-28 14:02:05 +08:00
962221cb18 [test](log) add log for debug case failure (#23506) 2023-08-28 10:45:25 +08:00
981586155c [Improvement][json] optimize performance of json_extract by reusing json path object (#23430)
* reuse json path to speed up json function

* fix typo

* clang format

* path reentry safe

* fix compile error

* fix bug of continue
2023-08-27 17:39:10 +08:00
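The idea of reusing the JSON path object can be sketched as caching the parsed path when the path argument is constant. The path is then parsed once per function state rather than once per row. `JsonPath`, `parse_path`, and `JsonExtractState` are hypothetical. The toy parser only splits dotted keys and is far simpler than real JSON path syntax.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed form of a path such as "$.a.b": just its keys.
struct JsonPath {
    std::vector<std::string> keys;
};

// Toy parser: splits on '.', dropping the leading "$" root token.
JsonPath parse_path(const std::string& path) {
    JsonPath p;
    std::string token;
    for (char c : path) {
        if (c == '.') {
            if (!token.empty() && token != "$") p.keys.push_back(token);
            token.clear();
        } else {
            token += c;
        }
    }
    if (!token.empty() && token != "$") p.keys.push_back(token);
    return p;
}

// Per-function state: a constant path is parsed once and the cached object is
// reused for every row; a non-constant path is parsed per row.
class JsonExtractState {
public:
    const JsonPath& path_for_row(const std::string& path, bool is_const) {
        if (is_const) {
            if (!cached_) {
                cached_ = parse_path(path);
                ++parse_count;
            }
            return *cached_;
        }
        scratch_ = parse_path(path);
        ++parse_count;
        return scratch_;
    }
    int parse_count = 0;  // how many times parsing actually happened

private:
    std::optional<JsonPath> cached_;
    JsonPath scratch_;
};
```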
e0bf621fe0 [chore](build) Fix compilation errors for BE UT (#23535)
Issue Number: close #23536

This issue was introduced by #23414.
2023-08-27 11:52:13 +08:00
153e8f0f72 [improvement](table property) support altering table properties: skip write index, single compaction (#23475) 2023-08-26 23:52:09 +08:00
ba351af452 [enhancement](thirdparty) upgrade thirdparty libs - again (#23414)
Resubmit #23290 (without upgrading brpc, because of an error with bthread local)

protobuf 3.15.0 -> 21.11
glog 0.4.0 -> 0.6.0
lz4 1.9.3 -> 1.9.4
curl 7.79.0 -> 8.2.1
zstd 1.5.2 -> 1.5.5
arrow 7.0.0 -> 13.0.0
abseil 20220623.1 -> 20230125.3
orc 1.7.2 -> 1.9.0
jemalloc for arrow 5.2.1 -> 5.3.0
xsimd 7.0.0 -> 13.0.0
opentelemetry-proto 0.19.0 -> 1.0.0
opentelemetry 1.8.3 -> 1.10.0

new:
c-ares -> 1.19.1
grpc -> 1.54.3
2023-08-26 22:59:10 +08:00
30e3c5bbe6 [bugfix](file cache) Fix the init file cache coredump (#23464)
* [bugfix](file cache) Fix the init file cache coredump

* fix compile
2023-08-26 16:50:50 +08:00
40be6a0b05 [fix](hive) do not split compress data file and support lz4/snappy block codec (#23245)
1. Do not split compressed data files
Some data files in Hive are compressed with gzip, deflate, etc.
These kinds of files cannot be split.

2. Support lz4 block codec
For the Hive scan node, use the lz4 block codec instead of the lz4 frame codec.

3. Support snappy block codec
For Hadoop snappy.

4. Optimize the `count(*)` query of csv file
For a query like `select count(*) from tbl`, we only need to split lines, not columns.

This needs to be picked to branch-2.0 after PR #22304.
2023-08-26 12:59:05 +08:00
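For reference, here is a sketch of the framing used by Hadoop's block codecs. It assumes the layout written by Hadoop's `BlockCompressorStream`:

- Each block starts with a 4-byte big-endian uncompressed length.
- One or more chunks follow, each prefixed with its 4-byte big-endian compressed length.

The raw snappy/lz4 decompression is left to a caller-supplied `ChunkDecoder`, which is a hypothetical hook. The test uses an identity decoder only to exercise the framing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Reads a 4-byte big-endian length and advances `pos`.
static bool read_be32(const std::string& in, size_t& pos, uint32_t& v) {
    if (pos + 4 > in.size()) return false;
    v = (uint32_t(uint8_t(in[pos])) << 24) | (uint32_t(uint8_t(in[pos + 1])) << 16) |
        (uint32_t(uint8_t(in[pos + 2])) << 8) | uint32_t(uint8_t(in[pos + 3]));
    pos += 4;
    return true;
}

// Hypothetical raw-codec hook: appends the decompressed bytes of one chunk.
using ChunkDecoder = std::function<bool(const std::string& chunk, std::string& out)>;

// Decodes [uncompressed_len]([chunk_len][chunk])... blocks; chunks are consumed
// until the block's uncompressed length has been produced.
bool decode_hadoop_blocks(const std::string& in, const ChunkDecoder& decode, std::string& out) {
    size_t pos = 0;
    while (pos < in.size()) {
        uint32_t block_len = 0;
        if (!read_be32(in, pos, block_len)) return false;
        const size_t target = out.size() + block_len;
        while (out.size() < target) {
            uint32_t chunk_len = 0;
            if (!read_be32(in, pos, chunk_len) || pos + chunk_len > in.size()) return false;
            const size_t before = out.size();
            // Reject decoder failure or no progress (avoids an infinite loop).
            if (!decode(in.substr(pos, chunk_len), out) || out.size() == before) return false;
            pos += chunk_len;
        }
        if (out.size() != target) return false;  // a chunk overshot its block
    }
    return true;
}
```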
bc020112fc [enhancement](routineload) add debug conf and set broker.name.ttl = 0 (#23302)
* set broker.name.ttl = 0

* add debug config for librdkafka
2023-08-26 10:56:35 +08:00
f32efe5758 [Fix](Outfile) Fix missing error report when exporting a table to S3 with an incorrect ak/sk/bucket (#23441)
Problem:
The query returns a result even when the ak/sk/bucket name is wrong, for example:
```sql
mysql> select * from demo.student
    -> into outfile "s3://xxxx/exp_"
    -> format as csv
    -> properties(
    ->   "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
    ->   "s3.region" = "ap-beijing",
    ->   "s3.access_key"= "xxx",
    ->   "s3.secret_key" = "yyyy"
    -> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                                                |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
|          1 |         3 |       26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```

The reason is that the error returned in the `close()` phase was not caught.
2023-08-26 00:19:30 +08:00
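The fix amounts to propagating the status of `close()` instead of discarding it. Below is a minimal sketch that uses a hypothetical `FileWriter` and a simplified `Status`; neither is a Doris class. As in the problem above, the failure only appears when the writer is closed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified status for illustration; an empty message means OK.
struct Status {
    std::string error;
    bool ok() const { return error.empty(); }
};

// Hypothetical remote-file writer whose write() calls succeed but whose
// close() reports the failure, mirroring the scenario described above.
struct FileWriter {
    bool bad_credentials = false;
    Status write(const std::string& /*row*/) { return {}; }
    Status close() { return bad_credentials ? Status{"access denied"} : Status{}; }
};

// Writes every row, then returns the status of close() instead of dropping it.
Status export_rows(FileWriter& writer, const std::vector<std::string>& rows) {
    for (const auto& row : rows) {
        Status st = writer.write(row);
        if (!st.ok()) return st;
    }
    return writer.close();  // the check the commit message says was missing
}
```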
f66f161017 [fix](multi-catalog)fix hive table with cosn location issue (#23409)
Sometimes the partitions of a Hive table may be on different storage, e.g. some on HDFS and others on object storage (COS, etc.).
Main changes in this PR:

1. Fix the bug of accessing files via cosn.
2. Add a new field `fs_name` in TFileRangeDesc
    This is needed because, when accessing a file, the BE gets an HDFS client from the HDFS client cache, and different files in one query request may have different fs names, e.g. some are `hdfs://` and some are `cosn://`. So we need to specify the fs name for each file; otherwise, it may return an error:

`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`
2023-08-26 00:16:00 +08:00
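The following sketch shows why a per-file fs name helps. Clients are cached per fs name, and each file range carries its own `fs_name`. As a result, one query can use an `hdfs://` client and a `cosn://` client side by side. `FsClient`, `FsClientCache`, and `FileRange` are hypothetical types, and the cache is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical file-system client bound to one fs name (e.g. "hdfs://nn:8020").
struct FsClient {
    std::string fs_name;
};

// Hypothetical client cache keyed by fs name (not thread-safe).
class FsClientCache {
public:
    std::shared_ptr<FsClient> get(const std::string& fs_name) {
        auto it = clients_.find(fs_name);
        if (it != clients_.end()) return it->second;
        auto client = std::make_shared<FsClient>(FsClient{fs_name});
        clients_.emplace(fs_name, client);
        return client;
    }
    size_t size() const { return clients_.size(); }

private:
    std::map<std::string, std::shared_ptr<FsClient>> clients_;
};

// Hypothetical per-file descriptor mirroring the new `fs_name` field.
struct FileRange {
    std::string path;
    std::string fs_name;
};
```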
8af1e7f27f [Fix](orc-reader) Fix incorrect result if partition fields in orc file are null. (#23369)
Fix incorrect result if partition fields in orc file are null.

### Root Cause
In theory, the underlying files of a Hive partitioned table should not contain partition fields. However, we found that in some user scenarios the partition fields do exist in the underlying ORC/Parquet files and hold null values. As a result, filters pushed down on those partition fields were evaluated against the null values and filtered rows incorrectly.

### Solution
We handle this case by reading only non-partition fields from the file. The Parquet reader already handles it this way; this PR does the same for the ORC reader.
2023-08-26 00:13:11 +08:00
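The solution can be sketched as filtering the column list before opening the file. This assumes partition values are filled in from partition metadata rather than read from the file. The function name is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the columns to read from the data file: every requested column that
// is not a partition column, so null copies of partition fields stored inside
// the file are never read (partition values are assumed to come from metadata).
std::vector<std::string> file_columns_to_read(
        const std::vector<std::string>& requested,
        const std::set<std::string>& partition_columns) {
    std::vector<std::string> out;
    for (const auto& col : requested) {
        if (partition_columns.count(col) == 0) out.push_back(col);
    }
    return out;
}
```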
a3a951c71d [Fix](multi-catalog) Fix load string dict issue for transactional hive tables. (#23306)
Fix load string dict issue for transactional hive tables. The column name needs to be passed as 'row.column_name'.

apache/doris-thirdparty#112
2023-08-26 00:09:12 +08:00
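Background for the `row.` prefix: in Hive ACID (transactional) ORC files, user columns are nested in a struct column named `row`. That struct sits alongside metadata columns such as `originalTransaction` and `rowId`. The trivial, hypothetical helper below shows the name mapping.

```cpp
#include <cassert>
#include <string>

// Transactional (ACID) ORC files nest user columns under the "row" struct, so
// the reader must be given "row.<column>"; plain ORC files use the bare name.
std::string orc_column_name(const std::string& column, bool is_transactional) {
    return is_transactional ? "row." + column : column;
}
```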
2b6d876280 [feature](move-memtable)[6/7] add options to enable memtable on sink node (#23470)
Co-authored-by: Siyang Tang <82279870+TangSiyang2001@users.noreply.github.com>
2023-08-25 22:32:22 +08:00
6e6da733c6 [fix](invert index) fix the keyword type index length limit (#23503) 2023-08-25 21:34:11 +08:00
17e7c1ca53 [fix](fqdn)Fqdn with ipv6 (#22454)
Currently, `hostname_to_ip` can only resolve IPv4, so a method is provided to resolve to IPv4 or IPv6 based on a parameter.
When `_heartbeat` calls `hostname_to_ip`, whether it resolves to IPv4 or IPv6 is determined by `BackendOptions.is_bind_ipv6`.
Additionally, a method is provided that first attempts to resolve the host to IPv4 and falls back to IPv6 if that fails.
2023-08-25 21:24:55 +08:00
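The IPv4-then-IPv6 fallback can be sketched with POSIX `getaddrinfo` and `inet_ntop`. The function names are illustrative and do not match the signatures of Doris's `hostname_to_ip`. The sketch returns the first address of the requested family.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <cstring>
#include <string>

// Resolves `host` to a numeric address string of the given family
// (AF_INET or AF_INET6). Returns false if resolution fails.
bool resolve(const std::string& host, int family, std::string& ip) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) return false;
    char buf[INET6_ADDRSTRLEN] = {};
    const void* addr = nullptr;
    if (res->ai_family == AF_INET) {
        addr = &reinterpret_cast<const sockaddr_in*>(res->ai_addr)->sin_addr;
    } else {
        addr = &reinterpret_cast<const sockaddr_in6*>(res->ai_addr)->sin6_addr;
    }
    const bool ok = inet_ntop(res->ai_family, addr, buf, sizeof(buf)) != nullptr;
    freeaddrinfo(res);
    if (ok) ip = buf;
    return ok;
}

// Tries IPv4 first and falls back to IPv6 if that fails.
bool resolve_v4_then_v6(const std::string& host, std::string& ip) {
    return resolve(host, AF_INET, ip) || resolve(host, AF_INET6, ip);
}
```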
29273771f7 [Fix](multi-catalog) Fix incorrect Hive results by disabling string dict filter if exprs contain a null expr. (#23361)
Issue Number: close #21960

Fix incorrect Hive results by disabling string dict filter if exprs contain a null expr.
2023-08-25 21:16:43 +08:00
9d1c702b3a [improvement](function) do not use hyperscan for non-const patterns in like function (#23495) 2023-08-25 20:40:23 +08:00
49a32c2ee0 [pipelineX](fix) fix two phase execution and add test cases (#23353) 2023-08-25 17:57:35 +08:00
f80b067990 [fix](column) add unimplemented function of ColumnFixedLengthObject (#23468) 2023-08-25 17:38:01 +08:00
1312c12236 Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)" (#23462)
* Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)"

This reverts commit 55a6649da962fb170ddb40fea8ef26bdc552a51a.

Manual revert of "fix in strict mode, return error for insert if datatype convert fails (#20378)"

This manually reverts commit 1b94b6368f5e871c9a0fe53dd7c64409079a4c9d

* fix case failure
2023-08-25 16:47:14 +08:00
5c37be16fe [pipelineX](correctness) Fix close problem for local state (#23479) 2023-08-25 14:19:27 +08:00
Pxl
b96b8f4370 [Bug](jdbc) support get_default on complex type (#23325)
support get_default on complex type
2023-08-25 14:08:24 +08:00
d8e499cb55 [fix](UT) fix flaky test in LoadStreamMgrTest (#23459) 2023-08-25 13:53:20 +08:00
59acf61ec5 [pipelineX](pick) pick 2 PRs from pipeline engine (#23463) 2023-08-25 13:26:05 +08:00