Commit Graph

5389 Commits

Author SHA1 Message Date
392437008c [Improvement](ColumnReader) optimize memory using of ColumnReader meta (#23528) 2023-08-28 17:57:59 +08:00
650cc25ea4 [fix](light-schema-change) fix schema consistency check failed (#23283) 2023-08-28 16:40:30 +08:00
29b94c4ed7 [pipeline](refactor) refine pipeline fragment context (#23478) 2023-08-28 15:55:02 +08:00
7e7cfd17bf [fix](tablet sink) check data valid of tablet sink data (#23530)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2023-08-28 15:54:12 +08:00
Pxl
3049533e63 [Bug](materialized-view) fix core dump on create materialized view when different mv columns have the same reference base column (#23425)
* Remove redundant predicates on scan node

update

fix core dump on create materialized view when different mv columns have the same reference base column

Revert "update"

This reverts commit d9ef8dca123b281dc8f1c936ae5130267dff2964.

Revert "Remove redundant predicates on scan node"

This reverts commit f24931758163f59bfc47ee10509634ca97358676.

* update

* fix

* update

* update
2023-08-28 14:40:51 +08:00
28a2e71084 [pipelineX](refactor) refine codes (#23521)
* [pipelineX](refactor) refine codes

* update

* update
2023-08-28 14:38:07 +08:00
c05319b8eb [fix](agg) incorrect result of bitmap_agg and bitmap_union (#23558) 2023-08-28 14:22:19 +08:00
5be8d57f52 [fix](be-ut) fix ColumnFixedLenghtObjectTest on 32 bits system (#23519) 2023-08-28 14:02:05 +08:00
962221cb18 [test](log) add log for debug case failure (#23506) 2023-08-28 10:45:25 +08:00
981586155c [Improvement][json] optimize performance of json_extract by reusing json path object (#23430)
* reuse json path to speed up json function

* fix typo

* clang format

* path reentry safe

* fix compile error

* fix bug of continue
2023-08-27 17:39:10 +08:00
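The idea behind #23430 (parse the JSON path once and reuse the parsed object instead of re-parsing it for every row) can be sketched as follows. This is only an illustration in Python, not the Doris C++ implementation: `parse_path` and `json_extract` are hypothetical names, and the path syntax is assumed to be a simplified `$.a.b` dot notation.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=128)
def parse_path(path: str) -> tuple:
    """Parse a simplified '$.a.b' path once; repeated calls hit the cache."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    # Drop the leading '$' and split the remainder into keys.
    return tuple(part for part in path[1:].split(".") if part)


def json_extract(doc: str, path: str):
    """Walk the parsed path through a JSON document; None if a key is missing."""
    node = json.loads(doc)
    for key in parse_path(path):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

When the same constant path is applied to many rows, only the first call pays the parsing cost.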
e0bf621fe0 [chore](build) Fix compilation errors for BE UT (#23535)
Issue Number: close #23536

This issue was introduced by #23414 .
2023-08-27 11:52:13 +08:00
153e8f0f72 [improvement](table property) support for alter table property: skip write index, single compaction (#23475) 2023-08-26 23:52:09 +08:00
ba351af452 [enhancement](thirdparty) upgrade thirdparty libs - again (#23414)
Resubmits #23290 (brpc is not upgraded, because bthread local has an error)

protobuf 3.15.0 -> 21.11
glog 0.4.0 -> 0.6.0
lz4 1.9.3 -> 1.9.4
curl 7.79.0 -> 8.2.1
zstd 1.5.2 -> 1.5.5
arrow 7.0.0 -> 13.0.0
abseil 20220623.1 -> 20230125.3
orc 1.7.2 -> 1.9.0
jemalloc for arrow 5.2.1 -> 5.3.0
xsimd 7.0.0 -> 13.0.0
opentelemetry-proto 0.19.0 -> 1.0.0
opentelemetry 1.8.3 -> 1.10.0

new:
c-ares -> 1.19.1
grpc -> 1.54.3
2023-08-26 22:59:10 +08:00
30e3c5bbe6 [bugfix](file cache) Fix the init file cache coredump (#23464)
* [bugfix](file cache) Fix the init file cache coredump

* fix compile
2023-08-26 16:50:50 +08:00
40be6a0b05 [fix](hive) do not split compress data file and support lz4/snappy block codec (#23245)
1. Do not split compressed data files
Some data files in hive are compressed with gzip, deflate, etc.
Such files cannot be split.

2. Support lz4 block codec
For the hive scan node, use the lz4 block codec instead of the lz4 frame codec.

3. Support snappy block codec
For hadoop snappy.

4. Optimize the `count(*)` query on csv files
For a query like `select count(*) from tbl`, only the lines need to be split; there is no need to split the columns.

Need to pick to branch-2.0 after this PR: #22304
2023-08-26 12:59:05 +08:00
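The `count(*)` optimization in #23245 amounts to counting line terminators without tokenizing any columns. A minimal Python sketch of that idea follows; `count_rows` is a hypothetical helper, and it assumes a csv file with no header row and no newlines embedded in quoted fields. It is not the Doris reader.

```python
import io


def count_rows(stream: io.BufferedIOBase, chunk_size: int = 1 << 16) -> int:
    """Count rows by counting newlines; no line is ever split into columns."""
    rows = 0
    last = b""
    while chunk := stream.read(chunk_size):
        rows += chunk.count(b"\n")
        last = chunk[-1:]
    # A final line without a trailing newline still counts as a row.
    if last and last != b"\n":
        rows += 1
    return rows
```

For example, `count_rows(io.BytesIO(b"1,a\n2,b\n"))` scans the bytes once and never looks at the commas.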
bc020112fc [enhancement](routineload) add debug conf and set broker.name.ttl = 0 (#23302)
* set broker.name.ttl = 0

* add debug config for librdkafka
2023-08-26 10:56:35 +08:00
f32efe5758 [Fix](Outfile) Fix that it does not report error when export table to S3 with an incorrect ak/sk/bucket (#23441)
Problem:
The export returns a result even when a wrong ak/sk/bucket name is used, for example:
```sql
mysql> select * from demo.student
    -> into outfile "s3://xxxx/exp_"
    -> format as csv
    -> properties(
    ->   "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
    ->   "s3.region" = "ap-beijing",
    ->   "s3.access_key"= "xxx",
    ->   "s3.secret_key" = "yyyy"
    -> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                                                |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
|          1 |         3 |       26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```

The cause is that the error returned in the `close()` phase was not caught.
2023-08-26 00:19:30 +08:00
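The failure mode described in #23441 can be illustrated with a small Python sketch. Everything here is hypothetical (`FakeS3Writer`, `UploadError`, `export_rows`), and the model in which writes are buffered locally and the upload happens on close is an assumption made for illustration; the commit only states that the error from the `close()` phase was not caught.

```python
class UploadError(Exception):
    """Raised by the hypothetical writer when the upload is rejected."""


class FakeS3Writer:
    """Hypothetical stand-in for a buffered object-store writer."""

    def __init__(self, credentials_valid: bool):
        self._credentials_valid = credentials_valid
        self._buffer = bytearray()

    def write(self, data: bytes) -> None:
        # Writes only fill a local buffer, so bad credentials go unnoticed here.
        self._buffer.extend(data)

    def close(self) -> None:
        # In this model the upload happens on close, where the failure surfaces.
        if not self._credentials_valid:
            raise UploadError("access denied")


def export_rows(writer, rows) -> str:
    for row in rows:
        writer.write(row)
    try:
        writer.close()
    except UploadError as exc:
        # Report the close-phase failure instead of returning a success row.
        return f"ERROR: {exc}"
    return "OK"
```

If the result of `close()` is ignored, the export with invalid credentials looks identical to a successful one, which matches the symptom shown in the query output above.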
f66f161017 [fix](multi-catalog)fix hive table with cosn location issue (#23409)
Sometimes the partitions of a hive table may be on different storage, e.g., some on HDFS and others on object storage (cos, etc.).
This PR mainly changes:

1. Fix the bug of accessing files via cosn.
2. Add a new field `fs_name` to TFileRangeDesc.
    When accessing a file, the BE gets an hdfs client from the hdfs client cache, and different files in one query
request may have different fs names, e.g., some are `hdfs://` and some are `cosn://`. The fs name therefore needs to be specified
for each file; otherwise an error like the following may be returned:

`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`
2023-08-26 00:16:00 +08:00
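The reason #23409 carries an fs name per file can be sketched as a client cache keyed by fs name. This is a hedged Python illustration, not the BE code: `FsClientCache` and `fs_name_of` are hypothetical, the cached "client" is just a dict, and deriving the fs name from the file URI is an assumption (in the PR it is passed in the `fs_name` field).

```python
from urllib.parse import urlsplit


class FsClientCache:
    """Hypothetical cache holding one client object per fs name."""

    def __init__(self):
        self._clients = {}

    def get(self, fs_name: str) -> dict:
        # Create the client on first use, then reuse it for the same fs name.
        if fs_name not in self._clients:
            self._clients[fs_name] = {"fs_name": fs_name}
        return self._clients[fs_name]


def fs_name_of(path: str) -> str:
    """Derive 'scheme://authority' from a file URI (an assumption of this sketch)."""
    parts = urlsplit(path)
    return f"{parts.scheme}://{parts.netloc}"
```

With a per-file key, a `cosn://` file in the same query never receives the client that was created for `hdfs://`, which is the mismatch behind the `Wrong FS` error.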
8af1e7f27f [Fix](orc-reader) Fix incorrect result if null partition fields in orc file. (#23369)
Fix incorrect result if null partition fields in orc file. 

### Root Cause
Theoretically, the underlying files of a hive partition table should not contain partition fields. But we found that in some user scenarios the partition fields exist in the underlying orc/parquet files and hold null values. As a result, predicates pushed down on a partition field are evaluated against these null values and filter incorrectly.

### Solution
We handle this case by reading only non-partition fields. The parquet reader already works this way; this PR does the same for the orc reader.
2023-08-26 00:13:11 +08:00
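The solution in #23369 (read only non-partition fields from the file) can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes that partition values are supplied from table metadata rather than from the file contents.

```python
def columns_to_read(file_columns, partition_columns):
    """Return the file columns that are not partition fields, in file order."""
    partition = set(partition_columns)
    return [c for c in file_columns if c not in partition]


def build_row(file_row, partition_values, partition_columns):
    """Drop partition fields found in the file, then fill them from metadata."""
    partition = set(partition_columns)
    row = {k: v for k, v in file_row.items() if k not in partition}
    row.update(partition_values)
    return row
```

A null `dt` stored in the file is never read, so a predicate on `dt` only ever sees the value supplied for the partition.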
a3a951c71d [Fix](multi-catalog) Fix load string dict issue for transactional hive tables. (#23306)
Fix load string dict issue for transactional hive tables. The column name needs to be passed as 'row.column_name'.

apache/doris-thirdparty#112
2023-08-26 00:09:12 +08:00
2b6d876280 [feature](move-memtable)[6/7] add options to enable memtable on sink node (#23470)
Co-authored-by: Siyang Tang <82279870+TangSiyang2001@users.noreply.github.com>
2023-08-25 22:32:22 +08:00
6e6da733c6 [fix](invert index) fix the keyword type index length limit (#23503) 2023-08-25 21:34:11 +08:00
17e7c1ca53 [fix](fqdn)Fqdn with ipv6 (#22454)
Currently `hostname_to_ip` can only resolve IPv4, so a method is provided that resolves to IPv4 or IPv6 based on a parameter.
When `_heartbeat` calls `hostname_to_ip`, whether it resolves to IPv4 or IPv6 is determined by `BackendOptions.is_bind_ipv6`.
Additionally, a method is provided that first tries to resolve the host to IPv4 and then tries IPv6 if that fails.
2023-08-25 21:24:55 +08:00
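The two resolution strategies described in #22454 can be sketched in Python with the standard `socket` module. This is an illustration, not the Doris code: the Python signatures are assumptions, `hostname_to_ip_with_fallback` is a hypothetical name, and the `lookup` parameter exists only so the logic can be exercised without a network.

```python
import socket


def _lookup(host: str, family: int) -> str:
    """Resolve host for one address family; raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return infos[0][4][0]


def hostname_to_ip(host: str, ipv6: bool, lookup=_lookup) -> str:
    """Resolve to IPv4 or IPv6 according to the flag."""
    family = socket.AF_INET6 if ipv6 else socket.AF_INET
    return lookup(host, family)


def hostname_to_ip_with_fallback(host: str, lookup=_lookup) -> str:
    """Try IPv4 first, then fall back to IPv6 if IPv4 resolution fails."""
    try:
        return lookup(host, socket.AF_INET)
    except socket.gaierror:
        return lookup(host, socket.AF_INET6)
```

In the commit the flag corresponds to `BackendOptions.is_bind_ipv6`; here it is just the `ipv6` argument.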
29273771f7 [Fix](multi-catalog) Fix hive incorrect result by disable string dict filter if exprs contain null expr. (#23361)
Issue Number: close #21960

Fix incorrect hive results by disabling the string dict filter if the exprs contain a null expr.
2023-08-25 21:16:43 +08:00
9d1c702b3a [improvement](function) do not use hyperscan for non-const patterns in like function (#23495) 2023-08-25 20:40:23 +08:00
49a32c2ee0 [pipelineX](fix) fix two phase execution and add test cases (#23353) 2023-08-25 17:57:35 +08:00
f80b067990 [fix](column) add unimplemented function of ColumnFixedLengthObject (#23468) 2023-08-25 17:38:01 +08:00
1312c12236 Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)" (#23462)
* Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)"

This reverts commit 55a6649da962fb170ddb40fea8ef26bdc552a51a.

Manual revert "fix in strict mode, return error for insert if datatype convert fails (#20378)"

This manually reverts commit 1b94b6368f5e871c9a0fe53dd7c64409079a4c9d

* fix case failure
2023-08-25 16:47:14 +08:00
5c37be16fe [pipelineX](correctness) Fix close problem for local state (#23479) 2023-08-25 14:19:27 +08:00
Pxl
b96b8f4370 [Bug](jdbc) support get_default on complex type (#23325)
support get_default on complex type
2023-08-25 14:08:24 +08:00
d8e499cb55 [fix](UT) fix flaky test in LoadStreamMgrTest (#23459) 2023-08-25 13:53:20 +08:00
59acf61ec5 [pipelineX](pick) pick 2 PR from pipeline engine (#23463) 2023-08-25 13:26:05 +08:00
84792d0886 fix compile of master (#23467) 2023-08-25 11:47:39 +08:00
8ef6b4d996 [fix](json) fix json int128 overflow (#22917)
* support int128 in jsonb

* fix jsonb int128 write

* fix jsonb to json int128

* fix json functions for int128

* add nereids function jsonb_extract_largeint

* add testcase for json int128

* change docs for json int128

* add nereids function jsonb_extract_largeint

* clang format

* fix check style

* using int128_t = __int128_t for all int128

* use fmt::format_to instead of snprintf digit by digit for int128

* clang format

* delete useless check

* add warn log

* clang format
2023-08-25 11:40:30 +08:00
d331bfc513 [Performance](pipeline) support shared scan segment in mow (#23305) 2023-08-25 10:43:02 +08:00
Pxl
d9db3f5431 [Improvement](scan) Remove redundant predicates on scan node (#23374)
* Remove redundant predicates on scan node

* update

* fix
2023-08-25 10:41:37 +08:00
a305f2ffc2 [fix](pipeline) update status when prepare failed #23419 2023-08-25 10:34:37 +08:00
0a70cbfe99 [feature](move-memtable)[5/7] add olap table sink v2 and writers (#23458)
Co-authored-by: laihui <1353307710@qq.com>
2023-08-25 10:20:06 +08:00
2847c5e5b8 [Optimize](index) Optimize implement the new internal lucene index query interface (#23389) 2023-08-25 10:14:02 +08:00
9cacf9535a [Opt](functions) Use preloaded cache to accelerate timezone parsing (#22694)
* opt

* bugfix

* fix ut

* fix stylecheck
2023-08-25 10:00:48 +08:00
7cfb3cc0aa [fix](functions) fix function substitute for datetimeV1/V2 (#23344)
* fix

* function fe
2023-08-25 09:59:38 +08:00
71071ba057 [feature](move-memtable)[4/7] add stream sink file writer (#23416)
Co-authored-by: laihui <1353307710@qq.com>
2023-08-25 00:08:27 +08:00
98d0a2f6c1 [feature](move-memtable)[3/7] add load stream manager and rpc service (#23415)
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>
Co-authored-by: Yongqiang YANG <dataroaring@gmail.com>
Co-authored-by: laihui <1353307710@qq.com>
2023-08-25 00:08:04 +08:00
b65023e667 [fix](load) avoid using protobuf set_allocated_id in VDataStreamSender (#23435) 2023-08-25 00:07:09 +08:00
caddcc6215 [Fix](orc-reader) Fix decimal type check for ColumnValueRange issue and use primitive_type. (#23424)
Fix decimal type check for ColumnValueRange issue and use primitive_type in orc_reader. This is needed because #22842 changed the `CppType` of `PrimitiveTypeTraits<TYPE_DECIMALXXX>`.
2023-08-24 23:26:41 +08:00
55e572df82 [pipelineX](analytic operator) Support analytic operator (#23444) 2023-08-24 23:05:29 +08:00
96164f3bdc [pipelinex](sort) Fix expression initialization order (#23405) 2023-08-24 17:29:24 +08:00
c6ac925f5a [fix](common) implement the move assignment operator for Status (#23372) 2023-08-24 14:41:19 +08:00
c775f8e7bd [feature](move-memtable)[2/7] add protos for memtable on sink node (#23348)
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>
Co-authored-by: laihui <1353307710@qq.com>
2023-08-24 11:11:46 +08:00
687c676160 [FIX](map)fix column map for offset next_array_item_rowid order (#23250)
* fix column map for offset next_array_item_rowid order

* add regress test
2023-08-24 10:57:40 +08:00