Commit Graph

5457 Commits

Author SHA1 Message Date
5694f1b04b [fix](conf) revert changes in regression-conf.groovy in #23702 and fix BE compile error (#23788) 2023-09-02 22:31:49 +08:00
347cceb530 [Feature](inverted index) push count on index down to scan node (#22687)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-09-02 22:24:43 +08:00
95488c4d93 [Fix](vscanner) remove TEMP column in block after filter (#23778) 2023-09-02 21:54:27 +08:00
eedd24316d [Feature](CCR) Support MoW for CCR (#22798) 2023-09-02 20:40:06 +08:00
6b56896a01 [chore](json reader) add original data to error message for tracing (#22803) 2023-09-02 20:15:18 +08:00
9898c08620 [enhancement](merge-on-write) Add delete bitmap correctness check in commit phase (#23316) 2023-09-02 20:03:00 +08:00
26571a00bf [Fix](IndexColumnWriter) Add logic for IndexedColumnWriter::add when the current page is full (#23766) 2023-09-02 14:28:59 +08:00
a6dff2faf0 [Feature](config) allow update multiple be configs in one request (#23702) 2023-09-02 14:26:54 +08:00
a542f107db [feature](move-memtable) buffer messages in load stream stub (#23721) 2023-09-02 13:42:34 +08:00
228f0ac5bb [Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog (#23021) 2023-09-02 12:46:33 +08:00
8bad3bbd62 [fix](be) doris_be compile failed (#20932) (#20932)
Co-authored-by: yiguolei <676222867@qq.com>
2023-09-02 08:26:24 +08:00
68aa4867b0 [fix](map_agg) lost scale information for decimal type (#23776) 2023-09-02 08:03:33 +08:00
e5d1248c72 [Fix] spark load file not found (#23502)
Use the thrift interface to assign values to variables.
Otherwise, __isset will return false (see the sketch below).
2023-09-02 01:16:38 +08:00
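For background on the `__isset` behavior this fix depends on: Thrift-generated C++ structs mark optional fields as set only when the generated `__set_*` setters are used; direct field assignment leaves the flag false, so the receiver treats the field as absent. A minimal sketch with a hypothetical generated struct (not Doris code):

```cpp
#include <iostream>
#include <string>

// Hypothetical Thrift-generated struct (simplified). Optional fields carry
// an __isset flag that only the generated setter updates.
struct TFileParams {
    struct Isset {
        bool file_path = false;
    } __isset;

    std::string file_path;

    // Generated setter: assigns the value AND marks the field as set.
    void __set_file_path(const std::string& val) {
        file_path = val;
        __isset.file_path = true;
    }
};

int main() {
    TFileParams bad;
    bad.file_path = "/data/load/part-0000";       // direct assignment
    std::cout << bad.__isset.file_path << "\n";   // 0: serialized as "not set"

    TFileParams good;
    good.__set_file_path("/data/load/part-0000"); // the thrift interface
    std::cout << good.__isset.file_path << "\n";  // 1: field survives the RPC
}
```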
18d470ecf7 [improvement](config) add a specific be config for segment_cache_capacity (#23701)
* add segment_cache_capacity config instead of fd limit * 2/5
* default -1 for backward compatibility (see the fallback sketched below)
2023-09-02 01:14:14 +08:00
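A minimal sketch of the fallback rule described above, with illustrative names rather than the actual BE code:

```cpp
#include <cstdint>
#include <iostream>

// Resolve the effective segment cache capacity: a configured value of -1
// keeps the legacy behavior of deriving it from the fd limit (2/5 of it).
int64_t effective_segment_cache_capacity(int64_t configured, int64_t fd_limit) {
    if (configured == -1) {
        return fd_limit * 2 / 5;  // backward-compatible default
    }
    return configured;  // explicit setting wins
}

int main() {
    std::cout << effective_segment_cache_capacity(-1, 65536) << "\n";     // 26214
    std::cout << effective_segment_cache_capacity(100000, 65536) << "\n"; // 100000
}
```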
657e927d50 [fix](json) Fix out-of-bounds access when reading a JSON file (#23411) 2023-09-02 01:11:37 +08:00
75e2bc8a25 [function](bitmap) support bitmap_to_base64 and bitmap_from_base64 (#23759) 2023-09-02 00:58:48 +08:00
e0efda1234 [Improvement](Slice) support move constructor and operator for Slice (#23694) 2023-09-01 21:05:10 +08:00
9d2fc78bd5 [fix](cooldown) Fix potential data loss when clone task's dst tablet is cooldown replica (#17644)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2023-09-01 15:27:52 +08:00
91c5640cae [fix](tablet clone) fix clone backend chose wrong disk (#23729) 2023-09-01 15:12:35 +08:00
Pxl
32853a529c [Bug](cte) fix multi cast data stream source not open expr (#23740)
fix multi cast data stream source not open expr
2023-09-01 14:57:12 +08:00
eaf2a6a80e [fix](date) return right date value even if out of the range of date dictionary (#23664)
PR https://github.com/apache/doris/pull/22360 and PR https://github.com/apache/doris/pull/22384 optimized the performance of the date type. However, Hive supports dates outside 1970~2038, leading to wrong date values in the TPC-DS benchmark.
How to fix:
1. Increase the dictionary range to 1900 ~ 2038.
2. Dates outside 1900 ~ 2038 are regenerated (see the sketch below).
2023-09-01 14:40:20 +08:00
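A minimal sketch of the dictionary-plus-fallback idea (illustrative code, not the actual Doris implementation): in-range dates hit a precomputed table, while out-of-range dates are computed directly instead of indexing past the table's bounds.

```cpp
#include <cstdint>
#include <iostream>

constexpr int kMinYear = 1900;
constexpr int kMaxYear = 2038;

bool is_leap(int y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0; }
int days_in_year(int y) { return is_leap(y) ? 366 : 365; }

int32_t year_start_days[kMaxYear - kMinYear + 1];  // days since 1900-01-01

void init_dict() {
    int32_t acc = 0;
    for (int y = kMinYear; y <= kMaxYear; ++y) {
        year_start_days[y - kMinYear] = acc;
        acc += days_in_year(y);
    }
}

int32_t days_to_year(int year) {
    if (year >= kMinYear && year <= kMaxYear) {
        return year_start_days[year - kMinYear];  // fast path: dictionary hit
    }
    // Slow path: regenerate by direct computation, never touch the table.
    int32_t acc = 0;
    if (year > kMaxYear) {
        acc = days_to_year(kMaxYear);
        for (int y = kMaxYear; y < year; ++y) acc += days_in_year(y);
    } else {
        for (int y = year; y < kMinYear; ++y) acc -= days_in_year(y);
    }
    return acc;
}

int main() {
    init_dict();
    std::cout << days_to_year(1970) << " " << days_to_year(2050) << "\n";
}
```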
c31cb5fd11 [enhance] use correct default value for show config action (#19284) 2023-09-01 11:28:26 +08:00
e1090d6a63 [Fix](column predicate) separate CHAR primitive type for column predicate (#23581) 2023-09-01 09:41:53 +08:00
hzq
16d6357266 [fix] (mac compile) Fix mac compile error & fe start time related (#23727)
Fix of PR #23582

Some FE code was deleted by [Improvement](pipeline) Cancel outdated query if original fe restarts #23582 and needs to be added back;
fix the Mac build failure caused by the wrong thrift declaration order.
2023-09-01 08:02:30 +08:00
65f41f71c1 [pipelineX](refactor) refine codes (#23726) 2023-09-01 07:57:35 +08:00
c74ca15753 [pipeline](sink) Support Async Writer Sink of result file sink and memory scratch sink (#23589) 2023-08-31 22:44:25 +08:00
e680d42fe7 [feature](information_schema) add metadata_name_ids for quickly getting catalogs, dbs, and tables, and add a profiling table to be compatible with MySQL (#22702)
Add information_schema.metadata_name_ids for quickly getting catalogs, databases, and tables.

1. Table structure:
```mysql
mysql> desc  internal.information_schema.metadata_name_ids;
+---------------+--------------+------+-------+---------+-------+
| Field         | Type         | Null | Key   | Default | Extra |
+---------------+--------------+------+-------+---------+-------+
| CATALOG_ID    | BIGINT       | Yes  | false | NULL    |       |
| CATALOG_NAME  | VARCHAR(512) | Yes  | false | NULL    |       |
| DATABASE_ID   | BIGINT       | Yes  | false | NULL    |       |
| DATABASE_NAME | VARCHAR(64)  | Yes  | false | NULL    |       |
| TABLE_ID      | BIGINT       | Yes  | false | NULL    |       |
| TABLE_NAME    | VARCHAR(64)  | Yes  | false | NULL    |       |
+---------------+--------------+------+-------+---------+-------+
6 rows in set (0.00 sec) 


mysql> select * from internal.information_schema.metadata_name_ids where CATALOG_NAME="hive1" limit 1 \G;
*************************** 1. row ***************************
   CATALOG_ID: 113008
 CATALOG_NAME: hive1
  DATABASE_ID: 113042
DATABASE_NAME: ssb1_parquet
     TABLE_ID: 114009
   TABLE_NAME: dates
1 row in set (0.07 sec)
```

2. When you create or drop a catalog, there is no need to refresh the catalog.
```mysql
mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.34 sec)


mysql> drop catalog hive2;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 


mysql> create catalog hive3 ... 
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;                                                                        
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.32 sec)
```

3. When you create or drop a table, there is no need to refresh the catalog.
```mysql
mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl ... ;


mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10666
1 row in set (0.04 sec)

mysql> drop table demo.example_tbl;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 

```

4. You can set a query timeout to prevent queries from taking too long.
```

fe.conf: query_metadata_name_ids_timeout

the time allowed for obtaining all the tables in one database

```
5. Add information_schema.profiling to be compatible with MySQL.

```mysql
mysql> select * from information_schema.profiling;
Empty set (0.07 sec)

mysql> set profiling=1;                                                                                 
Query OK, 0 rows affected (0.01 sec)
```
2023-08-31 21:22:26 +08:00
6fe2418cfc [fix](filter) fix error id in bloomfilter (#23564)
1. "set" may overwrite the original ID.
2.A bloom filter may not necessarily be an IN_OR_BLOOM_FILTER.

before may be
RuntimeFilterInfo  id  -1:  [type  =  BF,  input  =  25,  filtered  =  0]
now 
 RuntimeFilterInfo  id  0:  [type  =  BF,  input  =  25,  filtered  =  0]
2023-08-31 21:12:09 +08:00
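A minimal sketch of one way to guard against the id overwrite from point 1 (hypothetical code, not the actual Doris runtime-filter classes):

```cpp
#include <iostream>

// A setter called again during filter setup must not clobber an id that
// was already assigned, which is what produced the "id -1" log above.
class RuntimeFilterInfo {
public:
    void set_filter_id(int id) {
        if (_filter_id == -1) {  // accept only the first real assignment
            _filter_id = id;
        }
    }
    int filter_id() const { return _filter_id; }

private:
    int _filter_id = -1;  // -1 means "not assigned yet"
};

int main() {
    RuntimeFilterInfo info;
    info.set_filter_id(0);   // the original id
    info.set_filter_id(-1);  // a later "set" no longer wipes it
    std::cout << "RuntimeFilterInfo id " << info.filter_id() << "\n";  // id 0
}
```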
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, the original code incorrectly initialized Daemon before StorageEngine.
This PR also stops and joins the threads of daemon services in their dtors, to ensure daemon services release resources in reverse order of initialization via RAII (see the sketch below).
2023-08-31 19:46:38 +08:00
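A minimal sketch of the RAII ordering idea with hypothetical stand-ins for the services (not the actual BE classes): constructing in dependency order makes destruction run in reverse automatically.

```cpp
#include <iostream>

// Stand-ins for the services in the dependency diagram above.
struct ExecEnv {
    ExecEnv()  { std::cout << "ExecEnv up\n"; }
    ~ExecEnv() { std::cout << "ExecEnv down\n"; }
};
struct StorageEngine {
    explicit StorageEngine(ExecEnv&) { std::cout << "StorageEngine up\n"; }
    ~StorageEngine() { std::cout << "StorageEngine down\n"; }
};
struct Daemon {
    explicit Daemon(StorageEngine&) { std::cout << "Daemon up\n"; }
    // The dtor is where a real daemon would stop and join its threads.
    ~Daemon() { std::cout << "Daemon down (threads joined)\n"; }
};

int main() {
    ExecEnv exec_env;               // built first
    StorageEngine engine(exec_env); // depends on ExecEnv
    Daemon daemon(engine);          // depends on StorageEngine, so built last
    // Scope exit destroys Daemon, then StorageEngine, then ExecEnv:
    // reverse order of initialization, enforced by the language itself.
}
```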
b3a9c247af [refactor](move-memtable) add load stream stub (#23642) 2023-08-31 19:39:34 +08:00
f1e43fcaa4 [opt](cache) Support segment cache dynamic opening and closing (#23659)
Dynamically modify the config to clear the cache; each time the cache is disabled, it will only be cleared once (see the sketch below).
TODO: support page cache and other caches.

curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true
2023-08-31 18:48:26 +08:00
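A minimal sketch of the "cleared only once per disable" behavior (illustrative names, not the actual BE code):

```cpp
#include <atomic>
#include <iostream>

std::atomic<bool> disable_segment_cache{false};    // toggled via update_config
std::atomic<bool> cleared_for_this_disable{false};

void prune_segment_cache() { std::cout << "segment cache pruned\n"; }

// Called on the hot path; prunes exactly once after each disable.
void on_cache_access() {
    if (disable_segment_cache.load()) {
        if (!cleared_for_this_disable.exchange(true)) {
            prune_segment_cache();  // only the first caller after disabling
        }
    } else {
        cleared_for_this_disable.store(false);  // re-arm for the next disable
    }
}

int main() {
    disable_segment_cache = true;  // e.g. ?disable_segment_cache=true
    on_cache_access();  // prunes
    on_cache_access();  // no-op
    disable_segment_cache = false;
    on_cache_access();  // re-armed, nothing pruned while enabled
}
```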
3a2c0d16f7 [fix](parquet) fix potential heap-use-after-free issue and cache issue (#23638)
1. When file meta cache is disabled (by setting `max_external_file_meta_cache_num=0` in be.conf),
the parquet's meta info is owned by parquet reader and will be released when calling `reader->close()`.

But the underlying file reader of this parquet reader will be released after `reader->close()`,
which may cause a `heap-use-after-free` bug because parts of the meta info may still be referenced by the file reader.

This PR fixes it by making sure that the meta info is released only after the file reader is released (ownership sketched below).

2. Add modification time for file meta cache in BE, to avoid parquet read error like:
`Failed to deserialize parquet page header`
2023-08-31 18:23:05 +08:00
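A minimal sketch of the release-order fix with hypothetical types (not the actual reader classes): C++ destroys members in reverse declaration order, so declaring the meta before the file reader guarantees the reader, which may still reference the meta, is torn down first.

```cpp
#include <iostream>
#include <memory>

struct FileMetaData {
    ~FileMetaData() { std::cout << "meta released\n"; }
};

struct FileReader {
    explicit FileReader(FileMetaData* meta) : _meta(meta) {}
    ~FileReader() { std::cout << "file reader released\n"; }
    FileMetaData* _meta;  // non-owning pointer into the meta
};

class ParquetReader {
    // Declaration order encodes destruction order (reverse of declaration):
    std::unique_ptr<FileMetaData> _meta;      // destroyed second
    std::unique_ptr<FileReader> _file_reader; // destroyed first
public:
    ParquetReader()
            : _meta(std::make_unique<FileMetaData>()),
              _file_reader(std::make_unique<FileReader>(_meta.get())) {}
};

int main() {
    ParquetReader reader;
    // On scope exit: "file reader released", then "meta released" —
    // no dangling reference from the reader into freed meta.
}
```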
hzq
c083336bbe [Improvement](pipeline) Cancel outdated query if original fe restarts (#23582)
If any FE restarts, queries that were emitted from this FE will be cancelled.

Implementation of #23704
2023-08-31 17:58:52 +08:00
cb2515b7c8 [Fix](meta lock) Should not acquire wlock twice (#23666) 2023-08-31 15:53:35 +08:00
62c075bf7e [improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672) 2023-08-31 14:44:17 +08:00
409640ac46 [Bug](decimal) Prevent invalid decimal value (#23677) 2023-08-31 14:43:10 +08:00
126606cb4d [Fix](cache) fix query cache returns wrong result after deleting partitions. (#23555)
The reason is that the SQL cache only uses partitionKey, latestVersion, and latestTime to decide whether the cached result can be returned. If we delete some partition(s) other than the latest-updated partition, none of these values change, so the cache still hits.
The fix adds a field saving the partition count of these tables, sums the partition counts, and sends the sum to BE (see the sketch below). There are two situations that involve delete-partition ops:

- Only some partition(s) are deleted, so the summed partition count will be lower than before.
- Deleting some partition(s) coexists with adding some partition(s), so the latest time or latest version will be higher than before.
2023-08-31 14:22:52 +08:00
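A minimal sketch of the extended validity check (field names are illustrative, not the actual cache code): the summed partition count becomes part of what must match for a hit, so dropping a non-latest partition now misses.

```cpp
#include <cstdint>
#include <iostream>

struct SqlCacheVersion {
    int64_t partition_key;
    int64_t latest_version;
    int64_t latest_time;
    int64_t sum_partition_num;  // the field added by this fix
};

bool cache_hit(const SqlCacheVersion& cached, const SqlCacheVersion& current) {
    return cached.partition_key == current.partition_key &&
           cached.latest_version == current.latest_version &&
           cached.latest_time == current.latest_time &&
           cached.sum_partition_num == current.sum_partition_num;
}

int main() {
    SqlCacheVersion cached{1, 10, 1693400000, 8};
    // Dropping a partition that is not the latest-updated one leaves the
    // first three fields unchanged, but the partition count drops to 7:
    SqlCacheVersion after_drop{1, 10, 1693400000, 7};
    std::cout << cache_hit(cached, after_drop) << "\n";  // 0: miss, as desired
}
```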
46eb0c7796 [Fix](status) fix printing too many logs in VNodeChannel::try_send_and_fetch_status #23693
After #23425, Status::InternalError(...) prints a stacktrace and warning logs, so we can't use it in VNodeChannel::try_send_and_fetch_status.
2023-08-31 13:54:23 +08:00
d22290e548 [pipelineX](join) support hash join (#23689) 2023-08-31 13:01:26 +08:00
Pxl
f35ab37e1e [Bug](materialized-view) fix load db use analyzer to analyze different metaindex (#23673)
fix load db use analyzer to analyze different metaindex
2023-08-31 12:35:38 +08:00
3e4ee3c1e6 [fix](jdbc catalog) fix jdbc driver cache load error (#23656)
log error:
`W20230830 11:19:47.495721 3046231 status.h:363] meet error status: [INTERNAL_ERROR]user function's name should be function_id.checksum[.file_name].file_type, now the all split parts are by delimiter(.): 7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar`

When the JDBC driver name contained a `.`, we failed to split it properly (see the sketch below).
2023-08-31 10:17:15 +08:00
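A minimal sketch of dot-tolerant parsing for the `function_id.checksum[.file_name].file_type` format (illustrative code, not the actual fix): anchor the first two fields and the last one, and let everything in between be the file name, dots included.

```cpp
#include <iostream>
#include <string>

struct ParsedFunctionFile {
    std::string function_id, checksum, file_name, file_type;
};

bool parse(const std::string& s, ParsedFunctionFile* out) {
    size_t p1 = s.find('.');          // ends function_id
    if (p1 == std::string::npos) return false;
    size_t p2 = s.find('.', p1 + 1);  // ends checksum
    if (p2 == std::string::npos) return false;
    size_t p3 = s.rfind('.');         // starts file_type
    out->function_id = s.substr(0, p1);
    out->checksum = s.substr(p1 + 1, p2 - p1 - 1);
    // file_name is optional and may itself contain dots:
    out->file_name = (p3 == p2) ? "" : s.substr(p2 + 1, p3 - p2 - 1);
    out->file_type = s.substr(p3 + 1);
    return true;
}

int main() {
    ParsedFunctionFile f;
    parse("7119053928154065546.20c8228267b6c9ce620fddb39467d3eb."
          "postgresql-42.5.0.jar", &f);
    std::cout << f.file_name << " | " << f.file_type << "\n";
    // postgresql-42.5.0 | jar
}
```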
449c595f9d [opt](FileReader) InMemoryReader is only used in s3 (#23486)
If the file size is < 8MB, the file will be read into memory; this idea is from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md#s3inmemoryinputstream. However, in some cases we only read one or two columns of a file, and the actually required bytes are only 1% of it, resulting in a multi-fold increase in the amount of data read. Therefore, `InMemoryReader` should only be used on object storage, and the threshold is reduced (see the sketch below).
2023-08-30 20:43:39 +08:00
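A minimal sketch of the selection rule (the threshold value and names here are illustrative; the actual constants differ):

```cpp
#include <cstdint>
#include <iostream>

enum class StorageType { kLocal, kHdfs, kS3 };

// Illustrative threshold; the PR lowers the original 8MB cutoff.
constexpr int64_t kInMemoryThreshold = 1 * 1024 * 1024;

// Prefetching a whole file into memory only pays off on object storage,
// where per-request latency dominates; local and HDFS reads stay on demand.
bool use_in_memory_reader(StorageType type, int64_t file_size) {
    return type == StorageType::kS3 && file_size <= kInMemoryThreshold;
}

int main() {
    std::cout << use_in_memory_reader(StorageType::kS3, 512 * 1024) << "\n";    // 1
    std::cout << use_in_memory_reader(StorageType::kLocal, 512 * 1024) << "\n"; // 0
    std::cout << use_in_memory_reader(StorageType::kS3, 16LL << 20) << "\n";    // 0
}
```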
05771e8a14 [Enhancement](Load) stream Load using SQL (#23362)
Using stream load in SQL mode

For example, with example.csv:

10000,北京
10001,天津
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
2023-08-30 19:02:48 +08:00
25b8831afd [fix](Outfile) fix core dump when export data to orc file format using outfile (#23586)
* fix

* add test
2023-08-30 19:01:44 +08:00
f7caae08d5 [fix](union) should open/alloc_resource in sink operator instead of source (#23637) 2023-08-30 18:58:59 +08:00
6d41272421 [Opt](pipeline) Refactor the short circuit of join pipeline (#23639)
* [Opt](pipeline) Refactor the short circuit of join pipeline

* changes per code review
2023-08-30 18:44:14 +08:00
14310ad30b [improvement](move-memtable) wait StreamClose from remote (#23605)
* [fix](move-memtable) wait StreamClose from remote
2023-08-30 18:03:36 +08:00
1ce783fb23 [fix](stacktrace) Temporary fix ARM and MacOS stacktrace #23650 2023-08-30 14:51:20 +08:00
942a119881 [bug](java-udf) fix java-udf not return const column when all args are const values (#23188)
E.g. udf('asd') must return a const column; otherwise, using the returned column as another function's parameter fails (see the sketch after this entry):

mysql> select concat('a', 'b', cuuid9('a'), ':c');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[CANCELLED][INTERNAL_ERROR]const check failed, expr=VectorizedFn[VectorizedFnCallconcat]{
VLiteral (name = String, type = String, value = (a)),
VLiteral (name = String, type = String, value = (b)),
VectorizedFn[VectorizedFnCallcuuid9]
{ VLiteral (name = String, type = String, value = (a))},
VLiteral (name = String, type = String, value = (:c))}
2023-08-30 10:46:47 +08:00
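A minimal sketch of the const-folding rule on a toy column model (not Doris's vectorized types): if every argument is const, the UDF runs on a single row and the result is marked const again, so downstream const checks like the one in the error above pass.

```cpp
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

struct Column {
    bool is_const = false;
    std::vector<std::string> data;  // a single entry when const
};

// Toy UDF kernel: uppercase the first argument, row by row.
Column toy_udf(const std::vector<Column>& args) {
    Column out;
    for (std::string s : args[0].data) {
        for (char& c : s) c = std::toupper(static_cast<unsigned char>(c));
        out.data.push_back(s);
    }
    return out;
}

Column execute_udf(const std::vector<Column>& args) {
    bool all_const = true;
    for (const auto& a : args) all_const = all_const && a.is_const;
    Column result = toy_udf(args);
    result.is_const = all_const;  // the fix: keep const-ness of the result
    return result;
}

int main() {
    Column arg{true, {"asd"}};  // a const literal argument, like udf('asd')
    Column res = execute_udf({arg});
    std::cout << res.is_const << " " << res.data[0] << "\n";  // 1 ASD
}
```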
e05a0466f2 [improve](Status) Add new status codes KEY_NOT_FOUND and KEY_ALREADY_EXISTS for merge-on-write (#23619) 2023-08-30 08:50:07 +08:00