Commit Graph

8047 Commits

Author SHA1 Message Date
211cc66d02 [fix](multi-catalog) fix image loading failure when creating catalog with resource (#15692)
Bug fix
Fix image loading failure when creating a catalog with a resource.
When creating a jdbc catalog with a resource, the metadata image fails to load,
because loading the jdbc catalog image tries to get the resource from ResourceMgr,
but ResourceMgr has not been loaded yet, so an NPE is thrown.

This PR fixes the bug and refactors some logic around catalogs and resources.

When loading the jdbc catalog image, it no longer gets the resource from ResourceMgr.
Users can now create a catalog with both a resource and properties, like:

```SQL
create catalog jdbc_catalog with resource jdbc_resource
properties("user" = "user1");
```

The properties in the "properties" clause overwrite the properties in "jdbc_resource".
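
For context, the referenced resource might have been created like this (a sketch only; property names follow the JDBC resource documentation, while hosts, credentials, and driver locations are placeholders):

```SQL
CREATE EXTERNAL RESOURCE jdbc_resource PROPERTIES (
    "type" = "jdbc",
    "user" = "root",      -- overridden to "user1" by the catalog above
    "password" = "123456",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.25/mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```

At runtime the catalog reads "user1" from its own properties and everything else from jdbc_resource.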

Force adding tinyInt1isBit=false to the jdbc url
The default value of tinyInt1isBit is true, which causes MySQL tinyint columns to be mapped to the bit type.
Forcing tinyInt1isBit=false in the jdbc url makes MySQL tinyint map to tinyint in Doris.
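
For illustration, writing the flag explicitly is equivalent to what Doris now appends automatically (catalog name and connection values below are placeholders):

```SQL
CREATE CATALOG mysql_demo PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "123456",
    -- Doris now appends tinyInt1isBit=false even if it is omitted here
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo?tinyInt1isBit=false",
    "driver_url" = "https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.25/mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```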

Avoid calculating the checksum of the jdbc driver jar multiple times
Refactor
Refactor the notification logic for updating properties in a resource.
Previously, updating properties in a resource notified the corresponding catalog to update its own properties.
This PR changes that logic. After updating properties in a resource, it only uninitializes the catalog's internal
objects, such as the "jdbc client" or "hms client", and these objects are re-initialized lazily.

All properties are fetched from the Resource at runtime, so the catalog always sees the latest properties.

Regression test cases
Because we add tinyInt1isBit=false to the jdbc url, some of the cases need to be changed.
2023-01-09 09:56:26 +08:00
Pxl
1514b5ab5c [Feature](Materialized-View) support advanced Materialized-View (#15212) 2023-01-09 09:53:11 +08:00
97cea9b5c9 [improvement](bdbje) add more logs to make the bdbje DatabaseNotFoundException problem easier to solve (#15715)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-01-09 08:55:21 +08:00
wxy
6829d361cb [Feature](audit) add errorCode and errorMessage in audit log (#14925)

Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-01-09 08:47:57 +08:00
c57fa7c930 [Pipeline] Fix PipScannerContext::can_finish returning wrong status (#15259)
Currently in ScannerContext::push_back_scanner_and_reschedule, `_num_running_scanners--` happens before `_num_scheduling_ctx++`.
In PipScannerContext::can_finish, we check `_num_running_scanners == 0 && _num_scheduling_ctx == 0` without holding `_transfer_lock`.
In the following interleaving, PipScannerContext::can_finish returns the wrong result:

1. `_num_running_scanners--`
2. can_finish checks `_num_running_scanners == 0 && _num_scheduling_ctx == 0` and returns true.
3. `_num_scheduling_ctx++`

So we should move `_num_running_scanners--` to the end of the function.

Changes:

- Make PipScannerContext::get_block_from_queue non-blocking.
- Move `_num_running_scanners--` to the end of ScannerContext::push_back_scanner_and_reschedule.
2023-01-09 08:46:58 +08:00
663676ccfe fix(ui): fix component/table can not change pageSize (#15533)
1. Fix component/table not being able to change pageSize, which affects the system/query profile/session pages, etc.
2. Add the missing rowKey property to the antd Table component to fit the React specification.
3. Fix possible memory leaks on the system/query profile/session/configuration pages when switching between them quickly.
4. Other syntax fixes to conform to the TypeScript and React specifications.

Co-authored-by: tongyang.hty <hantongyang@douyu.tv>
2023-01-09 08:46:18 +08:00
ba54634d55 [refactor] delete non vec load from memtable (#15667)
Delete non-vectorized load from memtable entirely.

Remove the function keys_type() from memtable.

Co-authored-by: zhoubintao <1229701101@qq.com>
2023-01-09 08:41:58 +08:00
wxy
fb1f6bdd82 [doc](export) add docs for cancel-export. (#15682)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-01-09 08:38:45 +08:00
f256bb8d39 [fix](meta) fix priv table load bug when upgrading to 1.2.x (#15706)
In old versions, NODE_PRIV was incorrectly assigned to normal users.
So when upgrading to 1.2.x, handling this unexpected case failed.
This PR fixes it by removing NODE_PRIV from normal users.
2023-01-09 08:38:26 +08:00
36590da24b [fix](regression p0) add the alias function hist for histogram and fix p0 (#15708)
Add hist as an alias for the histogram function and fix the p0 cases.
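
A quick illustration of the alias (table and column names are hypothetical):

```SQL
-- the two queries below are now equivalent
SELECT histogram(c_float) FROM histogram_test;
SELECT hist(c_float) FROM histogram_test;
```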
2023-01-08 11:31:23 +08:00
90be1a22a9 [bugfix](vertical compaction) fix dcheck failed in MOW tablet (#15638)
Fix a DCHECK failure in vertical compaction on a Merge-On-Write table.
When merging rowsets with empty segments, VerticalHeapMergeIterator::init
returns OK directly and _record_rowids is not set, so the DCHECK fails when
_unique_key_next_block calls current_block_row_locations.
2023-01-08 10:39:52 +08:00
500c7fb702 [improvement](multi-catalog) support unsupported column type (#15660)
When creating an external catalog, Doris automatically syncs the table schemas from the external catalog.
But some column types, such as struct and map, are not supported by Doris yet.

Previously, when meeting these unsupported columns, Doris threw an exception and the corresponding
table could not be synced. But users may just want to query the other, supported columns.

In this PR, I add a new column type: UNSUPPORTED. It is currently used only for external table schema sync.
When meeting an unsupported column, it is synced as a column with the UNSUPPORTED type.

When querying such a table, there are several situations:

- `select * from table`: throws the error Unsupported type 'UNSUPPORTED_TYPE' xxx
- `select k1 from table`: k1 has a supported type; the query is OK.
- `select * except(k2)`: k2 has an unsupported type; the query is OK.
2023-01-08 10:07:10 +08:00
707eab9a63 [opt](multi-catalog) cache and reuse position delete rows in iceberg v2 (#15670)
A delete file may belong to multiple data files. Each data file used to read the full set of delete files,
so a delete file could be read repeatedly. The delete files can be cached, and multiple data files
can reuse the content from the first read.

The performance is improved by 60% in the single-thread case and by 30% in the multi-thread case.
2023-01-07 22:29:11 +08:00
a1d8177e33 [fix](test) remove unstable regression test (#15689)
remove `regression-test/suites/nereids_performance_p0/sub_query_join_where_pushdown.groovy`
2023-01-07 22:05:22 +08:00
ae1a77e034 add Q&A to jdbc external table (#15680) 2023-01-07 20:04:02 +08:00
054af036fe [typo](doc) fix Chinese describe (#15683) 2023-01-07 20:02:44 +08:00
5dfdacd278 [enhancement](histogram) add histogram syntax and persist histogram statistics (#15490)
Histogram statistics are more expensive to collect, so we collect and persist them separately.

This PR does the following work:
1. Add histogram syntax and add keyword `TABLE`
2. Add the task of collecting histogram statistics
3. Persist histogram statistics
4. Replace fastjson with gson
5. Add unit tests...

Relevant syntax examples:
> Following some databases such as MySQL, we add the keyword `TABLE`.

```SQL
-- collect column statistics
ANALYZE TABLE statistics_test;

-- collect histogram statistics
ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2;
```

Based on #15317
2023-01-07 00:55:42 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This PR mainly optimizes the histogram aggregation function (see https://github.com/apache/doris/pull/14910), including the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
 
Parameter description:
- `sample_rate`: Optional. The proportion of data sampled to generate the histogram. The default is 0.2.
- `max_bucket_num`: Optional. Limits the number of histogram buckets. The default value is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate: the sampling rate
- max_bucket_num: the maximum number of buckets
- bucket_num: the actual number of buckets
- buckets: all buckets
    - lower: lower bound of the bucket
    - upper: upper bound of the bucket
    - count: the number of elements contained in the bucket
    - pre_sum: the total number of elements in all preceding buckets
    - ndv: the number of distinct values in the bucket

> Total number of histogram elements = the last bucket's count + the last bucket's pre_sum (e.g., for the result above: 2 + 4 = 6).
2023-01-07 00:50:32 +08:00
9c8fcd805c [feature](Nereids) support variable type expression (#15659) 2023-01-07 00:32:57 +08:00
08d439cde7 [feature](Nereids) add keyword rlike (#15647) 2023-01-07 00:28:21 +08:00
a6773417ef [Doc] Add sidebars for split_by_string function and delete split_by_char builtins code (#15679) 2023-01-06 21:14:26 +08:00
c18bfdc93e [test][regression cases][external] add external table p2 regression cases according to doris 1.2 docs 20230105 (#15651) 2023-01-06 20:19:39 +08:00
cad47dd9d9 [test](Nereids) add two regression test cases for Nereids (#15598)
1. Test that predicate inference works well with predicates pushed down through joins.
2. Test count with a subquery containing a constant literal.
2023-01-06 16:29:50 +08:00
53559e2bdc [fix](decimalv2) fix loss of precision when cast to decimalv2 literal (#15629) 2023-01-06 16:02:46 +08:00
9c36278c4a [improvement](pipeline) Support sharing hash table for broadcast join (#15628) 2023-01-06 15:11:28 +08:00
1038093c29 [Pipeline](Exec) disable work steal of hash join build (#15652) 2023-01-06 15:08:10 +08:00
f24659c003 [Refactor](pipeline) refactor the code of channel buffer limit and change the default value (#15650) 2023-01-06 14:52:43 +08:00
7f84db310a [fix](nereids) Convert to datetime when binary expr's left is date and right is int type (#15615)
In the case below, the expression `date > 20200101` should implicitly cast both sides to datetime instead of bigint:

```sql
        CREATE TABLE `part_by_date`
        (
            `date`                  date   NOT NULL COMMENT '',
            `id`                      int(11) NOT NULL COMMENT ''
        ) ENGINE=OLAP
        UNIQUE KEY(`date`, `id`)
        PARTITION BY RANGE(`date`) 
        (PARTITION p201912 VALUES [('0000-01-01'), ('2020-01-01')),
        PARTITION p202001 VALUES [('2020-01-01'), ('2020-02-01')))
        DISTRIBUTED BY HASH(`id`) BUCKETS 3
        PROPERTIES (
        "replication_allocation" = "tag.location.default: 1"
        );

        INSERT INTO  part_by_date VALUES('0001-02-01', 1),('2020-01-15', 2);

        SELECT
            id
        FROM
           part_by_date
        WHERE date > 20200101;
```
2023-01-06 14:08:05 +08:00
ae77b582f0 [fix](Nereids) add information functions and fix bugs in schemaScan (#15608)
1. Add information functions (see the example after this list):
- Database()
- User()
- Current_User()
- Connection_id()

2. Fix bugs in schemaScan
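
An example using all four functions at once; the returned values depend on the current session:

```SQL
SELECT DATABASE(), USER(), CURRENT_USER(), CONNECTION_ID();
```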
2023-01-06 13:37:27 +08:00
ef72b8d859 [Feature](Nereids): add logical operator || && (#15643) 2023-01-06 12:18:21 +08:00
df2da89b89 [feature](multi-catalog) support postgresql jdbc catalog (#15570)
support postgresql jdbc catalog
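
A minimal creation sketch (property names mirror the existing MySQL JDBC catalog; URL, credentials, and driver values are placeholders):

```SQL
CREATE CATALOG pg_catalog PROPERTIES (
    "type" = "jdbc",
    "user" = "postgres",
    "password" = "123456",
    "jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/demo",
    "driver_url" = "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar",
    "driver_class" = "org.postgresql.Driver"
);
```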
2023-01-06 11:00:59 +08:00
b57500d0c3 [Bug](decimalv3) fix wrong result for MOD operation (#15644) 2023-01-06 10:38:53 +08:00
05d72e8919 [fix](join) fix anti join incorrectly outputs null values (#15567) 2023-01-06 09:55:48 +08:00
b41934864e [enhancement](frontendservice) add retry when create connection to frontend service (#15635) 2023-01-06 09:15:08 +08:00
95f2f43c02 [fix](macOS) Failed to run BE UT due to syscall to map cache into shared region failed (#15641)
According to the post https://developer.apple.com/forums/thread/676684, an executable larger than 2 GB may fail to start. The executable `doris_be_test` generated by run-be-ut.sh is now 2.1 GB (> 2 GB), so we can't run it on macOS (arm64).

We can separate the debug info from the executable `doris_be_test` to reduce its size. After that, we can run `doris_be_test` successfully.
2023-01-06 01:23:37 +08:00
6d691edcc7 [fix](Nereids): restrict join reorder project. (#15645) 2023-01-06 00:18:05 +08:00
77ffafb766 [vulnerability](CVE-2022-1292) fix CVE-2022-1292 (#15639) 2023-01-05 21:57:16 +08:00
9d1f02c580 [Improvement](topn) runtime prune for topn query (#15558) 2023-01-05 20:10:12 +08:00
d36b93708c [feature](Nereids): add ExtractFilterFromJoin rule to support more (#14896) 2023-01-05 19:09:43 +08:00
5460c873e8 [Feature](Nereids) support non-equal conjuncts in non-scalar subqueries (#15591)
Support non-equal conjuncts in non-scalar subqueries.
[fix] fix wrong results in correlated subqueries
2023-01-05 16:56:14 +08:00
5ee479f45c [Pipeline](load) Support transaction on pipeline engine (#15597) 2023-01-05 15:59:18 +08:00
6523b546ab [chore](vulnerability) fix some high risk vulnerabilities report by bug scanner (#15621)
2023-01-05 14:58:23 +08:00
0dfa143140 [enhancement](Nereids) generate colocate join when property is different from required property (#15479)
1. When checking a HashProperty whose type is natural, we only need to check whether the required properties contain all shuffle columns.
2. In ChildrenPropertiesRegulator.java, when a colocate/bucket join is not allowed, we enforce the required property.
2023-01-05 11:41:18 +08:00
4f2a36f032 [project] update year in NOTICE.txt (#15632)
2023-01-05 10:22:34 +08:00
1018657d9d [Enhancement](SparkLoad): avoid BE OOM in push task, fix #15572 (#15620)
Release the memory pool held by the parquet reader once the data has been flushed by the rowset writer.
Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-01-05 10:20:32 +08:00
59f34be41f [fix](having-clause) having clause does not work correctly with the same alias name (#15143) 2023-01-05 10:15:15 +08:00
Pxl
93f5e440eb [Bug](execute) fix get_next not stopping on EOS in streaming preagg (#15611)
2023-01-05 09:36:11 +08:00
5ff5b8fc98 [feature](mark join) Support mark join for hash join node (#15569)
2023-01-05 09:32:26 +08:00
61d538c713 [improvement](storage-policy) Add validity check when creating a storage policy. (#14405) 2023-01-04 22:24:49 +08:00
e67ea1ddb7 [fix](doc): fix error in the catalog-with-resource doc (#15607) 2023-01-04 19:53:25 +08:00