Since Filesystem inherited std::enable_shared_from_this , it is dangerous to create native point of FileSystem.
To avoid this behavior, making the constructor of XxxFileSystem a private method and using the static method create(...) to get a new FileSystem object.
Bug fix
fix image loading failture when create catalog with resource
When creating jdbc catalog with resource, the metadata image will failed to be loaded.
Because when loading jdbc catalog image, it will try to get resource from ResourceMgr,
but ResourceMgr has not been loaded, so NPE will be thrown.
This PR fix this bug, and refactor some logic about catalog and resource.
When loading jdbc catalog image, it will not get resource from ResourceMgr.
And now user can create catalog with resource and properties, like:
create catalog jdbc_catalog with resource jdbc_resource
properites("user" = "user1");
The properties in "properties" clause will overwrite the properties in "jdbc_resource".
force adding tinyInt1isBit=false to jdbc url
The default value of tinyInt1isBit is true, and it will cause tinyint in mysql to be bit type.
force adding tinyInt1isBit=false to jdbc url so that the tinyint in mysql will be tinyint in Doris.
Avoid calculate checksum of jdbc driver jar multiple times
Refactor
Refactor the notification logic when updating properties in resource.
When updating properties in resource, it will notify the corresponding catalog to update its own properties.
This PR change this logic. After updating properties in resource, it will only uninitialize the catalog's internal
objects such "jdbc client" or "hms client". And this objects will be re-initialized lazily.
And all properties will be got from Resource at runtime, so that it will always get the latest properties
Regression test cases
Because we add tinyInt1isBit=false to jdbc url, some of cases need to be changed.
When creating an external catalog, Doris will automatically sync the schema of table from external catalog.
But some of column type are not supported by Doris now, such as struct, map, etc.
In previous, when meeting these unsupported column, Doris will throw an exception, and the corresponding
table can not be synced. But user may just want to query other supported columns.
In this PR, I add a new column type: UNSUPPORTED. And now it is just used for external table schema sync.
When meeting unsupported column, it will be synced as column with UNSUPPORTED type.
When query this table, there are serval situation:
select * from table: throw error Unsupported type 'UNSUPPORTED_TYPE' xxx
select k1 from table: k1 is with supported type. query OK.
select * except(k2): k2 is with unsupported type. query OK
This pr mainly to optimize the histogram(👉🏻https://github.com/apache/doris/pull/14910) aggregation function. Including the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
Parameter description:
- `sample_rate`:Optional. The proportion of sample data used to generate the histogram. The default is 0.2.
- `max_bucket_num`:Optional. Limit the number of histogram buckets. The default value is 128.
---
Example:
```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`) |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`) |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```
Query result description:
```
{
"sample_rate": 0.2,
"max_bucket_num": 128,
"bucket_num": 3,
"buckets": [
{
"lower": "0.1",
"upper": "0.2",
"count": 2,
"pre_sum": 0,
"ndv": 2
},
{
"lower": "0.8",
"upper": "0.9",
"count": 2,
"pre_sum": 2,
"ndv": 2
},
{
"lower": "1.0",
"upper": "1.0",
"count": 2,
"pre_sum": 4,
"ndv": 1
}
]
}
```
Field description:
- sample_rate:Rate of sampling
- max_bucket_num:Limit the maximum number of buckets
- bucket_num:The actual number of buckets
- buckets:All buckets
- lower:Upper bound of the bucket
- upper:Lower bound of the bucket
- count:The number of elements contained in the bucket
- pre_sum:The total number of elements in the front bucket
- ndv:The number of different values in the bucket
> Total number of histogram elements = number of elements in the last bucket(count) + total number of elements in the previous bucket(pre_sum).