Support specifying AccessControllerFactory when creating catalog
create catalog hive properties(
...
"access_controller.class" = "org.apache.doris.mysql.privilege.RangerAccessControllerFactory",
"access_controller.properties.prop1" = "xxx",
"access_controller.properties.prop2" = "yyy",
...
)
So that users can specify their own access controller, such as RangerAccessController.
Add interface to check column level privilege
A new method, checkColsPriv(), is added to CatalogAccessController
for checking column-level privileges.
TODO:
Support GRANT statements for column-level privileges in Doris
Add TestExternalCatalog/Database/Table/ScanNode
These classes are used for FE unit tests. In a unit test you can run:
create catalog test1 properties(
"type" = "test"
"catalog_provider.class" = "org.apache.doris.datasource.ColumnPrivTest$MockedCatalogProvider"
"access_controller.class" = "org.apache.doris.mysql.privilege.TestAccessControllerFactory",
"access_controller.properties.key1" = "val1",
"access_controller.properties.key2" = "val2"
);
to create a test catalog and specify a catalog_provider that mocks database/table/schema metadata.
Set roles in the current user identity in the connection context.
The roles can be used for authorization in the access controller.
The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5`
makes BE core dump because it dereferences a nullptr `read_orderby_key_columns` in `VCollectIterator::_topn_next`,
triggered by skipping the `_colname_to_value_range` init in #16818.
This PR makes two changes:
1. Avoid a nullptr `read_orderby_key_columns` in `TabletReader::_init_orderby_keys_param`.
2. Return an error if `read_orderby_key_columns` is unexpectedly nullptr in `VCollectIterator::_topn_next`, to avoid the core dump.
1. DeployManager adds the ability to obtain domain names from third-party systems.
2. When DeployManager determines whether a node exists, add domain-name judgment logic.
3. Rename Backend.getHost() to getIp().
4. Delete the logic for handling UnknownHostException in FQDNManager, because there are two cases of UnknownHostException: if it occurs temporarily, it can wait for the next detection; if the node has been deleted, the handling can be left to DeployManager.
This commit forbids struct and map types from being used as distribution keys or aggregation keys.
SQL such as:
select distinct struct_col from struct_table
will report an error.
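For illustration, a hedged sketch of DDL that this change is expected to reject; the table and column names are hypothetical:
```
-- Hypothetical table: using a map column as the distribution key
-- is expected to be rejected by this change.
CREATE TABLE map_table (
    id INT,
    map_col MAP<STRING, INT>
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(map_col) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```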
1. Support stream load with JSON and CSV formats for map.
2. Fix the OLAP converter when a compaction action runs on a map column that contains nulls.
3. Support SELECT INTO OUTFILE for map (see the sketch below).
4. Add some regression tests.
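A hedged sketch of exporting a map column with SELECT INTO OUTFILE; the table name and output path are assumptions:
```
-- Hypothetical table with a map column, exported to a local file in CSV format.
SELECT id, map_col FROM map_table
INTO OUTFILE "file:///tmp/map_outfile_"
FORMAT AS CSV;
```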
Steps:
1. drop the old MTMV jobs
2. clear the old task records and clean the running and pending tasks
3. set the new scheduler info in MTMV and replay it in followers.
4. create a job on the master node.
Note that if you change the refresh info of MTMV, the old MTMV tasks will be cleaned.
The background is described in this issue: #15723,
where users used Apache Druid to satisfy such lambda requirements before.
We will not make Doris automatically drop data that does not belong to the current time window, as Druid does,
because that is not flexible. Instead we need the ability to support mutable/immutable partitions. The PR works this way:
1. Support mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE in the load procedure.
3. If a record's partition is immutable, we mark the row as "unselected", so it is not included in the computation of 'max_filter_ratio';
data written to an immutable partition is therefore ignored and does not cause load failure.
Use Example:
1. Add an immutable partition or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');
2. Write 5 records into the table, two of which belong to the immutable partition (see the sketch below).
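A hedged sketch of the expected behavior, assuming the first column is the partition column and one partition has been set to 'mutable' = 'false'; table name and values are hypothetical:
```
-- The two rows that fall into the immutable partition are marked as unselected;
-- they do not count toward max_filter_ratio, so the load succeeds and only the
-- three rows that land in mutable partitions are written.
INSERT INTO test_tbl VALUES
    (1, 'mutable partition'),
    (2, 'mutable partition'),
    (3, 'mutable partition'),
    (101, 'immutable partition'),
    (102, 'immutable partition');
```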
Introduced a new function non_nullable to BE, which extracts the concrete data column from a nullable column. If the input argument is already not a nullable column, an error is raised.
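A minimal usage sketch, assuming a table t with a nullable column c_null and a non-nullable column c (names are hypothetical):
```
-- Returns the concrete data column wrapped inside the nullable column.
SELECT non_nullable(c_null) FROM t;
-- Expected to raise an error, because c is already not nullable.
SELECT non_nullable(c) FROM t;
```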
Fix dbt incremental: new approach with no rollback, and support rerunning incremental data.
Add snapshot.
Use the 'mysql-connector-python' MySQL driver to replace the 'MySQLdb' driver.
Support a config for whether LIKE is pushed down to ES, and refactor some code.
Because transforming LIKE into a wildcard query and pushing it down to ES increases the CPU consumption of ES,
I add a switch to control it (see the sketch below).
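A hedged sketch of turning the switch off; the property name like_push_down is an assumption here, so check the ES catalog documentation for the actual switch name:
```
-- Hypothetical: disable converting LIKE to a wildcard query pushed down to ES.
CREATE CATALOG es PROPERTIES (
    "type" = "es",
    "hosts" = "http://127.0.0.1:9200",
    "like_push_down" = "false"
);
```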
* [typo](docs) supplement the document content
* Update grouping.md
Add space before and after English letters in CN docs and keep the English case consistent.
* Update grouping.md
Change the Chinese title to English
* fix bug, add remote meta for compaction
The logic in function VCollectIterator::build_heap is not robust, which may cause a memory leak:
```
Level1Iterator* cumu_iter = new Level1Iterator(
        cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same);
RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init());
std::list<LevelIterator*> children;
children.push_back(*base_reader_child);
children.push_back(cumu_iter);
_inner_iter.reset(
        new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same));
```
cumu_iter will be leaked if cumu_iter->init() does not succeed, because the macro returns before ownership of cumu_iter is transferred to _inner_iter.
The element in InvertedIndexSearcherCache is an inverted index searcher, which holds a file descriptor of an inverted index file, so InvertedIndexSearcherCache actually caches file descriptors of inverted index files.
If the open-file-descriptor limit of the Linux system is set too small and the config inverted_index_searcher_cache_limit is too big, a high-pressure load may cause "Too many open files".
So, when inserting an inverted index searcher into InvertedIndexSearcherCache, we also need to check whether the file descriptor limit for inverted index files has been reached.
There is a bug in inverted_index_writer when adding the index for array values spanning multiple rows.
This problem can cause wrong results when a schema change adds an index (see the sketch below).
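A hedged sketch of the kind of schema change that exercises this path; the table, column, and index names are hypothetical:
```
-- Hypothetical: add an inverted index on an array column through a schema change.
ALTER TABLE array_tbl ADD INDEX idx_arr (arr_col) USING INVERTED;
```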
From the original logic, a query like `select * from a where exists (select * from b order by 1) order by 1 limit 1` is a query that contains a subquery,
but the top query will pass `checkEnableTwoPhaseRead` and set `isTwoPhaseOptEnabled=true`. So we need to double check that the plan is a general top-n query plan, and roll back the needMaterialize flag set by the previous `analyze`.
* [improve](dynamic table) refine SegmentWriter column writer generation
```
A dynamic Block consists of two parts: a static part of columns and a dynamic part of columns
 static    dynamic
| ----- | ------- |
the static ones are the original _tablet_schema columns
the dynamic ones are auto generated and extended from the file scan.
```
**We should only consider using Block info to generate columns when it is a dynamic table load procedure,**
and separate the static ones from the dynamic ones.
* test
The property `hive.metastore.kerberos.principal` is essential when the principal of the HMS you are connecting to is not the
default value hive-metastore/_HOST@your_realms.
Otherwise, you will get the error: Failure unspecified at GSS-API level (Mechanism level: Checksum failed).
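A hedged sketch of a kerberized HMS catalog; apart from hive.metastore.kerberos.principal, every property name and value below is a placeholder that may differ in your deployment:
```
CREATE CATALOG hive PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://your-hms-host:9083",
    "hadoop.security.authentication" = "kerberos",
    "hive.metastore.kerberos.principal" = "your-principal/_HOST@YOUR_REALM",
    "hadoop.kerberos.principal" = "doris/_HOST@YOUR_REALM",
    "hadoop.kerberos.keytab" = "/path/to/doris.keytab"
);
```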