Commit Graph

8812 Commits

Author SHA1 Message Date
57519fcf50 [fix](information_schema) catch and skip exception when getting schema from FE catalog (#16647)
When querying the information_schema database, BE calls an FE RPC
to get schema info such as the db name list, table name list, etc.
But some external catalogs may fail to return this info because of wrong connection info.
We should catch this kind of exception and skip it, so that FE can continue to
get schema info of the other catalogs.
Otherwise, the whole query on information_schema will fail, even if the user just wants to get
info of the internal catalog.

Also set the jdbc connection timeout to 5s, to avoid thrift rpc timeout from BE to FE (default is 30s)
2023-02-21 08:43:09 +08:00
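The catch-and-skip behaviour described in the commit above can be sketched as follows. This is a minimal illustration, not the actual change: the `Catalog` struct, the `list_databases` callback, and `collect_databases` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a catalog: a name plus a callback that lists
// its databases and may throw (for example on wrong connection info).
struct Catalog {
    std::string name;
    std::function<std::vector<std::string>()> list_databases;
};

// Collect database names from every catalog. A catalog whose lookup throws
// is recorded in `skipped` and ignored, so one broken catalog does not
// fail the whole query.
std::vector<std::string> collect_databases(const std::vector<Catalog>& catalogs,
                                           std::vector<std::string>& skipped) {
    std::vector<std::string> result;
    for (const Catalog& catalog : catalogs) {
        try {
            for (const std::string& db : catalog.list_databases()) {
                result.push_back(catalog.name + "." + db);
            }
        } catch (const std::exception&) {
            // Skip this catalog and continue with the next one.
            skipped.push_back(catalog.name);
        }
    }
    return result;
}
```

With this shape, a query that only needs the internal catalog still gets its results when an external catalog is misconfigured.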
c618e69f59 [typo](docs)supplement the document content for grouping_id.md. (#16926)
* [typo](docs)supplement the document content for grouping_id.md.

* Update grouping_id.md
2023-02-21 08:27:25 +08:00
e04c13b7a6 [enhancement](exception safe) make function state exception safe (#16771) 2023-02-20 23:01:45 +08:00
a46941c684 [Fix](multi-catalog) Fix switch-case fall-through issue in multi-catalog module. (#16931)
Fix switch-case fall-through issue in multi-catalog module.
2023-02-20 21:35:41 +08:00
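The commit message does not show the affected code, so the following is only a generic illustration of a switch-case fall-through bug and one way to avoid it. It is written in C++ for consistency with the other sketches here; the `CatalogType` enum and both functions are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical catalog kinds, used only for illustration.
enum class CatalogType { Internal, Hive, Jdbc };

// Buggy version: the Hive case has no `break`, so control continues into
// the Jdbc case and the result is overwritten.
std::string describe_buggy(CatalogType type) {
    std::string result = "unknown";
    switch (type) {
    case CatalogType::Internal:
        result = "internal";
        break;
    case CatalogType::Hive:
        result = "hive";
        // BUG: the `break` is missing here, so execution
        // falls through
    case CatalogType::Jdbc:
        result = "jdbc";
        break;
    }
    return result;
}

// Fixed version: every case returns, so no case can run into the next.
std::string describe_fixed(CatalogType type) {
    switch (type) {
    case CatalogType::Internal: return "internal";
    case CatalogType::Hive:     return "hive";
    case CatalogType::Jdbc:     return "jdbc";
    }
    return "unknown";
}
```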
a1799e5506 [improve](point query) reuse rowset from lookup_row_key to eliminate tablet lock (#16770)
Reuse the rowset for 2 reasons:
1. eliminate the tablet lock to improve performance: if another thread holds the lock for too long, point query latency suffers
2. the rowset should be acquired during the lookup procedure
2023-02-20 18:38:11 +08:00
83ab29fd56 [Fix](inverted index) fix compound directory unlock problem (#16861)
In DorisCompoundDirectory::FSIndexInput::close, use lock_guard to unlock automatically; otherwise the lock may leak.
2023-02-20 18:29:39 +08:00
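Why lock_guard prevents the lock leak mentioned above can be shown with a small sketch. `TrackedLock`, `close_manual`, and `close_guarded` are hypothetical names; `TrackedLock` only records its state so the effect is observable without threads, and it is not the lock type used in DorisCompoundDirectory.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Minimal BasicLockable that only records whether it is held.
// Illustrative only; a real implementation would use std::mutex.
struct TrackedLock {
    bool locked = false;
    void lock() { locked = true; }
    void unlock() { locked = false; }
};

// Manual locking: an exception between lock() and unlock() skips the
// unlock call, so the lock stays held.
void close_manual(TrackedLock& lock, bool fail) {
    lock.lock();
    if (fail) throw std::runtime_error("close failed");
    lock.unlock();
}

// std::lock_guard releases the lock in its destructor on every exit path,
// including when an exception propagates.
void close_guarded(TrackedLock& lock, bool fail) {
    std::lock_guard<TrackedLock> guard(lock);
    if (fail) throw std::runtime_error("close failed");
}
```

After a failing `close_manual` the lock is still marked as held, while after a failing `close_guarded` it has been released.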
f32cd2c123 [fix](statistics) fix a problem with histogram statistics collection parameters (#16918)
1. Fixed a problem with histogram statistics collection parameters.
2. Solved the problem that it takes a long time to collect histogram statistics.

TODO: Optimize histogram statistics sampling method and make the sampling parameters effective.

The problem is that the histogram function works as expected in the single-node test but not in the multi-node test. In addition, collecting histograms with sampling is currently slow, so collecting histogram information takes a long time.

Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics.

Sampling support for histogram collection will be added back next.
2023-02-20 16:33:18 +08:00
c98a0bf803 [Enchancement](merge-on-write) check the correctness of rowid conversion after compaction (#16689)
MoW updates the delete bitmap of the imported data during compaction by rowid conversion. The correctness of the rowid conversion is critical to the resulting delete bitmap, so this change adds a check of the rowid conversion result.
2023-02-20 16:27:18 +08:00
3a5e8f83e8 [fix](merge-on-write) fix that be may coredump when sequence column is null (#16832)
To facilitate the use of the primary key index, encode the seq column to the minimum value of the corresponding length when the seq column is null.
2023-02-20 16:25:52 +08:00
a3aceab72b [Fix](inverted index) fix inverted index bkd reader memory leak problem (#16885)
The original implementation of get_bkd_reader used a raw pointer, which may leak memory; use shared_ptr to avoid that.
2023-02-20 15:39:04 +08:00
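The difference between the raw-pointer and shared_ptr approaches can be sketched as below. This is not the code of get_bkd_reader: `FakeBkdReader`, the global counter, and the helper functions are hypothetical and exist only to make the ownership difference visible.

```cpp
#include <cassert>
#include <memory>

// Counts live reader objects so that a leak would be observable.
int g_live_readers = 0;

// Hypothetical stand-in for a reader object.
struct FakeBkdReader {
    FakeBkdReader() { ++g_live_readers; }
    ~FakeBkdReader() { --g_live_readers; }
};

// Raw-pointer style: every exit path must remember to delete the reader.
bool use_raw(bool bail_out) {
    FakeBkdReader* reader = new FakeBkdReader();
    if (bail_out) return false;  // leak: `reader` is never deleted on this path
    delete reader;
    return true;
}

// shared_ptr style: the reader is destroyed when the last owner goes away,
// whichever path leaves the function.
bool use_shared(bool bail_out) {
    std::shared_ptr<FakeBkdReader> reader = std::make_shared<FakeBkdReader>();
    if (bail_out) return false;  // reader is released automatically
    return true;
}
```

The early return in `use_raw` is the kind of path where a raw pointer is easily forgotten; with `use_shared` no explicit cleanup is needed.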
66e283ac7f [improvement](doc) change some version from dev to 1.2.2 (#16907) 2023-02-20 14:48:12 +08:00
218c90c159 [improvement](test) Add clickbench and arm pipeline trigger (#16922) 2023-02-20 14:15:42 +08:00
21a9f5102f [doc](typo) Update spark-load-manual.md (#16911) 2023-02-20 13:22:21 +08:00
Pxl
ce3afe7f13 [Enchancement](Materialized-View) forbiden some case in create mv with group by and fix select fail on g… (#16820)
1. forbid some cases in create mv with group by

select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1;

2. fix select failure when the grouping column has a different expr from the select list

create materialized view k1p2ap3psg as select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1+1;

mysql [test]>explain select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1;
ERROR 1105 (HY000): errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `k1` + 1
2023-02-20 13:04:50 +08:00
1011422e6d [feature](Nereids): infer isNotNull from Inner/Semi/Anti Join (#16821) 2023-02-20 12:14:15 +08:00
0b96ddc090 [improvement](help-doc) Add help doc format unit test (#16904)
2023-02-20 12:02:51 +08:00
a17a32ebd4 [improve](show alter) add more infos to 'show alter' result for schema change job (#16843) 2023-02-20 11:59:06 +08:00
ef2fdb79bb [Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818)
Optimize and refactor parquet reader to improve performance.
- Improve 2x performance for small dict string by aligned copying.
- Refactor code to decrease condition(if) checking.
- Don't call skip(0).
- Don't read page index if no condition.

**ssb-flat-100**: (single-machine, single-thread)
| Query | before opt | after opt |
| ----- |:----------:| ---------:|
| SELECT count(lo_revenue) FROM lineorder_flat | 9.23 | 9.12 |
| SELECT count(lo_linenumber) FROM lineorder_flat | 4.50 | 4.36 |
| SELECT count(c_name) FROM lineorder_flat | 18.22 | 17.88 |
| **SELECT count(lo_shipmode) FROM lineorder_flat** | **10.09** | **6.15** |
2023-02-20 11:42:29 +08:00
Pxl
2bc014d83a [Enchancement](function) remove unused params on aggregate function (#16886)
remove unused params on aggregate function
2023-02-20 11:08:45 +08:00
46d5cca661 [fix](merge-on-write) The delete bitmap of the currently imported rowset is not persistent (#16859) 2023-02-20 11:02:41 +08:00
b7d2bec8ea [fix](merge-on-write) add check for segment num (#14032) 2023-02-20 11:01:34 +08:00
e958b13747 [Exec] Add conjection for union_node. (#16777) 2023-02-20 10:48:58 +08:00
97230a54fb [Refactor](auth)(step-2) Add AccessController to support customized authorization (#16802)
Support specifying AccessControllerFactory when creating catalog

create catalog hive properties(
...
"access_controller.class" = "org.apache.doris.mysql.privilege.RangerAccessControllerFactory",
"access_controller.properties.prop1" = "xxx",
"access_controller.properties.prop2" = "yyy",
...
)
So that users can specify their own access controller, such as RangerAccessController

Add interface to check column level privilege

A new method of CatalogAccessController: checkColsPriv(),
for checking column level privileges.

TODO:
Support grant column level privileges statements in Doris

Add TestExternalCatalog/Database/Table/ScanNode

These classes are used for FE unit tests. In a unit test you can run

create catalog test1 properties(
    "type" = "test",
    "catalog_provider.class" = "org.apache.doris.datasource.ColumnPrivTest$MockedCatalogProvider",
    "access_controller.class" = "org.apache.doris.mysql.privilege.TestAccessControllerFactory",
    "access_controller.properties.key1" = "val1",
    "access_controller.properties.key2" = "val2"
);
to create a test catalog and specify catalog_provider to mock database/table/schema metadata

Set roles in current user identity in connection context

The roles can be used for authorization in access controller.
2023-02-20 10:32:48 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
2074b83c67 [enhancement](third-party) Upgrade JEMalloc version from 5.2.1 to 5.3.0 (#14871)
https://github.com/jemalloc/jemalloc/releases
2023-02-20 00:00:40 +08:00
58c51086ca [bugfix](topn) fix topn read_orderby_key_columns nullptr (#16896)
The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5`
makes BE core dump because it dereferences a nullptr `read_orderby_key_columns` in `VCollectIterator::_topn_next`,
triggered by skipping the _colname_to_value_range init in #16818.

This PR makes two changes:
1. avoid a nullptr read_orderby_key_columns in TabletReader::_init_orderby_keys_param
2. return an error if read_orderby_key_columns is unexpectedly nullptr in VCollectIterator::_topn_next, to avoid the core dump
2023-02-19 23:28:33 +08:00
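The second change above, returning an error instead of dereferencing a null pointer, follows a common defensive pattern that can be sketched as below. The `Status` and `ReaderParams` types and the `topn_next` function are simplified, hypothetical stand-ins, not the real Doris classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified status type.
struct Status {
    bool ok;
    std::string message;
    static Status OK() { return {true, ""}; }
    static Status InternalError(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical reader parameters; the key-column list may be unset.
struct ReaderParams {
    const std::vector<int>* read_orderby_key_columns = nullptr;
};

// Checks the pointer before use and reports an error to the caller
// rather than crashing the process.
Status topn_next(const ReaderParams& params, std::size_t* num_keys) {
    if (params.read_orderby_key_columns == nullptr) {
        return Status::InternalError("read_orderby_key_columns should not be nullptr");
    }
    *num_keys = params.read_orderby_key_columns->size();
    return Status::OK();
}
```

The caller can then fail the single query with a readable message while the process keeps running.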
1c6c28b8fb [Enhance](ComputeNode) K8sDeployManager support domain (#16897)
1. DeployManager adds the ability to obtain domain names from third-party systems
2. When DeployManager determines whether a node exists, it also checks the domain name
3. Rename Backend.getHost() to getIp()
4. Delete the logic for handling UnknownHostException in FQDNManager, because UnknownHostException has two cases: if it occurs temporarily, the next detection can handle it; if the node has been deleted, the handling can be left to DeployManager.
2023-02-19 21:30:18 +08:00
cd3dbc33c9 [deps](be) update libhdfs3 and jemalloc (#16894)
- Modified: libhdfs3 2.3.7 -> 2.3.8
- Modified: jemalloc 5.2.1 -> 5.3.0  (#14871)
2023-02-19 19:49:27 +08:00
73f7979b73 [fix](struct-type) forbid struct-type to be distributed key/aggregation key and add more tests (#16626)
This commit forbids struct and map types from being used as distributed key/aggregation key.

SQL such as:

select distinct stuct_col from struct_table

will report an error.
2023-02-19 15:16:36 +08:00
8b70bfdc31 [Feature](map-type) Support stream load and fix some bugs for map type (#16776)
1. support stream load with json and csv formats for map
2. fix the olap convertor during compaction on a map column that contains nulls
3. support select outToFile for map
4. add some regression tests
2023-02-19 15:11:54 +08:00
96a3c60d3b [feature-wip](MTMV) Support alter statement (#16817)
Steps:
1. drop the old MTMV jobs
2. clear the old task records and clean the running and pending tasks
3. set the new scheduler info in MTMV and replay it in followers.
4. create a job in the master node.

Note that if you change the refresh info of MTMV, the old MTMV tasks will be cleaned.
2023-02-19 12:15:17 +08:00
d4cebb39ba [fix](Nereids): fix SemiJoinLogicalJoinTransposeProject. (#16883) 2023-02-18 23:12:34 +08:00
e2e6a0dd83 [Feature](load) Support mutable property for partition (#16036)
The background is described in issue #15723,
where users previously used Apache Druid to satisfy such lambda requirements.
We will not make Doris automatically drop data that does not belong to the current time window, as Druid does,
because that is not flexible. Instead we need the ability to mark partitions as mutable or immutable. The PR works this way:

1. Support a mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE during a load.
3. If a record's partition is immutable, the row is marked as "unselected" and is not included in the computation of 'max_filter_ratio',
   so that data written to an immutable partition is ignored and does not cause load failure.

Usage example:

1. Add an immutable partition or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');

2. Write 5 records into the table, two of which belong to an immutable partition
2023-02-18 23:09:34 +08:00
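The effect of marking rows as "unselected" on the filter ratio can be sketched as follows. The formula is an assumption made for illustration (filtered rows divided by the rows that were actually selected); the commit message does not state the exact computation, and `LoadCounters`, `filter_ratio`, and `load_succeeds` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per-load counters.
struct LoadCounters {
    long total_rows;       // all rows seen by the load
    long filtered_rows;    // rows rejected, e.g. for data quality reasons
    long unselected_rows;  // rows skipped, e.g. targeting an immutable partition
};

// Assumed formula: unselected rows are removed from the denominator and
// never count as filtered.
double filter_ratio(const LoadCounters& c) {
    assert(c.filtered_rows + c.unselected_rows <= c.total_rows);
    const long selected = c.total_rows - c.unselected_rows;
    if (selected <= 0) return 0.0;
    return static_cast<double>(c.filtered_rows) / static_cast<double>(selected);
}

// The load succeeds when the ratio does not exceed the configured limit.
bool load_succeeds(const LoadCounters& c, double max_filter_ratio) {
    return filter_ratio(c) <= max_filter_ratio;
}
```

Under this assumption, a load of 5 records where 2 target an immutable partition has a ratio of 0 and passes even with `max_filter_ratio` set to 0, whereas counting those 2 rows as filtered would give 2/5 and could fail the load.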
1ac5b23e40 Update doris-join-optimization.md (#15818)
Fix documentation errors
2023-02-18 22:24:51 +08:00
d6a841409f [Enhancement](func)Introduce non_nullable extraction function. #16621
Introduced a new function non_nullable to BE, which extracts the concrete data column from a nullable column. If the input argument is not a nullable column, an error is raised.
2023-02-18 20:44:07 +08:00
45427b86be [regression](struct-type) add more regression tests for struct and map type (#16790)
This commit forbids struct and map columns in materialized views and adds more regression tests.
2023-02-18 20:42:17 +08:00
45dbd4d872 [fix](dbt)fix dbt incremental #16840
fix dbt incremental: a new approach that needs no rollback and supports rerunning incremental data.
add snapshot
use the 'mysql-connector-python' mysql driver to replace the 'MysqlDb' driver
2023-02-18 20:40:56 +08:00
861e4bc64a [fix](planner) Nullable of slot descriptor is mistaken and cause BE crash #16862 2023-02-18 20:39:56 +08:00
4bf778c6cd [typo](docs)fix dynamic Table version label (#16895) 2023-02-18 20:39:14 +08:00
a4e42b1e94 [improvement](pipeline) Added compatible code synchronization delay issues with failures and updates needed to trigger the pipeline (#16902) 2023-02-18 20:26:23 +08:00
2d7d8102c7 [fix](doc) fix mal-format doc #16898
SQL references must be written following the guidance:
https://doris.apache.org/zh-CN/community/how-to-contribute/contribute-doc/#%E5%A6%82%E4%BD%95%E7%BC%96%E5%86%99%E5%91%BD%E4%BB%A4%E5%B8%AE%E5%8A%A9%E6%89%8B%E5%86%8C
2023-02-18 14:30:54 +08:00
070f42c463 [Enhancement](Es): Support config like whether push down to es (#16800)
Support a config for whether to push like down to es, and refactor some code.
Transforming like into a wildcard query and pushing it down to es increases the cpu consumption of es,
so this adds a switch to control it.
2023-02-17 21:56:11 +08:00
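How a like pattern might be turned into a wildcard pattern behind a switch can be sketched as below. This is an assumption-laden illustration, not the code of the commit: the mapping (`%` to `*`, `_` to `?`, backslash as the escape character) and the names `like_to_wildcard`, `try_push_down_like`, and `push_down_enabled` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Translate a SQL LIKE pattern into a wildcard pattern.
// Assumed mapping: '%' -> '*', '_' -> '?'. A backslash in the LIKE pattern
// makes the next character literal. Literal '*', '?' and '\' are escaped
// with a backslash in the output.
std::string like_to_wildcard(const std::string& like) {
    std::string out;
    for (std::size_t i = 0; i < like.size(); ++i) {
        const char c = like[i];
        if (c == '\\' && i + 1 < like.size()) {
            const char next = like[++i];  // escaped character, emitted literally
            if (next == '*' || next == '?' || next == '\\') out += '\\';
            out += next;
        } else if (c == '%') {
            out += '*';
        } else if (c == '_') {
            out += '?';
        } else if (c == '*' || c == '?' || c == '\\') {
            out += '\\';
            out += c;
        } else {
            out += c;
        }
    }
    return out;
}

// Returns true and fills `wildcard` only when push-down is enabled;
// otherwise the caller is expected to evaluate LIKE itself.
bool try_push_down_like(const std::string& like, bool push_down_enabled,
                        std::string* wildcard) {
    if (!push_down_enabled) return false;
    *wildcard = like_to_wildcard(like);
    return true;
}
```

With the switch off, the wildcard query is never built, which avoids the extra cpu consumption on the es side at the cost of filtering elsewhere.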
d5c393f413 [docs](docs)Fix FE config max_running_txn_num_per_db default value (#16877) 2023-02-17 20:55:52 +08:00
90ae8dcf01 [typo](docs)supplement the document content (#16884)
* [typo](docs)supplement the document content

* Update grouping.md

Add space before and after English letters in CN docs and keep the English case consistent.

* Update grouping.md

Change the Chinese title to English
2023-02-17 20:55:34 +08:00
adc42600b4 [typo](docs)Modify some document label errors (#16866)
* [typo](docs)Modify some document label errors

* fix
2023-02-17 20:55:17 +08:00
fda4afecf5 [RegressionTest](Pipeline) Fix pipeline failed in regression test (#16880)
regression-test/suites/inverted_index_p0/test_add_drop_index_with_data.groovy
2023-02-17 20:49:17 +08:00
ea0e090a77 collect_set function documentation added 1.2 label (#16868) 2023-02-17 19:05:44 +08:00
fd5d7d6097 [refactor](Nereids) remove local sort (#16819)
After adding a phase to sort, the localSort is no longer needed.
Also change the order of sortPhase in the constructor.
2023-02-17 18:52:41 +08:00
9b94729c87 Revert "[test](pipeline) Run nereids cases in p1/p2 (#16130)" (#16792)
This reverts commit b480db2e119ac0516e8621ea3d53c40f250c1d24.
2023-02-17 18:48:27 +08:00
6a1e3d3435 [fix](cooldown)Fix bug for single cooldown compaction, add remote meta (#16812)
* fix bug, add remote meta for compaction
2023-02-17 15:13:06 +08:00