Commit Graph

5755 Commits

Author SHA1 Message Date
39cf393874 [fix](stats) Fix potential NPE when loading Histogram (#19078)
Return Histogram.UNKNOWN as the default when an error occurs during loading.
2023-04-26 14:24:01 +08:00
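The fallback described in 39cf393874 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `HistogramFallbackSketch`, the nested `Histogram` stand-in, the `Loader` interface, and `loadOrUnknown` are hypothetical names, and only `Histogram.UNKNOWN` comes from the commit message. Treating a null result like a failed load is a further assumption of the sketch.

```java
final class HistogramFallbackSketch {

    /** Hypothetical stand-in for the real Histogram class. */
    static final class Histogram {
        /** Sentinel returned when no usable histogram could be loaded. */
        static final Histogram UNKNOWN = new Histogram(0);

        final int numBuckets;

        Histogram(int numBuckets) {
            this.numBuckets = numBuckets;
        }
    }

    /** Hypothetical loader abstraction; the real loading code may differ. */
    interface Loader {
        Histogram load(String columnKey) throws Exception;
    }

    /**
     * Loads a histogram and falls back to Histogram.UNKNOWN if loading throws
     * or (an assumption of this sketch) yields null, so callers never see null.
     */
    static Histogram loadOrUnknown(Loader loader, String columnKey) {
        try {
            Histogram loaded = loader.load(columnKey);
            return loaded != null ? loaded : Histogram.UNKNOWN;
        } catch (Exception e) {
            return Histogram.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // A loader that always fails, to exercise the fallback path.
        Histogram h = loadOrUnknown(key -> { throw new IllegalStateException("load failed"); }, "t1.c1");
        System.out.println(h == Histogram.UNKNOWN);
    }
}
```

Returning a shared sentinel instead of null means callers can compare against `Histogram.UNKNOWN` rather than null-check every use, which is the kind of NPE the commit title refers to.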
d3a0b94602 [feature](stats) Support to kill analyze #18901
1. Report an error if analyze jobs are submitted when the stats table is not available
2. Support kill analyze
3. Support cancel sync analyze
2023-04-26 14:23:44 +08:00
50d9f35f63 [fix](planner) NPE when use ctas to create table (#18973)
This is caused by the expr in orderbyelements not being analyzed.
2023-04-26 14:12:28 +08:00
7a786c3b09 [fix](Nerieds) fix bucket shuffle plan and cost model bugs and add new function add_months (#18836)
fix:
1. fix varchar(1) compare to varchar(2) bug
2. fix bucket shuffle join's cost model bug

feature:
1. support add_months function
2023-04-26 13:52:44 +08:00
270be55c4c [feat](stats) Add option to config file to enable or disable analyze function (#19062)
Add this option in conf:

    /**
     * If set false, user couldn't submit analyze SQL and FE won't allocate any related resources.
     */
    @ConfField
    public static boolean enable_stats = true;

It is checked when analyzing analyze-related statements and when initializing the analyze manager.
2023-04-26 13:37:08 +08:00
aa88083c1e [fix](Nereids) dead loop in FillUpMissingSlots (#18902)
FillUpMissingSlots doesn't handle some corner cases. Sometimes no fill-up is needed, and in that case we should return null.
2023-04-26 13:31:51 +08:00
a7773d16d6 [fix](Nereids): UT shouldn't contains slotId (#19082) 2023-04-26 13:23:21 +08:00
1c8b70a48c [refactor](config) Do not let set enable_vectorized_engine throw an error (#19002)
* update

* Update fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2023-04-26 12:03:32 +08:00
8864266a42 [fix](Jdbc Catalog) fix Druid Pool parameter and set testWhileIdle = true (#19049)
Set `testWhileIdle` for the druid pool to true
2023-04-26 11:44:45 +08:00
d037938a4c [vectorzied](function) fix year_floor get result is incorrectly (#19006) 2023-04-26 11:39:22 +08:00
c993964a88 [Bug](delete) fix the delete ignore char case (#18714) 2023-04-26 07:30:44 +08:00
8ea69ca11c [refactor](nereids) do not use in_filter in pipeline mode (#19028)
1. in pipeline in_or_bloom filter replaced by bloom filter
2. do not set broadcast row limit
2023-04-25 19:02:12 +08:00
61b7a52444 [Enhancement](multi-catalogs) Use decimal V3 type in multi-catalogs module. (#18926)
1. Use decimal V3 type in JDBC and Iceberg tables.
2. Fix hdfs TVF decimal V3 type and regression test.
2023-04-25 14:49:40 +08:00
a4a85f2476 [feat](stats) Return job id for async analyze stmt (#18800)
1. Return job id from async analysis
2. Sync analysis jobs don't save to analysis_jobs anymore
2023-04-25 14:43:54 +08:00
39d66ca2c6 [fix](parquet) hasn't initialize select vector when number of nested values equals zero (#18953)
Fix bug when reading array type in parquet file:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` calls `ColumnSelectVector::set_run_length_null_map` to initialize the select vector, but `ScalarColumnReader::_read_nested_column` does not, which makes the number of values wrong.
The situation where this error occurs is particularly extreme: The column pages have remaining values to be read,
but all of them are null values at ancestor level, so there's no actual read operation, just skipping null values at ancestor level.
2023-04-25 14:21:33 +08:00
a836a6a4fe [refactor](multi catalog)Refactor FileQueryScanNode init and finalize mothods(#18954)
Refactor FileQueryScanNode init and finalize methods.
Handle schema related initialization in init method, handle scan range generation in finalize method.
2023-04-25 11:18:21 +08:00
228cc90e4e [fix](session-var) ignore exception when setting global only var in non master FE (#18949)
Introduced from #18609.

When setting global variables from a non-master FE, an error like the following is reported:

`Variable 'password_history' is a GLOBAL variable and should be set with SET GLOBAL`

This happens because, when a global variable is set from a non-master FE, Doris performs the following steps:

1. Forward the SetStmt to the master FE for execution.
2. Change the SetStmt to "SESSION" level and execute it again on this non-master FE.

But a "GLOBAL only" variable, such as "password_history", cannot be set at SESSION level.
So in step 2, "set password_history=xxx" without the "GLOBAL" keyword throws an exception.
In this case, we should just ignore the exception and return.
2023-04-25 11:05:09 +08:00
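The second step described in 228cc90e4e, including the ignored exception, can be sketched in Java as follows. This is an illustration under stated assumptions rather than the actual Doris code: `GlobalOnlyVarSketch`, `GlobalOnlyException`, `setSessionVar`, and `reapplyAtSessionLevel` are hypothetical names, and the session is modeled as a plain map. Only the variable name `password_history` and the error text come from the commit message.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class GlobalOnlyVarSketch {

    /** Variables that may only be set with SET GLOBAL (example from the commit message). */
    static final Set<String> GLOBAL_ONLY = Collections.singleton("password_history");

    /** Hypothetical exception for a GLOBAL-only variable being set at SESSION level. */
    static final class GlobalOnlyException extends Exception {
        GlobalOnlyException(String name) {
            super("Variable '" + name + "' is a GLOBAL variable and should be set with SET GLOBAL");
        }
    }

    /** Sets a variable at SESSION level and rejects GLOBAL-only variables. */
    static void setSessionVar(Map<String, String> session, String name, String value)
            throws GlobalOnlyException {
        if (GLOBAL_ONLY.contains(name)) {
            throw new GlobalOnlyException(name);
        }
        session.put(name, value);
    }

    /**
     * Step 2 on a non-master FE: re-apply the already forwarded statement at SESSION level.
     * Returns true if the session was updated, false if the variable is GLOBAL-only
     * and the exception was deliberately ignored.
     */
    static boolean reapplyAtSessionLevel(Map<String, String> session, String name, String value) {
        try {
            setSessionVar(session, name, value);
            return true;
        } catch (GlobalOnlyException e) {
            // Step 1 already forwarded the statement to the master FE, so nothing is lost here.
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> session = new HashMap<>();
        System.out.println(reapplyAtSessionLevel(session, "password_history", "3"));
    }
}
```

Swallowing the exception is safe in this model only because step 1 has already forwarded the statement to the master FE; a direct SESSION-level set of a GLOBAL-only variable would still fail in `setSessionVar`.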
e2afa07271 [fix](nereids) disable_join_reorder does not work with semi/anti #18898
semi/anti push rules should not work if disable_join_reorder = true;
2023-04-25 10:57:40 +08:00
fd4576e420 [Fix](auth) fix some problem of skip_localhost_auth_check in FE config #18996 2023-04-25 09:10:01 +08:00
efebb3d21e [fix](schema) fix show create table get wrong random distribution info (#18895)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-24 23:33:42 +08:00
54d58364c1 [fix](Nereids): move SimplifyAggGroupBy before NormalizeAggregate. (#18918) 2023-04-24 19:00:27 +08:00
17e206c538 [Feature](resource-group) Support drop resource group (#18873) 2023-04-24 14:00:00 +08:00
6bf51150f3 [fix](nereids) remove unnecessary project above scan node (#18920)
1. remove unnecessary project node above scan node.
2. fix in subquery may be recognized as scalar subquery bug
3. fix some Quantile related functions' return type bug
2023-04-24 13:58:57 +08:00
d368326cc2 [fix](Nereids) should not fallback to legacy planner when execution failed (#18847) 2023-04-24 13:29:29 +08:00
22cdfc5970 [refactor](fs)(step1) add new storage file system (#18938)
PR1: add new storage file system template and move old storage to new package
PR2: extract some method in old storage to new file system.
PR3: use storages to access remote object storage, and use file systems to access file in local or remote location. Will add some unit tests.

---------

Co-authored-by: jinzhe <jinzhe@selectdb.com>
2023-04-24 11:41:48 +08:00
296b0c92f7 [Enhancement](compaction) stop tablet compaction when table dropped (#18702)
* fix be ut
2023-04-24 11:04:27 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE have an M:N relationship.
From time to time, a LoadChannel returns its own runtime profile to a randomly chosen TabletSink, so each TabletSink usually ends up holding the profiles of all LoadChannels, but the copies of the same LoadChannel profile held by different TabletSinks differ in freshness. Each TabletSink periodically reports all the LoadChannel profiles it holds to FE, and the timestamp is used to make sure the latest LoadChannel profile is kept.
2023-04-24 09:41:57 +08:00
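The timestamp-based update described in 8e4710079d can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `LoadChannelProfileSketch`, `Profile`, `update`, and `latest` are hypothetical names, the profile is reduced to a timestamp plus an opaque payload, and the sketch assumes the receiving side keeps one profile per LoadChannel id.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadChannelProfileSketch {

    /** Hypothetical profile snapshot: a timestamp plus an opaque payload. */
    static final class Profile {
        final long timestampMs;
        final String payload;

        Profile(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    /** Latest known profile per LoadChannel id. */
    private final Map<String, Profile> latestByChannel = new HashMap<>();

    /**
     * Stores the incoming profile only if it is newer than the one already held,
     * so a stale copy reported by another TabletSink never overwrites a fresher one.
     * Returns true if the profile was stored.
     */
    boolean update(String channelId, Profile incoming) {
        Profile current = latestByChannel.get(channelId);
        if (current == null || incoming.timestampMs > current.timestampMs) {
            latestByChannel.put(channelId, incoming);
            return true;
        }
        return false;
    }

    /** Returns the latest profile for the channel, or null if none was reported. */
    Profile latest(String channelId) {
        return latestByChannel.get(channelId);
    }

    public static void main(String[] args) {
        LoadChannelProfileSketch profiles = new LoadChannelProfileSketch();
        profiles.update("channel-1", new Profile(200L, "reported by sink A"));
        profiles.update("channel-1", new Profile(100L, "stale copy from sink B"));
        System.out.println(profiles.latest("channel-1").payload);
    }
}
```

Because several TabletSinks can report the same LoadChannel profile at different ages, comparing timestamps makes the result independent of the order in which the reports arrive.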
d2f50ce3f5 [Fix](HttpServer) Chinese garbled characters appear when obtaining query plan (#18820)
When obtaining the query plan, Chinese characters in the predicate are garbled, which leads to incorrect query results.
2023-04-24 08:49:44 +08:00
2d7903e2bd [Feature](multi-catalog) support query hive views. (#18815)
A very simple implementation for querying hive views. It is an EXPERIMENTAL feature.
We try to parse the DDL of hive views and execute the query, relying on the fact that HiveQL
is very similar to Doris SQL. But if the DDL of a hive view uses complicated or incompatible grammar,
the query might fail.
2023-04-24 08:49:26 +08:00
1e7ef35741 [fix](Nereids) two phase read for topn only support simple case (#18955)
1. topn must have a merge node
2. topn must be the top node of the plan
2023-04-23 21:32:23 +08:00
166bed11d4 [Enchancement](auth) Forbid to login doris from 127.0.0.1 without password (#18816)
* forbid to login from 127.0.0.1 without password

* add localhost limit

* rename
2023-04-23 13:56:31 +08:00
fd905b66b0 [refactor](jdbc) close datasource if no need to maintain the cache (#18724)
After PR #18670, JVM parameters can be used to initialize the JDBC datasource.
But when JDBC_MIN_POOL=0 is set, the datasource can be closed immediately.
There is no need to wait for the recycling timer.
2023-04-22 22:07:34 +08:00
814f12981d [feat](Nereids): validate Project list. (#18868) 2023-04-22 12:32:51 +08:00
13894ae790 [fix](jdbc catalog) Use default value if the user does not set the pool parameter in be.conf #18919 2023-04-22 08:39:26 +08:00
b75f4c97f3 [function](string) support char function (#18878)
* fix
2023-04-22 08:36:48 +08:00
313fab0802 [fix](mtmv) fix mtmv thread interruption issue (#18884) 2023-04-21 22:27:13 +08:00
f7651d8dfb (fix)[olap] not support in_memory=true now (#18731)
* (fix)[olap] can not set in_memory=true now

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-21 21:55:37 +08:00
0ae3a6df7e [bug](bdbje) Add retry for reSetupBdbEnvironment() restore.execute() (#18777)
* In reSetupBdbEnvironment() `restore.execute()` may throw NullPointerException,
  add retry for `restore.execute()`
2023-04-21 20:58:42 +08:00
317d9ee152 [feat](Nereids): Simplify Agg GroupBy (#18887) 2023-04-21 18:57:15 +08:00
af20b2c95e [Bug](topn opt) Fix be crash when enable topn opt with larger thresho… (#18858)
topn opt should be initialized when it is updated
2023-04-21 17:45:00 +08:00
c72a46f3df [Improvement](bitmap-filter) enable bitmap runtime filter in fuzzy mode. (#17621) 2023-04-21 16:00:13 +08:00
ec1ab1a3d2 [Improve](GEO)wkb input and output are represented as hexadecimal strings And delete EWKB (#18721) 2023-04-21 15:11:18 +08:00
3007cd49f2 [enhancement](mysql) enable two-way ssl authentication (#18530)
According to the mysql-ssl, enable two-way SSL authentication.
2023-04-21 14:39:14 +08:00
c41b486e7e [fix](nereids) LogicalProject should always has non-empty project list (#18863) 2023-04-21 14:28:07 +08:00
0c26f8df4d [refactor](Nereids): move out misunderstanding func from JoinUtils (#18865) 2023-04-21 14:11:03 +08:00
063dfefd80 [fix](planner) Failed to create table with CTAS when multiple varchar type filed as key (#18814)
Add a restriction for converting varchar/char to string type: only fields that are string type and not in the key desc can be converted to string type now.
2023-04-21 13:33:35 +08:00
1a6401d682 [enchancement](statistics) support sampling collection of statistics (#18880)
1. Supports sampling to collect statistics
2. Improved syntax for collecting statistics
3. Support histogram specifies the number of buckets
4. Tweaked some code structure

---

The syntax supports WITH and PROPERTIES, using the same syntax as before.

Column Statistics Collection Syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
     [ (column_name [, ...]) ]
     [ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
     [ PROPERTIES ('key' = 'value', ...) ];
```

Column histogram collection syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
     [ (column_name [, ...]) ]
     UPDATE HISTOGRAM
     [ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
     [ PROPERTIES ('key' = 'value', ...) ];
```

Explanation:
- sync: Collect statistics synchronously. Return after collection completes.
- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows: Collect statistics by sampling. Either a percentage or a number of rows can be sampled.
- buckets: Specifies the maximum number of buckets generated when collecting histogram statistics.
- table_name: The target table for collecting statistics. Can be of the form `db_name.table_name`.
- column_name: The specified target column must be a column that exists in `table_name`; multiple column names are separated by commas.
- properties: Properties used to configure statistics tasks. Currently only the following configurations are supported (equivalent to the WITH clauses):
   - 'sync' = 'true'
   - 'incremental' = 'true'
   - 'sample.percent' = '50'
   - 'sample.rows' = '1000'
   - 'num.buckets' = 10

--- 

TODO: 
- Supplement the complete p0 test
- `Incremental` statistics see #18653
2023-04-21 13:11:43 +08:00
ae76b59f2f [fix](external table) Use FederationBackendPolicy in Coordinator for ExternalScanNode #18860 2023-04-21 12:35:45 +08:00
b84bd156fb [enhancement](Nereids) two phase read for topn (#18829)
Add the two-phase read topn optimization. The legacy planner's PRs are:
- #15642
- #16460
- #16848

TODO:
We forbid limit(sort(project(scan))) because BE core dumps when the plan has a project on the scan.
We need to remove this restriction after the BE bug is fixed.
2023-04-21 12:05:22 +08:00
c6b1b9de80 [Improvement](broker) support broker load from tencent Goose File System (#18745)
Including below functions:
1. broker load
2. export
3. select into outfile
4. create repo and backup to gfs
After configuring the environment, use GFS like any other HDFS system.
2023-04-20 23:12:17 +08:00