Commit Graph

248 Commits

Author SHA1 Message Date
45414db1ba [enhancement](table-meta) flush column unique ids for tables before 1.2 automatically (#23616) 2023-09-02 14:56:48 +08:00
e680d42fe7 [feature](information_schema)add metadata_name_ids for quickly get catlogs,db,table and add profiling table in order to Compatible with mysql (#22702)
add information_schema.metadata_name_idsfor quickly get catlogs,db,table.

1. table  struct :   
```mysql
mysql> desc  internal.information_schema.metadata_name_ids;
+---------------+--------------+------+-------+---------+-------+
| Field         | Type         | Null | Key   | Default | Extra |
+---------------+--------------+------+-------+---------+-------+
| CATALOG_ID    | BIGINT       | Yes  | false | NULL    |       |
| CATALOG_NAME  | VARCHAR(512) | Yes  | false | NULL    |       |
| DATABASE_ID   | BIGINT       | Yes  | false | NULL    |       |
| DATABASE_NAME | VARCHAR(64)  | Yes  | false | NULL    |       |
| TABLE_ID      | BIGINT       | Yes  | false | NULL    |       |
| TABLE_NAME    | VARCHAR(64)  | Yes  | false | NULL    |       |
+---------------+--------------+------+-------+---------+-------+
6 rows in set (0.00 sec) 


mysql> select * from internal.information_schema.metadata_name_ids where CATALOG_NAME="hive1" limit 1 \G;
*************************** 1. row ***************************
   CATALOG_ID: 113008
 CATALOG_NAME: hive1
  DATABASE_ID: 113042
DATABASE_NAME: ssb1_parquet
     TABLE_ID: 114009
   TABLE_NAME: dates
1 row in set (0.07 sec)
```

2. when you create / drop catalog , need not refresh catalog . 
```mysql
mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.34 sec)


mysql> drop catalog hive2;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 


mysql> create catalog hive3 ... 
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;                                                                        
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.32 sec)
```

3. create / drop table , need not refresh catalog .  
```mysql
mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl ... ;


mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10666
1 row in set (0.04 sec)

mysql> drop table demo.example_tbl;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 

```

4. you can set query time , prevent queries from taking too long . 
```

fe.conf :  query_metadata_name_ids_timeout 

the time used to obtain all tables in one database

```
5. add information_schema.profiling in order to Compatible with  mysql

```mysql
mysql> select * from information_schema.profiling;
Empty set (0.07 sec)

mysql> set profiling=1;                                                                                 
Query OK, 0 rows affected (0.01 sec)
```
2023-08-31 21:22:26 +08:00
ca55bd88ad [Fix](Job)Fix the window time is not updated when no job is registered (#23628)
Fix resume job grammar definition is inconsistent
Show Job task Add execution results
JOB allows to define update operations
2023-08-30 09:48:21 +08:00
e02747e976 [feature](Nereids) support struct type (#23597)
1. support struct data type
2. add array / map / struct literal syntax
3. fix array union / intersect / except type coercion
4. fix explict cast data type check for array
5. fix bound function type coercion
2023-08-29 20:41:24 +08:00
6ac694aede [Configuration](multi-catalog) Modify default external cache expire time to 10 mins. (#23490)
Configuration Modify default external cache expire time to 10 mins.
2023-08-28 16:16:43 +08:00
ef2fc44e5c [Improve](Job)Allows modify the configuration of the Job queue and the number of consumer threads (#23547) 2023-08-28 12:01:49 +08:00
7cfb3cc0aa [fix](functions) fix function substitute for datetimeV1/V2 (#23344)
* fix

* function fe
2023-08-25 09:59:38 +08:00
6a4976921d [fix](auth)Disable column auth temporarily (#23295)
- add config `enable_col_auth` to temporarily disable column permissions(because old/new planner has bug when select from view)
- Restore the old optimizer to the previous authentication method
- Support for new optimizer authentication(Legacy issue: When querying the view, the permissions of the base table will be authenticated. The view's own permissions should be authenticated and processed after the new optimizer is improved)
- fix: show grants for non-existent users
- fix: role:`admin` can not grant/revoke to/from user
2023-08-24 23:37:06 +08:00
35d0c9e71e [refactor](nereids) Refactor stats collection framework (#22963)
* remove auto analyze grammer
* refactor ResultRow
2023-08-23 10:05:57 +08:00
a4e041ea55 [improve](alter-job) Add a config for forbiding doing alter job (#23294) 2023-08-22 16:28:36 +08:00
b670dd0db7 [feature](Nereids) support array type (#22851)
FEATURE:
1. enable array type in Nereids
2. support generice on function signature
3. support array and map type in type coercion and type check
4. add element_at and element_slice syntax in Nereids parser

REFACTOR:
1. remove AbstractDataType

BUG FIX:
1. remove FROM from nonReserved keyword list

TODO:
1. support lambda expression
2. use Nereids' way do function type coercion
3. use castIfnotSame when do implict cast on BoundFunction
4. let AnyDataType type coercion do same thing as function type coercion
5. add below array function
- array_apply
- array_concat
- array_filter
- array_sortby
- array_exists
- array_first_index
- array_last_index
- array_count
- array_shuffle shuffle
- array_pushfront
- array_pushback
- array_repeat
- array_zip
- reverse
- concat_ws
- split_by_string
- explode
- bitmap_from_array
- bitmap_to_array
- multi_search_all_positions
- multi_match_any
- tokenize
2023-08-22 09:47:55 +08:00
ae9f04f969 [fix](array) fix typeExtactMatch for array() type (#23264)
if we write sql with : `select cast(array() as array<varchar(10)>)`
castexpr in fe will call analyze() with `Type.matchExactType(childType, type, true);`
here array type only check contains_null , but should check inner type to make array matchExactType right
2023-08-21 19:41:09 +08:00
0967d7ec04 [improvement](agg) Do not serialize bitmap to string (#23172) 2023-08-21 10:10:15 +08:00
10abbd2b62 [Feauture](Export) support parallel export job using Job Schedule (#22854) 2023-08-18 22:24:42 +08:00
1f19d0db3e [improvement](tablet clone) improve tablet balance, scaling speed etc (#22317) 2023-08-17 22:30:49 +08:00
3efa06e63e [Fix](View)varchar type conversion error (#22987) 2023-08-16 11:49:04 +08:00
d7a5c37672 [improvement](tablet clone) update the capacity coeficient for calculating backend load score (#22857)
update the capacity coeficient for calcutating the backend load score:
1. Add fe config entry `backend_load_capacity_coeficient` to allow setting the capacity coeficient manually;
2. Adjust calculating capacity coeficient as below.

We emphasize disk usage for calculating load score. 
If a be has a high used capacity percent, we should increase its load score.
So we increase capacity coefficient with a be's used capacity percent.

But this is not enough. For example, if the tablets have a big difference in data size.
Then for below two BEs, their load score maybe the same:
BE A:  disk usage = 60%,  replica number = 2000  (it contains the big tablets)
BE B:  disk usage = 30%,  replica number = 4000  (it contains the small tablets)

But what we want is: firstly move some big tablets from A to B, after their disk usages are close,
then move some small tablets from B to A, finally both of their disk usages and replica number
are close.

To achieve this, when the max difference between all BE's disk usages >= 30%,  we set the capacity cofficient to 1.0 and avoid the affect of replica num. After the disk usage difference decrease, then decrease the capacity cofficient to make replica num effective.
2023-08-15 17:27:31 +08:00
94a7b44540 [Improvement](log) add config to controll compression of fe log & fe audit log (#22865)
fe log is large for a busy doris cluster, if you want to preserve some historical logs, it cost too much disk space.
enable compression is a good way to save space.
and a gzip compressed text file can be viewed without decompression.
2023-08-11 14:08:08 +08:00
b9b9071c9b [improvement](create partition) create partition require quorum replicas succ (#22554) 2023-08-11 11:59:05 +08:00
8e5b4005dc [enhancement](data type) add use_mysql_bigint_for_largeint config Tell Doris to use bigint when returning largeint type to mysql jdbc (#22835) 2023-08-10 18:53:31 +08:00
f2658dc7bd [Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318)
Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema by session var `truncate_char_or_varchar_columns`.
2023-08-10 14:37:20 +08:00
77d3d4e324 [fix](cache) add sql cache conf cache_result_max_data_size (#22645)
Only the maximum number of rows in sql cache cache_result_max_row_count is not enough. If a row of data is too large, FE may OOM.
2023-08-09 14:46:23 +08:00
7bfcee6e71 [improvement](variable) add annotations for variables (#22292) 2023-08-08 22:16:42 +08:00
97adbaadb9 fix full auto analyze (#22650) 2023-08-07 11:41:38 +08:00
95aa4d8631 [Feature](Export) Supports concurrently export of table data (#21911) 2023-08-04 18:50:17 +08:00
672acb8784 [fix](show-table-status) fix hive view NPE and external meta cache refresh issue (#22377) 2023-08-04 16:55:10 +08:00
4f9969ce1e [feature](show-frontends-disk) Add Show frontend disks (#22040)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
Co-authored-by: yuxianbing <iloveqaz123>
2023-08-03 14:04:48 +08:00
e670d84b72 [feature](executor) using max_instance_num to limit automatically instance (#22521) 2023-08-03 13:12:32 +08:00
e5028314bc [Feature](Job)Support scheduler job (#21916) 2023-08-02 21:34:43 +08:00
afb6a57aa8 [enhancement](nereids) Improve stats preload performance (#21970) 2023-07-31 17:32:01 +08:00
ad080c691f [chore](log)Move non-user-friendly error message to be.WARNING (#22315)
Move non-user-friendly error message to be.WARNING
2023-07-28 13:15:25 +08:00
e87174dd6b [feature](planner) modify multi partition prefix value (#22098)
modify multi partition prefix value: 'p_'
2023-07-28 10:21:32 +08:00
b51fcbd9c7 [opt](stats) Scale replica of stats table to 3 when it's possible (#22227)
So that we could improve the availability of stats.
2023-07-27 17:36:54 +08:00
31c856351a [enhancement](default_config) change default value of rpc related (#22149)
configs

Bdbje elect timeout is 30 seconds, so we enlarge thrift_rpc_timeout_ms
and txn_commit_rpc_timeout_ms to 60s.

BTW: enlarge bdbje_lock_timeout_second from 1 to 5.
2023-07-27 11:12:26 +08:00
582acad8a1 [feature](stats) Enable period time with cron expr (#22095)
Support such grammar

ANALYZE TABLE test WITH CRON "* * * * * ?"

Such job would be scheduled as the cron expr specifie, but natively support minute-level schedule only
2023-07-26 17:25:57 +08:00
964ac4e601 [opt](nereids) Retry when async analyze task failed (#21889)
Retry at most 5 times when async analyze task execution failed
2023-07-26 17:16:56 +08:00
3b6702a1e3 [Bug](point query) cancel future when meet timeout in PointQueryExec (#21573)
1. cancel future when meet timeout and add config to modify rpc timeout
2. add config to modify numof BackendServiceProxy since under high concurrent work load GRPC channel will be blocked
2023-07-25 18:18:09 +08:00
0f439bb1ca [vectorized](udf) java udf support map type (#22059) 2023-07-25 11:56:20 +08:00
0205f540ac [enhancement](config) Enlarge broker scanner bytes conf to 500G, 5G is still not enough (#22126) 2023-07-24 19:49:39 +08:00
22aa54e335 [enhancement](config) enlarge max_bytes_per_broker_scanner to 5G #22099 2023-07-23 12:00:32 +08:00
3d0f952934 [FIX](complex-type)delete enable_map/struct_type switch #21957 2023-07-22 15:29:32 +08:00
85cc044aaa [feature](create-table) support setting replication num for creating table opertaion globally (#21848)
Add a new FE config `force_olap_table_replication_num`.
If this config is larger than 0, when doing creating table operation, the replication num of table will
forcibly be this value.
Default is 0, which make no effect.
This config will only effect the creating olap table operation, other operation such as `add partition`,
`modify table properties` will not be effect.

The motivation of this config is that the most regression test cases are creating table will single replica,
this will be the regression test running well in p0, p1 pipeline.
But we also need to run these cases in multi backend Doris cluster, so we need test cases will multi replicas.
But it is hard to modify each test cases. So I add this config, so that we can simply set it to create all tables with
specified replication number.
2023-07-21 19:36:04 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
0f116ce148 Revert "[Enhancement](Nereids)enable nereids DML by default. (#21539)" (#22013)
This reverts commit f668b3965effbd5df4902f20b496cb6b6642414c.
2023-07-20 11:32:54 +08:00
f668b3965e [Enhancement](Nereids)enable nereids DML by default. (#21539)
TODO: fix cast agg_state type when do insert
2023-07-19 13:52:15 +08:00
d349c955f0 [fix](nereids) Disable auto analyze temporarily #21919 2023-07-19 09:27:24 +08:00
beec0e9169 [Improvement](tablet clone) impr tablet sched speed and fix tablet sched failed too many times (#21856) 2023-07-18 23:25:22 +08:00
05cf095506 [feature](stats) Support full auto analyze (#21192)
1. Auto analyze all tables except for internal tables
2. make resource used by analyze configurable
2023-07-17 20:42:57 +08:00
352a0c2e17 [Improvement](multi catalog)Cache file system to improve list remote files performance (#21700)
Use file system type and Conf as key to cache remote file system.
This could avoid get a new file system for each external table partition's location.
The time cost for fetching 100000 partitions with 1 file for each partition is reduced to 22s from about 15 minutes.
2023-07-14 09:59:46 +08:00
8ffa21a157 [fix](config) set FE header size limit to 1MB from 10k (#21719)
Enlarge jetty_server_max_http_header_size to avoid Request Header Fields
Too Large error when streamloading to FE.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-07-11 19:52:14 +08:00