Commit Graph

210 Commits

Author SHA1 Message Date
0205f540ac [enhancement](config) Enlarge broker scanner bytes conf to 500G, 5G is still not enough (#22126) 2023-07-24 19:49:39 +08:00
22aa54e335 [enhancement](config) enlarge max_bytes_per_broker_scanner to 5G #22099 2023-07-23 12:00:32 +08:00
3d0f952934 [FIX](complex-type)delete enable_map/struct_type switch #21957 2023-07-22 15:29:32 +08:00
85cc044aaa [feature](create-table) support setting replication num for creating table opertaion globally (#21848)
Add a new FE config `force_olap_table_replication_num`.
If this config is larger than 0, when doing creating table operation, the replication num of table will
forcibly be this value.
Default is 0, which make no effect.
This config will only effect the creating olap table operation, other operation such as `add partition`,
`modify table properties` will not be effect.

The motivation of this config is that the most regression test cases are creating table will single replica,
this will be the regression test running well in p0, p1 pipeline.
But we also need to run these cases in multi backend Doris cluster, so we need test cases will multi replicas.
But it is hard to modify each test cases. So I add this config, so that we can simply set it to create all tables with
specified replication number.
2023-07-21 19:36:04 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
0f116ce148 Revert "[Enhancement](Nereids)enable nereids DML by default. (#21539)" (#22013)
This reverts commit f668b3965effbd5df4902f20b496cb6b6642414c.
2023-07-20 11:32:54 +08:00
f668b3965e [Enhancement](Nereids)enable nereids DML by default. (#21539)
TODO: fix cast agg_state type when do insert
2023-07-19 13:52:15 +08:00
d349c955f0 [fix](nereids) Disable auto analyze temporarily #21919 2023-07-19 09:27:24 +08:00
beec0e9169 [Improvement](tablet clone) impr tablet sched speed and fix tablet sched failed too many times (#21856) 2023-07-18 23:25:22 +08:00
05cf095506 [feature](stats) Support full auto analyze (#21192)
1. Auto analyze all tables except for internal tables
2. make resource used by analyze configurable
2023-07-17 20:42:57 +08:00
352a0c2e17 [Improvement](multi catalog)Cache file system to improve list remote files performance (#21700)
Use file system type and Conf as key to cache remote file system.
This could avoid get a new file system for each external table partition's location.
The time cost for fetching 100000 partitions with 1 file for each partition is reduced to 22s from about 15 minutes.
2023-07-14 09:59:46 +08:00
8ffa21a157 [fix](config) set FE header size limit to 1MB from 10k (#21719)
Enlarge jetty_server_max_http_header_size to avoid Request Header Fields
Too Large error when streamloading to FE.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-07-11 19:52:14 +08:00
5a15967b65 [fix](sparkdpp) Change spark dpp default version to 1.2-SNAPSHOT (#21698) 2023-07-11 10:49:53 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
f2fb23e98f [pipeline](exec) disable pipeline load in now version (#21632) 2023-07-09 01:00:06 +08:00
53c10a2389 (chore) Disable ssl connection to FE by default for compatibility reason (#20230)
Older MySQL client (< 5.7.28) will try to connect to server with tls1.1,
which is insecure and is not supported by Doris FE. The connection will
fail.

We disable ssl connection support on Doris FE to keep the users' application
unaffected. To enable ssl support explicitly, just put
the following to fe.conf
```
enable_ssl = true
```
2023-07-07 12:24:55 +08:00
9bcf79178e [Improvement](statistics, multi catalog)Support iceberg table stats collection (#21481)
Fetch iceberg table stats automatically while querying a table.
Collect accurate statistics for Iceberg table by running analyze sql in Doris (remove collect by meta option).
2023-07-07 09:18:37 +08:00
bb3b6770b5 [Enhancement](multi-catalog) Make meta cache batch loading concurrently. (#21471)
I will enhance performance about querying meta cache of hms tables by 2 steps:
**Step1** : use concurrent batch loading for meta cache
**Step2** : execute some other tasks concurrently as soon as possible

**This pr mainly for step1 and it mainly do the following things:**
- Create a `CacheBulkLoader` for batch loading
- Remove the executor of the previous async cache loader and change the loader's type to `CacheBulkLoader` (We do not set any refresh strategies for LoadingCache, so the previous executor is not useful)
- Use a `FixedCacheThreadPool` to replace the `CacheThreadPool` (The previous `CacheThreadPool` just log warn infos and will not throw any exceptions when the pool is full).
- Remove parallel streams and use the `CacheBulkLoader` to do batch loadings
- Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the pool size of hms client pool to `max_external_cache_loader_thread_pool_size`
- Fix the spelling mistake for `max_hive_table_catch_num`
2023-07-06 15:18:30 +08:00
37a52789bd [improvement](statistics, multi catalog)Estimate hive table row count based on file size. (#21207)
Support estimate table row count based on file size.

With sample size=3000 (total partition number is 87491), load cache time is 45s.
With sample size=100000 (more than total partition number 87505), load cache time is 388s.
2023-07-05 16:07:12 +08:00
76bdcf1d26 [improvement](pipeline) task group scan entity (#19924) 2023-06-25 14:43:35 +08:00
ca8f51602b [Improvement](multi catalog, statistics)Support two level external statistics cache loader (#20906)
The current column statistic cache loader is to load data from column_statistics olap table.
This pr is to change the cache loader logic to First load from column_statistics olap table, if no data was loaded, then load from table metadata. This is mainly to support fetch statistics data for external catalog using HMS or Iceberg api.
This is the first PR, next pr will implement the fetch logic for different external catalogs.
2023-06-20 16:43:18 +08:00
87258a13c4 [enhancement](nereids) Remove useless config option #20905
1. Remove useless config option
2. Fix timeout cancel, before this PR an OlapAnalysisTask would continue running even if it's already timeout.
2023-06-20 10:37:46 +08:00
1cc611a913 [fix](match) fix regression case test_index_match_select and test_index_match_phrase (#20860)
1. add more checks for match expression in nereids:
  - match expression only support in filter
  - match expression left child and right child must all be string type
  - left child for match expression must be sloftRef, right child for match expression must be Literal

2. to fix regression case test_index_match_select and test_index_match_phrase
2023-06-16 20:18:29 +08:00
c3b9e99350 [fix](regress-test)update config for disable_nested_complex_type (#20735) 2023-06-16 15:51:41 +08:00
bcf103e993 [enhancement](log4j) support high performance mode for log4j to escape potential bottleneck for doris read and write (#20759)
As we know, log4j2 some times may be bottleneck in doris fe when there are many logs to be output in sync mode while asynchronous logging has a better performance, and we find that capturing caller location has a similar impact across all logging libraries, and slows down asynchronous logging by about 30-100x. so, here we provide three log mode for log4j2 to meet the needs of different users.
refer to https://logging.apache.org/log4j/2.x/performance.html
2023-06-14 15:16:04 +08:00
20ac940711 [Bug](pipeline) fix bug for file scan node on pipeline engine (#20763) 2023-06-14 12:52:56 +08:00
54a7dbeb4d [Refactor](External) Move Common ODBC Methods to JDBC Class and Add Default config to Disable ODBC Creation (#20566)
This PR addresses the refactoring of common methods that were originally located within the ODBC classes, but were used by the JDBC classes. These methods have now been moved to the JDBC classes to improve code readability and maintainability.

In addition, we have disabled the creation of ODBC external tables by default. However, this will not affect the existing usage of ODBC. You can still enable the ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to completely remove the ODBC external tables in future versions, so we recommend using the JDBC Catalog as a priority.
2023-06-13 14:29:04 +08:00
bcc37c9405 [fix](planner)the common type of floating and decimal should be floating type (#20634)
* [fix](planner)the common type of floating and decimal should be floating type

* fix test cases
2023-06-12 11:32:23 +08:00
87bc405c41 [Improvement](statistics)Support external table partition statistics (#20415)
Support collect statistics for HMS external table with specific partitions. Add session variables to limit the partitions to collect for whole table line number and columns statistics.
2023-06-10 12:28:53 +08:00
079fb0e56d [improvement](config)update FE config max_running_txn_num_per_db default value (#20478)
image update FE config max_running_txn_num_per_db default value: old value : 100 new value : 1000
2023-06-09 08:54:37 +08:00
325ddab34e [conf](pipeline) turn pipeline on by default (#20458) 2023-06-08 09:20:51 +08:00
c910e9b78b [doc](disk)fix disk capacity doc error (#20506) 2023-06-07 15:20:04 +08:00
d02737a293 [feature](struct-type) support struct_element function (#19045)
This commit support a function allows return a field column in named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.
2023-06-06 10:44:08 +08:00
59a0f80233 [Improve](array-function)Improve array function intersect (#20085)
now we just support array function with 2 arrays , but intersect operator can support more than 2 arrays
2023-06-05 10:38:48 +08:00
Pxl
8e39f0cf6b [Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298)
storage function name and result is nullable in agg state type
2023-06-04 22:44:48 +08:00
422fcd6377 [fix](Nereids) forbid unexpected expression on filter and fix two more bugs (#20331)
fix below bugs:
1. not check filter's expression, aggregate function, grouping scalar function and window expression should not appear in filter
2. show not change nullable of aggregate function when it is window function in window expression
3. bitmap and other metric types should not appear in order by or partition by of window expression
2023-06-02 16:19:50 +08:00
e32eba8fdf [refactor](stats) Persist status of analyze task to FE meta data (#20264)
1. In the past, we use a BE table named `analysis_jobs` to persist the status of analyze jobs/tasks, however there are many flaws such as, if BE crashed analyze job/task would failed however the status of analyze job/task couldn't get updated.
2. Support `DROP ANALYZE JOB [job_id]` to delete analyze job
3. Support `SHOW ANALYZE TASK STATUS [job_id] ` to  get the task status of specific job
4. Restrict the execute condition of auto analyze, only when  the  last execution of auto analyze job finished a while ago could be executed again
5. Support analyze whole DB
2023-06-02 12:33:31 +08:00
5a3b97bbf2 [enhancement](struct-type)support comment for struct field (#20200)
support comment for struct field
2023-06-02 10:29:56 +08:00
ecdc5124be [feature-wip](duplicate-no-keys) schame change support for duplicate no keys (#19326) 2023-06-02 09:22:41 +08:00
519f01133a [feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811) 2023-06-01 13:09:58 +08:00
f9dfcb923d [Enhancement] Change Create Resource Group Grammar (#20249) 2023-05-31 15:23:24 +08:00
6f68ec9de0 support query queue (#20048)
support query queue (#20048)
2023-05-30 19:52:27 +08:00
0c98355fff [fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137)
1. Fix create catalog with resource replay bug.
	If user create catalog using `create catalog hive with resource xxx`, when replaying edit log,
	there is a bug that resource may be dropped, causing NPE and FE will fail to start.

	In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true.
	So that `with resource` will not be allowed, and it will be deprecated later.

	And also fix the replay bug to avoid NPE.

2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication.

	When user create 2 hive catalogs, one use simple auth, the other use kerberos auth.
	The query may fail with error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`

	So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
	Which means this property will be added automatically when user creating hive catalog, to avoid such problem.

3. Fix calling `hdfsExists()` issue

	When calling `hdfsExists()` with non-zero return code, should check if it encounters error or is file not found.

3. Some code refactor

	Avoid import `org.apache.parquet.Strings`
2023-05-30 16:57:39 +08:00
d76be1315f [BUG]storage_min_left_capacity_bytes default value has integer overflow #19943 2023-05-29 19:50:31 +08:00
cc47ee480c [feat](stats) delete data size stat and Made task timeout configurable (#20090)
1. Delete the stats for data size, since it would cost too much time but useless
2. Make task time out configurable since when it's common to analyze a quite huge table that the default 10 min is not suitable
2023-05-29 16:40:59 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
5f9c6e076f [Fix](load)Make insert timeout accurate in show load statistics (#20068) 2023-05-28 21:19:06 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
9539bbf8ae Revert "[test](executor)add crud regression test for resource group (#19659)" (#20121)
This reverts commit 8b9813663d87afa7b359b31782f3864dc54881df.
2023-05-27 08:25:00 +08:00
93933308e6 [Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881) 2023-05-26 23:40:49 +08:00