96eb363b01
[fix](help-module)fix use regex match replaceAll may cause backtracking ( #24918 )
2023-10-07 21:09:17 -05:00
07f9f27fa9
[improvement](start script) start script can not set http proxy ( #25086 )
...
BE clones snapshots over HTTP. If an HTTP proxy is set, the BE snapshot clone will fail, so the start script forbids setting the HTTP proxy environment variables.
2023-10-08 10:06:06 +08:00
f8e4cefb8c
[typo](doc)Add be's enable_java_support configuration document ( #25069 )
2023-10-07 23:56:14 +08:00
7edc00a78f
[tools](tpc)make tpch-tools and tpcds-tools default scale factor 100 ( #25002 )
...
Change the default scale factor (sf) to 100G.
2023-10-07 23:13:46 +08:00
238c349946
[Fix](replayer) Fix FE crash when replaying analysis logs. ( #25024 )
...
Issue Number: close #25023
The details of this bug are described in the issue above. Checking whether the current FE is a master node avoids such problems.
2023-10-07 23:06:34 +08:00
0df32c8e3e
[Fix](Outfile) Use data_type_serde to export data to csv file format ( #24721 )
...
Modify the outfile logic, use the data type serde framework.
2023-10-07 22:50:44 +08:00
f3e95608cb
(Fix)(RoutineLoad)Query the transaction status NPE when the task has not yet started scheduling ( #25074 )
2023-10-07 07:26:49 -05:00
b380b8b0b5
[bugfix](multi-catalog) Esexternalcatalog is missing LastUpdateTime. ( #24559 )
2023-10-07 20:21:33 +08:00
26bc749afd
[bugfix](set_var) fix sql level exec_mem_limit does not take effect ( #25043 )
2023-10-07 20:15:25 +08:00
cb03703990
[fix](doc) spelling error for colocate join #25053 ( #25054 )
...
Issue: 25053
Fix the spelling error for Colocate Join.
2023-10-07 19:51:07 +08:00
cb0076e585
[fix](insert) fix group commit be ut ( #24968 )
2023-10-07 19:50:05 +08:00
e5fe4e5b83
[refactor](stats) Refactor TableStatsMeta
...
1. Add an abstraction for column stats status, which is required for further optimization and feature development
2. Re-enable the analyze test in p0 that was unexpectedly disabled before
2023-10-07 19:48:54 +08:00
8953179c11
[fix](multi-table) fix multi table task cannot end ( #25056 )
...
When executing a multi-table task, the task cannot end if the exec plan fails, which prevents other routine load tasks from being submitted.
2023-10-07 19:45:42 +08:00
5130a6c006
[improvement](jdbc catalog)Adjustment to JDBC External Table Configuration Based on Internal Table Settings ( #25059 )
...
This pull request adjusts the behavior of the `lower_case_table_names` parameter for jdbc catalogs based on the configuration of the internal table's corresponding parameter.
Changes:
- For internal tables, if `lower_case_table_names` is set to 1 or 2, the jdbc catalog's parameter is forcibly set to `true`.
- For internal tables, if `lower_case_table_names` is set to 0, the jdbc catalog's parameter can be either `true` or `false`, with a default value of `false`.
These adjustments ensure consistency and predictability when working with both internal and external table configurations in Doris.
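The rule above can be sketched as a small resolution function. This is an illustrative sketch only, not the actual Doris implementation: the function name `resolve_jdbc_lower_case` and its arguments are hypothetical, and only the 0/1/2 behavior comes from the description.

```python
from typing import Optional


def resolve_jdbc_lower_case(internal_setting: int,
                            requested: Optional[bool] = None) -> bool:
    """Return the effective jdbc catalog setting (hypothetical helper).

    internal_setting: the internal tables' lower_case_table_names (0, 1 or 2).
    requested: the value the user asked for on the catalog, or None if unset.
    """
    if internal_setting not in (0, 1, 2):
        raise ValueError("lower_case_table_names must be 0, 1 or 2")
    if internal_setting in (1, 2):
        # Forced to true regardless of what the user requested.
        return True
    # internal_setting == 0: the user's choice is honored, default is false.
    return False if requested is None else requested
```

With an internal setting of 1 or 2 the requested value is ignored; with 0 the catalog keeps whatever the user chose.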
2023-10-07 06:25:52 -05:00
9d0f4c0094
[minor](be) set fd number check to 60000 for BE start script ( #25078 )
...
Lower the BE fd number check to 60000,
because the default fd limit on some systems is 65535, which is smaller than the previous threshold of 65536,
so the threshold is reduced to 60000 to let Doris start normally on most systems.
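A minimal sketch of this kind of check, written in Python for illustration. The real check lives in the BE start script; the names `MIN_FD_LIMIT` and `fd_limit_ok` are hypothetical, and only the 60000, 65535 and 65536 values come from the description.

```python
import resource

# Threshold from the commit description; the previous value was 65536.
MIN_FD_LIMIT = 60000


def fd_limit_ok(soft_limit: int, threshold: int = MIN_FD_LIMIT) -> bool:
    """Return True if the open-file limit is high enough to start."""
    if soft_limit == resource.RLIM_INFINITY:
        # An unlimited fd count always passes.
        return True
    return soft_limit >= threshold


if __name__ == "__main__":
    # Read the current soft limit of this process (Unix only).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("fd limit ok" if fd_limit_ok(soft) else "fd limit too low")
```

A system whose default limit is 65535 fails the old threshold of 65536 but passes the new one of 60000.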
2023-10-07 19:02:39 +08:00
94eec9be0f
[Monitor](doc)modify incorrect name for Cumulative Compaction Score ( #25082 )
...
Co-authored-by: xingying01 <xingying01@corp.netease.com >
2023-10-07 18:53:13 +08:00
20a7df6ecc
[monitor][doc]modify incorrect status for doris_be_engine_requests_total ( #25075 )
...
Co-authored-by: xingying01 <xingying01@corp.netease.com >
2023-10-07 16:51:52 +08:00
976335e236
[Fix](stream load) stream load record use valid txn info when two txn with same label #24320
...
Co-authored-by: wangqingtao6 <wangqingtao6@jd.com >
2023-10-07 16:42:45 +08:00
59261174d5
[chore](unused) Remove unused variable CPU_HARD_LIMIT in task_group.cc ( #25076 )
...
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com >
2023-10-07 03:36:13 -05:00
1405f1efd2
[refactor](nereids) unify withSel/updateRowCountOnly/withRowCount ( #24997 )
...
1. Refactor the statistics functions withSel/updateRowCountOnly/withRowCount.
2. Do not use Double.MAX in stats estimation.
3. dateLikeType.rangeLength() does not throw DateTimeException.
2023-10-07 16:22:30 +08:00
335804bb25
[fix](pipelinex) fix multi cast sink without init ( #25066 )
2023-10-07 15:49:03 +08:00
3c9ff7af39
[feature](Nereids): push down topN through join ( #24720 )
...
Push TopN through Join.
The join type can only be left/right outer join or cross join, because the data of one of their children cannot be filtered by the join.
The new TopN uses (original limit + original offset) as its limit and 0 as its offset.
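The limit/offset arithmetic can be sketched as follows. This is a simplified illustration, not the Nereids rule itself: the function name and the join-type strings are hypothetical, and the sketch does not model which child receives the new TopN or whether the order keys come from that child.

```python
from typing import Optional, Tuple

# Join types named in the description as eligible for the push-down.
PUSHABLE_JOIN_TYPES = {"LEFT_OUTER_JOIN", "RIGHT_OUTER_JOIN", "CROSS_JOIN"}


def pushed_down_topn(join_type: str, limit: int,
                     offset: int) -> Optional[Tuple[int, int]]:
    """Return (limit, offset) for the TopN pushed below the join, or None.

    The pushed TopN must keep the first limit + offset rows, because the
    original TopN above the join still skips `offset` rows afterwards.
    """
    if join_type not in PUSHABLE_JOIN_TYPES:
        return None
    return (limit + offset, 0)
```

For example, a TopN with limit 10 and offset 5 above a left outer join yields a pushed-down TopN with limit 15 and offset 0, while the original TopN stays above the join.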
2023-10-07 14:58:53 +08:00
47694c5b36
[fix](jdbc catalog )fix jdbc catalog current_timestamp default ( #25016 )
...
This problem occurs when reading table data from MariaDB where the default value of a datetime column is set to current_timestamp().
2023-10-07 01:43:03 -05:00
0f6ea41220
[bug][auth]Show grant causes role errors in the memory. #24783 ( #24841 )
2023-10-07 14:06:09 +08:00
727fa2c0cd
[opt](tvf) refine the class of ExternalFileTableValuedFunction ( #24706 )
...
`ExternalFileTableValuedFunction` now has 3 derived classes:
- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction
All these tvfs read data from files. The difference is where the file is read from, e.g., from HDFS or from the local filesystem.
So I refined the fields and methods of these classes.
There are now 3 kinds of properties for these tvfs:
1. File format properties
File format properties, such as `format` and `column_separator`, are common to all these tvfs.
So these properties should be analyzed in the parent class `ExternalFileTableValuedFunction`.
2. URI or file path
The URI or file path property indicates the file location. The format of the URI differs between storage systems.
So it should be analyzed in each derived class.
3. Other properties
All other properties that are specific to a certain tvf.
So they should be analyzed in each derived class.
There are 2 new classes:
- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.
After this PR, a common property for all these tvfs only needs to be handled in
`ExternalFileTableValuedFunction`, which avoids forgetting to handle it in any one of them.
### Behavior change
1. Remove the `fs.defaultFS` property in `hdfs()`; it can be derived from `uri`
2. Use `\t` as the default column separator of csv format, same as stream load
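As a rough illustration of the first behavior change, the filesystem root can be recovered from the scheme and authority of the URI. The sketch below uses Python's standard `urllib.parse`; the function name `default_fs_from_uri` and the example address are hypothetical, and Doris' actual parsing may differ.

```python
from urllib.parse import urlsplit


def default_fs_from_uri(uri: str) -> str:
    """Derive a value equivalent to fs.defaultFS from an HDFS file URI."""
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"uri has no scheme or authority: {uri!r}")
    # Keep only scheme://host:port and drop the file path.
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    print(default_fs_from_uri("hdfs://127.0.0.1:8020/user/doris/data.csv"))
```

A URI without an authority part (for example a bare path) raises an error in this sketch rather than guessing a default.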
2023-10-07 12:44:04 +08:00
0e615a531e
[Feature](Job)Job tasks support the choice of persistence or storage in memory ( #24919 )
2023-10-06 23:20:36 -05:00
7b2ff38401
query cpu hard limit based on doris scheduler ( #24844 )
2023-10-07 12:03:07 +08:00
70f5b0006f
[fix](Nereids) ctas throw npe when default value is null ( #25009 )
2023-10-06 22:39:32 -05:00
ffad945dd1
[opt](optimizer) Recycle expired table stats #24777
...
Remove table stats when olap table is dropped
2023-10-07 11:31:45 +08:00
f1e948e5f4
[fix](planner)the common type of date and decimal should be double ( #24956 )
2023-10-07 11:27:19 +08:00
0631ed61b0
[feature](profilev2) Preliminary support for profilev2. ( #24881 )
...
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
set profile_level = 1;
such as
sql
select count(*) from customer join item on c_customer_sk = i_item_sk
profile
Simple profile
PLAN FRAGMENT 0
OUTPUT EXPRS:
count(*)
PARTITION: UNPARTITIONED
VRESULT SINK
MYSQL_PROTOCAL
7:VAGGREGATE (merge finalize)
| output: count(partial_count(*))[#44 ]
| group by:
| cardinality=1
| TotalTime: avg 725.608us, max 725.608us, min 725.608us
| RowsReturned: 1
|
6:VEXCHANGE
offset: 0
TotalTime: avg 52.411us, max 52.411us, min 52.411us
RowsReturned: 8
PLAN FRAGMENT 1
PARTITION: HASH_PARTITIONED: c_customer_sk
STREAM DATA SINK
EXCHANGE ID: 06
UNPARTITIONED
TotalTime: avg 106.263us, max 118.38us, min 81.403us
BlocksSent: 8
5:VAGGREGATE (update serialize)
| output: partial_count(*)[#43 ]
| group by:
| cardinality=1
| TotalTime: avg 679.296us, max 739.395us, min 554.904us
| BuildTime: avg 33.198us, max 48.387us, min 28.880us
| ExecTime: avg 27.633us, max 40.278us, min 24.537us
| RowsReturned: 8
|
4:VHASH JOIN
| join op: INNER JOIN(PARTITIONED)[]
| equal join conjunct: c_customer_sk = i_item_sk
| runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576)
| cardinality=17,740
| vec output tuple id: 3
| vIntermediate tuple ids: 2
| hash output slot ids: 22
| RowsReturned: 18.0K (18000)
| ProbeRows: 18.0K (18000)
| ProbeTime: avg 862.308us, max 1.576ms, min 666.28us
| BuildRows: 18.0K (18000)
| BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms
|
|----1:VEXCHANGE
| offset: 0
| TotalTime: avg 48.822us, max 67.459us, min 30.380us
| RowsReturned: 18.0K (18000)
|
3:VEXCHANGE
offset: 0
TotalTime: avg 33.162us, max 39.480us, min 28.854us
RowsReturned: 18.0K (18000)
PLAN FRAGMENT 2
PARTITION: HASH_PARTITIONED: c_customer_id
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: c_customer_sk
TotalTime: avg 753.954us, max 1.210ms, min 499.470us
BlocksSent: 64
2:VOlapScanNode
TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON
runtime filters: RF000[bloom] -> c_customer_sk
partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ...
cardinality=100000, avgRowSize=0.0, numNodes=1
pushAggOp=NONE
TotalTime: avg 18.417us, max 41.319us, min 10.189us
RowsReturned: 18.0K (18000)
---------
Co-authored-by: yiguolei <676222867@qq.com >
2023-10-07 11:16:53 +08:00
83a9d07288
[refactor](segment iterator) remove some code to make the logic more clear ( #25050 )
...
Co-authored-by: yiguolei <yiguolei@gmail.com >
2023-10-07 11:14:28 +08:00
bd582aee75
[pipelineX](minor) refine code ( #25015 )
2023-10-07 10:45:33 +08:00
813c8f1e5a
[Improve](metric) Improve FE DorisMetricRegistry ( #24773 )
...
The current implementation needs to iterate all metrics in a lock,
which might cause latency spikes. This PR changes the underlying
data structure to ConcurrentHashMap so that removing metrics doesn't
need to block the entire registry.
2023-10-07 10:25:55 +08:00
42c52037fc
Revert "[Fix](Nereids) fix infer predicate lost cast of source expression ( #23692 )" ( #25008 )
...
This reverts commit be3618316f8411ad36d0a77f5b4405f2dbd128fa.
2023-10-06 21:18:29 -05:00
a9d12f7b82
[Debug](float) Add clang debug tune float accuracy ( #25041 )
2023-10-07 09:34:50 +08:00
534d942933
[improvement](tablet clone) impr further repair tablet sched priority ( #25046 )
2023-10-05 22:19:53 +08:00
136973d4fa
[fix](testcase) add state check for ADD INDEX before BUILD INDEX to avoid table state not normal ( #25038 )
2023-10-05 22:15:54 +08:00
c2b46e4df7
[fix](move-memtable) exclude rpc memory in flush mem-tracker ( #24722 )
2023-10-05 22:10:53 +08:00
db6c16058a
[improve](move-memtable) always share load streams ( #24763 )
2023-10-05 22:09:59 +08:00
d1f4d69032
[regression-test](merge-on-write) Add cases for partial update using insert statement with schema change ( #24902 )
2023-10-05 22:09:22 +08:00
93eedaff62
[opt](function) Use Dict to opt the function of time_round ( #25029 )
...
Before:
select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t | cnt |
+---------------------+--------+
| 1998-04-30 21:00:00 | 324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (3.39 sec)
After:
select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t | cnt |
+---------------------+--------+
| 1998-04-30 21:00:00 | 324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (2.19 sec)
2023-10-04 23:34:24 +08:00
4ce5213b1c
[fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream ( #24954 )
2023-10-03 20:56:24 +08:00
c298b1ca1a
[fix](timezone) fix parse timezone when include GMT or time zone short ids ( #25032 )
2023-10-03 20:53:16 +08:00
6e836fe381
[fix](jdbc catalog) fix jdbc catalog read bitmap data crash ( #25034 )
2023-10-03 20:52:47 +08:00
d77d56b2c6
[fix](regression test) fix ci error ( #25036 )
2023-10-03 20:52:00 +08:00
10f0c63896
[FIX](complex-type) fix agg table with complex type with replace state ( #24873 )
...
fix agg table with complex type with replace state
2023-10-03 16:32:58 +08:00
f8a3034dca
[Opt](performance) refactor and opt time round floor function ( #25026 )
...
refactor and opt time round floor function
2023-10-01 11:51:26 +08:00
113f0684a6
[chore](case) Simplify cold heat separation case ( #25010 )
2023-09-30 15:35:42 +08:00
642e5cdb69
[Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly ( #23395 )
2023-09-29 22:38:52 +08:00