913282b29b
[refactor](column) remove get_data_type in IColumn ( #25242 )
2023-10-10 20:27:15 +08:00
62a6b132be
[Fix](func numbers) Remove backend_nums argument of numbers function ( #25200 )
2023-10-10 20:25:58 +08:00
5f95e97c56
[fix](function) array distance should return null when result is nan ( #25214 )
2023-10-10 04:41:51 -05:00
6ca0f3fa5f
[Bug](writer) Fix ub in async writer ( #25218 )
2023-10-10 16:00:45 +08:00
7434f80300
[pipelineX](refactor) Refactor pending finish dependency ( #25181 )
2023-10-10 11:56:02 +08:00
880d0d7e70
[Bug](pipeline) Support the auto partition in pipeline load ( #25176 )
2023-10-10 11:51:12 +08:00
f5b826b66d
[fix](mark join) mark join column should be nullable ( #24910 )
2023-10-10 10:10:36 +08:00
b8621364d2
[FIX](serde)fix scale with decimalv2 in mysql writer which get real scale #25190
2023-10-10 09:09:57 +08:00
b58010c48e
[fix](export) BufferWritable must be committed before deconstruct ( #25185 )
...
F20231009 16:03:47.659968 3342535 string_buffer.hpp:48] Check failed: _now_offset == 0
*** Check failure stack trace: ***
@ 0x561a6f8e21e6 google::LogMessage::SendToLog()
@ 0x561a6f8de7b0 google::LogMessage::Flush()
@ 0x561a6f8e2a29 google::LogMessageFatal::~LogMessageFatal()
@ 0x561a4a409233 doris::vectorized::BufferWritable::~BufferWritable()
@ 0x561a6e202853 doris::vectorized::VCSVTransformer::write()
@ 0x561a6e1f19ba doris::vectorized::VFileResultWriter::_write_file()
@ 0x561a6e1f1522 doris::vectorized::VFileResultWriter::append_block()
@ 0x561a6e121bed
The error occurs in DEBUG mode, and the export then produces invalid data.
This is already covered by a Baidu test case.
2023-10-09 22:39:45 +08:00
53b46b7e6c
[FIX](filter) update for filter_by_select logic ( #25007 )
...
This PR updates the filter_by_select logic and changes the delete restriction:
Only scalar types are supported in the WHERE condition of a DELETE statement.
Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into a predicate column, yet the filter logic would still have to run on them.
2023-10-09 21:27:40 +08:00
4de3df6a46
[refactor](column) remove unused method and column definitions ( #25152 )
...
remove unused method and column definitions
using primitive type in predicate column to check datev1 and datev2
2023-10-09 17:14:35 +08:00
d7b6fe57df
[Bug](java-udf) fix java-udf memory leak ( #25151 )
2023-10-09 15:10:56 +08:00
451e299151
[Opt](performance) Optimize timeround with minute / second ( #25073 )
2023-10-08 23:14:23 +08:00
5c020be4d2
[Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine ( #25087 )
2023-10-08 22:50:12 +08:00
7e9ffad933
[fix](ES catalog)Doris cannot parse ES date field without time zone ( #24864 )
...
1. Add support for Doris to parse ES date field without time zone info. eg: `2023-04-17T23:01:18.151`, this time will be treated as UTC time, since ES assumes that the time zone for time fields without time zones is UTC.
2. Change local time zone conversion from the system local time zone to the session variable time zone.
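The two changes above can be illustrated with a minimal sketch, assuming plain `java.time`. The class `EsDateSketch`, the method `toSessionTime`, and the zone `Asia/Shanghai` are hypothetical names chosen for illustration; they are not taken from the Doris code.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not Doris code: illustrates the conversion rule only.
final class EsDateSketch {
    // Interpret a zone-less ES timestamp as UTC, then express it in the session time zone.
    static LocalDateTime toSessionTime(String esDate, ZoneId sessionZone) {
        LocalDateTime utc = LocalDateTime.parse(esDate); // ISO-8601 local date-time, no offset
        return utc.atZone(ZoneOffset.UTC)
                  .withZoneSameInstant(sessionZone)
                  .toLocalDateTime();
    }

    public static void main(String[] args) {
        // Assumed session time zone of UTC+8; prints 2023-04-18T07:01:18.151
        System.out.println(toSessionTime("2023-04-17T23:01:18.151", ZoneId.of("Asia/Shanghai")));
    }
}
```

The point of the sketch is that the result depends on the session zone passed in, not on the zone of the machine running the code.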
2023-10-08 19:28:08 +08:00
b91335dbb8
[refactor](columndecimal) is_decimal_v2 member is useless because column decimal could detect by itself ( #25110 )
...
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-08 18:09:19 +08:00
c3d9f42a3e
[fix](scanner) fix load cannot end when set exec_mem_limit ( #25090 )
2023-10-08 17:07:30 +08:00
6fe060b79e
[fix](streamload) fix http_stream retry mechanism ( #24978 )
...
If a failure occurs, Doris may retry. Because ctx->is_read_schema is a global variable that is not reset in time, the retry may cause exceptions.
---------
Co-authored-by: yiguolei <676222867@qq.com>
2023-10-08 11:16:21 +08:00
feb1cbe9ed
[bug](partition_sort)partition sort need sort all data in two phase global ( #24960 )
...
PR #24886 marked the phase in FE; this PR adds the corresponding changes in BE.
Partition sort needs to sort all data in the two-phase global phase.
2023-10-08 10:46:43 +08:00
4e8cde127c
[Enhance](catalog)add table cache in paimon jni ( #25014 )
...
- fix reading the old schema after refreshing a paimon table
- add table cache in paimon jni
2023-10-08 10:36:18 +08:00
0df32c8e3e
[Fix](Outfile) Use data_type_serde to export data to csv file format ( #24721 )
...
Modify the outfile logic, use the data type serde framework.
2023-10-07 22:50:44 +08:00
7b2ff38401
query cpu hard limit based on doris scheduler ( #24844 )
2023-10-07 12:03:07 +08:00
0631ed61b0
[feature](profilev2) Preliminary support for profilev2. ( #24881 )
...
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
set profile_level = 1;
such as
sql
select count(*) from customer join item on c_customer_sk = i_item_sk
profile
Simple profile
PLAN FRAGMENT 0
OUTPUT EXPRS:
count(*)
PARTITION: UNPARTITIONED
VRESULT SINK
MYSQL_PROTOCAL
7:VAGGREGATE (merge finalize)
| output: count(partial_count(*))[#44 ]
| group by:
| cardinality=1
| TotalTime: avg 725.608us, max 725.608us, min 725.608us
| RowsReturned: 1
|
6:VEXCHANGE
offset: 0
TotalTime: avg 52.411us, max 52.411us, min 52.411us
RowsReturned: 8
PLAN FRAGMENT 1
PARTITION: HASH_PARTITIONED: c_customer_sk
STREAM DATA SINK
EXCHANGE ID: 06
UNPARTITIONED
TotalTime: avg 106.263us, max 118.38us, min 81.403us
BlocksSent: 8
5:VAGGREGATE (update serialize)
| output: partial_count(*)[#43 ]
| group by:
| cardinality=1
| TotalTime: avg 679.296us, max 739.395us, min 554.904us
| BuildTime: avg 33.198us, max 48.387us, min 28.880us
| ExecTime: avg 27.633us, max 40.278us, min 24.537us
| RowsReturned: 8
|
4:VHASH JOIN
| join op: INNER JOIN(PARTITIONED)[]
| equal join conjunct: c_customer_sk = i_item_sk
| runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576)
| cardinality=17,740
| vec output tuple id: 3
| vIntermediate tuple ids: 2
| hash output slot ids: 22
| RowsReturned: 18.0K (18000)
| ProbeRows: 18.0K (18000)
| ProbeTime: avg 862.308us, max 1.576ms, min 666.28us
| BuildRows: 18.0K (18000)
| BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms
|
|----1:VEXCHANGE
| offset: 0
| TotalTime: avg 48.822us, max 67.459us, min 30.380us
| RowsReturned: 18.0K (18000)
|
3:VEXCHANGE
offset: 0
TotalTime: avg 33.162us, max 39.480us, min 28.854us
RowsReturned: 18.0K (18000)
PLAN FRAGMENT 2
PARTITION: HASH_PARTITIONED: c_customer_id
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: c_customer_sk
TotalTime: avg 753.954us, max 1.210ms, min 499.470us
BlocksSent: 64
2:VOlapScanNode
TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON
runtime filters: RF000[bloom] -> c_customer_sk
partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ...
cardinality=100000, avgRowSize=0.0, numNodes=1
pushAggOp=NONE
TotalTime: avg 18.417us, max 41.319us, min 10.189us
RowsReturned: 18.0K (18000)
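As a rough illustration of how the `avg` / `max` / `min` triples in the profile above could be produced, here is a minimal sketch that merges per-instance counter values at a given level. Everything in it (`CounterMergeSketch`, `Sample`, `Merged`, `merge`) is a hypothetical model and not the Doris profile API; the real merge logic may differ.

```java
import java.util.List;

// Hypothetical model, not the Doris profile API.
final class CounterMergeSketch {
    // One counter value reported by one fragment instance.
    static final class Sample {
        final int level;
        final long nanos;
        Sample(int level, long nanos) { this.level = level; this.nanos = nanos; }
    }

    // Merged view across instances, matching the "avg / max / min" layout.
    static final class Merged {
        final double avg;
        final long max;
        final long min;
        Merged(double avg, long max, long min) { this.avg = avg; this.max = max; this.min = min; }
    }

    // Merge only the samples registered at the requested level.
    static Merged merge(List<Sample> samples, int level) {
        long sum = 0;
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        int count = 0;
        for (Sample s : samples) {
            if (s.level != level) continue; // counters at other levels are left out
            sum += s.nanos;
            max = Math.max(max, s.nanos);
            min = Math.min(min, s.nanos);
            count++;
        }
        if (count == 0) throw new IllegalArgumentException("no samples at level " + level);
        return new Merged((double) sum / count, max, min);
    }
}
```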
---------
Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
a9d12f7b82
[Debug](float) Add clang debug tune float accuracy ( #25041 )
2023-10-07 09:34:50 +08:00
c2b46e4df7
[fix](move-memtable) exclude rpc memory in flush mem-tracker ( #24722 )
2023-10-05 22:10:53 +08:00
db6c16058a
[improve](move-memtable) always share load streams ( #24763 )
2023-10-05 22:09:59 +08:00
93eedaff62
[opt](function) Use Dict to opt the function of time_round ( #25029 )
...
Before:
select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 |    324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (3.39 sec)
After:
select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 |    324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (2.19 sec)
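For readers unfamiliar with the function being optimized, the bucketing performed by `hour_floor(ts, 7)` can be sketched as follows. This shows only the flooring semantics, not the dictionary-based optimization in this PR. The class `HourFloorSketch` is hypothetical, and the origin of 1970-01-01 00:00:00 is an assumption chosen because it reproduces the bucket boundaries in the output above.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical sketch of the flooring semantics, not the Doris implementation.
final class HourFloorSketch {
    // Assumed bucket origin; it matches the bucket boundaries in the sample output.
    static final LocalDateTime ORIGIN = LocalDateTime.of(1970, 1, 1, 0, 0);

    // Round ts down to the start of its period-hour bucket, counted from ORIGIN.
    static LocalDateTime hourFloor(LocalDateTime ts, int period) {
        // Duration.getSeconds() is already floored, so this also works before ORIGIN.
        long hours = Math.floorDiv(Duration.between(ORIGIN, ts).getSeconds(), 3600L);
        long bucketStart = Math.floorDiv(hours, (long) period) * period;
        return ORIGIN.plusHours(bucketStart);
    }

    public static void main(String[] args) {
        // Prints 1998-05-01T04:00
        System.out.println(hourFloor(LocalDateTime.of(1998, 5, 1, 5, 30), 7));
    }
}
```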
2023-10-04 23:34:24 +08:00
10f0c63896
[FIX](complex-type) fix agg table with complex type with replace state ( #24873 )
...
fix agg table with complex type with replace state
2023-10-03 16:32:58 +08:00
f8a3034dca
[Opt](performance) refactor and opt time round floor function ( #25026 )
...
refactor and opt time round floor function
2023-10-01 11:51:26 +08:00
642e5cdb69
[Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly ( #23395 )
2023-09-29 22:38:52 +08:00
864a0f9bcb
[opt](pipeline) Make pipeline fragment context send_report asynchronized ( #23142 )
2023-09-28 17:55:53 +08:00
430634367a
[pipelineX](node)support file scan operator ( #24924 )
2023-09-27 22:10:43 +08:00
68087f6c82
[fix](json function) Fix the slow performance of get_json_path when processing JSONB ( #24631 )
...
When processing JSONB, automatically convert to jsonb_extract_string
2023-09-27 21:17:39 +08:00
947b116318
[pipelineX](fix) Fix BE crash due to ES scan operator ( #24983 )
2023-09-27 20:45:38 +08:00
5d138b6928
[remove](function) make execute_impl const and remove running_difference function ( #24935 )
2023-09-27 18:17:28 +08:00
5fc04b6aeb
[Improvement](hash) some refactor of process hash table probe impl ( #24461 )
...
some refactor of process hash table probe impl
2023-09-27 16:14:49 +08:00
aa4dbbedc7
[pipelineX](bug) Fix dead lock in exchange sink operator ( #24947 )
2023-09-27 15:40:25 +08:00
26818de9c8
[feature](jni) support complex types in jni framework ( #24810 )
...
Support complex types in jni framework, and successfully run end-to-end on hudi.
### How to Use
Other scanners only need to implement three methods in `ColumnValue`:
```
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);
// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);
// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
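As a rough idea of what such an implementation might look like, here is a minimal sketch backed by plain Java collections. The `ColumnValue` interface is redeclared with only the three methods quoted above so that the example compiles on its own (the real interface has more members), and `SimpleColumnValue` is a hypothetical class, not `HudiColumnValue`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Only the three methods quoted above, redeclared so the sketch is self-contained.
interface ColumnValue {
    void unpackArray(List<ColumnValue> values);
    void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);
    void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
}

// Hypothetical implementation that wraps a plain Java object.
final class SimpleColumnValue implements ColumnValue {
    final Object raw;

    SimpleColumnValue(Object raw) { this.raw = raw; }

    @Override
    public void unpackArray(List<ColumnValue> values) {
        // Assumes raw is a List holding the array elements.
        for (Object element : (List<?>) raw) {
            values.add(new SimpleColumnValue(element));
        }
    }

    @Override
    public void unpackMap(List<ColumnValue> keys, List<ColumnValue> values) {
        // Assumes raw is a Map; keys and values are appended in entry order.
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
            keys.add(new SimpleColumnValue(entry.getKey()));
            values.add(new SimpleColumnValue(entry.getValue()));
        }
    }

    @Override
    public void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values) {
        // Assumes raw is a List of struct fields; only the requested indexes are appended.
        List<?> fields = (List<?>) raw;
        for (int index : structFieldIndex) {
            values.add(new SimpleColumnValue(fields.get(index)));
        }
    }

    public static void main(String[] args) {
        List<ColumnValue> out = new ArrayList<>();
        new SimpleColumnValue(Arrays.asList("id", "name", "age")).unpackStruct(Arrays.asList(2, 0), out);
        System.out.println(out.size()); // two fields were requested
    }
}
```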
2023-09-27 14:47:41 +08:00
c04e5bac39
[bug](pipelineX) fix java-udaf failed with open pipelineX ( #24939 )
2023-09-27 13:14:10 +08:00
24ee3607e1
[Bug](pipeline) nullprt may be close the sink if init failed ( #24926 )
2023-09-27 09:11:06 +08:00
55d1090137
[feature](insert) Support group commit stream load ( #24304 )
2023-09-26 20:57:02 +08:00
28869b0f82
[fix](Outfile) Use data_type_serde to export data to orc file format ( #24812 )
2023-09-26 19:46:42 +08:00
94082ae59c
[Fix](inverted index) fix tokenize function coredump ( #24896 )
2023-09-26 17:31:10 +08:00
082bcd820b
[feature](insert) Support wal for group commit insert ( #23053 )
2023-09-26 14:46:24 +08:00
a3427cb822
[pipelineX](fix) Fix nested loop join operator ( #24885 )
2023-09-26 13:27:34 +08:00
513e37bdbf
[pipelineX](node)support jdbc scan operator ( #24851 )
2023-09-26 10:02:51 +08:00
8191cd1dad
[Bug](ScanNode) Fix potential incorrect query result caused by concurrent NewOlapScanNode initialization and Compaction ( #24638 )
...
* Optimize fetch delete predicates
* Fix incorrect query result when compaction eliminates delete predicates between `NewOlapScanNode::_init_scanners` and `NewOlapScanner::init`
* Fix BE UT
2023-09-25 22:24:35 +08:00
b38b8b4494
[pipelineX](fix) Fix BE crash caused by join and constant expr ( #24862 )
2023-09-25 21:01:09 +08:00
3b4d8b4ac8
[pipelineX](feature) Support schema scan operator ( #24850 )
2023-09-25 14:42:25 +08:00
9412775686
remove useless variable in scanctx ( #24849 )
...
remove useless variable in scanctx
2023-09-25 14:36:18 +08:00