d988193d39
[pipelineX](shuffle) block exchange sink by memory usage ( #26595 )
2023-11-09 21:28:22 +08:00
5d80e7dc2f
[Improvement](pipelineX) Improve local exchange on pipelineX engine ( #26464 )
2023-11-07 22:11:44 +08:00
99b45e1938
[fix](Outfile) Export DateTimev2 type of doris to ORC's TimeStamp type ( #25470 )
...
Previously, Doris's `DateTimev2` was exported to ORC as a `String` type.
Now, Doris's `DateTimev2` is exported to ORC's `TimeStamp` type.
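The mapping can be sketched as splitting a microseconds-since-epoch value into ORC's seconds-plus-nanoseconds pair. This is an illustrative sketch with hypothetical names, not the Doris or ORC API:

```cpp
#include <cstdint>

// Illustrative sketch (hypothetical names, not the Doris or ORC API): an ORC
// timestamp stores seconds since epoch plus a nanosecond component, so a
// DateTimeV2 value held as microseconds since epoch splits into the two
// fields losslessly, unlike the old String export.
struct OrcTimestamp {
    int64_t seconds;
    int32_t nanos;  // always kept in [0, 1e9)
};

OrcTimestamp to_orc_timestamp(int64_t epoch_micros) {
    int64_t secs = epoch_micros / 1000000;
    int64_t micros = epoch_micros % 1000000;
    if (micros < 0) {  // keep nanos non-negative for pre-epoch values
        micros += 1000000;
        --secs;
    }
    return {secs, static_cast<int32_t>(micros * 1000)};
}
```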
2023-10-29 15:59:38 +08:00
c1d64a7128
[Feature](datatype) Add IPv4/v6 data type for doris ( #24965 )
2023-10-26 17:33:28 +08:00
d6c64d305f
[chore](log) Add log to trace query execution #25739
2023-10-26 14:09:25 +08:00
31d2a9a4f5
[Enhancement](function) support fractions for convert_tz(datetimev2) ( #25915 )
...
mysql> select convert_tz('2019-08-01 01:01:02.123' , '+00:00', '+07:00');
+----------------------------------------------------------------------------------+
| convert_tz(cast('2019-08-01 01:01:02.123' as DATETIMEV2(3)), '+00:00', '+07:00') |
+----------------------------------------------------------------------------------+
| 2019-08-01 08:01:02.123                                                          |
+----------------------------------------------------------------------------------+
1 row in set (0.18 sec)
2023-10-26 10:46:47 +08:00
7f66be84d5
[fix](Outfile) Infer the column name if the column is expression in select into outfile ( #25854 )
...
This PR does two things:
1. Infer the column name when the column is an expression in `select into outfile`. The rule for column name generation is described in PR #24990.
2. Fix a bug where the BE core dumps if `_schema` fails to build in the open phase in vorc_transformer.cpp.
TODO:
1. Support inferring the column name when the column is an expression in `select into outfile` in the new optimizer (Nereids).
2023-10-25 22:49:04 +08:00
e8f479882d
[pipelineX](local exchange) Add local exchange operator ( #25846 )
2023-10-25 18:45:02 +08:00
6b2eed779c
[feature](AuditLog) add scanRows scanBytes in auditlog ( #25435 )
2023-10-25 10:00:35 +08:00
08832d9f3a
[Fix](exec) Fix date dict dead loop. ( #25570 )
2023-10-24 02:51:43 +08:00
642c149e6a
remove datetime_value and move vecdatetime_value to doris namespace ( #25695 )
...
remove datetime_value and move vecdatetime_value to doris namespace
2023-10-20 22:08:17 +08:00
dc47087560
[fix](function) fix str_to_date default return type scale for nereids ( #24932 )
...
fix str_to_date default return type scale for nereids
2023-10-20 12:55:49 +08:00
b964ab76b3
[refactor](shuffle) Simplify hash partitioning strategy ( #25596 )
2023-10-19 19:28:22 +08:00
e77b98be88
[fix](months_diff) fix wrong result of months_diff ( #25577 )
2023-10-19 14:29:47 +08:00
08f305dd79
[chore](build) Fix compilation errors reported by GCC-13 ( #25439 )
...
1. Fix lots of compilation errors reported by GCC-13.
2. Fix the workflow BE UT (macOS).
2023-10-15 07:57:36 -05:00
37dbda6209
[pipelineX](refactor) Use class template to simplify join ( #25369 )
2023-10-13 16:51:55 +08:00
6f9a084d99
[Fix](Outfile) Use data_type_serde to export data to parquet file format ( #24998 )
2023-10-13 13:58:34 +08:00
f960b8c989
[bugfix](stream receiver) BE will core during stop because the receiver is not closed ( #25298 )
...
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-11 19:49:40 +08:00
b58010c48e
[fix](export) BufferWritable must be committed before deconstruct ( #25185 )
...
F20231009 16:03:47.659968 3342535 string_buffer.hpp:48] Check failed: _now_offset == 0
*** Check failure stack trace: ***
@ 0x561a6f8e21e6 google::LogMessage::SendToLog()
@ 0x561a6f8de7b0 google::LogMessage::Flush()
@ 0x561a6f8e2a29 google::LogMessageFatal::~LogMessageFatal()
@ 0x561a4a409233 doris::vectorized::BufferWritable::~BufferWritable()
@ 0x561a6e202853 doris::vectorized::VCSVTransformer::write()
@ 0x561a6e1f19ba doris::vectorized::VFileResultWriter::_write_file()
@ 0x561a6e1f1522 doris::vectorized::VFileResultWriter::append_block()
@ 0x561a6e121bed
The error occurs in DEBUG mode, and the export produces invalid data.
It is covered by an existing Baidu test case.
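The invariant behind the crash can be sketched with a minimal stand-in class (hypothetical, not the Doris code): a buffered writer tracks uncommitted bytes, and its destructor checks that everything was committed, mirroring the `_now_offset == 0` check in string_buffer.hpp.

```cpp
#include <cassert>
#include <string>

// Minimal sketch (hypothetical class, not the Doris code): destroying the
// writer with pending bytes fires the check in DEBUG mode and silently
// loses data otherwise.
class BufferWritable {
public:
    void write(const std::string& data) { _pending += data; }
    void commit() {  // flush pending bytes to the committed output
        _committed += _pending;
        _pending.clear();
    }
    const std::string& committed() const { return _committed; }
    ~BufferWritable() {
        // plays the role of: Check failed: _now_offset == 0
        assert(_pending.empty() && "commit() must run before destruction");
    }
private:
    std::string _pending;    // stands in for the _now_offset byte count
    std::string _committed;
};

// The fix makes the export path commit after each write.
std::string write_row(const std::string& row) {
    BufferWritable w;
    w.write(row);
    w.commit();  // without this, ~BufferWritable() would fire the check
    return w.committed();
}
```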
2023-10-09 22:39:45 +08:00
0df32c8e3e
[Fix](Outfile) Use data_type_serde to export data to csv file format ( #24721 )
...
Modify the outfile logic to use the data type serde framework.
2023-10-07 22:50:44 +08:00
93eedaff62
[opt](function) Use Dict to opt the function of time_round ( #25029 )
...
Before:
select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 | 324    |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (3.39 sec)
After:
select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 | 324    |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (2.19 sec)
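The semantics of `hour_floor` can be sketched as flooring the epoch value to an n-hour bucket; the commit's speedup comes from serving these boundaries from a precomputed dictionary rather than converting each row through a calendar representation. This is an illustrative arithmetic sketch, not the Doris implementation:

```cpp
#include <cstdint>

// Illustrative sketch (not the Doris code): round a Unix timestamp down to
// an n-hour boundary measured from the epoch.
int64_t hour_floor_epoch(int64_t epoch_secs, int64_t n_hours) {
    const int64_t bucket = n_hours * 3600;
    int64_t q = epoch_secs / bucket;
    // C++ integer division truncates toward zero; adjust so pre-1970
    // values still round down rather than up.
    if (epoch_secs % bucket != 0 && epoch_secs < 0) --q;
    return q * bucket;
}
```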
2023-10-04 23:34:24 +08:00
f8a3034dca
[Opt](performance) refactor and opt time round floor function ( #25026 )
...
refactor and opt time round floor function
2023-10-01 11:51:26 +08:00
642e5cdb69
[Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly ( #23395 )
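The pattern can be sketched as follows (simplified, not the Doris Status class): marking the type `[[nodiscard]]` makes the compiler warn at every call site that drops a returned Status, which is how unhandled-status call sites get surfaced.

```cpp
#include <string>

// Simplified sketch (not the Doris Status class): with [[nodiscard]] on the
// type, ignoring any function's returned Status produces a compiler warning.
struct [[nodiscard]] Status {
    bool _ok = true;
    std::string _msg;
    static Status OK() { return {}; }
    static Status Error(std::string msg) { return {false, std::move(msg)}; }
    bool ok() const { return _ok; }
};

Status do_work(bool fail) {
    return fail ? Status::Error("boom") : Status::OK();
}
// do_work(true);            // would now warn: ignoring return value
// Status s = do_work(true); // callers must capture and check instead
```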
2023-09-29 22:38:52 +08:00
864a0f9bcb
[opt](pipeline) Make pipeline fragment context send_report asynchronized ( #23142 )
2023-09-28 17:55:53 +08:00
aa4dbbedc7
[pipelineX](bug) Fix dead lock in exchange sink operator ( #24947 )
2023-09-27 15:40:25 +08:00
28869b0f82
[fix](Outfile) Use data_type_serde to export data to orc file format ( #24812 )
2023-09-26 19:46:42 +08:00
a3427cb822
[pipelineX](fix) Fix nested loop join operator ( #24885 )
2023-09-26 13:27:34 +08:00
a48b19ceb6
[feature](Outfile) select into outfile supports to export struct/map/array type data to orc file format ( #24350 )
...
We do not support nested complex types in this PR.
2023-09-21 20:15:18 +08:00
85fb46bb71
[refactor](cache) Refactor preloaded timezone global cache ( #24694 )
...
Refactor preloaded timezone global cache
2023-09-21 17:26:41 +08:00
e59aa49f28
[feature](datetime-func)support milliseconds_add/sub/diff and microseconds_diff ( #24114 )
2023-09-20 10:38:56 +08:00
e54c4ef258
[pipelineX](dependency) refactor write dependency ( #24555 )
2023-09-19 18:01:42 +08:00
d24f3efd4a
[pipelineX](profile) Phase 1: refactor pipelineX detailed profile ( #24322 )
2023-09-15 16:14:05 +08:00
c5e7f55b63
[performance](executor) optimize time_round function ( #23058 )
...
optimize time_round function
2023-09-15 10:49:22 +08:00
4fbb25bc55
[Enhancement](function) Support date_trunc(date) and use it in auto partition ( #24341 )
...
Support date_trunc(date) and use it in auto partition
2023-09-14 16:53:09 +08:00
0896aefce3
[fix](local exchange) fix bug of accessing released counter of local data stream receiver ( #24148 )
2023-09-11 09:52:31 +08:00
e140938d81
[Performance][export] Opt the export of CSV transformer ( #24003 )
2023-09-08 20:26:54 +08:00
1a8913f8f4
[fix](shared hash table) fix p0 test failure ( #23907 )
2023-09-05 14:48:46 +08:00
a36c387a2b
[Refactor](transformer) convert file format writer to transformer ( #23888 )
2023-09-05 10:50:10 +08:00
5853ed385e
[pipelineX](join) Support shared hash table ( #23876 )
2023-09-05 10:14:40 +08:00
eaf2a6a80e
[fix](date) return right date value even if out of the range of date dictionary( #23664 )
...
PR (https://github.com/apache/doris/pull/22360) and PR (https://github.com/apache/doris/pull/22384) optimized the performance of the date type. However, Hive supports dates outside 1970~2038, leading to wrong date values in the TPC-DS benchmark.
How to fix:
1. Increase the dictionary range to 1900~2038.
2. Dates outside 1900~2038 are regenerated.
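The shape of the fix can be sketched as a bounded dictionary plus a compute fallback (illustrative C++, not the actual PR's code; day-of-week stands in for whatever per-date property the dictionary caches):

```cpp
// Sketch (not the Doris code): a dictionary precomputed for 1900..2038, with
// a direct computation fallback for dates outside that range, so values that
// Hive can produce still come back correct instead of wrong.
constexpr int kMinYear = 1900, kMaxYear = 2038;

// Zeller-style direct computation, valid for any Gregorian date; 0 = Saturday.
int day_of_week(int y, int m, int d) {
    if (m < 3) { m += 12; --y; }
    int k = y % 100, j = y / 100;
    return ((d + 13 * (m + 1) / 5 + k + k / 4 + j / 4 + 5 * j) % 7 + 7) % 7;
}

int day_of_week_cached(int y, int m, int d) {
    static int dict[kMaxYear - kMinYear + 1][13][32];
    static bool init = [] {  // fill the dictionary once
        for (int yy = kMinYear; yy <= kMaxYear; ++yy)
            for (int mm = 1; mm <= 12; ++mm)
                for (int dd = 1; dd <= 31; ++dd)
                    dict[yy - kMinYear][mm][dd] = day_of_week(yy, mm, dd);
        return true;
    }();
    (void)init;
    if (y < kMinYear || y > kMaxYear) return day_of_week(y, m, d);  // fallback
    return dict[y - kMinYear][m][d];
}
```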
2023-09-01 14:40:20 +08:00
c74ca15753
[pipeline](sink) Support Async Writer Sink of result file sink and memory scratch sink ( #23589 )
2023-08-31 22:44:25 +08:00
62c075bf7e
[improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor ( #23672 )
2023-08-31 14:44:17 +08:00
25b8831afd
[fix](Outfile) fix core dump when export data to orc file format using outfile ( #23586 )
...
* fix
* add test
2023-08-30 19:01:44 +08:00
7913354f78
add column number check for vsorted_run_merger ( #23584 )
2023-08-29 10:41:59 +08:00
f32efe5758
[Fix](Outfile) Fix that it does not report error when export table to S3 with an incorrect ak/sk/bucket ( #23441 )
...
Problem:
It returns a result even though a wrong ak/sk/bucket name is used, such as:
```sql
mysql> select * from demo.student
-> into outfile "s3://xxxx/exp_"
-> format as csv
-> properties(
-> "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com ",
-> "s3.region" = "ap-beijing",
-> "s3.access_key"= "xxx",
-> "s3.secret_key" = "yyyy"
-> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| 1          | 3         | 26       | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_                                                   |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```
The reason for this is that the error returned in the `close()` phase was not caught.
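The bug and fix can be sketched with hypothetical types (not the Doris API): with S3, credential and bucket errors often surface only when the upload is finalized in `close()`, so dropping the `close()` status makes a failed export look successful.

```cpp
#include <string>

// Hypothetical types for illustration, not the Doris API.
struct Status {
    bool _ok;
    std::string _msg;
    static Status OK() { return {true, ""}; }
    static Status IOError(std::string m) { return {false, std::move(m)}; }
    bool ok() const { return _ok; }
};

struct S3FileWriter {
    bool bad_credentials;
    // Writes are buffered locally, so no network error surfaces here.
    Status write(const std::string&) { return Status::OK(); }
    // The credential error only shows up when the upload is finalized.
    Status close() {
        return bad_credentials ? Status::IOError("InvalidAccessKeyId")
                               : Status::OK();
    }
};

Status finish_export(bool bad_credentials) {
    S3FileWriter w{bad_credentials};
    Status s = w.write("row");
    if (!s.ok()) return s;
    return w.close();  // before the fix, this status was dropped
}
```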
2023-08-26 00:19:30 +08:00
f66f161017
[fix](multi-catalog)fix hive table with cosn location issue ( #23409 )
...
Sometimes, the partitions of a Hive table may be on different storages, e.g., some on HDFS and others on object storage (COS, etc.).
This PR mainly changes:
1. Fix the bug of accessing files via cosn.
2. Add a new field `fs_name` in TFileRangeDesc.
This is because, when accessing a file, the BE gets an HDFS client from the HDFS client cache, and different files in one query
request may have different fs names, e.g., some are `hdfs://` and some are `cosn://`, so we need to specify the fs name
for each file; otherwise, it may return an error:
`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`
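The idea can be sketched as a client cache keyed by the per-file fs name (hypothetical names, not the Doris code): files from `hdfs://` and `cosn://` in one query each get a client for the right filesystem instead of reusing whichever client was cached first.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch, not the Doris code.
struct FsClient {
    std::string fs_name;
};

// Cache keyed by the fs name carried per file (as in TFileRangeDesc), so
// mixed hdfs:// and cosn:// locations each resolve to the right client.
// (Simplification: a real cache would also need locking for thread safety.)
std::shared_ptr<FsClient> get_client(const std::string& fs_name) {
    static std::map<std::string, std::shared_ptr<FsClient>> cache;
    auto it = cache.find(fs_name);
    if (it != cache.end()) return it->second;  // reuse client for this fs
    auto client = std::make_shared<FsClient>(FsClient{fs_name});
    cache.emplace(fs_name, client);
    return client;
}
```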
2023-08-26 00:16:00 +08:00
9cacf9535a
[Opt](functions) Use preloaded cache to accelerate timezone parsing ( #22694 )
...
* opt
* bugfix
* fix ut
* fix stylecheck
2023-08-25 10:00:48 +08:00
0838ff4bf4
[fix](Outfile) fix bug that the fileSize is not correct when outfile is completed ( #22951 )
2023-08-18 22:31:44 +08:00
e289e03a1a
[fix](executor) fix missing return with old type in time_round
2023-08-17 15:34:26 +08:00
d5df3bae25
[Bug](exchange) fix DCHECK failure when VDataStreamRecvr receives an empty block ( #22992 )
...
fix DCHECK failure when VDataStreamRecvr receives an empty block
2023-08-16 10:21:19 +08:00