Commit Graph

302 Commits

Author SHA1 Message Date
8dbd73988a [fix](recvr) catch exception of transmit_block (#39882)
BP #39881
2024-08-25 00:25:20 +08:00
460605ae3c [branch-2.1] pick some prs (#39860)
## Proposed changes

https://github.com/apache/doris/pull/38385 optimize parsing datetime
https://github.com/apache/doris/pull/38978 make stream load failure message clearer and disable some errors' stacktraces by default
https://github.com/apache/doris/pull/39255 fix random function coredump
https://github.com/apache/doris/pull/39324 fix function corr inconsistency with doc
https://github.com/apache/doris/pull/39449 check auto partition nullity when creating partition
https://github.com/apache/doris/pull/39695 make DynamicPartitionScheduler immediately know interval's change
https://github.com/apache/doris/pull/39754 Add some partition expr check on creating table
2024-08-24 17:26:42 +08:00
04e993c1de [refine](pipeline) refine some VDataStreamRecvr code (#35063) (#37802)
## Proposed changes
https://github.com/apache/doris/pull/35063
https://github.com/apache/doris/pull/35428
2024-08-22 19:55:17 +08:00
8ce8887b75 [branch-2.1](memory) Refactor refresh workload groups weighted memory ratio and record refresh interval memory growth (#39760)
pick #38168
Overwrites the changes from #37221 in workload_group_manager.cpp. If #37221
needs to be picked, ignore it.
2024-08-22 17:33:11 +08:00
017dad8c54 [fix](type)support runtime predicate for time type (#38258) (#38465)
## Proposed changes
https://github.com/apache/doris/pull/38258
2024-07-31 10:27:36 +08:00
a751372e76 [Feature](multi-catalog) Add memory tracker for orc reader/writer and arrow parquet writer. (#37257)
## Proposed changes

backport #37234
2024-07-25 13:51:59 +08:00
7819c75e55 [fix](shuffle) Fix local exchange dependency blocking (#38160)
## Proposed changes

pick #38151

2024-07-20 00:19:47 +08:00
4b31e52b24 [enhancement](runtimefilter) fix potential core in runtime filter sync filter size (#38058) (#38093)
pick #38058

## Proposed changes
IRuntimeFilter may be destroyed before the RPC finishes, so the closure
cannot hold a raw pointer to it. It has to use the context's shared
pointer instead.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-07-18 23:11:26 +08:00
88d771d360 [pipeline](fix) Avoid to use a freed dependency when cancelled (#34584) (#38046)
## Proposed changes

pick #34584
2024-07-18 15:27:10 +08:00
0aeb768bf9 [Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167)
bp: #36490
2024-07-03 10:53:57 +08:00
72c20d3ccc [branch-2.1](function) fix date_format and from_unixtime core when meet long format string (#35883) (#36158)
pick #35883
2024-07-01 20:35:31 +08:00
Pxl
cb80ae906f [Bug](runtime-filter) disable sync filter when pipeline engine is off (#36994)
## Proposed changes
1. disable sync filter when pipeline engine is off
2. reduce some warning logs
2024-06-28 16:59:26 +08:00
c84b56140c [Fix](outfile) Add a configuration for exporting data in Parquet format using select into outfile (#36143)
backport: #36142
2024-06-13 11:49:46 +08:00
1715bae26f [opt](parquet-writer) Specify the row group size when writing data to Parquet files. (#35081) (#36042)
bp #35081

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
2024-06-07 17:57:11 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
shortcut for str to date parse
2024-05-25 17:47:20 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
a8c24d7698 [Fix](function) fix overflow of date_add function (#35080)
fix overflow of date_add function
2024-05-22 10:02:59 +08:00
b96148c9cd [Fix](function) fix days/weeks_diff result wrong on BE #35104
select days_diff('2024-01-01 00:00:00', '2023-12-31 23:59:59');
should return 0, but BE returned 1.
2024-05-22 10:00:26 +08:00
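The days_diff case reported in b96148c9cd can be illustrated with a short sketch. It assumes the intended semantics are "whole elapsed days, truncated toward zero, computed from the full datetime difference". The function names mirror the SQL functions, but the Python below is illustrative only and is not Doris code. `calendar_days_diff` is a hypothetical date-only comparison, included just to show how a result of 1 could arise for the same inputs.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def days_diff(end: datetime, start: datetime) -> int:
    """Whole elapsed days between start and end, truncated toward zero.

    Assumption: this is the intended semantics; it is not Doris source code.
    """
    seconds = int((end - start).total_seconds())
    # int() on the quotient truncates toward zero, so a 1-second gap gives 0
    # and a negative gap of less than a day also gives 0.
    return int(seconds / SECONDS_PER_DAY)


def weeks_diff(end: datetime, start: datetime) -> int:
    """Whole elapsed weeks, derived from the truncated day count."""
    return int(days_diff(end, start) / 7)


def calendar_days_diff(end: datetime, start: datetime) -> int:
    """Hypothetical date-only comparison that ignores the time of day."""
    return (end.date() - start.date()).days


end = datetime(2024, 1, 1, 0, 0, 0)
start = datetime(2023, 12, 31, 23, 59, 59)
print(days_diff(end, start))           # 0
print(calendar_days_diff(end, start))  # 1
```

Only one second separates the two timestamps, so no whole day has elapsed. A comparison that looks only at the calendar dates reports 1.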
c7134faea9 [Fix](outfile) Fix the timing of setting the _is_closed flag in Parquet/ORC writer (#34668) 2024-05-15 10:28:22 +08:00
4dd5379951 [bugfix](hive)fix error for writing to hive for 2.1 (#34518)
mirror #34520
2024-05-14 23:27:29 +08:00
520774a24b [fix](serde) fix ipv4/v6 serde functions for arrow, orc, parquet format (#34042)
This PR is based on @sjyango's work in #32326.
The intent was to merge #32326 into the master branch, but it is still a draft and has not been maintained for a long time, hence this new PR.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
2024-05-10 14:37:04 +08:00
Pxl
804586b342 [Improvement](sort) insert data by batch on VSortedRunMerger::get_next (#34363)
insert data by batch on VSortedRunMerger::get_next
2024-05-10 14:36:53 +08:00
a173513e27 [fix](pipelinex) exchange sink not set ready when source limit #34241 2024-04-29 20:58:50 +08:00
946d28646a [fix](outfile)Fixed orcOutputStream.close() throwing an exception during destruction causing the program to hang. (#34254)
bp #34243
2024-04-28 19:54:34 +08:00
30a68c1240 [fix](spill) use different algorithm to avoid partition data skew (#34162) 2024-04-27 11:20:36 +08:00
60e20a3afe [fix](pipeline_x) Crc32HashPartitioner should use ShuffleChannelIds (#34147) 2024-04-26 15:03:11 +08:00
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This was introduced by #33511, which wrongly used

ColumnStr<T> ();

which violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but is still accepted by clang for now (see llvm/llvm-project#58112).
2024-04-19 23:41:37 +08:00
657a29fd9e [refactor](partitioner) refine get channel id logics (#33765) 2024-04-18 19:05:24 +08:00
4863167f90 [refactor](pipelineX) Reduce prepare overhead (PART I) (#33550) 2024-04-17 23:42:12 +08:00
Pxl
341cb40693 [Chore](log) adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished (#33652)
adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished
2024-04-17 23:42:12 +08:00
48880c3e1a [Fix](timezone) fix miss of expected rounding of Date type with timezone #33553 2024-04-17 23:42:11 +08:00
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
support sync join node build side's size to init bloom runtime filter
2024-04-11 09:31:50 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
59aa923bce [bug](function) fix milliseconds_diff function return wrong result (#32897)
* [bug](function) fix milliseconds_diff function return wrong result
2024-04-10 11:34:30 +08:00
2a0644f442 [Fix](function) Fix unix_timestamp core for string input (#32871) 2024-04-09 12:48:35 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type | Orc Type                       | Parquet Type          |
|------------|--------------------------------|-----------------------|
| Date       | Long (logical: DATE)           | int32 (logical: Date) |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                 |
2024-03-22 08:52:16 +08:00
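The Date row in the d7a3ff1ddf mapping table stores a date as an integer. In the Parquet format, the DATE logical type on an int32 is the number of days since the Unix epoch (1970-01-01). Below is a minimal sketch of that encoding. The helper names `date_to_days` and `days_to_date` are hypothetical, and the code does not use any Parquet or ORC library.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def date_to_days(value: date) -> int:
    """Encode a calendar date as days since 1970-01-01."""
    return (value - UNIX_EPOCH).days


def days_to_date(days: int) -> date:
    """Decode a day count back into a calendar date."""
    return UNIX_EPOCH + timedelta(days=days)


print(date_to_days(date(1970, 1, 2)))  # 1
print(days_to_date(19723))             # 2024-01-01
```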
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
20d6698c27 [bugfix](arm compile) could not compile on arm because -Werror=maybe-uninitialized 2024-03-14 12:11:25 +08:00
0159a75ced [bugfix](becore) be will core when stop because the map is modified during iterator (#32105)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-12 18:50:26 +08:00
4268634115 [fix](memory) Fix Allocator cancel pipelinex query #32048 2024-03-12 14:20:18 +08:00
68a5319da3 [fix](pipelineX) _local_channel_dependency is null in non pipelineX (#32054) 2024-03-12 14:19:04 +08:00
c0f2d0188b [feature](pipelineX) add mem control in local exchange sink (#31982) 2024-03-12 14:17:48 +08:00
808563470f [pipelineX](debug) Refactor code and complete debug string (#31733) 2024-03-06 13:07:49 +08:00
3451cd6c23 [fix](datetime) fix hour 24 on be (#31304) 2024-02-26 19:07:10 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
49dd411f87 [fix](datetime) fix datetime round on BE (#31205)
with tmp as (
    select CONCAT(
        YEAR('2024-02-06 03:37:07.157'), '-',
        LPAD(MONTH('2024-02-06 03:37:07.157'), 2, '0'), '-',
        LPAD(DAY('2024-02-06 03:37:07.157'), 2, '0'), ' ',
        LPAD(HOUR('2024-02-06 03:37:07.157'), 2, '0'), ':',
        LPAD(MINUTE('2024-02-06 03:37:07.157'), 2, '0'), ':',
        LPAD(SECOND('2024-02-06 03:37:07.157'), 2, '0'), '.', "123456789")
    AS generated_string)
select generated_string, cast(generated_string as DateTime(6)) from tmp
before (incorrect rounding):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123456              |
+-------------------------------+-----------------------------------------+
after (rounds up, consistent with MySQL):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123457              |
+-------------------------------+-----------------------------------------+
1 row in set (0.03 sec)
Same work as #30744, but implemented on BE.
2024-02-21 19:18:45 +08:00
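The before/after tables in 49dd411f87 differ only in how the fractional seconds are cut down to six digits. The sketch below assumes the rule is "round half up on the first dropped digit, carrying into the seconds field on overflow". The helper names `round_fraction` and `truncate_fraction` are hypothetical, and this is an illustration, not the BE implementation.

```python
from __future__ import annotations


def round_fraction(digits: str, scale: int = 6) -> tuple[int, int]:
    """Round a string of fractional-second digits to `scale` digits, half up.

    Returns (fraction, carry); carry is 1 when rounding overflows into the
    seconds field (for example .9999995 at scale 6).
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("digits must be a non-empty string of 0-9")
    kept = digits[:scale].ljust(scale, "0")  # pad short inputs with zeros
    fraction = int(kept)
    # Look only at the first dropped digit: 5 or more rounds up.
    if len(digits) > scale and digits[scale] >= "5":
        fraction += 1
    carry, fraction = divmod(fraction, 10 ** scale)
    return fraction, carry


def truncate_fraction(digits: str, scale: int = 6) -> int:
    """Drop the extra digits without rounding."""
    return int(digits[:scale].ljust(scale, "0"))


print(truncate_fraction("123456789"))  # 123456, the "before" result
print(round_fraction("123456789"))     # (123457, 0), the "after" result
print(round_fraction("9999995"))       # (0, 1)
```

The carry value matters for inputs such as `.9999995`, where rounding pushes the fraction past the last representable microsecond and the seconds field has to be incremented.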
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
UTF-8 files on Windows commonly begin with a BOM.

This adds a new user property, `with_bom`, to `Outfile/Export`, so that when exporting Doris data, users can choose whether to write a BOM at the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
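For the `with_bom` property in f65844fae4, it may help to see what a BOM looks like at the byte level: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The `export_csv` helper below is hypothetical and only borrows the `with_bom` name from the usage above; it is a sketch of the idea, not how Doris implements the feature.

```python
import codecs
import csv
import io


def export_csv(rows, with_bom: bool = False) -> bytes:
    """Serialize rows to UTF-8 CSV bytes, optionally prefixed with a BOM."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerows(rows)
    payload = text.getvalue().encode("utf-8")
    return codecs.BOM_UTF8 + payload if with_bom else payload


rows = [["id", "name"], [1, "Alice"]]
print(export_csv(rows, with_bom=True)[:3])   # b'\xef\xbb\xbf'
print(export_csv(rows, with_bom=False)[:3])  # b'id,'
```

A reader that does not expect the BOM sees it as an extra character in the first header name. In Python, decoding with the `utf-8-sig` codec strips it.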