Commit Graph

292 Commits

Author SHA1 Message Date
72c20d3ccc [branch-2.1](function) fix date_format and from_unixtime core when meet long format string (#35883) (#36158)
pick #35883
2024-07-01 20:35:31 +08:00
Pxl
cb80ae906f [Bug](runtime-filter) disable sync filter when pipeline engine is off (#36994)
## Proposed changes
1. disable sync filter when pipeline engine is off
2. reduce some warning log
2024-06-28 16:59:26 +08:00
c84b56140c [Fix](outfile) Add a configuration for exporting data in Parquet format using select into outfile (#36143)
backport: #36142
2024-06-13 11:49:46 +08:00
1715bae26f [opt](parquet-writer) Specify the row group size when writing data to Parquet files. (#35081) (#36042)
bp #35081

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
2024-06-07 17:57:11 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
shortcut for str to date parse
2024-05-25 17:47:20 +08:00
a6f7747d29 [feature](datatype) add BE config to allow zero date (#34961)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-05-23 19:12:39 +08:00
a8c24d7698 [Fix](function) fix overflow of date_add function (#35080)
fix overflow of date_add function
2024-05-22 10:02:59 +08:00
b96148c9cd [Fix](function) fix days/weeks_diff result wrong on BE #35104
select days_diff('2024-01-01 00:00:00', '2023-12-31 23:59:59');
should return 0, but BE returned 1.
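
The expected result above can be illustrated with a minimal Python sketch. It assumes `days_diff` counts whole elapsed 24-hour periods and truncates toward zero; this is an assumption for illustration, not the BE implementation, and the function name here is only borrowed from the SQL function.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400
FMT = "%Y-%m-%d %H:%M:%S"

def days_diff(end: str, start: str) -> int:
    """Whole days between two datetimes, truncated toward zero (assumed semantics)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # Use integer arithmetic on the timedelta fields to avoid float rounding.
    total_seconds = delta.days * SECONDS_PER_DAY + delta.seconds
    sign = -1 if total_seconds < 0 else 1
    return sign * (abs(total_seconds) // SECONDS_PER_DAY)

# One second apart is not yet a full day, even though the calendar date differs.
print(days_diff("2024-01-01 00:00:00", "2023-12-31 23:59:59"))
```

Comparing only the date parts would give 1 for this input, which matches the wrong result described in the commit.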
2024-05-22 10:00:26 +08:00
c7134faea9 [Fix](outfile) Fix the timing of setting the _is_closed flag in Parquet/ORC writer (#34668) 2024-05-15 10:28:22 +08:00
4dd5379951 [bugfix](hive)fix error for writing to hive for 2.1 (#34518)
mirror #34520
2024-05-14 23:27:29 +08:00
520774a24b [fix](serde) fix ipv4/v6 serde functions for arrow, orc, parquet format (#34042)
This PR is based on @sjyango's work in #32326.
#32326 was meant to be merged into the master branch, but it is still a draft and has not been maintained for a long time, so this new PR replaces it.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
2024-05-10 14:37:04 +08:00
Pxl
804586b342 [Improvement](sort) insert data by batch on VSortedRunMerger::get_next (#34363)
insert data by batch on VSortedRunMerger::get_next
2024-05-10 14:36:53 +08:00
a173513e27 [fix](pipelinex) exchange sink not set ready when source limit #34241 2024-04-29 20:58:50 +08:00
946d28646a [fix](outfile)Fixed orcOutputStream.close() throwing an exception during destruction causing the program to hang. (#34254)
bp #34243
2024-04-28 19:54:34 +08:00
30a68c1240 [fix](spill) use different algorithm to avoid partition data skew (#34162) 2024-04-27 11:20:36 +08:00
60e20a3afe [fix](pipeline_x) Crc32HashPartitioner should use ShuffleChannelIds (#34147) 2024-04-26 15:03:11 +08:00
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This was introduced by #33511, which wrongly used

ColumnStr<T> ();

This violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but is still accepted by clang for now (see llvm/llvm-project#58112).
2024-04-19 23:41:37 +08:00
657a29fd9e [refactor](partitioner) refine get channel id logics (#33765) 2024-04-18 19:05:24 +08:00
4863167f90 [refactor](pipelineX) Reduce prepare overhead (PART I) (#33550) 2024-04-17 23:42:12 +08:00
Pxl
341cb40693 [Chore](log) adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished (#33652)
adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished
2024-04-17 23:42:12 +08:00
48880c3e1a [Fix](timezone) fix miss of expected rounding of Date type with timezone #33553 2024-04-17 23:42:11 +08:00
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
support sync join node build side's size to init bloom runtime filter
2024-04-11 09:31:50 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
59aa923bce [bug](function) fix milliseconds_diff function return wrong result (#32897)
* [bug](function) fix milliseconds_diff function return wrong result
2024-04-10 11:34:30 +08:00
2a0644f442 [Fix](function) Fix unix_timestamp core for string input (#32871) 2024-04-09 12:48:35 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type | Orc Type                       | Parquet Type          |
|------------|--------------------------------|-----------------------|
| Date       | Long (logical: DATE)           | int32 (logical: Date) |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                 |
2024-03-22 08:52:16 +08:00
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
20d6698c27 [bugfix](arm compile) could not compile on arm because -Werror=maybe-uninitialized 2024-03-14 12:11:25 +08:00
0159a75ced [bugfix](becore) be will core when stop because the map is modified during iterator (#32105)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-12 18:50:26 +08:00
4268634115 [fix](memory) Fix Allocator cancel pipelinex query #32048 2024-03-12 14:20:18 +08:00
68a5319da3 [fix](pipelineX) _local_channel_dependency is null in non pipelineX (#32054) 2024-03-12 14:19:04 +08:00
c0f2d0188b [feature](pipelineX) add mem control in local exchange sink (#31982) 2024-03-12 14:17:48 +08:00
808563470f [pipelineX](debug) Refactor code and complete debug string (#31733) 2024-03-06 13:07:49 +08:00
3451cd6c23 [fix](datetime) fix hour 24 on be (#31304) 2024-02-26 19:07:10 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
49dd411f87 [fix](datetime) fix datetime round on BE (#31205)
with tmp as (
    select CONCAT(
        YEAR('2024-02-06 03:37:07.157'), '-',
        LPAD(MONTH('2024-02-06 03:37:07.157'), 2, '0'), '-',
        LPAD(DAY('2024-02-06 03:37:07.157'), 2, '0'), ' ',
        LPAD(HOUR('2024-02-06 03:37:07.157'), 2, '0'), ':',
        LPAD(MINUTE('2024-02-06 03:37:07.157'), 2, '0'), ':',
        LPAD(SECOND('2024-02-06 03:37:07.157'), 2, '0'), '.', "123456789" )
    AS generated_string)
select generated_string, cast(generated_string as DateTime(6)) from tmp
Before (incorrect rounding):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123456              |
+-------------------------------+-----------------------------------------+
After (rounds up, consistent with MySQL):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123457              |
+-------------------------------+-----------------------------------------+
1 row in set (0.03 sec)
Same change as #30744, but implemented on BE.
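
The rounding shown in the before/after tables can be sketched in Python. This is a minimal illustration that assumes round-half-up on the seventh fractional digit, which is what the example output implies; the helper name `round_to_micros` is hypothetical and this is not the BE code.

```python
from datetime import datetime, timedelta

def round_to_micros(text: str) -> str:
    """Round a 'YYYY-MM-DD HH:MM:SS[.fraction]' string to 6 fractional digits."""
    base, _, frac = text.partition(".")
    # Keep six digits plus one extra digit that decides the rounding.
    digits = (frac + "0" * 7)[:7]
    micros = int(digits[:6])
    if digits[6] >= "5":
        micros += 1  # round half up (assumed rule)
    # timedelta handles the carry when micros reaches 1_000_000.
    ts = datetime.strptime(base, "%Y-%m-%d %H:%M:%S") + timedelta(microseconds=micros)
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")

print(round_to_micros("2024-02-06 03:37:07.123456789"))
```

Plain truncation of the extra digits would yield `.123456`, the "before" value; the extra rounding digit is what produces `.123457`.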
2024-02-21 19:18:45 +08:00
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
UTF-8 files on Windows systems often carry a BOM.

This change adds a new user property to `Outfile/Export`, so when exporting Doris data, users can choose whether to write a BOM at the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
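
For context, the effect of a UTF-8 BOM on an exported CSV can be sketched in Python. The `encode_csv` helper and its `with_bom` argument are hypothetical and only mirror the property name above; this does not reflect how Doris writes the file.

```python
import codecs

def encode_csv(rows, with_bom: bool) -> bytes:
    """Encode rows as UTF-8 CSV bytes, optionally prefixed with the UTF-8 BOM."""
    body = "\n".join(",".join(row) for row in rows).encode("utf-8")
    # The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    return codecs.BOM_UTF8 + body if with_bom else body

data = encode_csv([["id", "name"], ["1", "alice"]], with_bom=True)
print(data[:3])                  # the BOM bytes
print(data.decode("utf-8-sig"))  # "utf-8-sig" strips the BOM when reading
```

Readers that do not expect a BOM will see it as an extra character before the first field, so the option is best left off unless the consuming tool needs it.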
2024-02-16 10:16:40 +08:00
be31b8dc61 [Refactor](exchange) remove unless code in exchange and opt some code (#30813) 2024-02-05 21:59:52 +08:00
8ff8d94697 [fix](ip) change IPv6 to little-endian byte order storage (like IPv4) (#30730) 2024-02-05 21:56:57 +08:00
3315c16383 [enhance](function) refactor from_format_str and support more format (#30452) 2024-02-01 19:08:37 +08:00
713798d549 [feature](nereids)support mark join (#30133)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
2024-01-27 09:09:53 +08:00
24ed3e4103 [Fix](Expr&code-style) check prepare&open before every VExpr execute (#26673) 2024-01-23 10:09:54 +08:00
4d97f8ea75 [enhance](function) support two special format for str_to_date (#29823) 2024-01-12 12:00:32 +08:00
Pxl
3cf95d0fdf [Improvement](execute) optimize for ColumnNullable's serialize_vec/deserialize_vec (#28788)
optimize for ColumnNullable's serialize_vec/deserialize_vec
2024-01-12 11:59:52 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
fc4ca712ed [bugfix](core) using weak ptr in data stream receiver to avoid runtime state is deconstructed (#29410) 2024-01-12 11:48:39 +08:00
7287c0ca15 [Opt](exec)(multi-catalog) Opt date type reading. (#29571) 2024-01-12 11:48:39 +08:00