
278 Commits

Author SHA1 Message Date
946d28646a [fix](outfile)Fixed orcOutputStream.close() throwing an exception during destruction causing the program to hang. (#34254)
bp #34243
2024-04-28 19:54:34 +08:00
30a68c1240 [fix](spill) use different algorithm to avoid partition data skew (#34162) 2024-04-27 11:20:36 +08:00
60e20a3afe [fix](pipeline_x) Crc32HashPartitioner should use ShuffleChannelIds (#34147) 2024-04-26 15:03:11 +08:00
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This was introduced by #33511, which wrongly used

ColumnStr<T> ();

This violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but is still accepted by clang for now (see llvm/llvm-project#58112).
2024-04-19 23:41:37 +08:00
657a29fd9e [refactor](partitioner) refine get channel id logics (#33765) 2024-04-18 19:05:24 +08:00
4863167f90 [refactor](pipelineX) Reduce prepare overhead (PART I) (#33550) 2024-04-17 23:42:12 +08:00
Pxl
341cb40693 [Chore](log) adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished (#33652)
2024-04-17 23:42:12 +08:00
48880c3e1a [Fix](timezone) fix miss of expected rounding of Date type with timezone #33553 2024-04-17 23:42:11 +08:00
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
2024-04-11 09:31:50 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
59aa923bce [bug](function) fix milliseconds_diff function return wrong result (#32897)
2024-04-10 11:34:30 +08:00
2a0644f442 [Fix](function) Fix unix_timestamp core for string input (#32871) 2024-04-09 12:48:35 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type | Orc Type                       | Parquet Type          |
|------------|--------------------------------|-----------------------|
| Date       | Long (logical: DATE)           | int32 (logical: Date) |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                 |
2024-03-22 08:52:16 +08:00
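As a rough illustration of the Date row in the mapping table of the commit above: in Parquet, the DATE logical type annotates an int32 that holds the number of days since the Unix epoch. The sketch below only shows that encoding; the helper name `date_to_epoch_days` is hypothetical and this is not the Doris writer code.

```python
from datetime import date

INT32_MAX = 2**31 - 1


def date_to_epoch_days(d: date) -> int:
    """Return days since 1970-01-01, the value an int32 DATE column stores."""
    days = (d - date(1970, 1, 1)).days
    # Any date Python can represent fits comfortably in a signed 32-bit integer.
    assert -INT32_MAX - 1 <= days <= INT32_MAX
    return days


if __name__ == "__main__":
    print(date_to_epoch_days(date(2024, 3, 22)))
```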
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
20d6698c27 [bugfix](arm compile) could not compile on arm because -Werror=maybe-uninitialized 2024-03-14 12:11:25 +08:00
0159a75ced [bugfix](becore) be will core when stop because the map is modified during iterator (#32105)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-12 18:50:26 +08:00
4268634115 [fix](memory) Fix Allocator cancel pipelinex query #32048 2024-03-12 14:20:18 +08:00
68a5319da3 [fix](pipelineX) _local_channel_dependency is null in non pipelineX (#32054) 2024-03-12 14:19:04 +08:00
c0f2d0188b [feature](pipelineX) add mem control in local exchange sink (#31982) 2024-03-12 14:17:48 +08:00
808563470f [pipelineX](debug) Refactor code and complete debug string (#31733) 2024-03-06 13:07:49 +08:00
3451cd6c23 [fix](datetime) fix hour 24 on be (#31304) 2024-02-26 19:07:10 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
49dd411f87 [fix](datetime) fix datetime round on BE (#31205)
with tmp as (
    select CONCAT(
        YEAR('2024-02-06 03:37:07.157'), '-',
        LPAD(MONTH('2024-02-06 03:37:07.157'), 2, '0'), '-',
        LPAD(DAY('2024-02-06 03:37:07.157'), 2, '0'), ' ',
        LPAD(HOUR('2024-02-06 03:37:07.157'), 2, '0'), ':',
        LPAD(MINUTE('2024-02-06 03:37:07.157'), 2, '0'), ':',
        LPAD(SECOND('2024-02-06 03:37:07.157'), 2, '0'), '.', "123456789"
    ) AS generated_string
)
select generated_string, cast(generated_string as DateTime(6)) from tmp
Before (incorrect rounding; the extra digits are dropped):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123456              |
+-------------------------------+-----------------------------------------+
After (rounds up, consistent with MySQL):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123457              |
+-------------------------------+-----------------------------------------+
1 row in set (0.03 sec)
Same work as #30744, but implemented on BE.
2024-02-21 19:18:45 +08:00
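The before/after difference in the commit above can be reproduced outside Doris with a small sketch. It assumes the intended behavior is round-half-up at the target scale, with the old behavior being plain truncation; the function name `scale_fraction` is hypothetical and this is not the BE implementation.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def scale_fraction(fraction_digits: str, scale: int, rounding: str) -> Decimal:
    """Reduce the fractional-second digits to `scale` decimal places."""
    quantum = Decimal(1).scaleb(-scale)  # 1E-6 for scale 6
    return Decimal("0." + fraction_digits).quantize(quantum, rounding=rounding)


# Old behavior in the example above: the extra digits are simply dropped.
before = scale_fraction("123456789", 6, ROUND_DOWN)
# New behavior (assumed round-half-up): the seventh digit, 7, rounds the sixth up.
after = scale_fraction("123456789", 6, ROUND_HALF_UP)

# A fraction such as "9999995" rounds to 1.000000, so a real implementation
# has to carry the overflow into the seconds field.
carry = scale_fraction("9999995", 6, ROUND_HALF_UP)

if __name__ == "__main__":
    print(before, after, carry)
```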
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
UTF-8 files produced on Windows commonly begin with a BOM.

This change adds a new user property, `with_bom`, to `Outfile/Export`, so that users can choose whether an exported CSV file begins with a BOM.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
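To make the effect of `with_bom` in the commit above concrete, the sketch below writes the same rows with and without the three-byte UTF-8 BOM (`EF BB BF`). The function `export_csv` is a hypothetical stand-in used only for illustration; it is not how Doris implements the export.

```python
import codecs
import csv
import io
from typing import Iterable, Sequence


def export_csv(rows: Iterable[Sequence[str]], with_bom: bool = False) -> bytes:
    """Serialize rows as UTF-8 CSV, optionally prefixed with the UTF-8 BOM."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    payload = out.getvalue().encode("utf-8")
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf"; it is added once, at the very start.
    return codecs.BOM_UTF8 + payload if with_bom else payload


if __name__ == "__main__":
    print(export_csv([["1", "alice"]], with_bom=True))
```

Readers that expect a BOM can then detect the encoding from the first three bytes, while `with_bom=False` leaves the file byte-for-byte unchanged.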
be31b8dc61 [Refactor](exchange) remove unless code in exchange and opt some code (#30813) 2024-02-05 21:59:52 +08:00
8ff8d94697 [fix](ip) change IPv6 to little-endian byte order storage (like IPv4) (#30730) 2024-02-05 21:56:57 +08:00
3315c16383 [enhance](function) refactor from_format_str and support more format (#30452) 2024-02-01 19:08:37 +08:00
713798d549 [feature](nereids)support mark join (#30133)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
2024-01-27 09:09:53 +08:00
24ed3e4103 [Fix](Expr&code-style) check prepare&open before every VExpr execute (#26673) 2024-01-23 10:09:54 +08:00
4d97f8ea75 [enhance](function) support two special format for str_to_date (#29823) 2024-01-12 12:00:32 +08:00
Pxl
3cf95d0fdf [Improvement](execute) optimize for ColumnNullable's serialize_vec/deserialize_vec (#28788)
2024-01-12 11:59:52 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
fc4ca712ed [bugfix](core) using weak ptr in data stream receiver to avoid runtime state is deconstructed (#29410) 2024-01-12 11:48:39 +08:00
7287c0ca15 [Opt](exec)(multi-catalog) Opt date type reading. (#29571) 2024-01-12 11:48:39 +08:00
be56bf06cf [feature](function) support ip function named is_ip_address_in_range(addr, cidr) (#29681) 2024-01-12 11:44:21 +08:00
767de7afe8 Revert "[feature](pipelineX) control exchange sink by memory usage (#28814)" (#29652)
This reverts commit e326ebb63e4e07d8ee6595561ab19dc5d411f592.
2024-01-08 21:48:51 +08:00
eb4c389b0b [feature](function) support ip functions isipv4string and isipv6string (#28556) 2024-01-07 13:03:11 +08:00
f54f79515c [Bug](fix) str_to_date "" should be null (#29402) 2024-01-03 08:25:22 +08:00
3dc3e81734 [Improvement](datatype) Update Parser for IPv4/v6 data types (#29044)
Switch from parsing `std::string` to parsing `char*` to speed up parsing of the IPv4/IPv6 data types.
2023-12-28 11:00:38 +08:00
6d26aca4ca [fix](pipeline) sort_merge should throw exception in has_next_block if got failed status (#29076)
The test at regression-test/suites/datatype_p0/decimalv3/test_decimalv3_overflow.groovy::249 sometimes fails when there are multiple BEs and the FE processes status reports slowly for some reason.

explain select k1, k2, k1 * k2 from test_decimal128_overflow2 order by 1,2,3
--------------

+----------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                            |
|   OUTPUT EXPRS:                                                                                                            |
|     k1[#5]                                                                                                                 |
|     k2[#6]                                                                                                                 |
|     (k1 * k2)[#7]                                                                                                          |
|   PARTITION: UNPARTITIONED                                                                                                 |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   VRESULT SINK                                                                                                             |
|      MYSQL_PROTOCAL                                                                                                        |
|                                                                                                                            |
|   111:VMERGING-EXCHANGE                                                                                                    |
|      offset: 0                                                                                                             |
|                                                                                                                            |
| PLAN FRAGMENT 1                                                                                                            |
|                                                                                                                            |
|   PARTITION: HASH_PARTITIONED: k1[#0], k2[#1]                                                                              |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   STREAM DATA SINK                                                                                                         |
|     EXCHANGE ID: 111                                                                                                       |
|     UNPARTITIONED                                                                                                          |
|                                                                                                                            |
|   108:VSORT                                                                                                                |
|   |  order by: k1[#5] ASC, k2[#6] ASC, (k1 * k2)[#7] ASC                                                                   |
|   |  offset: 0                                                                                                             |
|   |                                                                                                                        |
|   102:VOlapScanNode                                                                                                        |
|      TABLE: regression_test_datatype_p0_decimalv3.test_decimal128_overflow2(test_decimal128_overflow2), PREAGGREGATION: ON |
|      partitions=1/1 (test_decimal128_overflow2), tablets=8/8, tabletList=22841,22843,22845 ...                             |
|      cardinality=6, avgRowSize=0.0, numNodes=1                                                                             |
|      pushAggOp=NONE                                                                                                        |
|      projections: k1[#0], k2[#1], (k1[#0] * k2[#1])                                                                        |
|      project output tuple id: 1                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
36 rows in set (0.03 sec)
Why it failed:

* There are multiple BEs.
* Fragments 0 and 1 MUST be on different BEs.
* The pipeline task of VOlapScanNode, which executes k1 * k2, fails and sets the query status to cancelled.
* The pipeline task of VSort calls try close and sends the Cancelled status to VMergeExchange.
* sort_cursor did not throw an exception when it met the error.
2023-12-27 10:06:01 +08:00
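The fix described in the commit above amounts to surfacing a failed upstream status from `has_next_block` instead of treating it as the end of the data. A minimal sketch of that idea follows, assuming a supplier callback that returns a status and an optional block; `Status`, `MergeCursor`, and `cancelled_supplier` are hypothetical names, and only `has_next_block` comes from the commit title.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Status:
    ok: bool
    message: str = ""


Block = List[int]
Supplier = Callable[[], Tuple[Status, Optional[Block]]]


class MergeCursor:
    """Pulls blocks from an upstream supplier for a merging exchange."""

    def __init__(self, supplier: Supplier) -> None:
        self._supplier = supplier
        self.block: Optional[Block] = None

    def has_next_block(self) -> bool:
        status, block = self._supplier()
        if not status.ok:
            # Raise instead of reporting "no more data", so the failure
            # (for example a Cancelled status) is not silently swallowed.
            raise RuntimeError(f"upstream failed: {status.message}")
        self.block = block
        return bool(block)


def cancelled_supplier() -> Tuple[Status, Optional[Block]]:
    return Status(False, "Cancelled"), None


def demo() -> str:
    try:
        MergeCursor(cancelled_supplier).has_next_block()
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```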
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
e326ebb63e [feature](pipelineX) control exchange sink by memory usage (#28814) 2023-12-25 10:31:50 +08:00
0b9b1be1f1 [fix](function) Fix from_second functions overflow and wrong result (#28685) 2023-12-22 10:22:49 +08:00
e8d0569d8b [refine](pipelineX)Make the 'set ready' logic of SenderQueue in pipelineX the same as that in the pipeline (#28488) 2023-12-20 19:26:00 +08:00
c00dca70e6 [pipelineX](local shuffle) Support parallel execution despite of tablet number (#28266) 2023-12-14 12:53:54 +08:00
78b0fec33a [Fix](Outfile) Support export nested complex type data to orc file format (#28182) 2023-12-13 11:55:27 +08:00
ea275e687a [pipelineX](minor) remove unused code (#28016) 2023-12-05 19:41:40 +08:00
10483ea12c [fix](profile) fix error set with peak_memory_usage in pipeline #27749 2023-12-02 14:12:38 +08:00