Commit Graph

253 Commits

Author SHA1 Message Date
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
The UTF8 format of the Windows system has BOM. 

We add a new user property to `Outfile/Export`。Therefore, when exporting Doris data, users can choose whether to bring BOM on the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
be31b8dc61 [Refactor](exchange) remove unless code in exchange and opt some code (#30813) 2024-02-05 21:59:52 +08:00
8ff8d94697 [fix](ip) change IPv6 to little-endian byte order storage (like IPv4) (#30730) 2024-02-05 21:56:57 +08:00
3315c16383 [enhance](function) refactor from_format_str and support more format (#30452) 2024-02-01 19:08:37 +08:00
713798d549 [feature](nereids)support mark join (#30133)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
2024-01-27 09:09:53 +08:00
24ed3e4103 [Fix](Expr&code-style) check prepare&open before every VExpr execute (#26673) 2024-01-23 10:09:54 +08:00
4d97f8ea75 [enhance](function) support two special format for str_to_date (#29823) 2024-01-12 12:00:32 +08:00
Pxl
3cf95d0fdf [Improvement](execute) optimize for ColumnNullable's serialize_vec/deserialize_vec (#28788)
optimize for ColumnNullable's serialize_vec/deserialize_vec
2024-01-12 11:59:52 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
fc4ca712ed [bugfix](core) using weak ptr in data stream receiver to avoid runtime state is deconstructed (#29410) 2024-01-12 11:48:39 +08:00
7287c0ca15 [Opt](exec)(multi-catalog) Opt date type reading. (#29571) 2024-01-12 11:48:39 +08:00
be56bf06cf [feature](function) support ip function named is_ip_address_in_range(addr, cidr) (#29681) 2024-01-12 11:44:21 +08:00
767de7afe8 Revert "[feature](pipelineX) control exchange sink by memory usage (#28814)" (#29652)
This reverts commit e326ebb63e4e07d8ee6595561ab19dc5d411f592.
2024-01-08 21:48:51 +08:00
eb4c389b0b [feature](function) support ip functions isipv4string and isipv6string (#28556) 2024-01-07 13:03:11 +08:00
f54f79515c [Bug](fix) str_to_date "" should be null (#29402) 2024-01-03 08:25:22 +08:00
3dc3e81734 [Improvement](datatype) Update Parser for IPv4/v6 data types (#29044)
Transforming from parsing std:: string to parsing char * to accelerate the parsing of ipv4/v6 data types.
2023-12-28 11:00:38 +08:00
6d26aca4ca [fix](pipeline) sort_merge should throw exception in has_next_block if got failed status (#29076)
Test in regression-test/suites/datatype_p0/decimalv3/test_decimalv3_overflow.groovy::249 sometimes failed when there are multiple BEs and FE process report status slowly for some reason.

explain select k1, k2, k1 * k2 from test_decimal128_overflow2 order by 1,2,3
--------------

+----------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                            |
|   OUTPUT EXPRS:                                                                                                            |
|     k1[#5]                                                                                                                 |
|     k2[#6]                                                                                                                 |
|     (k1 * k2)[#7]                                                                                                          |
|   PARTITION: UNPARTITIONED                                                                                                 |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   VRESULT SINK                                                                                                             |
|      MYSQL_PROTOCAL                                                                                                        |
|                                                                                                                            |
|   111:VMERGING-EXCHANGE                                                                                                    |
|      offset: 0                                                                                                             |
|                                                                                                                            |
| PLAN FRAGMENT 1                                                                                                            |
|                                                                                                                            |
|   PARTITION: HASH_PARTITIONED: k1[#0], k2[#1]                                                                              |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   STREAM DATA SINK                                                                                                         |
|     EXCHANGE ID: 111                                                                                                       |
|     UNPARTITIONED                                                                                                          |
|                                                                                                                            |
|   108:VSORT                                                                                                                |
|   |  order by: k1[#5] ASC, k2[#6] ASC, (k1 * k2)[#7] ASC                                                                   |
|   |  offset: 0                                                                                                             |
|   |                                                                                                                        |
|   102:VOlapScanNode                                                                                                        |
|      TABLE: regression_test_datatype_p0_decimalv3.test_decimal128_overflow2(test_decimal128_overflow2), PREAGGREGATION: ON |
|      partitions=1/1 (test_decimal128_overflow2), tablets=8/8, tabletList=22841,22843,22845 ...                             |
|      cardinality=6, avgRowSize=0.0, numNodes=1                                                                             |
|      pushAggOp=NONE                                                                                                        |
|      projections: k1[#0], k2[#1], (k1[#0] * k2[#1])                                                                        |
|      project output tuple id: 1                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
36 rows in set (0.03 sec)
Why failed:

Multiple BEs
Fragments 0 and 1 are MUST on different BEs
Pipeline task of VOlapScanNode which executes k1*k2 failed sets query status to cancelled
Pipeline task of VSort call try close, send Cancelled status to VMergeExchange
sort_curso did not throw exception when it meets error
2023-12-27 10:06:01 +08:00
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
e326ebb63e [feature](pipelineX) control exchange sink by memory usage (#28814) 2023-12-25 10:31:50 +08:00
0b9b1be1f1 [fix](function) Fix from_second functions overflow and wrong result (#28685) 2023-12-22 10:22:49 +08:00
e8d0569d8b [refine](pipelineX)Make the 'set ready' logic of SenderQueue in pipelineX the same as that in the pipeline (#28488) 2023-12-20 19:26:00 +08:00
c00dca70e6 [pipelineX](local shuffle) Support parallel execution despite of tablet number (#28266) 2023-12-14 12:53:54 +08:00
78b0fec33a [Fix](Outfile) Support export nested complex type data to orc file format (#28182) 2023-12-13 11:55:27 +08:00
ea275e687a [pipelineX](minor) remove unused code (#28016) 2023-12-05 19:41:40 +08:00
10483ea12c [fix](profile) fix error set with peak_memory_usage in pipeline #27749 2023-12-02 14:12:38 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve the performance under the tpch data set by reconstructing the join related code and the use of hash table

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
ea7eca9345 [pipelineX](bug) Add some logs (#27596) 2023-11-28 10:02:13 +08:00
baadc14e60 [Enhancement](function) support unix_timestamp with float (#26827)
---------

Co-authored-by: YangWithU <plzw8@outlook.com>
2023-11-27 09:58:53 +08:00
1b3512d942 [pipelineX](bug) Fix cancel timeout (#27396) 2023-11-22 22:31:34 +08:00
5442e8d1fc [pipelineX](dependency) split different dependencies (#27366) 2023-11-22 12:50:39 +08:00
b1eef30b49 [pipelineX](dependency) Wake up task by dependencies (#26879)
---------

Co-authored-by: Mryange <2319153948@qq.com>
2023-11-18 03:20:24 +08:00
d988193d39 [pipelineX](shuffle) block exchange sink by memory usage (#26595) 2023-11-09 21:28:22 +08:00
5d80e7dc2f [Improvement](pipelineX) Improve local exchange on pipelineX engine (#26464) 2023-11-07 22:11:44 +08:00
99b45e1938 [fix](Outfile) Export DateTimev2 type of doris to ORC's TimeStamp type (#25470)
Previously,doris's `DateTimev2` was exported to orc as a `String` type.
Now, export doris's `DateTimev2` to orc timestamp type.
2023-10-29 15:59:38 +08:00
c1d64a7128 [Feature](datatype) Add IPv4/v6 data type for doris (#24965) 2023-10-26 17:33:28 +08:00
d6c64d305f [chore](log) Add log to trace query execution #25739 2023-10-26 14:09:25 +08:00
31d2a9a4f5 [Enhancement](function) support fractions for convert_tz(datetimev2) (#25915)
mysql> select convert_tz('2019-08-01 01:01:02.123' , '+00:00', '+07:00');
+----------------------------------------------------------------------------------+
| convert_tz(cast('2019-08-01 01:01:02.123' as DATETIMEV2(3)), '+00:00', '+07:00') |
+----------------------------------------------------------------------------------+
| 2019-08-01 08:01:02.123                                                          |
+----------------------------------------------------------------------------------+
1 row in set (0.18 sec)
2023-10-26 10:46:47 +08:00
7f66be84d5 [fix](Outfile) Infer the column name if the column is expression in select into outfile (#25854)
This pr do two things:
1. Infer the column name if the column is expression in `select into outfile`. The rule for column name generation can be refered in pr: #24990 
2. fix bug that it will core dump if the `_schema` fails to build in the open phase in vorc_transformer.cpp


TODO:
1. Support infer the column name if the column is expression in `select into outfile` in new optimizer(Nereids).
2023-10-25 22:49:04 +08:00
e8f479882d [pipelineX](local exchange) Add local exchange operator (#25846) 2023-10-25 18:45:02 +08:00
6b2eed779c [feature](AuditLog) add scanRows scanBytes in auditlog (#25435) 2023-10-25 10:00:35 +08:00
08832d9f3a [Fix](exec) Fix date dict dead loop. (#25570) 2023-10-24 02:51:43 +08:00
Pxl
642c149e6a remove datetime_value and move vecdatetime_value to doris namespace (#25695)
remove datetime_value and move vecdatetime_value to doris namespace
2023-10-20 22:08:17 +08:00
dc47087560 [fix](function) fix str_to_date default return type scale for nereids (#24932)
fix str_to_date default return type scale for nereids
2023-10-20 12:55:49 +08:00
b964ab76b3 [refactor](shuffle) Simplify hash partitioning strategy (#25596) 2023-10-19 19:28:22 +08:00
e77b98be88 [fix](months_diff) fix wrong result of months_diff (#25577) 2023-10-19 14:29:47 +08:00
08f305dd79 [chore](build) Fix compilation errors reported by GCC-13 (#25439)
1. Fix lots of compilation errors reported by GCC-13.
2. Fix the workflow BE UT (macOS).
2023-10-15 07:57:36 -05:00
37dbda6209 [pipelineX](refactor) Use class template to simplify join (#25369) 2023-10-13 16:51:55 +08:00
6f9a084d99 [Fix](Outfile) Use data_type_serde to export data to parquet file format (#24998) 2023-10-13 13:58:34 +08:00
f960b8c989 [bugfix](stream receiver) be will core during stop because receiver is not closed (#25298)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-11 19:49:40 +08:00