Commit Graph

4107 Commits

Author SHA1 Message Date
d4c2f70673 [fix](regression_test) failed in distinct_streaming_agg (#34404) 2024-05-06 10:11:39 +08:00
7248420cfd [chore](session_variable) Add 'data_queue_max_blocks' to prevent the DataQueue from occupying too much memory. (#34017) (#34395) 2024-05-05 21:20:33 +08:00
8da260ee0d [fix](hdfs)read 'fs.defaultFS' from core-site.xml for hdfs load which has no default fs (#34217) (#34372)
bp #34217
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-05-01 00:31:49 +08:00
35f8563a75 [feature](iceberg) support iceberg equality delete (#34223) (#34327)
bp #34223

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-30 11:51:29 +08:00
75470ede1a [fix](test) Fix some testcases #34203 2024-04-30 08:35:03 +08:00
b15fc2a906 [Cherry-pick](branch-2.1) Pick #34043 and #34112 (#34318)
* [Enhancement](full compaction) Add run status support for full compaction (#34043)

* The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}`
e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`

If full compaction is running, the output will be
```
{
"status" : "Success",
"run_status" : true,
"msg" : "compaction task for this tablet is running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
Otherwise, the output will be
```
{
"status" : "Success",
"run_status" : false,
"msg" : "compaction task for this tablet is not running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
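
A client can build the request URL and read the `run_status` flag from the response body. The following is a minimal sketch, not part of the commit: the helper names `build_run_status_url` and `is_compaction_running` are hypothetical, and the sample body is the one quoted in this commit message.

```python
import json
from urllib.parse import urlencode


def build_run_status_url(host: str, port: int, tablet_id: int) -> str:
    """Build the run_status URL in the form shown in the commit message."""
    query = urlencode({"tablet_id": tablet_id})
    return f"http://{host}:{port}/api/compaction/run_status?{query}"


def is_compaction_running(response_body: str) -> bool:
    """Return True when the response reports a running compaction task."""
    payload = json.loads(response_body)
    if payload.get("status") != "Success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return bool(payload["run_status"])


# Sample body copied from the commit message (the "running" case).
sample = """
{
  "status" : "Success",
  "run_status" : true,
  "msg" : "compaction task for this tablet is running",
  "tablet_id" : 10084,
  "compact_type" : "full"
}
"""

print(build_run_status_url("127.0.0.1", 8040, 10084))
print(is_compaction_running(sample))
```

The sketch only parses a body that has already been fetched; sending the HTTP request is left to the caller.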

* 2

* 2

* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)

Cause: In the logic of partial column updates, the existing data columns are read first, and then the data is supplemented and written back. During the reading process, initialization involves initially fetching rowset IDs, and the actual rowset object is fetched only when needed later. However, between fetching the rowset IDs and the rowset object, compaction may occur, turning the old rowset into a stale rowset. If too much time passes, the stale rowset might be directly deleted. Thus, when the rowset object is needed for an update, it cannot be found. Although the update operation with partial column logic should be able to read all keys and should not encounter new keys, if the rowset disappears, the Backend (BE) will consider these keys as missing. Consequently, it will check whether other columns have default values or are nullable. If this check fails, the aforementioned error is thrown.

Solution: To avoid such issues during partial column updates, the initialization step should involve fetching both the rowset IDs and the shared pointer to the rowset object simultaneously. This ensures that the rowset can always be found during data retrieval.
2024-04-30 07:26:23 +08:00
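
The race described in #34112 can be illustrated with a toy model. This is a sketch under stated assumptions, not the BE code: `RowsetStore` and its methods are hypothetical, and holding a Python reference stands in for holding the shared pointer to the rowset object.

```python
class RowsetStore:
    """Toy stand-in for a tablet's rowset registry (hypothetical)."""

    def __init__(self):
        self._rowsets = {}

    def add(self, rowset_id, rows):
        self._rowsets[rowset_id] = rows

    def ids(self):
        return list(self._rowsets)

    def get(self, rowset_id):
        # Returns None once the rowset has been compacted away and deleted.
        return self._rowsets.get(rowset_id)

    def snapshot(self):
        # Capture the ids together with references to the rowset objects.
        return dict(self._rowsets)

    def compact_and_delete(self, old_ids, new_id):
        # Merge the old rowsets into a new one and drop the stale ones.
        merged = {}
        for rowset_id in old_ids:
            merged.update(self._rowsets.pop(rowset_id))
        self._rowsets[new_id] = merged


store = RowsetStore()
store.add("rs1", {"k1": "a"})
store.add("rs2", {"k2": "b"})

ids_only = store.ids()     # old behaviour: ids now, objects fetched later
pinned = store.snapshot()  # fixed behaviour: ids and objects taken together

store.compact_and_delete(["rs1", "rs2"], "rs3")

lookup_by_id = [store.get(rowset_id) for rowset_id in ids_only]
lookup_pinned = [pinned[rowset_id] for rowset_id in ids_only]
print(lookup_by_id)
print(lookup_pinned)
```

Looking up by id after compaction finds nothing, while the snapshot taken at initialization still reaches the original rows.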
996222c30e [fix](test) let test_ntile_function happy with Nereids (#34294) 2024-04-29 20:59:48 +08:00
7cb00a8e54 [Feature](hive-writer) Implements s3 file committer. (#34307)
Backport #33937.
2024-04-29 19:56:49 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
20bd0c2987 [FIX](cases )fix ipv6 value for regress case 2024-04-29 13:37:29 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
11039ade7b [opt](paimon) support mapping Paimon column type "Row" to Doris type "Struct" (#34239)
backport: #33786
2024-04-28 19:38:50 +08:00
1fda68f738 [feature](planner) Support select constant from dual syntax sugar (#34200) (#34232)
In MySQL, it's common to use a simplified syntax like `SELECT constant FROM dual`
which is equivalent to just `SELECT constant`.
This syntax is often used by BI tools when utilizing MySQL connectors to verify connection validity.
To enhance compatibility and ensure seamless integration with such tools,
we have now implemented this feature in Doris.

### Key Changes:
- Doris now interprets `SELECT constant FROM dual` as `SELECT constant`, aligning with MySQL's behavior.
- This update ensures that BI tools can use standard MySQL connectors without modifications or errors when connecting to Doris.
2024-04-28 15:56:16 +08:00
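
The equivalence that 1fda68f738 relies on can be shown with a small text rewrite. This is only a sketch: Doris handles the syntax in its planner, whereas the hypothetical `strip_from_dual` helper below uses a regular expression and covers just a trailing `FROM dual`.

```python
import re

# Matches a trailing "FROM dual", optionally followed by a semicolon.
_FROM_DUAL = re.compile(r"\s+FROM\s+dual\s*(;?)\s*$", re.IGNORECASE)


def strip_from_dual(sql: str) -> str:
    """Rewrite `SELECT constant FROM dual` into `SELECT constant`."""
    return _FROM_DUAL.sub(r"\1", sql.strip())


print(strip_from_dual("SELECT 1 FROM dual"))
print(strip_from_dual("select 1 from DUAL;"))
print(strip_from_dual("SELECT a FROM t"))
```

Statements that read from a real table, or that place other clauses after `dual`, are left unchanged by this sketch.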
45556686ea [fix](test) fix some external test cases (#34209)
Fix some test cases and enable `test_information_schema_external` suite
2024-04-27 23:25:33 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
36e80af327 [fix](schema change) fix the defineName field is not the same when copying column (#34201)
* [fix](schema change) fix the defineName field is not the same when copying column

* fix
2024-04-27 11:59:07 +08:00
cf700a62b6 [test](case) fix unstable case without order by distinct row (#34167) 2024-04-27 11:20:36 +08:00
c998e2f714 [Enhancement](planner) Support string input for sql_select_limit (#34177) 2024-04-27 02:29:47 +08:00
414fbd353e [fix](ES catalog)Make col != '' behavior consistent with SQL (#34151)
In SQL syntax, `col != ''` equals `col.length() > 0`.
It means that this column must exist in ES doc fields and its content is not empty.
In this PR, we make a special translation for this binary predicate to keep the behavior of both consistent.

---------

Co-authored-by: Luennng <luennng@gmail.com>
2024-04-27 02:29:33 +08:00
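
One way to express "the field exists and its content is not empty" in the Elasticsearch query DSL is a `bool` query that combines `exists` with a negated `term`. The query shape below is an assumption made for illustration and is not necessarily what #34151 generates; `not_empty_string_query` is a hypothetical helper.

```python
def not_empty_string_query(column: str) -> dict:
    """Build a query for `column != ''`: field present and not empty."""
    return {
        "bool": {
            # The field must exist in the document.
            "must": [{"exists": {"field": column}}],
            # The field must not be the empty string.
            "must_not": [{"term": {column: ""}}],
        }
    }


print(not_empty_string_query("name"))
```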
c125148deb [opt](Nereids) bucket shuffle downgrade expansion (#34088)
Relax the bucket shuffle downgrade condition, which originally required a single partition after pruning, a basic table, and bucket number < para number. With this change, the option can be used to disable bucket shuffle more effectively, without the above restrictions.

Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
2024-04-27 02:29:33 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient: page headers are no longer read when the chunk has page locations.
2024-04-26 15:06:16 +08:00
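
The idea behind 0f0c0a266b can be sketched as follows: when the offset index supplies page locations, the next page's position is known up front, so skipping needs no header read. The `PageLocation` fields follow the Parquet offset index; `ChunkCursor` is a hypothetical stand-in and not the real ColumnChunkReader.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PageLocation:
    offset: int                # file offset of the page, header included
    compressed_page_size: int  # bytes occupied by the header plus page data
    first_row_index: int       # index of the page's first row in the row group


class ChunkCursor:
    """Walks the pages of one column chunk using only page locations."""

    def __init__(self, locations: List[PageLocation]):
        self.locations = locations
        self.page_index = 0

    def position(self) -> Optional[int]:
        """File offset of the current page, or None past the last page."""
        if self.page_index >= len(self.locations):
            return None
        return self.locations[self.page_index].offset

    def skip_page(self) -> None:
        """Advance to the next page without reading the page header."""
        if self.page_index >= len(self.locations):
            raise IndexError("no more pages in this chunk")
        self.page_index += 1


cursor = ChunkCursor([
    PageLocation(offset=4, compressed_page_size=100, first_row_index=0),
    PageLocation(offset=104, compressed_page_size=80, first_row_index=1000),
    PageLocation(offset=184, compressed_page_size=60, first_row_index=2000),
])
cursor.skip_page()
cursor.skip_page()
print(cursor.position())
```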
acc2b532e7 [Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026) 2024-04-26 15:05:47 +08:00
b24ff9953d [fix](Nereids) column pruning should prune map in cte consumer (#34079)
We save a bi-map in the CTE consumer to record the mapping between producer and consumer.
The consumer's output is determined by this map,
so the CTE consumer should be output-prunable and should remove useless entries from the map during column pruning.
2024-04-26 15:05:19 +08:00
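
The pruning step in b24ff9953d amounts to dropping map entries whose consumer slot is no longer required. A minimal sketch, assuming slots are plain strings and using a hypothetical `prune_consumer_map` helper rather than the Nereids classes:

```python
def prune_consumer_map(consumer_to_producer: dict, required: set) -> dict:
    """Keep only the entries whose consumer slot is still required."""
    return {
        consumer: producer
        for consumer, producer in consumer_to_producer.items()
        if consumer in required
    }


mapping = {"c.a": "p.a", "c.b": "p.b", "c.c": "p.c"}
pruned = prune_consumer_map(mapping, {"c.a", "c.c"})
# The inverse direction of the bi-map is derived from the pruned entries.
producer_to_consumer = {producer: consumer for consumer, producer in pruned.items()}
print(pruned)
print(producer_to_consumer)
```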
b41a5339d3 [Fix](nereids) fix rule merge_aggregate when has project (#33892) 2024-04-26 15:05:09 +08:00
a34ed4643a [fix](planner)date_add function should accept date type as its param (#34035) 2024-04-26 15:04:45 +08:00
5adc823b14 [fix](nereids)move ReplaceVariableByLiteral rule to analyze phase (#33997) 2024-04-26 15:04:45 +08:00
b7b87fbb95 [fix](planner)cast expr should do nothing in compactForLiteral method (#34047) 2024-04-26 15:04:45 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
55d5ed9ab6 [test](streamload) add load empty file regression test (#34110) 2024-04-26 07:42:09 +08:00
75644392f4 [fix](Nereids) support aggregate function only in having statement (#34086)
SQL like

> SELECT 1 AS c1 FROM t HAVING count(1) > 0 OR c1 IS NOT NULL
2024-04-26 07:41:45 +08:00
a237f7ec6e [feature](Nereids): add equal set in functional dependencies (#33642) 2024-04-26 07:41:45 +08:00
2b4f4ca796 [Fix](nereids) fix cases unstable of hint (#34101)
Fix unstable hint cases: remove unused cases and project nodes, and use string-contains checks to avoid instability.
2024-04-26 07:41:30 +08:00
af9c885ae4 fix some unstable regression tests 2024-04-26 07:38:40 +08:00
9083bf7e14 revert "[Improvementation](join) empty_block shall be set true when build blo… (#33977)"
This reverts commit e3ed861e4b6a602ea874b6501998578952291f38.
2024-04-25 23:33:11 +08:00
Pxl
e3ed861e4b [Improvementation](join) empty_block shall be set true when build blo… (#33977)
empty_block shall be set to true when the build block has only one row
2024-04-25 15:07:56 +08:00
987f755206 [Fix](nereids) fix rule SimplifyWindowExpression (#34099)
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-04-25 15:07:09 +08:00
789a16ec6b [fix](fe) Fix SHOW CREATE TABLE with AUTO PARTITION (#34071)
The AUTO PARTITION grammar has changed since #31585, but the output
of SHOW CREATE TABLE was not updated accordingly, so the result
cannot be recognized by the FE parser.
2024-04-25 15:05:58 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
450f443413 [fix](decommission) fix can't decommission mtmv (#33823) 2024-04-25 12:01:44 +08:00
a15a8e119f [fix](mtmv) Fix exception when create materialized view with cte (#33988)
Fix the exception thrown when creating a materialized view with a CTE. After this fix, a materialized view can be created with a statement such as the following:
```
        CREATE MATERIALIZED VIEW mv_with_cte
            BUILD IMMEDIATE REFRESH AUTO ON MANUAL
            DISTRIBUTED BY RANDOM BUCKETS 2
            PROPERTIES ('replication_num' = '1')
            AS
            with `test_with` AS (
            select l_partkey, l_suppkey
            from lineitem
            union
            select
              ps_partkey, ps_suppkey
            from
            partsupp)
            select * from test_with;
```

this is brought from https://github.com/apache/doris/pull/28144
2024-04-25 12:01:44 +08:00
eaacba644d [fix](auth)can not grant priv to __internal_schema (#34009)
mysql> grant SELECT_PRIV on `_internal_schema`.* to 'test'@'%'; ERROR 1102 (42000): errCode = 2, detailMessage = Incorrect database name '_internal_schema'
2024-04-25 12:01:44 +08:00
ac038b3d4f [fix](auto bucket) Fix auto bucket regression case occasional fail (#34069) 2024-04-25 12:01:44 +08:00
ef73533e27 [Feat](nereids) add transform rule SimplifyWindowExpression (#33647)
Rewrite func(para) over (partition by unique_keys):
1. func() is count(non-null) or rank/dense_rank/row_number -> 1
2. func(para) is min/max/sum/avg/first_value/last_value -> para
e.g.
select max(c1) over(partition by pk) from t1;
-> select c1 from t1;
2024-04-25 12:01:44 +08:00
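
The two rewrite cases listed in ef73533e27 can be sketched as a small function. This is an illustration only: `WindowCall` and `simplify` are hypothetical names, and treating "partition by unique_keys" as "the partition columns cover a unique key" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TO_ONE = {"rank", "dense_rank", "row_number"}
TO_ARG = {"min", "max", "sum", "avg", "first_value", "last_value"}


@dataclass(frozen=True)
class WindowCall:
    func: str
    arg: Optional[str]
    partition_by: Tuple[str, ...]
    arg_nullable: bool = False


def simplify(call: WindowCall, unique_keys: frozenset):
    """Return 1, the argument name, or the unchanged call."""
    # Every partition holds a single row only if a unique key is covered.
    partition_is_unique = bool(unique_keys) and set(unique_keys) <= set(call.partition_by)
    if not partition_is_unique:
        return call
    if call.func in TO_ONE:
        return 1
    if call.func == "count" and call.arg is not None and not call.arg_nullable:
        return 1
    if call.func in TO_ARG and call.arg is not None:
        return call.arg
    return call


# select max(c1) over(partition by pk) from t1  ->  select c1 from t1
print(simplify(WindowCall("max", "c1", ("pk",)), frozenset({"pk"})))
```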
800bb3d4ba [Feat](nereids) add expression rewrite rule LikeToEqualRewrite (#33803)
LIKE expressions whose pattern contains no wildcards are rewritten into equivalent equality expressions
2024-04-25 12:01:44 +08:00
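
The LikeToEqualRewrite idea from 800bb3d4ba can be sketched as below. The tuple representation and the `like_to_equal` helper are hypothetical, and the sketch is deliberately conservative: any pattern containing `%`, `_`, or a backslash escape is left as a LIKE, which may be stricter than the actual rule.

```python
def like_to_equal(column: str, pattern: str):
    """Rewrite `column LIKE pattern` to `column = pattern` when safe."""
    if any(ch in pattern for ch in "%_\\"):
        # Wildcards or escapes are present: keep the LIKE predicate.
        return ("like", column, pattern)
    return ("=", column, pattern)


print(like_to_equal("name", "doris"))
print(like_to_equal("name", "dor%"))
```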
2f996a574f [Feat](nereids) nereids add alter view (#33970)
Nereids now supports the ALTER VIEW statement.
e.g. ALTER VIEW example_db.example_view
(
c1 COMMENT "column 1",
c2 COMMENT "column 2",
c3 COMMENT "column 3"
)
AS SELECT k1, k2, SUM(v1) FROM example_table
GROUP BY k1, k2
2024-04-25 12:01:44 +08:00
edff4137fe [fix](mtmv) Mv check name (#34016) 2024-04-25 12:01:44 +08:00
cc3decffa4 [bug](test) fix test case failed with fuzzy fold constant to false (#34052) 2024-04-24 19:42:08 +08:00
d5275c55b4 [bug](fold) fix fold date/datetime error as null (#33845)
The LocalDateTime/LocalDate value may be null, so it needs to be checked first.
If it is null, NullLiteral can be returned directly.
2024-04-24 19:41:42 +08:00
080c07ad87 [bug](random distribution) fix data loss and incorrect in random distribution table #33962 2024-04-24 17:13:50 +08:00
8d98c71079 [FIX]fix cidr func with const param (#33968) 2024-04-24 17:13:50 +08:00