Commit Graph

8594 Commits

Author SHA1 Message Date
646ba2cc88 [bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473)
Make rows_read correct so that the scheduler can use it correctly.
Use a single scanner if the query has a limit clause. Move this logic from the fragment context to the scan node.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-02-09 14:12:18 +08:00
21cdbec982 [fix](docs) fix some errors in docs (#16546)
Co-authored-by: hechao <hechao@selectdb.com>
2023-02-09 13:50:42 +08:00
338277b748 [doc](flink-connector) Update the flink connector docs to the latest (#14856) 2023-02-09 12:48:59 +08:00
d52fab6316 [typo](docs)modified some text errors (#16544)
Co-authored-by: wangtao <wangtao01@tianyancha.com>
2023-02-09 11:59:49 +08:00
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
531616b8ee [Fix](bucket)fix partition with no history data && AutoBucketUtilsTest (#16516)
fix partition with no history data && AutoBucketUtilsTest (#16515)
2023-02-09 10:17:25 +08:00
9f8753ffd2 [bugfix](vertical_compaction) fix base_compaction delete_sign handler (#16469)
In vertical base compaction, rows with the same key are filtered out in vertical_merge_iterator,
so we should skip these filtered rows when setting the agg flag of the delete sign.
For example, if the schema is a,b,delete_sign and the data is
1,1,1
1,1,0
1,1,0
2,2,1
2,2
and the block we get in VerticalBlockReader is
1,1,1
2,2,1
then we should set the agg flag at indexes 0 and 4 to true when handling the delete sign, so
we add a function continuous_agg_count to skip the rows with the same key that were filtered in
VerticalMergeIterator.
2023-02-09 10:13:41 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove the error-prone CooldownJob and use CooldownConfHandler to update a tablet's cooldown conf.
Also includes some cooldown bug fixes.
2023-02-09 09:12:55 +08:00
e6b0d94459 [enhancement][docs] add docs for the two newly added compaction methods (#16529) (#16530)
Co-authored-by: yixiutt <102007456+yixiutt@users.noreply.github.com>
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-02-09 09:07:33 +08:00
2d7a9c9c11 add the batch interval time of sink in spark connector doc (#16501) 2023-02-09 08:39:30 +08:00
d1c6b81140 [Bug](log) add some log to find out bug (#16518) 2023-02-08 21:23:02 +08:00
f0b0eedbc5 [fix](planner)group_concat lost order by info in second phase merge agg (#16479) 2023-02-08 20:48:52 +08:00
a512469537 [fix](planner) cannot process more than one subquery in disjunct (#16506)
Before this PR, Doris could not process SQL like the following:
```sql
CREATE TABLE `test_sq_dj1` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

CREATE TABLE `test_sq_dj2` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

insert into test_sq_dj1 values(1, 2, 3), (10, 20, 30), (100, 200, 300);
insert into test_sq_dj2 values(10, 20, 30);

-- core
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 < 10;

-- invalid slot
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c2 FROM test_sq_dj2) OR c1 < 10;
```

There are two problems:
1. Redundant sub-queries in one conjunct should be removed to avoid generating useless join nodes.
2. When one disjunct contains more than one sub-query, the conjunct containing that disjunct should be placed at the top node of the set of mark join nodes, and the mark slots should be propagated up to that top node.
2023-02-08 18:46:06 +08:00
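To illustrate fix 1 of a512469537, here is a minimal Python sketch of dropping redundant sub-query disjuncts before planning. It is a deliberate simplification: the real planner compares expression trees, whereas this hypothetical `dedupe_disjuncts` helper only compares predicate text after normalising whitespace.

```python
def dedupe_disjuncts(disjuncts):
    """Return disjuncts with textually identical duplicates removed, keeping order."""
    seen = set()
    result = []
    for pred in disjuncts:
        key = " ".join(pred.split())  # normalise whitespace only; no semantic comparison
        if key not in seen:
            seen.add(key)
            result.append(pred)
    return result


# Disjuncts of the "-- core" query from the commit message.
where = [
    "c1 IN (SELECT c1 FROM test_sq_dj2)",
    "c1 IN (SELECT c1 FROM test_sq_dj2)",
    "c1 < 10",
]
print(" OR ".join(dedupe_disjuncts(where)))  # c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 < 10
```

In the "invalid slot" query, `c1 IN (SELECT c2 FROM test_sq_dj2)` differs from the first disjunct, so it survives deduplication; that case is what fix 2 (placing the conjunct above the mark join nodes) addresses.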
bb334de00f [enhancement](load) Change transaction limit from global level to db level (#15830)
Add a per-database transaction size quota.

Co-authored-by: wuhangze <wuhangze@jd.com>
2023-02-08 18:04:26 +08:00
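A hypothetical sketch of what moving the limit to database level (#15830) means compared with a single global counter, assuming the quota caps the number of concurrently running load transactions per database. `DbTxnLimiter`, `TxnQuotaExceeded` and the per-database `overrides` map are illustrative names, not Doris classes or properties.

```python
from collections import defaultdict


class TxnQuotaExceeded(Exception):
    pass


class DbTxnLimiter:
    """Hypothetical per-database limit on running load transactions."""

    def __init__(self, default_quota, overrides=None):
        self.default_quota = default_quota
        self.overrides = dict(overrides or {})  # db name -> custom quota
        self.running = defaultdict(int)         # db name -> running txn count

    def quota(self, db):
        return self.overrides.get(db, self.default_quota)

    def begin(self, db):
        # Each database is checked against its own quota, so a busy database
        # cannot exhaust the budget of the others.
        if self.running[db] >= self.quota(db):
            raise TxnQuotaExceeded(f"too many running transactions in db {db}")
        self.running[db] += 1

    def finish(self, db):
        if self.running[db] > 0:
            self.running[db] -= 1
```

With a global counter, two transactions in `db1` would block a new one in `db2` when the limit is 2; with the per-database counters above, `db2` still has its full quota.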
f71fc3291f [Bug](fix) right anti join error result when batch size is low (#16510) 2023-02-08 17:26:19 +08:00
666f7096f2 [Fix](multi catalog)(planner) Fix external table statistic collection bug (#16486)
Add the index id to the column statistic id, and refresh the statistics cache after analyze.
2023-02-08 16:51:30 +08:00
b06e6b25c9 [improvement](fuzzy) print fuzzy session variable in FE audit log (#16493)
* [improvement](fuzzy) print fuzzy session variable in FE audit log
2023-02-08 16:38:04 +08:00
d956cb13af [Bug](point query) Reusable in PointQueryExecutor should call init before add to LookupCache (#16489)
Otherwise, under highly concurrent queries, other threads may use _block_pool before Reusable::init has finished.
2023-02-08 16:05:59 +08:00
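The ordering rule from d956cb13af, sketched in Python: initialise the object completely, then publish it to the shared cache, so another thread can never fetch a half-initialised entry. `Reusable` and `LookupCache` borrow the names from the commit message, but their bodies here are hypothetical stand-ins, not the Doris C++ classes.

```python
import threading


class Reusable:
    def __init__(self):
        self.block_pool = None  # stands in for _block_pool
        self.ready = False

    def init(self):
        self.block_pool = []
        self.ready = True


class LookupCache:
    """Thread-safe map shared by all query threads (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def add(self, key, value):
        with self._lock:
            self._entries[key] = value

    def get(self, key):
        with self._lock:
            return self._entries.get(key)


def create_and_publish(cache, key):
    reusable = Reusable()
    reusable.init()            # finish initialisation first ...
    cache.add(key, reusable)   # ... then make the object visible to other threads
    return reusable
```

Swapping the two calls in `create_and_publish` reproduces the race: a concurrent `get` could return an object whose `block_pool` is still `None`.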
e11437d1fe [fix](planner) npe in RewriteBinaryPredicatesRule (#16401)
RewriteBinaryPredicatesRule rewrites expressions like
`cast(A decimal) > decimal` to `A > some_other_bigint`
in order to:
1. push down the rewritten predicate
2. avoid converting column A to decimal

We get the datatype of `A` by `expr0.getSrcSlotRef().getColumn().getType()`.
However, when A is the result of a function in a sub-query, it has no source column to take the type from, so this rule is not applicable.
For example:
```sql
select * 
from (
       select TIMESTAMPDIFF(MINUTE,startTime,endTime) AS timediff 
        from CNC_SliceSate) T  
where timediff > 5.0;
```
In this case, we cannot push the predicate down to OlapScan(CNC_SliceSate) to save effort.
2023-02-08 15:57:35 +08:00
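The arithmetic behind the rewrite in e11437d1fe, as a Python sketch assuming `A` is an integer column and the constant is a decimal: for integer `A`, `A > 5.5` holds exactly when `A > 5`, so the decimal bound can be replaced by its floor or ceiling depending on the operator. The function name and the `source_column_type` argument are hypothetical; passing `None` models the sub-query case where `A` has no source column, and the sketch bails out instead of rewriting.

```python
import math
from decimal import Decimal


def rewrite_cast_compare(op, const, source_column_type):
    """Rewrite `CAST(A AS DECIMAL) op const` into `A op' k` for an integer column A.

    Returns (new_op, k), or None when the rewrite does not apply.
    """
    if source_column_type is None:
        # A is not backed by a table column (e.g. a function result from a
        # sub-query): skip the rewrite instead of dereferencing a missing column.
        return None
    if op == ">":
        return (">", math.floor(const))   # A > 5.5  <=>  A > 5
    if op == ">=":
        return (">=", math.ceil(const))   # A >= 5.5 <=>  A >= 6
    if op == "<":
        return ("<", math.ceil(const))    # A < 5.5  <=>  A < 6
    if op == "<=":
        return ("<=", math.floor(const))  # A <= 5.5 <=>  A <= 5
    return None  # other operators keep the original expression


print(rewrite_cast_compare(">", Decimal("5.0"), "BIGINT"))  # ('>', 5)
```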
f6a20f844b [fix](hashjoin) join produce blocks with rows larger than batch size: handle join with other conjuncts (#16402) 2023-02-08 14:26:35 +08:00
2883f67042 [fix](iceberg) update iceberg docs and add credential properties (#16429)
Update iceberg docs
Add new s3 credential and properties
2023-02-08 13:53:01 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
98c741d664 [fix](Nereids): FilterOrSelf shouldn't And all predicates.. (#16491) 2023-02-08 12:42:22 +08:00
2fd7833a12 [fix](doc): fix typo of tpch.md (#16229) 2023-02-08 12:01:21 +08:00
583001bd92 [Bug](share hash table) Support shared hash table on Nereids (#16474) 2023-02-08 11:51:27 +08:00
713c11b42b [typo](docs) Fix some errors in the description (#16452) 2023-02-08 11:47:39 +08:00
2350ef1a64 Modify thrift_rpc_timeout_ms default value documentation (#16464) 2023-02-08 11:35:38 +08:00
afdaf2d70e [Doc](Jdbc Catalog) JDBC Catalog supports Insert operation (#16454) 2023-02-08 10:59:20 +08:00
254790c564 [fix](nereids) FE nereids use DateV2Literal instead of 'cast datev2' (#16386)
BE already supports DateV2Literal, so remove the FE code that converts DateV2Literal to a cast to datev2.
2023-02-08 10:51:35 +08:00
81dbed70c2 [fix](Nereids) back off on tpch p1 (#16478)
Adjusting nullability on an empty set should be applied after sub-queries are unnested.
Some functions should propagate nullability when their arguments are datev2 or datetimev2.
Add back the TPC-H sf0.1 Nereids regression test.
2023-02-08 10:43:13 +08:00
289a4b2ea4 [fix](func) fix truncate float type result error (#16468)
When the argument of the truncate function is of float type, it can match both truncate(DECIMALV3) and truncate(DOUBLE). If it matches truncate(DECIMALV3), precision is lost when converting the float to DECIMALV3(38, 0).

For now, I changed it to match truncate(DOUBLE). We may still need to solve the precision loss when converting float to DECIMALV3.
2023-02-08 08:57:43 +08:00
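A Python sketch of why the DECIMALV3(38, 0) match loses precision in 289a4b2ea4: scale 0 keeps no fractional digits, so they are gone before truncate ever runs. Both helpers are hypothetical models, not Doris functions, and the sketch assumes the conversion drops the fraction (for the input used here, rounding would give the same value).

```python
import math
from decimal import Decimal, ROUND_DOWN


def truncate_as_double(x, d):
    """truncate(DOUBLE, d): cut x to d decimal places in floating point."""
    factor = 10 ** d
    return math.trunc(x * factor) / factor


def truncate_via_decimal_scale0(x, d):
    """Model of the problematic path: x is first converted to a scale-0 decimal
    (like DECIMALV3(38, 0)), so its fractional digits are already gone."""
    as_scale0 = Decimal(x).quantize(Decimal(1), rounding=ROUND_DOWN)
    return float(as_scale0.quantize(Decimal(1).scaleb(-d), rounding=ROUND_DOWN))


print(truncate_as_double(1.2345, 2))           # 1.23
print(truncate_via_decimal_scale0(1.2345, 2))  # 1.0
```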
cf18de14b5 [fix](writer) add _is_closed state to DeltaWriter and avoid write/close core after close (#16453) 2023-02-07 22:40:26 +08:00
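A minimal Python sketch of the `_is_closed` guard described in cf18de14b5, assuming that a write after close should fail with an error and a repeated close should be a harmless no-op. Apart from `_is_closed`, the class body and the exception name are hypothetical.

```python
class WriterClosedError(Exception):
    pass


class DeltaWriter:
    """Hypothetical writer whose state is guarded by an _is_closed flag."""

    def __init__(self):
        self._is_closed = False
        self.rows = []

    def write(self, row):
        if self._is_closed:
            # Reject late writes instead of touching already released resources.
            raise WriterClosedError("write after close")
        self.rows.append(row)

    def close(self):
        if self._is_closed:
            return False  # second close is a no-op
        self._is_closed = True
        return True
```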
91325e5ca3 [fix](pipeline) incorrect result when disabling sharing hash table (#16476) 2023-02-07 21:25:32 +08:00
a4c28e6efa [Fix](Nereids) runtime filter cannot generate when expression is cast. (#16120) 2023-02-07 20:28:07 +08:00
f90d844a53 [improvement](compaction) enable compaction in TABLET_NOTREADY (#16470)
If an alter task is queued, compaction is disabled, which may lead to too many versions.
Keep the last 10 versions in the new tablet so that the base tablet's max version is
not merged, and then we can copy data from the base tablet to the new tablet.
2023-02-07 19:58:23 +08:00
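A hypothetical Python sketch of the version selection described in f90d844a53: compaction on a `TABLET_NOTREADY` tablet may run, but it leaves the newest 10 versions alone. The helper name and the list-of-versions representation are assumptions; only the state name and the number 10 come from the commit message.

```python
KEEP_LAST_VERSIONS = 10  # number taken from the commit message


def pick_compaction_input(versions, tablet_state):
    """Return the versions (oldest first) that compaction is allowed to merge."""
    if tablet_state == "TABLET_NOTREADY":
        # Never merge the newest KEEP_LAST_VERSIONS versions while the alter
        # task may still need them to copy data from the base tablet.
        return list(versions[:-KEEP_LAST_VERSIONS])
    return list(versions)


print(pick_compaction_input(list(range(1, 16)), "TABLET_NOTREADY"))  # [1, 2, 3, 4, 5]
```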
1d0fdff98a [Bug](sort) disable 2phase read for sort by expressions exclude slotref (#16460)
```sql
create table tbl1 (k1 varchar(100), k2 string) distributed by hash(k1) buckets 1 properties("replication_num" = "1");

insert into tbl1 values(1, "alice");

select cast(k1 as INT) as id from tbl1 order by id limit 2;
```

The above query passes `checkEnableTwoPhaseRead` because the ORDER BY element `id` is a SlotRef, but it actually refers to a function call expression (`cast(k1 as INT)`).
2023-02-07 19:42:54 +08:00
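A hypothetical Python sketch of the stricter check for 1d0fdff98a: resolve ORDER BY aliases to the expressions they name before testing for a plain slot reference. The `Expr` shape, `resolve` and `can_use_two_phase_read` are illustrative stand-ins for `checkEnableTwoPhaseRead`, not its actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expr:
    kind: str                   # "slot_ref", "cast", "function_call", ...
    name: Optional[str] = None  # column or alias name for slot refs
    children: list = field(default_factory=list)


def resolve(expr, aliases):
    """Replace a reference to a SELECT-list alias with the aliased expression."""
    if expr.kind == "slot_ref" and expr.name in aliases:
        return aliases[expr.name]
    return expr


def can_use_two_phase_read(order_by, aliases):
    """Allow two-phase read only if every ORDER BY element is a plain column."""
    return all(resolve(e, aliases).kind == "slot_ref" for e in order_by)


k1 = Expr("slot_ref", "k1")
aliases = {"id": Expr("cast", children=[k1])}  # select cast(k1 as INT) as id
print(can_use_two_phase_read([Expr("slot_ref", "id")], aliases))  # False
```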
9114896178 [DecimalV3](opt) opt the function of decimalv3 to_string logic (#16427) 2023-02-07 13:28:07 +08:00
796d51ae2e [enhance](fuzzy)set rewriteOrToInPredicateThreshold=2/10000 in fuzzy mode (#16456)
* set rewriteOrToInPredicateThreshold=2/10000 in fuzzy mode

* fmt
2023-02-07 12:45:27 +08:00
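To show what the fuzzed threshold in 796d51ae2e toggles, here is a Python sketch of an OR-to-IN rewrite, assuming the rewrite fires once the number of equality disjuncts on one column reaches the threshold. With 2, almost every `OR` of equalities becomes an `IN`; with 10000, practically none do, so fuzzing between the two exercises both code paths. The function is hypothetical and works on strings, whereas the planner rewrites expression trees.

```python
def rewrite_or_to_in(column, values, threshold):
    """Turn `col = v1 OR col = v2 ...` into `col IN (v1, v2, ...)` when the
    number of equality disjuncts reaches `threshold`; otherwise keep the ORs."""
    if len(values) >= threshold:
        return f"{column} IN ({', '.join(map(str, values))})"
    return " OR ".join(f"{column} = {v}" for v in values)


print(rewrite_or_to_in("k1", [1, 2], threshold=2))      # k1 IN (1, 2)
print(rewrite_or_to_in("k1", [1, 2], threshold=10000))  # k1 = 1 OR k1 = 2
```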
d390e63a03 [enhancement](stream receiver) make stream receiver exception safe (#16412)
Make the stream receiver exception safe.
Change get_block(block**) to get_block(block*, bool* eos) to unify the stream semantics.
2023-02-07 12:44:20 +08:00
6fdd35a6f2 [enhancement](mpp process) remove unused method and make report process more clear (#16441)
Both update_status and open_vectorized_internal call send_report and stop the report thread. Move the update_status code into the open method and remove the unnecessary send_report and stop_report_thread calls.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-07 12:28:55 +08:00
bed1ab7c19 [Feature](Nereids) Add hint to enable pre-aggregation when scan OLAP table. (#15614)
This PR adds support for the pre-aggregation hint. Users can use /*+PREAGGOPEN*/ to enable pre-aggregation for an OLAP table.
For example:
Let's say we have an aggregate-keys table t (k1 int, k2 int, v1 int sum, v2 int sum). Pre-aggregation can be enabled with a hint in the query: select k1, v1 from t /*+PREAGGOPEN*/.
2023-02-07 11:59:10 +08:00
27216dc7e0 [improvement](multi-catalog) push down all predicates into rowgroup/page filtering for ParquetReader (#16388)
Two improvements:
1. Refactor row group & page filtering in `ParquetReader`, and use the overloaded operators of Doris native C++ types for comparisons.
2. Support decimal/decimal v3/date/datev2/datetime/datetimev2
2023-02-07 11:32:57 +08:00
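Row group and page filtering (27216dc7e0) rests on min/max statistics: a row group can be skipped when its statistics prove that no row satisfies the predicate. The Python sketch below shows that test for a single `column op value` predicate; `row_group_may_match` is a hypothetical name, and Python's built-in comparisons on `Decimal` and `date` stand in for the overloaded operators of Doris native C++ types.

```python
from datetime import date
from decimal import Decimal


def row_group_may_match(op, value, min_v, max_v):
    """Return False only when [min_v, max_v] proves no row satisfies `col op value`."""
    if op == "=":
        return min_v <= value <= max_v
    if op == "<":
        return min_v < value
    if op == "<=":
        return min_v <= value
    if op == ">":
        return max_v > value
    if op == ">=":
        return max_v >= value
    return True  # unknown operator: keep the row group to stay correct


# A DECIMAL row group whose max is 10.5 cannot contain a value > 10.5.
print(row_group_may_match(">", Decimal("10.5"), Decimal("1.0"), Decimal("10.5")))  # False
# A DATE row group covering 2023 may contain 2023-02-07.
print(row_group_may_match("=", date(2023, 2, 7), date(2023, 1, 1), date(2023, 12, 31)))  # True
```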
0b8c6315fb [fix](broker load) Fix hll_hash(null) in broker load report incorrect Exception (#16293)
Co-authored-by: wuhangze <wuhangze@jd.com>
2023-02-07 11:32:20 +08:00
91229bb87d [Bug](mark join) Fix mark join with other conjuncts (#16435) 2023-02-07 09:31:41 +08:00
a13beca0de [Fix](load)Use lower case for load column names. #16422
Column names in stream load and broker load were case sensitive; make them case insensitive. This is consistent with queries, because column names in query SQL are case insensitive.
2023-02-07 09:18:37 +08:00
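A hypothetical Python sketch of case-insensitive column resolution for a13beca0de: user-supplied names are matched against the schema after lower-casing both sides, as a query would. The helper name and error message are assumptions.

```python
def resolve_load_columns(load_columns, schema_columns):
    """Map load column names to schema column names, ignoring case."""
    by_lower = {name.lower(): name for name in schema_columns}
    resolved = []
    for name in load_columns:
        key = name.lower()
        if key not in by_lower:
            raise ValueError(f"unknown column in load: {name}")
        resolved.append(by_lower[key])
    return resolved


print(resolve_load_columns(["K1", "V1"], ["k1", "v1"]))  # ['k1', 'v1']
```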
dcbcec0775 [regression](fuzzy)fuzzy enable_fold_constant_by_be (#16448)
* [fuzzy](test) fuzzy some session variables stably according to pull_request_id

* fuzzy enable_fold_constant_by_be

---------

Co-authored-by: stephen <hello_stephen@@qq.com>
2023-02-07 09:17:50 +08:00
3334e3f393 [fix](restore) do not set default replication_allocation when restore with property reserve_replica = true (#15562)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-02-06 22:38:03 +08:00
0744aeb201 [fix](docs) fix the 404 bad link of website doc (#16284) 2023-02-06 18:56:07 +08:00
36a5e0a2a9 [bugfix](array) fix element revert on error in DataTypeArray::from_string (#16434)
* fix array from_string element revert on error

* add testcase
2023-02-06 18:27:36 +08:00
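A Python sketch of the rollback idea in 36a5e0a2a9: if parsing fails midway through an array literal, the elements already appended to the flattened nested column are removed, so the column is left exactly as it was before the call. `ArrayColumn` only supports integer elements and is a hypothetical model, not `DataTypeArray::from_string`.

```python
class ArrayColumn:
    """Hypothetical flattened array column: nested values plus end offsets."""

    def __init__(self):
        self.data = []     # all elements of all arrays, back to back
        self.offsets = []  # end position of each array in `data`

    def insert_from_string(self, text):
        body = text.strip()
        if not (body.startswith("[") and body.endswith("]")):
            raise ValueError(f"not an array literal: {text}")
        inner = body[1:-1]
        items = inner.split(",") if inner.strip() else []
        start = len(self.data)
        try:
            for item in items:
                self.data.append(int(item))
        except ValueError:
            del self.data[start:]  # revert elements appended before the error
            raise
        self.offsets.append(len(self.data))
```

Without the `del`, a failed parse of `[3, x, 4]` would leave `3` in `data` with no matching offset, corrupting every array inserted afterwards.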
2bee26b05a [fix](merge-on-write) fix that the query result has duplicate keys (#16336)
* [fix](merge-on-write) fix that the query result has duplicate keys

* add ut
2023-02-06 17:09:53 +08:00