Commit Graph

8171 Commits

Author SHA1 Message Date
57098bf669 [fix](thirdparty) fix bug for build_thirdparty (#15910) 2023-01-14 21:02:59 +08:00
98c74f9ab8 [improvement](signal) add tid during core dump; the tid is equal to the tid in be.INFO (#15893)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-14 18:40:02 +08:00
c88c6b3c41 [chore](build) Fix linkage errors on macOS (arm64) (#15922)
macOS on Apple Silicon chips doesn't support AVX2 instructions, so we should build CLucene without AVX2 support.
2023-01-14 18:37:56 +08:00
84d6938a73 [Bug](pipeline) Fix BE crash caused by pipeline (#15890)
* [Bug](pipeline) Fix BE crash caused by pipeline

* update
2023-01-14 18:37:19 +08:00
0a57f12578 [Bug](datev2) Fix bugs for datev2 (#15860)
These bugs were found when running the regression tests with enable_date_conversion on.
2023-01-14 18:36:36 +08:00
c4475a8dbc [Enhancement](jdbc scanner) add profile for jdbc scanner (#15914) 2023-01-14 10:28:59 +08:00
313e14d220 [Bugfix](ROLLUP) fix the coredump when adding a rollup by linked schema change (#15654)
Because the rollup has the same keys in the same order, BE will do a linked schema change: the base tablet's segments are linked to the new rollup tablet. But the unique ids in the base tablet start from 0, and so do those in the rollup tablet. In this case, unique id 4 is column 'city' in the base table but 'cost' in the rollup tablet. BE will decode the varchar page as a bigint page and core dump. It needs to be rejected.

I think that if a rollup is added by linked schema change, the rollup is redundant. It brings no additional benefit and wastes storage space, so it needs to be rejected.
2023-01-14 10:20:07 +08:00
2810029d24 [fix](multi-catalog) fix bug that replay init catalog may happen after catalog is dropped (#15919) 2023-01-14 09:41:37 +08:00
cedbed67be [feature-wip](MTMV) Support table aliases when creating a materialized view with multiple tables (#15849)
## Use Case

mysql> CREATE TABLE t_user (
    ->   event_day DATE,
    ->   id bigint,
    ->   username varchar(20)
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.07 sec)

mysql> CREATE TABLE t_user_pv(
    ->   event_day DATE,
    ->   id bigint,
    ->   pv bigint
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE MATERIALIZED VIEW mv
    -> BUILD IMMEDIATE REFRESH COMPLETE
    -> START WITH "2022-10-27 19:35:00"
    -> NEXT 1 SECOND
    -> KEY (username)
    -> DISTRIBUTED BY HASH(username) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1')
    -> AS SELECT t1.username ,t2.pv FROM t_user t1 LEFT JOIN t_user_pv t2 on t1.id = t2.id;
Query OK, 0 rows affected (0.10 sec)

mysql> DESC mv;
+----------+-------------+------+-------+---------+-------+
| Field    | Type        | Null | Key   | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| username | VARCHAR(20) | Yes  | true  | NULL    |       |
| pv       | BIGINT      | Yes  | false | NULL    | NONE  |
+----------+-------------+------+-------+---------+-------+
2 rows in set (0.02 sec)
2023-01-14 01:29:32 +08:00
2580c88c1b [feature](multi-catalog) support oracle jdbc catalog (#15862) 2023-01-14 00:01:33 +08:00
d8990522fb [conf](compaction) enable vertical_compaction ordered_data_compaction (#14945) 2023-01-13 23:12:42 +08:00
a788623ee2 Fix incorrect result when executing a WHERE query on the doris largeint type (#15034) 2023-01-13 23:12:02 +08:00
336148384b docs: fix small error (#15158) 2023-01-13 23:11:06 +08:00
bddf20dd72 [fix](docs) fix the command order of broker load doc. (#15861)
Co-authored-by: smallhibiscus <844981280>
2023-01-13 22:55:56 +08:00
692e1eb535 [typo](doc)rename column add 1.2 label (#15892) 2023-01-13 22:54:59 +08:00
38de7ce6c9 [typo](doc) add solutions to solve the problem of fetching stream load record slowly (#15911) 2023-01-13 22:53:00 +08:00
fbe68e7ec8 [regression-test](MTMV) Make the case test_create_mtmv more robust (addendum) (#15909) 2023-01-13 22:51:47 +08:00
ecb5aea182 [Feature-WIP](inverted index) inverted index writer's implementation (#15821) 2023-01-13 21:30:44 +08:00
bd2280b4ce [fix](planner) move join reorder to the single node planner (#15817)
Reordering in the analyze phase could produce a stmt whose corresponding SQL cannot be analyzed correctly. This may cause an analyze exception during stmt rewrite, since the rewriter resets and reanalyzes the rewritten stmt.
2023-01-13 19:42:12 +08:00
514de605b6 [Bug](predicate) add double predicate creator (#15762)
Add a double predicate creator, analogous to the integer predicate creator.
2023-01-13 18:34:09 +08:00
049f8ad2f9 [Bug](sort)fix merge sorter might div zero when block bytes less than block rows (#15859)
If a block's bytes are fewer than the block's rows, then avg_size_per_row would be zero, which would end up dividing by zero in the following logic.
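A minimal sketch of the failure mode, not the actual Doris code: with integer division, an average row size computed as bytes divided by rows truncates to zero whenever the block has fewer bytes than rows, and any later division by that average then divides by zero. The helper names `avg_size_per_row` and `rows_that_fit` and the clamp-to-one guard are assumptions for illustration; the real fix in the PR may differ.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not Doris code): average row size via integer division.
// The result truncates to 0 whenever block_bytes < block_rows.
std::size_t avg_size_per_row(std::size_t block_bytes, std::size_t block_rows) {
    return block_rows == 0 ? 0 : block_bytes / block_rows;
}

// Hypothetical guarded use: clamp the divisor to at least 1 before dividing,
// so a truncated average can never cause a division by zero.
std::size_t rows_that_fit(std::size_t buffer_bytes, std::size_t block_bytes,
                          std::size_t block_rows) {
    const std::size_t avg =
            std::max<std::size_t>(1, avg_size_per_row(block_bytes, block_rows));
    return buffer_bytes / avg;
}
```

For example, a block of 3 bytes holding 8 rows yields an average of 0, so an unguarded `buffer_bytes / avg` would be undefined behavior, while the clamped version simply treats each row as 1 byte.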
2023-01-13 18:33:40 +08:00
e979cc444a [improvement](multi-catalog) support hive 1.x (#15886)
The interface of the hive metastore changes from version to version.
Currently, Doris uses hive 2.3.7 as the hms client version.
When it is used to connect to hive 1.x, some interfaces such as get_table_req do not exist
in hive 1.x, so we can't get metadata from hive 1.x.

In this PR, I copied the HiveMetastoreClient from the hive 2.3.7 release and modified some of the interface
implementations, so that it will use the old interfaces to connect to hive 1.x.

When creating an hms catalog, you can specify the hive version, e.g.:

CREATE CATALOG `hive` PROPERTIES (
  "hive.metastore.uris" = "thrift://127.0.0.1:9083",
  "type" = "hms",
  "hive.version" = "1.1"
);
If hive.version is not specified, Doris will use the hive 2.3.x compatible interface to visit hms.
2023-01-13 18:32:12 +08:00
1489e3cfbf [Fix](file system) Make the constructor of XxxFileSystem a private method (#15889)
Since FileSystem inherits from std::enable_shared_from_this, it is dangerous to create a raw pointer to a FileSystem.
To avoid this, make the constructor of XxxFileSystem private and use the static method create(...) to get a new FileSystem object.
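The pattern described above can be sketched as follows. This is an illustration under assumptions, not the Doris implementation: `DemoFileSystem`, `self()`, and `root()` are hypothetical names, and only the private constructor plus static `create(...)` comes from the commit message. Calling `shared_from_this()` on an object that is not owned by a `std::shared_ptr` throws `std::bad_weak_ptr` (since C++17), which is why forcing construction through a factory is safer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an XxxFileSystem class; all names are illustrative.
class DemoFileSystem : public std::enable_shared_from_this<DemoFileSystem> {
public:
    // The only way to obtain an instance, so every object is owned by a shared_ptr.
    static std::shared_ptr<DemoFileSystem> create(std::string root) {
        // std::make_shared cannot call a private constructor, so use new here.
        return std::shared_ptr<DemoFileSystem>(new DemoFileSystem(std::move(root)));
    }

    // Safe because every instance is already managed by a shared_ptr.
    std::shared_ptr<DemoFileSystem> self() { return shared_from_this(); }

    const std::string& root() const { return _root; }

private:
    // Private: callers cannot create stack objects or raw-pointer instances.
    explicit DemoFileSystem(std::string root) : _root(std::move(root)) {}

    std::string _root;
};
```

With this shape, `auto fs = DemoFileSystem::create("/tmp");` compiles, while `DemoFileSystem fs("/tmp");` and `new DemoFileSystem("/tmp")` outside the class are rejected at compile time.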
2023-01-13 15:32:16 +08:00
a8dacfbfd9 [opt](planner) return bigint literal when cast date literal to bigint type (#15613) 2023-01-13 12:58:04 +08:00
c1963e799a [fix](nereids)upgrade signature datatype bug (#15867)
In ComputeSignatureHelper.upgradeDateOrDateTimeToV2(),
we upgrade the return data type but forget to upgrade the arguments' data types.

The same problem exists in upgradeDecimalV2ToV3().
2023-01-13 12:54:25 +08:00
67378a2dc3 [fix](nereids) fix bug in SequenceFunction legality check (#15812)
1. fix bug in sequence_match function
2. do type promotion instead of explicit cast for
  - varcharLiteral -> stringLiteral
  - charLiteral->stringLiteral
2023-01-13 12:09:53 +08:00
34bb9cd5d3 [fix](parquet-reader) fix coredump when loading datetime data into doris from parquet (#15794)
`date_time_v2` checks the scale when a datetimev2 is constructed:
```
LOG(FATAL) << fmt::format("Scale {} is out of bounds", scale);
```

This [PR](https://github.com/apache/doris/pull/15510) fixed this issue, but the parquet reader does not use the constructor to create the `TypeDescriptor`, leaving `scale = -1` when reading datetimev2 data.
2023-01-13 11:51:11 +08:00
5e59954531 [typo](docs)Update ARRAY.md (#15781)
Added DATEV2 and DATETIMEV2 in array element types.
2023-01-13 11:45:50 +08:00
b1fb1277dd [fix](bitmap) fix bitmap iterator comparison error (#15779)
Fix the bug that bitmap.begin() == bitmap.end() is always true when the bitmap contains a single value.
2023-01-13 11:37:07 +08:00
9468711f9f [Bug](join) fix incorrect result of null aware left anti join (#15841) 2023-01-13 10:18:05 +08:00
688a0bb96a [feature](multi-catalog) support clickhouse jdbc catalog (#15780) 2023-01-13 10:07:22 +08:00
16862d9b43 [refactor](remove unused code) remove buffer pool and disk io mgr (#15853)
* [refactor](remove buffer pool and disk io mgr) remove unused code


Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-13 09:42:58 +08:00
bae29157aa [fix](olap) dictionary cannot be sorted after inserting some null values (#15829) 2023-01-13 09:28:55 +08:00
730571e386 [fix](sort spill) fix bug of failed to create spilled file (#15864)
Also increase buffered block size when it has started to spill.
2023-01-13 09:23:26 +08:00
174e5e601f [refactor](rpc fn) decouple vectorized remote function from row-based one (#15871) 2023-01-13 09:21:33 +08:00
a7af869bfd [opt](Nereids) group_concat to support more cases (#15815)
Enhance group_concat to support group_concat(cast(slot), ...) and to support calling it with one argument.
2023-01-13 00:41:13 +08:00
9d41994c17 [opt](Nereids) throw exception when aliasedQuery has no alias (#15854) 2023-01-13 00:35:16 +08:00
14e3879c4b [regression-test](MTMV) Make the case test_create_mtmv more robust (#15866)
## Proposed changes

1. Check the state of MTMV task as the loop condition.
2. Check the data in materialized view.

## Problem summary

There are some minor issues with #15546.
1. The case used a retry strategy as the loop condition, which may not be stable while the host machine is busy.
2. The case didn't check the final data in materialized view.
2023-01-13 00:13:24 +08:00
be110ffaf6 [thirdparty](clucene) add clucene deps for doris inverted index (#15807)
As part of the Inverted Index DSIP steps, we'd like to contribute our inverted index implementations step by step.
First of all, we need to introduce clucene to the doris thirdparty libs, because the inverted index implementations are based on
the Lucene API and index file format. We also add our own features and performance improvements based on clucene, so we
need to maintain the repo ourselves.
2023-01-12 21:59:19 +08:00
d23646793c [fix](nereids) binding group by key on agg.output if output is slot (#15623)
case 1
`select count(1) from t1 join t2 on t1.a = t2.a group by a`
`group by a` is ambiguous

case 2
`select t1.a from t1 join t2 on t1.a = t2.a group by a`
`group by a` is bound on t1.a
2023-01-12 16:34:56 +08:00
0fbdf8e3e1 [Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772) 2023-01-12 15:08:21 +08:00
ef0e0cf68d [enhancement](load) refine the reduce memory policy when process memory is nearly full (#15685)
If process memory is almost full but data loads don't consume more than 5% (50% * 10%) of total memory, we don't need to reduce the memory of load jobs.
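The threshold arithmetic above can be sketched as a small predicate. This is an illustration only, not the Doris implementation: the function name, the parameter names, and the use of a single 5% constant (standing in for the "50% * 10%" product, whose two factors the message does not explain) are all assumptions.

```cpp
#include <cstdint>

// Assumption: 5% of total memory, i.e. the "50% * 10%" from the commit message.
constexpr std::int64_t kLoadThresholdPercent = 5;

// Hypothetical predicate (not Doris code): when process memory is nearly full,
// only reduce load memory if loads consume more than 5% of total memory.
bool should_reduce_load_memory(std::int64_t total_mem_bytes, std::int64_t load_mem_bytes) {
    const std::int64_t threshold = total_mem_bytes / 100 * kLoadThresholdPercent;
    return load_mem_bytes > threshold;
}
```

With a total of 1000 bytes, the threshold is 50 bytes: loads using 40 bytes are left alone, while loads using 60 bytes become candidates for memory reduction.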
2023-01-12 14:43:33 +08:00
e48d715a3b [typo](doc) add hive catalog faq related to kerberos and hdfs (#15826) 2023-01-12 14:21:22 +08:00
7441b4dc96 [Feature](function) Support width_bucket function (#14396) 2023-01-12 13:59:21 +08:00
8c47c57264 [regression-test](array) fix abnormal test on function array_intersect (#15848) 2023-01-12 13:57:11 +08:00
b893c0efd8 [typo](doc)any_value add 1.2 label #15827 2023-01-12 12:16:29 +08:00
92dd7c442a [enhancement](unique key) disable concurrent flush memtable for unique key (#15802) 2023-01-12 12:10:50 +08:00
39697bb83e [fix](Nereids) make the type of the first parameter of window_funnel integerLike (#15810) 2023-01-12 11:53:28 +08:00
b86e781727 [testcase](index)change wait timeout from 1m to 2m in index_meta testcase #15838 2023-01-12 11:27:04 +08:00
ea0ef0d880 [fix](session-variable) repeat_max_num should be forwarded (#15840)
repeat_max_num should be forwarded to master; otherwise a stmt like:
insert into tbl values(repeat("a", 1000)) will not be affected by this session variable.
2023-01-12 10:51:35 +08:00