
9089 Commits

SHA1 Message Date
3d0beec01d [fix](orc) fix heap-use-after-free and potential memory leak of orc reader (#17431)
fix heap-use-after-free
The OrcReader has an internal FileInputStream. If the file is empty, the memory of the FileInputStream leaks.
In addition, the FileInputStream holds a Statistics instance. If the ORC reader fails to initialize, the FileInputStream may be deleted, but the Statistics instance may still be used when the ORC reader is closed, causing a heap-use-after-free error.

Potential memory leak
When the file scan node initializes a file scanner and the scanner's prepare step fails, the memory of the file scanner leaks.
2023-03-06 08:42:35 +08:00
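Both ORC problems above are ownership problems. A minimal sketch of one way to express the fixed ownership, assuming hypothetical stand-in types (StatsSketch, InputStreamSketch, OrcReaderSketch) rather than the real ORC/Doris classes: the stream is held by a std::unique_ptr so an early return on an empty file frees it, and the statistics are co-owned by the reader so close() stays safe after a failed init().

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins; not the real ORC or Doris classes.
struct StatsSketch {
    long bytes_read = 0;
};

class InputStreamSketch {
public:
    explicit InputStreamSketch(std::shared_ptr<StatsSketch> stats) : stats_(std::move(stats)) {}
    void read(long n) {
        assert(n >= 0);
        stats_->bytes_read += n;
    }

private:
    std::shared_ptr<StatsSketch> stats_;  // shared: the stats may outlive the stream
};

class OrcReaderSketch {
public:
    // Returns false on failure. The stream is owned by a unique_ptr,
    // so it is freed on every early return instead of leaking.
    bool init(long file_size) {
        auto stream = std::make_unique<InputStreamSketch>(stats_);
        if (file_size == 0) {
            return false;  // empty file: the stream is released here
        }
        stream->read(file_size);
        stream_ = std::move(stream);
        return true;
    }

    // Safe even after a failed init(): the reader co-owns the stats,
    // so they are never read after being freed.
    long close() {
        stream_.reset();
        return stats_->bytes_read;
    }

private:
    std::shared_ptr<StatsSketch> stats_ = std::make_shared<StatsSketch>();
    std::unique_ptr<InputStreamSketch> stream_;
};
```

With an empty file, init() returns false and close() still returns 0 without touching freed memory.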
0801883604 [fix](merge-on-write) fix that delete bitmap is not calculated correctly when clone tablet (#17334) 2023-03-05 22:04:28 +08:00
5190a496ac [fix](rebalance) fix that the clone operation is not performed due to incorrect condition judgment (#17381) 2023-03-05 21:58:33 +08:00
d8a231f340 [Improvement](auth)(step-2) add ranger authorizer for hms catalog (#17424) 2023-03-05 21:50:44 +08:00
d08b231073 [fix](segcompaction) core when doing segcompaction for cancelling load(#16731) (#17432)
Segcompaction runs asynchronously, in parallel with the load job. If the load job is
being cancelled, its memory structures are destroyed, which crashes the running
segcompaction. This commit waits for segcompaction to finish before destruction.
2023-03-05 21:24:32 +08:00
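The "wait before destruction" fix can be sketched with a counter and a condition variable. This is a minimal illustration under assumptions: SegWriterSketch is a hypothetical class standing in for whatever owns the segment data, and the background task is a plain std::thread; it is not the Doris rowset writer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical writer that owns the segment data a background compaction reads.
class SegWriterSketch {
public:
    // Destruction (e.g. on load cancellation) first waits for the async task,
    // so the task never reads freed memory.
    ~SegWriterSketch() {
        wait_segcompaction_finished();
        if (worker_.joinable()) worker_.join();
    }

    void submit_segcompaction() {
        if (worker_.joinable()) worker_.join();  // one task at a time in this sketch
        {
            std::lock_guard<std::mutex> lock(mu_);
            ++running_;
        }
        worker_ = std::thread([this] {
            long sum = 0;
            for (int v : segments_) sum += v;  // reads writer-owned memory
            std::lock_guard<std::mutex> lock(mu_);
            compacted_sum_ = sum;
            --running_;
            cv_.notify_all();
        });
    }

    void wait_segcompaction_finished() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return running_ == 0; });
    }

    long compacted_sum() {
        std::lock_guard<std::mutex> lock(mu_);
        return compacted_sum_;
    }

private:
    std::vector<int> segments_{1, 2, 3};
    std::mutex mu_;
    std::condition_variable cv_;
    int running_ = 0;
    long compacted_sum_ = 0;
    std::thread worker_;
};
```

The key design point is that the destructor blocks on the same predicate the task clears, so cancellation can never free segments_ while the task is still iterating over it.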
779d94f932 [fix](metrics)Delete the extra underline for metrics (#17397) 2023-03-05 16:38:43 +08:00
afb5def385 [enhancement](timeout) replace query timeout with exec timeout (#17360) 2023-03-05 11:03:59 +08:00
59bf305c5d [Improve](point query) put tablet fetch interface which is high concurrent point query operation to light_work_pool (#17400)
Point query lookups are very lightweight, so they are handled by light_work_pool.
2023-03-05 10:36:50 +08:00
627b5ee302 [enhancement](k8s) Support fqdn mode for fe in k8s environment (#17329) 2023-03-05 10:18:56 +08:00
7b4fc412c5 [typo](docs) Optimize documents so that users can better understand. (#17295) 2023-03-04 21:02:45 +08:00
b9b028099d [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id (#17362)
* [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id

NewLoadStreamMgr already holds the pipe and related info, so there is no need to save the pipe in the fragment state, and FragmentState becomes clearer.

However, this PR changes the behaviour of the BE.
I will cherry-pick this PR to Doris 1.2.3 and add load ID support to the FE, so users can upgrade from 1.2.3 to 2.x.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-04 16:19:36 +08:00
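Identifying a stream load pipe by load ID amounts to a registry keyed by that ID. A minimal sketch, assuming hypothetical types (LoadIdSketch, StreamPipeSketch, LoadStreamRegistry); the real NewLoadStreamMgr interface is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical 128-bit id, similar in spirit to a load/query id.
struct LoadIdSketch {
    int64_t hi = 0;
    int64_t lo = 0;
    bool operator<(const LoadIdSketch& o) const {
        return hi < o.hi || (hi == o.hi && lo < o.lo);
    }
};

// Hypothetical pipe carrying the stream load body.
struct StreamPipeSketch {
    std::string buffered;
};

// The pipe is looked up by load id, so no fragment state needs to hold it.
class LoadStreamRegistry {
public:
    bool put(const LoadIdSketch& id, std::shared_ptr<StreamPipeSketch> pipe) {
        std::lock_guard<std::mutex> lock(mu_);
        return pipes_.emplace(id, std::move(pipe)).second;  // false if id already registered
    }
    std::shared_ptr<StreamPipeSketch> get(const LoadIdSketch& id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = pipes_.find(id);
        return it == pipes_.end() ? nullptr : it->second;
    }
    void remove(const LoadIdSketch& id) {
        std::lock_guard<std::mutex> lock(mu_);
        pipes_.erase(id);
    }

private:
    std::mutex mu_;
    std::map<LoadIdSketch, std::shared_ptr<StreamPipeSketch>> pipes_;
};
```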
2b014a0464 [Improve](doris::Status performance) fix the performance issue due to copy of std::string (#17411) 2023-03-04 15:08:59 +08:00
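The commit title only says the cost came from copying std::string. One common pattern for avoiding such copies in a status type is sketched below, under assumptions: StatusSketch is hypothetical and is not doris::Status. OK statuses carry no string, copies share the message, and the accessor returns a reference.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical status type; not doris::Status.
class StatusSketch {
public:
    static StatusSketch OK() { return StatusSketch(); }
    static StatusSketch Error(std::string msg) {
        StatusSketch s;
        s.msg_ = std::make_shared<const std::string>(std::move(msg));
        return s;
    }
    bool ok() const { return msg_ == nullptr; }
    // Returns a reference, so reading the message never copies the string.
    const std::string& msg() const {
        static const std::string kEmpty;
        return ok() ? kEmpty : *msg_;
    }

private:
    // Copying a status copies a pointer, not the message text.
    std::shared_ptr<const std::string> msg_;
};
```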
82df2ae9d8 [feature](mysql) Support secure MySQL connection to FE (#17138)
Background:
Doris currently does not support SSL connections from MySQL clients, which is not secure enough in some cases, especially when Doris is accessed over the public internet.

Solution:
- Use TLS1.2 protocol to encrypt information.
- Implementation details
  * server <--- connect <--- client
  * if enable SSL: {
  * server <--- SSL connection request packet <--- client
  * server <--- SSL Exchange ---> client } (we will add this `if` logic part in this PR)
  * server ---> handshake request packet ---> client
  * server <--- encrypted data ---> client (this part will be realized in this PR)
- reference1 https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase.html#sect_protocol_connection_phase_initial_handshake_ssl_handshake
- reference2 https://www.rfc-editor.org/rfc/rfc5246

close #16313

Signed-off-by: Yukang Lian <yukang.lian2022@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: morningman <morningman@163.com>
2023-03-04 12:14:48 +08:00
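The `if enable SSL` branch depends on recognizing the client's SSLRequest packet. A minimal sketch based on the MySQL connection-phase documentation linked above: the CLIENT_SSL capability flag is 0x00000800, and an SSLRequest payload is 32 bytes (4-byte capability flags, 4-byte max packet size, 1-byte character set, 23 bytes of filler). The function names are hypothetical; this is not the FE implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Capability flag from the MySQL client/server protocol.
constexpr uint32_t kClientSsl = 0x00000800;
// SSLRequest payload: capability flags (4), max packet size (4),
// character set (1), filler (23) = 32 bytes.
constexpr std::size_t kSslRequestLen = 32;

uint32_t read_le32(const std::vector<uint8_t>& buf, std::size_t off) {
    return static_cast<uint32_t>(buf[off]) |
           (static_cast<uint32_t>(buf[off + 1]) << 8) |
           (static_cast<uint32_t>(buf[off + 2]) << 16) |
           (static_cast<uint32_t>(buf[off + 3]) << 24);
}

// True if the client's reply is an SSLRequest, i.e. the server should run
// the TLS exchange before reading the real handshake response.
bool is_ssl_request(const std::vector<uint8_t>& payload) {
    // A full HandshakeResponse41 also carries a user name, so it is longer.
    if (payload.size() != kSslRequestLen) return false;
    return (read_le32(payload, 0) & kClientSsl) != 0;
}
```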
9f7386243f [Fix](regression-test)fix some unfixed-answer test #17408 2023-03-04 12:13:41 +08:00
9aecd517b0 [test](Nereids) turn on all test in scalar function w (#17269)
Turn on all test cases for scalar functions starting with W, except width_bucket (the BE bug will be fixed in the next PR).
Turn off all test cases for group_concat(distinct order by).
Fix the nullability of the return value in TimestampArithmetic.
2023-03-04 08:23:50 +08:00
c9179bd155 [chore](workflow) Fix the workflow BE UT (macOS) (#17403)
#17292 enabled detect_container_overflow, which made the BE UT (macOS) workflow fail.
2023-03-03 23:34:54 +08:00
b501a9e7ab [improvement](inverted index)use reference to avoid bitmap copy for performance (#17352)
Query runtime is reduced from 10s to 1s for a MATCH query that matches 40 million of 44 million rows.
2023-03-03 21:00:49 +08:00
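The speedup comes from not copying a large bitmap. A minimal sketch of the difference, assuming a hypothetical vector-of-words Bitmap with a global copy counter; the inverted index's real bitmap type is not shown in this log.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

int g_bitmap_copies = 0;  // counts deep copies so the cost is visible

// Hypothetical bitmap; not the inverted index's real bitmap type.
struct Bitmap {
    std::vector<uint64_t> words;
    Bitmap() = default;
    explicit Bitmap(std::size_t n) : words((n + 63) / 64, 0) {}
    Bitmap(const Bitmap& o) : words(o.words) { ++g_bitmap_copies; }
    Bitmap& operator=(const Bitmap& o) {
        words = o.words;
        ++g_bitmap_copies;
        return *this;
    }
    void set(std::size_t i) { words[i / 64] |= uint64_t{1} << (i % 64); }
};

std::size_t popcount(const Bitmap& b) {
    std::size_t n = 0;
    for (uint64_t w : b.words) n += std::bitset<64>(w).count();
    return n;
}

// By value: every call duplicates the whole bitmap (tens of millions of bits).
std::size_t count_by_value(Bitmap b) { return popcount(b); }

// By const reference: reads the caller's bitmap in place, no copy.
std::size_t count_by_ref(const Bitmap& b) { return popcount(b); }
```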
eea0cbec74 [enhancement](transaction) Reduce hold writeLock time for DatabaseTransactionMgr to improve stability of stream load (#17380)
Logging the state of cleared transactions takes too much time, so the log level for clearing transactions is changed from info to debug.


Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-03-03 19:06:39 +08:00
5c265d8183 [fix](vec)crashing caused by parallel output file (#17384) 2023-03-03 19:03:53 +08:00
9f97cd029f [Feature] (Nereids) add check to disable unsupported type (#17196)
1. disable decimalv3
2. disable json
3. disable complex type: array, map, struct
4. disable switch: group_by_and_having_use_alias
2023-03-03 17:57:48 +08:00
17164cf7a8 [fix](docs) add logic for batch delete when sequence column exists (#17367)
* [fix](docs) add logic for batch delete when sequence column exists.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

* add docs

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

* fix docs 2

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-03 16:28:31 +08:00
0fada66e03 [fix](cooldown) Fix deadlock in tablet clone (#17252) 2023-03-03 15:53:12 +08:00
b5b595519a [fix](log) use logger to replace printStackTrace() (#17382)
Use Logger to replace printStackTrace to better locate problems.
2023-03-03 14:51:30 +08:00
cc5fa509ad [fix](cooldown) Fix bug in concurrent update_cooldown_conf and operations that update cooldowned data (#17086) 2023-03-03 14:36:58 +08:00
ae689c3a0b [Bug](regression-test) remove some exception title on regression case (#17374)
remove some exception title on regression case
2023-03-03 14:18:10 +08:00
3b94ca5ceb [chore](macOS) Use LLVM Clang by default (#17292)
Use LLVM Clang by default
2023-03-03 14:18:02 +08:00
6ce8200d9e [doc](typo) external-table-load.md (#17234)
* fix: external-table-load.md

Fix a syntax error in the SQL.

* fix: external-table-load.md (Chinese)

Fix a syntax error in the SQL.
2023-03-03 14:11:19 +08:00
11994b76d7 add the tag <version since="dev"> for insert_timeout. (#17316)
Co-authored-by: smallhibiscus <844981280>
2023-03-03 14:10:49 +08:00
ba108d40d8 [docs](link) Fix some broken links in docs (#17335)
* [docs](link) Fix some broken links in docs

* fix_typo
2023-03-03 14:08:05 +08:00
f5d958ccf9 [fix](MTMV) Reset insert timeout in handleInsert (#17249)
In #16343, we split the timeout variable into two ones (one is for query and another is for insertion).

The function `ConnectProcessor::handleQuery` uses the corresponding session variable to change the timeout for the queries requested by MySQL client. However, the function `StmtExecutor::handleInsert` doesn't use the session variable to change the timeout, so we can't change the timeout for the CTAS and MTMV insertion job.
2023-03-03 11:32:50 +08:00
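The bug above is about which of the two timeouts an execution path applies. A minimal model of the intended rule, assuming hypothetical names (SessionVarsSketch, StmtKind, effective_timeout_s); the default values are illustrative only and the FE's actual code is Java.

```cpp
#include <cassert>

// Hypothetical model of the two session timeouts split in #16343.
// Default values are illustrative, not authoritative.
struct SessionVarsSketch {
    int query_timeout_s = 300;
    int insert_timeout_s = 14400;
};

enum class StmtKind { kQuery, kInsert, kCtas };

// CTAS and MTMV refreshes are insertions, so they must read the insert
// timeout too, not only statements sent by a MySQL client.
int effective_timeout_s(StmtKind kind, const SessionVarsSketch& vars) {
    switch (kind) {
        case StmtKind::kInsert:
        case StmtKind::kCtas:
            return vars.insert_timeout_s;
        case StmtKind::kQuery:
            break;
    }
    return vars.query_timeout_s;
}
```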
c96571c236 [Bug](decimalv2) decimal value is filtered by mistake (#17353)
Reason: column_name[k5], decimal value is not valid for definition, value=123.123, precision=9, scale=3, min=-999999.999, max=-999999.999; . src line [];

#17273
2023-03-03 10:40:19 +08:00
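The log line above reports min=-999999.999 and max=-999999.999 for precision=9, scale=3, so every positive value, including 123.123, falls outside the range. For reference, DECIMAL(9,3) should span -999999.999 to 999999.999. A minimal worked check, assuming values are compared as integers scaled by 10^scale and precision is at most 18 so int64_t suffices; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Range of DECIMAL(precision, scale) as integers scaled by 10^scale.
// Assumes precision <= 18 so every bound fits in int64_t.
struct DecimalRange {
    int64_t min_scaled;
    int64_t max_scaled;
};

int64_t pow10_i64(int n) {
    int64_t r = 1;
    while (n-- > 0) r *= 10;
    return r;
}

// e.g. precision 9 -> 999999999, i.e. 999999.999 at scale 3.
DecimalRange decimal_range(int precision) {
    int64_t max = pow10_i64(precision) - 1;
    return {-max, max};
}

bool fits(int64_t scaled_value, int precision) {
    DecimalRange r = decimal_range(precision);
    return scaled_value >= r.min_scaled && scaled_value <= r.max_scaled;
}
```

Here 123.123 at scale 3 is the scaled integer 123123, which lies inside the range.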
e82b827bc8 [optimize](vectorization)Optimize to_string's performance. (#17076) 2023-03-03 10:35:59 +08:00
94e9a226a6 [Bug](Block compression) Fix bug if config::compress_rowbatches=false then the block column values could be empty (#17325) 2023-03-03 10:31:12 +08:00
f5232e5c01 [vectorized](bug) fix some open enable_fold_constant_by_be failed cases (#17240) 2023-03-03 10:30:20 +08:00
449f2953c9 [Improvement](auth)(step-1) add ranger authorizer for hms catalog (#17153) 2023-03-03 09:45:08 +08:00
227d2b0bf9 [log](schema change) add schema change type log (#17349) 2023-03-03 08:25:28 +08:00
cd7e03575b [tools](tpc-ds) add script tools to run tpc-ds conveniently (#17366)
build-tpcds-tools.sh
gen-tpcds-data.sh
gen-tpcds-queries.sh
create-tpcds-tables.sh
load-tpcds-data.sh
run-tpcds-queries.sh
Data and query generation support specifying SCALE.
The create-table statements may need to be edited manually to specify BUCKETS, or to change int to bigint if SCALE is large.

---------

Co-authored-by: stephen <hello_stephen@@qq.com>
2023-03-03 08:24:07 +08:00
ba82cd10c6 [Enhancement](Jdbc catalog) Add two optional properties for jdbc catalog (#17245)
1. The first property is `only_specified_database`:
Previously, `Jdbc Catalog` synchronized all databases from the source.
Now a parameter called `only_specified_database` is added to the JDBC catalog so that only the specified database is synchronized, e.g.:

```sql
create resource if not exists ${resource_name} properties(
    "type"="jdbc",
    "user"="root",
    "password"="123456",
    "jdbc_url" = "jdbc:mysql://172.18.0.1:${mysql_port}/doris_test?useSSL=false",
    "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "only_specified_database" = "true"
);
```
If `only_specified_database` is `true`, the JDBC catalog only synchronizes the database specified in `jdbc_url`.

2. The second property is `lower_case_table_names`:
This property synchronizes the table names of the JDBC external data source in lower case.

```sql
create resource if not exists ${resource_name} properties(
  "type"="jdbc",
  "user"="doris_test",
  "password"="123456",
  "jdbc_url" = "jdbc:oracle:thin:@172.18.0.1:${oracle_port}:${SID}",
  "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/ojdbc8.jar",
  "driver_class" = "oracle.jdbc.driver.OracleDriver",
  "lower_case_table_names" = "true"
);
```
2023-03-03 00:47:46 +08:00
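Both properties reduce to simple string handling. A minimal sketch, assuming MySQL-style URLs of the form `jdbc:mysql://host:port/db?params` and ASCII table names; database_from_jdbc_url and to_lower_ascii are hypothetical helpers, not the catalog's actual code, and Oracle-style URLs (as in the second example) are out of scope.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: database name from a MySQL-style JDBC URL, e.g.
// "jdbc:mysql://host:3306/doris_test?useSSL=false" -> "doris_test".
// Returns "" when there is no database path segment.
std::string database_from_jdbc_url(const std::string& url) {
    const std::string scheme_sep = "://";
    std::size_t host_start = url.find(scheme_sep);
    if (host_start == std::string::npos) return "";
    std::size_t slash = url.find('/', host_start + scheme_sep.size());
    if (slash == std::string::npos) return "";
    std::size_t end = url.find('?', slash + 1);
    return url.substr(slash + 1, end == std::string::npos ? std::string::npos : end - slash - 1);
}

// Hypothetical helper for lower_case_table_names: ASCII lower-casing.
std::string to_lower_ascii(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}
```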
22983fc813 [fix](test) regression test file has been removed (#17332)
The data file in https://github.com/apache/doris/pull/16342/files was removed because of a path conflict,
so the resource path is changed.
2023-03-03 00:44:48 +08:00
3eeeff09fd [enhancement](nereids) convert string literal to common type in in-expr and case-when-expr (#17200) 2023-03-02 22:05:35 +08:00
93d2d461b4 [feature](Nereids): pushdown complex project through left semi/anti Join. (#17186) 2023-03-02 21:41:08 +08:00
a1399043fe [fix](Nereids) fold constant on BE could not process alias (#17259)
1. The FoldConstantOnBE rule cannot use a static INSTANCE, because it is stateful.
2. If the expression root is an Alias, its child should be used for constant collection.
2023-03-02 19:16:23 +08:00
27352afdf6 [fix](fe)support multi distinct group_concat (#17237)
* [fix](fe)support multi distinct group_concat

* update based on comments
2023-03-02 17:53:13 +08:00
33349e1457 [fix](Nereids) fold 'version()' function (#17172)
For compatibility with the legacy planner, Nereids folds version() into GlobalVariable.version.
2023-03-02 17:35:41 +08:00
823d968452 [fix](expr) avoid crashing caused by big depth of expression tree (#17314) 2023-03-02 16:55:53 +08:00
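The commit does not describe its mechanism here. One common way to avoid crashes from very deep expression trees is to measure depth without recursion and reject trees above a limit; a minimal sketch, assuming a hypothetical ExprNode type and a caller-chosen limit, not Doris' Expr implementation.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression node; not Doris' Expr class.
struct ExprNode {
    std::vector<std::unique_ptr<ExprNode>> children;
};

// Iterative depth computation: an explicit stack replaces recursion,
// so a pathologically deep tree cannot overflow the native call stack here.
int expr_depth(const ExprNode& root) {
    int max_depth = 0;
    std::vector<std::pair<const ExprNode*, int>> stack;
    stack.emplace_back(&root, 1);
    while (!stack.empty()) {
        std::pair<const ExprNode*, int> top = stack.back();
        stack.pop_back();
        if (top.second > max_depth) max_depth = top.second;
        for (const auto& child : top.first->children) {
            stack.emplace_back(child.get(), top.second + 1);
        }
    }
    return max_depth;
}

// Reject the expression before any recursive evaluation touches it.
bool check_expr_depth(const ExprNode& root, int max_allowed) {
    return expr_depth(root) <= max_allowed;
}
```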
39f59f554a [improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983)
This CL mainly changes:

Support specifying csv schema manually in s3/hdfs table valued function

s3 (
'URI' = 'https://bucket1/inventory.dat',
'ACCESS_KEY'= 'ak',
'SECRET_KEY' = 'sk',
'FORMAT' = 'csv',
'column_separator' = '|',
'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
'use_path_style'='true'
)
Add new session variable dry_run_query

If set to true, the real query result is not returned; instead, only the number of returned rows is returned.

mysql> select * from bigtable;
+--------------+
| ReturnedRows |
+--------------+
| 10000000     |
+--------------+
This avoids the time spent transmitting large result sets and focuses on the real execution time of the query engine.
It is intended for debugging and analysis.
2023-03-02 16:51:27 +08:00
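The `csv_schema` value in the example is a list of `name:type` entries separated by `;`. A minimal parser sketch, assuming that format and splitting each entry on its first `:` so types such as `decimal(38,10)` stay intact; parse_csv_schema is hypothetical, and how the TVF treats empty or malformed entries is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser for strings like "k1:int;k2:int;k3:int;k4:decimal(38,10)".
// Entries are separated by ';' and split on their first ':'.
// Empty entries are skipped; a malformed entry makes the parse fail.
bool parse_csv_schema(const std::string& schema,
                      std::vector<std::pair<std::string, std::string>>* out) {
    out->clear();
    std::size_t start = 0;
    while (start <= schema.size()) {
        std::size_t end = schema.find(';', start);
        if (end == std::string::npos) end = schema.size();
        std::string entry = schema.substr(start, end - start);
        if (!entry.empty()) {
            std::size_t colon = entry.find(':');
            if (colon == std::string::npos || colon == 0 || colon + 1 == entry.size()) {
                return false;  // missing name or type
            }
            out->emplace_back(entry.substr(0, colon), entry.substr(colon + 1));
        }
        start = end + 1;
    }
    return true;
}
```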
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simplify function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
698e9cd243 [fix](demo)fix cdc failed to synchronize datetime type in mysql, and added JsonDebeziumSchemaSerializer (#16971)
* [fix](demo)fix cdc failed to synchronize datetime type in mysql, and added JsonDebeziumSchemaSerializer
* add licenses for DateToStringConverter
2023-03-02 14:14:58 +08:00
9f088f6e90 [feature](json) add json_valid function (#17247)
add json_valid function

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-02 14:08:52 +08:00
9155d8b9d1 [fix](delete) fix 'is null' or 'is not null' delete predicate will get wrong result (#17190)
Fix 'is null' and 'is not null' delete predicates returning wrong results.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-02 14:05:44 +08:00