Commit Graph

1894 Commits

c47368f80c [fix](udf) fix check_fn and fn_call function names not matching (#8132) 2022-02-22 09:18:07 +08:00
5cc8cb1b93 [improvement](txn) Add PreCommitTime for the result of SHOW TRANSACTION stmt (#8124)
Add `PreCommitTime` to the result of `SHOW TRANSACTION;` and `SHOW PROC '/transactions/{DbId}/{state}';`.
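For example (a hedged sketch; the transaction ID, DbId, and state below are hypothetical values):

```sql
-- Hypothetical IDs; the new PreCommitTime column appears in both result sets.
SHOW TRANSACTION WHERE ID = 4005;
SHOW PROC '/transactions/10003/running';
```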
2022-02-19 12:02:07 +08:00
9df5b2dfdc [fix](variables) Fix bug that executing showVariablesStmt with a where expression returns an empty result set (#8094)
This bug was introduced by PR #7936, which changed the key type of connectionMap from Long to Integer,
causing connectionMap to fail to find the connectContext by connectionId.
2022-02-19 11:58:17 +08:00
8892780091 [Vectorized][Feature] support agg functions percentile && percentile_approx (#8066) 2022-02-18 13:42:24 +08:00
Z6N
920a6db5a7 Fix username@cluster:password being modified to cluster:username:password, which causes authentication failure (#8115)

Co-authored-by: z6n <ztmailgo@gmail.com>
2022-02-18 11:19:17 +08:00
b7e07ee472 [fix](cache) Throws ClassCastException when there are multiple EXCEPT, INTERSECT and UNION in the local view (#8083)
Issue Number: close #8082
Fixes the ClassCastException thrown when multiple EXCEPT, INTERSECT, and UNION operators appear in a local view.
2022-02-18 10:56:37 +08:00
Pxl
e0dbf48682 [Vectorized] [AggFunction] Support group_concat (#8086) 2022-02-17 14:19:07 +08:00
289aacb78c [improvement] enable check_java_version (#8034)
Check the Java version when Doris starts, to prevent a bad user experience caused by inconsistency
between the Java version used for compilation and the one used at runtime.
If the runtime Java version differs from the compile-time Java version, Doris will not start and a prompt message will be given.
2022-02-17 11:16:45 +08:00
26289c28b0 [fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072)
1. Fix the problem of BE crash caused by destruct sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`

    This config specifies the max concurrent compaction task num on a fast disk (typically SSD),
    so that on high-speed disks we can execute more compaction tasks at the same time
    and compact the data as soon as possible (a be.conf sketch follows this list).

3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
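A minimal be.conf sketch for item 2; the value 8 is a hypothetical tuning choice, not a recommended default:

```
# be.conf: allow up to 8 concurrent compaction tasks per fast (SSD) disk.
# The value is a hypothetical example.
compaction_task_num_per_fast_disk = 8
```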
2022-02-17 10:52:08 +08:00
264f38471c [feature](spark-load) add Hive Bitmap UDFs (#8036)
Hive Bitmap UDFs provide functions for generating bitmaps and performing bitmap operations in Hive tables.
The bitmap format in Hive is exactly the same as the Doris bitmap,
so bitmaps produced in Hive can be imported into Doris through Spark Load (a usage sketch follows).
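A hedged Hive sketch of registering and using such a UDF; the JAR path, class name, table, and columns are placeholders, not the artifact names from this PR:

```sql
-- All names below are hypothetical placeholders.
ADD JAR hdfs:///path/to/hive-bitmap-udf.jar;
CREATE TEMPORARY FUNCTION to_bitmap AS 'org.apache.doris.udf.ToBitmapUDAF';
-- Aggregate the user ids of each day into one bitmap value.
SELECT dt, to_bitmap(user_id) AS user_bitmap FROM hive_logs GROUP BY dt;
```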
2022-02-17 10:45:20 +08:00
e6fedff68f [Refactor][heartbeat] Return the FE heartbeat response via Thrift (#8035)
* [Refactor] Return the FE heartbeat response via Thrift

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-02-17 10:25:51 +08:00
Pxl
143c4085ee [Feature][Vectorized] support aggregate function ndv()/approx_count_distinct() (#8044) 2022-02-16 14:30:13 +08:00
a46af29051 [fix](meta) fix bug that FE can't start due to wrong image reading (#8045)
The decommission job type should be removed from the enum.
2022-02-16 11:58:40 +08:00
aee9273a09 [typo] translate comment in Chinese to English in SingleNodePlanner (#8038) 2022-02-16 11:57:12 +08:00
bb4881bb04 [fix](planner) fix using clause npe (#7952)
Issue Number: close #7953
2022-02-16 11:56:44 +08:00
a6bf8c13eb [Feature](Transaction) Support two phase commit (2PC) for stream load (#7473)
The two-phase batch commit means:
during a Stream Load, after the data is written, a message is returned to the client;
the data is invisible at this point and the transaction status is PRECOMMITTED.
The data becomes visible only after the client triggers COMMIT.

1. The client can invoke the following interface to trigger the commit operation for a transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc

    
2. The client can invoke the following interface to trigger the abort operation for a transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc
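A hedged end-to-end sketch of the flow; the `two_phase_commit` header name is assumed from the Stream Load documentation, and the file, table, and returned txn id below are hypothetical examples:

```
# 1) Load data; with two_phase_commit enabled the transaction stays PRECOMMITTED.
curl --location-trusted -u user:passwd -H "two_phase_commit:true" \
-T data.csv http://fe_host:http_port/api/{db}/{table}/_stream_load
# The JSON response carries a TxnId, e.g. "TxnId": 18036 (hypothetical).
# 2) Make the data visible by committing that transaction.
curl -X PUT --location-trusted -u user:passwd -H "txn_id:18036" \
-H "txn_operation:commit" http://fe_host:http_port/api/{db}/_stream_load_2pc
```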
2022-02-16 11:55:04 +08:00
6ccf9dbc56 [feature-wip](statistics) Step1: Statistics collection framework (#7880)
Framework code for statistics collection,
containing only the main data structures, no implementation details.
This PR does not affect any existing code,
and users are not yet able to create statistics jobs.
2022-02-16 11:08:48 +08:00
884fddbf33 [fix](compatibility) Fix compatibility issue of PRowBatch and some tablet sink bugs (#8000)
1. set both `tuple_offsets` and `new_tuple_offsets` in PRowBatch for compatibility
2. set FE config `repair_slow_replica` default to false
   to avoid impacting the load process after upgrading.
   E.g., if there are only 2 replicas and one has a high version count, after the upgrade
   that replica would be set to bad, and the load process would be stopped
   because only 1 replica is alive (an fe.conf sketch follows this list).
3. Fix a bug that NodeChannel may be blocked at `close_wait()`
   because the `add_batch_finish` flag was not set after the last rpc finished.
4. Fix an NPE in RoutineLoadScheduler
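An fe.conf sketch of item 2's new default; the setting name and value come from the commit, the comment is illustrative:

```
# fe.conf: keep slow-replica repair disabled to protect loads during upgrade.
repair_slow_replica = false
```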
2022-02-15 11:23:19 +08:00
1ff0df9f54 [refactor] Remove old schema change rollup backend decommission code (#8030) 2022-02-14 09:29:50 +08:00
969cd0c391 [fix](fe-ui) Fix wrong field shown when previewing table data in the web UI playground (#8016)
2022-02-14 09:28:32 +08:00
1278796e51 [fix](backup) fix backup job finished with error message issue (#7997) 2022-02-12 16:01:05 +08:00
ee26cd2d07 [fix] (grouping set) fix Unexpected exception: bitIndex < 0: -1 (#7989) 2022-02-12 15:18:08 +08:00
5bd9fdb8c1 [Improvement] Print logs to the foreground if FE is started without --daemon (#7995) 2022-02-10 17:19:39 +08:00
92b690f3eb [feature-wip](iceberg) Step2: add table creation strict mode and support refresh iceberg table or db. (#7981)
1. Add `iceberg_table_creation_strict_mode` in `fe.conf` to control Iceberg external table creation when a data type is not supported in Doris.
2. Add `REFRESH` syntax to synchronize the Iceberg table and database (see the sketch after this list).
3. Support create Iceberg external table with specific column definitions.
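A hedged sketch of the new `REFRESH` syntax; the database and table names are placeholders:

```sql
-- Placeholders: iceberg_db / iceberg_tbl.
REFRESH TABLE iceberg_db.iceberg_tbl;
REFRESH DATABASE iceberg_db;
```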
2022-02-10 15:08:04 +08:00
df2c7563b0 [improvement](log) Add query id info in error log for easy tracking (#7975)
PR #7936 changed some FE log levels to debug, so when an error happens it is not easy to find out
which SQL caused the error.
This PR adds the stmt id and query id to the error log, so users can use these identifiers to find the SQL in fe.audit.log.
2022-02-09 13:07:28 +08:00
eeaf6725fd (fix)[lateral-view] Fix the lateral view on a view not being recognized (#7968)
If the underlying tableRef represents a CTE or a view,
the tableRef will be reset during semantic parsing.
The new tableRef needs to inherit the lateral view property of the original tableRef
to ensure that the lateral view is not accidentally lost during parsing.
2022-02-09 13:07:03 +08:00
Pxl
0553ce2944 [feature](vectorization) support function topn && remove some unused code (#7793) 2022-02-09 13:05:31 +08:00
3048ce8a4f [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939)
This PR mainly changes:

1. Change the definition of PBlock

    The new PBlock consists of a set of PColumnMeta and a binary buffer.
    The PColumnMeta records the metadata information of all columns in the Block,
    while the buffer stores the serialized binary data of all columns.
    
2. Refactor the serialize/deserialize method of data type

    Rewrite the `serialize()/deserialize()` of IDataType. And also add
    a new method `get_uncompressed_serialized_bytes()` to get the total length
    of uncompressed serialized data of a column.
    
3. Rewrite the serialize/deserialize method of Block

    Now, when serializing a Block to PBlock, it will first get the total length
    of uncompressed serialized data of all columns in this Block, and then allocate
    the memory to write the serialized data to the buffer.
    
4. Use brpc attachment to transmit the serialized column data
2022-02-08 11:11:42 +08:00
ecbd4bcae0 [fix](catalog) Fix bug that the MetaObject lock design of FE could cause meta consistency problems when the catalog performs replay operations (#6650)
1. If the table or db has been dropped, we either fail to get the write lock, skip, or throw an exception.
2. If we recover a table or db, we must ensure the dropped state is unmarked only after the recover journal is written.
3. db.dropTable corresponds to db.createTable. The table.markDropped method is not moved into db.dropTable,
    because all meta added to the db or catalog must come after the recover journal is written, so markDropped
    and unmarkDropped must be invoked outside the dropTable and createTable methods.
2022-02-08 10:01:52 +08:00
c6defb2faf [improvement](query) Improve fe high concurrent query performance (#7936) 2022-02-08 09:54:59 +08:00
f8d086d87f [feature](rpc) (experimental)Support implement UDF through GRPC protocol. (#7519)
Support implementing UDFs through the gRPC protocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can implement UDFs in any language they are familiar with.
2. UDFs are decoupled from Doris: a UDF cannot cause a Doris coredump, UDF computing resources are separated from Doris, and Doris services are not affected.

However, an RPC UDF has a fixed per-call overhead, so it is much slower than a C++ UDF, especially when the amount of data is large.

Create the function like:

```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
  "SYMBOL"="add_int",
  "OBJECT_FILE"="127.0.0.1:9999",
  "TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods.
Note:
THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!
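Once the service is reachable at the address given in OBJECT_FILE, the function is called like any scalar function (a usage sketch, not from the commit):

```sql
-- Calls the rpc_add function created above; the literal arguments are examples.
SELECT rpc_add(1, 2);
```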
2022-02-08 09:25:09 +08:00
2ffd7fc80a [fix](load priv) modify error msg of checking table priv (#7817) 2022-02-06 08:33:41 +08:00
c0e59e59aa [fix][refactor] fix bugs and refactor some code by lint (#7871)
1. Fix some `passedByValue` issues.
2. Fix some `dereferenceBeforeCheck` issues.
3. Fix some `uninitMemberVar` issues.
4. Fix some iterator `eraseDereference` issues.
5. Fix compile issues introduced by #7923, #7905, and #7848.
2022-02-01 14:31:14 +08:00
58ad8b7ec9 (improvement)[test] Combine multiple tests to use only one doris cluster (#7934)
This PR mainly includes the following two changes:
1. Shorten FE unit test time.
In Doris's FE unit tests, starting a Doris cluster is a time-consuming operation.
In this PR, the unit tests of some small functions are merged into QueryPlanTest,
so they share one cluster,
avoiding an overly long overall FE unit test time.

2. Refine the logic of PR #7851.
Although PR #7851 implements the function correctly,
the logic is not concise enough.
This PR mainly tightens the redundant code in the engineering implementation.
2022-01-31 22:16:44 +08:00
8c179bb09f [fix](alter) fix SQL analysis failure after increasing the default bucket num of the table (#7932)
The distribution info of partitions is now deep copied from olapTable.
2022-01-31 22:16:08 +08:00
4ada8e4854 [fix](httpv2) make http v2 and v1 interface compatible (#7848)
HTTP v2 TableSchemaAction now also returns aggregation_type,
and the corresponding Flink/Spark Connector code is modified accordingly.
2022-01-31 22:12:34 +08:00
c1fef37399 [improvement](runtime-filter) Support adaptive runtime filter(#7546) (#7645)
Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER.
    The processing logic:
    If the number of rows in the right table < runtime_filter_max_in_num, the IN predicate is used.
    If the number of rows in the right table >= runtime_filter_max_in_num, the Bloom filter takes effect.

Change 2: The default runtime filter type is changed to IN_OR_BLOOM_FILTER (a session-variable sketch follows).
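A hedged sketch of exercising this via session variables; the variable names are taken from the feature description and the threshold value is an example:

```sql
-- Example threshold; below it the IN predicate is used, above it the Bloom filter.
SET runtime_filter_type = 'IN_OR_BLOOM_FILTER';
SET runtime_filter_max_in_num = 1024;
```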
2022-01-30 16:46:52 +08:00
4c7525cf2c [improvement](show) Allow users to use the SHOW DATA SKEW statement instead of the ADMIN variant (#7914)
This PR mainly does two things:
1. Allow users to use the SHOW DATA SKEW statement instead of ADMIN SHOW DATA SKEW (see the sketch below).
2. Fix FE UT failures caused by PR [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql #7876 and PR [feature-wip](iceberg) Step1: Support create Iceberg external table #7391.
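A hedged syntax sketch; the table and partition names are placeholders:

```sql
-- Placeholders: example_tbl / p1.
SHOW DATA SKEW FROM example_tbl PARTITION (p1);
```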

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-01-29 10:45:03 +08:00
1d900d8605 (fix)[planner] Fix the right tuple ids in empty set node (#7931)
The tuple ids of the empty set node must be exactly the same as the tuple ids of the origin root node.
In the issue, we found that once the tree where the root node is located has a window function,
the tuple ids of the empty set node cannot be calculated correctly.

This PR fixes the problem.
To calculate the correct tuple ids,
instead of using the tuple ids obtained from the SelectStmt.getMaterializedTupleIds() function as in the past,
we now directly use the tuple ids of the origin root node.

Although we tried to fix #7929 by modifying the SelectStmt.getMaterializedTupleIds() function,
that method cannot get the correct tuple of the last window function,
so we construct the tuple ids of empty set nodes another way.
2022-01-29 09:46:05 +08:00
071be928f9 [fix](vectorized) fix bug multi distinct function get wrong type (#7900) 2022-01-28 22:31:41 +08:00
3a7bb7e144 [improvement](fe-meta-version)Some if conditions do not use the FeMetaVersion constant (#7879) 2022-01-28 22:25:17 +08:00
f93ac89a67 [fix](lateral-view) fix bugs of lateral view with CTE or where clause (#7865)
fix bugs of lateral view with CTE or where clause.
The error case can be found in newly added tests in `TableFunctionPlanTest.java`
But there are still some bugs not being fixed, so the unit test is annotated with @Ignore

This PR contains the change from #7824:

> Issue Number: close #7823
> 
> After the subquery is rewritten, the rewritten stmt needs to be reset
> (that is, the content of the first analyze semantic analysis is cleared),
> and then the rewritten stmt can be reAnalyzed.
> 
> The lateral view ref in the previous implementation forgot to implement the reset function.
> This caused it to keep the first error message in the second analyze.
> Eventually, two duplicate tupleIds appear in the new stmt and are marked with different tuples.
> From the explain string, the following syntax will have an additional wrong join predicate.
> ```
> Query: explain select k1 from test_explode lateral view explode_split(k2, ",") tmp as e1  where k1 in (select k3 from tbl1);
> Error equal join conjunct: `k3` = `k3`
> ```
> 
> This pr mainly adds the reset function of the lateral view
> to avoid possible errors in the second analyze
> when the lateral view and subquery rewrite occur at the same time.
2022-01-28 22:24:23 +08:00
22830ea498 [feature](show) add new statement show proc '/current_query_stmts' (#7487)
To show the query statements at the first level.
2022-01-28 22:23:13 +08:00
dee79d98a8 [improvement](explain) Displays cast information with implicit conversions in verbose (#7851)
Displays cast information for implicit conversions in verbose explain output (a usage sketch follows).
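A hedged usage sketch; the table and predicate are hypothetical, and the verbose plan would display the implicit cast:

```sql
-- Hypothetical table; the verbose plan shows the implicit CAST of '100' to INT.
EXPLAIN VERBOSE SELECT * FROM example_tbl WHERE int_col = '100';
```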
2022-01-27 10:37:38 +08:00
d2386dd85d [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql (#7876) 2022-01-27 10:32:18 +08:00
d69b7bff2e [feature](meta) Support show compactionTooSlowTablets and oversizeTablets (#7821)
Add more columns in `show proc "/statistic"`
2022-01-27 10:26:41 +08:00
3b8d48f08b [feature-wip](iceberg) Step1: Support create Iceberg external table (#7391)
Close related #7389

Support create Iceberg external table in Doris. 

This is the first step to support Iceberg external table.

### Create Iceberg external table
This PR provides two ways to create Iceberg external tables. Neither way requires explicitly specifying column definitions; Doris automatically converts them based on Iceberg's column definitions.

1. Create an Iceberg external table directly

```sql
    CREATE [EXTERNAL] TABLE table_name 
    ENGINE = ICEBERG
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "icberg_table_name",
    "iceberg.hive.metastore.uris"  =  "thrift://192.168.0.1:9083",
    "iceberg.catalog.type"  =  "HIVE_CATALOG"
    );
```

2. Create an Iceberg database and automatically create all the tables under that db.

```sql
    CREATE DATABASE db_name 
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
```

### Show table creation

1. For individual tables you can view them with `help show create table`.

```sql 
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                 |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
  `level` varchar(-1) NOT NULL COMMENT "null",
  `event_time` datetime NOT NULL COMMENT "null",
  `message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris"  =  "thrift://10.10.10.10:9087",
"iceberg.catalog.type"  =  "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

2. For Iceberg database, you can view it with `help show table creation`.

```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table  | Status  | Create Time         | Error Msg                                               |
+--------+---------+---------------------+---------------------------------------------------------+
| logs   | fail    | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 |                                                         |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```

  This is a new syntax.
  
  Show table creation records in Iceberg database:
  
  Syntax:
  ```sql
      SHOW TABLE CREATION [FROM db] [LIKE mask]
  ```
2022-01-27 10:22:47 +08:00
015371ac72 [fix](grouping-set) Fix the bug of grouping set core in both vec and non vec query engine (#7800) 2022-01-26 16:15:30 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. Fix problems when building fe_plugins.
2. Format code.
3. Add docs about dumping data using mysqldump.
2022-01-26 09:11:23 +08:00
b435a54304 [fix] Consider backend status when more than one backend exists on the same host (#7784) 2022-01-26 09:10:34 +08:00