doris

Author	SHA1	Message	Date
yiguolei	1ff0df9f54	[refactor] Remove old schema change rollup backend decommission code (#8030 )	2022-02-14 09:29:50 +08:00
jiafeng.zhang	969cd0c391	[fix](fe-ui) Fix wrong fields shown when previewing table data in the web UI playground (#8016 )	2022-02-14 09:28:32 +08:00
GoGoWen	1278796e51	[fix](backup) fix backup job finished with error message issue (#7997 )	2022-02-12 16:01:05 +08:00
Zhengguo Yang	ee26cd2d07	[fix] (grouping set) fix Unexpected exception: bitIndex < 0: -1 (#7989 )	2022-02-12 15:18:08 +08:00
yiguolei	5bd9fdb8c1	[Improvement] print log foreground if not use --daemon to start fe (#7995 )	2022-02-10 17:19:39 +08:00
qiye	92b690f3eb	[feature-wip](iceberg) Step2: add table creation strict mode and support refresh iceberg table or db. (#7981 ) 1. Add `iceberg_table_creation_strict_mode` in `fe.conf` to control iceberg external table creation, when data type is not supported in Doris. 2. Add `REFRESH` syntax to synchronize the Iceberg table and database. 3. Support create Iceberg external table with specific column definitions.	2022-02-10 15:08:04 +08:00
Mingyu Chen	df2c7563b0	[improvement](log) Add query id info in error log for easy tracking (#7975 ) PR #7936 changed some FE log levels to debug, so when an error happens it is not easy to find out which SQL caused it. This PR adds the stmt id and query id to the error log, so that users can use these identifiers to find the SQL in fe.audit.log	2022-02-09 13:07:28 +08:00
EmmyMiao87	eeaf6725fd	(fix)[lateral-view] Solve the problem of not recognizing the lateral view on the view (#7968 ) If the tableRef behind represents a CTE or a view, the tableRef will be reset during semantic parsing. The new tableRef needs to inherit the lateral view property of the origin tableRef to ensure that the lateral view is not accidentally lost during parsing.	2022-02-09 13:07:03 +08:00
Pxl	0553ce2944	[feature](vectorization) support function topn && remove some unused code (#7793 )	2022-02-09 13:05:31 +08:00
Mingyu Chen	3048ce8a4f	[improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939 ) This PR mainly changes: 1. Change the define of PBlock The new PBlock consists of a set of PColumnMeta and a binary buffer. The PColumnMeta records the metadata information of all columns in the Block, while the buffer stores the serialized binary data of all columns. 2. Refactor the serialize/deserialize method of data type Rewrite the `serialize()/deserialize()` of IDataType. And also add a new method `get_uncompressed_serialized_bytes()` to get the total length of uncompressed serialized data of a column. 3. Rewrite the serialize/deserialize method of Block Now, when serializing a Block to PBlock, it will first get the total length of uncompressed serialized data of all columns in this Block, and then allocate the memory to write the serialized data to the buffer. 4. Use brpc attachment to transmit the serialized column data	2022-02-08 11:11:42 +08:00
caiconghui	ecbd4bcae0	[fix](catalog) Fix bug that the MetaObject lock design of FE could cause meta consistency problems when the catalog replays operations (#6650 ) 1. If the table or db has been dropped, acquiring the write lock fails, and the operation is either skipped or throws an exception. 2. If we recover a table or db, we must ensure that the dropped state is unmarked after writing the recover journal. 3. db.dropTable corresponds to db.createTable. I don't move the table.markDropped method into db.dropTable, because all meta must be added to the db or catalog after writing the recover journal, so we must invoke the markDropped and unmarkDropped methods outside the dropTable and createTable methods.	2022-02-08 10:01:52 +08:00
caiconghui	c6defb2faf	[improvement](query) Improve fe high concurrent query performance (#7936 )	2022-02-08 09:54:59 +08:00
Zhengguo Yang	f8d086d87f	[feature](rpc) (experimental)Support implement UDF through GRPC protocol. (#7519 ) Support implement UDF through GRPC protocol. This brings several benefits: 1. The udf implementation language is not limited to c++, users can use any familiar language to implement udf 2. UDF is decoupled from Doris, udf will not cause doris coredump, udf computing resources are separated from doris, and doris services are not affected But RPC's UDF has a fixed overhead, so its performance is much slower than C++ UDF, especially when the amount of data is large. Create function like ``` CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES ( "SYMBOL"="add_int", "OBJECT_FILE"="127.0.0.1:9999", "TYPE"="RPC" ); ``` Function service need to implement `check_fn` and `fn_call` methods Note: THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!	2022-02-08 09:25:09 +08:00
Henry2SS	2ffd7fc80a	[fix](load priv) modify error msg of checking table priv (#7817 )	2022-02-06 08:33:41 +08:00
Mingyu Chen	c0e59e59aa	[fix][refactor] fix bugs and refactor some code by lint (#7871 ) 1. Fix some `passedByValue` issues. 2. Fix some `dereferenceBeforeCheck` issues. 3. Fix some `uninitMemberVar` issues. 4. Fix some iterator `eraseDereference` issues. 5. Fix compile issue introduced from #7923 #7905 #7848	2022-02-01 14:31:14 +08:00
EmmyMiao87	58ad8b7ec9	(improvement)[test] Combine multiple tests to use only one doris cluster (#7934 ) This PR mainly includes the following two changes: 1. Shorten FE unit test time. In Doris's FE unit tests, starting a Doris cluster is a time-consuming operation. In this PR, the unit tests of some small functions are merged into @QueryPlanTest so that they share one cluster, which keeps the overall FE unit test time from growing too long. 2. Refine the logic of PR #7851. Although the function was implemented correctly in PR #7851, the logic was not concise enough. This PR mainly removes redundant code in the implementation.	2022-01-31 22:16:44 +08:00
spaces-x	8c179bb09f	[fix](alter) fix sql analyzed failed after increase the default bucket num of the table. (#7932 ) Distribution info of partitions are deep copied from olapTable.	2022-01-31 22:16:08 +08:00
jiafeng.zhang	4ada8e4854	[fix](httpv2) make http v2 and v1 interface compatible (#7848 ) http v2 TableSchemaAction adds the return value of aggregation_type, and modifies the corresponding code of Flink/Spark Connector	2022-01-31 22:12:34 +08:00
924060929	c1fef37399	[improvement](runtime-filter) Support adaptive runtime filter(#7546 ) (#7645 ) Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER. The processing logic is: if the number of rows in the right table < runtime_filter_max_in_num, the IN predicate is used; if the number of rows in the right table >= runtime_filter_max_in_num, the Bloom filter is used. Change 2: The default runtime filter is changed to IN_OR_BLOOM_FILTER.	2022-01-30 16:46:52 +08:00
caiconghui	4c7525cf2c	[improvement](show) Support that user can use show data skew statement instead of admin (#7914 ) * [improvement](show) Support that user can use show data skew statement instead of admin This PR mainly do two things: 1. Support that user can use show data skew statement instead of admin 2. Fix fe ut failed caused by pr [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql #7876 and pr [feature-wip](iceberg) Step1: Support create Iceberg external table #7391 Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2022-01-29 10:45:03 +08:00
EmmyMiao87	1d900d8605	(fix)[planner] Fix the right tuple ids in empty set node (#7931 ) The tuple ids of the empty set node must be exactly the same as the tuple ids of the origin root node. In the issue, we found that once the tree where the root node is located has a window function, the tuple ids of the empty set node cannot be calculated correctly. This pr mostly fixes the problem. In order to calculate the correct tuple ids, the tuple ids obtained from the SelectStmt.getMaterializedTupleIds() function in the past are changed to directly use the tuple ids of the origin root node. Although we tried to fix #7929 by modifying the SelectStmt.getMaterializedTupleIds() function, this method can't get the tuple of the last correct window function. So we use other ways to construct tupleids of empty nodes.	2022-01-29 09:46:05 +08:00
zhangstar333	071be928f9	[fix](vectorized) fix bug multi distinct function get wrong type (#7900 )	2022-01-28 22:31:41 +08:00
blackstar-baba	3a7bb7e144	[improvement](fe-meta-version)Some if conditions do not use the FeMetaVersion constant (#7879 )	2022-01-28 22:25:17 +08:00
Mingyu Chen	f93ac89a67	[fix](lateral-view) fix bugs of lateral view with CTE or where clause (#7865 ) Fix bugs of lateral view with CTE or where clause. The error cases can be found in the newly added tests in `TableFunctionPlanTest.java`. Some bugs are still not fixed, so the unit test is annotated with @Ignore. This PR contains the change in #7824 : > Issue Number: close #7823 > > After the subquery is rewritten, the rewritten stmt needs to be reset > (that is, the content of the first analyze semantic analysis is cleared), > and then the rewritten stmt can be reAnalyzed. > > The lateral view ref in the previous implementation forgot to implement the reset function. > This caused it to keep the first error message in the second analyze. > Eventually, two duplicate tupleIds appear in the new stmt and are marked with different tuple. > From the explain string, the following syntax will have an additional wrong join predicate. > ``` > Query: explain select k1 from test_explode lateral view explode_split(k2, ",") tmp as e1 where k1 in (select k3 from tbl1); > Error equal join conjunct: `k3` = `k3` > ``` > > This pr mainly adds the reset function of the lateral view > to avoid possible errors in the second analyze > when the lateral view and subquery rewrite occur at the same time.	2022-01-28 22:24:23 +08:00
caiconghui	22830ea498	[feature](show) add new statement show proc '/current_query_stmts' (#7487 ) To show the query statement at first level.	2022-01-28 22:23:13 +08:00
zhengshiJ	dee79d98a8	[improvement](explain) Displays cast information with implicit conversions in verbose (#7851 ) Displays cast information with implicit conversions in verbose.	2022-01-27 10:37:38 +08:00
caiconghui	d2386dd85d	[improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql (#7876 )	2022-01-27 10:32:18 +08:00
caiconghui	d69b7bff2e	[feature](meta) Support show compactionTooSlowTablets and oversizeTablets (#7821 ) Add more columns in `show proc "/statistic"`	2022-01-27 10:26:41 +08:00
qiye	3b8d48f08b	[feature-wip](iceberg) Step1: Support create Iceberg external table (#7391 ) Close related #7389 Support create Iceberg external table in Doris. This is the first step to support Iceberg external table. ### Create Iceberg external table This pr describes two ways to create Iceberg external tables. Neither way requires explicitly specifying column definitions; Doris automatically converts them based on Iceberg's column definitions. 1. Create an Iceberg external table directly ```sql CREATE [EXTERNAL] TABLE table_name ENGINE = ICEBERG [COMMENT "comment"] PROPERTIES ( "iceberg.database" = "iceberg_db_name", "iceberg.table" = "iceberg_table_name", "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083", "iceberg.catalog.type" = "HIVE_CATALOG" ); ``` 2. Create an Iceberg database and automatically create all the tables under that db. ```sql CREATE DATABASE db_name [COMMENT "comment"] PROPERTIES ( "iceberg.database" = "iceberg_db_name", "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083", "iceberg.catalog.type" = "HIVE_CATALOG" ); ``` ### Show table creation 1. For individual tables you can view them with `help show create table`. 
```sql mysql> show create table iceberg_db.logs_1; +--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Table \| Create Table \| +--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| logs_1 \| CREATE TABLE `logs_1` ( `level` varchar(-1) NOT NULL COMMENT "null", `event_time` datetime NOT NULL COMMENT "null", `message` varchar(-1) NOT NULL COMMENT "null" ) ENGINE=ICEBERG COMMENT "ICEBERG" PROPERTIES ( "iceberg.database" = "doris", "iceberg.table" = "logs_1", "iceberg.hive.metastore.uris" = "thrift://10.10.10.10:9087", "iceberg.catalog.type" = "HIVE_CATALOG" ) \| +--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` 2. For Iceberg database, you can view it with `help show table creation`. 
```sql mysql> show table creation from iceberg_db; +--------+---------+---------------------+---------------------------------------------------------+ \| Table \| Status \| Create Time \| Error Msg \| +--------+---------+---------------------+---------------------------------------------------------+ \| logs \| fail \| 2021-12-14 13:50:10 \| Cannot convert unknown type to Doris type: list<string> \| \| logs_1 \| success \| 2021-12-14 13:50:10 \| \| +--------+---------+---------------------+---------------------------------------------------------+ 2 rows in set (0.00 sec) ``` This is a new syntax. Show table creation records in Iceberg database: Syntax: ```sql SHOW TABLE CREATION [FROM db] [LIKE mask] ```	2022-01-27 10:22:47 +08:00
HappenLee	015371ac72	[fix](grouping-set) Fix the bug of grouping set core in both vec and non vec query engine (#7800 )	2022-01-26 16:15:30 +08:00
Zhengguo Yang	4bdeef3b64	[chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804 ) 1. fix problems when build fe_plugins 2. format 3. add docs about dump data using mysql dump	2022-01-26 09:11:23 +08:00
zhengshengjun	b435a54304	[fix] Consider backend status when more than one backends exists in same host (#7784 )	2022-01-26 09:10:34 +08:00
qiye	461b352d3e	[fix](function) Change digital_masking function arg type to BIGINT (#7888 ) Change digital_masking function arg type to BIGINT to fix the wrong result.	2022-01-25 22:28:05 +08:00
luozenglin	ee0037e1af	[fix](ldap) fix ldap password logic (#7862 ) 1. write ldap info to image; 2. optimizing LdapClient class thread safety.	2022-01-25 09:59:24 +08:00
Mingyu Chen	8aa9faa7cb	[chore](docker) Add docker dev image with ldb-toolchain (#7838 ) Add docker images `apache/incubator-doris:build-env-ldb-toolchain-latest`, which is built with ldb-toolchain	2022-01-24 21:12:15 +08:00
weizuo93	be9ebbc14d	[fix] Fix bug that wrong exception message returned by insert statement (#7832 ) `insert` statement may return exception message `Execute timeout` after load data failed. But the real reason is that there exists unhealthy backend, not execute timeout. ``` MySQL [ssb]> insert into lineorder_flat select * from lineorder_flat; ERROR 1105 (HY000): errCode = 2, detailMessage = Execute timeout ```	2022-01-24 21:11:45 +08:00
wudi	60c6bb4f92	[Feature][flink-connector] support flink delete option (#7457 ) * Flink Connector supports delete option on Unique models Co-authored-by: wudi <wud3@shuhaisc.com>	2022-01-23 20:24:41 +08:00
Zhengguo Yang	d1b1723c74	[fix](export) fix export failed when table has hidden columns (#7813 ) fix export failed when table has hidden columns	2022-01-22 10:19:15 +08:00
weizuo93	ed39ff1500	[feature](compaction) Support triggering compaction for a specific partition manually (#7521 ) Add statement to trigger cumulative or base compaction for a specified partition.	2022-01-21 09:27:06 +08:00
Mingyu Chen	ef984a6a72	[improvement](load) Improve load fault tolerance (#7674 ) Currently, if we encounter a problem with a replica of a tablet during the load process, such as a write error, rpc error, -235, etc., it will cause the entire load job to fail, which results in a significant reduction in Doris' fault tolerance. This PR mainly changes: 1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job. 2. fix a bug introduced from #7754 that may cause BE coredump	2022-01-20 09:23:21 +08:00
Mingyu Chen	5fc0a9f40d	[improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319 ) This PR mainly changes: 1. Cancel the load job ASAP when encountering unqualified data. The solution is described in #6318 . Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues. 2. Fix an NPE bug when creating a user with an empty host. 3. Fix compile warnings after rebasing onto master (vectorization).	2022-01-18 13:13:55 +08:00
Mingyu Chen	efb4e189df	[fix](lateral-view) Fix some lateral view bugs (#7772 ) 1. Fix bug that BE may crash when input node of TableFunctionNode has non-null column 2. Fix bug that TableFunctionNode may not return all results	2022-01-18 12:09:32 +08:00
Mingyu Chen	3494c8973b	[improvement](colocation) Add a new config to delay the relocation of colocation group (#7656 ) 1. Add a new FE config `colocate_group_relocate_delay_second`. The relocation of a colocation group may involve a large number of tablets moving within the cluster. Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible. Relocation usually occurs after a BE node goes offline or goes down. This config is used to delay the determination of BE node unavailability. The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group will not be triggered. 2. Change the priority of colocate tablet repair and balance tasks from HIGH to NORMAL. 3. Add a new FE config `allow_replica_on_same_host`. If set to true, Doris will allow replicas of a tablet to be placed on the same host when creating a table, and tablet repair and balance will be disabled. This is only for local testing, so that we can deploy multiple BEs on the same host and create tables with multiple replicas.	2022-01-18 10:26:36 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. 
Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run the exec engine on the SSB/TPCH and 70% of the TPCDS standard query test sets. ### 3. Data Model The Vec Exec Engine supports Dup/Agg/Unq tables and a vectorized Block Reader. Segment Vec is a work in progress. ### 4. How to use 1. Set the session variable `set enable_vectorized_engine = true;` (required) 2. Set the session variable `set batch_size = 4096;` (recommended) ### 5. Some differences from the original exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
Adonis Ling	5c7863c683	[improvement](fe-unit-test) Fix port in use when the cluster starts in UT. (#7768 )	2022-01-16 10:42:56 +08:00
GoGoWen	88a3d08fee	[fix] fix NPE in SysVariableDesc::equal (#7766 )	2022-01-16 10:42:24 +08:00
shee	8b7d7e4dac	[improvement] create/drop index support if [not] exist (#7748 ) create or drop index clause support if [not] exist	2022-01-16 10:40:44 +08:00
Henry2SS	4a3cbf52e3	[fix](show-load) fix show load with the same column name in Where Clause (#7523 )	2022-01-15 09:54:43 +08:00
lovingfeel	fe80d1417f	[style] replace Chinese comments with English comments (#7732 )	2022-01-14 09:35:06 +08:00
Mingyu Chen	902ab93043	[fix](session-variable) fix bug that checkpoint may overwrite the global variables (#7526 ) We should create temporary objects for some static fields when doing checkpoint, to avoid these variables being overwritten by the checkpoint process.	2022-01-14 09:25:10 +08:00
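
Note on c1fef37399 (#7645): the IN_OR_BLOOM_FILTER rule in that commit message compares the right-table row count against `runtime_filter_max_in_num`. The following is a minimal Python sketch of that rule only, not Doris code. The names `FilterKind` and `choose_runtime_filter` are hypothetical, and the threshold passed in the usage note is an illustrative value, not a claim about the Doris default.

```python
from enum import Enum


class FilterKind(Enum):
    """Hypothetical labels for the two filter kinds named in the commit message."""
    IN = "IN"
    BLOOM_FILTER = "BLOOM_FILTER"


def choose_runtime_filter(right_table_rows: int, runtime_filter_max_in_num: int) -> FilterKind:
    """Pick the filter kind using the rule stated in the commit message.

    rows <  runtime_filter_max_in_num -> IN predicate
    rows >= runtime_filter_max_in_num -> Bloom filter
    """
    if right_table_rows < 0 or runtime_filter_max_in_num < 0:
        raise ValueError("row count and threshold must be non-negative")
    if right_table_rows < runtime_filter_max_in_num:
        return FilterKind.IN
    return FilterKind.BLOOM_FILTER
```

With an illustrative threshold of 1024, `choose_runtime_filter(10, 1024)` selects the IN predicate, while a row count of exactly 1024 already falls on the Bloom filter side, because the comparison is strict.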


817 Commits