We set the LIBHDFS3_CONF environment variable in start_be.sh, so libhdfs3 will try to read that hdfs-site.xml.
If the file does not exist, libhdfs3 throws an error, but Doris does not handle it, which crashes the BE.
This CL mainly changes:
Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists.
Refactor the HDFSCommonBuilder so that it can return errors correctly.
Add BE IP info to the status, so that we can get the IP from error messages like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err:
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The "prefer compute node" logic is wrong, causing queries on external tables to be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs (a usage sketch follows the descriptions below):
prefer_compute_node_for_external_table
If set to true, queries on external tables will prefer to be assigned to compute nodes,
and the number of nodes used is further controlled by min_backend_num_for_external_table.
If set to false, queries on external tables can be assigned to any node.
min_backend_num_for_external_table
Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables will also be assigned to some mix nodes,
so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables will be assigned to compute nodes only.
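A minimal sketch of setting these two configs at runtime, assuming both are mutable via ADMIN SET FRONTEND CONFIG (otherwise they can be put in fe.conf); the value 3 is only an example:

```sql
-- Prefer compute nodes for queries on external tables.
ADMIN SET FRONTEND CONFIG ("prefer_compute_node_for_external_table" = "true");
-- If there are fewer than 3 compute nodes, mix nodes are added until 3 nodes are assigned.
ADMIN SET FRONTEND CONFIG ("min_backend_num_for_external_table" = "3");
```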
Loading a big local file causes an `INTERNAL_ERROR]too many filtered rows` issue, because the ByteBuffer from the MySQL client always reuses the same byte array,
so later bytes overwrite earlier ones and the bytes arrive on the network in the wrong order.
The fix is to copy the byte array and then write the copy to the network.
1. Organize http documents
2. Add http interface authentication for FE
3. Support https interface for FE
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
- change for Nereids
1. add a variable-length parameter to the constructor of Count for better error reporting on Count(a, b)
2. refactor StringRegexPredicate, let it inherit from ScalarFunction
3. remove the unused class TypeCollection
4. use catalog.Type.Collection to check expression argument types
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate, making them the same as in the legacy planner.
- change for legacy planner
1. change the common type of floating-point and Decimal from Decimal to Double
This commit forbids struct and map types from being used as distribution keys or aggregation keys.
SQL such as:
select distinct struct_col from struct_table
will report an error.
Inserting {1, 'a'} into struct<f1:tinyint, f2:varchar(20)> is currently not supported.
This commit supports implicitly casting the char type inside a struct to varchar.
Add an implicit cast for the struct type.
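A hedged sketch of the restriction, with illustrative table and column names: the struct column can only be a value column, and using it as a distinct/aggregation key is rejected.

```sql
-- Struct column as a value column is allowed; putting struct_col into the key
-- or DISTRIBUTED BY clause is forbidden by this commit.
CREATE TABLE struct_table (
    id INT,
    struct_col STRUCT<f1:TINYINT, f2:VARCHAR(20)>
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- Reported as an error, since distinct would treat struct_col as an aggregation key.
SELECT DISTINCT struct_col FROM struct_table;
```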
MySQL load can load files from the FE server node, but this causes a security issue: a user could use it to probe local files on the FE node.
For this reason, add a configuration named mysql_load_server_secure_path to set a secure path for loading data.
By default, this configuration disables the "load FE local file" feature.
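A hedged sketch of a server-side (non-LOCAL) load once the config is set; the path and table name are illustrative, and the file must reside under the directory configured by mysql_load_server_secure_path:

```sql
-- Allowed only if 'server_data.csv' is under mysql_load_server_secure_path.
LOAD DATA
INFILE '/path/under/secure_dir/server_data.csv'
INTO TABLE db1.table1
COLUMNS TERMINATED BY ',';
```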
Use the FE cluster token to authenticate stream load.
This auth is only open to the BE; FE auth still only supports HTTP basic auth.
I will use this auth for MySQL load to build a no-auth stream load from FE to BE,
which avoids double auth in MySQL load.
See the design doc for more information.
Issue Number: close #16351
A dynamic schema table is a special type of table whose schema changes along with the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is schema self-describing, we can extract schema information from the original documents and infer the final type information. This special kind of table reduces manual schema change operations and makes it easy to import semi-structured data and extend its schema automatically.
1. add a limit threshold for TopN runtime pushdown and the key-topn optimization
2. use the unified session variable topn_opt_limit_threshold for all TopN optimizations (see the sketch below)
3. add fuzzy support for topn_opt_limit_threshold
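A minimal sketch of tuning the unified threshold per session; the value 1024 is only an example:

```sql
-- TopN runtime pushdown / key-topn optimization applies only when the LIMIT
-- does not exceed this threshold.
SET topn_opt_limit_threshold = 1024;
```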
Change the implementation of auth to RBAC.
Each user has one default role which cannot be dropped;
if you grant a privilege to a user, it is actually granted to the default role.
In the current PR, a user can still have only one role other than the default role, but in the future users and roles will be many-to-many.
Rename PaloRole, PaloAuth, PaloPrivilege to Role, Auth, Privilege.
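A small sketch of what the default-role behavior means in practice; the user name, password, and privilege are illustrative:

```sql
-- Granting to a user actually grants to the user's non-droppable default role.
CREATE USER 'jack'@'%' IDENTIFIED BY '123456';
GRANT SELECT_PRIV ON db1.* TO 'jack'@'%';
-- The privilege is visible via the user's grants (held by the default role).
SHOW GRANTS FOR 'jack'@'%';
```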
This commit supports:
1. Insert + select for struct/map types
2. JSON stream load for the struct type
3. The m[key] function for the map type
How to use:
Set the FE configs to allow creating tables with struct and map types:
1. admin set frontend config("enable_struct_type" = "true");
2. admin set frontend config("enable_map_type" = "true");
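A hedged sketch of points 1 and 3, with illustrative names; the exact literal syntax for inserting map/struct values may differ, so only the element-access query is shown:

```sql
-- Map column in a value position; requires enable_map_type above.
CREATE TABLE map_table (
    id INT,
    m MAP<STRING, INT>
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- Access a map element by key.
SELECT m["k1"] FROM map_table;
```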
#16547
Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
* [fuzzy](test) fuzzy some session variables stably according to pull_request_id
* fuzzy enable_fold_constant_by_be
---------
Co-authored-by: stephen <hello_stephen@@qq.com>
Support using this SQL to refresh an MTMV manually. It generates an MTMV task immediately.
```
REFRESH MATERIALIZED VIEW test_mv_view [complete];
```
You can use `show mtmv task` to show the latest task.
In this PR, I also clear the MTMV tasks when dropping the MTMV, to make sure the test suite stays correct.
Main subtask of [DSIP-28](https://cwiki.apache.org/confluence/display/DORIS/DSIP-028%3A+Suppot+MySQL+Load+Data)
## Problem summary
Support mysql load syntax as below:
```sql
LOAD DATA
[LOCAL]
INFILE 'file_name'
INTO TABLE tbl_name
[PARTITION (partition_name [, partition_name] ...)]
[COLUMNS TERMINATED BY 'string']
[LINES TERMINATED BY 'string']
[IGNORE number {LINES | ROWS}]
[(col_name_or_user_var [, col_name_or_user_var] ...)]
[SET (col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...)]
[PROPERTIES (key1 = value1 [, key2=value2]) ]
```
For example,
```sql
LOAD DATA
LOCAL
INFILE 'local_test.file'
INTO TABLE db1.table1
PARTITION (partition_a, partition_b, partition_c, partition_d)
COLUMNS TERMINATED BY '\t'
(k1, k2, v2, v10, v11)
set (c1=k1,c2=k2,c3=v10,c4=v11)
PROPERTIES ("auth" = "root:", "strict_mode"="true")
```
Note that in this PR the property named `auth` must be set, since stream load needs auth. I will optimize this later.
1. Spark dpp
Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
is not copied into `fe/lib`, which reduces the size of the FE output.
2. Modify start_fe.sh
Modify the CLASSPATH to make sure that doris-fe.jar is at the front, so that
when classes with the same qualified name are loaded, they are taken from doris-fe.jar first.
3. Upgrade hadoop and hive version
hadoop: 2.10.2 -> 3.3.3
hive: 2.3.7 -> 3.1.3
4. Override the IHiveMetastoreClient implementations from the dependencies:
`ProxyMetaStoreClient.java` for Aliyun DLF,
`HiveMetaStoreClient.java` for the original Apache Hive metastore,
because I need to modify some of their methods to make them compatible with
different versions of Hive.
5. Exclude some unused dependencies to reduce the size of the FE output.
It is now only 370MB (before it was 600MB).
6. Upgrade aws-java-sdk version to 1.12.31
7. Support AWS Glue Data Catalog
8. Remove HudiScanNode (no longer supported)
1. `uncheckedCastChild` may generate redundant `CastExpr` like `cast( cast(XXX as Date) as Date)`
2. generate a DateLiteral to replace cast(IntLiteral as Date) (see the sketch below)
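An illustrative query (table and column names are hypothetical) where both points matter: the redundant casts around the Date expression are deduplicated, and the cast on the integer literal is folded into a DateLiteral:

```sql
-- Comparing a Date column with an integer literal; cast(20230101 AS DATE) can be
-- folded into the DateLiteral '2023-01-01' instead of a runtime cast.
SELECT * FROM t WHERE CAST(dt AS DATE) = CAST(20230101 AS DATE);
```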
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns), the ORDER BY clause usually involves only a few columns, but the ScanNode still has to scan all the data from the storage engine even when the limit is very small, which can lead to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase, we only read `columnA`'s data from the storage engine, along with an extra row-id column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode, because it is the central node for the TopN nodes in the cluster. The ExchangeNode spawns RPCs to the other nodes using the row ids (sorted and limited by the SortNode) read in the first phase, and reads the rows one by one from the storage engine.
After the second-phase read, the Block contains all the data needed for the query.
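Conceptually, the two phases behave like the pseudo-SQL below (not actual plan output; `__DORIS_ROWID_COL__` is the internal row-id column mentioned above, and the limit of 100 is just an example):

```sql
-- Phase 1: read only the sort column plus the internal row id, then sort and limit.
SELECT columnA, __DORIS_ROWID_COL__
FROM tableX
ORDER BY columnA ASC
LIMIT 100;

-- Phase 2 (issued as RPCs from the ExchangeNode): fetch the full rows for just
-- the surviving row ids, e.g.
-- SELECT * FROM tableX WHERE __DORIS_ROWID_COL__ IN (<row ids from phase 1>);
```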
* [Schema Change] support fast add/drop column (#49)
* [feature](schema-change) support fast schema change. coauthor: yixiutt
* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang
* [feature](schema change) schema change optimize for add/drop columns.
1. add a uniqueId field to the Column class.
2. schema change for add/drop columns directly updates the schema meta
Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>
[Feature](schema change) fix write and add regression test (#69)
Co-authored-by: yixiutt <yixiu@selectdb.com>
[schema change] BE supports delete using the newest schema
add delete regression test
fix regression case (#107)
[feature](schema change) light schema change excludes rollup and agg/uniq/dup key types.
[feature](schema change) fe olapTable maxUniqueId write in disk.
[feature](schema change) add rpc iface for sc add column.
[feature](schema change) add columnsDesc to TPushReq for light sc.
resolve the deadlock when schema change (#124)
fix columns from FE not having the bitmap_index flag (#134)
add update/delete case
construct MATERIALIZED schema from origin schema when insert
fix non-vectorized compaction coredump
use segment cache
choose newest schema by schema version when compaction (#182)
[bugfix](schema change) fix light schema change problem.
[feature](schema change) light schema change add alter job. (#1)
fix be ut
[bug] (schema change) dropping a key column on a unique table should not use light schema change
[feature](schema change) add schema change regression-test.
fix regression test
[bugfix](schema change) fix multi alter clauses for light schema change. (#2)
[bugfix](schema change) fix multi clauses calculate column unique id (#3)
modify PushTask process (#217)
[Bugfix](schema change) fix jobId replay causing a bdbje exception.
[bug](schema change) fix max col unique id being duplicated. (#232)
[optimize](schema change) modify pendingMaxColUniqueId generate rule.
fix compaction error
* fix be ut
* fix snapshot load core
fix unique_id error (#278)
[refact](fe) remove redundant code for light schema change. (#4)
format fe core
format be core
fix be ut
modify fe meta version
fix rebase error
flush schema into rowset_meta in old table
[refactor](schema change) refact fe light schema change. (#5)
delete the change of schemahash and support get max version schema
* modify for review
* fix be ut
* fix schema change test
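A minimal sketch of the statements this feature speeds up; the table and column names are illustrative. With light schema change, adding or dropping a value column only updates the schema metadata instead of rewriting data files:

```sql
-- Metadata-only change under light schema change (value column on a duplicate-key table).
ALTER TABLE example_tbl ADD COLUMN new_col INT DEFAULT "0";
ALTER TABLE example_tbl DROP COLUMN new_col;
```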
Supported:
1. Change FeMetaVersion to 111, compatible with upgrade from 110.
2. Add catalog level privileges, and degrade global level privileges to catalog level if FeMetaVersion < 111.
3. Support the 'show all grants' and 'show roles' statements (see the sketch below).
4. Compatibility with the previous version of the SQL syntax.
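A small sketch of the newly supported statements (output omitted):

```sql
SHOW ALL GRANTS;
SHOW ROLES;
```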
Todo:
1. Support the three-segment format catalog.database.table in the SQL syntax.
2. User documentation for the unified authority management of the data lake.
3. LDAP service to provide authentication.