doris

Author	SHA1	Message	Date
Mingyu Chen	39f59f554a	[improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983 ) This CL mainly changes: Support specifying csv schema manually in s3/hdfs table valued function s3 ( 'URI' = 'https://bucket1/inventory.dat', 'ACCESS_KEY'= 'ak', 'SECRET_KEY' = 'sk', 'FORMAT' = 'csv', 'column_separator' = '\|', 'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)', 'use_path_style'='true' ) Add new session variable dry_run_query If set to true, the real query result will not be returned, instead, it will only return the number of returned rows. mysql> select * from bigtable; +--------------+ \| ReturnedRows \| +--------------+ \| 10000000 \| +--------------+ This can avoid large result set transmission time and focus on real execution time of query engine. For debug and analysis purpose.	2023-03-02 16:51:27 +08:00
xueweizhang	9f088f6e90	[feature](json) add json_valid function (#17247 ) add json_valid function Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-03-02 14:08:52 +08:00
Mingyu Chen	30df268c1f	[fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244 ) We set LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml, if file does not exist, it will throw error. But Doris does not handle this error, cause BE crash. This CL mainly changes: Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exist. Refactor the HDFSCommonBuilder so that it can return error correctly. Add BE IP info in status, so that we can get ip from error msg like: ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err: [INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml The logic of prefer compute node is wrong, which causing the external table query can only assign up to 3 backends. This CL refactor this logic and also change some FE config: prefer_compute_node_for_external_table If set to true, query on external table will prefer to assign to compute node. And the max number of compute node is controlled by min_backend_num_for_external_table. If set to false, query on external table will assign to any node. min_backend_num_for_external_table Only take effect when prefer_compute_node_for_external_table is true. If the compute node number is less than this value, query on external table will try to get some mix node to assign, to let the total number of node reach this value. If the compute node number is larger than this value, query on external table will assign to compute node only.	2023-03-02 11:09:55 +08:00
gitccl	b0c5250bf9	[Enhancement](tvf) support trim_double_quotes and skip_lines for S3 and HDFS table valued function (#17224 ) support trim_double_quotes and skip_lines for S3 and HDFS table valued function	2023-03-01 23:41:31 +08:00
Mingyu Chen	d44c4b1300	[improvement][fix](catalog) check required properties when creating catalog and fix jdbc catalog issue (#17209 ) Check required properties when creating catalog. To avoid some strange error when missing required properties This PR add checks for: hms catalog: check the validation of dfs.ha properties jdbc catalog: check jdbc_url, driver_url, driver_class is set. Fix NPE when init MasterCatalogExecutor The MasterCatalogExecutor may be called by FrontendServiceImpl from BE, which does not have ConnectionContext. Add more jdbc url param to resolve Chinese issue add useUnicode=true&characterEncoding=utf-8 by default in jdbc catalog when connecting to MySQL Update FAQ doc of catalog	2023-03-01 17:08:36 +08:00
yadao	ff8902370c	[improvement](doc) Supplementary Bulk Deletion Notes (#17113 ) * 补充批量删除注意事项 * 按照批量删除文档前文的介绍, 用户可能会开启`show_hidden_columns`的session variable来查看表是否支持批量删除. * 后续按示例进行DELETE/MERGE的导入作业后, 如果在同一个session中执行`select count() from xxx`语句时, 可能会发现结果与预期不一致可能无法快速联想到是因为之前开启的session variable导致被删除的语句也被查出来了. * supplement batch deletion notes for English doc	2023-03-01 13:35:20 +08:00
Alissa Tung	cfc2d45795	[typo](docs) fix typo (#17208 )	2023-03-01 07:41:21 +08:00
DongLiang-0	eeca16d7a0	[fix](doc)adjust Flink connector document structure and add SchemaChange example (#17231 )	2023-03-01 07:40:56 +08:00
yagagagaga	475368c62d	[typo](docs) Add some details about AES encryption. (#17243 ) * [typo](docs) Add some details about AES encryption. * Update aes.md * Update aes.md * Update aes.md * Update aes.md	2023-03-01 07:40:11 +08:00
tarepanda1024	7369261f33	[typo](docs)update hight-concurrent-point-query.md (#17248 ) Co-authored-by: liuxiaodong <liuxiaodong1@corp.netease.com>	2023-03-01 07:37:27 +08:00
Luzhijing	b0de8d1925	[doc][community]correct the number of committers (#16905 )	2023-02-28 10:48:06 +08:00
Zhengguo Yang	b51ce415e7	[Feature](load) Add submitter and comments to load job (#16878 ) * [Feature](load) Add submitter and comments to load job	2023-02-28 09:06:19 +08:00
zhannngchen	84413f33b8	[enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127 )	2023-02-27 23:31:28 +08:00
yongjinhou	c807596c51	[Docs](docs) Modify plugin documents (#17161 ) * modify plugin docs * add qe_slow_log_ms description * add version describtion	2023-02-27 18:42:02 +08:00
Stalary	95837b7958	[Enhancement](ES): Support mapping es date format and replace simple json with jackson (#16806 ) * Support mapping es date format, default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis * Replace simple json with jackson, resolve column order random problem * Add es array doc version	2023-02-27 14:47:21 +08:00
奕冷	c0360f80bb	[enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases (#15339 ) Enhance aggregate function `collect_set` and `collect_list` to support optional `max_size` param, which enables to limit the number of elements in result array.	2023-02-27 14:22:30 +08:00
huangzhaowei	2626995fc1	[Doc](Load)Add mysql load document (#16483 ) * Add doc * 1 * doc2 * review again * fix comment * fix comment * format * add recommand dir * cleint --local-infile * add streaming_load_max_mb	2023-02-27 13:25:34 +08:00
WenYao	f228cfdd00	[enhancement](session-variable)add a use_fix_replica session variable to fix query replica (#17101 ) Add use_fix_replica session variable, so that we can be better debug replica inconsistencies problem. If use_fix_replica default is -1, which means not fix, else we will choose the {use_fix_replica} smallest replica.	2023-02-27 10:20:23 +08:00
DuRipeng	aefcc98715	[Enhancement](datetimev2-enhance) support 'microseconds_sub' function for datetimev2 (#17130 ) Based on #16970 , introduce microseconds_sub function for datetimev2	2023-02-27 08:47:30 +08:00
Bowen Liang	d8eb3ec6f7	fix set command example to `enable_pipeline_engine` (#17103 )	2023-02-26 11:06:04 +08:00
caoliang-web	14e80b18c8	Add csv file header filter documentation example (#17115 )	2023-02-26 11:05:45 +08:00
wangtianyi2004	32d08c9556	Update run-docker-cluster.md (#17116 )	2023-02-26 11:05:28 +08:00
Tiewei Fang	3a9aa03aab	[BugFix](oracle-catalog) Modify the doris data type mapping of oracle `NUMBER(p,s)` type (#17051 ) The data type `NUMBER(p,s)` of oracle has some different of doris decimal type in semantics. For Oracle Number(p,s) type： 1. if s<0 , it means this is an Interger. This `NUMBER(p,s)` has (p+\|s\| ) significant digit, and rounding will be performed at s position. eg: if we insert 1234567 into `NUMBER(5,-2)` type, then the oracle will store 1234500. In this case, Doris will use int type (`TINYINT/SMALLINT/INT/.../LARGEINT`). 2. if s>=0 && s<p , it just like doris Decimal(p,s) behavior. 3. if s>=0 && s>p, it means this is a decimal(like 0.xxxxx). p represents how many digits can be left to the left after the decimal point, the figure after the decimal point s will be rounded. eg: we can not insert 0.0123456 into `NUMBER(5,7)` type, because there must be two zeros on the right side of the decimal point, we can insert 0.0012345 into `NUMBER(5,7)` type. In this case, Doris will use `DECIMAL(s,s)` 4. if we don't specify p and s for `NUMBER(p,s)` like `NUMBER`, the p and s of `NUMBER` are uncertain. In this case, doris can not determine p and s, so doris can not determine data type.	2023-02-26 09:05:41 +08:00
Mingyu Chen	4093ef9e4b	[fix](auth) fix losing global priv bug and refactor default role name (#16966 ) This PR mainly changes: When upgrading from old version to master, the ADMIN_PRIV for normal user may be lost. This may only happen if: Create a user with ADMIN_PRIV privilege. Upgrade Doris to v1.2.x or master before the meta image which contains the edit log in step 1 is generate. And the ADMIN_PRIV will be lost in Global Privileges This PR will rectify this bug and set ADMIN_PRIV to the right place Refactor the user's implicit role name In [feature](auth)Implementing privilege management with rbac model #16091, we refactor the Doris auth model by introducing RBAC. And each user will have an implicit role, named with prefix default_role_rbac_. But it has wrong format like: default_role_rbac_'default_cluster:user1'@'%' This PR change the role name's format, like: default_role_rbac_user1@% default_role_rbac_user2@[domain] NOTICE: this change may cause incompatible metadata, but since [feature](auth)Implementing privilege management with rbac model #16091 is not released, we should fix it soon. Add a new session variable show_user_default_role When set to true, it will show implicit role of user in the result of show roles stmt. Default is false	2023-02-24 23:36:53 +08:00
AlexYue	c39914c0a0	[feature](partition)add default list partition (#15509 ) This pr implements the list default partition referred in related #15507. It's similar as GreenPlum's default's partition which would store all data not satisfying prior partition key's constraints and optimizer wouldn't filter default partition which means default partition would be scanned each time you try to select data from one table with default partition. User could either create one table with default partition or alter add one default partition. ```sql PARTITION LIST(key) { PARTITION p1 values in (xx,xx), PARTITION DEFAULT } ALTER TABLE XXX ADD PARTITION DEFAULT ``` We don't support automatically migrate data inside default partition which meets newly added partition key's constraint to newly add partition when alter add new partition. User should select default partition using new constraints as predicate and insert them to new partition. ```sql insert into tbl select * from tbl partition default where partition_key=xx; ```	2023-02-24 15:24:59 +08:00
yongjinhou	7470198df6	[Docs](docs) Organize http documents (#16618 ) 1. Organize http documents 2. Add http interface authentication for FE 3. Support https interface for FE 4. Provide authentication interface 5. Add http interface authentication for BE 6. Support https interface for BE	2023-02-24 15:17:01 +08:00
Pxl	03f4c7a94d	[Doc](Materialized-View) update documentation about materialized view enhancement (#17025 ) update documentation about materialized view enhancement	2023-02-24 10:06:35 +08:00
yagagagaga	37b9b038c4	[typo](docs) fix Fix incorrect url address in export-manual.md. (#17072 )	2023-02-24 09:42:28 +08:00
DuRipeng	1cce5782a0	[typo](docs) collect doc md language annotation (#17090 )	2023-02-24 09:41:54 +08:00
lsy3993	c416bfbaef	[typo](docs)fix disk format (#17050 ) * change docker compose to 'docker-compose' * modify sql of mysql * fix docker start and stop cmd * new commit * markdown format adjust	2023-02-23 20:32:05 +08:00
qiye	92ecd16573	(feature)[DOE]Support array for Doris on ES (#16941 ) * (feature)[DOE]Support array for Doris on ES	2023-02-23 19:31:18 +08:00
lihangyu	526a66e9fb	[Function](array-type) support array_apply (#17020 ) Filter array to match specific binary condition ``` mysql> select array_apply([1000000, 1000001, 1000002], '=', 1000002); +-------------------------------------------------------------+ \| array_apply(ARRAY(1000000, 1000001, 1000002), '=', 1000002) \| +-------------------------------------------------------------+ \| [1000002] \| +-------------------------------------------------------------+ ```	2023-02-23 17:38:16 +08:00
slothever	51bbae27b8	[feature-wip](iceberg) add dlf and glue catalog impl for iceberg catalog (#16602 ) iceberg catalog supports DLF on Alibaba Cloud and AWS Glue Catalog	2023-02-23 14:02:41 +08:00
yongkang.zhong	2e1ed384fd	[typo](docs) add split_by_string function 1.2.2 label (#17057 )	2023-02-23 11:17:25 +08:00
Lijia Liu	8eeb435963	[improvement](meta) Enhance Doris's fault tolerance to disk error (#16472 ) Sense io error. Retry query when io error. Greylist: When finds one disk is completely broken, or the diff of tablet number in BE and FE meta is too large,reduce the query priority of the BE.	2023-02-23 08:40:45 +08:00
Xinyi Zou	a1c0054b4c	[fix](memory) fix memory GC details and join probe catch bad_alloc (#16989 ) Fix Redhat 4.x OS /proc/meminfo has no MemAvailable, disable MemAvailable to control memory. vm_rss_str and mem_available_str recorded when gc is triggered, to avoid memory changes during gc and cause inaccurate logs. join probe catch bad_alloc, this may alloc 64G memory at a time, avoid OOM. Modify document doris_be_all_segments_num and doris_be_all_rowsets_num names.	2023-02-23 08:33:30 +08:00
yongkang.zhong	d7d82f26af	[typo](docs) add date_trunc function 1.2 label (#17037 )	2023-02-22 22:42:18 +08:00
yongkang.zhong	8dd1a12ea6	[typo](docs)Add upgrade precautions #17027	2023-02-22 19:27:20 +08:00
cjq9458	e48d9c9d62	[doc](typo)update datax.md #17009	2023-02-22 19:27:03 +08:00
DuRipeng	e65a061256	[Enhancement](datetimev2-enhance) support 'microseconds_add' function for datetimev2 (#16970 ) support 'microseconds_add' function for datetimev2	2023-02-22 17:49:41 +08:00
Kang	51eb147711	fix inverted index doc typo and reorganize index related docs (#16915 )	2023-02-22 15:15:10 +08:00
chenlinzhong	0e3be4eff5	[Improvement](brpc) Using a thread pool for RPC service avoiding std::mutex block brpc::bthread (#16639 ) mainly include: - brpc service adds two types of thread pools. The number of "light" and "heavy" thread pools is different Classify the interfaces of be. Those related to data transmission are classified as heavy interfaces and others as light interfaces - Add some monitoring to the thread pool, including the queue size and the number of active threads. Use these - indicators to guide the configuration of the number of threads	2023-02-22 14:15:47 +08:00
UnicornLee	16c4e42f42	[typo](doc) 字段描述与建表sql中的不一致 (#16270 ) * 字段描述与建表sql中的不一致 * 1. 英文文档将`key_desc`改为`keys_type`。 * 1. 英文文档将`partition_desc`改为`partition_info`。 --------- Co-authored-by: unicornlee@dingtalk.com <lxb@201092104>	2023-02-21 23:00:26 +08:00
wudi	085f0826f6	update (#16975 ) Co-authored-by: wudi <>	2023-02-21 22:53:49 +08:00
ElvinWei	004872c99a	[fix](doc) fix invalid urls in tpch.md (#16949 )	2023-02-21 15:45:31 +08:00
Dazhuwei	246dd65435	[fix](doc) fix export-manual.md (#16969 )	2023-02-21 15:44:41 +08:00
lihangyu	13ae8cd6c6	[doc](point query) add row cache doc for hight-concurrent-point-query (#16972 ) This code in VCollectIterator::build_heap is possible to cause double free if cumu_iter->init() fails and returns early, becuase some LevelIterator* exists both in VCollectIterator::_children and cumu_iter::_children.	2023-02-21 14:18:37 +08:00
Mingyu Chen	491d269412	[fix](tvf) fix bug that failed to get schema of tvf when file is empty (#16928 ) In previous implementation, when querying tvf, FE will get schema from BE. And BE will try to open the first file to get its schema info, but for orc or parquet format, if the file is empty, it will return error. But even for an empty file, we can still get schema info from file's footer. So we should handle the empty file to get schema info correctly. Also modify the catalog doc to add some FAQ.	2023-02-21 14:14:32 +08:00
Mingyu Chen	c0bb2e33a8	[improvement](scan) separate scanner into local and remote scanner pool (#16891 ) There are 2 kinds for scanner thread pool, local and remote. Local is for local file read, specially for olap scanner. Remote is for other external data source, such as file scanner, jdbc scanner. This PR mainly changes: For olap scanner, use cold or hot rowset to decide whether to use local or remote pool. For other scanner, user remote pool by default. Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512, indicate the max thread number of the remote scanner thread pool This will alleviate the problem of interaction between olap queries with load job and external queries.	2023-02-21 14:13:09 +08:00
lihangyu	113023fb86	(Enhancement)[load-json] support simdjson in new json reader (#16903 ) be config: enable_simdjson_reader=true related PR #11665	2023-02-21 11:31:00 +08:00

1 2 3 4 5 ...

1934 Commits