doris

Author	SHA1	Message	Date
zhengyu	6fb61b5bbc	[enhancement] (streamload) allow table in url when doing two-phase commit (#15246 ) (#15248 ) Make it work even if the user provides (unnecessary) table info in the URL, i.e. `curl -X PUT --location-trusted -u user:passwd -H "txn_id:18036" -H "txn_operation:commit" http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc` still works. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-12-22 17:00:51 +08:00
ElvinWei	754fceafaf	[feature-wip](statistics) add aggregate function histogram and collect histogram statistics (#14910 ) Histogram statistics Currently Doris collects statistics but no histogram data, and by default the optimizer assumes that the different values of a column are evenly distributed. This assumption can be problematic when the data distribution is skewed, so this PR implements the collection of histogram statistics. For columns with skewed data (columns whose values are unevenly distributed), histogram statistics enable the optimizer to generate more accurate cardinality estimates for filter or join predicates involving these columns, resulting in a more precise execution plan. The histogram improves the execution plan mainly in two respects: the selection of where conditions and the selection of join order. The selection principle for where conditions is relatively simple: the histogram is used to calculate the selection rate of each predicate, and the filter with the higher selection rate is preferred. The selection of join order is based on an estimate of the number of rows in the join result. When the data in the join condition columns is unevenly distributed, the histogram can greatly improve the accuracy of that estimate. In addition, if the number of rows in a bucket of one of the columns is 0, the bucket can be marked and skipped directly in the subsequent join process to improve efficiency. --- Histogram statistics are mainly collected by the histogram aggregation function, which is used as follows: Syntax ```SQL histogram(expr) ``` > The histogram function is used to describe the distribution of the data. It uses an "equal height" bucketing strategy, and divides the data into buckets according to the value of the data. It describes each bucket with some simple data, such as the number of values that fall in the bucket. 
It is mainly used by the optimizer to estimate range queries. example ``` MySQL [test]> select histogram(login_time) from dev_table; +------------------------------------------------------------------------------------------------------------------------------+ | histogram(`login_time`) | +------------------------------------------------------------------------------------------------------------------------------+ | {"bucket_size":5,"buckets":[{"lower":"2022-09-21 17:30:29","upper":"2022-09-21 22:30:29","count":9,"pre_sum":0,"ndv":1},...]}| +------------------------------------------------------------------------------------------------------------------------------+ ``` description ```JSON { "bucket_size": 5, "buckets": [ { "lower": "2022-09-21 17:30:29", "upper": "2022-09-21 22:30:29", "count": 9, "pre_sum": 0, "ndv": 1 }, { "lower": "2022-09-22 17:30:29", "upper": "2022-09-22 22:30:29", "count": 10, "pre_sum": 9, "ndv": 1 }, { "lower": "2022-09-23 17:30:29", "upper": "2022-09-23 22:30:29", "count": 9, "pre_sum": 19, "ndv": 1 }, { "lower": "2022-09-24 17:30:29", "upper": "2022-09-24 22:30:29", "count": 9, "pre_sum": 28, "ndv": 1 }, { "lower": "2022-09-25 17:30:29", "upper": "2022-09-25 22:30:29", "count": 9, "pre_sum": 37, "ndv": 1 } ] } ``` TODO: - histogram func supports parameter and sample statistics (covered by another PR) - use histogram statistics - add p0 regression	2022-12-22 16:42:17 +08:00
Ashin Gau	1520a4af6d	[refactor](resource) use resource to create external catalog (#14978 ) Use resource to create external catalog. -- HMS mysql> create resource hms_resource properties( -> "type"="hms", -> 'hive.metastore.uris' = 'thrift://172.21.0.44:7004', -> 'dfs.nameservices'='HANN', -> 'dfs.ha.namenodes.HANN'='nn1,nn2', -> 'dfs.namenode.rpc-address.HANN.nn1'='172.21.0.32:4007', -> 'dfs.namenode.rpc-address.HANN.nn2'='172.21.0.44:4007', -> 'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' -> ); -- MYSQL mysql> create resource mysql_resource properties ( -> "type"="jdbc", -> "user"="root", -> "password"="123456", -> "jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false", -> "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar", -> "driver_class" = "com.mysql.cj.jdbc.Driver"); -- ES mysql> create resource es_resource properties ( -> "type"="es", -> "hosts"="http://127.0.0.1:29200", -> "nodes_discovery"="false", -> "enable_keyword_sniff"="true");	2022-12-22 13:45:55 +08:00
FreeOnePlus	c81a3bfe1b	[docs](compile)Add Windows compilation documentation (#15253 ) Add Windows compilation documentation	2022-12-22 10:16:58 +08:00
lexluo09	e83bab4e44	[typo](docs)add spark-doris-connector config (#15214 )	2022-12-21 14:12:41 +08:00
zhangstar333	c3712b1114	[bug](jdbc) fix error of jdbc with datetime type in oracle (#15205 )	2022-12-20 22:05:55 +08:00
jiafeng.zhang	d6b4d214ce	1.1.5 sidebar (#15206 )	2022-12-20 20:08:45 +08:00
AlexYue	821c12a456	[chore](BE) remove all useless segment group related code #15193 The segment group is unused in the current codebase, so remove all the related code inside Doris. As for the related protobuf code, use the reserved flag to prevent any future user from using that field.	2022-12-20 17:11:47 +08:00
Luzhijing	c172e2396a	[docs](releasenote) Release Note 1.1.5 (#15182 )	2022-12-20 16:38:33 +08:00
Jibing-Li	6be5670ce9	[Feature](multi catalog)Remove enable_multi_catalog config item, open this function to public. (#15130 ) The multi-catalog feature is ready to use, so remove the enable_multi_catalog switch from the FE config and open the feature to the public.	2022-12-19 14:29:13 +08:00
AlexYue	b62a94ab46	[enhancement](metric)add one metric for the publish num per db (#14942 ) Add one metric to track the publish txn num per db. Users can get the relative speed of txn processing per db using this metric and doris_fe_txn_num.	2022-12-19 14:18:11 +08:00
Gabriel	7241c156ed	[doc](decimalv3) add label for decimalv3 (#15148 )	2022-12-17 21:35:23 +08:00
Liqf	be2f1df3f1	[typo](doc) fix doc (#15132 )	2022-12-16 21:50:21 +08:00
catpineapple	63d2e85372	multi-catalog_doc (#15139 )	2022-12-16 21:49:50 +08:00
Kang	66422fc351	change datatypes order in document sidebar (#15117 )	2022-12-16 21:28:37 +08:00
Kang	03d40ad019	change version tag in jsonb doc (#15115 )	2022-12-16 17:50:21 +08:00
Gabriel	5ee5d70f51	[DOCS](Decimalv3) Add document for Decimalv3 (#15108 ) * [DOCS](Decimalv3) Add document for Decimalv3 * update	2022-12-15 21:27:51 +08:00
zy-kkk	71121deed9	[typo](docs)fix fe config en doc err (#15111 )	2022-12-15 20:27:12 +08:00
Yulei-Yang	21c2e485ae	[improvement](function) add new function substring_index (#15024 )	2022-12-15 09:54:34 +08:00
Stalary	03847b6a3a	[Feature](Api) Support operating nodes (fe/be). (#14904 ) Support operating nodes (fe/be) via HTTP	2022-12-14 23:18:56 +08:00
gnehil	f1b2668a62	[typo](doc) Indicates that the order by feature in group_concat function is supported from version 1.2 (#15083 )	2022-12-14 21:24:06 +08:00
zy-kkk	05805a1632	[typo](docs)Add fe config `enable_new_load_scan_node` (#15075 )	2022-12-14 18:09:53 +08:00
caoliang-web	bc3a35d962	[typo](doc): modify the installation file (#15036 )	2022-12-13 23:37:33 +08:00
Liqf	271c28472a	[typo](docs)Fix doc (#15051 )	2022-12-13 23:17:41 +08:00
wxy	3d1be664b1	[feature](multi-catalog) support connecting to hive metastore with ke… (#15026 ) Support kerberos authentication on hive external catalog	2022-12-13 16:48:46 +08:00
gnehil	98ddb86ea2	[typo](doc)Update install-faq.md (#15029 ) * [typo](doc) 1.2 set java home variable	2022-12-13 15:38:25 +08:00
zy-kkk	dcede52964	[typo](docs)add be config `doris_scanner_row_bytes` (#15016 )	2022-12-13 09:25:28 +08:00
liqing-coder	38570312dd	[feature](split_by_string)support split by string function (#13741 )	2022-12-12 15:22:30 +08:00
Yulei-Yang	33349c3419	[feature](function)Support negative index for function split_part (#13914 )	2022-12-12 09:56:09 +08:00
jiafeng.zhang	614d7273f5	[typo](docs) fix get starting (#14870 ) * [typo](doc)Fix Utility Statements Doc	2022-12-12 09:31:20 +08:00
zy-kkk	22e9f5cc33	[typo](docs)fix the FE&BE config document (#14969 )	2022-12-09 21:11:30 +08:00
Liqf	0aa32ec1af	[typo](docs)optimize the BE config document (#14957 )	2022-12-09 16:05:15 +08:00
zy-kkk	9d96242242	[typo](docs)optimize the FE config document (#14899 ) * optimize the FE config document	2022-12-09 16:04:49 +08:00
Tiewei Fang	1140092211	[docs](jdbc catalog) add docs for jdbc catalog (#14924 )	2022-12-09 08:57:39 +08:00
Yulei-Yang	5768118bfa	[Improvement](multi-catalog) add show create catalog stmt (#14938 )	2022-12-09 08:56:55 +08:00
Ashin Gau	e8becaa562	[refactor](resource) unified resource user interface (#14842 ) At present, there are multiple user interfaces for accessing hdfs and s3. Each interface has its own configuration, and they differ from one another, which confuses users. Create resource already supports remote storage resources and resource permission management, but only `spark`/`odbc_catalog` are in use. Cloud storage resources need to be created and managed uniformly through create resource. This PR contains the following changes: 1. Add `s3`, `hdfs` and `hms` resources, each with its own configuration items, and delete the configuration items scattered in other classes. 2. Use `resource` to create `storage` tools, and use `storage` tools to access the remote file system.	2022-12-08 20:37:10 +08:00
Yulei-Yang	244bf84483	[improvement](docs) add docs for alter catalog stmt (#14789 )	2022-12-08 19:46:58 +08:00
jiafeng.zhang	b3b493fdef	[typo](docs)Java udf and Jdbc doc fix (#14927 ) * java udf doc fix	2022-12-08 18:24:27 +08:00
zhangstar333	962810b973	[Vectorized](jdbc) add check type for jdbc table (#14501 )	2022-12-08 10:27:47 +08:00
jiafeng.zhang	6b44039d58	[release notes](docs)release notes 1.2.0 (#14894 )	2022-12-07 18:53:34 +08:00
Adonis Ling	ec2539e2a3	[chore](macOS) Resolve the issue with missing python program (#14864 )	2022-12-07 15:30:12 +08:00
HB	789f1d4e3e	Fixed label_clean_interval_second's incorrect default value (#14869 )	2022-12-07 12:47:55 +08:00
Liqf	7c56b60596	[typo](docs)delete the offline fe parameters #14860	2022-12-07 08:46:25 +08:00
jiafeng.zhang	4911f9c6a1	[typo](doc)Fix Utility Statements Doc (#14859 ) * [typo](doc)Fix Utility Statements Doc	2022-12-07 08:44:40 +08:00
jiafeng.zhang	db38e7b6dc	[typo](docs)Add array function version number (#14843 ) [typo](docs)Add array function version number	2022-12-06 18:09:22 +08:00
lsy3993	5292880310	[refactor](odbc) move param to config (#14596 ) move param to config	2022-12-06 17:38:52 +08:00
jiafeng.zhang	1484de9f4f	add release notes (#14845 )	2022-12-06 11:22:24 +08:00
luozenglin	32a33c5119	[Enhancement](docs) Added grouping sets syntax for group by. (#14805 )	2022-12-06 00:20:08 +08:00
Yulei-Yang	8a834566d0	[typo](docs) fix schema change DATA_QUALITY_ERROR typo and related error msg (#14773 )	2022-12-05 09:50:20 +08:00
Yulei-Yang	852b03729f	[Improvement](meta)add IsCurrent column in show catalogs result #14700 When a user has multiple catalogs and switches between them several times, they may forget which catalog is currently in use. This change adds an IsCurrent column to the show catalogs result to help. mysql> show catalogs; +-----------+-------------+----------+-----------+ | CatalogId | CatalogName | Type | IsCurrent | +-----------+-------------+----------+-----------+ | 136591 | es | es | | | 130100 | hive | hms | yes | | 0 | internal | internal | | +-----------+-------------+----------+-----------+	2022-12-05 08:32:16 +08:00
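
The histogram commit (754fceafaf) in the list above describes an "equal height" bucketing strategy and a per-bucket summary with `lower`, `upper`, `count`, `pre_sum` and `ndv`. The following is a minimal Python sketch of that idea, not the Doris implementation: the function names `equal_height_buckets` and `estimate_rows_le`, the chunking rule, and the linear interpolation inside a bucket are all assumptions made for illustration, and the estimator only works for numeric values.

```python
from typing import Any, Dict, List, Sequence


def equal_height_buckets(values: Sequence[Any], num_buckets: int) -> List[Dict[str, Any]]:
    """Split sorted values into at most num_buckets buckets of near-equal row count.

    Hypothetical helper: field names mirror the JSON in the commit message,
    but the splitting rule here is an assumption.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    ordered = sorted(values)
    total = len(ordered)
    buckets: List[Dict[str, Any]] = []
    pre_sum = 0  # number of rows in all earlier buckets
    for i in range(num_buckets):
        # Integer boundaries spread any remainder across the buckets.
        start = i * total // num_buckets
        end = (i + 1) * total // num_buckets
        chunk = ordered[start:end]
        if not chunk:
            continue  # fewer rows than buckets
        buckets.append({
            "lower": chunk[0],
            "upper": chunk[-1],
            "count": len(chunk),
            "pre_sum": pre_sum,
            "ndv": len(set(chunk)),  # distinct values in this bucket
        })
        pre_sum += len(chunk)
    return buckets


def estimate_rows_le(buckets: List[Dict[str, Any]], bound: float) -> float:
    """Estimate how many rows satisfy value <= bound (numeric values only).

    Assumption: rows are spread uniformly between a bucket's lower and upper.
    """
    rows = 0.0
    for b in buckets:
        if bound >= b["upper"]:
            rows += b["count"]  # whole bucket qualifies
        elif bound >= b["lower"]:
            # bound falls strictly inside the bucket, so upper > lower here.
            frac = (bound - b["lower"]) / (b["upper"] - b["lower"])
            rows += b["count"] * frac
    return rows


if __name__ == "__main__":
    hist = equal_height_buckets(range(1, 11), 5)
    for bucket in hist:
        print(bucket)
    print(estimate_rows_le(hist, 5.5))
```

Keeping `pre_sum` in each bucket means an estimator could also jump straight to the bucket containing the bound and add its `pre_sum`, instead of summing every earlier `count` as this sketch does.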


1692 Commits