1728 Commits

Author SHA1 Message Date
084eec87ee [docs](docs)update en docs (#15470)
* Update basic-summary.md
2022-12-30 18:38:26 +08:00
34d7eeb571 [doc](session variable) add doc content for adding variables called rewrite_or_to_in_predicate_threshold (#15513)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-12-30 17:11:45 +08:00
08d4dcefff [typo](doc)data partition doc including en and zh-CN #15379
Co-authored-by: Chen Jinquan 陈金泉 (690) <chenjinq@haier.com>
2022-12-30 15:38:25 +08:00
9a517d6a8f [DataType](Decimalv3) change the avg function scale of decimalv3 (#15445) 2022-12-30 00:27:51 +08:00
73f7ccb58f [typo](docs) fix document display error in SHOW-ALTER.md and SHOW-PARTITION-ID.md and SHOW-PARTITIONS.md (#15453) 2022-12-30 00:27:22 +08:00
e2603ca883 [fix](docs) fix some docs about stream load and select. (#15372)
* [fix](docs) fix some docs about stream load and select.

* update
2022-12-29 14:50:06 +08:00
2ae28ea9dd [typo](docs)fix-doc #15438 2022-12-29 14:19:24 +08:00
4179ea31bd [typo](docs) fix typo in SHOW-ALTER.md and SHOW-LOAD-WARNINGS.md (#15431) 2022-12-29 14:19:05 +08:00
298c0a2391 [typo](docs)fix be dynamic configuration doc #15443 2022-12-29 14:18:14 +08:00
f5b4faf682 fix compile doc (#15454) 2022-12-29 14:17:53 +08:00
ffef81a6ab [feature](BE)pad missed version with empty rowset (#15030)
If all replicas of a tablet are broken, users can use this HTTP API to pad the missing version with an empty rowset.
2022-12-29 11:20:44 +08:00
a22ee89431 [Enhancement](jemalloc):support heap dump by http request at runtime (#15429) 2022-12-28 20:10:50 +08:00
75aa00d3d0 [Feature](NGram BloomFilter Index) add new ngram bloom filter index to speed up like query (#11579)
This PR implements a new bloom filter index, the NGram bloom filter index, which was proposed in #10733.
The new index can greatly improve LIKE query performance; in some of our test cases the improvement reached an order of magnitude.
For usage, see the docs in this PR. The index depends on ```enable_function_pushdown```;
you need to set it to ```true``` for the index to take effect on LIKE queries.
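
A minimal sketch of the idea behind an n-gram bloom filter, not the Doris implementation. The class name, gram size, bit-array size and hashing scheme are all assumptions chosen for illustration. Text is split into n-grams that are inserted into a bloom filter; a `LIKE '%pattern%'` lookup can skip the data when any n-gram of the pattern is definitely absent.

```python
import hashlib


class NGramBloomFilter:
    """Toy n-gram bloom filter (illustrative; all parameters are assumptions)."""

    def __init__(self, gram_size=3, num_bits=1024, num_hashes=3):
        self.gram_size = gram_size
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _grams(self, text):
        # All contiguous substrings of length gram_size.
        n = self.gram_size
        return [text[i:i + n] for i in range(len(text) - n + 1)]

    def _positions(self, gram):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text):
        for gram in self._grams(text):
            for pos in self._positions(gram):
                self.bits[pos] = 1

    def may_contain(self, pattern):
        """False means definitely absent; True means the data must be checked."""
        grams = self._grams(pattern)
        if not grams:
            return True  # pattern shorter than gram_size: the filter cannot prune
        return all(self.bits[pos] for gram in grams for pos in self._positions(gram))
```

A bloom filter never produces false negatives, so any substring of added text that is at least `gram_size` long always yields `True`; a `True` result for other patterns may be a false positive and still requires evaluating the real predicate.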
2022-12-28 18:01:50 +08:00
3aae27634a [doc](flink-connector) update flink connector faq (#15405) 2022-12-28 16:15:49 +08:00
8342691b62 [feature](remote)Add drop storage policy (#15364)
* add drop storage policy
2022-12-28 16:04:30 +08:00
28bb13a026 [feature](light-schema-change) enable light schema change by default (#15344) 2022-12-28 09:29:26 +08:00
aad53d37c7 [typo](docs)fix doris docs 404 link (#15400) 2022-12-27 22:57:40 +08:00
b3f77a2e00 [feature](Show) add one show type cast command (#15137) 2022-12-27 14:19:04 +08:00
6d851b1fc9 [Doc](Flink) update flink connector doc add new version #15365
Co-authored-by: wudi <>
2022-12-27 14:15:49 +08:00
777b0b94bb [typo](docs) fix wrong date format (#15363)
fix wrong date format
2022-12-27 11:45:05 +08:00
c3d0e2931a [typo](docs) fix version tag for docs of s3 token (#15362) 2022-12-26 19:23:43 +08:00
8b6e4e74e7 [improvement](jdbc) add default jdbc driver's dir (#15346)
Add a new config "jdbc_drivers_dir" for both FE and BE.
Users can put JDBC driver jar files in this directory and specify only the file name in the "driver_url" property
when creating a JDBC resource.
Doris will then look for the jar file in this directory.

Also modify the logic so that when a JDBC resource is modified, the corresponding JDBC table
gets the latest properties.
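
The lookup described above could be sketched as follows. This is an illustration under stated assumptions, not Doris's actual logic: the helper name `resolve_driver_url`, the `file://` result, and the rule "a value with a URL scheme is already complete" are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_driver_url(driver_url, jdbc_drivers_dir):
    """Return a full URL for the driver jar (hypothetical helper)."""
    if urlparse(driver_url).scheme:
        # Already a full URL such as https://... or file://...; keep it as-is.
        return driver_url
    # Bare file name: resolve it against the configured drivers directory.
    return "file://" + str(PurePosixPath(jdbc_drivers_dir) / driver_url)
```

With this rule, resources that already carry a full "driver_url" keep working unchanged, while new resources only need the jar file name.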
2022-12-26 11:51:12 +08:00
bf71943605 [feature](load) stream load trim double quotes for csv (#15241) 2022-12-26 11:45:54 +08:00
b7768a928d [Improvement](S3) support access s3 via temporary security credentials (#15340) 2022-12-26 00:31:55 +08:00
f821dbc9f2 [doc] enable_new_load_scan_node doc (#15347) 2022-12-25 22:51:37 +08:00
001153ab38 [Improvement](multi-catalog) support hive external tables which store data on tencent chdfs (#15297)
* support reading hive tables which store data on tencent chdfs in multi-catalog
2022-12-25 21:57:18 +08:00
fe9571c2fd [typo](docs) fix typo in get-starting.md (#15345) 2022-12-25 21:56:44 +08:00
0cda82ad5a [typo](docs) fix typo in tablet-repair-and-balance (#15341) 2022-12-25 09:48:16 +08:00
fd764b3ccd [fix](fe)add session variable group_concat_max_len (#15254) 2022-12-24 20:07:14 +08:00
907cbcde69 [doc](compile) update docker compile image version (#15300)
Add new docker compile image tag: apache/doris:build-env-for-1.2
2022-12-24 15:28:03 +08:00
cf9217c0ca [typo](docs)fix 404 err to Monitoring and alarming doc #15324 2022-12-23 22:15:54 +08:00
ef3da105c9 [DOCS](refactor) refine en docs (#15244)
* Update basic-summary.md

* Update README.md
2022-12-23 16:47:51 +08:00
00fd5b1b1c [typo](doc) update Paxos spell mistake (#15171) 2022-12-23 16:47:12 +08:00
764b1db097 [fix](s3 outfile) Add the use_path_style parameter for s3 outfile (#15288)
Currently, `outfile` does not support the `use_path_style` parameter and uses `virtual-host style` by default,
but some object storage services only support the `use_path_style` access mode.

This PR adds the `use_path_style` parameter for s3 outfile, so that different object storage services can use different access modes.
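
The difference between the two access modes comes down to where the bucket name appears in the URL. A minimal sketch, assuming an endpoint that includes its scheme; the function name and the example host are hypothetical and this is not the Doris code path.

```python
def s3_object_url(endpoint, bucket, key, use_path_style=False):
    """Build an object URL in path style or virtual-hosted style."""
    scheme, _, host = endpoint.partition("://")
    if use_path_style:
        # Path style: the bucket is the first path segment.
        return f"{scheme}://{host}/{bucket}/{key}"
    # Virtual-hosted style: the bucket is a subdomain of the endpoint host.
    return f"{scheme}://{bucket}.{host}/{key}"
```

For the endpoint `http://s3.example.com`, bucket `b` and key `out/result.csv`, path style addresses `http://s3.example.com/b/out/result.csv`, while virtual-hosted style addresses `http://b.s3.example.com/out/result.csv`.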
2022-12-23 16:22:06 +08:00
cb295de981 [Bug](decimalv3) Fix wrong precision of DECIMALV3 (#15302)
* [Bug](decimalv3) Fix wrong precision of DECIMALV3

* update
2022-12-23 14:11:08 +08:00
df5969ab58 [Feature] Support function roundBankers (#15154) 2022-12-22 22:53:09 +08:00
6fb61b5bbc [enhancement] (streamload) allow table in url when do two-phase commit (#15246) (#15248)
Make it work even if the user provides (unnecessary) table info in the url.
For example, `curl -X PUT --location-trusted -u user:passwd -H "txn_id:18036" -H \
"txn_operation:commit" http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc`
still works.
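
The same request can be assembled in Python. This sketch only builds the request and does not send it; the function name and its parameters are hypothetical, and the table-less URL form is an assumption inferred from the table info being described as unnecessary.

```python
import base64
import urllib.request


def build_2pc_request(fe_host, http_port, db, txn_id, operation,
                      user, passwd, table=None):
    """Build (but do not send) a two-phase-commit request."""
    # The table segment is optional; both URL forms end in _stream_load_2pc.
    path = f"/api/{db}/{table}" if table else f"/api/{db}"
    url = f"http://{fe_host}:{http_port}{path}/_stream_load_2pc"
    token = base64.b64encode(f"{user}:{passwd}".encode("utf-8")).decode("ascii")
    headers = {
        "txn_id": str(txn_id),
        "txn_operation": operation,  # "commit", as in the curl example
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, method="PUT", headers=headers)
```

Note that curl's `--location-trusted` also resends credentials after a redirect; a real client built on this sketch would have to handle redirects itself.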

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-12-22 17:00:51 +08:00
754fceafaf [feature-wip](statistics) add aggregate function histogram and collect histogram statistics (#14910)
**Histogram statistics**

Currently Doris collects statistics but no histogram data, and by default the optimizer assumes that the distinct values of a column are evenly distributed. This assumption can be problematic when the data distribution is skewed, so this PR implements the collection of histogram statistics.

For skewed columns (columns with unevenly distributed data), histogram statistics enable the optimizer to generate more accurate cardinality estimates for filter or join predicates involving these columns, resulting in a more precise execution plan.

Histograms improve the execution plan mainly in two ways: the choice of where-condition filters and the choice of join order. The principle for where conditions is relatively simple: the histogram is used to estimate the selectivity of each predicate, and the filter with the higher selectivity is preferred.

The selection of join order is based on the estimation of the number of rows in the join result. In the case of uneven data distribution in the join condition columns, histogram can greatly improve the accuracy of the prediction of the number of rows in the join result. At the same time, if the number of rows of a bucket in one of the columns is 0, you can mark it and directly skip the bucket in the subsequent join process to improve efficiency.
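
To make the estimation step concrete, here is a minimal sketch of range selectivity computed from histogram buckets. It is not the optimizer's code: the function name is hypothetical, bucket bounds are assumed to be numeric, and rows are assumed to be spread uniformly inside each bucket.

```python
def estimate_selectivity(buckets, lo, hi):
    """Estimate the fraction of rows with lo <= value <= hi.

    Each bucket is a dict with numeric "lower", "upper" and "count" keys.
    Rows are assumed to be spread uniformly inside a bucket.
    """
    total = sum(b["count"] for b in buckets)
    if total == 0:
        return 0.0
    matched = 0.0
    for b in buckets:
        overlap_lo = max(lo, b["lower"])
        overlap_hi = min(hi, b["upper"])
        if overlap_lo > overlap_hi:
            continue  # the range misses this bucket entirely
        width = b["upper"] - b["lower"]
        # A single-value bucket that overlaps the range is counted in full.
        fraction = 1.0 if width == 0 else (overlap_hi - overlap_lo) / width
        matched += b["count"] * fraction
    return matched / total
```

For a skewed column with 90 rows in [0, 10] and 10 rows in [10, 100], the predicate `value BETWEEN 0 AND 5` is estimated at 0.45 of the rows, whereas assuming an even spread over [0, 100] would give only 0.05.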

---

Histogram statistics are mainly collected by the histogram aggregation function, which is used as follows:

**Syntax**

```SQL
histogram(expr)
```

> The histogram function is used to describe the distribution of the data. It uses an "equal height" bucketing strategy, and divides the data into buckets according to the value of the data. It describes each bucket with some simple data, such as the number of values that fall in the bucket. It is mainly used by the optimizer to estimate range queries.

**example**

```
MySQL [test]> select histogram(login_time) from dev_table;
+------------------------------------------------------------------------------------------------------------------------------+
| histogram(`login_time`)                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------+
| {"bucket_size":5,"buckets":[{"lower":"2022-09-21 17:30:29","upper":"2022-09-21 22:30:29","count":9,"pre_sum":0,"ndv":1},...]}|
+------------------------------------------------------------------------------------------------------------------------------+
```
**description**

```JSON
{
    "bucket_size": 5, 
    "buckets": [
        {
            "lower": "2022-09-21 17:30:29", 
            "upper": "2022-09-21 22:30:29", 
            "count": 9, 
            "pre_sum": 0, 
            "ndv": 1
        }, 
        {
            "lower": "2022-09-22 17:30:29", 
            "upper": "2022-09-22 22:30:29", 
            "count": 10, 
            "pre_sum": 9, 
            "ndv": 1
        }, 
        {
            "lower": "2022-09-23 17:30:29", 
            "upper": "2022-09-23 22:30:29", 
            "count": 9, 
            "pre_sum": 19, 
            "ndv": 1
        }, 
        {
            "lower": "2022-09-24 17:30:29", 
            "upper": "2022-09-24 22:30:29", 
            "count": 9, 
            "pre_sum": 28, 
            "ndv": 1
        }, 
        {
            "lower": "2022-09-25 17:30:29", 
            "upper": "2022-09-25 22:30:29", 
            "count": 9, 
            "pre_sum": 37, 
            "ndv": 1
        }
    ]
}
```
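
The shape of this output can be reproduced with a simplified equal-height bucketing sketch. This is not the aggregate function's implementation. The function name is hypothetical, `bucket_size` is assumed to be the number of buckets (as it is in the example above), `pre_sum` is assumed to be the row count of all preceding buckets, and `ndv` is assumed to be the number of distinct values in the bucket.

```python
def equal_height_histogram(values, num_buckets):
    """Split sorted values into buckets of roughly equal row count."""
    data = sorted(values)
    buckets = []
    pre_sum = 0  # rows in all buckets before the current one
    base, extra = divmod(len(data), num_buckets)
    start = 0
    for i in range(num_buckets):
        # The first `extra` buckets take one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer rows than buckets
        chunk = data[start:start + size]
        buckets.append({
            "lower": chunk[0],
            "upper": chunk[-1],
            "count": len(chunk),
            "pre_sum": pre_sum,
            "ndv": len(set(chunk)),
        })
        pre_sum += len(chunk)
        start += size
    return {"bucket_size": len(buckets), "buckets": buckets}
```

Unlike this sketch, a production equal-height histogram would usually avoid splitting rows with the same value across two buckets.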

TODO:
- histogram function should support parameters and sampled statistics (covered by another PR)
- use histogram statistics
- add p0 regression
2022-12-22 16:42:17 +08:00
1520a4af6d [refactor](resource) use resource to create external catalog (#14978)
Use resource to create external catalog.
-- HMS
mysql> create resource hms_resource properties(
    -> "type"="hms",
    -> 'hive.metastore.uris' = 'thrift://172.21.0.44:7004',
    -> 'dfs.nameservices'='HANN',
    -> 'dfs.ha.namenodes.HANN'='nn1,nn2',
    -> 'dfs.namenode.rpc-address.HANN.nn1'='172.21.0.32:4007',
    -> 'dfs.namenode.rpc-address.HANN.nn2'='172.21.0.44:4007',
    -> 'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    -> );

-- MYSQL
mysql> create resource mysql_resource properties (
    -> "type"="jdbc",
    -> "user"="root",
    -> "password"="123456",
    -> "jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false",
    -> "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
    -> "driver_class" = "com.mysql.cj.jdbc.Driver");

-- ES
mysql> create resource es_resource properties (
    -> "type"="es",
    -> "hosts"="http://127.0.0.1:29200",
    -> "nodes_discovery"="false",
    -> "enable_keyword_sniff"="true");
2022-12-22 13:45:55 +08:00
c81a3bfe1b [docs](compile)Add Windows compilation documentation (#15253)
Add Windows compilation documentation
2022-12-22 10:16:58 +08:00
e83bab4e44 [typo](docs)add spark-doris-connector config (#15214) 2022-12-21 14:12:41 +08:00
c3712b1114 [bug](jdbc) fix error of jdbc with datetime type in oracle (#15205) 2022-12-20 22:05:55 +08:00
d6b4d214ce 1.1.5 sidebar (#15206) 2022-12-20 20:08:45 +08:00
821c12a456 [chore](BE) remove all useless segment group related code #15193
The segment group is no longer used in the current codebase, so remove all related code from Doris. For the related protobuf code, use the reserved flag to prevent any future use of that field.
2022-12-20 17:11:47 +08:00
c172e2396a [docs](releasenote) Release Note 1.1.5 (#15182) 2022-12-20 16:38:33 +08:00
6be5670ce9 [Feature](multi catalog)Remove enable_multi_catalog config item, open this function to public. (#15130)
The multi-catalog feature is ready to use, so remove the enable_multi_catalog switch from the FE config and open the feature to the public.
2022-12-19 14:29:13 +08:00
b62a94ab46 [enhancement](metric)add one metric for the publish num per db (#14942)
Add a metric that tracks the number of published txns per db. Users can derive the relative txn processing speed per db from this metric and doris_fe_txn_num.
2022-12-19 14:18:11 +08:00
7241c156ed [doc](decimalv3) add label for decimalv3 (#15148) 2022-12-17 21:35:23 +08:00
be2f1df3f1 [typo](doc) fix doc (#15132) 2022-12-16 21:50:21 +08:00
63d2e85372 multi-catalog_doc (#15139) 2022-12-16 21:49:50 +08:00