1495 Commits

Author SHA1 Message Date
50ae9e6b19 [enhancement](planner) support select table sample (#10170)
### Motivation
TABLESAMPLE allows you to limit the number of rows from a table in the FROM clause.

It is useful for data exploration, quick verification of SQL correctness, and table statistics collection.

### Grammar
```
[TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]
```

Limits the number of rows read from the table in the FROM clause.
A number of Tablets are selected pseudo-randomly from the table according to the specified number of rows or percentage.
Specifying a seed value (seek) in REPEATABLE returns the same sample again.
In addition, Tablet IDs can be specified manually.
Note that this can only be used for OLAP tables.

### Example
Q1:
```
SELECT * FROM t1 TABLET(10001,10002) limit 1000;
```
explain:
```
partitions=1/1, tablets=2/12, tabletList=10001,10002
```
Selects the specified Tablets of t1.

Q2:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 1 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10001,10002,10003
```

Q3:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 2 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10002,10003,10004
```

Pseudo-randomly sample 1000 rows from t1.
Note that whole Tablets are selected according to the statistics of the table,
so the total number of rows in the selected Tablets may be greater than 1000.
To explicitly return 1000 rows, add a LIMIT.

### Design
First, determine how many rows to sample from each partition according to the number of partitions.
Then determine the number of Tablets to select in each partition according to the average number of rows per Tablet.
If seek is not specified, that number of Tablets is selected pseudo-randomly from each partition.
If seek is specified, Tablets are selected sequentially, starting from the seek Tablet of the partition.
Finally, the manually specified Tablet IDs are added to the selected Tablets.
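
The selection steps above can be sketched roughly as follows. This is only an illustration in Python, not the planner code from this PR: the `Partition` class, the `select_sample_tablets` function, the even split of rows across partitions, the round-up rule, and the treatment of seek as a 1-based starting position that wraps around are all assumptions made for the example. The row count in the usage lines is hypothetical as well.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Partition:
    """Hypothetical stand-in for a partition: its Tablet IDs and row count."""
    tablet_ids: List[int]
    row_count: int


def select_sample_tablets(
    partitions: Sequence[Partition],
    sample_rows: int,
    seek: Optional[int] = None,
    manual_tablet_ids: Sequence[int] = (),
) -> List[int]:
    selected: List[int] = []
    if partitions:
        # Step 1 (assumption): split the requested rows evenly across partitions.
        rows_per_partition = sample_rows / len(partitions)
        for part in partitions:
            total = len(part.tablet_ids)
            if total == 0:
                continue
            # Step 2: Tablets needed, based on the average rows per Tablet.
            avg_rows = max(part.row_count / total, 1)
            needed = min(total, max(1, math.ceil(rows_per_partition / avg_rows)))
            if seek is None:
                # No seek: pick Tablets pseudo-randomly.
                chosen = random.sample(part.tablet_ids, needed)
            else:
                # With seek: pick sequentially from the seek position
                # (assumed 1-based, wrapping around the partition).
                start = (seek - 1) % total
                chosen = [part.tablet_ids[(start + i) % total] for i in range(needed)]
            selected.extend(chosen)
    # Step 3: add manually specified Tablet IDs, skipping duplicates.
    for tablet_id in manual_tablet_ids:
        if tablet_id not in selected:
            selected.append(tablet_id)
    return selected


# Usage with a hypothetical 12-Tablet partition holding 4,800,000 rows.
t1 = [Partition(tablet_ids=list(range(10001, 10013)), row_count=4_800_000)]
print(select_sample_tablets(t1, 1_000_000, seek=1))  # [10001, 10002, 10003]
print(select_sample_tablets(t1, 1_000_000, seek=2))  # [10002, 10003, 10004]
```

Under these assumptions, the two seek values give the same Tablet lists as the explain output of Q2 and Q3. The real mapping from seek to a starting Tablet is not stated in this message.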
2022-10-14 15:05:23 +08:00
ed73096f19 [improvemnt][doc] refine doc for unique key data model (#13319) 2022-10-14 09:55:52 +08:00
de4315c1c5 [feature](function) support initcap string function (#13193)
support `initcap` string function
2022-10-13 21:31:44 +08:00
cb300b0b39 [feature](agg) support any,any_value agg functions. (#13228) 2022-10-13 18:31:19 +08:00
fe1524a287 [Enhancement](load) remove load mem limit (#13111)
#12716 removed the memory limit for a single load task. This PR removes the session variable load_mem_limit to avoid confusion.

For compatibility, load_mem_limit is not removed from the thrift definition; the FE sets its value equal to exec_mem_limit.
2022-10-13 17:19:22 +08:00
e08ba8d573 [feature](restore) Add new property 'reserve_dynamic_partition_enable' to restore statement (#12498)
Add a new restore property 'reserve_dynamic_partition_enable', which lets you
get a table whose dynamic_partition_enable property has the same value
as before the backup. Before this commit, a restored table always had the property
'dynamic_partition_enable=false'.
2022-10-13 11:16:15 +08:00
4a5095f00d [cleanup](config) remove unused config push_write_mbytes_per_sec (#13290) 2022-10-12 15:58:04 +08:00
917d35a184 [typo](docs)Fix a document problem #13296 2022-10-12 10:08:48 +08:00
16999ef02d [Vectorized][Function] support date_trunc and countequal function (#13039) 2022-10-12 10:01:09 +08:00
022cfb6979 [typo](docs)delete duplicate document and fix some problem (#13274) 2022-10-12 09:09:05 +08:00
5af1439934 [feature](auth) support user password policy and alter user stmt (#13051) 2022-10-11 16:37:35 +08:00
48b182023f [docs](broker load) add doc for property load_parallelism (#13041) 2022-10-11 15:53:25 +08:00
230efa29dd [typo](docs)add orthogonal bitmap function note. #13078 2022-10-11 15:46:56 +08:00
9b42f7e479 [typo](docs)Modification instructions and examples for adding schema change key columns (#13280) 2022-10-11 15:42:14 +08:00
eb60976c25 [typo](docs)fix error url (#13171)
* fix error url
2022-10-11 15:41:00 +08:00
a716c74412 [typo](docs)Fix Docs Error Urls (#13176)
* fix doc
2022-10-11 15:40:03 +08:00
b1cd87d635 [typo](docs)Fix FE Configuration Jump Link 404 (#13149)
* [typo](docs)Fix FE Configuration Jump Link 404
2022-10-11 15:39:25 +08:00
6dad7ee5f5 [typo](docs) Fix jump link 404 in elastic-expansion.md (#13168)
* [typo](docs) Fix jump link 404
2022-10-11 15:38:17 +08:00
9c776c1011 [typo](docs) Fix the jump link 404 in basic usage.md (#13169)
* [typo](docs) Fix the jump link 404
2022-10-11 15:38:00 +08:00
0b9e9ac209 metadata operation fix 404 error url (#13215)
metadata operation fix 404 error url
2022-10-11 14:11:11 +08:00
6ee150755a [refactor](datax)Refactoring doris writer code (#13226)
* Refactoring doris writer code
2022-10-11 08:47:05 +08:00
e094e6ca71 [typo](docs)add hive-bitmap compile and package des #13237 2022-10-10 14:52:50 +08:00
b9516b50c1 [typo](docs)fix docs 404 url (#13157)
* fix docs 404 url
2022-10-09 20:02:48 +08:00
cfade2dfe0 [typo](docs)Fix Docs 404 Url #13175 2022-10-09 16:22:26 +08:00
dc2d33298b [chore](be config) remove config use_mmap_allocate_chunk #13196
This config is never used in production, and enabling it triggers bugs, so this commit removes the config and its related tests.


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-10-09 16:19:59 +08:00
e5fbecc621 [typo](docs)Fix the jump link 404 in delete recover.md (#13156)
* [typo](docs)Fix the jump link 404 in delete-recover.md
2022-10-09 16:12:34 +08:00
207e913b55 fix the bad link fo delete-recover.md (#13203)
fix the bad link fo delete-recover.md
2022-10-09 16:08:19 +08:00
8f36f8b83a Add be Parameter Description(#13201)
Add be Parameter Description
2022-10-09 12:49:57 +08:00
e6f4c771d9 [fix](docs) fix trim, lower, upper function docs error (#13179) 2022-10-09 10:32:26 +08:00
555f9520e3 fix community module error url (#13182)
fix community module error url
2022-10-09 10:27:02 +08:00
c53d2d6a8b install deploy doc fix (#13177)
install deploy doc fix
2022-10-09 10:26:28 +08:00
e0044e5a5f [typo](docs)Sql doc link fix (#13151)
* sql doc link fix
2022-10-09 09:26:00 +08:00
ece4a6c194 [doc][fix](multi-catalog) add doc for multi catalog and fix refresh bug (#13097)
1. Add documentation for the multi-catalog feature.
2. Fix a bug where the REFRESH edit log was not handled.
2022-10-09 09:14:44 +08:00
344377beb7 [typo](docs)Fix jump link 404 in jdbc load.md (#13170) 2022-10-08 20:01:52 +08:00
86e47650cf Update outfile.md (#13172) 2022-10-08 20:01:20 +08:00
4386f41442 sql server 2017 version ODBC usage instructions (#13178)
sql server 2017 version ODBC usage instructions
2022-10-08 20:00:53 +08:00
6b0410450b [typo](docs)Fix jump link 404 in external storage load.md (#13173) 2022-10-08 19:59:44 +08:00
71399ed771 fix data cache sidebar error (#13137)
fix data cache sidebar error
2022-10-07 17:45:21 +08:00
d902e80d6d [docs](unique-key-merge-on-write) add document for unique key merge o… (#13068) 2022-10-07 16:18:04 +08:00
447aceb223 [Fix](doc) Remove unsupported parameter (#13081) 2022-10-07 16:10:00 +08:00
f2aa6e9a21 [doc](typo): fix typo (#13130) 2022-10-06 18:10:41 +08:00
90512ebd59 [typo](docs)Metadata Operations and Maintenance link error (#13090)
* Metadata Operations and Maintenance link error
2022-10-05 22:58:24 +08:00
e00124d825 [typo](doc) Modify the comment of light schema change (#13061) 2022-10-04 21:28:11 +08:00
0c67b14b6d [typo](doc) replace unuse parameter max_base_compaction_concurrency (#13047) 2022-10-04 21:27:38 +08:00
5092ef78da [doc] Add python env for Mac M1 (#12792)
On Mac M1, the default interpreter is python3 rather than python,
so compiling the FE fails with an error that python cannot be found.
This PR adds the missing description.
2022-10-04 21:24:08 +08:00
fef1062835 [optimization](array-type) optimize the help docs of array type (#13001)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-29 14:36:32 +08:00
29fc167548 [Bug](Datax)Fix bug that the dataxwriter will drop column when convert map to json (#13042)
* Fix a bug where toJSONString drops a key-value pair when the value is null.
2022-09-29 11:37:10 +08:00
819aecb26c [DOC](datev2) Add documents for DateV2 (#12976) 2022-09-28 14:36:26 +08:00
Pxl
9607f60845 [Feature](serialize) move block_data_version to fe heart beat (#12667)
Move block_data_version from the BE config to the FE heartbeat.
2022-09-27 18:25:54 +08:00
907494760d [typo](docs)Add bitmap_count doc And Adjustment function list (#12978) 2022-09-27 14:21:37 +08:00