Commit Graph

1593 Commits

Author SHA1 Message Date
74f694753b Fix the en docs of benchmark (#14459) 2022-11-22 08:40:51 +08:00
b36f3d7e61 [typo](docs) fix typo in schema-change.md (#14311) 2022-11-21 13:38:47 +08:00
ce489cf723 [Feature](JDBC)support clickhouse jdbc external table (#14244) 2022-11-21 10:33:53 +08:00
98cea90950 [typo](docs)benchmark doc fix number (#14427) 2022-11-20 22:51:42 +08:00
c29975d347 [Docs](function) Add some function do not in sidebars (#14426) 2022-11-20 22:50:52 +08:00
71e80e8957 [typo](docs)Performance test documentation update (#14147)
* Performance test documentation update
2022-11-20 09:40:57 +08:00
2ccb5209a0 (improvement)[doc] add document version tag instruction (#14406) 2022-11-20 00:05:53 +08:00
f5f2e84e31 [refactor](planner) remove the limit return rows of order by (#12478)
Originally, an ORDER BY query returned at most 65535 rows by default during the query,
but many businesses do not want this limit.
To get the full result set they had to append a larger LIMIT value to the query statement,
which is extremely inconvenient, so adjustments have been made.

At the same time, a new variable DEFAULT_ORDER_BY_LIMIT has been added to SessionVariable,
with a default value of -1. If the user does not use the LIMIT keyword, or the LIMIT value is a negative integer,
the query returns up to Long.MAX_VALUE rows. If a maximum query value is set,
the number of rows returned is bounded by that value or by the value following the
LIMIT keyword.
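
A minimal sketch of the resulting behavior, assuming a hypothetical table t1 (the lower-case session-variable name used in SET is my assumption):

```
-- Default (default_order_by_limit = -1): the full result set is returned.
SELECT * FROM t1 ORDER BY c1;

-- Restore the old cap by setting the session variable.
SET default_order_by_limit = 65535;
SELECT * FROM t1 ORDER BY c1;  -- returns at most 65535 rows

-- An explicit LIMIT is always respected.
SELECT * FROM t1 ORDER BY c1 LIMIT 100;
```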
2022-11-19 12:45:44 +08:00
b4aef889f2 [feature-array](array-function) add array constructor function array() (#14250)
* [feature-array](array-function) add array constructor function `array()`

```
mysql>  select array(qid, creationDate) from nested_c_2  limit 10;
+------------------------------+
| array(`qid`, `creationDate`) |
+------------------------------+
| [1000038, 20090616074056]    |
| [1000069, 20090616075005]    |
| [1000130, 20090616080918]    |
| [1000145, 20090616081545]    |
+------------------------------+
10 rows in set (0.01 sec)
```
2022-11-19 10:49:50 +08:00
2c4236fd24 [improvement](ctas) use string type for varchar/char/string (#14382)
When executing a CREATE TABLE AS SELECT statement,
the varchar/char/string columns of the created table are now unified to the string type.

This is because when selecting from an external table (MySQL/PostgreSQL, etc.), the length of a varchar column
in the external database is measured in "char" length, not "byte" length.
So a varchar(10) column in the external table used to produce the same varchar(10) column
in the created table, but the byte length of the data in the external table may be larger than 10, causing the CTAS to fail.

Changing to string does not impact performance or the capacity of disk storage.
Also note that if a string-type column is the first column, it is changed to varchar(65535),
because we do not allow a string-type column as a sort key column.
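
A minimal sketch of the behavior (catalog, table, and column names are hypothetical, and the usual CTAS clauses are assumed):

```
-- Suppose ext.db1.users is an external MySQL table whose `name` column is VARCHAR(10).
-- In the created table, `name` becomes STRING, so rows whose byte length exceeds 10 still load.
CREATE TABLE users_copy
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1")
AS SELECT id, name FROM ext.db1.users;
```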
2022-11-18 14:20:13 +08:00
fb140d0180 [Enhancement](sequence-column) optimize the use of sequence column (#13872)
When creating a Unique Key table, you can now specify the mapping from the sequence column to another column,
so you no longer need to specify the mapped column when importing.
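
A minimal sketch of declaring the mapping at table creation time (the property name function_column.sequence_col is my assumption; table and column names are hypothetical):

```
-- Rows with the same key are resolved using modify_time as the sequence column.
CREATE TABLE example_uniq_tbl (
    user_id BIGINT,
    username VARCHAR(32),
    modify_time DATETIME
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "function_column.sequence_col" = "modify_time"
);
```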
2022-11-17 22:39:09 +08:00
8fe5211df4 [improvement](multi-catalog)(cache) invalidate catalog cache when refresh (#14342)
Invalidate the catalog/db/table cache when refreshing the
corresponding catalog/db/table.

Tested with a table with 10000 partitions; the refresh operation costs about 10-20 ms.
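
A minimal sketch of the refresh statements involved (catalog, database, and table names are hypothetical; the exact statement forms are my assumption):

```
REFRESH CATALOG hive;
REFRESH DATABASE hive.db1;
REFRESH TABLE hive.db1.tbl1;
```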
2022-11-17 20:47:46 +08:00
a4d4fc8c02 datax doris writer doc fix (#14344) 2022-11-17 13:08:32 +08:00
0bf6d1fd79 [typo](doc)Datax doris writer doc update (#14328) 2022-11-17 08:53:55 +08:00
3259fcb790 [typo](docs) fix docs kafka-load.md (#14313) 2022-11-16 23:17:30 +08:00
70cc725649 [Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209)
* [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions

* update add to stringRef
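
A minimal usage sketch for one of the new functions (table and column names are hypothetical):

```
-- Weighted average of price, using qty as the weight.
SELECT avg_weighted(price, qty) FROM sales;
```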
2022-11-15 16:38:38 +08:00
f86886f8f5 [Feature](function) Support array_compact function (#14141) 2022-11-15 14:24:37 +08:00
93e5d8e660 [Vectorized](function) support bitmap_from_array function (#14259) 2022-11-15 01:55:51 +08:00
7eed5a292c [feature-wip](multi-catalog) Support hive partition cache (#14134) 2022-11-14 14:12:40 +08:00
23a8c7eeb6 (fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229)
Fix incorrect results caused by not using fields_context.
2022-11-14 14:00:55 +08:00
43490a33a5 [feature-array](array-type) Add array function array_with_constant (#14115)
Returns an array of the given constant repeated num times.

```
mysql> select array_with_constant(4, 1223);
+------------------------------+
| array_with_constant(4, 1223) |
+------------------------------+
| [1223, 1223, 1223, 1223]     |
+------------------------------+
1 row in set (0.01 sec)
```
co-authored-by @eldenmoon
2022-11-11 22:08:43 +08:00
0ba13af8ff [feature](running_difference) support running_difference function (#13737) 2022-11-11 21:22:56 +08:00
a162dab40a [feature](docs) add docs for SHOW-CATALOG-RECYCLE-BIN (#14185) 2022-11-11 15:54:05 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
7782fb63ca [docs](outfile) Add ORC to outfile document (#14153) 2022-11-11 09:42:30 +08:00
6297ef10e9 [enhancement](plugin) import audit logs for slow queries into a separate table (#14100)
* import audit logs for slow queries into a separate table
2022-11-11 09:06:01 +08:00
b62e700f4e [fix](doc): remove incubator. (#14159) 2022-11-11 08:58:42 +08:00
45a3bb87c4 [docs](recover) modify recover doc (#13904) 2022-11-10 20:20:39 +08:00
9b5b411112 [fix](schemeChange) fe oom because replicas too many when schema change (#12850) 2022-11-10 16:17:25 +08:00
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
90bfd87660 [feature](function) add new function uuid() (#14092) 2022-11-10 14:55:41 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
55cae6202f [typo](docs)add udf doc and optimize udf regression test (#14000) 2022-11-10 09:24:45 +08:00
b74d0a4747 [feature](table-valued-function) Support desc from s3() and modify the syntax of tvf (#14047)
This PR does two things:

- Support `desc function s3()` (see the sketch below)
- Modify the syntax of table-valued functions
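
A rough sketch of describing the s3 table-valued function (the parameter names and values are assumptions for illustration, not the exact syntax introduced here):

```
DESC FUNCTION s3(
    "uri" = "https://bucket.example.com/path/file.parquet",
    "format" = "parquet",
    "access_key" = "ak",
    "secret_key" = "sk"
);
```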
2022-11-09 14:12:43 +08:00
7362460525 [docs](array-type) update the docs to specify how to use array function when import data (#13995)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-09 12:21:26 +08:00
287c3893b9 [typo](docs)update array type doc #14057 2022-11-09 08:40:38 +08:00
a0f136a0bc [docs](odbc) fix docs for sqlserver odbc table (#14017)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-09 08:39:39 +08:00
b6f91b6eff [improvement](profile) support ordinary user to get query profile via http api (#14016) 2022-11-08 20:39:01 +08:00
f7ecb6d79f [Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978)
fix sub_bitmap calculate wrong result to return null
2022-11-08 14:10:12 +08:00
1c07a01038 [feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994)
- Support Aliyun DLF.
- Support data on S3-compatible object storage, such as Aliyun OSS.
- Refactor some catalog interfaces to make them tidier.
- Fix a bug: the default text-format field delimiter for Hive should be \x01.
- Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.
2022-11-08 14:02:41 +08:00
241801ca17 [typo](doc) fix get_start doc (#14001) 2022-11-07 21:28:45 +08:00
0031304015 [typo](docs)fix config doc #14010 2022-11-07 17:00:16 +08:00
7254999f02 [typo](docs) fix docs,delete redundant words #13849 2022-11-07 13:51:10 +08:00
e8d2fb6778 [feature](function)add search functions: multi_search_all_positions & multi_match_any (#13763)
Co-authored-by: yiliang qiu <yiliang.qiu@qq.com>
2022-11-07 11:50:55 +08:00
7ffe88b579 [feature-array](array-type) Add array function array_popback (#13641)
Removes the last element from the array.

```
mysql> select array_popback(['test', NULL, 'value']);
+---------------------------------------------+
| array_popback(ARRAY('test', NULL, 'value')) |
+---------------------------------------------+
| [test, NULL]                                |
+---------------------------------------------+
```
2022-11-07 10:48:16 +08:00
380395a61f [doc](routineload)Common mistakes in adding routine load #13975 2022-11-05 19:17:33 +08:00
087488db3b [typo](doc) fixed spelling errors (#13974) 2022-11-05 15:40:55 +08:00
554f566217 [enhancement](compaction) introduce segment compaction (#12609) (#12866)
## Design

### Trigger

Every time a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, so there is no recursing/queuing nightmare.

### Target Selection

We collect segments on every trigger. We skip big segments whose row count is greater than M (e.g. 10000), because compacting them yields little benefit compared with the effort. Hence, we only pick the "longest consecutive small" segment group for actual compaction.

### Compaction Process

A new thread pool is introduced to do the job. We submit the above-mentioned "longest consecutive small" segment group to the pool. Then the worker thread does the following:

- build a MergeIterator from the target segments
- create a new segment writer
- for each block read from the MergeIterator, append it with the writer

### SegID handling

SegIDs must remain consecutive after segment compaction.

If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:

- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1

It is worth noting that we should wait for in-flight segment compaction tasks to finish before building the rowset meta and committing this txn.
2022-11-04 14:12:51 +08:00
1b36843664 [doc](jsonb type)add documents for JSONB datatype (#13792) 2022-11-03 19:33:51 +08:00
6ff306b1ea [docs](round) complement round function documentation (#13838) 2022-11-03 14:30:49 +08:00