Commit Graph

11783 Commits

Author SHA1 Message Date
4cbd99ad9b [pipeline](ckb) trigger new ckb pipeline, even pr id also run (#21661)
* [pipeline](ckb) also trigger new ckb pipeline

* [pipeline](ckb) all pr run ckb pipeline

* change required

---------

Co-authored-by: stephen <hello-stephen@qq.com>
2023-07-11 15:24:26 +08:00
b2c7a4575c [Bug](dynamic table) set all CreateTableStmt from cup parser dynamic table flag false (#21706) 2023-07-11 15:23:27 +08:00
d0eb4d7da3 [Improve](hash-fun)improve nested hash with range #21699
Issue Number: close #xxx

1. When calculating an array hash, the element size does not need to seed the hash:
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                               sizeof(elem_size), hash);
However, we must take care to distinguish [[], [1]] from [[1], []]: when an array nests another array and the nested array is empty, we should still seed the hash so that the two values differ.
2. Hash a range of elements into one hash value, which avoids a virtual function call inside the loop and doubles the performance. I measured this in a unit test:

column: array[int64]
50 rows, each array with 100,000 elements
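
The [[], [1]] versus [[1], []] case can be illustrated with a minimal Python sketch. It uses zlib.crc32 as a stand-in for HashUtil::zlib_crc_hash; the helpers hash_leaves_only and hash_with_empty_seed are hypothetical names for illustration and are not the Doris implementation.

```python
import struct
import zlib


def crc(data: bytes, seed: int) -> int:
    # Stand-in for HashUtil::zlib_crc_hash(data, size, seed).
    return zlib.crc32(data, seed)


def hash_leaves_only(nested, seed=0):
    # Hash only the leaf values. Empty inner arrays contribute nothing,
    # so their position in the outer array is invisible to the hash.
    h = seed
    for inner in nested:
        for value in inner:
            h = crc(struct.pack("<q", value), h)
    return h


def hash_with_empty_seed(nested, seed=0):
    # Same as above, but an empty inner array folds its size (0) into
    # the hash, so its position in the outer array changes the result.
    h = seed
    for inner in nested:
        if not inner:
            h = crc(struct.pack("<q", 0), h)
        for value in inner:
            h = crc(struct.pack("<q", value), h)
    return h


a = [[], [1]]
b = [[1], []]
collides = hash_leaves_only(a) == hash_leaves_only(b)
distinct = hash_with_empty_seed(a) != hash_with_empty_seed(b)
```

Here collides is True because both inputs hash the single leaf 1, while distinct is True because the empty array is mixed in at a different position in each input.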
2023-07-11 14:40:40 +08:00
cb69349873 [regression] add bitmap filter p1 regression case (#21591) 2023-07-11 14:27:03 +08:00
Pxl
bb88df3779 [regression-test](agg-state) change set to set global enable_agg_state (#21708)
When there are multiple FEs, we need to use SET GLOBAL so that the session variable is set on all FEs.
2023-07-11 14:15:54 +08:00
7a758f7944 [enhancement](mysql) Add have_query_cache variable to be compatible with old mysql client (#21701) 2023-07-11 14:05:40 +08:00
8d98f2ac7e [fix](errCode) Change the error code of a read-only variable (#21705) 2023-07-11 14:05:18 +08:00
5ed42705d4 [fix](jdbc scan) 1=1 does not translate to TRUE (#21688)
Most database systems recognize WHERE 1=1 but not WHERE TRUE, so we should send the original 1=1 to the database.
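
A minimal sketch of the idea, assuming a hypothetical render_where helper (this is not the Doris JDBC scan code): when the query text for the external database is built, a predicate that was folded to a constant TRUE is written back as 1=1.

```python
def render_where(conjuncts):
    """Build a WHERE clause for a remote database from a list of conjuncts.

    Each conjunct is either a SQL fragment (str) or the Python constant True,
    which stands in for a predicate that the planner folded to TRUE.
    """
    parts = []
    for conjunct in conjuncts:
        if conjunct is True:
            # Emit the widely supported spelling instead of the literal TRUE.
            parts.append("1=1")
        else:
            parts.append(conjunct)
    return "WHERE " + " AND ".join(parts)


clause = render_where([True, "k1 > 10"])
```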
2023-07-11 14:04:49 +08:00
d3be10ee58 [improvement](column) Support for the default value of current_timestamp in microsecond (#21487) 2023-07-11 14:04:13 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
5a15967b65 [fix](sparkdpp) Change spark dpp default version to 1.2-SNAPSHOT (#21698) 2023-07-11 10:49:53 +08:00
8eae31002d [fix](regression)update some case with timediff (#21697)
This PR introduces scale, but the FE's current constant folding is incomplete, so the exact type cannot be deduced.
2023-07-11 09:55:13 +08:00
7b403bff62 [feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623)
1. expand the semantics of variable strict_mode to control the behavior for stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows if the key is not present in the table
2. when inserting a new row in non-strict mode stream load, the unmentioned columns should have default value or be nullable
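
The two rules above can be sketched as a toy in-memory model. The function names, the schema layout, and the choice to return rejected rows to the caller are assumptions made for illustration; the commit message does not say what happens to rows rejected in strict mode.

```python
def fill_unmentioned(row, schema):
    # Complete a partial row: each unmentioned column takes its default
    # value, or NULL (None) if it is nullable; otherwise the row is invalid.
    full = dict(row)
    for col, spec in schema.items():
        if col in full:
            continue
        if "default" in spec:
            full[col] = spec["default"]
        elif spec.get("nullable", False):
            full[col] = None
        else:
            raise ValueError(f"column {col!r} has no default and is not nullable")
    return full


def partial_update(table, key_col, rows, schema, strict_mode):
    # table: dict mapping key -> full row (dict). Returns the rejected rows.
    rejected = []
    for row in rows:
        key = row[key_col]
        if key in table:
            # Existing key: overwrite only the mentioned columns.
            table[key].update(row)
        elif strict_mode:
            # Strict mode: only existing rows may be updated.
            rejected.append(row)
        else:
            # Non-strict mode: insert a new row, filling unmentioned columns.
            table[key] = fill_unmentioned(row, schema)
    return rejected
```

For example, with a nullable column v1 and a column v2 that defaults to 0, loading {"k": 2, "v1": "c"} into a table that lacks key 2 is rejected when strict_mode is true and inserted as {"k": 2, "v1": "c", "v2": 0} when it is false.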
2023-07-11 09:38:56 +08:00
736d6f3b4c [improvement](timezone) support mixed upper-lower case of timezone names (#21572) 2023-07-11 09:37:14 +08:00
47dd2db292 [doc](fix) storage policy fe conf doc (#21679)
* [doc](fix) storage policy fe conf doc
2023-07-11 09:16:58 +08:00
f87a3ccba2 [fix](runtime_filter) runtime_profile was not initialized in multi_cast_data_stream_source (#21690) 2023-07-11 00:16:29 +08:00
d59c21e594 [test](spill) disable fuzzy spill variables for now (#21677)
We will rewrite this logic, so it is useless for now and is no longer tested.
2023-07-10 22:28:41 +08:00
307149dc35 [pipeline](task_queue) remove disable steal in task queue to speed up query (#21692)
TPCH Q9

before: 2.74s
after: 2.33s
2023-07-10 22:21:56 +08:00
f5641b59ae [typo](docs) Fixed a typo that changed "不再分区范围内的数据" to "不在分区范围内的数据" (#21655) 2023-07-10 22:17:53 +08:00
90bebc57b9 [docs]Update upgrade.md #21658
* Update upgrade.md

* Update upgrade.md
2023-07-10 22:17:05 +08:00
24290799c4 [improvement](tpch) run-tpch-query.sh add analyze database with sync and calculate total time (#21652)
* run-tpch-query shell add analyze database with sync and calculate total time

* run-tpch-query shell add analyze database with sync and calculate total time
2023-07-10 22:04:57 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
202a5c636f [fix](create table) modify varchar default length 1 to 65533 (#21302)
*Modify the default VARCHAR length from 1 to varchar.max.length when creating a table.*

```mysql
create table t2 (
    k1 CHAR,
    K2 CHAR(10),
    K3 VARCHAR,
    K4 VARCHAR(1024)
)
duplicate key (k1)
distributed by hash(k1) buckets 1
properties('replication_num' = '1');

desc t2;
```

| Field | Type           | Null | Key   | Default | Extra |
| ----- | -------------- | ---- | ----- | ------- | ----- |
| k1    | CHAR(1)        | Yes  | true  | NULL    |       |
| K2    | CHAR(10)       | Yes  | false | NULL    | NONE  |
| K3    | VARCHAR(65533) | Yes  | false | NULL    | NONE  |
| K4    | VARCHAR(1024)  | Yes  | false | NULL    | NONE  |
2023-07-10 17:57:21 +08:00
36524f2b72 [improvement](functions) avoid copying of block in create_block_with_nested_columns (#21526)
avoid copying of block in create_block_with_nested_columns
2023-07-10 17:21:23 +08:00
842fe00157 [enhancement](flush) make writer write fail status visible (#21530) 2023-07-10 17:14:33 +08:00
2b04fa604c fix: toCalendar should use Calendar.MONTH instead of MONDAY (#21665) 2023-07-10 16:49:42 +08:00
0be349e250 [feature](jdbc) Support jdbc catalog to read json types (#21341) 2023-07-10 16:21:00 +08:00
1a08c81adc [Profile](runtimefilter) fix merge time of runtime filter (#21654) 2023-07-10 16:16:05 +08:00
a1a8ee8320 [enhancement](stats) Inject partition statistics #21543
The cost estimation can be more accurate if partition statistics are available. However, when working with big data, such as 1 TB, we cannot actually import the data.

So now we want to extend this by injecting partition statistics.

Syntax:

ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 'stat_value', ...)
  [ PARTITION (partition_name) ];
Explanation:

- table_name: The table whose statistics are modified. It can be in the db_name.table_name form.

- column_name: The target column. It must be a column that exists in table_name. Statistics can only be modified one column at a time.

- stat_name and stat_value: The stat name and the corresponding value of the stat info. Multiple stats are comma-separated. Statistics that can be modified include row_count, ndv, num_nulls, min_value, max_value, and data_size.

- partition_name: The target partition. It must be a partition that exists in table_name. Multiple partitions are separated by commas.
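
As an illustration of the syntax above, here is a small Python sketch that assembles such a statement as text. The helper build_set_stats and the example names db1.t1, c1, and p1 are hypothetical; the sketch only formats a string and does not talk to a database.

```python
# Stat names listed in the explanation above.
ALLOWED_STATS = {"row_count", "ndv", "num_nulls", "min_value", "max_value", "data_size"}


def build_set_stats(table, column, stats, partitions=None):
    # Reject stat names that the explanation above does not list.
    unknown = set(stats) - ALLOWED_STATS
    if unknown:
        raise ValueError(f"unsupported stats: {sorted(unknown)}")
    # Render each stat as 'stat_name' = 'stat_value', comma-separated.
    props = ", ".join(f"'{name}' = '{value}'" for name, value in stats.items())
    sql = f"ALTER TABLE {table} MODIFY COLUMN {column} SET STATS ({props})"
    if partitions:
        # The optional PARTITION clause takes comma-separated partition names.
        sql += f" PARTITION ({', '.join(partitions)})"
    return sql + ";"


stmt = build_set_stats("db1.t1", "c1", {"row_count": 100, "ndv": 10}, ["p1"])
```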
2023-07-10 15:06:25 +08:00
7d4c47e250 [Enhancement](Compaction) Calculate all committed rowsets delete bitmaps when do compaction (#20907)
Here we calculate the delete bitmaps of all rowsets that are committed but not yet published, to reduce the calculation pressure of the publish phase.

Step 1: collect the delete bitmaps of all of this tablet's committed rowsets.

Step 2: calculate the delete bitmaps of all rowsets that are published during compaction.

Step 3: write back the updated delete bitmap and tablet info.
2023-07-10 14:06:11 +08:00
9f3bc11b04 [improvement](ssb) run-ssb-queries.sh and run-ssb-flat-queries.sh add analyze database with sync and calculate total time #21653 2023-07-10 11:45:45 +08:00
f9c56d59fc [improvement](statistics)Support external table show table stats, modify column stats and drop stats (#21624)
Support external table show table stats, modify column stats and drop stats.
2023-07-10 11:33:06 +08:00
Pxl
77336bff44 [Bug](materialized-view) adjust limit for create materialized view on uniq/agg table (#21580)
adjust limit for create materialized view on uniq/agg table
2023-07-10 10:04:17 +08:00
ee9822fa7e [Fix](pipeline) fix ExchangeSinkBuffer request id memory alloc problem (#21647)
Co-authored-by: airborne12 <airborne12@gmail.com>
fix ExchangeSinkBuffer request id memory alloc problem
2023-07-09 23:45:28 +08:00
469c8b7ece [Fix](JSON LOAD)fix json load issue when string conform with RFC 4627 #21390
On master, enable_simdjson_reader=false should be set, because master sets enable_simdjson_reader=true by default.

Issue Number: close #21389

from rapidjson:

Query String
In addition to GetString(), the Value class also contains GetStringLength(). Here explains why:

According to RFC 4627, JSON strings can contain Unicode character U+0000, which must be escaped as "\u0000". The problem is that, C/C++ often uses null-terminated string, which treats \0 as the terminator symbol.

To conform with RFC 4627, RapidJSON supports string containing U+0000 character. If you need to handle this, you can use GetStringLength() to obtain the correct string length.

For example, after parsing the following JSON to Document d:

{ "s" : "a\u0000b" }
The correct length of the string "a\u0000b" is 3, as returned by GetStringLength(). But strlen() returns 1.

GetStringLength() can also improve performance, as user may often need to call strlen() for allocating buffer.

Besides, std::string also support a constructor:

string(const char* s, size_t count);
which accepts the length of string as parameter. This constructor supports storing null character within the string, and should also provide better performance.
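
The length mismatch described above can be reproduced in a few lines of Python. This is only an analogy: len() plays the role of GetStringLength(), and the hypothetical c_strlen helper emulates C's strlen() by stopping at the first NUL byte.

```python
import json

# Parse the same document as in the example above.
doc = json.loads('{ "s" : "a\\u0000b" }')
raw = doc["s"].encode("utf-8")


def c_strlen(buf: bytes) -> int:
    # Emulates C strlen(): count bytes up to the first NUL terminator.
    idx = buf.find(b"\0")
    return len(buf) if idx == -1 else idx


full_length = len(raw)           # 3, like GetStringLength()
truncated_length = c_strlen(raw) # 1, like strlen()
```

Copying the value with an explicit length, as the string(const char* s, size_t count) constructor does, keeps all three bytes.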
2023-07-09 17:16:03 +08:00
41fb3d5fa4 [opt](Nereids): Join use List<Plan> as children (#21608)
Using List<Plan> as the Join's children avoids constructing an extra ImmutableList.
2023-07-09 17:11:55 +08:00
d9974e6337 [Chore](Job)Fix the wrong log when the export job reads fields and add more clear log information (#21490)
* [Chore](Job)Fix the wrong log when the export job reads fields and add more clear log information

* add OriginStatement .toString method
2023-07-09 17:06:38 +08:00
779b675e9d [test](fix) Case bug (#21518)
* add sync after streamLoad
2023-07-09 16:52:40 +08:00
cf1efce824 [fix](inverted index) use index id instead of column uid to determine whether a hard link is required when build index (#21574)
Fix problem:
For the same column, a drop index request and a build index request may run concurrently. If build index obtains the lock before drop index, it builds a new index file; but when the drop index request executes, the linked files do not contain all index files for the column, so the new index file is lost.

To address this problem, use the index id instead of the column unique id to determine whether a hard link is required when building an index.
2023-07-09 16:45:27 +08:00
6b945680a7 [Improve](point query) audit point query (#21587) 2023-07-09 16:43:41 +08:00
bf61d2cfc0 [fix](sink) fix pipeline load stuck #21636 2023-07-09 16:27:11 +08:00
015426b2b4 [fix](tablet report) fix fe can not update replica's status with be's report #21600 2023-07-09 16:23:18 +08:00
aacb9b9b66 [Enhancement](binlog) Add create/drop table, add/drop partition && alter job, modify columns binlog support (#21544) 2023-07-09 09:11:56 +08:00
c36cd18a08 [docs](docs)add more explanation for Fe config (#21627)
add more explanation for Fe config
2023-07-09 08:46:37 +08:00
f2fb23e98f [pipeline](exec) disable pipeline load in the current version (#21632) 2023-07-09 01:00:06 +08:00
1b226ff8a2 [refactor](load) remove FlushContext from SegmentWriter (#21596)
* [refactor](load) remove FlushContext from SegmentWriter

* remove unused imports
2023-07-08 22:44:56 +08:00
c58d5cd81b [opt](regression case) add more index change regression case (#21633) 2023-07-08 22:23:09 +08:00
f7adb6507e [Fix](storage engine) shutdown cooldown and cold data compaction thread when engine stop (#21639)
When the BE was stopped gracefully, the storage engine did not shut down the cooldown and cold data compaction threads correctly.
2023-07-08 22:22:15 +08:00
f8a2c66174 [refactor](planner) refactor automatically set instance_num (#21640)
refactor automatically set instance_num
2023-07-08 21:59:17 +08:00
aad8043d44 [opt](Nereids) enable parallel scan for local phase agg (#21642)
After we forbade some cases of agg candidate plans,
all local phase aggs require DistributionSpecAny for their child,
so we can enable parallel scan for them.
2023-07-08 21:47:17 +08:00