2947 Commits

Author SHA1 Message Date
88e08a92d8 [fix](array-type) fix the wrong result when import array element with double quotes (#12786)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-13 23:07:19 +08:00
de4315c1c5 [feature](function) support initcap string function (#13193)
support `initcap` string function
2022-10-13 21:31:44 +08:00
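For readers unfamiliar with `initcap`, the behaviour can be sketched roughly as follows. This is an illustration only, not the code from #13193: the helper name `initcap_ascii` is hypothetical, and the sketch assumes ASCII input where a word is a maximal run of alphanumeric characters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not the Doris implementation.
// Assumption: a "word" is a maximal run of ASCII alphanumeric characters;
// every other character is copied unchanged and starts a new word.
std::string initcap_ascii(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    bool at_word_start = true;
    for (unsigned char ch : input) {
        if (std::isalnum(ch)) {
            // First character of a word is upper-cased, the rest lower-cased.
            out.push_back(static_cast<char>(at_word_start ? std::toupper(ch)
                                                          : std::tolower(ch)));
            at_word_start = false;
        } else {
            out.push_back(static_cast<char>(ch));
            at_word_start = true;
        }
    }
    return out;
}

// Small usage check for the sketch above.
void initcap_examples() {
    assert(initcap_ascii("hello wORLD") == "Hello World");
    assert(initcap_ascii("") == "");
}
```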
cb300b0b39 [feature](agg) support any,any_value agg functions. (#13228) 2022-10-13 18:31:19 +08:00
baf2689610 [Improvement](join) compute hash values by vectorized way (#13335) 2022-10-13 16:04:58 +08:00
87793b7c00 [bugfix](datatimev2) fix value column loss precision and scale (#13233)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-13 15:39:53 +08:00
Pxl
c1ed7d4d7d [Bug](function) fix core dump on case when have 1000 condition #13315 2022-10-13 14:37:03 +08:00
830183984a [fix](hash)update_hashes_with_value method should handle if input value is null (#13332)
* [fix](hash)update_hashes_with_value method should handle if input value is null

* remove unnecessary xxHash64NullWithSeed
2022-10-13 14:36:01 +08:00
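The general idea of a null-aware hash update can be sketched as below. This is not the code from #13332: `update_hashes`, `combine`, and `kNullHash` are hypothetical names, and the use of `std::hash` with a fixed constant for NULL is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins; none of these names are Doris APIs.
constexpr uint64_t kNullHash = 0x9E3779B97F4A7C15ULL;  // arbitrary constant used for NULL rows

// Mixes a new hash value into an existing per-row hash.
inline uint64_t combine(uint64_t seed, uint64_t h) {
    return seed ^ (h + 0x9E3779B97F4A7C15ULL + (seed << 6) + (seed >> 2));
}

// Updates hashes[i] with the value of row i.
// null_map may be nullptr (no NULLs); otherwise null_map[i] != 0 marks row i as NULL.
// For NULL rows the stored value is ignored, so all NULLs hash identically.
void update_hashes(const std::vector<int64_t>& values,
                   const std::vector<uint8_t>* null_map,
                   std::vector<uint64_t>& hashes) {
    assert(hashes.size() == values.size());
    assert(null_map == nullptr || null_map->size() == values.size());
    for (size_t i = 0; i < values.size(); ++i) {
        const bool is_null = null_map != nullptr && (*null_map)[i] != 0;
        const uint64_t h = is_null ? kNullHash : std::hash<int64_t>{}(values[i]);
        hashes[i] = combine(hashes[i], h);
    }
}
```

With this shape, two NULL rows land in the same hash bucket regardless of whatever placeholder value sits in the underlying data buffer.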
3e84c04195 [Bug](predicate) fix nullptr in scan node (#13316) 2022-10-13 12:14:42 +08:00
9b590ac4cb [improvement](olap) cache value of has_null in ColumnNullable (#13289) 2022-10-13 09:12:02 +08:00
c494ca0ed4 [enhancement](memtracker) Print query memory usage log every second when memory_verbose_track is enabled (#13302) 2022-10-13 09:11:23 +08:00
d430aec3ae [Bug](bloomfilter) fix concurrency bug caused by bloom filter (#13306) 2022-10-13 09:10:02 +08:00
Pxl
a77808e103 [Enhancement](function) optimize decimal minus and plus #13320 2022-10-13 09:00:05 +08:00
d63a80eaba [fix](bitmap_intersect) fix bitmap_intersect result error (#13298) 2022-10-12 19:12:11 +08:00
dfe308f501 [Improvement](join) refine prefetch strategy (#13286) 2022-10-12 19:02:06 +08:00
4fc7a048d2 [feature-wip](parquet-reader) fix string test and support decimal64 (#13184)
1. Refactor the argument list of the parquet min/max filter and pass the parquet type for min/max value parsing
2. Fix the min/max filter for strings

Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-10-12 16:52:28 +08:00
bb4414e303 [feature-wip](multi-catalog) optimize parquet profile & add null map timer (#13257)
Use indentation to make `ParquetReader`'s profile more readable
Add `ParquetReader.DecodeNullMapTime` to show the time of parsing `NullMap` for `NullableColumn`

```
VFILE_SCAN_NODE  (id=0):(Active:  279.62ms,  %  non-child:  85.83%)
    -  FileReadBytes:  2.36  MB
    -  FileReadCalls:  20
    -  FileReadTime:  5.686ms
    -  MaxScannerThreadNum:  1
    -  NewlyCreateFreeBlocksNum:  125
    -  NumScanners:  1
    -  ParquetReader:  0ns
        -  ColumnReadTime:  259.946ms
        -  DecodeDictTime:  0ns
        -  DecodeHeaderTime:  437.707us
        -  DecodeLevelTime:  30.101us
        -  DecodeNullMapTime:  53.295ms
        -  DecodeValueTime:  62.607ms
        -  DecompressCount:  511
        -  DecompressTime:  1.159ms
        -  FilteredBytes:  0.00  
        -  FilteredGroups:  0
        -  FilteredRowsByGroup:  0
        -  FilteredRowsByPage:  0
        -  ParseMetaTime:  22.517ms
        -  ReadBytes:  2.36  MB
        -  ReadGroups:  20
```
2022-10-12 16:51:06 +08:00
b7621e1615 [feature-wip](new-scan) support csv reader (#13282)
Issue Number: close #12574
This PR adds CsvReader, which implements the GenericReader interface, to support reading CSV files.
2022-10-12 16:22:13 +08:00
4a5095f00d [cleanup](config) remove unused config push_write_mbytes_per_sec (#13290) 2022-10-12 15:58:04 +08:00
1bd14f1d82 [feature-wip](jsonb) jsonb parse function and load (#13129)
Add a function to parse a JSON string into jsonb format and use it to support stream load.
2022-10-12 13:56:37 +08:00
239e5b9943 [enhancement](storage) set the segment cache capacity according to the open file limit of the process (#13269) 2022-10-12 12:10:58 +08:00
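One way to derive a cache capacity from the process's open file limit is sketched below. It is not the logic from #13269: the function names are hypothetical, and the choice of letting the cache use at most half of the descriptors is an assumption. The sketch is POSIX-only because it relies on `getrlimit`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <sys/resource.h>

// Hypothetical helper: caps the configured capacity by a share of the fd limit.
// Assumption: the cache may hold at most half of the allowed descriptors.
uint64_t cap_segment_cache(uint64_t configured_capacity, uint64_t open_file_limit) {
    const uint64_t budget = open_file_limit / 2;
    return std::min(configured_capacity, budget);
}

// Returns the soft RLIMIT_NOFILE value, or nullopt if it is unknown or unlimited.
std::optional<uint64_t> soft_open_file_limit() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY) {
        return std::nullopt;
    }
    return static_cast<uint64_t>(lim.rlim_cur);
}

// Falls back to the configured value when no usable limit is available.
uint64_t effective_segment_cache_capacity(uint64_t configured_capacity) {
    const std::optional<uint64_t> limit = soft_open_file_limit();
    return limit ? cap_segment_cache(configured_capacity, *limit) : configured_capacity;
}

// Small usage check for the pure part of the sketch.
void segment_cache_examples() {
    assert(cap_segment_cache(1000000, 1024) == 512);
    assert(cap_segment_cache(100, 65536) == 100);
}
```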
af7b6524f2 add hide config to hide config in webserver for safety. (#13255) 2022-10-12 10:27:09 +08:00
89b295c6cc [enhancement](memory) Print memory usage log when memory allocation fails (#13301) 2022-10-12 10:08:25 +08:00
16999ef02d [Vectorized][Function] support date_trunc and countequal function (#13039) 2022-10-12 10:01:09 +08:00
Pxl
5c68f69362 [improvement](config) set enable_local_exchange default value to true (#13292) 2022-10-12 09:07:24 +08:00
df54c6b63a [enhancement](memtracker) Add independent and unique scanner mem tracker for each query (#13262) 2022-10-11 19:47:12 +08:00
334708dc8c [fix](memory): avoid coredump when list pointer is null (#12919) 2022-10-11 16:00:23 +08:00
e8e171e0a3 [improvement](log) limit nums of logging disable auto compaction (#13113) 2022-10-11 15:52:56 +08:00
1724a91f53 [Bug](predicate) Cover all const predicates in scan node (#13238)
For a vectorized expression that satisfies vexpr->is_constant(), a const column is expected to be returned.
However, not all predicates on const expressions are covered yet.
For example, for the query `SELECT col FROM tbl WHERE 'PROMOTION' LIKE 'AAA%'`, the LIKE predicate returns a ColumnVector that contains a single value.

This PR covers all const predicates in the scan node, whether or not they return a const column.
2022-10-11 15:49:53 +08:00
4e4f8afa28 [fix](array-type) fix get_data_at for zero element array #13225
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-11 15:41:34 +08:00
606b514329 [fix](olap) fix core dump caused by LikeColumnPredicate with nullable column (#13250) 2022-10-11 15:38:55 +08:00
c1ce48ffe4 [fix](new-scann) scanner may be marked close twice (#13263) 2022-10-11 15:37:15 +08:00
5757bbc9f3 fix be oom when replace with an empty old str (#13220) 2022-10-10 15:58:12 +08:00
86d55dd79c [Improvement](like function) avoid to convert const column to full column (#13214) 2022-10-10 14:19:46 +08:00
a8535e91af [Improvement](runtimefilter) DO NOT allocate memory for bbf in prepare phase (#13207) 2022-10-10 14:19:33 +08:00
Pxl
bdcb600f3d [Bug](load) fix core dump on big block load (#13014) 2022-10-10 12:38:32 +08:00
1cd4e5cec6 refactor insert_xxx functions (#13088)
As mentioned in #13074, ColumnVector<int>::insert_many_in_copy_way has a problem.
Column::insert_xxx functions append data, so they should reserve or resize before appending.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-10 11:54:27 +08:00
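The "resize before append" rule can be illustrated with a miniature column. This is a sketch, not Doris's `ColumnVector`: `IntColumn` and its `insert_many` are hypothetical and only show why the destination must be grown before a raw copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical miniature column; not Doris's ColumnVector.
struct IntColumn {
    std::vector<int> data;

    // Appends `count` values starting at `src`.
    // The buffer is resized first, so the memcpy below never writes past
    // the end of the allocation (the failure mode of a copy without resize).
    void insert_many(const int* src, size_t count) {
        if (count == 0) {
            return;
        }
        assert(src != nullptr);
        const size_t old_size = data.size();
        data.resize(old_size + count);
        std::memcpy(data.data() + old_size, src, count * sizeof(int));
    }
};
```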
20b583c91e [Bug](array-type) Fix memory buffer overflow (#13074) 2022-10-10 11:42:13 +08:00
935ef5a598 [feature-wip](new-scan) Add new ES scanner and new ES scan node #13027 2022-10-10 09:56:38 +08:00
dd089259be [feature-wip](multi-catalog) Optimize the performance of boolean & dictionary decoding (#13212)
Generate vector for dictionary data.
Decode boolean values in batch.
2022-10-10 08:41:11 +08:00
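Batch decoding of booleans can look roughly like the sketch below. It is not the code from #13212: `decode_booleans` is a hypothetical helper, and the sketch assumes the input is bit-packed, one bit per value, least significant bit first (the layout Parquet uses for PLAIN-encoded booleans).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the Doris decoder.
// Assumption: values are bit-packed LSB first, one bit per value.
std::vector<uint8_t> decode_booleans(const uint8_t* packed, size_t num_values) {
    assert(packed != nullptr || num_values == 0);
    std::vector<uint8_t> out(num_values);
    const size_t full_bytes = num_values / 8;
    size_t pos = 0;
    // Unpack whole bytes eight values at a time instead of one call per value.
    for (size_t b = 0; b < full_bytes; ++b) {
        const uint8_t byte = packed[b];
        for (int bit = 0; bit < 8; ++bit) {
            out[pos++] = static_cast<uint8_t>((byte >> bit) & 1);
        }
    }
    // Unpack the remaining (fewer than eight) values from the last byte.
    for (size_t bit = 0; pos < num_values; ++bit) {
        out[pos++] = static_cast<uint8_t>((packed[full_bytes] >> bit) & 1);
    }
    return out;
}
```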
3dc4dc6d43 [compaction](http_action) enable be run manual compaction concurrently (#13219)
In some cases, we need to run manual compaction via the HTTP interface
concurrently, so we remove the mutex; the tablet's compaction lock
is enough to prevent concurrent compaction within a tablet.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-10 08:33:18 +08:00
15c7c0b754 [chore](release build) copy license and notice file to output folder and strip debug info from meta tool (#13222)
* [chore](release build) copy license and notice file to output folder and strip debug info from meta tool

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-10-10 08:31:34 +08:00
7b2fdd26a1 [schema change](fix) fix coredump of schema change (#13183)
When schema change and compaction execute simultaneously, both
nullable and non-nullable data can be read for the same column. We need to
reset _nullmap for each Block when converting Block data, or else Column
case will be wrong.
2022-10-09 19:44:00 +08:00
fc711d89c8 [fix](projections) Open the project expressions properly. (#13162)
In the current 'ExecNode::open' function, the 'open(_projections)' call is unreachable, which might cause serious crashes. (#13150)
2022-10-09 18:43:45 +08:00
89514fc964 [fix](rowset) fix that rowset writer doesn't process the return value, which may result in data loss (#13189) 2022-10-09 17:10:11 +08:00
dc2d33298b [chore](be config) remove config use_mmap_allocate_chunk #13196
This config is never used online, and enabling it triggers bugs, so I removed the config and its related tests.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-10-09 16:19:59 +08:00
f373b22dcf [fix](string) Fix over-allocated memory for string type (#13167)
For string/varchar/text type, the length field is fixed to 2GB. (`ColumnMetaPB`)
We don't actually have to allocate 2GB for every string type because we
will reallocate the precise size of memory for the string in
`WrapperField::from_string()`

```
    Status from_string(const std::string& value_string, const int precision = 0,
                       const int scale = 0) {
        if (_is_string_type) {
            if (value_string.size() > _var_length) {
                Slice* slice = reinterpret_cast<Slice*>(cell_ptr());
                slice->size = value_string.size();
                _var_length = slice->size;
                _string_content.reset(new char[slice->size]);
                slice->data = _string_content.get();
            }
        }
        return _rep->from_string(_field_buf + 1, value_string, precision, scale);
    }
```
2022-10-09 14:14:39 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
9e42804298 [feature-wip](unique-key-merge-on-write) unique key with merge on write table support schema change (#12886) 2022-10-09 11:31:53 +08:00
671dc93035 [feature-wip](unique-key-merge-on-write) fix that versions of multiple replicas are inconsistent when rebalance (#12363) 2022-10-09 11:31:27 +08:00
b8b18e5153 [enhancement](array-type) Handle cast empty string value to array (#13028)
Handle empty values between two commas when casting a string to an array type.

before:
mysql> select cast("[a,b,c,,,,]" as array<string>);
+------------------------------------+
| CAST('[a,b,c,,,,]' AS ARRAY<TEXT>) |
+------------------------------------+
| ['a', 'b', 'c', ',', ',']          |
+------------------------------------+
1 row in set (0.01 sec)

after:
mysql> select cast("[a,b,c,,,,]" as array<string>);
+------------------------------------+
| CAST('[a,b,c,,,,]' AS ARRAY<TEXT>) |
+------------------------------------+
| ['a', 'b', 'c', '', '', '']        |
+------------------------------------+
1 row in set (0.01 sec)
2022-10-08 21:45:42 +08:00
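The "after" behaviour in b8b18e5153 can be reproduced with a small splitter. This is a sketch, not the Doris cast code: `split_array_literal` is a hypothetical helper, it does no quoting or whitespace handling, and it assumes that a trailing comma does not produce an extra element (which matches the six elements in the "after" output).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the Doris implementation.
// Splits "[a,b,c,,,,]" into elements, keeping empty strings between commas.
// Assumption: text after the last comma is only emitted when it is non-empty.
std::vector<std::string> split_array_literal(const std::string& text) {
    assert(text.size() >= 2 && text.front() == '[' && text.back() == ']');
    std::vector<std::string> elements;
    std::string current;
    bool pending = false;  // true when characters were read since the last comma
    for (size_t i = 1; i + 1 < text.size(); ++i) {
        if (text[i] == ',') {
            // Every comma closes one element, even an empty one.
            elements.push_back(current);
            current.clear();
            pending = false;
        } else {
            current.push_back(text[i]);
            pending = true;
        }
    }
    if (pending) {
        elements.push_back(current);
    }
    return elements;
}
```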