Commit Graph

2906 Commits

Author SHA1 Message Date
7b2fdd26a1 [schema change](fix) fix coredump of schema change (#13183)
When a schema change and a compaction are executing simultaneously, both
nullable and not-nullable data can be read for the same column, so we need
to reset _nullmap for each Block when converting Block data, or else the
Column cast will be wrong.
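A minimal, self-contained sketch of the bug pattern (hypothetical names, not the actual Doris code):

```
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical converter, for illustration only: it reuses a null-map
// buffer across Blocks, mirroring the bug. If the buffer is not reset for
// every Block, a not-nullable Block converted after a nullable one (as can
// happen when compaction runs during a schema change) inherits stale
// null flags.
struct BlockConverter {
    std::vector<uint8_t> _nullmap;  // 1 = row is NULL

    void convert(std::size_t rows, bool nullable) {
        _nullmap.clear();  // the fix: reset per Block before any reuse
        if (nullable) {
            _nullmap.resize(rows, 0);
            // ... fill _nullmap from the source column ...
        }
        // ... convert the row data; an empty _nullmap means "no NULLs" ...
    }
};
```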
2022-10-09 19:44:00 +08:00
fc711d89c8 [fix](projections) Open the project expressions properly. (#13162)
In the current 'ExecNode::open' function, the 'open(_projections)' call is unreachable, which might cause serious crashes. (#13150)
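A miniature of the control-flow bug, with stand-in code (illustrative only, not the actual ExecNode):

```
#include <iostream>

bool opened = false;
void open_projections() { opened = true; }

// Bug pattern: an early return makes the projection-opening call
// unreachable, so projection expressions are later evaluated without
// having been opened.
void open_buggy() {
    return;              // returns before opening projections
    open_projections();  // unreachable
}

void open_fixed() {
    open_projections();  // fix: open projections before returning
}

int main() {
    open_buggy();
    std::cout << "after buggy open: opened=" << opened << '\n';  // 0
    open_fixed();
    std::cout << "after fixed open: opened=" << opened << '\n';  // 1
}
```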
2022-10-09 18:43:45 +08:00
89514fc964 [fix](rowset) fix that rowset writer doesn't process the return value, which may result in data loss (#13189) 2022-10-09 17:10:11 +08:00
dc2d33298b [chore](be config) remove config use_mmap_allocate_chunk #13196
This config has never been used online and there are bugs when it is enabled, so I removed the config and its related tests.


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-10-09 16:19:59 +08:00
f373b22dcf [fix](string) Fix over-allocated memory for string type (#13167)
For the string/varchar/text types, the length field in `ColumnMetaPB` is
fixed at 2GB. We don't actually have to allocate 2GB for every string
value, because we reallocate the precise amount of memory for the string
in `WrapperField::from_string()`:

```
    Status from_string(const std::string& value_string, const int precision = 0,
                       const int scale = 0) {
        if (_is_string_type) {
            // Grow the buffer only when the incoming value exceeds the
            // current capacity; the 2GB schema length is never allocated
            // up front.
            if (value_string.size() > _var_length) {
                Slice* slice = reinterpret_cast<Slice*>(cell_ptr());
                slice->size = value_string.size();
                _var_length = slice->size;
                // Reallocate to the exact size of the incoming string.
                _string_content.reset(new char[slice->size]);
                slice->data = _string_content.get();
            }
        }
        return _rep->from_string(_field_buf + 1, value_string, precision, scale);
    }
```
2022-10-09 14:14:39 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
2022-10-09 14:11:03 +08:00
9e42804298 [feature-wip](unique-key-merge-on-write) unique key with merge on write table support schema change (#12886) 2022-10-09 11:31:53 +08:00
671dc93035 [feature-wip](unique-key-merge-on-write) fix that versions of multiple replicas are inconsistent when rebalance (#12363) 2022-10-09 11:31:27 +08:00
b8b18e5153 [enhancement](array-type) Handle cast empty string value to array (#13028)
Handle empty values between two commas when casting a string to an array type.

before:
mysql> select cast("[a,b,c,,,,]" as array<string>);
+------------------------------------+
| CAST('[a,b,c,,,,]' AS ARRAY<TEXT>) |
+------------------------------------+
| ['a', 'b', 'c', ',', ',']          |
+------------------------------------+
1 row in set (0.01 sec)

after:
mysql> select cast("[a,b,c,,,,]" as array<string>);
+------------------------------------+
| CAST('[a,b,c,,,,]' AS ARRAY<TEXT>) |
+------------------------------------+
| ['a', 'b', 'c', '', '', '']        |
+------------------------------------+
1 row in set (0.01 sec)
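A self-contained sketch of the keep-empty splitting idea (illustrative only, not Doris's actual cast code, which also handles the brackets and the trailing separator):

```
#include <iostream>
#include <string>
#include <vector>

// Hypothetical splitter: an empty span between two delimiters must yield
// an empty string element rather than being skipped or misread as ','.
std::vector<std::string> split_keep_empty(const std::string& s, char sep) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = s.find(sep, start);
        out.push_back(s.substr(start, pos - start));  // may be ""
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    return out;
}

int main() {
    for (const auto& e : split_keep_empty("a,b,c,,,", ',')) {
        std::cout << "'" << e << "' ";  // prints: 'a' 'b' 'c' '' '' ''
    }
}
```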
2022-10-08 21:45:42 +08:00
869fe2bc5d [Improvement](outfile) Support ORC format in outfile (#13019) 2022-10-08 20:56:32 +08:00
c5f802b93c [Bug](libjvm) reorder initialization of JNI (#13165) 2022-10-08 18:53:47 +08:00
b81a8789c3 [feature-wip](parquet-reader) optimize the performance of column conversion (#13122)
Convert Parquet columns into Doris columns via a batch method.
In the previous implementation, only numeric types could be converted in
batches; other types could only be inserted one by one, which generated
repeated virtual function calls and container expansions.
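An illustration of the batch idea, with hypothetical names (not the actual reader code): instead of one insert call per value, reserve once and append the whole batch.

```
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Column {
    std::vector<int64_t> data;

    // Per-row path: one call (and potential reallocation) per value.
    void insert_one(int64_t v) { data.push_back(v); }

    // Batch path: one resize, one bulk copy for the whole page.
    void insert_batch(const int64_t* vals, std::size_t n) {
        std::size_t old = data.size();
        data.resize(old + n);
        std::memcpy(data.data() + old, vals, n * sizeof(int64_t));
    }
};
```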
2022-10-08 18:03:10 +08:00
5214e898d9 [fix](parquet-reader) skip date/datetime column predicate filter to avoid coredump (#13072)
Will be fixed later
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-10-08 18:02:35 +08:00
cf2b93532b [fix](file-scanner) fix some logic about broker load with parquet with new file scanner (#13135)
Fix some logic about broker load using the new file scanner, with parquet format:

1. If columns are specified in the load stmt but none of them are in the parquet file,
    an error will be thrown like `err: No columns found in file`. See `parquet_s3_case4`.

2. If the first column of the table is not in the file, the resulting number of rows is wrong.
    See `parquet_s3_case8`.

3. If a column specified in `columns` in the load stmt exists in neither the file nor the table,
    an error will be thrown like `failed to find default value expr for slot: x1`. See `parquet_s3_case2`.
2022-10-08 13:08:08 +08:00
91cf33865d [improvement](load) config flush_thread_num_per_store to be 6 by default (#13076)
Flushing memtables is CPU bound, so 2 threads per disk is too small.
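For reference, a sketch of setting it in be.conf (the key name comes from this commit's title; since 6 is now the default, setting it is only needed to override):

```
# be.conf: number of memtable flush threads per data directory
flush_thread_num_per_store = 6
```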
2022-10-08 09:16:22 +08:00
8b03977689 fix bug that the last line of data is lost for stream load when the line delimiter is more than one character (#13066) 2022-10-07 16:12:05 +08:00
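For the stream load fix above, a self-contained sketch of the failure mode with a hypothetical splitter (not the actual stream load code): a final line without a trailing delimiter must still be emitted, otherwise the last record is dropped.

```
#include <cstddef>
#include <string>
#include <vector>

// delim is assumed non-empty.
std::vector<std::string> split_lines(const std::string& buf,
                                     const std::string& delim) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    while (start <= buf.size()) {
        std::size_t pos = buf.find(delim, start);
        if (pos == std::string::npos) {
            // Emit the trailing line instead of losing it.
            if (start < buf.size()) lines.push_back(buf.substr(start));
            break;
        }
        lines.push_back(buf.substr(start, pos - start));
        start = pos + delim.size();
    }
    return lines;
}
```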
b41748efa1 [feature-wip](new-scan)Add new jdbc scanner and new jdbc scan node (#12848)
Related PR: #11582
This PR adds the new JDBC scan node and scanner.
2022-10-07 09:55:17 +08:00
441b450a79 (runtimefilter) shorten the time prepare consumes (#13127)
Now, every prepare puts a runtime filter controller, so it takes the
mutex lock on the controller map. Initializing a bloom filter takes some
time in allocation and memset. If we run p1 tests with -parallel=20
-suiteParallel=20 -actionParallel=20, we get error messages like
'send fragment timeout 5s'.

The patch fixes the problem in the following 2 ways:
1. Replace the single mutex with 128 sharded ones (see the sketch below).
2. If a plan fragment does not have a runtime filter, it does not need to
take the locks.
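Reading the "128" as lock shards, a minimal sketch of the striping pattern with illustrative names (the actual Doris change may differ):

```
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hash the query id to one of N shards so unrelated prepares stop
// contending on a single global mutex over the whole controller map.
template <typename V, std::size_t N = 128>
class ShardedMap {
    struct Shard {
        std::mutex mu;
        std::unordered_map<std::string, V> map;
    };
    std::array<Shard, N> _shards;

    Shard& shard_for(const std::string& key) {
        return _shards[std::hash<std::string>{}(key) % N];
    }

public:
    void put(const std::string& key, V value) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);  // locks one shard only
        s.map[key] = std::move(value);
    }
};
```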
2022-10-06 10:12:29 +08:00
218b0857ab [fix](string) allocate memory according to actual size instead of max size (#13112)
A string column's length field is 2GB; if we allocate memory according to
the column length, strings consume far more memory than needed. It also
misleads the memory tracker.
2022-10-06 09:56:22 +08:00
d286aa7bf7 [fix](spark-load) no need to filter row group when doing spark load (#13116)
1. Fix issue #13115 
2. Modify the `get_next_block` method of `GenericReader` to return "read_rows" explicitly, as sketched below.
    Some columns in the block may not be filled by the reader; if the first column is not filled, `block->rows()` cannot return the real row count.
3. Add more checks for broker load test cases.
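A sketch of the changed interface with stand-in types (the real signatures may differ):

```
#include <cstddef>

struct Status { static Status OK() { return {}; } };  // stand-in
struct Block { /* columns ... */ };                   // stand-in

// get_next_block reports the number of rows actually read through an
// explicit out-parameter, so callers no longer rely on block->rows(),
// which is wrong when the reader leaves the first column unfilled.
class GenericReader {
public:
    virtual ~GenericReader() = default;
    virtual Status get_next_block(Block* block, std::size_t* read_rows,
                                  bool* eof) = 0;
};
```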
2022-10-05 23:00:56 +08:00
7b75c2df54 [fix](BE) fix the stream load error when upgrade BE from 1.1.2 to master (#13058) 2022-10-05 12:13:26 +08:00
80e1f401f0 [enhancement](memory) Fix USE_MEM_TRACKER=OFF compile (#13085) 2022-10-05 12:10:49 +08:00
984d387945 [Regression](load) Add broker load regression test. (#13062)
Add basic broker load regression test. It has been tested. But default
2022-10-04 21:29:05 +08:00
3f47f67b16 [fix](parquet) fix parquet write setting property is not effective (#12912) 2022-10-04 21:25:57 +08:00
e167aa120f [fix](jdbc) fix insert into date type to oracle using wrong type (#12883)
When using JDBC to insert into a date-type column in Oracle, the to_date
function should be used to convert the string to java.sql.Date.
2022-10-04 21:24:33 +08:00
Pxl
db89b0b703 [Enhancement](optimize) optimize for function multiply on decimalv2 (#13049)
2022-10-04 16:07:18 +08:00
026ffaf10d [feature-wip](parquet-reader) add detail profile for parquet reader (#13095)
Add a more detailed profile for ParquetReader:
ParquetColumnReadTime: total time spent reading parquet columns
ParquetDecodeDictTime: time to parse dictionary pages
ParquetDecodeHeaderTime: time to parse page headers
ParquetDecodeLevelTime: time to parse a page's definition/repetition levels
ParquetDecodeValueTime: time to decode page data into a Doris column
ParquetDecompressCount: counter of page data decompressions
ParquetDecompressTime: time to decompress page data
ParquetParseMetaTime: time to parse Parquet metadata
2022-10-02 15:11:48 +08:00
8b14c4aa98 [fix](compaction) don't log cumu policy name for quick compaction (#13101) 2022-10-01 21:40:42 +08:00
3294b18674 [Improvement](datev2) fix some compatible problems for datev2 (#13079) 2022-09-30 13:56:01 +08:00
e7f18e998a [chore](be-ut) Remove useless lines which cause compilation errors (#13053) 2022-09-30 11:26:25 +08:00
d73e437718 [fix](array-type) fix the be core dump when use string to insert array (#12728)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-30 10:44:27 +08:00
287ff50a6f [Bug](datev2) Fix compatible error between datev2 and date (#13024) 2022-09-29 18:01:55 +08:00
34b14a71c8 [Improvement](string) Optimize scanning for string #12911
~0.2X performance boost for queries containing string predicates
2022-09-29 15:11:16 +08:00
c2fae109c3 [Improvement](outfile) Support output null in parquet writer (#12970) 2022-09-29 13:36:30 +08:00
bc2966ed80 [fix](like)the dictionary column should call get_shrink_value to get correct string value (#13032)
2022-09-29 09:09:36 +08:00
36bf8ad3eb [Opt](Vec) Support const column check nullable and remove nullable (#13020) 2022-09-29 08:39:19 +08:00
a853dd3c61 [Bug](aarch64) Fix the mmap errors which make BE down during starting up (#13031) 2022-09-29 08:36:58 +08:00
820ec435ce [feature-wip](parquet-reader) refactor parquet_predicate (#12896)
This change serves the following purposes:
1. Use ScanPredicate instead of TCondition for external tables, so it can reuse the old code branch.
2. Simplify and delete some useless old code.
3. Use ColumnValueRange to save predicates.
2022-09-28 21:27:13 +08:00
cd549d8a8f [improvement](scan) remove concurrency limit if scan has predicate (#13021)
If a scan node has a predicate, we cannot limit the concurrency of the
scanner, because we don't know how much data needs to be scanned.
If we limit the concurrency, the query can become very slow.

For example:
for `select * from tbl limit 1`, the concurrency will be 1;
for `select * from tbl where k1=1 limit 1`, the concurrency will not be limited.
2022-09-28 17:07:07 +08:00
16bb5cb430 [enhancement](memory) Jemalloc performance optimization and compatibility with MemTracker #12496 2022-09-28 12:04:29 +08:00
1ba9e4b568 [Improvement](sort) Reuse memory in sort node (#12921) 2022-09-28 09:44:35 +08:00
Pxl
ee3dd423b9 [Bug](function) core dump on substr #13007 2022-09-28 08:54:49 +08:00
d8ec53c83f [enhancement](load) avoid duplicate reduce on same TabletsChannel #12975
In the policy changed by PR #12716, when the hard limit is reached, multiple threads might pick the same LoadChannel and call reduce_mem_usage on the same TabletsChannel. Although a lock and condition variable prevent multiple threads from reducing memory usage concurrently, they can still do the same reduce work on that channel multiple times one after another, even when it has just been reduced.
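A sketch of the dedup idea with illustrative names (not the actual TabletsChannel code): a thread arriving while another is reducing waits, and after waking re-checks whether memory is still over the limit instead of blindly reducing the same channel again.

```
#include <condition_variable>
#include <cstddef>
#include <mutex>

class TabletsChannel {
    std::mutex _mu;
    std::condition_variable _cv;
    bool _reducing = false;
    std::size_t _mem_usage = 0;
    std::size_t _hard_limit = 1024;

public:
    void reduce_mem_usage() {
        std::unique_lock<std::mutex> lock(_mu);
        while (_reducing) _cv.wait(lock);      // another thread is at work
        if (_mem_usage < _hard_limit) return;  // just reduced; skip rework
        _reducing = true;
        lock.unlock();
        // ... flush memtables here, lowering _mem_usage ...
        lock.lock();
        _reducing = false;
        _cv.notify_all();
    }
};
```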
2022-09-27 22:03:08 +08:00
d80b7b9689 [feature-wip](new-scan) support more load situation (#12953) 2022-09-27 21:48:32 +08:00
16f5204cab fix_md5sum_and_sm3sum (#13009) 2022-09-27 21:41:14 +08:00
Pxl
9607f60845 [Feature](serialize) move block_data_version to fe heart beat (#12667)
Move block_data_version from the BE config to the FE heartbeat.
2022-09-27 18:25:54 +08:00
Pxl
64988cb3d4 [Enhancement](optimize) optimize for insert_indices_from (#12807) 2022-09-27 15:49:15 +08:00
722106805f [chore](build) Fix compilation errors reported by clang-15 (#13000)
Add the -Wno-unused-but-set-variable compile flag to build libGeo.a.
2022-09-27 14:04:44 +08:00
3f99dd5c4b [function](bitmap) support bitmap_hash64 (#12992) 2022-09-27 12:16:02 +08:00
429ac929fb [chore](build) Support building from source on ubuntu-22.04 (aarch64) (#12813)
2022-09-27 10:29:13 +08:00