6608 Commits

Author SHA1 Message Date
581494dea8 [fix](test) resolve load in tpch_sf100_unique_p2 and tpch_sf10_unique_p2 (#13208) 2022-10-09 20:30:00 +08:00
b9516b50c1 [typo](docs)fix docs 404 url (#13157)
* fix docs 404 url
2022-10-09 20:02:48 +08:00
7b2fdd26a1 [schema change](fix) fix coredump of schema change (#13183)
When schema change and compaction are executing simultaneously, both
nullable and not nullable data can be read for the same column, so we need to
reset _nullmap for each Block when converting Block data, or else Column
case will be wrong.
2022-10-09 19:44:00 +08:00
3302e0b57e [enhancement](regression-test) add sync for unique table debug test (#13210) 2022-10-09 19:32:28 +08:00
f2159709a8 [Regression](outfile) Fix concurrency test failure caused by outfile (#13209) 2022-10-09 19:09:44 +08:00
fc711d89c8 [fix](projections) Open the project expressions properly. (#13162)
In the current 'ExecNode::open' function, 'open(_projections)' is unreachable, which might cause serious crashes. (#13150)
2022-10-09 18:43:45 +08:00
89514fc964 [fix](rowset) fix that rowset writer doesn't process the return value, which may result in data loss (#13189) 2022-10-09 17:10:11 +08:00
15fc3c2c89 [enhancement](statistics) optimize the default configuration related to statistics, etc. (#13136)
This PR mainly optimizes statistics tasks. It includes the following:
1. No longer generate statistics tasks for empty tables, and move the logic of skipping empty partitions to the process of task generation.
2. Adjust the default configuration related to statistics to improve the efficiency of statistics collection. The parameters include `cbo_concurrency_statistics_task_num`, `statistic_job_scheduler_execution_interval_ms` and `statistic_task_scheduler_execution_interval_ms`.
3. Optimize the display of statistical tasks.
4. In addition, some `org.apache.parquet.Strings` packages are changed to `com.google.common.base.Strings` to avoid the exception that Strings cannot be found in local debug.

etc.
2022-10-09 16:34:20 +08:00
da933ecd21 [fix](Nereids) plan broadcast on right semi join by mistake (#13206) 2022-10-09 16:32:12 +08:00
cfade2dfe0 [typo](docs)Fix Docs 404 Url #13175 2022-10-09 16:22:26 +08:00
dc2d33298b [chore](be config) remove config use_mmap_allocate_chunk #13196
This config is never used online, and there are bugs when it is enabled, so I removed this config and its related tests.


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-10-09 16:19:59 +08:00
e5fbecc621 [typo](docs)Fix the jump link 404 in delete recover.md (#13156)
* [typo](docs)Fix the jump link 404 in delete-recover.md
2022-10-09 16:12:34 +08:00
207e913b55 fix the bad link for delete-recover.md (#13203)
fix the bad link for delete-recover.md
2022-10-09 16:08:19 +08:00
9c64fde8f5 [tools](banchmark) upgrade date type (#13197)
upgrade date type to datev2
2022-10-09 14:17:12 +08:00
f373b22dcf [fix](string) Fix over-allocated memory for string type (#13167)
For string/varchar/text type, the length field is fixed to 2GB. (`ColumnMetaPB`)
We don't actually have to allocate 2GB for every string type because we
will reallocate the precise size of memory for the string in
`WrapperField::from_string()`

```
    Status from_string(const std::string& value_string, const int precision = 0,
                       const int scale = 0) {
        if (_is_string_type) {
            if (value_string.size() > _var_length) {
                Slice* slice = reinterpret_cast<Slice*>(cell_ptr());
                slice->size = value_string.size();
                _var_length = slice->size;
                _string_content.reset(new char[slice->size]);
                slice->data = _string_content.get();
            }
        }
        return _rep->from_string(_field_buf + 1, value_string, precision, scale);
    }
```
2022-10-09 14:14:39 +08:00
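The idea behind f373b22dcf (start small, reallocate to the exact size only when a longer value arrives) can be illustrated with a minimal sketch. `GrowableStringField` and its members are hypothetical names chosen for this example; this is not the Doris `WrapperField` and it omits the `Slice` bookkeeping shown in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for a string field; not Doris code.
class GrowableStringField {
public:
    // Start with a small buffer instead of the declared maximum length.
    explicit GrowableStringField(std::size_t initial_capacity)
            : _capacity(initial_capacity), _size(0), _buf(new char[initial_capacity]) {}

    // Copy the value in, growing the buffer to the exact size only if needed.
    void assign(const std::string& value) {
        if (value.size() > _capacity) {
            _capacity = value.size();
            _buf.reset(new char[_capacity]);
        }
        assert(value.size() <= _capacity);
        std::memcpy(_buf.get(), value.data(), value.size());
        _size = value.size();
    }

    std::size_t capacity() const { return _capacity; }
    std::string str() const { return std::string(_buf.get(), _size); }

private:
    std::size_t _capacity;
    std::size_t _size;
    std::unique_ptr<char[]> _buf;
};
```

A field created with a capacity of 4 keeps that buffer for short values and grows to exactly 11 bytes when assigned an 11-character string, so memory use follows the data rather than the declared column length.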
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
8f36f8b83a Add be Parameter Description(#13201)
Add be Parameter Description
2022-10-09 12:49:57 +08:00
33fe389d62 [regression](datev2) Add regression tests for datev2 (#13040) 2022-10-09 11:55:06 +08:00
e0cff02c1a add sync for stream load test (#13185) 2022-10-09 11:36:01 +08:00
bbb6d2758a [fix](regression-test) fix test_segment_iterator_delete using order_qt_sql (#13192) 2022-10-09 11:35:22 +08:00
62c82bd575 [enhancement](test) Rewrite test_update_schema_change case (#13191) 2022-10-09 11:35:05 +08:00
9e42804298 [feature-wip](unique-key-merge-on-write) unique key with merge on write table support schema change (#12886) 2022-10-09 11:31:53 +08:00
671dc93035 [feature-wip](unique-key-merge-on-write) fix that versions of multiple replicas are inconsistent when rebalance (#12363) 2022-10-09 11:31:27 +08:00
e6f4c771d9 [fix](docs) fix trim, lower, upper function docs error (#13179) 2022-10-09 10:32:26 +08:00
555f9520e3 fix community module error url (#13182)
fix community module error url
2022-10-09 10:27:02 +08:00
c53d2d6a8b install deploy doc fix (#13177)
install deploy doc fix
2022-10-09 10:26:28 +08:00
e0044e5a5f [typo](docs)Sql doc link fix (#13151)
* sql doc link fix
2022-10-09 09:26:00 +08:00
ece4a6c194 [doc][fix](multi-catalog) add doc for multi catalog and fix refresh bug (#13097)
1. Add all documents about the multi-catalog feature.
2. Fix a bug where the REFRESH edit log is not handled.
2022-10-09 09:14:44 +08:00
d16ff79217 [fix](flinkCDC Demo):fix flinkcdc demo execution error (#13148) 2022-10-09 09:13:18 +08:00
b8b18e5153 [enhancement](array-type) Handle cast empty string value to array (#13028)
Handle empty values between two commas when casting a string to array type.

before:
mysql> select cast("[a,b,c,,,,]" as array<string>);
+------------------------------------+
| CAST('[a,b,c,,,,]' AS ARRAY<TEXT>) |
+------------------------------------+
| ['a', 'b', 'c', ',', ',']          |
+------------------------------------+
1 row in set (0.01 sec)

after:
mysql> select cast("[a,b,c,,,,]" as array<string>);
+------------------------------------+
| CAST('[a,b,c,,,,]' AS ARRAY<TEXT>) |
+------------------------------------+
| ['a', 'b', 'c', '', '', '']        |
+------------------------------------+
1 row in set (0.01 sec)
2022-10-08 21:45:42 +08:00
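The "after" behaviour shown for b8b18e5153 can be approximated with a small splitter. This is only a sketch inferred from the sample output above, not the Doris cast implementation: `split_array_literal` is a hypothetical helper, it ignores quoting and whitespace, and it assumes that a comma directly before the closing bracket does not produce an extra element (which is what the six-element result suggests).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not Doris code: split "[a,b,,c]" into elements,
// keeping the empty strings that sit between two commas.
std::vector<std::string> split_array_literal(const std::string& text) {
    std::vector<std::string> out;
    if (text.size() < 2 || text.front() != '[' || text.back() != ']') {
        return out;  // not an array literal; real code would report an error
    }
    const std::string body = text.substr(1, text.size() - 2);
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = body.find(',', start);
        if (comma == std::string::npos) {
            // Last token: only kept when non-empty, so a trailing comma
            // (or an empty body) adds no element.
            if (start < body.size()) {
                out.push_back(body.substr(start));
            }
            break;
        }
        // Token between `start` and the comma; may be the empty string.
        out.push_back(body.substr(start, comma - start));
        start = comma + 1;
    }
    return out;
}
```

With the input from the commit message, `split_array_literal("[a,b,c,,,,]")` yields the six elements `a`, `b`, `c` and three empty strings.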
869fe2bc5d [Improvement](outfile) Support ORC format in outfile (#13019) 2022-10-08 20:56:32 +08:00
344377beb7 [typo](docs)Fix jump link 404 in jdbc load.md (#13170) 2022-10-08 20:01:52 +08:00
86e47650cf Update outfile.md (#13172) 2022-10-08 20:01:20 +08:00
4386f41442 sql server 2017 version ODBC usage instructions (#13178)
sql server 2017 version ODBC usage instructions
2022-10-08 20:00:53 +08:00
6b0410450b [typo](docs)Fix jump link 404 in external storage load.md (#13173) 2022-10-08 19:59:44 +08:00
c5f802b93c [Bug](libjvm) reorder initialization of JNI (#13165) 2022-10-08 18:53:47 +08:00
b81a8789c3 [feature-wip](parquet-reader) optimize the performance of column conversion (#13122)
Convert Parquet columns into Doris columns in batches.
In the previous implementation, only numeric types could be converted in batches,
and other types could only be inserted one by one.
That one-by-one process incurs repeated virtual function calls and container expansion.
2022-10-08 18:03:10 +08:00
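The cost difference described in b81a8789c3 can be sketched as follows. The `Column` interface and both `convert_*` functions are hypothetical names for illustration and do not reflect the actual Doris or Parquet reader classes; the sketch only contrasts one virtual call per value with one virtual call plus one reservation per batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column interface; names are illustrative, not Doris classes.
class Column {
public:
    virtual ~Column() = default;
    virtual void insert_one(std::int64_t v) = 0;
    virtual void insert_batch(const std::int64_t* data, std::size_t n) = 0;
    virtual std::size_t size() const = 0;
};

class Int64Column : public Column {
public:
    void insert_one(std::int64_t v) override { _data.push_back(v); }

    void insert_batch(const std::int64_t* data, std::size_t n) override {
        _data.reserve(_data.size() + n);            // grow the container once
        _data.insert(_data.end(), data, data + n);  // bulk copy
    }

    std::size_t size() const override { return _data.size(); }
    std::int64_t at(std::size_t i) const { return _data[i]; }

private:
    std::vector<std::int64_t> _data;
};

// One virtual call (and possibly one reallocation) per value.
void convert_one_by_one(const std::vector<std::int64_t>& src, Column& dst) {
    for (std::int64_t v : src) {
        dst.insert_one(v);
    }
}

// A single virtual call for the whole batch.
void convert_batched(const std::vector<std::int64_t>& src, Column& dst) {
    dst.insert_batch(src.data(), src.size());
}
```

Both functions produce the same column contents; the batched path simply does the dispatch and the capacity growth once per batch instead of once per value.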
5214e898d9 [fix](parquet-reader) skip date/datetime column predicate filter to avoid coredump (#13072)
Will be fixed later
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-10-08 18:02:35 +08:00
cf2b93532b [fix](file-scanner) fix some logic about broker load with parquet with new file scanner (#13135)
Fix some logic about broker load using new file scanner, with parquet format:

1. If columns are specified in load stmt, but none of them are in parquet file,
    error will be thrown like `err: No columns found in file`. See `parquet_s3_case4`

2. If the first column of the table is not in the file, the resulting number of rows is wrong.
    See `parquet_s3_case8`

3. If column specified in `columns` in load stmt does not exist in file and table,
    error will be thrown like: `failed to find default value expr for slot: x1`. See `parquet_s3_case2`
2022-10-08 13:08:08 +08:00
63f5dc1953 [feature](Nereids): support Alias join reorder and fix bug. (#12890)
* [improve](Nereids): simplify onCondition check.

* feature: support project Alias for join reorder.
2022-10-08 10:45:04 +08:00
91cf33865d [improvement](load) config flush_thread_num_per_store to be 6 by default (#13076)
Flushing memtable is CPU bound, so 2 threads per disk is too small.
2022-10-08 09:16:22 +08:00
e0f17f217f [fix](test) resolve tpch_sf100_unique_p2 and tpch_sf10_unique_p2 to run in parallel (#13138) 2022-10-08 09:10:22 +08:00
71399ed771 fix data cache sidebar error (#13137)
fix data cache sidebar error
2022-10-07 17:45:21 +08:00
d902e80d6d [docs](unique-key-merge-on-write) add document for unique key merge o… (#13068) 2022-10-07 16:18:04 +08:00
8b03977689 fix bug that last line of data lost for stream load when line delimiter is more than one character (#13066) 2022-10-07 16:12:05 +08:00
447aceb223 [Fix](doc) Remove unsupported parameter (#13081) 2022-10-07 16:10:00 +08:00
b41748efa1 [feature-wip](new-scan)Add new jdbc scanner and new jdbc scan node (#12848)
Related PR: #11582
This PR adds the new jdbc scan node and scanner.
2022-10-07 09:55:17 +08:00
0ccb047d45 fix slack link (#13128) 2022-10-06 18:11:14 +08:00
f2aa6e9a21 [doc](typo): fix typo (#13130) 2022-10-06 18:10:41 +08:00
441b450a79 (runtimefilter) shorter time prepare consumes (#13127)
Currently, every prepare puts a runtime filter controller, so it takes the
mutex lock on the controller map. Initializing the bloom filter takes some
time in allocation and memset. If we run p1 tests with -parallel=20
-suiteParallel=20 -actionParallel=20, we get error messages like
'send fragment timeout 5s'.

The patch fixes the problem in the following 2 ways:
1. Replace the one mutex lock with 128 of them.
2. If a plan fragment does not have a runtime filter, it does not need to take
the locks.
2022-10-06 10:12:29 +08:00
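The first point of 441b450a79 (splitting one mutex into 128) is a form of lock striping. A minimal sketch of the technique, assuming string keys and a hash-based shard choice, is given below; `ShardedMap` and its methods are hypothetical and are not the Doris runtime filter controller map.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sharded map; names are illustrative, not Doris code.
// Each shard has its own mutex, so writers that hash to different shards
// do not block each other.
template <typename V, std::size_t NumShards = 128>
class ShardedMap {
public:
    void put(const std::string& key, V value) {
        Shard& shard = _shards[shard_index(key)];
        std::lock_guard<std::mutex> guard(shard.mu);
        shard.map[key] = std::move(value);
    }

    // Returns true and copies the value into *out when the key exists.
    bool get(const std::string& key, V* out) {
        Shard& shard = _shards[shard_index(key)];
        std::lock_guard<std::mutex> guard(shard.mu);
        auto it = shard.map.find(key);
        if (it == shard.map.end()) {
            return false;
        }
        *out = it->second;
        return true;
    }

    static std::size_t shard_index(const std::string& key) {
        return std::hash<std::string>{}(key) % NumShards;
    }

private:
    struct Shard {
        std::mutex mu;
        std::unordered_map<std::string, V> map;
    };
    std::array<Shard, NumShards> _shards;
};
```

The second point of the commit (skipping the lock when a fragment has no runtime filter) corresponds to simply not calling `put` for such fragments, which is outside this sketch.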