12309 Commits

Author SHA1 Message Date
e90f95dfda [config](merge-on-write) use separate config to control primary key index cache (#22538) 2023-08-03 17:11:19 +08:00
60ca5b0bad [Improvement](statistics)Return meaningful error message when show column stats column name doesn't exist (#22458)
The error message was unhelpful when `show column stats` named a column that does not exist:
```
MySQL [hive.tpch100]> show column stats `lineitem` (l_extendedpric);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```

This PR returns a meaningful message:
```
mysql> show column stats `lineitem` (l_extendedpric);
ERROR 1105 (HY000): errCode = 2, detailMessage = Column: l_extendedpric not exists
```
2023-08-03 16:35:14 +08:00
c63e3e6959 [fix](regression] fix test_table_level_compaction_policy
2023-08-03 15:24:17 +08:00
22344d6e4a [test](pipline) exclude fail case (#22546)
exclude fail case
2023-08-03 15:18:26 +08:00
27f6e4649e [improvement](stats) Catch exception properly #22503
Catch the exception instead of throwing it directly to the caller, to avoid unexpectedly interrupting the upper-level logic.
2023-08-03 15:16:55 +08:00
3961b8df76 [refactor](Nereids) mv top-n two phase read rule from post processor to rewriter (#22487)
Use three new plan nodes to represent deferred materialization of TopN.
Example:

```
-- SQL
select * from t1 order by c1 limit 10;

-- PLAN
+------------------------------------------+
| Explain String                           |
+------------------------------------------+
| PhysicalDeferMaterializeResultSink       |
| --PhysicalDeferMaterializeTopN           |
| ----PhysicalDistribute                   |
| ------PhysicalDeferMaterializeTopN       |
| --------PhysicalDeferMaterializeOlapScan |
+------------------------------------------+
```
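The idea behind a two-phase (deferred materialization) TopN can be sketched as follows. This is only an illustrative Python model, not the Nereids implementation: `TABLE`, `scan_sort_keys`, and `top_n_deferred` are hypothetical names, and the in-memory dictionary stands in for a scan whose wide columns would be expensive to read.

```python
import heapq

# Hypothetical in-memory "table": row id -> full row. In a real engine the
# non-key columns would be read from storage only when needed.
TABLE = {
    0: {"c1": 42, "payload": "row-0"},
    1: {"c1": 7, "payload": "row-1"},
    2: {"c1": 19, "payload": "row-2"},
    3: {"c1": 3, "payload": "row-3"},
}


def scan_sort_keys(table):
    """Phase 1 scan: yield only (sort key, row id), never the full row."""
    for row_id, row in table.items():
        yield row["c1"], row_id


def top_n_deferred(table, limit):
    """Select the winning row ids by c1 first, then fetch full rows for them."""
    winners = heapq.nsmallest(limit, scan_sort_keys(table))
    # Phase 2: materialize the remaining columns for the surviving rows only.
    return [table[row_id] for _, row_id in winners]


result = top_n_deferred(TABLE, 2)
```

Here only `limit` rows are ever fully materialized, regardless of how many rows the scan visits.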
2023-08-03 14:28:13 +08:00
4f9969ce1e [feature](show-frontends-disk) Add Show frontend disks (#22040)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
Co-authored-by: yuxianbing <iloveqaz123>
2023-08-03 14:04:48 +08:00
4322fdc96d [feature](Nereids): add or expansion in CBO(#22465) 2023-08-03 13:29:33 +08:00
85a95e206e [bugfix](profile) not output some variables correctly (#22537) 2023-08-03 13:17:02 +08:00
e670d84b72 [feature](executor) using max_instance_num to limit automatically instance (#22521) 2023-08-03 13:12:32 +08:00
596fd4d86d [improvement](file-scan) reduce the min size of file split (#22412)
Reduce the minimum from 128MB to 8MB so that users can set `file_split_size` more flexibly.
2023-08-03 11:42:00 +08:00
f7755aa538 [exec](set_operation) Support one child node in set operation (#22463)
Support one child node in set operation
2023-08-03 10:35:59 +08:00
9f0a9e6fd6 [bug](distinct-agg) fix limit value not effective in some case (#22517)
fix limit value not effective in some case
2023-08-03 10:35:36 +08:00
fb644ad691 [improvement](stats) Add more logs and config options (#22436)
1. Add more logs and make error messages clearer
2. Sleep for a while between analyze retries
3. Make the concurrency of sync analyze configurable
4. Ignore internal columns such as the delete sign to save resources
2023-08-03 09:55:29 +08:00
205a0793e9 [fix](regression) fix flaky test test_partial_update_schema_change (#22500)
* update

* update
2023-08-03 09:32:48 +08:00
17f4776b0f [typo](docs) fix get start zh doc (#22524) 2023-08-02 23:32:07 +08:00
5aeea985e6 [typo](docs) Replace invalid mysql-connector-java download package. (#21954) 2023-08-02 22:58:08 +08:00
c2db01037a [refactor](config) rename segcompaction_max_threads (#22468) 2023-08-02 22:35:14 +08:00
938f768aba [fix](parquet) resolve offset check failed in parquet map type (#22510)
Fix an error when reading empty map values in parquet. The `offsets.back()` does not equal the number of elements in the map's key column.

### How does this happen
A map in parquet is stored as a repeated group, and `repeated_parent_def_level` was set incorrectly when parsing the map node in the parquet schema.
```
the map definition in parquet:
 optional group <name> (MAP) {
   repeated group map (MAP_KEY_VALUE) {
     required <type> key;
     optional <type> value;
   }
}
```

### How to fix
Set the `repeated_parent_def_level` of the key/value nodes to the definition level of the map node.

`repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`. Empty array/map values are not stored in the Doris column, so we have to use `repeated_parent_def_level` to skip the empty or null values in the ancestor node.

For instance, consider an array of strings with 3 rows like the following:
`null, [], [a, b, c]`
We can store four elements in the data column: `null, a, b, c`
and the offsets column is: `1, 1, 4`
and the null map is: `1, 0, 0`
For the `i`-th row in the array column, the range from `offsets[i - 1]` to `offsets[i]` holds the elements of that row, so we can't store empty array/map values in the Doris data column. By comparison, Spark does not require `repeated_parent_def_level`, because the Spark column stores empty array/map values and uses another length column to indicate empty values. See: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java

Furthermore, we can also avoid storing null array/map values in the Doris data column. For the same three rows as above, we can store only three elements in the data column: `a, b, c`
and the offsets column is: `0, 0, 3`
and the null map is: `1, 0, 0`
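The two layouts described above can be reproduced with a small sketch. This is an illustrative Python model of the offsets/null-map encoding, not Doris code: `encode_arrays`, `decode_arrays`, and the `store_null_slot` flag are hypothetical names introduced here.

```python
def encode_arrays(rows, store_null_slot=True):
    """Flatten nullable arrays into (data, offsets, null_map).

    offsets[i] is the end position of row i in data, so row i spans
    data[offsets[i - 1]:offsets[i]] (with the start of row 0 being 0).
    """
    data, offsets, null_map = [], [], []
    for row in rows:
        if row is None:
            null_map.append(1)
            if store_null_slot:
                data.append(None)  # first layout: a null row occupies one slot
        else:
            null_map.append(0)
            data.extend(row)       # an empty array adds nothing to data
        offsets.append(len(data))
    return data, offsets, null_map


def decode_arrays(data, offsets, null_map):
    """Rebuild the rows; the null map decides between null and a data slice."""
    rows, start = [], 0
    for end, is_null in zip(offsets, null_map):
        rows.append(None if is_null else data[start:end])
        start = end
    return rows


rows = [None, [], ["a", "b", "c"]]
with_null_slot = encode_arrays(rows, store_null_slot=True)
without_null_slot = encode_arrays(rows, store_null_slot=False)
```

In both layouts the empty array contributes no element to the data column; it is visible only as two equal consecutive offsets.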
2023-08-02 22:33:10 +08:00
Pxl
3d0d7a427b [Chore](brpc) display pool name when try offer failed (#22514) 2023-08-02 22:31:33 +08:00
876bd1c747 [typo](Docs) Capitalize the Title of Files in Data Operation - Import Category (#22456) 2023-08-02 22:17:15 +08:00
bbbefc4b6f [typo](docs) Capitalize and Rename Title of Files in Data Operation-Export (#22457) 2023-08-02 21:56:38 +08:00
d5bf00583f [typo](docs) Capitalize and Rename Table Design Files (#22453) 2023-08-02 21:51:58 +08:00
76108bac2f [typo](docs) Capitalize and Rename Title of Install and Deployment Files (#22451) 2023-08-02 21:51:36 +08:00
a7a5e14d52 [typo](docs) Merge Doris Introductions File into Getting Started Category (#22449) 2023-08-02 21:50:55 +08:00
ec4fc1f9ef [typo](doc) fix be java env faq (#22462) 2023-08-02 21:50:33 +08:00
57e0fa448c [typo](docs) Change the jdk version on the macOS to 11 (#22522) 2023-08-02 21:47:14 +08:00
e5028314bc [Feature](Job)Support scheduler job (#21916) 2023-08-02 21:34:43 +08:00
6f575cf4b3 [typo](doc)Add a description of whether one of the dynamic partitioning parameters must be required. (#22422) 2023-08-02 21:28:18 +08:00
2ff4e9d79d [typo](doc)modify some sql syntax description errors (#22420) 2023-08-02 21:28:02 +08:00
498c0124e8 [typo](doc)modify some sql syntax and example description errors (#22460) 2023-08-02 21:27:34 +08:00
9d3f1dcf44 [improvement](vectorized) Deserialized elements of count distinct aggregation directly inserted into target hashset (#21888)
The original logic is to first deserialize the ColumnString into a HashSet (insert the deserialized elements into the hashset), and then traverse all the HashSet elements into the target HashSet during the merge phase.
After optimization, when deserializing, elements are directly inserted into the target HashSet, thereby reducing unnecessary hashset insert overhead.

In one of our internal query tests, 30 hashsets were merged in the second-phase aggregation (average cardinality 1,400,000), and the cardinality after merging was 42,000,000. After the optimization, MergeTime dropped from 5s965ms to 3s375ms.
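The saving comes from skipping the intermediate set. A minimal Python sketch of the two merge strategies, assuming a hypothetical comma-separated serialization (`deserialize`, `merge_via_temp_set`, and `merge_direct` are illustrative names, not the Doris functions):

```python
def deserialize(serialized):
    """Stand-in for decoding a serialized column of distinct values."""
    return serialized.split(",") if serialized else []


def merge_via_temp_set(target, serialized):
    """Original approach: fill a temporary set, then copy it into the target."""
    inserts = 0
    temp = set()
    for value in deserialize(serialized):
        temp.add(value)
        inserts += 1
    for value in temp:
        target.add(value)
        inserts += 1
    return inserts


def merge_direct(target, serialized):
    """Optimized approach: insert deserialized values straight into the target."""
    inserts = 0
    for value in deserialize(serialized):
        target.add(value)
        inserts += 1
    return inserts


parts = ["a,b,c", "b,c,d", "d,e"]
old_target, new_target = set(), set()
old_inserts = sum(merge_via_temp_set(old_target, p) for p in parts)
new_inserts = sum(merge_direct(new_target, p) for p in parts)
```

Both strategies produce the same merged set; the direct version performs half as many set insertions in this toy input.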
2023-08-02 21:19:56 +08:00
781c1d5238 [log](load) add debug logs for potential duplicate tablet ids (#22485) 2023-08-02 20:38:41 +08:00
3a787b6684 [improvement](regression) syncer regression test (#22490) 2023-08-02 20:09:27 +08:00
8cac8df40c [Fix](Planner) fix create view tosql not include partition (#22482)
Problem:
When creating a view with a join over table partitions, an error such as "Unknown column" is raised.

Example:
CREATE VIEW my_view AS SELECT t1.* FROM t1 PARTITION(p1) JOIN t2 PARTITION(p2) ON t1.k1 = t2.k1;
select * from my_view ==> errCode = 2, detailMessage = Unknown column 'k1' in 't2'

Reason:
When creating a view, we run tosql first in order to persist the view SQL. When running tosql on a table reference, the PARTITION keyword was removed to keep the SQL string neat. But once the keyword is removed, what remains is regarded as an alias.
So the "PARTITION" keyword cannot be removed.

Solved:
Add the "PARTITION" keyword back to the tosql string.
2023-08-02 20:04:59 +08:00
4d9f4c7a68 [typo(docs) Capitalize Title of Files in Data Operation - Update and Delete(#22459) 2023-08-02 19:16:11 +08:00
0cd5183556 [Refactor](inverted index) refact tokenize function for inverted index (#22313) 2023-08-02 19:12:22 +08:00
4bc65aa921 [fix](load) PrefetchBufferedReader Crashing caused updating counter with an invalid runtime profile (#22464) 2023-08-02 18:19:48 +08:00
Pxl
751a7680c5 [Bug](exchange) fix core dump on send_local_block (#22494)
fix core dump on send_local_block
2023-08-02 18:12:34 +08:00
527782f3d3 [fix](nereids)move RecomputeLogicalPropertiesProcessor rule before topn optimization (#22488)
The topn optimization changes MutableState, so the RecomputeLogicalPropertiesProcessor rule needs to move before it.
2023-08-02 17:36:56 +08:00
a4ef340777 [test](pipline) adjust mem limit to 90 & exclude some cases (#22445)
adjust mem limit to 90 & exclude some cases
2023-08-02 15:11:22 +08:00
ddd90855a9 [vectorized](udaf) java udaf support with map type (#22397)
* test
* remove some unused
* update
* add case
2023-08-02 15:03:44 +08:00
16461fdc1c [feature](Nereids): pushdown COUNT through join (#22455) 2023-08-02 14:55:25 +08:00
18692b2a7c fixed (#22481)
[FIX](array) fix array-dcheck-contains_null
2023-08-02 14:22:16 +08:00
e2ed2e99e2 exclude workload group test default (#22483) 2023-08-02 12:45:08 +08:00
e991f607d5 [fix](string-column) fix unescape length error (#22411) 2023-08-02 12:18:05 +08:00
Pxl
f5e3cd2737 [Improvement](aggregation) optimization for aggregation hash_table_lazy_emplace (#22327)
optimization for aggregation hash_table_lazy_emplace
2023-08-02 11:50:21 +08:00
41f984bb39 [fix](fe) Fix stmt forward #22469
The call to String.format() contains an orphan %s that will cause the following error.
Introduced by #21205
2023-08-02 10:34:04 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
19d1f49fbe [improvement](compaction) compaction policy and options in the properties of a table (#22461) 2023-08-01 22:02:23 +08:00