Commit Graph

10468 Commits

Author SHA1 Message Date
ac9e92e1aa [typo](docs) Optimize mac compilation documentation (#19629) 2023-05-15 20:34:47 +08:00
0a28959675 [config](mem) change default mem_limit from 90% to 80% (#19602)
With the default of 90%, the BE may run out of memory (OOM) under heavy load.
With 80%, the BE works well under the same load in my cluster.
2023-05-15 17:48:43 +08:00
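The `mem_limit` value is a percentage that the BE resolves against the machine's memory. As an illustration only, here is a minimal C++ sketch of turning such a setting into a byte budget. `parse_mem_limit` is a hypothetical helper, not Doris's actual parser. It assumes the percentage applies to physical memory and accepts only whole-number percentages or a plain byte count.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not Doris's parser): resolve "80%" or "8589934592" to bytes.
// Assumption: a percentage is taken of physical memory and must be a whole number.
std::optional<int64_t> parse_mem_limit(const std::string& value, int64_t physical_mem_bytes) {
    if (value.empty()) return std::nullopt;
    const bool is_percent = value.back() == '%';
    const std::string digits = is_percent ? value.substr(0, value.size() - 1) : value;
    // At most 18 digits, so std::stoll below cannot overflow.
    if (digits.empty() || digits.size() > 18) return std::nullopt;
    for (char c : digits) {
        if (c < '0' || c > '9') return std::nullopt;  // reject signs, spaces, unit suffixes
    }
    const int64_t n = std::stoll(digits);
    if (n <= 0) return std::nullopt;
    if (!is_percent) return n;                        // absolute byte count
    if (n > 100) return std::nullopt;
    // floor(physical * n / 100) computed without overflowing int64.
    return physical_mem_bytes / 100 * n + physical_mem_bytes % 100 * n / 100;
}
```

For example, with 64 GiB of physical memory (68719476736 bytes), `"80%"` resolves to 54975581388 bytes, about 6.4 GiB less than the old 90% default leaves to the BE.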
fad9237d30 [fix](storage) consider file size on page cache key (#19619)
The core dump is caused by a failed DCHECK:

F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read
We eventually found that the DCHECK failure was caused by the page cache:

1. At first we have 20 segments, with ids 0-19.
2. For a MoW table, the memtable flush process calculates the delete bitmap. During this procedure, the index pages and data pages of the PrimaryKeyIndex are loaded into the page cache.
3. Segment compaction compacts these segments into 2 segments and renames them to ids 0 and 1.
4. Finally, before the load commits, we calculate the delete bitmap between the segments in the current rowset. This procedure needs to iterate over the primary key index of each segment, but because the compacted segments reuse the old file names, accessing the new segments returns stale data of the old segments from the page cache.
The ideal fix would be one of:

1. Add a CRC32 checksum or the last-modified time to the CacheKey.
2. Invalidate the related cache keys after segment compaction.

For policy 1, the segment footer has no CRC32, and getting the last-modified time requires one additional disk IO.
For policy 2, we would need to add extra page cache invalidation methods, which may make the page cache less stable.

So I think we can simply add the file size to the cache key to detect that the file has changed.
In an LSM-Tree, every modification generates new files and file-name reuse is not the normal case (as far as I know, only segment compaction does it), so the file size is enough to identify a file change.
2023-05-15 17:16:31 +08:00
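The fix amounts to widening the page cache key. A hedged C++ sketch, assuming a key made of file name, file size, and page offset; `PageCacheKey`, `PageCacheKeyHash`, and `PageCache` are hypothetical names, not Doris's actual page cache types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key: without fsize, a rewritten file reusing the same name
// would hit the old file's cached pages.
struct PageCacheKey {
    std::string fname;
    int64_t fsize;    // distinguishes a reused file name whose content changed
    uint64_t offset;  // page offset inside the file

    bool operator==(const PageCacheKey& o) const {
        return fname == o.fname && fsize == o.fsize && offset == o.offset;
    }
};

struct PageCacheKeyHash {
    std::size_t operator()(const PageCacheKey& k) const {
        std::size_t h = std::hash<std::string>{}(k.fname);
        // boost-style hash_combine for the remaining fields
        h ^= std::hash<int64_t>{}(k.fsize) + static_cast<std::size_t>(0x9e3779b97f4a7c15ULL) + (h << 6) + (h >> 2);
        h ^= std::hash<uint64_t>{}(k.offset) + static_cast<std::size_t>(0x9e3779b97f4a7c15ULL) + (h << 6) + (h >> 2);
        return h;
    }
};

// Maps a page key to its cached bytes (a string stands in for a page buffer).
using PageCache = std::unordered_map<PageCacheKey, std::string, PageCacheKeyHash>;
```

A page cached for the old segment file is no longer returned once the same file name reappears with a different size. The key can still collide if a rewritten file happens to have exactly the same size, which is the trade-off the commit accepts given how rarely file names are reused.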
c87e78dc35 [bug](jsonb) fix jsonb query bug when the JSON key contains "." (#19185)
Issue Number: close #19173

mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1');
+-------------------------------------------------------------------------------------------+
| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') |
+-------------------------------------------------------------------------------------------+
| "v31" |
+-------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
2023-05-15 15:43:12 +08:00
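The query above depends on the path parser treating a double-quoted member name as one key even when it contains `.`. A minimal C++ sketch of that rule; `split_json_path` is hypothetical, handles only `$` followed by `.key` or `."quoted key"` members, and ignores array subscripts, wildcards, and escape sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: split $."a.b.c".k1 into {"a.b.c", "k1"}.
// Returns nullopt for paths this simplified grammar does not accept.
std::optional<std::vector<std::string>> split_json_path(const std::string& path) {
    if (path.empty() || path[0] != '$') return std::nullopt;
    std::vector<std::string> keys;
    std::size_t i = 1;
    while (i < path.size()) {
        if (path[i] != '.') return std::nullopt;  // every member starts with '.'
        ++i;
        if (i < path.size() && path[i] == '"') {
            // Quoted key: take everything up to the closing quote, dots included.
            std::size_t close = path.find('"', i + 1);
            if (close == std::string::npos) return std::nullopt;  // unterminated quote
            keys.push_back(path.substr(i + 1, close - i - 1));
            i = close + 1;
        } else {
            // Unquoted key: runs until the next '.' or the end of the path.
            std::size_t end = path.find('.', i);
            if (end == std::string::npos) end = path.size();
            if (end == i) return std::nullopt;  // empty key, e.g. "$..k"
            keys.push_back(path.substr(i, end - i));
            i = end;
        }
    }
    return keys;
}
```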
052c7cff89 [Fix](Planner) fix cast from decimal to boolean (#19585) 2023-05-15 15:13:16 +08:00
Pxl
2a02561863 [Bug](ubsan) fix some wrong downcasts found by UBSan (#19591)
Fix some wrong downcasts found by UBSan:
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
 e5 55 00 00  10 74 58 42 e5 55 00 00  00 00 10 00 8e 7f 00 00  20 07 6f cc 8e 7f 00 00  80 fe 68 cc
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'  
```
1. TYPE_DATE and TYPE_DATETIME have the same data format, so I changed the bloom filter cast to reinterpret_cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
 74 65 00 00  20 91 70 f5 ca 55 00 00  02 00 00 00 00 00 00 00  f0 d4 4c 2f 56 7f 00 00  f0 d4 4c 2f
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. Doris uses ColumnDecimal, not ColumnVector, to store decimal elements, so the cast to ColumnVector<int> in the ORC reader was wrong.
2023-05-15 14:27:48 +08:00
69243b3a57 [fix](Nereids): SemiJoinLogicalJoinTranspose shouldn't throw an error when eliminating the outer join fails. (#19566) 2023-05-15 12:31:54 +08:00
Pxl
4eb2604789 [Bug](function) fix inconsistent function definition of Retention and change some static_cast to assert_cast (#19455)
1. Fix the inconsistent definition of `Retention`: this function returns tinyint on the `FE` but uint8 on the `BE`.
2. Make assert_cast support casting to a derived type.
3. Change some static_cast calls to assert_cast.
4. Support sum(bool)/avg(bool).
2023-05-15 11:50:02 +08:00
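A checked downcast of this kind can be sketched as follows. `checked_downcast` is hypothetical and is not Doris's `assert_cast`: with `NDEBUG` undefined it verifies the cast with `dynamic_cast`, so casting to a derived type is accepted whenever the object really has that type, and in release builds it falls back to a plain `static_cast`. `Column`, `ColumnDecimalSketch`, and `ColumnVectorSketch` are stand-ins loosely modeled on the ORC reader case in the UBSan report.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Illustrative stand-ins for a polymorphic column hierarchy.
struct Column {
    virtual ~Column() = default;
};
struct ColumnDecimalSketch : Column {};
struct ColumnVectorSketch : Column {};

// Checked pointer downcast: verified at run time when NDEBUG is not defined,
// unchecked static_cast otherwise.
template <typename To, typename From>
To checked_downcast(From* from) {
    static_assert(std::is_pointer_v<To>, "To must be a pointer type");
#ifndef NDEBUG
    To result = dynamic_cast<To>(from);
    if (from != nullptr && result == nullptr) {
        throw std::logic_error("checked_downcast: object has a different dynamic type");
    }
    return result;
#else
    return static_cast<To>(from);
#endif
}
```

The debug-only check keeps release builds as cheap as `static_cast`, while test and sanitizer builds surface the kind of mismatch UBSan reported, such as treating a ColumnDecimal as a ColumnVector.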
5df5c77d39 [fix](Nereids) should not colocate agg when scan data partition is random (#19598) 2023-05-15 11:22:41 +08:00
6748ae4a57 [Feature] Collect query hit statistics (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the materialized views (MVs) in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the detailed query hit statistics for all the MVs in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit statistics for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
    1 row in set (0.005 sec)
   ```
2023-05-15 10:56:34 +08:00
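The counters shown above behave like per-column tallies: the `SELECT` reads k0, k1, k2, k3 and filters on k9, so each of those columns gets QueryCount 1 and only k9 gets FilterCount 1. A minimal C++ sketch of that bookkeeping, assuming a filtered column also counts as queried. `QueryStatsSketch` and `ColumnHitStats` are hypothetical and say nothing about how Doris actually collects or persists these statistics.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical counters mirroring the QueryCount / FilterCount columns.
struct ColumnHitStats {
    int64_t query_count = 0;
    int64_t filter_count = 0;
};

class QueryStatsSketch {
public:
    // Record one query against the table. Columns used in predicates are
    // also treated as queried (assumption matching the example output).
    void record(const std::set<std::string>& referenced, const std::set<std::string>& filtered) {
        std::set<std::string> queried = referenced;
        queried.insert(filtered.begin(), filtered.end());
        for (const auto& col : queried) ++stats_[col].query_count;
        for (const auto& col : filtered) ++stats_[col].filter_count;
        ++table_query_count_;
    }

    // Columns never seen report zero counts.
    ColumnHitStats get(const std::string& col) const {
        auto it = stats_.find(col);
        return it == stats_.end() ? ColumnHitStats{} : it->second;
    }

    int64_t table_query_count() const { return table_query_count_; }

private:
    std::map<std::string, ColumnHitStats> stats_;
    int64_t table_query_count_ = 0;
};
```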
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
554b89183b [community](collaborator) remove inactive collaborator (#19627) 2023-05-15 09:49:28 +08:00
91d5e956a0 [typo](doc) Fixed typos in cluster-action.md (#19549) 2023-05-14 23:52:41 +08:00
80886af828 [doc](grant)add the version for grant for user; (#19556) 2023-05-14 23:52:18 +08:00
859b203b1d [typo](doc) Fixed typos in query-profile-action.md (#19552) 2023-05-14 23:51:58 +08:00
2b402483a9 add release shade and sdk doc (#19576) 2023-05-14 23:51:17 +08:00
f4aea2a6db [Doc](binlog-load) delete binlog-load doc side bar (#19593) 2023-05-14 23:50:55 +08:00
0617c7e56b [enhance](Cold&Heat separation) use file block cache for cold heat separation rowset (#19410)
For performance reasons, rowsets of cold/hot separation tables always use the file block cache, regardless of the user's config.
I tested this with cold_heat_seperation_case_p2 and it works well.
2023-05-14 22:06:26 +08:00
be0f4abc71 [doc](doris-future)Add doc for doris future (#19617) 2023-05-14 20:22:05 +08:00
0068828a94 [Feature](insert) support insert overwrite stmt (#19616) 2023-05-14 20:01:30 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
91cdb79d89 [Bugfix](Outfile) fix exporting data to Parquet and ORC file formats (#19436)
1. Support exporting the `LARGEINT` data type to the Parquet/ORC file formats.
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the Parquet file format.
3. Fix incorrect data when DATE values are exported to ORC.
2023-05-13 22:39:24 +08:00
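In Parquet, the `DATE` logical type annotates an INT32 holding days since the Unix epoch (1970-01-01), and `TIMESTAMP` annotates an INT64 counting milli-, micro-, or nanoseconds since the epoch. A C++ sketch of the value conversion, using Howard Hinnant's `days_from_civil` algorithm. Which timestamp unit Doris writes, and how it treats time zones, is not stated in the commit, so `micros_from_civil` (a hypothetical name) assumes microseconds in UTC.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's
// days_from_civil). This is the value a Parquet DATE column stores as INT32.
int32_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;                                                          // years start in March
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return static_cast<int32_t>(era * 146097 + static_cast<int64_t>(doe) - 719468);
}

// Assumption: DATETIME is interpreted as UTC and written as TIMESTAMP(MICROS).
int64_t micros_from_civil(int64_t y, unsigned m, unsigned d,
                          unsigned hh, unsigned mm, unsigned ss) {
    const int64_t days = days_from_civil(y, m, d);
    const int64_t secs = ((days * 24 + hh) * 60 + mm) * 60 + ss;
    return secs * 1000000;
}
```

For example, 2000-03-01 maps to day 11017 and 1969-12-31 to day -1, so dates before the epoch are negative INT32 values.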
e98f4c4a5e [fix](be) BE UT built against Clang-16 failed (#19610)
If we use Clang-16 to build the third-party libraries and then build doris_be_test against them, doris_be_test fails to run because of BRPC errors.

I tested this on Linux (x86_64) and macOS (x86_64/arm64), and these errors always occurred.
2023-05-13 22:32:29 +08:00
38294b98db Fix comparator of ResouceGroupSet (#19523) 2023-05-13 09:17:16 +08:00
86ba0ebf42 [fix](mow) revert 17147 and 18750 (#19583) 2023-05-13 08:43:36 +08:00
cd9d633c1b [doc](multi-catalog)add properties converter docs (#18287)
update doc for #18005
2023-05-12 21:03:30 +08:00
cb943ae7ca [pipeline](bug) DCHECK may fail in pipeline sender queue (#19545)
2023-05-12 20:39:18 +08:00
26d1eb64d2 [Doc](statistics) add statistics documents (#19323)
The stats feature will continue to be refined, and the documentation will change over time.
2023-05-12 20:11:29 +08:00
03d774d0af [fix](inverted index) fix query failure caused by FullTextIndexReader not checking whether the index file exists 2023-05-12 20:00:10 +08:00
316223ef34 [fix](planner) forbid queries in insert value list (#19493) 2023-05-12 19:46:19 +08:00
4142cc0e8c [fix](merge conflict) fix FE compile error (#19586) 2023-05-12 18:18:22 +08:00
c37d781942 [enhancement](statistics) manually inject table level statistics (#19495)
Allow users to manually inject table-level statistics.

Supported table stats types:
- row_count

Modify table or partition statistics:
```SQL
ALTER TABLE table_name SET STATS ('k1' = 'v1', ...) 
```

TODO:
- support other table stats type if necessary
- update statistics cache if necessary
2023-05-12 17:03:12 +08:00
26a7f86b66 [improvement](auth)only GRANT_PRIV and USAGE_PRIV can GRANT for RESOURCE (#19547)
2023-05-12 15:47:04 +08:00
26e930eed1 [Fix](multi-catalog) Make BE selection policy work correctly when prefer_compute_node_for_external_table is enabled (#19346) 2023-05-12 15:32:50 +08:00
860ce97622 [feature](torc) support insert only transactional hive table on FE side (#19419)
2023-05-12 15:32:26 +08:00
feef5afa0b [typo](doc) Fixed typos in SHOW-ROUTINE-LOAD.md (#19573) 2023-05-12 14:37:28 +08:00
a1da57c63e [opt](Nereids)(WIP) optimize agg and window normalization step 2 #19305
1. refactor aggregate normalization to avoid data amplification before aggregate
2. remove useless aggregate processing in ExtractAndNormalizeWindowExpression
3. only push distinct aggregate function children

TODO:
1. push down redundant expression in aggregate functions
2. refactor normalize repeat rule
3. move expression normalization and optimization after plan normalization to avoid unexpected expression optimization.
2023-05-12 14:00:13 +08:00
0477a9f5de [fix](dateformat) Fix hour date format (#19569)
Introduced by #19265.
The hour format should support both "5" and "05".
2023-05-12 13:38:41 +08:00
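A hedged C++ sketch of the intended behavior: an hour field that accepts either one digit (`5`) or two (`05`). The actual fix lives in the FE's Java date formatting code introduced by #19265; `parse_hour` and `ParsedHour` are hypothetical and only illustrate the parsing rule, including the assumption that a second digit is consumed greedily when present.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Result of parsing an hour field: the value and how many characters it used.
struct ParsedHour {
    int hour;
    std::size_t consumed;
};

// Accepts "5" and "05" alike; rejects non-digits and hours above 23.
std::optional<ParsedHour> parse_hour(std::string_view s) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    if (s.empty() || !is_digit(s[0])) return std::nullopt;
    int value = s[0] - '0';
    std::size_t used = 1;
    if (s.size() > 1 && is_digit(s[1])) {  // greedily take a second digit
        value = value * 10 + (s[1] - '0');
        used = 2;
    }
    if (value > 23) return std::nullopt;
    return ParsedHour{value, used};
}
```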
56a6431b55 [fix](pipeline) fix query returns empty result instead of an error occasionally after being cancelled (#19561) 2023-05-12 12:40:41 +08:00
56bc8a762d [decimalv3](literal) use decimalv3 literal if enable_decimal_conversion is true (#19559) 2023-05-12 12:01:54 +08:00
9bf6ecca48 [minor](log) change debug log to info to observe the storage medium change #19529
When the user sets default_storage_medium to true, the storage medium of all partitions should be SSD,
and the cooldown time should be 9999-12-31 23:59:59,
so that it won't change to HDD.

However, it sometimes still changes to HDD,
so I changed the debug log to info level to observe it.
2023-05-12 11:02:55 +08:00
8ef9212ddc [enhancement](exceptionsafe) force check exec node method's return value (#19538) 2023-05-12 10:21:00 +08:00
157ec5757a [fix](s3FileWriter) don't use bthread countdown event to sync #19534
Unfortunately, BthreadCountDownEvent cannot serve as a synchronization primitive in this scenario, where all workers are pthreads. BthreadCountDownEvent::time_wait is meant for bthreads, so using it here leads to confusing synchronization problems such as heap-use-after-free.
2023-05-12 09:19:57 +08:00
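For plain pthread workers, a countdown event can be built from a mutex and a condition variable; C++20's `std::latch` offers the same count-down-then-wait semantics. The class below is a minimal sketch, not the code this commit switched to. It calls `notify_all` while still holding the mutex, a common way to keep the signaling thread from touching the condition variable after a waiter has returned and possibly destroyed the event.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal countdown event for std::thread / pthread workers (sketch).
class CountDownEvent {
public:
    explicit CountDownEvent(int count) : count_(count) {}

    // Called once by each worker when it finishes.
    void signal() {
        std::lock_guard<std::mutex> lock(mu_);
        if (count_ > 0 && --count_ == 0) cv_.notify_all();  // notify under the lock
    }

    // Blocks the caller until every expected signal has arrived.
    void wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    int count_;
};
```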
bd6a36091e [chore](cmake) fix DORIS_JAVA_HOME from JAVA_HOME (#19521) 2023-05-12 09:12:38 +08:00
1296a920c2 [chore](collaborator) add several collaborators to manage issue (#19550) 2023-05-12 09:09:52 +08:00
868bae47f6 [improvement](docker) update compilation Dockerfile (#19563) 2023-05-12 09:06:45 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1. Some encrypt and decrypt functions had the wrong blockEncryptionMode.
2. The topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id.
3. The limit must be kept if it is an uncorrelated IN subquery with a limit on sort, like `select a from t1 where a in ( select b from t2 order by xx limit yy )`.
2023-05-12 09:06:16 +08:00
a041f8eabe [fix](fe) Fix SimpleDateFormat thread-safety issue by replacing it with DateTimeFormatter. (#19265)
Replace SimpleDateFormat with DateTimeFormatter in the FE module because SimpleDateFormat is not thread-safe.
2023-05-11 22:50:24 +08:00
d58498841a [fix](Nereids) Should copy JoinReorderContext for PushdownProject (#19508)
1. should copy JoinReorderContext
2. verify bushy tree join reorder
2023-05-11 21:05:12 +08:00
9568de303a [Chore](build) update clang-format version check (#19542)
2023-05-11 19:38:58 +08:00