Commit Graph

54 Commits

Author SHA1 Message Date
4f9562650d Branch-2.1 [Fix](Variant) fix variant serialize to string (#47121) (#47147)
cherry-pick from #47121
2025-01-18 09:12:39 +08:00
ca5a4f6d35 [Fix](Variant) variant should not implicit be short key column when create mv (#46539)
cherry-pick from #46444
2025-01-09 08:15:22 +08:00
db224ba15f [fix](variant) fix schema change for variant from not null to null (#46403)
cherry-pick from #46279
2025-01-04 09:00:43 +08:00
82c7a9d15a [Fix](Variant) create table should not automatically add variant to key (#44736)
#36609
2024-11-29 09:34:43 +08:00
6dddd4c499 [function](cast)Make string casting to integers more like MySQL's behavior (#38847) (#41541)
https://github.com/apache/doris/pull/38847
## Proposed changes

There are two issues here. First, the results of casting are
inconsistent between FE and BE.
```
FE
mysql [(none)]>select cast('3.000' as int); 
+----------------------+
| cast('3.000' as INT) |
+----------------------+
|                    3 |
+----------------------+

mysql [(none)]>set debug_skip_fold_constant = true;

BE
mysql [(none)]>select cast('3.000' as int);
+----------------------+
| cast('3.000' as INT) |
+----------------------+
|                 NULL |
+----------------------+
```
The second issue is that casting on BE converts '3.0' to NULL. This
change unifies the casting logic of FE and BE.
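
To illustrate the intended behavior, here is a minimal Python sketch of a lenient string-to-integer cast in which '3.000' yields 3 rather than NULL. The helper name `cast_string_to_int` is hypothetical, and truncating the fractional part is an assumption; the commit message only shows the '3.000' case, and this is not the Doris implementation.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def cast_string_to_int(text: str) -> Optional[int]:
    """Hypothetical helper: lenient string-to-int cast; None stands in for SQL NULL."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        # Not a numeric string at all.
        return None
    if not value.is_finite():
        # Reject NaN and Infinity.
        return None
    # Assumption: a fractional part is truncated toward zero.
    # A real engine would also check the target integer range.
    return int(value)
```

With this sketch, both '3.000' and '3.0' map to 3, while a non-numeric string such as 'abc' maps to None.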

Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
2024-10-11 09:32:00 +08:00
d659750fd9 [pick](Serde-2.1) fix variant serde may lost num_rows when subcolumns empty (#41438)
Serializing an object with empty subcolumns may lose num_rows, so
record num_rows when serializing and set it back when deserializing.
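
The idea can be sketched in Python as follows. `ObjectColumn`, `serialize`, and `deserialize` are hypothetical stand-ins, and JSON is used only for brevity; the actual serde format in Doris is different.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectColumn:
    """Toy stand-in for a variant column: named subcolumns plus a row count."""
    num_rows: int = 0
    subcolumns: Dict[str, List[object]] = field(default_factory=dict)


def serialize(col: ObjectColumn) -> str:
    # Record num_rows explicitly: it cannot be inferred when subcolumns is empty.
    return json.dumps({"num_rows": col.num_rows, "subcolumns": col.subcolumns})


def deserialize(payload: str) -> ObjectColumn:
    data = json.loads(payload)
    # Restore num_rows from the recorded value, not from subcolumn lengths.
    return ObjectColumn(num_rows=data["num_rows"], subcolumns=data["subcolumns"])
```

A column with three rows and no subcolumns keeps `num_rows == 3` after a round trip through this sketch.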

backport #38413
2024-09-29 09:45:37 +08:00
f69063ea87 [Fix](Variant) use uinque id to access column reader (#39841) (#40269)
#39841
#40295
2024-09-09 18:01:12 +08:00
c8d3202595 [regression-test](cases) optimize some cases (#40240)
#40174
2024-09-02 14:50:48 +08:00
a6f267c479 [pick](Variant) fix element_at should return nullable if result type is nullable (#39846)
#39732
2024-08-24 09:22:03 +08:00
3103bb08dc [pick](Variant) casting to decimal type may lost precision (#39843)
#39650
2024-08-23 22:47:32 +08:00
824f035b98 [pick](Row store) fix row store with invalid json string in variant ty… (#39456)
#39394
2024-08-16 14:43:11 +08:00
3c535e80dd [fix](compatibility) type toSql should return lowercase string (#38012) (#38517)
pick from master #38012

revert #25951
2024-08-09 11:35:42 +08:00
aa9bdd76d0 [Pick](Variant) pick some fix #38413 #38364 (#38512) 2024-07-31 11:03:31 +08:00
73fc55b203 [Pick](Variant) fix some issue by RQG (#38336)
#38318 
#38291
2024-07-25 12:19:07 +08:00
8c6ff22e04 [Pick](Variant) fix heap use after free and optimize cases #37991 #37976 (#38037) 2024-07-18 16:53:09 +08:00
b15ccdbe98 [Pick](Variant) pick some fix (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
217eac790b [pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526) 2024-07-11 21:25:34 +08:00
e1cb568d11 [Optimize] Add session variable `max_fetch_remote_schema_tablet_count… (#37505)
pick from #37217
2024-07-11 10:04:20 +08:00
d0c0a7b9ae [Fix](variant) ignore serialization of nothing type (#37006)
picked from #36997
2024-06-28 18:41:40 +08:00
a05d5cc75e [refactor](variant) refactor sub path push down on variant type (#36478) (#36923)
pick from master #36478

Introduce a new rule, VARIANT_SUB_PATH_PRUNING, to prune variant sub paths.

For example, if variant slot v in table t has two sub paths, 'c1' and 'c2',
then after this rule `select v['c1'] from t` scans only sub path 'c1'
of v, which reduces scan time.

This rule accomplishes all the work using two components. The Collector
traverses from the top down, collecting all the element_at functions on
the variant types, and recording the required path from the original
variant slot to the current element_at. The Replacer traverses from the
bottom up, generating the slots for the required sub path on scan,
union, and cte consumer. Then, it replaces the element_at with the
corresponding slot.
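
A toy Python sketch of the Collector/Replacer split described above. The tuple-based expression nodes and the function names are hypothetical; the real rule operates on Nereids plan nodes and also handles scan, union, and CTE consumer, which this sketch omits.

```python
# Toy expression nodes (hypothetical):
#   ("slot", name), ("element_at", child, key), ("call", func_name, arg1, ...)


def path_of(expr):
    """Return (slot_name, path) if expr is an element_at chain over a slot, else None."""
    keys = []
    while expr[0] == "element_at":
        keys.append(expr[2])
        expr = expr[1]
    if expr[0] == "slot" and keys:
        return expr[1], tuple(reversed(keys))
    return None


def collect(expr, required):
    """Collector: top-down walk that records the sub path required by each access."""
    found = path_of(expr)
    if found:
        required.setdefault(found[0], set()).add(found[1])
        return
    if expr[0] == "call":
        for arg in expr[2:]:
            collect(arg, required)
    elif expr[0] == "element_at":
        collect(expr[1], required)


def replace(expr):
    """Replacer: bottom-up rewrite of each access into a dedicated sub-path slot."""
    if expr[0] == "call":
        expr = expr[:2] + tuple(replace(arg) for arg in expr[2:])
    found = path_of(expr)
    return ("subslot", found[0], found[1]) if found else expr
```

For `cast(v['c1'] as int)`, the collector records that only path `('c1',)` of `v` is required, and the replacer swaps the `element_at` call for a sub-path slot that a scan could read directly.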
2024-06-27 17:48:43 +08:00
5b7d93df5e [Pick](Variant) pick 2 PRs to correct tmp column name to go fast execute #36277 #36313 (#36527) 2024-06-19 19:07:47 +08:00
3cd7b88868 [Fix](Variant) fix variant with empty key (#35671)
In some scenarios, an empty key causes a crash like:

```
*** tablet *** SIGSEGV unknown detail explain (@0x0) received by PID 1527747 (TID 1544788 OR 0x7f3302988700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/lihangyu/doris/be/src/common/signal_handler.h:429
 1# 0x00007F4880A12B50 in /lib64/libc.so.6
 2# doris::vectorized::PathInDataBuilder::append(std::basic_string_view<char, std::char_traits<char> >, bool) at /mnt/disk2/lihangyu/doris/be/src/vec/json/path_in_data.cpp:193
 3# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::traverseObject(doris::vectorized::SimdJSONParser::Object const&, doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:121
 4# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::traverse(doris::vectorized::SimdJSONParser::Element const&, doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:95
 5# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::parse(char const*, unsigned long) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:81
```

2024-05-30 19:55:25 +08:00
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
8f6f4cf0eb [Pick](Variant) pick #33734 #33766 #33707 to branch-2.1 (#33848)
* [Fix](Variant Type) forbit distribution info contains variant columns (#33707)

* [Fix](Variant) VariantRootColumnIterator::read_by_rowids with wrong null map size (#33734)

insert_range_from should start from `size` with `count` elements for null map

* [Fix](Variant) check column index validation for extracted columns (#33766)
2024-04-18 19:42:44 +08:00
04e30c91a0 [Fix](Variant) VariantRootColumnIterator::read_by_rowids with wrong null map size (#33734)
insert_range_from should start from `size` with `count` elements for null map
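
Read literally, the note says the null map should be appended with the range that starts at offset `size` and spans `count` elements. A small Python sketch of that reading follows; the function below is a hypothetical list-based analogue of a column `insert_range_from`, and the interpretation of `size` as the number of rows already present is an assumption.

```python
from typing import List


def insert_range_from(dst: List[int], src: List[int], start: int, length: int) -> None:
    """Append src[start : start + length] to dst, checking bounds first."""
    if start < 0 or length < 0 or start + length > len(src):
        raise IndexError("range out of bounds for source null map")
    dst.extend(src[start:start + length])


# Assumed scenario: dst already holds `size` rows, and `count` new rows were read.
src_null_map = [0, 1, 0, 0, 1]
dst_null_map = [0, 1]
size, count = 2, 3
insert_range_from(dst_null_map, src_null_map, size, count)
```

After the call, the destination null map has `size + count` entries, one per row.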
2024-04-18 19:02:58 +08:00
81f7c53bad [fix](Nereids) could not query variant that not from table (#33704) 2024-04-17 23:42:13 +08:00
249a9c9875 [Feature](Variant) support aggregation model for Variant type (#33493)
Refactor: use `insert_from` instead of `replace_column_data` for variable-length columns.
2024-04-17 23:42:00 +08:00
09fb30c989 (Chore)[regression-test] fix unstable output variant case (#33520) 2024-04-17 23:41:59 +08:00
3a196c8b0f [Pick](Variant) pick 2 prs about bugfix of variant (#33011)
* [Fix](Variant) forbit table with variant type doing segment compaction temporarily

TODO: fix this correctly in later work

* [Bug](Variant) use lower case name for variant's root, since backend treat parent column as lower case

This PR addresses the problem shown below:
```
errCode = 2, detailMessage = (172.16.56.137)[CANCELLED]failed to initialize storage reader. tablet=17136, res=[INTERNAL_ERROR]Not found field_name, field_name:Tags.tag_key1, schema:[Thread(8), Tags(9), Source(5), tags.tag_key1(-1), Title(6), Level(3), Time(2), CreateDate(1), Message(7), IP(4), AppId(0)]

```
2024-03-29 11:12:28 +08:00
f443d6de85 [Fix](variant) filter with variant access may lead to to parition/tablet prune fall through (#32560)
A query like `select * from ut_p partitions(p2) where cast(var['a'] as int) > 0` skips partition/tablet pruning because its plan looks like this:
```
mysql> explain analyzed plan select * from ut_p where id = 3 and cast(var['a'] as int) = 789;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalResultSink[26] ( outputExprs=[id#0, var#1] )                                                                                                                        |
| +--LogicalProject[25] ( distinct=false, projects=[id#0, var#1], excepts=[] )                                                                                               |
|    +--LogicalFilter[24] ( predicates=((cast(var#4 as INT) = 789) AND (id#0 = 3)) )                                                                                         |
|       +--LogicalFilter[23] ( predicates=(0 = __DORIS_DELETE_SIGN__#2) )                                                                                                    |
|          +--LogicalProject[22] ( distinct=false, projects=[id#0, var#1, __DORIS_DELETE_SIGN__#2, __DORIS_VERSION_COL__#3, element_at(var#1, 'a') AS `var`#4], excepts=[] ) |
|             +--LogicalOlapScan ( qualified=regression_test_variant_p0.ut_p, indexName=<index_not_selected>, selectedIndexId=10145, preAgg=ON )                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
```
The plan has an extra LogicalProject on top of LogicalOlapScan, so this case must be handled in order to prune partitions/tablets.
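
One way to picture the fix is a matcher that looks through intermediate nodes between the filter and the scan. The Python sketch below uses hypothetical `Scan`, `Project`, and `Filter` classes, not the Nereids classes. A real rule would also have to verify that the predicate only refers to columns of the scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scan:
    table: str


@dataclass
class Project:
    child: object


@dataclass
class Filter:
    predicate: str
    child: object


def scan_under_filter(node: object) -> Optional[Scan]:
    """Return the scan a Filter applies to, looking through Project/Filter nodes."""
    if not isinstance(node, Filter):
        return None
    child = node.child
    # Skip nodes such as the extra Project instead of requiring Filter -> Scan.
    while isinstance(child, (Project, Filter)):
        child = child.child
    return child if isinstance(child, Scan) else None


# Same shape as the plan above: Filter -> Filter -> Project -> Scan.
plan = Filter("id = 3", Filter("delete_sign = 0", Project(Scan("ut_p"))))
```

Here `scan_under_filter(plan)` finds the `ut_p` scan even though a project sits directly above it.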
2024-03-22 16:38:19 +08:00
617cc667fe [Fix](Variant) fix variant serialize root node (#31769) 2024-03-21 14:07:50 +08:00
5478193002 [Fix](Variant) support view for accessing variant subcolumns and temp… (#32225) 2024-03-15 18:06:20 +08:00
e8aa5ee7d5 [Improve](Variant) support bloom filter for variant subcolumns (#31347)
* [Improve](Variant) support bloom filter for variant subcolumns

* rebase
2024-03-09 19:45:03 +08:00
0442d5dc0e [fix](Variant Type) Add sparse columns meta to fix compaction (#28673)
Co-authored-by: eldenmoon <15605149486@163.com>
2024-02-16 10:12:23 +08:00
b23a785775 [Fix](Variant) support materialize view for variant and accessing variant subcolumns (#30603)
* [Fix](Variant) support materialize view for variant and accessing variant subcolumns
1. fix schema change with path lost and lead to invalid data read
2. support element_at function in BE side and use simdjson to parse data
3. fix multi slot expression
2024-02-16 10:12:23 +08:00
e6fbccd3ed [Feature](Variant) support row store for variant type (#30052) 2024-01-31 23:53:39 +08:00
7667fe8570 [Improve)(Variant) do not allow fall back to legacy planner (#30430) 2024-01-29 19:02:46 +08:00
8543167195 [Nereids](Variant) Implement variant type and support new sub column access method (#30348)
* [Nereids](Variant) Implement variant type in Variant and support new sub column access method

Consider the query SELECT v["a"]["b"] from simple_var WHERE cast(v["a"]["b"] as int) = 1:

1. During the binding stage, the expression element_at(var, "xxx") is transformed into a SlotReference with a specified path. This conversion is tracked in the StatementContext, where the parent slot is the primary key and the paths are secondary keys. This structure, known as subColumnSlotRefMap in the StatementContext, helps to eliminate duplicates of the same slot derived from identical paths.

2. A new rule, BindSlotWithPaths, is introduced in the analysis stage. This rule is responsible for converting slots with paths into their respective slot suppliers. To ensure that slots with paths are correctly associated with the appropriate LogicalOlapScan, an additional mapping, slotToRelation, is added to the StatementContext. This mapping links the top-level slot to its corresponding relation (i.e., LogicalOlapScan). Consequently, subsequent slots with paths can determine the correct LogicalOlapScan to merge with and modify accordingly.
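
The two-level map from step 1 can be sketched in Python as below. The class name, the `slot_for` method, and the generated slot names are hypothetical; only the shape of the map (parent slot as primary key, path as secondary key) comes from the description above.

```python
from typing import Dict, Tuple


class StatementContextSketch:
    """Toy model of subColumnSlotRefMap: parent slot -> path -> sub-column slot."""

    def __init__(self) -> None:
        self.sub_column_slot_ref_map: Dict[str, Dict[Tuple[str, ...], str]] = {}
        self._next_id = 0

    def slot_for(self, parent: str, path: Tuple[str, ...]) -> str:
        by_path = self.sub_column_slot_ref_map.setdefault(parent, {})
        if path not in by_path:
            # First time this path is seen: create a new slot for it.
            by_path[path] = f"{parent}#{self._next_id}"
            self._next_id += 1
        # Identical paths always resolve to the same slot, avoiding duplicates.
        return by_path[path]
```

Binding `v["a"]["b"]` twice, once in the select list and once in the WHERE clause, then yields a single slot for path `("a", "b")`.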
2024-01-27 09:09:02 +08:00
9aaa6ba351 [Fix](Variant) fix variant lost null info after cast_column (#30153)
This could produce incorrect output in hierarchical cases:

```
 sql """insert into ${table_name} values (-3, '{"a" : 1, "b" : 1.5, "c" : [1, 2, 3]}')"""
    sql """insert into  ${table_name} select -2, '{"a": 11245, "b" : [123, {"xx" : 1}], "c" : {"c" : 456, "d" : "null", "e" : 7.111}}'  as json_str
            union  all select -1, '{"a": 1123}' as json_str union all select *, '{"a" : 1234, "xxxx" : "kaana"}' as json_str from numbers("number" = "4096") limit 4096 ;"""

mysql> select v["c"] from var_rs where k = -3 or k = -2;
+----------------------+
| element_at(`v`, 'c') |
+----------------------+
| [1,2,3]              |
| []                   |
+----------------------+
2 rows in set (0.04 sec)
```
2024-01-27 09:08:29 +08:00
4480f751e6 [Improve](Variant) support implicit cast to numeric and string type (#30029) 2024-01-23 10:09:54 +08:00
97ed06a92c [regression-test](Variant) fix unstable case (#29648) 2024-01-12 11:36:45 +08:00
c0f63915f7 [chore](test) make configuartion of parallel scan be fuzzy (#29356) 2024-01-05 11:09:43 +08:00
5db496d844 [Improve](Variant) make output stable (#29389) 2024-01-02 20:29:17 +08:00
e9e1e2894b [performance](variant) support topn 2phase read for variant column (#28318)
2023-12-25 11:50:41 +08:00
13ccfa06a7 [Feature](Variant) Implement variant new sub column access method (#28484)
* [Feature](Variant) Implement variant new sub column access method

The query SELECT v["a"]["b"] from simple_var WHERE cast(v["a"]["b"] as int) = 1 encompasses three primary testing scenarios:

```
1. A basic test involving the variant data type.
2. A scenario dealing with GitHub event data in the context of a variant.
3. A case related to the TPC-H benchmark using a variant.
```
2023-12-22 11:59:37 +08:00
f6b6180462 [Fix](Variant) fix variant predicate rewrite OrToIn with wrong plan (#28695)
Using the name without path info leads to a wrong IN plan, e.g.
```
where cast(v:a as text) = 'hello' or cast(v:b as text) = 'world'
```
will be rewritten to:
```
where cast(v as text) in ('hello', 'world')
```
This is wrong, because they are different slots.
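
A minimal Python sketch of the safe grouping: equality disjuncts are merged into one IN list only when both the slot name and the sub path match. The `or_to_in` function and the tuple encoding are hypothetical and are not the Nereids rule itself.

```python
from typing import Dict, List, Tuple

# (slot name, sub path); the path is what the buggy rewrite ignored.
SlotKey = Tuple[str, Tuple[str, ...]]


def or_to_in(equalities: List[Tuple[SlotKey, str]]) -> Dict[SlotKey, List[str]]:
    """Group `slot = literal` disjuncts by slot name AND path."""
    groups: Dict[SlotKey, List[str]] = {}
    for key, literal in equalities:
        groups.setdefault(key, []).append(literal)
    return groups


# v:a = 'hello' OR v:b = 'world' must stay as two separate groups.
groups = or_to_in([(("v", ("a",)), "hello"), (("v", ("b",)), "world")])
```

Because the key includes the path, `v:a` and `v:b` are never folded into a single `IN ('hello', 'world')` predicate.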
2023-12-22 11:51:36 +08:00
e5a57f82ec [fix](Variant Type) Fixes the desc failure (#28343)
fix the desc failure when there is no decomposition of columns in the variant column.
2023-12-14 13:20:43 +08:00
1bbc54d1b2 [regression-test](variant) change p2 case to s3 load (#28193) 2023-12-11 12:31:25 +08:00
573b594df3 [improvement](Variant Type) Support displaying subcolumns expanded for the variant column (#27764) 2023-12-08 20:34:58 +08:00
341822ec05 [regression-test](Variant) add compaction case for variant and fix bugs (#28066) 2023-12-08 12:18:46 +08:00