When executing ANALYZE TABLE, Doris fails on decimal columns.
The root cause is that the scale in decimalV2 is 9, but the scale in the schema is 2.
There is no need to check the scale for decimalV2, since it is not a floating-point type.
1. add RemainedDownPredicates
2. fix core dump when _scan_ranges is empty
3. fix invalid memory access on vLiteral's debug_string()
4. enlarge mv test wait time
refactor DataTypeArray from_string to make it clearer;
support ',' and ']' inside string elements, for example: ['hello,,,', 'world][]']
support empty elements, such as [,] ==> [0,0]
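The parsing rules above can be illustrated with a minimal Python sketch. It is not the C++ `DataTypeArray::from_string` code; the function names are hypothetical, and treating `[]` as an empty array is an assumption. It only shows how tracking the open quote keeps ',' and ']' inside a string element from being read as delimiters, and how empty elements fall back to a default value.

```python
def split_array_elements(text):
    """Split '[a, b, c]' into raw element strings, honoring quotes."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("array literal must be wrapped in [ ]")
    body = text[1:-1]
    if body.strip() == "":
        return []  # assumption: '[]' means an empty array
    elements, current, quote = [], [], None
    for ch in body:
        if quote is not None:
            # Inside a quoted string: ',' and ']' are ordinary characters.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ",":
            # An unquoted comma ends the current element (possibly empty).
            elements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote is not None:
        raise ValueError("unterminated quoted string")
    elements.append("".join(current).strip())
    return elements


def parse_string_array(text):
    """Parse an array of strings, stripping one pair of matching quotes."""
    result = []
    for e in split_array_elements(text):
        if len(e) >= 2 and e[0] == e[-1] and e[0] in "'\"":
            e = e[1:-1]
        result.append(e)
    return result


def parse_int_array(text):
    """Parse an array of ints; empty elements become 0, as in [,] ==> [0,0]."""
    return [int(e) if e else 0 for e in split_array_elements(text)]
```

For instance, `parse_string_array("['hello,,,', 'world][]']")` keeps both elements intact, and `parse_int_array("[,]")` yields two zeros.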
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
1. bind slots in ORDER BY that do not appear in the projection, such as:
SELECT c1 FROM t WHERE c2 > 0 ORDER BY c3
2. do not check for unbound slots when binding slot references. Instead, do it in the analysis check.
* [enhancement](load) shrink reserved buffer for page builder (#14012)
For a table with hundreds of text-type columns, flushing its memtable may cost a huge amount of memory.
This memory is consumed when initializing the page builders, as 1MB is reserved for each column.
So memory consumption grows in proportion to the number of columns. Shrinking the reservation may
reduce memory usage substantially during the load process.
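As a rough back-of-the-envelope illustration of the proportionality described above, the following Python snippet computes the up-front reservation. The column count (500) and the smaller reservation (64KB) are hypothetical values chosen for the example, not figures from this change.

```python
MIB = 1024 * 1024
KIB = 1024


def reserved_bytes(num_columns, reserve_per_column):
    """Total memory reserved up front when each column's page builder
    reserves `reserve_per_column` bytes."""
    return num_columns * reserve_per_column


# 1MB per column, as described in the commit message; 500 columns is hypothetical.
before = reserved_bytes(500, 1 * MIB)
# Hypothetical smaller reservation, only to show the linear effect.
after = reserved_bytes(500, 64 * KIB)

print(before // MIB, "MiB reserved before")
print(after / MIB, "MiB reserved after")
```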
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
* response to the review
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
* Update binary_plain_page.h
* Update binary_dict_page.cpp
* Update binary_plain_page.h
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Introduce SQL syntax for creating inverted indexes, along with the related metadata changes.
```
-- create table with INVERTED index
CREATE TABLE httplogs (
ts datetime,
clientip varchar(20),
request string,
status smallint,
size int,
INDEX idx_size (size) USING INVERTED,
INDEX idx_status (status) USING INVERTED,
INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10;
-- add an INVERTED index to a table
CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
This PR implements predicate inference.
For example:
``` sql
select * from student left join score on student.id = score.sid where score.sid > 1
```
transformed logical plan tree:
                 left join
               /           \
    filter(sid > 1)     filter(id > 1) <---- inferred predicate
          |                   |
        scan                scan
See `InferPredicatesTest` for more cases
The logic is as follows:
1. pull up the bottom predicates, then infer additional predicates
   for example:
   select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id
   1. pull up the bottom predicate
      select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1
   2. infer
      select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1
   finally transformed sql:
   select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1
2. put these predicates into `otherJoinConjuncts`; they are processed in the next
round of predicate push-down
Currently, only inference of `ComparisonPredicate` is supported.
TODO: We should determine whether `expression` satisfies the conditions for replacement,
e.g. whether `expression` is non-deterministic.
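The inference step can be sketched in a few lines of Python. This is an illustration only, not the optimizer's actual code: the tuple representation and the `infer_predicates` name are hypothetical, it performs a single substitution pass rather than a transitive closure, and it ignores the join-type rules that decide which side a predicate may be inferred for.

```python
def infer_predicates(equalities, comparisons):
    """Infer new comparison predicates by substituting equal columns.

    equalities:  iterable of (col_a, col_b) pairs, e.g. from join conditions.
    comparisons: iterable of (col, op, literal) predicates already known.
    Returns the set of predicates that are implied but not yet present.
    """
    # Symmetric lookup: column -> columns known to be equal to it.
    equal_to = {}
    for a, b in equalities:
        equal_to.setdefault(a, set()).add(b)
        equal_to.setdefault(b, set()).add(a)

    known = set(comparisons)
    inferred = set()
    for col, op, literal in known:
        for other in equal_to.get(col, ()):
            candidate = (other, op, literal)
            if candidate not in known:
                inferred.add(candidate)
    return inferred


# student.id = score.sid and score.sid > 1 imply student.id > 1
new_preds = infer_predicates(
    equalities=[("student.id", "score.sid")],
    comparisons=[("score.sid", ">", 1)],
)
print(new_preds)
```

The same function reproduces the second example: with `t.id = t2.id` and the pulled-up `t.id = 1`, the only new predicate is `t2.id = 1`.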
We set a heap limit for tcmalloc to avoid OOM caused by tcmalloc itself, which allocates memory for its cache even when the machine has little free memory. However, Doris allocates large amounts of memory that go unused in some cases, so tcmalloc would throw an OOM exception even when there is a lot of free memory on the machine.
We can set the limit again after we fix that problem.
1. add back TPC-H regression test cases
2. fix decimal problem on aggregate function sum and agg introduced by #13764
3. fix memo merge group NPE introduced by #13900
Support Aliyun DLF
Support data on S3-compatible object storage, such as Aliyun OSS.
Refactor some catalog interfaces to make them tidier.
Fix a bug: the default field delimiter of Hive's text format should be \x01
Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.
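To make the delimiter fix concrete: Hive text tables separate fields with the Ctrl-A character (\x01) unless the table specifies otherwise. The Python sketch below splits such a row; the helper name and the sample row are hypothetical and only illustrate the default.

```python
HIVE_DEFAULT_FIELD_DELIM = "\x01"  # Ctrl-A, the default for Hive text tables


def split_hive_text_row(line, delim=HIVE_DEFAULT_FIELD_DELIM):
    """Split one line of a Hive text-format file into its fields."""
    return line.rstrip("\n").split(delim)


# Build a sample row with join() to keep the control character visible.
row = HIVE_DEFAULT_FIELD_DELIM.join(["1", "alice", "42"]) + "\n"
fields = split_hive_text_row(row)
print(fields)
```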
In GraphSimplifier, we can use a simple cost to calculate the benefit.
We need to update recursively only when the best neighbor of the apply step is the edge being processed.