This PR implements the default partition for list partitioning, as proposed in #15507.
It is similar to Greenplum's default partition, which stores all rows that do not satisfy any prior
partition's key constraints. The optimizer does not prune the default partition, which means the default
partition is scanned every time you select data from a table that has one.
Users can either create a table with a default partition or add one via ALTER TABLE.
```sql
PARTITION LIST(key) {
    PARTITION p1 VALUES IN (xx, xx),
    PARTITION DEFAULT
}
ALTER TABLE XXX ADD PARTITION DEFAULT;
```
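A hypothetical end-to-end version of the sketch above; the table layout, key values, and the exact placement of the DEFAULT keyword are illustrative assumptions, not the final grammar:
```sql
-- Illustrative only: a list-partitioned table with a default partition.
CREATE TABLE sales (
    region VARCHAR(20),
    amount INT
)
PARTITION BY LIST(region) (
    PARTITION p_east VALUES IN ('east'),
    PARTITION p_west VALUES IN ('west'),
    PARTITION DEFAULT  -- catches rows matching no prior partition
)
DISTRIBUTED BY HASH(region) BUCKETS 1;

-- Or add the default partition to an existing list-partitioned table:
ALTER TABLE sales ADD PARTITION DEFAULT;
```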
When a new partition is added, we don't automatically migrate rows in the default partition that satisfy
the new partition's key constraint into it. Users should select those rows from the default partition,
using the new constraint as a predicate, and insert them into the new partition.
```sql
insert into tbl select * from tbl partition default where partition_key=xx;
```
Consider the SQL below:
```sql
SELECT sum(cc.qlnm) AS qlnm
FROM
    outerjoin_A
    LEFT JOIN (
        SELECT
            outerjoin_B.b,
            coalesce(outerjoin_C.c, 0) AS qlnm
        FROM
            outerjoin_B
            INNER JOIN outerjoin_C ON outerjoin_B.b = outerjoin_C.c
    ) cc ON outerjoin_A.a = cc.b
GROUP BY outerjoin_A.a;
```
The `coalesce(outerjoin_C.c, 0)` expression was evaluated in the agg node, which is wrong.
This PR corrects that: the expression is now evaluated in the inner join node.
1. Organize HTTP documents
2. Add HTTP interface authentication for FE
3. Support HTTPS interface for FE
4. Provide authentication interface
5. Add HTTP interface authentication for BE
6. Support HTTPS interface for BE
For adding or dropping an inverted index: replaying logModifyTableAddOrDropInvertedIndices created a new schema change job with a new CreateTime. The schema change job should instead be created only when not replaying the log.
1. strdup(const char*) allocates a new char array and copies the input string into it.
2. Constructing a std::string from that pointer copies the contents into the string's own buffer.
3. The char array allocated by strdup is never freed, so it leaks.
Add the hint NTH_OPTIMIZED_PLAN to let the optimizer select the n-th optimized plan. For example, you could use
```sql
select /*+SET_VAR("nth_optimized_plan"=2) */ * from table;
```
to select the second-best plan in the optimizer.
The BloomFilter in a MoW table may consume a lot of memory, and its life cycle is the same as the segment's. This patch tries to improve the efficiency of recycling the segment cache, to release the memory in time.
Support decoding nested array columns in the parquet reader:
1. FE should generate the right nested column type. FE doesn't check the nesting depth or legality, like map\<array\<int\>, int\>.
2. `ParquetColumnReader` has removed page-index filtering to support nested array types, because
it's too difficult to skip values in nested complex types. We may support page-index filtering and lazy read in a later PR.
3. `ExternalFileScanNode` had a bug in creating the default value expression.
4. Reading repetition levels in a while loop may be slow; I'll optimize this in a follow-up PR.
5. Array columns have temporary `SchemaElement`s in their thrift definition;
the former implementation removed them and kept their parent.
The remaining parent should inherit the repetition and definition levels of its child.
Improve performance of parquet reader filter calculation.
- Use `filter_data` instead of `(*filter_ptr)` when merging filters to improve performance.
- Use the mutable column filter function instead of the original new-column filter function introduced by #16850.
- Avoid unnecessary copying caused by column ref-count increases by passing the column pointer by reference.
This PR does two things:
1. Fix:
JdbcExecutor used `column[0]` to judge the class type, but `column[0]` may be null!
2. Enhancement:
In the original logic, all fields in a jdbc catalog table were set to Nullable.
However, this is inefficient for fields that are not actually nullable. We can learn through JDBC
whether a field in the source table is nullable, and set the corresponding field in the Doris jdbc catalog to nullable or not accordingly.
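For context, a minimal jdbc catalog definition of the kind this affects; the connection details below are placeholders, not from this PR. After the change, nullability reported by the source should carry over instead of every column being forced to Nullable:
```sql
CREATE CATALOG jdbc_mysql PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
-- Describing a table through this catalog should now show NOT NULL
-- source columns as non-nullable rather than uniformly Nullable.
```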
The LoadScanProvider doesn't get hidden columns from the stream load parameters.
This may cause the stream load delete operation to fail. This PR passes the hidden columns to LoadScanProvider.
sub_bitmap's return type should be ALWAYS_NULLABLE, not dependent on its children.
For example,
sub_bitmap(bitmap_empty(), 1, 2) returns NULL even though none of its children is null.
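A minimal illustration of that case, relying only on the built-in bitmap functions named above:
```sql
-- bitmap_empty() yields an empty bitmap, so there is no sub-range to
-- extract: the result is NULL even though every argument is non-NULL.
SELECT sub_bitmap(bitmap_empty(), 1, 2);
```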
Sense I/O errors,
and retry the query when an I/O error occurs.
Greylist: when a disk is found to be completely broken, or the difference between the tablet counts in BE and FE meta is too large, reduce the query priority of that BE.
Fix: Redhat 4.x's /proc/meminfo has no MemAvailable, so disable using MemAvailable to control memory there.
Record vm_rss_str and mem_available_str when GC is triggered, so that memory changes during GC don't make the logs inaccurate.
Catch bad_alloc in the join probe, which may allocate 64G of memory at a time, to avoid OOM.
Update the documented names of doris_be_all_segments_num and doris_be_all_rowsets_num.
The body of a create view stmt is parsed twice.
In the second parse, we get the SQL string from the CreateViewStmt.viewDefStmt.toSql() function, which missed the select list.
Consider the SQL:
```sql
SELECT *
FROM
    (SELECT * FROM test_1) a
    INNER JOIN (SELECT * FROM test_2) b ON a.id = b.id
    INNER JOIN (SELECT * FROM test_3) c ON a.id = c.id;
```
Because a.id comes from a subquery, finding its source table requires the getSrcSlotRef() function.
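A hypothetical reproduction: wrap the query above in a view definition (assuming the test_* tables exist with an id column):
```sql
-- The view body is parsed twice; the second parse used
-- CreateViewStmt.viewDefStmt.toSql(), which missed the select list,
-- breaking views like this one.
CREATE VIEW join_view AS
SELECT *
FROM
    (SELECT * FROM test_1) a
    INNER JOIN (SELECT * FROM test_2) b ON a.id = b.id
    INNER JOIN (SELECT * FROM test_3) c ON a.id = c.id;
```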
A fulltext index is an inverted index with a specified tokenizer. Before this PR, a fulltext index could only evaluate match predicates; this PR adds support for evaluating equal predicates and list (IN) predicates.
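A sketch of the predicate kinds involved; the table and column names are hypothetical, and MATCH_ANY stands in for the existing match predicates:
```sql
-- Already supported: match predicate evaluated by the fulltext index.
SELECT * FROM docs WHERE body MATCH_ANY 'doris';
-- Newly supported: equal predicate evaluated by the same index.
SELECT * FROM docs WHERE body = 'doris';
-- Newly supported: list (IN) predicate.
SELECT * FROM docs WHERE body IN ('doris', 'olap');
```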
In version 1.2.1, users could set `"hadoop.username" = "xxx"` when creating a hive catalog
to specify a remote user to access HDFS.
But in version 1.2.2, we upgraded the hadoop version from 2.8 to 3.3; some behavior changed and the
user-specified remote user no longer took effect.
This PR tries to fix this by delegating through `UserGroupInformation`.
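For reference, the property in question on a hive catalog; the metastore URI below is a placeholder:
```sql
CREATE CATALOG hive PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    -- the remote user used to access HDFS; with hadoop 3.3 it is now
    -- applied through UserGroupInformation delegation
    "hadoop.username" = "xxx"
);
```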