4406 Commits

Author SHA1 Message Date
2c2e06a5fe [Fix bug] fix non-equal outer join is not supported (#8857) 2022-04-21 12:44:20 +08:00
7af684ad0f [fix] Fix a compatibility problem caused by using a non-existent database when connecting via mysql client (#9127) 2022-04-21 12:43:39 +08:00
7b3865b524 [fix](ut)(vectorized) fix a potential stack overflow bug and some unit test (#9140) 2022-04-21 12:17:03 +08:00
fa5b5fc6d1 [fix](dynamic_partition) fix dynamic partition scheduler not working for olap table with random hash info (#9108) 2022-04-21 12:16:27 +08:00
498f50a837 [regression-test] update test case dir, which is divided by basic functions (#9084)
1. Add test case dir.
2. Add some test suites.
2022-04-21 11:55:41 +08:00
Pxl
dda7604e16 [Bug][Storage-vectorized] fix core dump on outer join with not nullable column (#9112) 2022-04-21 11:02:04 +08:00
40362dfaca [fix](partition) Fix wrong partition distribution key info for random hash olap table (#9104) 2022-04-20 17:08:42 +08:00
f253e260c8 [fix] Modify fe jetty configuration parameters (#9075) 2022-04-20 14:51:25 +08:00
39c0fec680 [fix] fix bug when partition_id exceeds integer range in spark load (#9073) 2022-04-20 14:50:55 +08:00
a2edc6fd8b [feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070)
SQL with a JOIN over ARRAY columns calls the function ColumnArray::replicate. This PR
implements replicate for the ARRAY type, to support SQL like this:
`SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM  lineorder, date WHERE  lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`
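A minimal sketch of what replicating an array column involves, not the actual Doris implementation. It assumes a hypothetical `ArrayColumn` that stores flattened `int` elements plus per-row end offsets, and assumes that `replicate_offsets[i]` is the cumulative number of output rows after input row `i`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified array column: rows are slices of `data`.
struct ArrayColumn {
    std::vector<int> data;            // flattened elements of all rows
    std::vector<std::size_t> offsets; // offsets[i] = end of row i in `data`
};

// Copy row i (replicate_offsets[i] - replicate_offsets[i - 1]) times.
ArrayColumn replicate(const ArrayColumn& src,
                      const std::vector<std::size_t>& replicate_offsets) {
    ArrayColumn res;
    std::size_t prev_rep = 0;  // cumulative output rows before row i
    std::size_t prev_data = 0; // start of row i in src.data
    for (std::size_t i = 0; i < src.offsets.size(); ++i) {
        const std::size_t copies = replicate_offsets[i] - prev_rep;
        for (std::size_t c = 0; c < copies; ++c) {
            // Append the whole row, then record where it ends.
            res.data.insert(res.data.end(),
                            src.data.begin() + prev_data,
                            src.data.begin() + src.offsets[i]);
            res.offsets.push_back(res.data.size());
        }
        prev_rep = replicate_offsets[i];
        prev_data = src.offsets[i];
    }
    return res;
}
```

In a join, the build or probe side produces such cumulative counts when one input row matches several rows on the other side, which is why the array column must be able to repeat whole rows rather than single elements.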
2022-04-20 14:50:34 +08:00
df3a8545dc [fix](routine_load) Add retry mechanism for routine load tasks that encounter Broker transport failure (#9067) 2022-04-20 14:49:58 +08:00
3cd432c83a [community](*) polish config about project (#8987)
- Add `.editorconfig`
- Polish `.gitignore`
2022-04-20 14:48:32 +08:00
bd126f0679 [improvement] Refactor type info for further optimizations. (#8786)
## Design:

For now, there are two categories of types in Doris: scalar types (such as int, char, etc.) and composite types (array, etc.). For the sake of performance, we can cache the type info of scalar types globally (as unique objects) because the number of scalar types is limited. For composite types, the type info is normally generated at runtime (we can also use a cache strategy to speed this up). The memory should therefore be reclaimed when we create type info for composite types.

There are a lot of interfaces to get the type info of a specific type. I reorganized them as follows.
1. `const TypeInfo* get_scalar_type_info(FieldType field_type)`
    The function is used to get the type info of scalar types. Because the result is cached, the caller can use it **WITHOUT** worrying about memory reclamation.
2. `const TypeInfo* get_collection_type_info(FieldType sub_type)`
    The function is used to get the type info of array types with just **ONE** depth. Because the result is cached, the caller can use it **WITHOUT** worrying about memory reclamation.
3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)`
4. `TypeInfoPtr get_type_info(const TabletColumn* col)`
    These functions are used to get the type info of **BOTH** scalar types and composite types. The caller is responsible for managing the returned resources.

#### About the new type `TypeInfoPtr`
`TypeInfoPtr` is an alias for `unique_ptr` with a custom deleter.
1. For scalar types, the deleter does nothing.
2. For composite types, the deleter reclaims the memory.
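
The two deleter behaviours can be illustrated with a minimal sketch. This is not the code from the PR: the `TypeInfo` struct, the function-pointer deleter type, the two factory functions and the `g_reclaimed` counter are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real TypeInfo class.
struct TypeInfo {
    bool is_scalar;
};

// One pointer type for both cases; the deleter decides what release means.
using TypeInfoDeleter = void (*)(const TypeInfo*);
using TypeInfoPtr = std::unique_ptr<const TypeInfo, TypeInfoDeleter>;

int g_reclaimed = 0; // counts heap objects freed, for demonstration only

// Scalar type info lives in a global cache, so there is nothing to free.
void noop_deleter(const TypeInfo*) {}

// Composite type info is created at runtime and must be reclaimed.
void reclaim_deleter(const TypeInfo* p) {
    ++g_reclaimed;
    delete p;
}

TypeInfoPtr make_scalar_type_info() {
    static const TypeInfo cached{true}; // assumed globally cached object
    return TypeInfoPtr(&cached, noop_deleter);
}

TypeInfoPtr make_composite_type_info() {
    return TypeInfoPtr(new TypeInfo{false}, reclaim_deleter);
}
```

With this shape, owners such as `Field` can hold a `TypeInfoPtr` without knowing which category it came from, while everything they create can keep a raw `const TypeInfo*` obtained from `get()`.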

By analyzing the callers of `get_type_info`, these classes should hold `TypeInfoPtr`:
1. `Field`
2. `ColumnReader`
3. `DefaultValueColumnIterator`

Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance.
1. `ScalarColumnWriter` - holds `Field`
    1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types.
    2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter`, and `BitmapIndexWriter` doesn't support `ArrayType`.
    3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types.
2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types).
3. `ColumnVectorBatch`
    1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader`
    2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader`
    3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`
2022-04-20 14:47:29 +08:00
1b4cd76847 [feature](vectorized)(function) Support min_by/max_by function. (#8623)
Support min_by/max_by on vectorized engine.
2022-04-20 14:46:19 +08:00
d58e8d76b5 [doc] Release manager document (#9081)
Add Release manager document
2022-04-20 14:16:12 +08:00
304bd9ab62 Change Get-Starting (#9102)
Modify the download address
2022-04-20 14:15:18 +08:00
8d0f06e49a [docs][typo] Fixed Chinese and English "advance-usage.md" files. (#9099)
Fixed Chinese and English "advance-usage.md" files
2022-04-20 14:14:39 +08:00
3a9008f06a [docs][typo] Fix "basic-usage.md" files. (#9097)
Fix "basic-usage.md"
2022-04-20 14:14:12 +08:00
1d0629925f Modify the compilation docs on whether avx2 is supported. (#9095)
Modify the compilation docs on whether avx2 is supported
2022-04-20 14:13:39 +08:00
48f805fbab Fix routine-load-manual.md (#9090)
Fix routine-load-manual
2022-04-20 14:13:19 +08:00
37bd89d24b [typo](docs) fix some typos in docs (#9085)
fix some typos in docs
2022-04-20 14:12:54 +08:00
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
2cecb5dc82 [fix](regression-test) disable test for hdfs and fix double type with null (#9080)
1. add a new config in regression-conf.groovy
    enableHdfs, default is false, to skip tests with hdfs
2. fix a bug where an exception is thrown when a double type column result is null
2022-04-18 19:37:37 +08:00
0f86fed547 [improvement](insert) Support verbose keyword in insert query stmt (#9047) 2022-04-18 19:36:40 +08:00
51db4e54c0 [fix](table-function) Fix bug of table function with outer join cause nullptr of tuple (#9041) 2022-04-18 19:35:26 +08:00
f3dce9a6c1 [fix](planner) fix is-null predicate in where statement cannot be pushed down to the storage layer (#9035) 2022-04-18 19:35:02 +08:00
Pxl
681f960257 [fix](storage)(vectorized) query gets wrong result when reading datetime type column (#8872) 2022-04-18 19:34:06 +08:00
a71e0554be [github] enable clang format github action (#9082) 2022-04-18 17:48:35 +08:00
afce993ca7 [feature](load)(csv) CSV import and export support header (#8765)
- Add two new types to stream load and broker load: **csv_with_names** and **csv_with_names_and_types**
- Add two new types to export: **csv_with_names** and **csv_with_names_and_types**
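A minimal sketch of how a reader might consume the header lines of these formats, not the Doris implementation. It assumes that `csv_with_names` puts column names on the first line and that `csv_with_names_and_types` adds column types on the second line; `CsvHeader`, `split_line` and `read_header` are hypothetical names, and quoting or escaping is ignored:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the parsed header lines.
struct CsvHeader {
    std::vector<std::string> names;
    std::vector<std::string> types; // stays empty for csv_with_names
};

// Split one line on a single-character separator (no quoting support).
std::vector<std::string> split_line(const std::string& line, char sep) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, sep)) {
        fields.push_back(field);
    }
    return fields;
}

// Consume one header line (names) or two (names, then types) from the
// stream, leaving it positioned at the first data row.
CsvHeader read_header(std::istream& in, bool with_types, char sep = ',') {
    CsvHeader header;
    std::string line;
    if (std::getline(in, line)) {
        header.names = split_line(line, sep);
    }
    if (with_types && std::getline(in, line)) {
        header.types = split_line(line, sep);
    }
    return header;
}
```

After `read_header` returns, the remaining lines of the stream are ordinary data rows and can be handled by the existing CSV path.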
2022-04-18 15:29:18 +08:00
dffd8513c6 Modify some bad links in docs. (#9078)
Modify some bad links in docs.
2022-04-18 13:29:22 +08:00
9051ed7c7d Revert "[Refactor] remove some useless code (#8976)" (#9074)
This reverts commit de7dce4df84fcbfbbaf715cbac151e802321f80f.
Reverts apache/incubator-doris#8976
This causes the BE UT to fail: sh run-be-ut.sh --run --filter OlapTableSinkTest.*

```
==62008==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffff36867c0 in thread T0
```
2022-04-18 12:01:14 +08:00
38b9f02c5f [release] Add download url for 1.0.0 (#9071) 2022-04-18 11:24:21 +08:00
Pxl
44d37acbff Change date/datetime result type to bigint (#8975) 2022-04-18 09:56:28 +08:00
de7dce4df8 [Refactor] remove some useless code (#8976) 2022-04-18 09:55:54 +08:00
04287cabb2 [Forbidden](Vec) Switch to non-vec engine when outer join + not null column (#8979)
Vectorized code core dumps in the case of `outer join + not null column`, as in issue #7901,
so we need to fall back from vectorized mode to non-vectorized mode when we encounter this situation.

If the null-side column of the outer join is a column that must return non-null, like count(*),
there is no way to force the column to be nullable.
Vectorization cannot support this situation,
so it is necessary to fall back to the non-vectorized engine for processing.
For example:
  Query: set enable_vectorized_engine=true
  Query: select * from t1 left join (select k1, count(k2) as count_k2 from t2 group by k1) tmp on t1.k1=tmp.k1
  Result: Query goes to the non-vectorized engine
2022-04-18 09:55:33 +08:00
be0ba76dff [Refactor] Use '#pragma once' to replace '#define' and '#endif' (#9062) 2022-04-18 09:54:59 +08:00
c71ffc01de [Refactor] Cleanup some unused include (#9063) 2022-04-18 09:52:31 +08:00
b260bcba22 Modify some documents in the English version of doris (#9064)
Modify some documents in the English version of doris
2022-04-18 08:25:21 +08:00
352d93b566 [Refactor][doc] Fixed some issues in en and zh-CN docs (#9068)
* Modify some errors in en and zh-CN docs
2022-04-18 08:24:57 +08:00
1a2620b724 [fix][doc]fix max_send_batch_parallelism_per_job default value (#9038)
* fix max_send_batch_parallelism_per_job default value
2022-04-17 15:22:07 +08:00
0f8a7ff985 [Refactor](ReportHandler) Remove some unused schema_hash code in fe (#9005) 2022-04-17 10:01:34 +08:00
a749f98e44 Fix get-starting en and zh-CN docs. (#9059)
Co-authored-by: smallhibiscus <844981280>
2022-04-16 20:02:14 +08:00
7278ad460c fix refactor doc bug (#9058)
fix refactor doc bug
2022-04-16 17:24:27 +08:00
c7a098c1b0 [fix](sql_block_rule) optimization of alter sql_block_rule stmt (#8971)
Optimization of alter sql_block_rule stmt.
2022-04-16 11:05:31 +08:00
b92dd11a1d [fix][doc]Data import document modification (#9057)
* [fix][doc]Data import document modification
2022-04-16 10:19:07 +08:00
c431da3bf8 [Refactor][doc] Fix bad link in documentation (#9053)
Fix bad link in documentation
2022-04-15 19:50:35 +08:00
2c7327fb7c add doc tpch and ssb (#9052)
add doc tpch and ssb
2022-04-15 19:22:38 +08:00
34457cd768 add show load warning (#9051)
add show load warning
2022-04-15 18:53:20 +08:00
6215e5b09f Add best practice docs in advanced module (#9049)
Add best practice docs in advanced module
2022-04-15 17:22:16 +08:00
556602a5f1 [Refactor][Doc]Add part of the document content for Get-Starting (#8867)
* Add part of the document content for Get-Starting
2022-04-15 16:38:42 +08:00