Commit Graph

4662 Commits

Author SHA1 Message Date
31e40191a8 [Refactor] add vpre_filter_expr for vectorized to improve performance (#9508) 2022-05-22 11:45:57 +08:00
0c4b47756a [enhancement](community): enhance java style (#9693)
Enhance java style.

Now: the checkstyle rule about code order is described on this page: Class and Interface Declarations

This PR lets IDEA rearrange code automatically.
2022-05-20 15:24:30 +08:00
61a60d1dcc [code style] minor update for code style (#9695) 2022-05-20 11:47:49 +08:00
8fa677b59c [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666)
* [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner
1. fix bug of vjson scanner not supporting `range_from_file_path`
2. fix vjson/vbroker scanner core dump when the src and dest slot nullability differ
3. fix bug of vparquet filter_block when the reference of column is not 1
4. refactor to simplify the code

It only changes the vectorized load, not the original row-based load.

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 11:43:03 +08:00
6f61af7682 [Vectorized][java-udf] add datetime&&largeint&&decimal type to java-udf (#9440) 2022-05-20 10:26:09 +08:00
5fa6e892be [fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190)
Hive and Trino/Presto automatically trim trailing spaces, but Doris doesn't.
This causes query results that differ from Hive's.

Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, the trailing spaces of each column are trimmed when reading CSV from the broker scan node.
2022-05-20 09:55:13 +08:00
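The trimming behavior described in #9190 above can be illustrated with a minimal sketch. This is not the Doris broker scanner code; the function name `read_csv_rows` and the flag `trim_trailing_spaces` are hypothetical and only mirror the idea of the session variable.

```python
import csv
import io


def read_csv_rows(text, trim_trailing_spaces=False):
    """Parse CSV text; optionally strip trailing spaces from every field.

    Hypothetical helper for illustration only, not part of Doris.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if trim_trailing_spaces:
            # Only trailing ASCII spaces are removed; leading spaces are kept.
            row = [field.rstrip(" ") for field in row]
        rows.append(row)
    return rows


sample = "1,abc  \n2, de \n"
# Without trimming, the fields keep their trailing spaces.
print(read_csv_rows(sample))
# With trimming, "abc  " becomes "abc" and " de " becomes " de".
print(read_csv_rows(sample, trim_trailing_spaces=True))
```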
defdae1e7d [improvement](stream-load) adjust read unit of http to optimize stream load (#9154) 2022-05-20 09:52:36 +08:00
1e940f28b0 [docs] Fix error command of meta tool docs (#9590)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 09:36:26 +08:00
c2d41c84bf [feature](nereids): add join rules base code (#9598) 2022-05-20 08:18:08 +08:00
2c79d223e4 [refactor][rowset]move rowset writer to a single place (#9368) 2022-05-19 23:57:02 +08:00
c048b1f0f9 [fix](sparkload): fix min_value will be negative number when maxGlobalDictValue exceeds integer range (#9436) 2022-05-19 23:56:24 +08:00
ef65f484df [Enhancement] improve parquet reader via arrow's prefetch and multi thread (#9472)
* add ArrowReaderProperties to parquet::arrow::FileReader

* support prefetch batch
2022-05-19 23:52:01 +08:00
1355bc162b [Enhance] Add host info to heartbeat error msg (#9499) 2022-05-19 23:45:53 +08:00
Pxl
6951c42d5c [Bug][Vectorized] fix schema change add varchar type column default value get wrong result (#9523) 2022-05-19 23:38:57 +08:00
c09858671d [improvement][performance] improve lru cache resize performance and memory usage (#9521) 2022-05-19 23:37:59 +08:00
939daa07f1 [fix] fix Code Quality Analysis failed (#9685) 2022-05-19 23:13:47 +08:00
0f9ef26576 [Bug] Fix timestamp_diff issue when timeunit is year and month (#9574) 2022-05-19 21:24:43 +08:00
73c4ec7167 Fix some typos in be/. (#9681) 2022-05-19 20:55:39 +08:00
87e3904cc6 Fix some typos for docs. (#9680) 2022-05-19 20:55:21 +08:00
cbc7b167b1 [Feature] cancel load support state (#9537) 2022-05-19 16:37:56 +08:00
119ff2c02d [enhancement] Improve debugging experience. (#9677) 2022-05-19 16:36:37 +08:00
235d586f11 [style](fe) code correct rules and name rules (#9670)
* [style](fe) code correct rules and name rules

* revert some change according to comments
2022-05-19 16:36:03 +08:00
7c2db79b73 [BUG] fix bug for vectorized compaction and some storage vectorization bug (#9610) 2022-05-19 16:35:15 +08:00
cbf1e20fbc [doc]update streamload 2pc doc (#9651)
Co-authored-by: wudi <>
2022-05-19 14:30:17 +08:00
7a9bf5b23e [FeConfig](Project) Project optimization is enabled by default (#9667) 2022-05-19 14:03:14 +08:00
86b2c01e85 [refactor][regressiontest] reorder license header and import statement (#9672) 2022-05-19 14:00:33 +08:00
dd5e9fa9a4 [doc] Fixed an error in the Bitmap Index section of the document (#9679) 2022-05-19 13:55:52 +08:00
3efe97e73c [website] fix doris website with no link to the Privacy Policy. (#9665)
All websites must link to the Privacy Policy
2022-05-18 22:49:49 +08:00
a3183ec45c [fix](planner) unnecessary cast will be added on children in CaseExpr sometimes (#9600)
An unnecessary cast is sometimes added on the children of CaseExpr because the symbolized equal is used to compare `Expr` types.
This leads to incorrect expression comparison, which in turn makes expression substitution fail when `ExprSubstitutionMap` is used.
2022-05-18 22:44:51 +08:00
6602adf499 [regression test] Add compaction regression test case for different data models (#9660) 2022-05-18 17:12:20 +08:00
bdaf0b3fcc [fix](storage) low_cardinality_optimize core dump when is null predicate (#9586)
Issue Number: close #9555
Make the last value of the dictionary null. When ColumnDict inserts a null value,
it adds the encoding corresponding to the last value of the dictionary.
2022-05-18 14:57:13 +08:00
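The idea in #9586 above (reserve the last dictionary entry for null and encode null values with its code) can be sketched as follows. This is a toy model under stated assumptions, not the actual ColumnDict implementation; the class `DictColumn` and its methods are hypothetical.

```python
class DictColumn:
    """Toy dictionary-encoded column whose last dictionary slot stands for NULL.

    Hypothetical illustration; names and layout are assumptions.
    """

    def __init__(self, distinct_values):
        values = list(distinct_values)
        # Real values get codes 0..n-1; the extra last entry (code n) is NULL.
        self.dictionary = values + [None]
        self._code_of = {v: i for i, v in enumerate(values)}
        self.null_code = len(self.dictionary) - 1
        self.codes = []

    def insert(self, value):
        # A null value is stored as the code of the last dictionary entry.
        if value is None:
            self.codes.append(self.null_code)
        else:
            self.codes.append(self._code_of[value])

    def is_null_mask(self):
        # An "is null" predicate becomes a comparison against null_code.
        return [code == self.null_code for code in self.codes]


col = DictColumn(["a", "b"])
for v in ["b", None, "a"]:
    col.insert(v)
print(col.codes)           # [1, 2, 0]
print(col.is_null_mask())  # [False, True, False]
```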
c9ab5e22fe [fixbug](vec-load) fix core of segment_writer while it is not thread-safe (#9569)
Introduced in stream-load-vec #9280. Multiple threads can operate on the
same segment_writer because BetaRowset enables multi-threaded memtable
flush: memtable flush calls rowset_writer.add_block, which writes through
the member variable _segment_writer, so segment writes end up being
multi-threaded.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-05-18 11:29:15 +08:00
94c89e8a37 [improvement](planner) push down predicate past two phase aggregate (#9498)
The existing "push down predicate past aggregate" rule cannot push a predicate past a two-phase aggregate.

The original plan looks like this:
```
second phase agg (conjuncts on olap scan node tuples)
|
first phase agg
|
olap scan node
```
It should be optimized to:
```
second phase agg
|
first phase agg
|
olap scan node (conjuncts on olap scan node tuples)
```
2022-05-18 10:09:39 +08:00
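The plan rewrite described in #9498 above can be sketched on a toy plan tree. This is a simplified illustration, not the Doris planner: the `PlanNode` class and `push_conjuncts_past_aggregates` function are hypothetical, and the sketch assumes every conjunct on the top aggregate refers only to scan-node tuples, as in the commit's example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical single-child plan node used only for this sketch."""
    name: str
    child: Optional["PlanNode"] = None
    conjuncts: List[str] = field(default_factory=list)
    is_agg: bool = False


def push_conjuncts_past_aggregates(root: PlanNode) -> PlanNode:
    """Move the top aggregate's conjuncts to the first non-aggregate node below.

    Assumes all conjuncts reference only tuples produced by that node.
    """
    if not root.is_agg or not root.conjuncts:
        return root
    target = root
    # Walk down through every aggregate phase (one, two, or more).
    while target.is_agg and target.child is not None:
        target = target.child
    if target.is_agg:
        return root  # nothing below the aggregates to push into
    target.conjuncts.extend(root.conjuncts)
    root.conjuncts = []
    return root


scan = PlanNode("olap scan node")
first = PlanNode("first phase agg", child=scan, is_agg=True)
second = PlanNode("second phase agg", child=first, conjuncts=["k1 > 10"], is_agg=True)

push_conjuncts_past_aggregates(second)
print(second.conjuncts)  # []
print(scan.conjuncts)    # ['k1 > 10']
```

Walking the whole chain of aggregate nodes, rather than checking a single child, is what lets the same routine handle both the one-phase and the two-phase case.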
682cc14182 [bug] (init) Java version check fail (#9607) 2022-05-18 07:47:03 +08:00
bfb1ab059d [BUG] fix information_schema.columns results not correctly on vec engine (#9612)
* VSchemaScanNode get_next bugfix

* add regression-test case for VSchemaScanNode

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-05-18 07:44:32 +08:00
b6f5c89f6c [regression test] add some case for json load regression test (#9614)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-05-18 07:43:51 +08:00
Pxl
26353ba8b5 [clang build]fix clang compile error (#9615) 2022-05-18 07:42:31 +08:00
908f9cb7b9 [Improvement][ASAN] make BE can exit normally and ASAN memory leak checking work (#9620) 2022-05-18 07:40:57 +08:00
4312ef93d7 [Improvement] reduce string size in serialization (#9550) 2022-05-17 22:38:34 +08:00
7d9c25e718 [config] Remove some old config and session variable (#9495)
1. Remove session variable "enable_lateral_view"
2. Remove Fe config: enable_materialized_view
3. Remove Fe config: enable_create_sync_job
4. Fe config dynamic_partition_enable is now only used to disable the dynamic partition scheduler.
2022-05-17 22:37:11 +08:00
2ba81899d0 [fix] fix bug that replica can not be repaired due to DECOMMISSION state (#9424)
Reset the state of replicas that are in the DECOMMISSION state after scheduling finishes.
2022-05-17 22:36:30 +08:00
4ba75d3195 [feature] Add StoragePolicyResource for Remote Storage (#9554)
Add StoragePolicyResource for Remote Storage
2022-05-17 20:17:33 +08:00
d95fe08458 [feature] group_concat support distinct (#9576) 2022-05-17 19:29:47 +08:00
ec2cd0083a [code format]Upgrade clang-format in BE Code Formatter from 8 to 13 (#9602) 2022-05-17 19:28:15 +08:00
7417f9dfa3 [doc]modified the spark-load doc (#9605) 2022-05-17 19:27:02 +08:00
0aac9489ae [doc]add largeint doc (#9609)
add largeint doc
2022-05-17 19:26:45 +08:00
536d8ca1ed [Bug][Vectorized] Fix insert bitmap column with nullable column (#9408)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-17 14:42:20 +08:00
1cc9653bd8 [Bug][Vectorized] Fix BE crash with delete condition and enable_storage_vectorization (#9547)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-17 14:01:22 +08:00
7d9fa04472 [fix](storage-vectorized) fix VMergeIterator core dump (#9564)
It can be reproduced on a rowset with many segments, that is, when segments overlap. It may not be easy to reproduce.
2022-05-17 11:58:59 +08:00
72e0042efb [feature-wip](hudi) Step1: Support create hudi external table (#9559)
support create hudi table
support show create table for hudi table

### Design
1. create hudi table without schema (recommended)
```sql
    CREATE [EXTERNAL] TABLE table_name
    ENGINE = HUDI
    [COMMENT "comment"]
    PROPERTIES (
    "hudi.database" = "hudi_db_in_hive_metastore",
    "hudi.table" = "hudi_table_in_hive_metastore",
    "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
    );
```

2. create hudi table with schema
```sql
    CREATE [EXTERNAL] TABLE table_name
    [(column_definition1[, column_definition2, ...])]
    ENGINE = HUDI
    [COMMENT "comment"]
    PROPERTIES (
    "hudi.database" = "hudi_db_in_hive_metastore",
    "hudi.table" = "hudi_table_in_hive_metastore",
    "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
    );
```
When creating a hudi table with a schema, the columns must exist in the corresponding table in the hive metastore.
2022-05-17 11:30:23 +08:00
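The constraint stated in #9559 above (declared columns must exist in the corresponding hive metastore table) amounts to a membership check. The sketch below is only an illustration with a hypothetical helper `missing_columns`; it compares names exactly, which is an assumption, and it does not show how Doris actually validates the schema.

```python
def missing_columns(declared, metastore_columns):
    """Return the declared column names that are absent from the metastore table.

    Hypothetical helper; exact, case-sensitive name matching is an assumption.
    """
    known = set(metastore_columns)
    return [name for name in declared if name not in known]


metastore = ["id", "name", "ts"]
print(missing_columns(["id", "ts"], metastore))     # []
print(missing_columns(["id", "price"], metastore))  # ['price']
```

An empty result means the declared schema is acceptable under this rule; any returned name would make the table creation invalid.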