This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
segcompaction_p1 contains fairly large load jobs, which will exceed
memlimit or timeout in pipeline under such heavy loads.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
This patchset applies the following changes:
using vertical compaction machanism to do segcompaction
basic (WIP) refraction to separate segcompaction logic from BetaRowsetWriter
add segcompaction specific ut and regression tests
There is something wrong with the `test_broker_load` suite(s3 auth problem).
So I ignore this case temporarily.
cc @wsjz , please help to solve it and add it back
adjust nullable on empty set should apply after unnested sub-query
some function should propagate nullable when args are datev2 or datetimev2
add back tpch sf0.1 nereids regression test
Main subtask of [DSIP-28](https://cwiki.apache.org/confluence/display/DORIS/DSIP-028%3A+Suppot+MySQL+Load+Data)
## Problem summary
Support mysql load syntax as below:
```sql
LOAD DATA
[LOCAL]
INFILE 'file_name'
INTO TABLE tbl_name
[PARTITION (partition_name [, partition_name] ...)]
[COLUMNS TERMINATED BY 'string']
[LINES TERMINATED BY 'string']
[IGNORE number {LINES | ROWS}]
[(col_name_or_user_var [, col_name_or_user_var] ...)]
[SET (col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...)]
[PROPERTIES (key1 = value1 [, key2=value2]) ]
```
For example,
```sql
LOAD DATA
LOCAL
INFILE 'local_test.file'
INTO TABLE db1.table1
PARTITION (partition_a, partition_b, partition_c, partition_d)
COLUMNS TERMINATED BY '\t'
(k1, k2, v2, v10, v11)
set (c1=k1,c2=k2,c3=v10,c4=v11)
PROPERTIES ("auth" = "root:", "strict_mode"="true")
```
Note that in this pr the property named `auth` must be set since stream load need auth. I will optimize it later.
the cold-hot separation feature is still
under development. And seems there are some unsolved feature remains.
So I add a fe config enable_storage_policy, and default is false, to disable the creation and usage of storage policy by default.
So that user can aware that he is using an experimental feature on his own, and it will not be released formally in v1.2.0.
Disable storage policy by default, user can not use or create storage policy. Configured by enable_storage_policy.
Remove property remote_storage_policy, it is duplicate with storage_policy
Change the persist field in DataProperty.java.
And remove remoteCooldownTime from DataProperty, because it can be got from StoragePolicy.
Support running transactional insert operation with new scan framework. eg:
admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitation to transactional insert
Do not support non-literal value in insert stmt
Fix some issue about array type:
Forbid cast other non-array type to NESTED array type, it may cause BE crash.
Add getStringValueForArray() method for Expr, to get valid string-formatted array type value.
Add useLocalSessionState=true in regression-test jdbc url
without this config, the jdbc driver will send some init cmd each time it connect to server, such as
select @@session.tx_read_only.
But when we use transactional insert, after begin command, Doris do not support any other type of
stmt except for insert, commit or rollback.
So adding this config to let the jdbc NOT send cmd when connecting.
Issue Number: close#12574
This pr adds `NewJsonReader` which implements GenericReader interface to support read json format file.
TODO:
1. modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.
1. Modify default behavior of `build.sh`
The `BUILD_JAVA_UDF` is default ON, so that jvm is needed for compilation and runtime.
2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
See `docker/thirdparties/docker-compose`.
3. Add some regression test cases for jdbc query on MySQL, PG and Hive Catalog
The default is `false`, if set to true, you need first start docker for MySQL/PG/Hive.
4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey
1. Refactor the file reader creation in FileFactory, for simplicity.
Previously, FileFactory had too many `create_file_reader` interfaces.
Now unified into two categories: the interface used by the previous BrokerScanNode,
and the interface used by the new FileScanNode.
And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.
2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode
3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.
4. Add some test cases for csv stream load, the behavior is same as the old broker scanner.
1. Fix issue #13115
2. Modify the method of `get_next_block` or `GenericReader`, to return "read_rows" explicitly.
Some columns in block may not be filled in reader, if the first column is not filled, use `block->rows()` can not return real row numbers.
3. Add more checks for broker load test cases.
Support register suite plugin to add third-party function.
See
1. register in: ${DORIS_HOME}/regression-test/plugins/plugin_example.groovy
2. usage: ${DORIS_HOME}/regression-test/suites/demo/test_plugin.groovy
3. doc: ${DORIS_HOME}/docs/zh-CN/developer-guide/regression-testing.md
1. add a new config in regression-conf.groovy
enableHdfs, default is false, to skip tests with hdfs
2. fix a bug that when double type column result is null, exception will be thrown
* [Feature][regression-test]CSV import and export support header (#8487)
1.Add two new types to stream load boker load: csv_with_names and csv_with_name_sand_types
2.Add two new types to export: csv_with_names and csv_with_names_and_types
support suite block to specify multiple groups.
TestAction support compare result to iterator, local file and http stream.
support print teamcity service message.
abandon the logical: generate groovy file for sql file
support 3 levels parrallel: script file, suite block, thread action
support specify JAVA_OPTS for boot shell
avoid jvm metaspace oom
use -d to run the suite in some directories, instead of -g. and -g is used to specify groups
Add scalable regression testing framework(#7584)
contains
- Test framework making by groovy, and support built-in **readable DSL** named as `Action`
- Demo exists in `${DORIS_HOME}/regression-test/data/demo`
- Chinese doc exist in `${DORIS_HOME}/docs/zh-CN/developer-guide/regression-testing.md`
English document coming soon