For Unique Key MoW table, if there are duplicate keys in one single load job and there's multiple segments, we need to calculate delete bitmap to mark these duplicate keys deleted.
Add a check here to detect any bugs that might cause duplicate keys.
Hive store all the data without partition columns to a default partition named __HIVE_DEFAULT_PARTITION__.
Doris will fail to get the this partition when the partition column type is INT or something else that
__HIVE_DEFAULT_PARTITION__ couldn't convert to.
This pr is to support hive default partition, set the column value to NULL for the missing partition columns.
Loading a big local file will cause `INTERNAL_ERROR]too many filtered rows` issue since the bytebuffer from mysql client always use the same byte array.
And the later bytes will overwrite the previous one and make wrong bytes order among the network.
Copy the byte array and then fill it into network.
* disable setting storage policy on MoW table
* fix error in regression test
* make the name of test table unique
* use Strings.isNullOrEmpty to replace equals
* fix error in if statement
Different version numbers are used to calculate the delete bitmap between segments and rowsets, resulting in the failure of the last update of the delete bitmap.
* Support mapping es date format, default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis
* Replace simple json with jackson, resolve column order random problem
* Add es array doc version
Enhance aggregate function `collect_set` and `collect_list` to support optional `max_size` param,
which enables to limit the number of elements in result array.
Demo:
```
# HELP doris_fe_mtmv_job Total job number of mtmv.
# TYPE doris_fe_mtmv_job gauge
doris_fe_mtmv_job{type="TOTAL-JOB"} 1
doris_fe_mtmv_job{type="ACTIVE-JOB"} 1
# HELP doris_fe_mtmv_task Running task number of mtmv.
# TYPE doris_fe_mtmv_task gauge
doris_fe_mtmv_task{type="RUNNING-TASK"} 0
doris_fe_mtmv_task{type="PENDING-TASK"} 0
doris_fe_mtmv_task{type="FAILED-TASK"} 0
doris_fe_mtmv_task{type="TOTAL-TASK"} 1
```
decode method is only used for big int and other decode method is only used in unit test.
I remove the useless method and we can remove mempool parameter from decode method.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
The logic of topn and full sort is wrong when there are both offsets and limits, the offset is not considered when doing the max heap optimization, which will lead to wrong result.
* [Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint.
`_simdjson_set_column_value` could become a hot spot while parsing json in simdjson mode,
introduce `_prev_positions` to cache results for previous row (keyed as index in JSON object) due to the json name field order,
should be quite the same between each lines
* fix case
when emitCsgCmp, we should check if there is some missed edges should be used as connection edge. If there is missed edge but can't be used as connection edge, the emitCsgCmp should return and seek for another plan.
Add use_fix_replica session variable, so that we can be better debug replica inconsistencies problem.
If use_fix_replica default is -1, which means not fix,
else we will choose the {use_fix_replica} smallest replica.
function pushdown: #10355
NGram BloomFilter Index apply like pushdown: #11579
Enabled by default, make sure it stays active.
If NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.
use StringRef instead of string_view operator == for vectorized impl for array_contains function.
- test data: 10,000,000 rows with a ARRAY<STRING> column. There are 10 elements, average length 11 chars, in the array column in each row.
- test SQL: `select count() from test_like_array where array_contains(s_arr, 'xxxxxxxx');`
- test result: 0.76 sec vs. 0.52 sec, 30% time reduced
[WARNING:gensrc/thrift/parquet.thrift:22] Uncaptured doctext at on line 18.
[WARNING:gensrc/thrift/parquet.thrift:23] Uncaptured doctext at on line 22.
[WARNING:gensrc/thrift/parquet.thrift:436] Uncaptured doctext at on line 428.
WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).WARNING in asset size limit: The following asset(s) exceed the
recommended size limit (244 KiB). This can impact web performance
WARNING in entrypoint size limit: The following entrypoint(s) combined asset size exceeds the recommended limit
Warning : Macro "NonTerminator" has been declared but never used.
A const reference member variables as class member stores a temporary object, which cannot be got after the temporary object being destroyed, cause be core dump while enable debug level log
_broker_addr has been destroyed in BrokerFileReader
add a defer in fold constant to close.
add more type when call _get_result function in fold constant.3.
fix in can't handle null. eg:select 1 in (2, NULL, 1);
in java udf jni_ctx will be nullptr, so call close will be core dump.
Describe your changes.
when stream load with 2pc, the table was droped before commit, it will get error commit or abort, trasaction can not finish.
if commit or abort ,will get error:
{
"status": "ANALYSIS_ERROR",
"msg": "errCode = 7, detailMessage = unknown table, tableId=52579"
}
after this pr, i can abort success.
The data type `NUMBER(p,s)` of oracle has some different of doris decimal type in semantics.
For Oracle Number(p,s) type:
1.
if s<0 , it means this is an Interger. This `NUMBER(p,s)` has (p+|s| ) significant digit,
and rounding will be performed at s position.
eg: if we insert 1234567 into `NUMBER(5,-2)` type, then the oracle will store 1234500. In this case,
Doris will use
int type (`TINYINT/SMALLINT/INT/.../LARGEINT`).
2. if s>=0 && s<p , it just like doris Decimal(p,s) behavior.
3. if s>=0 && s>p, it means this is a decimal(like 0.xxxxx).
p represents how many digits can be left to the left after the decimal point,
the figure after the decimal point s will be rounded. eg: we can not insert 0.0123456 into `NUMBER(5,7)` type,
because there must be two zeros on the right side of the decimal point,
we can insert 0.0012345 into `NUMBER(5,7)` type. In this case, Doris will use `DECIMAL(s,s)`
4. if we don't specify p and s for `NUMBER(p,s)` like `NUMBER`,
the p and s of `NUMBER` are uncertain. In this case, doris can not determine p and s,
so doris can not determine data type.
1. Enhencement:
For single-charset column separator,csv_reader use another method of `split value`.
2. BugFix
Set `json` file format loading to be sensitive.