The keyword definition section of `sql_parser.cup` is unordered and messy:
1. It is almost unreadable
2. There are no rules for formatting it when we change it
3. **It takes unnecessary effort to resolve conflicts caused by the unordered keywords**
We can apply some simple rules to format it:
1. Sort the keywords in lexicographical order
2. Break the list into several "sections"; all keywords in a section share the same prefix `KW_${first_letter}`
3. Separate every two adjacent sections with an empty line containing only 4 white spaces
e.g.
```
terminal String
    KW_A...

    KW_B...
    ...
    KW_Z...
```
Dump memo info and the physical plan to stdout and the log.
Set the `enable_nereids_trace` variable to true/false to enable/disable this dump.
The following is a fragment of the memo:
```
Group[GroupId#8]
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#6 GroupId#7 ] stats=(rows=25, isReduced=false, width=2)
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#7 GroupId#6 ] stats=(rows=25, isReduced=false, width=2)
```
As mentioned in #13074, there are problems in `ColumnVector<int>::insert_many_in_copy_way`.
The `Column::insert_xxx` functions append data, so they should reserve or resize the column before appending.
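A minimal sketch of the reserve-before-append pattern, assuming a simplified column backed by `std::vector` (not the actual Doris `ColumnVector` API):
```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified stand-in for a numeric column; `data` plays the role of the
// column's underlying storage.
template <typename T>
struct ColumnVectorSketch {
    std::vector<T> data;

    // Append `num` elements from `src` in one copy. Resizing first
    // guarantees the memcpy below never writes past the allocation.
    void insert_many_in_copy_way(const T* src, size_t num) {
        size_t old_size = data.size();
        data.resize(old_size + num);  // resize before appending
        std::memcpy(data.data() + old_size, src, num * sizeof(T));
    }
};
```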
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
The toThrift method will be called multiple times to send data to different BEs, but resolvedTupleExprs should only be modified once. This PR makes sure resolvedTupleExprs can only be changed once.
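A minimal sketch of the change-only-once guard, written in C++ for consistency with the other snippets here (the real method lives in the FE Java code, and all names below are illustrative):
```cpp
#include <string>
#include <vector>

struct SinkSketch {
    std::vector<std::string> resolved_tuple_exprs;
    bool tuple_exprs_resolved = false;

    // Called once per BE; the mutating resolution runs only on the first
    // call, later calls just serialize the cached result.
    void to_thrift() {
        if (!tuple_exprs_resolved) {
            resolved_tuple_exprs = resolve_exprs();  // mutate exactly once
            tuple_exprs_resolved = true;
        }
        // ... serialize resolved_tuple_exprs without modifying it ...
    }

    std::vector<std::string> resolve_exprs() { return {"expr0", "expr1"}; }
};
```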
In some cases, we need to run manual compaction concurrently via the HTTP
interface, so we remove the mutex; the tablet's compaction lock
is enough to prevent concurrent compactions on the same tablet.
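A sketch of the per-tablet guard, assuming a `try_lock`-style use of the tablet's compaction lock (names are illustrative, not the actual Doris API):
```cpp
#include <mutex>

struct TabletSketch {
    std::mutex compaction_lock;

    // Concurrent HTTP-triggered compactions on the same tablet are
    // rejected by the tablet's own lock; no global mutex is needed.
    bool run_manual_compaction() {
        std::unique_lock<std::mutex> guard(compaction_lock, std::try_to_lock);
        if (!guard.owns_lock()) {
            return false;  // another compaction already runs on this tablet
        }
        // ... perform the compaction while holding the lock ...
        return true;
    }
};
```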
Co-authored-by: yixiutt <yixiu@selectdb.com>
* [chore](release build) copy license and notice files to the output folder and strip debug info from the meta tool
Co-authored-by: yiguolei <yiguolei@gmail.com>
When a schema change and a compaction execute simultaneously, both
nullable and non-nullable data can be read for the same column. We need to
reset _nullmap for each Block when converting Block data, otherwise the
converted Column will be wrong.
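A minimal sketch of the per-Block reset described above, with illustrative names (not the actual converter code):
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct BlockConverterSketch {
    std::vector<uint8_t> _nullmap;

    // The same column may arrive nullable in one Block and non-nullable in
    // the next, so a stale map carried over from a previous Block would
    // corrupt the converted Column.
    void convert_block(size_t rows, bool nullable) {
        _nullmap.clear();              // reset for each Block
        if (nullable) {
            _nullmap.assign(rows, 0);  // refill from this Block's data
        }
        // ... convert the column data, consulting _nullmap if non-empty ...
    }
};
```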
This PR mainly optimizes statistics tasks. It includes the following changes, among others:
1. No longer generate statistics tasks for empty tables, and move the logic of skipping empty partitions into task generation.
2. Adjust the default configuration related to statistics to improve the efficiency of statistics collection; the parameters include `cbo_concurrency_statistics_task_num`, `statistic_job_scheduler_execution_interval_ms`, and `statistic_task_scheduler_execution_interval_ms`.
3. Optimize the display of statistics tasks.
4. In addition, change some `org.apache.parquet.Strings` imports to `com.google.common.base.Strings` to avoid the exception that Strings cannot be found during local debugging.
This config is never used online, and there are bugs when it is enabled, so I removed this config and its related tests.
Co-authored-by: yiguolei <yiguolei@gmail.com>
For the string/varchar/text types, the length field in `ColumnMetaPB` is fixed at 2GB.
We don't actually have to allocate 2GB for every string column, because we
reallocate the precise amount of memory for the string in
`WrapperField::from_string()`:
```cpp
Status from_string(const std::string& value_string, const int precision = 0,
                   const int scale = 0) {
    if (_is_string_type) {
        // Only grow the owned buffer when the incoming string is larger
        // than the current capacity; otherwise reuse the existing buffer.
        if (value_string.size() > _var_length) {
            Slice* slice = reinterpret_cast<Slice*>(cell_ptr());
            slice->size = value_string.size();
            _var_length = slice->size;
            // Allocate exactly as many bytes as the string needs.
            _string_content.reset(new char[slice->size]);
            slice->data = _string_content.get();
        }
    }
    return _rep->from_string(_field_buf + 1, value_string, precision, scale);
}
```
Convert Parquet columns into Doris columns via a batch method.
In the previous implementation, only numeric types could be converted in batches;
all other types could only be inserted one by one,
which generated repeated virtual function calls and container expansions.
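An illustrative sketch contrasting the two paths, assuming a simplified column type (not the actual Doris reader API):
```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct ColumnSketch {
    std::vector<int32_t> data;
    virtual ~ColumnSketch() = default;

    // One-by-one path: one virtual call and a possible reallocation per value.
    virtual void insert(int32_t v) { data.push_back(v); }

    // Batch path: one virtual call, one resize, and one memcpy per batch.
    virtual void insert_batch(const int32_t* src, size_t num) {
        size_t old_size = data.size();
        data.resize(old_size + num);
        std::memcpy(data.data() + old_size, src, num * sizeof(int32_t));
    }
};
```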