How to reproduce?
Add `export CMAKE_BUILD_TYPE=DEBUG` to custom_env.sh, then build the third-party libraries on macOS.
There are two problems:
Building vectorscan with the DEBUG type fails with an unused-but-set-variable error:
doris/thirdparty/src/vectorscan-vectorscan-5.4.7/src/nfa/mcclellancompile.cpp:1485:13: error: variable 'total_daddy' set but not used [-Werror,-Wunused-but-set-variable]
u16 total_daddy = 0;
gflags outputs libgflags_debug.a instead of libgflags.a when built with the DEBUG type, which then causes a "can not find library gflags" error.
To avoid these errors, we explicitly set CMAKE_BUILD_TYPE when building vectorscan and gflags.
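For context, here is a minimal C++ snippet (hypothetical, not the vectorscan source) that trips the same warning class; with `-Werror` it becomes a hard error in DEBUG builds:

```cpp
// Minimal illustration of -Wunused-but-set-variable (hypothetical code):
// `total` is assigned but its value is never read afterwards.
// Compile with: clang++ -Wall -Werror -c example.cpp
int sum_to(int n) {
    int total = 0;           // set here ...
    for (int i = 0; i < n; ++i) {
        total += i;          // ... and accumulated here ...
    }
    return n;                // ... but never read: the warning fires
}
```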
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: Adonis Ling <adonis0147@gmail.com>
The original scan pools live in exec_env.
But after enabling new_load_scan_node by default, the scan pool in exec_env is no longer used.
All scan tasks are now submitted to the scan pool in scanner_scheduler.
BTW, reorganize the scan pools into 3 kinds (see the sketch below):
1. local scan pool: for the olap scan node
2. remote scan pool: for the file scan node
3. limited scan pool: for queries that set a cpu resource limit or carry a small limit clause
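A rough sketch of how the scheduler might route a task to one of these pools; `choose_pool` and the member names are hypothetical, only the three pool kinds and their intended users come from this PR:

```cpp
// Hypothetical pool-selection sketch; only the local / remote / limited
// pool kinds and their uses are taken from this PR.
class ThreadPool; // stand-in for the real Doris thread pool type

class ScannerScheduler {
public:
    ThreadPool* choose_pool(bool is_file_scan, bool has_cpu_limit,
                            bool has_small_limit) const {
        // Queries with a cpu resource limit or a small LIMIT clause
        // are routed to the limited scan pool.
        if (has_cpu_limit || has_small_limit) return _limited_scan_pool;
        // File scan nodes read remote storage -> remote scan pool.
        if (is_file_scan) return _remote_scan_pool;
        // Olap scan nodes read local storage -> local scan pool.
        return _local_scan_pool;
    }

private:
    ThreadPool* _local_scan_pool = nullptr;
    ThreadPool* _remote_scan_pool = nullptr;
    ThreadPool* _limited_scan_pool = nullptr;
};
```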
TODO:
Use bthread to unify all IO tasks.
Some trivial issues:
Fix a bug where the memtable flush size printed in the log was incorrect.
Add a RuntimeProfile parameter to VScanner.
In #15037, I modified the build script of libgsasl to enable GSSAPI,
but it is still broken, because PATH does not include `thirdparty/installed/bin`,
so building libgsasl reports the error:
`WARNING: MIT Kerberos krb5-config not found, disabling GSSAPI`
even though `krb5-config` is in `thirdparty/installed/bin`.
Without GSSAPI, libhdfs3 cannot access HDFS with Kerberos authentication.
On macOS, we need some extra libraries to build the codebase,
so two packages, `binutils` and `gettext`, were introduced to the project.
Building these packages completely takes a lot of time. This PR introduces a way to build only the needed libraries
and skip everything else, which saves time when building the third-party libraries on macOS.
* [regression-test](mtmv) add mtmv write data regression test
Support new table value function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`
We can use the SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` to get the snapshot info of a table. Other Iceberg metadata will be supported later as needed.
One usage example:
when we time travel with the following SQL:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11";`
`select * from ice_table FOR VERSION AS OF "snapshot_id";`
we can use the snapshots metadata to look up the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause.
1. Fix a bug in the estimation of the min/max of Year.
2. Remove Utils.getLocalDatetimeFromLong(Long). This method throws an exception if the input parameter is too big, and it is no longer used once the above bug is fixed.
First, the HAVING clause of MySQL is really complex and it is hard for us to follow all its rules, so we revert #15143 to keep the logic the same as before.
Second, the original implementation has a problem when the HAVING clause contains multiple conditions.
For example:
Case 1: here v2 inside the HAVING clause refers to the table column test_having_alias_tb.v2:
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);
ERROR 1105 (HY000): errCode = 2, detailMessage = HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (`v2` > 1)
Case 2: here v2 inside the HAVING clause refers to the alias v2 = sum(test_having_alias_tb.v2); the extra condition changes how v2 is resolved:
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
+------+------+------+
| id | v | v2 |
+------+------+------+
| 2 | 1 | 3 |
+------+------+------+
So here we try to make the HAVING clause rules simple:
Rule 1: if an alias inside the HAVING clause has the same name as a column, we use the column, not the alias;
Rule 2: if an alias inside the HAVING clause does not share a name with any column, we use the alias.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
The main purpose of this PR is to introduce a `fileCache` for lakehouse reads of remote files.
The local disk is used as a cache for remote file reads, so the next time a file is read,
the data can be served directly from the local disk.
In addition, this PR includes a few other minor changes.
Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, which uses an LRU replacement policy.
2. Implement a new FileReader, `CachedRemoteFileReader`, so that the `file cache` logic is hidden inside `CachedRemoteFileReader` (see the sketch below).
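A simplified sketch of the read-through idea behind `CachedRemoteFileReader`; the method and member names here are illustrative, not the actual Doris API. Only `CachedRemoteFileReader`, `block_file_cache`, and the LRU policy come from this PR:

```cpp
// Read-through cache sketch with hypothetical names.
#include <cstddef>
#include <cstdint>

struct Status {
    static Status OK() { return {}; }
};

class FileReader {
public:
    virtual ~FileReader() = default;
    virtual Status read_at(int64_t offset, char* buf, size_t len) = 0;
};

// Stand-in for block_file_cache: stores file blocks on local disk and
// evicts with an LRU policy when full.
class BlockFileCache {
public:
    bool lookup(int64_t /*offset*/, char* /*buf*/, size_t /*len*/) {
        return false; // the real cache checks local disk blocks
    }
    void insert(int64_t /*offset*/, const char* /*buf*/, size_t /*len*/) {}
};

class CachedRemoteFileReader : public FileReader {
public:
    CachedRemoteFileReader(FileReader* remote, BlockFileCache* cache)
            : _remote(remote), _cache(cache) {}

    Status read_at(int64_t offset, char* buf, size_t len) override {
        // 1. Serve the range from the local disk cache when possible.
        if (_cache->lookup(offset, buf, len)) return Status::OK();
        // 2. Miss: read from remote storage ...
        Status st = _remote->read_at(offset, buf, len);
        // 3. ... and populate the cache for later reads.
        _cache->insert(offset, buf, len);
        return st;
    }

private:
    FileReader* _remote;
    BlockFileCache* _cache;
};
```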
Other changes:
1. Add a new interface `fs()` to `FileReader`.
2. `IOContext` now carries some statistics that track how the `FileCache` is used (a rough shape is sketched below).
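The statistics in item 2 could look roughly like this; the field names are guesses, not the actual `IOContext` layout:

```cpp
#include <cstdint>

// Hypothetical shape of the FileCache counters carried by IOContext;
// the real field names in Doris may differ.
struct FileCacheStatistics {
    int64_t num_local_io_total = 0;    // reads served by the cache
    int64_t num_remote_io_total = 0;   // reads that hit remote storage
    int64_t bytes_read_from_local = 0;
    int64_t bytes_read_from_remote = 0;
};
```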
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
1. Generate lots of aggregate functions.
2. Support the `group_concat(columns ORDER BY order_columns)` grammar.
3. Support and generate array aggregate/scalar functions, like `array_union`. We should support array literal grammar in the future, e.g. `select [1, 2, 3]`.
4. Add `checkLegalityBeforeTypeCoercion` and `checkLegalityAfterRewrite` functions to check the legality of an expression before type coercion and after rewrite; copy the semantic checks of `FunctionCallExpr` into these legality checks; remove `ForbiddenMetricTypeArguments`; move the check of the aes/sm4 crypto functions from the translator into `checkLegalityBeforeTypeCoercion`.
5. Refactor `NullableAggregateFunction`: distinct is the first parameter, alwaysNullable is the second; fix some wrong initialization orders, since some functions invoked super(distinct, alwaysNullable) while others invoked super(alwaysNullable, distinct) (see the sketch after this list).
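Item 5 fixes a classic hazard: both constructor parameters are booleans, so a swapped call compiles silently. A minimal C++ analog (the real code is the Java `NullableAggregateFunction` hierarchy; these types are hypothetical):

```cpp
// Hypothetical C++ analog of the bug in item 5: two boolean parameters
// give the compiler no way to catch swapped arguments.
struct NullableAggregateFunction {
    NullableAggregateFunction(bool distinct, bool always_nullable)
            : _distinct(distinct), _always_nullable(always_nullable) {}
    bool _distinct;
    bool _always_nullable;
};

struct SomeAggFunction : NullableAggregateFunction {
    SomeAggFunction(bool distinct, bool always_nullable)
            // BUG: arguments swapped, yet this compiles cleanly.
            : NullableAggregateFunction(always_nullable, distinct) {}
};
```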
All conjuncts should be added before HashJoinNode init. Otherwise, some slots referenced by the conjuncts are linked to tuples that are not in the HashJoinNode's intermediate tuple.
For outer join / right outer join / right semi join, when HashJoinNode::pull->process_data_in_hashtable outputs a block, it emits all rows of a key in the hash table into the block, and only after a key's output is complete does it check whether the block size exceeds the batch size; if it does, the output is terminated.
If a key has 20 million+ rows, memory overflow occurs when subsequent block operations are performed on those rows.
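A sketch of the fix's shape; the loop structure and names are illustrative, the real logic lives in HashJoinNode::pull -> process_data_in_hashtable. Checking the batch size only after a key is fully emitted lets one huge key blow past the limit, so the check has to move inside the per-row loop:

```cpp
// Illustrative sketch, not the actual HashJoinNode code.
#include <cstddef>
#include <vector>

struct Row {};
using Block = std::vector<Row>;

// Before: the batch-size check ran only after a whole key was emitted,
// so one key with 20M+ rows still produced a single enormous block.
// After (shown here): check while emitting, so the block is handed off
// as soon as it reaches batch_size, even in the middle of a key.
template <typename FlushFn>
void emit_unmatched(const std::vector<std::vector<Row>>& rows_by_key,
                    size_t batch_size, FlushFn flush) {
    Block block;
    for (const auto& key_rows : rows_by_key) {
        for (const auto& row : key_rows) {
            block.push_back(row);
            if (block.size() >= batch_size) {
                flush(block);   // hand the block off mid-key
                block.clear();  // continue with a fresh block
            }
        }
    }
    if (!block.empty()) flush(block);
}
```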
Tablet::version_for_delete_predicate has to traverse all rowset metas in the tablet meta, which is O(N); instead, we can directly determine whether a rowset is a delete rowset via RowsetMeta::has_delete_predicate, which is O(1).
Since we no longer call Tablet::version_for_delete_predicate when picking input rowsets for compaction, we can shrink the critical section of Tablet::_meta_lock.
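A simplified sketch of the change; Tablet::version_for_delete_predicate and RowsetMeta::has_delete_predicate are the real names from this PR, everything else here is illustrative:

```cpp
// Simplified sketch with illustrative surrounding types.
#include <memory>
#include <vector>

struct RowsetMeta {
    bool has_delete_predicate() const { return _has_delete_predicate; }
    bool _has_delete_predicate = false;
};

struct Rowset {
    std::shared_ptr<RowsetMeta> rowset_meta;
};

// Before: deciding whether a rowset is a delete rowset meant scanning
// every rowset meta in the tablet meta (O(N)), under Tablet::_meta_lock.
// After: ask the rowset's own meta directly, O(1) and without widening
// the critical section.
bool is_delete_rowset(const Rowset& rs) {
    return rs.rowset_meta->has_delete_predicate();
}
```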