1. markConstantConjunct method shouldn't change the input conjunct
2. Use Expr's comeFrom method to check if the pushed expr is one of the group by exprs, this is the correct way to check if the conjunct can be pushed down through the agg node.
3. migrateConstantConjuncts should substitute the conjuncts using inlineViewRef's analyzer to make the analyzer recognize the column in the conjuncts in the following analyze phase
which operation can update this time:
1.when refresh catalog,lastUpdateTime of catalog will be update
2.when refresh db,lastUpdateTime of db will be update
3.when reload table schema to cache,lastUpdateTime of dbtable will be update
4.when receive add/drop table event,lastUpdateTime of db will be update
5.when receive alter table event,lastUpdateTime of table will be update
failed to run sql:
```sql
create alias function f1(int) with parameter(n) as dayofweek(hours_add('2023-06-18', n))
create alias function f2(int) with parameter(n) as dayofweek(hours_add(makedate(year('2023-06-18'), f1(3)), n))
select f2(f1(3))
```
it will throw an exception: f1 is not a builtin-function.
because f2's original function contains f1, and f1 is not a builtin-function, should be rewritten firstly.
we should avoid of it. And we will support it later.
1. check minio region, set default region if user region is not provided, and throw minio error msg
2. support read root path s3://bucket1
3. fix max compute public access
The getTable function in CascadesContext only handles the internal catalog case (try to find table only in internal
catalog and dbs). However, it should take all the external catalogs into consideration, otherwise, it will failed to find a
table or get the wrong table while querying external table. This pr is to fix this bug.
Add more profile information for external table plan time. Including init and finalize scan node time, getSplits time, create scan range time, get all partitions time and get all files for all partitions time. Also modified the Indentation to make it easier to read.
This is an example output of the new profile summary.
```
Execution Summary:
- Analysis Time: 3ms
- Plan Time: 26s885ms
- JoinReorder Time: N/A
- CreateSingleNode Time: N/A
- QueryDistributed Time: N/A
- Init Scan Node Time: 1ms
- Finalize Scan Node Time: 26s868ms
- Get Splits Time: 26s554ms
- Get PARTITIONS Time: 20s189ms
- Get PARTITION FILES Time: 6s289ms
- Create Scan Range Time: 314ms
- Schedule Time: 1s67ms
- Fetch Result Time: 56ms
- Write Result Time: 0ms
- Wait and Fetch Result Time: 57ms
```
When we were altering the catalog, we did not verify the new parameters of the catalog, and now we have added verification
My changes:
When We are altering the catalog, I have carried out a full inspection, and if an exception occurs, the parameters will be rolled back
We should not remove any limit from uncorrelated subquery. For Example
```sql
-- should return nothing, but return all tuple of t if we remove limit from exists
SELECT * FROM t WHERE EXISTS (SELECT * FROM t limit 0);
-- should return the tuple with smallest c1 in t,
-- but report error if we remove limit from scalar subquery
SELECT * FROM t WHERE c1 = (SELECT * FROM t ORDER BY c1 LIMIT 1);
```
1. materialization table skip the process of backup
2. materialization table to full refresh mode atomically
3. Handle the case where the `rename table` is null
When creating a Hive external table for Spark loading, the Hive external table includes related information such as the Hive Metastore. However, when submitting a job, it is required to have the hive-site.xml file in the Spark conf directory; otherwise, the Spark job may fail with an error message indicating that the corresponding Hive table cannot be found.
The SparkEtlJob.initSparkConfigs method sets the properties of the external table into the Spark conf. However, at this point, the Spark session has already been created, and the Hive-related parameters will not take effect. To ensure that the Spark Hive catalog properly loads Hive tables, you need to set the Hive-related parameters before creating the Spark session.
Co-authored-by: zhangshixin <zhangshixin@youzan.com>
consider the window function:
```sql
substr(
ref_1.cp_type,
sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
1)
```
Before the pr, only "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END" is pushed down.
But both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END"
should be pushed down.
This pr fix it
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If not unescape, then later jsonb parse will be failed
fix: #21136
mem tracker group uses class static variables instead of global variables
https://stackoverflow.com/questions/2204608/does-c-call-destructors-for-global-and-class-static-variables
TODO: A mem tracker manager is required, don't use global variables, it will sad
==3623982==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f0000056b8 at pc 0x56478bbe3ae0 bp 0x7f20953d2270 sp 0x7f20953d2268
READ of size 8 at 0x60f0000056b8 thread T41 (memory_tracker_)
*** Query id: 0-0 ***
*** Aborted at 1689749969 (unix time) try "date -d @1689749969" if you are using GNU date ***
*** Current BE git commitID: b3e9cad48e ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 3623982 (TID 3624277 OR 0x7f19e06dd640) from PID 0; stack trace: ***
#0 0x56478bbe3adf in std::__shared_ptr::operator bool() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295:16
#1 0x56478bbe306e in doris::MemTracker::refresh_profile_counter() /doris/be/src/runtime/memory/mem_tracker.h:149:13
#2 0x56478bbec669 in doris::MemTrackerLimiter::refresh_all_tracker_profile() /doris/be/src/runtime/memory/mem_tracker_limiter.cpp:119:22
#3 0x564788f53fa0 in doris::Daemon::memory_tracker_profile_refresh_thread() /doris/be/src/common/daemon.cpp:295:9
#4 0x564788f5d04b in doris::Daemon::start()::$_4::operator()() const /doris/be/src/common/daemon.cpp:473:30
#5 0x564788f5cff6 in void std::__invoke_impl(std::__invoke_other, doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
#6 0x564788f5cf78 in std::enable_if, void>::type std::__invoke_r(doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
#7 0x564788f5cdae in std::_Function_handler::_M_invoke(std::_Any_data const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
#8 0x56478903f576 in std::function::operator()() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
#9 0x56478c4a35af in doris::Thread::supervise_thread(void*) /doris/be/src/util/thread.cpp:465:5
#10 0x7f217c8a244f in start_thread nptl/pthread_create.c:473:8
#11 0x7f217cb27d52 in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
0x60f0000056b8 is located 56 bytes inside of 168-byte region [0x60f000005680,0x60f000005728)
freed by thread T0 here:
#0 0x564788e7280d in operator delete(void*) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1758280d) (BuildId: 219493cc924323ee)
#1 0x56478acec1d5 in std::default_delete::operator()(doris::MemTrackerLimiter*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
#2 0x56478ace9faf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
#3 0x56478ace1471 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:581:1
#4 0x56478ace14c8 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:572:37
#5 0x56478acd0984 in std::default_delete::operator()(doris::Cache*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
#6 0x56478acceddf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
#7 0x56478ad96dc6 in doris::StoragePageCache::~StoragePageCache() /doris/be/src/olap/page_cache.h:78:7
#8 0x7f217ca54146 in __run_exit_handlers stdlib/exit.c:108:8
previously allocated by thread T0 here:
#0 0x564788e71fad in operator new(unsigned long) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x17581fad) (BuildId: 219493cc924323ee)
#1 0x56478ace9c90 in std::_MakeUniq::__single_object std::make_unique, std::allocator > const&>(doris::MemTrackerLimiter::Type&&, std::__cxx11::basic_string, std::allocator > const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:962:30
#2 0x56478acde930 in doris::ShardedLRUCache::ShardedLRUCache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int, unsigned int) /doris/be/src/olap/lru_cache.cpp:526:20
#3 0x56478ace22e1 in doris::new_lru_cache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int) /doris/be/src/olap/lru_cache.cpp:670:16
#4 0x56478ad91da2 in doris::StoragePageCache::StoragePageCache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:47:17
#5 0x56478ad9156e in doris::StoragePageCache::create_global_cache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:31:29
#6 0x56478b98b3d3 in doris::ExecEnv::_init_mem_env() /doris/be/src/runtime/exec_env_init.cpp:251:5
#7 0x56478b98946c in doris::ExecEnv::_init(std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:182:5
#8 0x56478b987139 in doris::ExecEnv::init(doris::ExecEnv*, std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:98:17
#9 0x564788e79b50 in main /doris/be/src/service/doris_main.cpp:429:5
#10 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16
Thread T41 (memory_tracker_) created by T0 here:
#0 0x564788e1fcaa in pthread_create (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1752fcaa) (BuildId: 219493cc924323ee)
#1 0x56478c4a2366 in doris::Thread::start_thread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::function const&, unsigned long, scoped_refptr*) /doris/be/src/util/thread.cpp:419:15
#2 0x564788f59b91 in doris::Status doris::Thread::create(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, doris::Daemon::start()::$_4 const&, scoped_refptr*) /doris/be/src/util/thread.h:50:16
#3 0x564788f58165 in doris::Daemon::start() /doris/be/src/common/daemon.cpp:471:10
#4 0x564788e79a96 in main /doris/be/src/service/doris_main.cpp:420:12
#5 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16
"Enabling two-phase query for similar select * from tbl into outfile "file:/xxx/" format as orc; queries can lead to performance issues due to the fetch operation."
Problem:
When inferring predicate, we assume that slot reference need to be inferred. But in this case:
carete table tb1(l1 smallint) ...;
create table tb2(l2 int) ...;
select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1;
We can not get tb1.l1 = 1 filter because we will add a cast to l1 (Cast smallint to int l1) = l2.
Solved:
Add cast consideration when inferring predicate, also add change judgement when judging equals to slotreference and cast expression. But when we want to infer predicate from bigger type cast to smaller type, it is logical error.
For example:
select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = (number between smallint max and intmax);
tb2.l2 value can not infer to left side because tb1.l1 would be false value, and when we add one more condition like tb1.l1 = tb3.l3(smallint). It would cause this predicate be false.