18263 Commits

Author SHA1 Message Date
4d4cd43458 [Fix](Nereids) fix leading syntax problems and data mismatched problem (#32286)
- fix syntax problems when only one table is used in leading, or when braces are misused
  example: leading(t1), leading(t1 {t2})
- fix cte used in a subquery that uses leading
  example: with cte as (select c1 from t1) select count(*) from t1 join (select /*+ leading(cte t2) */ c2 from t2 join cte on c2 = cte.c1) as alias on t1.c1 = alias.c2;
  where the cte is used in a subquery and the subquery also has a leading hint
- fix data mismatch with the original plan caused by an on predicate being pushed to the nullable side
  example: select count(*) from t1 left join t2 on c1 > 500 and c2 > 500 cannot be changed to select count(*) from t1 left join t2 on c2 > 500 where c1 > 500
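
The left join example can be checked on a tiny data set. The following is a minimal sketch, not taken from the PR: it uses SQLite (via Python's standard `sqlite3` module) rather than Doris, and the rows in `t1` and `t2` are made up for illustration.

```python
import sqlite3

# In-memory database with illustrative rows (assumed data, not from the PR).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER);
    CREATE TABLE t2 (c2 INTEGER);
    INSERT INTO t1 VALUES (100), (600);
    INSERT INTO t2 VALUES (600);
""")

# Original query: the predicate on c1 lives in the ON clause, so the row
# c1 = 100 finds no match but is still kept (null-extended) by the left join.
original = conn.execute(
    "SELECT count(*) FROM t1 LEFT JOIN t2 ON c1 > 500 AND c2 > 500"
).fetchone()[0]

# Incorrect rewrite: moving c1 > 500 into WHERE filters the row c1 = 100 out.
rewritten = conn.execute(
    "SELECT count(*) FROM t1 LEFT JOIN t2 ON c2 > 500 WHERE c1 > 500"
).fetchone()[0]

print(original, rewritten)  # 2 1
```

The two counts differ, which is why the rewrite changes the query result.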
2024-03-21 14:07:23 +08:00
fab48f54b1 [enhancement](nereids)simplify OneRowRelation scalar subquery (#32276)
select count() from t where dt > (select '2024-02-02 00:00:00');
-->
select count() from t where dt > '2024-02-02 00:00:00';
2024-03-21 14:07:23 +08:00
163007a665 [fix](grouping sets) fix grouping sets have multiple empty sets (#32317)
#32112 addressed the handling of empty sets (empty expression cases). However, multiple empty sets in grouping sets have different grouping IDs.
2024-03-21 14:07:22 +08:00
403820599d [bug](inverted index) fix npe of InvertedIndexStorageFormat in table property (#32357)
Fix the problem that when FE is upgraded from an older version, it reports an error like:

```
MySQL [test]> show full tables;
ERROR 1105 (HY000): NullPointerException, msg: java.lang.NullPointerException: Cannot invoke "org.apache.doris.thrift.TInvertedIndexStorageFormat.toString()" because the return value of "org.apache.doris.catalog.OlapTable.getInvertedIndexStorageFormat()" is null
```
2024-03-21 14:07:22 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
The file meta cache on BE is used to cache the meta of external tables' files, such as the parquet footer.
This cache is limited by the number of entries, not by memory consumption.
So if the cached objects are big (e.g., a large parquet footer), the total memory consumption of this cache
will be large and may cause OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always returns true so that the minor or full GC on BE prunes the cache each time.

2. Reduce the default capacity of the file meta cache from 20000 to 1000

    Also change the default capacity of the hdfs file handle cache from 20000 to 1000

3. Change how it is decided whether to enable the file meta cache when querying

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, the file meta cache
    will be disabled for this query, because the cache is useless if there are too many files.
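
The per-query decision in the last item can be sketched as follows. This is an illustrative Python sketch, not the BE code: the function name `should_use_file_meta_cache` and the constant name are hypothetical, and only the 1/3 threshold and the default of 1000 come from the message above.

```python
# Hypothetical constant name; the value 1000 is the new default from this PR.
DEFAULT_FILE_META_CACHE_CAPACITY = 1000


def should_use_file_meta_cache(
    num_files: int, capacity: int = DEFAULT_FILE_META_CACHE_CAPACITY
) -> bool:
    """Return False when the query reads more files than 1/3 of the cache capacity.

    With that many files, entries would be evicted before they can be reused,
    so the cache is skipped for the whole query.
    """
    return num_files <= capacity / 3


print(should_use_file_meta_cache(333))  # True
print(should_use_file_meta_cache(334))  # False
```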
2024-03-21 14:07:22 +08:00
279ea2f366 [feature](proxy-protocol) Support proxy protocol v1 (#32338)
Enable proxy protocol to support IP transparency.
See: `IP Transparency` in f57387b502/docs/en/docs/admin-manual/cluster-management/load-balancing.md
for details
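
For context, a proxy protocol v1 header is a single ASCII line that the proxy sends before any client data, for example `PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n`. The following is a minimal parsing sketch in Python, assuming the header line has already been read from the socket. It is not the Doris implementation, and the names `ProxyInfo` and `parse_proxy_v1` are hypothetical.

```python
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    family: str    # "TCP4" or "TCP6"
    src_addr: str  # real client address
    dst_addr: str
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> Optional[ProxyInfo]:
    """Parse one proxy protocol v1 header line (hypothetical helper).

    Returns None for "PROXY UNKNOWN", where no address information is carried.
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY" or len(parts) < 2:
        raise ValueError("not a proxy protocol v1 header")
    if parts[1] == "UNKNOWN":
        return None
    if parts[1] not in ("TCP4", "TCP6") or len(parts) != 6:
        raise ValueError("malformed proxy protocol v1 header")
    return ProxyInfo(parts[1], parts[2], parts[3], int(parts[4]), int(parts[5]))


info = parse_proxy_v1(b"PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n")
print(info.src_addr, info.src_port)  # 192.168.0.1 56324
```

The source address in the header is what lets the server see the real client IP behind a load balancer.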
2024-03-21 14:07:22 +08:00
3da8e4b04a [chore](build) delete palo_be soft link (#32353) 2024-03-21 14:07:22 +08:00
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
Previously, the counter in `profile` might be updated when closing the file reader,
and the file reader might be closed while its owning object was being destructed.
But at that time, the `profile` object may already have been deleted, causing an NPE and a BE crash.

This PR tries to fix this issue:

1. Remove the "profile counter update" logic from all `close()` methods.

2. Add a new interface `ProfileCollector`

	It has 2 methods:

	- `collect_profile_at_runtime()`

		It can be called at runtime, e.g., in every `get_next_block()` method,
		so that the counter in the profile can be updated at runtime.

	- `collect_profile_before_close()`

		It should be called before the object calls `close()`, and it will only be called once.

3. Derive from `ProfileCollector`

	All classes that may update the profile counter in their `close()` method, such as `GenericReader`,
	should extend `ProfileCollector` and implement `collect_profile_before_close()`.

	`collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.
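
The shape of this interface can be sketched as below. This is a simplified Python illustration, not the actual C++ code in BE: the "only called once" guard, the `_collect_*` hook names, the `FileReader` class, and the dict used as a profile are all assumptions made for the example.

```python
class ProfileCollector:
    """Sketch of the interface: profile updates happen outside close()."""

    def __init__(self):
        self._collected_before_close = False

    def collect_profile_at_runtime(self, profile):
        # May be called repeatedly, e.g. after every block is produced.
        self._collect_profile_at_runtime(profile)

    def collect_profile_before_close(self, profile):
        # Guard so that the final collection runs at most once.
        if self._collected_before_close:
            return
        self._collected_before_close = True
        self._collect_profile_before_close(profile)

    # Hooks for subclasses (hypothetical names); default is to do nothing.
    def _collect_profile_at_runtime(self, profile):
        pass

    def _collect_profile_before_close(self, profile):
        pass


class FileReader(ProfileCollector):
    """Hypothetical reader that reports its counter before close()."""

    def __init__(self):
        super().__init__()
        self.bytes_read = 0

    def read(self, n):
        self.bytes_read += n

    def _collect_profile_before_close(self, profile):
        profile["BytesRead"] = profile.get("BytesRead", 0) + self.bytes_read

    def close(self):
        # No profile access here, so close() is safe even if the profile is gone.
        pass


profile = {}
reader = FileReader()
reader.read(128)
reader.collect_profile_before_close(profile)
reader.collect_profile_before_close(profile)  # second call is a no-op
reader.close()
print(profile)  # {'BytesRead': 128}
```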
2024-03-21 14:07:22 +08:00
8bd101129a [behavior change](output) change float output format (#32049) 2024-03-21 14:07:22 +08:00
7874edf992 [doc](ranger)change path of access_controller.class (#32138) 2024-03-21 14:07:19 +08:00
724bc82362 [refactor](chore) replace HashMapWithStackMemory with std::unordered_map (#32309) 2024-03-21 14:07:19 +08:00
fd1345bef0 fix load channel may memory leak (#32277) 2024-03-21 14:07:19 +08:00
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
b5ab1159bb [Enhancement](inverted index) make compiler happy (#32332) 2024-03-21 14:07:19 +08:00
d31331344b [enhance][fix] add ccr downstreamurl for ccr (#32325)
Co-authored-by: 胥剑旭 <xujianxu@xujianxudeMacBook-Pro.local>
2024-03-21 14:07:19 +08:00
85b2c42f76 [Enhancement](jdbc catalog) Add a property to test the connection when creating a Jdbc catalog (#32125) (#32531) 2024-03-21 14:05:59 +08:00
27973b6999 [fix](schema-change) fix the bug of handling empty blocks in schema change (#32396)
* [fix](schema-change) fix the bug of handling empty blocks in schema change

* add case
2024-03-19 22:12:26 +08:00
9eb2f90e27 [Optimize](inverted index) optimize inverted index bitmap copy (#32279) (#32469) 2024-03-19 17:28:59 +08:00
fc2588c786 [fix](insert)fix sink user name (#32465) 2024-03-19 16:04:09 +08:00
115834d1dd Revert "[fix](merge-cloud) fix no cluster for common user (#32097)" (#32457)
This reverts commit 3f4ae002a8cf932dced6166353b7bdbe5b99354f.

Co-authored-by: stephen <hello-stephen@qq.com>
2024-03-19 15:06:39 +08:00
93fe9521bf [feature](insert)fix implement hive table sink plan (#32430)
introduced by #32386
2024-03-19 09:55:16 +08:00
ecadb60bcd [Pick 2.1](inverted index) support inverted index format v2 (#30145) (#32418) 2024-03-19 08:11:33 +08:00
711c0cd55c [feature](insert)implement hive table sink plan (#31765) (#32386)
from #31765
2024-03-18 22:49:30 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
a444e84be6 [feature](hive)add 'HmsCommiter' to support inserting data into hive table (#32283) (#32362)
bp #32283
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-03-18 10:59:32 +08:00
2add3bc13a [fix](partial update) compaction may cause update failue (#31551) (#32361) 2024-03-18 10:58:51 +08:00
b82de68d7e [feature][insert]add hive table sink thrift (#32274) (#32360)
bp #32274
2024-03-18 10:46:17 +08:00
1645f2e0a7 [feature](insert)add hive table sink definition (#31662) (#32347)
bp #31662
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-03-17 20:52:44 +08:00
4732aae628 [Refactor](insert) refactor insert command to support other type of table (#31610) (#32345)
bp #31610
2024-03-17 20:46:07 +08:00
47019133c0 [improvement](Nereids) Support to remove sort which is under table sink (#31751) (#32337) 2024-03-17 15:45:53 +08:00
c34f5045c8 fix compile 2024-03-16 21:37:02 +08:00
4bf202db04 [pipelineX](exchange) Make exchange buffer size configurable (#32201) 2024-03-16 20:58:20 +08:00
bf9332a275 [docs](docs) Fix invalid link and typo of Master Branch (#32213) 2024-03-16 20:58:07 +08:00
5ceccb5ba5 [fix](compatibility) should enable windown funnel mode from 2.0 (#32284) 2024-03-16 20:56:16 +08:00
c5ffeff833 [fix](s3 client)add default ca cert list for s3 client to avoid problem:'curlCode:77' (#32285)
Co-authored-by: ryanzryu <ryanzryu@tencent.com>
2024-03-16 20:55:28 +08:00
83ab61ad22 Add QUEUE_START_TIME/QUEUE_END_TIME/QUERY_STATUS column for active_queries (#32259) 2024-03-16 20:53:46 +08:00
a15bf3057f [Fix](nereids) remove duplicate expr in grouping set (#32290)
The db reported the error "expression duplicate in grouping set" when there were duplicate expressions in a grouping set,
e.g. select a from mal_test1 group by grouping sets((a,a))
This PR removes duplicate exprs in the grouping set:
select a from mal_test1 group by grouping sets((a))
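
The rewrite amounts to an order-preserving deduplication inside each grouping set. A minimal Python sketch, assuming expressions are represented as plain strings (the function name `dedup_grouping_sets` is hypothetical and this is not the Nereids code):

```python
def dedup_grouping_sets(grouping_sets):
    """Drop repeated expressions within each grouping set, keeping first-seen order."""
    # dict.fromkeys keeps insertion order, so the first occurrence of each expr wins.
    return [list(dict.fromkeys(exprs)) for exprs in grouping_sets]


print(dedup_grouping_sets([["a", "a"]]))  # [['a']]
```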
2024-03-16 20:53:46 +08:00
bf82030270 [Chore](FE)Remove unused components (#32295)
The tomcat-embed-el dependency is primarily used for standardizing EL functionality, which we don't require in our application. Therefore, we can safely remove it.
2024-03-16 20:53:46 +08:00
844a1b53b7 [fix](retry) Set query encounter rpc exception default retry times to 3 (#28555) 2024-03-16 20:53:46 +08:00
f64a9a33f8 [fix](Nereids): don't pushdown project when project contains both side of join (#32214) 2024-03-16 20:53:46 +08:00
a90a1a76f1 [bugfix](profile) support multi execution profile for brokerload (#32280)
The bug was introduced by #27184.
The profile format is:
Summary
MergedProfile
ExecutionProfile1
ExecutionProfile2
...

There may be multiple execution profiles for broker load.
2024-03-16 20:53:43 +08:00
9ad196f189 Revert "[fix](cloud) ignore some case in cloud mode (#32261)"
This reverts commit c0776c7c0756d602204edba76642cafa92e67cd8.
2024-03-16 14:11:22 +08:00
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
b5a322297b Refactor active queries (#31742) (#32312) 2024-03-15 19:39:54 +08:00
720aaf9dd6 fix compile 2024-03-15 18:13:41 +08:00
e3bb499cc6 [fix](function)revert function REPEAT nullable mode #32226 2024-03-15 18:06:28 +08:00
97b35d6830 [fix](nereids)AssertNumRow node's output should be nullable (#32136)
Co-authored-by: Co-Author Jerry Hu <mrhhsg@gmail.com>
2024-03-15 18:06:28 +08:00
c0776c7c07 [fix](cloud) ignore some case in cloud mode (#32261) 2024-03-15 18:06:20 +08:00
9c1888e7ec [RuntimeFilter](exec) support min max runtime filter and do refactor (#32210) 2024-03-15 18:06:20 +08:00
8d988930bd [Fix](segment write) handle variant bloom filter in segment writer (#32011) 2024-03-15 18:06:20 +08:00