Make the `Need_2_Approval` check required.
After this PR is merged, all PRs will need at least 2 approvals before they can be merged.
One approval must come from a committer; the other can come from anyone.
Add an optional executable binary, fs_benchmark_tool, for testing the performance of file systems such as HDFS and S3.
Usage:
```
./fs_benchmark_tool --conf my.conf --fs_type=s3 --operation=read --iterations=5
```
In my.conf, you can add any config key-value pair in the following format:
```
key1=value1
key2=value2
```
By default, this binary will not be built. Only build it when setting BUILD_FS_BENCHMARK=ON.
The binary will be installed in output/be/lib.
Developers can add a new subclass of BaseBenchmark to add their own benchmark; see be/src/io/fs/benchmark/s3_benchmark.hpp for an example.
Support the match syntax in Nereids.
The match syntax is used like this:
```sql
select * from test where msg match "hello";
select * from test where msg match_any "hello";
select * from test where msg match_all "hello hi";
select * from test where msg match_phrase "hello world";
```
`match` is the same as `match_any`.
The PR that added the match syntax in the original planner: https://github.com/apache/doris/pull/14211
This PR calculates the size of the inverted index files. The changes consist of:
1. Introduction of a new get_inverted_index_size() method in different column writers such as ScalarColumnWriter, StructColumnWriter, ArrayColumnWriter, and MapColumnWriter. This method fetches the size of the inverted index file associated with that column. If the file size cannot be fetched, it defaults to 0.
2. A new method file_size() has been added in the InvertedIndexColumnWriter class which retrieves the size of the file stored on disk. If the file size cannot be fetched, it logs an error and returns -1.
3. A new method get_inverted_index_file_size() is introduced in SegmentWriter which aggregates the inverted index file sizes of all the column writers.
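The column writers themselves are C++ classes in the BE, so purely as an illustration of the aggregation described above (the names mirror the description, not actual signatures), a minimal Java-flavored sketch might look like this:

```java
import java.util.List;

// Illustration only: the real column writers are C++ classes in the BE.
interface ColumnWriterSketch {
    // Size of this column's inverted index file; 0 if it cannot be fetched.
    long getInvertedIndexSize();
}

final class SegmentWriterSketch {
    private final List<ColumnWriterSketch> columnWriters;

    SegmentWriterSketch(List<ColumnWriterSketch> columnWriters) {
        this.columnWriters = columnWriters;
    }

    // Aggregate the inverted index file sizes of all column writers.
    long getInvertedIndexFileSize() {
        long total = 0;
        for (ColumnWriterSketch writer : columnWriters) {
            total += writer.getInvertedIndexSize();
        }
        return total;
    }
}
```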
The Hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws an error like `java.lang.IllegalArgumentException: classLoader cannot be null`. Set a default class loader for the scan thread.
```java
public Kryo newKryo() {
    Kryo kryo = new Kryo();
    ...
    // Thread.currentThread().getContextClassLoader() returns null
    kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
    ...
    return kryo;
}
```
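A minimal sketch of the fix idea, assuming the scan task runs on a thread whose context class loader was never set (this is not the exact Doris change, and the class loader actually chosen may differ):

```java
// Give the scan thread a usable context class loader before Hudi's Kryo
// serializer is created, so getContextClassLoader() no longer returns null.
public final class ScanThreadClassLoaderFix {
    public static void ensureContextClassLoader() {
        Thread current = Thread.currentThread();
        if (current.getContextClassLoader() == null) {
            // Fall back to the loader that loaded this class; the loader actually
            // used for the scan thread in Doris may differ.
            current.setContextClassLoader(ScanThreadClassLoaderFix.class.getClassLoader());
        }
    }
}
```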
The changes in this PR:
1. rename BatchRewriteJob to AbstractBatchJobExecutor
2. add a new rewrite job type, CostBasedRewriteJob. It receives a RewriteJob as input, compares the cost of the two candidate plans produced with and without the input RewriteJob, and returns the lower-cost plan as the rewrite result (a rough sketch follows this list)
3. do some small refactoring of NereidsPlanner for better abstraction
4. refactor the directory structure of Nereids
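For intuition, here is a rough sketch of what CostBasedRewriteJob does conceptually; the plan type, cost model, and class below are hypothetical stand-ins rather than the actual Doris classes:

```java
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

// Sketch only: run the wrapped rewrite, cost both candidate plans, keep the cheaper one.
final class CostBasedRewriteSketch<P> {
    private final UnaryOperator<P> wrappedRewrite;  // stands in for the input RewriteJob
    private final ToDoubleFunction<P> costModel;    // estimates the cost of a plan

    CostBasedRewriteSketch(UnaryOperator<P> wrappedRewrite, ToDoubleFunction<P> costModel) {
        this.wrappedRewrite = wrappedRewrite;
        this.costModel = costModel;
    }

    // Return the rewritten plan only when it is estimated to be cheaper.
    P execute(P plan) {
        P rewritten = wrappedRewrite.apply(plan);
        return costModel.applyAsDouble(rewritten) < costModel.applyAsDouble(plan)
                ? rewritten
                : plan;
    }
}
```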
Usage of the cost-based rewrite framework:
If you want a rule or a rule list to run in the cost-based rewrite framework, just wrap the rule or rule list with the costBased function of the Rewriter class, for example:
```java
...
costBased(
custom(RuleType.AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION,
AggScalarSubQueryToWindowFunction::new)
),
...
```
* [Bug](topn opt) Fix two-phase read when some rowsets are swept
For a two-phase read query, we need to delay the release of rowsets via row->update_delayed_expired_timestamp() to extend their lifespan. This is necessary to avoid data loss during the second read phase, where some stale rowsets may otherwise be swept and result in missing data.
```groovy
finally {
    sql """ DROP MATERIALIZED VIEW ${testMv} ON ${testTable} """
    sql """ DROP TABLE ${testTable} """
    sql """ DROP DATABASE ${testDb} """
}
```
In this case, an error may occur before the materialized view is created. When that happens, the DROP MATERIALIZED VIEW in the finally block is still executed, but the view does not exist at that point. This raises another exception, and the real failure is hidden by the regression test.
As we know, log4j2 can sometimes become a bottleneck in the Doris FE when many logs are output in sync mode, while asynchronous logging performs better. We also find that capturing the caller location has a similar impact across all logging libraries and slows down asynchronous logging by about 30-100x. So here we provide three log modes for log4j2 to meet the needs of different users.
Refer to https://logging.apache.org/log4j/2.x/performance.html
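To illustrate why capturing the caller location is so costly (a standalone demo, not Doris or log4j2 code): each log event has to walk the stack to find its call site, roughly like this:

```java
// Illustration only: locating the calling line requires creating a stack trace
// and inspecting its frames, which is expensive when done per log event.
public final class CallerLocationDemo {
    public static void main(String[] args) {
        // The first frame is the point where the Throwable was created.
        StackTraceElement caller = new Throwable().getStackTrace()[0];
        System.out.println(caller.getClassName() + "." + caller.getMethodName()
                + "(" + caller.getFileName() + ":" + caller.getLineNumber() + ")");
    }
}
```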
1. casting a string literal to a date-like type should not be an implicit cast
2. the string representation of float-like types should not use scientific notation
3. the data type of the like function's regex expression should be string, even if it is a null literal
4. add -Xss4m in fe.conf to prevent stack overflow in some cases
When the FE is an old version and the BE is a new version, issuing a schema change (add column) and then querying could result in reading a stale schema from the schema cache, because the old FE sends queries without a schema version.
1. Add an HDFS file handle cache for the HDFS file reader
Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks to the Impala team.)
This is an LRU cache that can store multiple entries with the same key (a sketch of the idea appears after this list).
The key is built from {file name + modification time}.
The value is the hdfsFile pointer that points to a certain HDFS file.
This cache avoids reopening the same HDFS file multiple times, which saves query time.
Add a BE config `max_hdfs_file_handle_cache_num` to limit the maximum number of cached file handles; the default is 20000.
2. Add a file meta cache
The file meta cache is an LRU cache. The key is {file name + modification time},
and the value is the parsed file meta info of that file, which saves the time of
re-parsing the file meta every time.
Currently, it is only used for caching the Parquet file footer.
Tests show that when this cache is hit, `FileOpenTime` and `ParseFooterTime` in the
query profile drop to almost 0, which saves time when there are many files to read.
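The BE implementation is C++, ported from Impala's lru-multi-cache, so the sketch below is only meant to illustrate the idea of an LRU cache that can hold several idle handles under the same {file name + modification time} key; the class and method names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: an LRU cache that may hold several idle handles per key.
final class FileHandleCacheSketch<V> {
    private final int capacity;   // e.g. the max_hdfs_file_handle_cache_num limit
    private int size = 0;         // number of cached handles across all keys
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<String, ArrayDeque<V>> idleHandles =
            new LinkedHashMap<>(16, 0.75f, true);

    FileHandleCacheSketch(int capacity) {
        this.capacity = capacity;
    }

    static String key(String fileName, long modificationTime) {
        return fileName + "#" + modificationTime;
    }

    // Borrow an idle handle for this key, or return null so the caller opens a new file.
    synchronized V acquire(String key) {
        ArrayDeque<V> handles = idleHandles.get(key);
        if (handles == null || handles.isEmpty()) {
            return null;
        }
        size--;
        return handles.poll();
    }

    // Give a handle back to the cache; evict least-recently-used handles when over capacity.
    synchronized void release(String key, V handle) {
        idleHandles.computeIfAbsent(key, k -> new ArrayDeque<>()).offer(handle);
        size++;
        while (size > capacity && !idleHandles.isEmpty()) {
            Map.Entry<String, ArrayDeque<V>> eldest = idleHandles.entrySet().iterator().next();
            ArrayDeque<V> queue = eldest.getValue();
            if (!queue.isEmpty()) {
                queue.poll();     // a real cache would also close the evicted handle here
                size--;
            }
            if (queue.isEmpty()) {
                idleHandles.remove(eldest.getKey());
            }
        }
    }
}
```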
This PR addresses the refactoring of common methods that were originally located within the ODBC classes, but were used by the JDBC classes. These methods have now been moved to the JDBC classes to improve code readability and maintainability.
In addition, we have disabled the creation of ODBC external tables by default. However, this will not affect the existing usage of ODBC. You can still enable the ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to completely remove the ODBC external tables in future versions, so we recommend using the JDBC Catalog as a priority.