doris

Author	SHA1	Message	Date
Gabriel	50bfd99b59	[feature](join) support nested loop semi/anti join (#14227 )	2022-11-17 22:20:08 +08:00
HappenLee	d5af4f6558	[Nereids](Profile) Add projection timer for Nereids (#14286 )	2022-11-17 22:17:55 +08:00
slothever	6da2948283	[feature-wip](multi-catalog) support Iceberg v2 (step 1) (#13867 ) Support position delete (partial).	2022-11-17 17:56:48 +08:00
Mingyu Chen	7182f14645	[improvement][fix](multi-catalog) speed up list partition prune (#14268 ) In the previous implementation, we had to generate `rangeToId` every time we did list partition pruning. But `rangeToId` is actually static data that should be created once and used everywhere. So for Hive partitions, I create `rangeToId` and all other data structures needed for partition pruning in the partition cache, so that they can be used directly. In my test, the cost of partition pruning for 10000 partitions was reduced from 8s to 0.2s. Also add "partition" info to the explain string for Hive tables. ``` | 0:VEXTERNAL_FILE_SCAN_NODE | | predicates: `nation` = '0024c95b' | | inputSplitNum=1, totalFileSize=4750, scanRanges=1 | | partition=1/10000 | | numNodes=1 | | limit: 10 | ``` Bug fix: 1. Fix bug that ES scan node cannot filter data. 2. Fix bug that querying ES with a predicate like `where substring(test2,2) = "ext2";` fails at the planner phase with `Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef`. TODO: 1. Querying ES version 8 has a problem (`Unexpected exception: Index: 0, Size: 0`), which will be fixed later.	2022-11-17 08:30:03 +08:00
Ashin Gau	20634ab7e3	[feature-wip](multi-catalog) support partition & missing columns in parquet lazy read (#14264 ) PR https://github.com/apache/doris/pull/13917 added lazy read for non-predicate columns in ParquetReader, but lazy read could not be triggered when the predicate columns are partition or missing columns. This PR supports that case and fills partition and missing columns in `FileReader`.	2022-11-16 08:43:11 +08:00
huangzhaowei	5badd70db2	[fix](csv-reader) Fix core dump when loading text into Doris with a special delimiter (#14196 )	2022-11-15 16:06:59 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
Gabriel	215a4c6e02	[Bug](BHJ) Fix wrong result when using broadcast hash join for null-aware anti join (NAAJ) (#14253 )	2022-11-15 09:40:00 +08:00
Ashin Gau	fc70179acb	[multi-catalog](fix) the EOF of lazy read columns may not be equal to the EOF of predicate columns (#14212 ) Fix three bugs: 1. The EOF of lazy read columns may not be equal to the EOF of predicate columns. (For example, suppose the predicate column has 3 pages with 400 rows each, and the last page is filtered out by the page index. When batch_size=992, the EOF of the predicate column is true. However, we should set batch_size=800 for the lazy read column, so the EOF of the lazy read column may be false.) 2. The array column does not count the number of nulls. 3. A wrong NullMap is generated for array columns.	2022-11-14 14:37:21 +08:00
Adonis Ling	7bb3792d51	[chore](build) Split the compilation units to build them in parallel (#14232 )	2022-11-14 10:57:10 +08:00
pengxiangyu	d55faa7f6a	[feature](remote) Only queries can use the local cache when reading remote files. (#13865 ) When running SELECT on remote files, cache files are downloaded to the local disk. When running ALTER TABLE on remote files, files are read directly from remote storage. So even if a tablet is very large, it does not take up too much local disk space for local cache files.	2022-11-14 10:30:15 +08:00
starocean999	139c4a77f1	[enhancement](be) close ExecNode ASAP to release resources earlier (#14203 )	2022-11-14 09:41:35 +08:00
Xinyi Zou	dd11d5c0a5	[enhancement](memory) Support try catch bad alloc (#14135 )	2022-11-13 11:22:56 +08:00
luozenglin	376b4fda9f	[fix](scankey) fix extended scan key errors. (#14200 ) Issue Number: close #14199	2022-11-12 20:44:09 +08:00
xy720	035657c5a1	[typo](comment) Fix many spelling errors in BE comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
Gabriel	fe2944d56d	[Bug](nljoin) Keep compatibility for nljoin (#14182 )	2022-11-11 15:54:55 +08:00
Adonis Ling	118a7dff07	[chore](build) Optimize the compilation time (#14170 ) Currently, building BE from source in workflow environments (P0/P1) takes too much time, which hurts the efficiency of daily development. The time can be measured with the following command: `time EXTRA_CXX_FLAGS='-O3' BUILD_TYPE=ASAN ./build.sh --be --fe --clean -j "$(nproc)"` This PR reduces the compilation time with the following methods: 1. Reduce codegen by removing some unnecessary std::visit calls. 2. Disable optimization for some template functions that are conditionally instantiated by std::visit (except for the RELEASE build).	2022-11-11 12:09:54 +08:00
Zhengguo Yang	12652ebb0e	[UDF](java udf) use a config to enable Java UDF instead of a compile-time macro (#14062 ) * [UDF](java udf) use a config to enable Java UDF instead of a compile-time macro	2022-11-11 09:03:52 +08:00
Gabriel	1ef85ae1f2	[Improvement](join) Support nested loop outer join (#13965 )	2022-11-10 19:50:46 +08:00
Ashin Gau	6bd5378f66	[feature-wip](multi-catalog) lazy read for ParquetReader (#13917 ) Read predicate columns first, and use VExprContext (push-down predicates) to generate the select vector, which is then applied when reading the non-predicate columns. Rows filtered out by the select vector can be skipped in the non-predicate columns, which reduces value decoding time. If a whole page can be skipped, decompression time is reduced as well.	2022-11-10 16:56:14 +08:00
Pxl	0e26f28bf2	[Enhancement](runtime-filter) enlarge runtime filter IN predicate threshold (#13581 ) enlarge runtime filter IN predicate threshold	2022-11-10 15:48:46 +08:00
Xinyi Zou	a73f4dfdc1	[fix](memtracker) Fix mem tracker null pointer caused by scanner thread ending after fragment thread #14143	2022-11-10 15:42:53 +08:00
Tiewei Fang	43eb946543	[feature](table-valued-function) S3 table valued function supports parquet/orc/json file formats #14130 S3 table valued function supports parquet/orc/json file formats. For example: parquet format	2022-11-10 10:33:12 +08:00
Jerry Hu	10df61b5bf	[improvement](join) Share hash table in fragments for broadcast join (#13921 )	2022-11-10 09:48:34 +08:00
Pxl	794a551b0f	[Enhancement][fix](profile)() modify some profiles (#14074 ) 1. add RemainedDownPredicates 2. fix core dump when _scan_ranges is empty 3. fix invalid memory access on vLiteral's debug_string() 4. enlarge mv test wait time	2022-11-09 21:59:28 +08:00
camby	f912d4e392	[fix](compile) fix compile error #14103 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-09 14:10:06 +08:00
Gabriel	a3c5fa8c01	[Compile](join) Boost compiling and linking (#14081 )	2022-11-09 11:27:46 +08:00
Mingyu Chen	cd8f0713ea	[refactor](new-scan) remove old vectorized scan node (#14029 )	2022-11-09 08:39:20 +08:00
Tiewei Fang	826cfdaf93	[feature](information_schema) add `backends` information_schema table (#13086 )	2022-11-08 22:15:10 +08:00
slothever	c2a01e84b4	[feature-wip](multi-catalog) fix page index filter bug (#14015 ) Fix page index filter not taking effect with multiple columns Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-11-08 12:10:12 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) Mem trackers can be logically divided into 4 layers: 1) process 2) type 3) query/load/compaction task etc. 4) exec node etc. Type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between layers, and the values of the process and each type are periodically aggregated. Other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in actual tests, the accuracy of this part of the memory could not be guaranteed, so it was put back into the orphan mem tracker.	2022-11-08 09:52:33 +08:00
luozenglin	6ed443c7e8	[enhancement](profile) add instanceNum, tableIds to profile. (#13985 )	2022-11-08 08:49:16 +08:00
starocean999	95591ce49a	[refactor](cv)wait on condition variable more gently (#12620 )	2022-11-08 08:40:31 +08:00
Tiewei Fang	27549564a7	[feature](table-valued-function) Support S3 tvf (#13959 ) This PR does three things: 1. Modifies the framework of table-valued-function (tvf). 2. BE supports the `fetch_table_schema` RPC. 3. Implements the `S3(path, AK, SK, format)` table-valued-function.	2022-11-06 11:04:26 +08:00
Xinyi Zou	f87be09d69	[fix](load) Fix load channel mgr lock (#13960 ) hot fix load channel mgr lock	2022-11-05 00:48:30 +08:00
Gabriel	9869915279	[refactor](crossjoin) refactor cross join (#13896 )	2022-11-03 22:42:56 +08:00
Gabriel	bfba058ecf	[Feature](join) Support null aware left anti join (#13871 )	2022-11-03 12:11:25 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
Xin Liao	37e4a1769d	[fix](sequence) fix core dump when updating a table with a sequence column (#13847 ) * [fix](sequence) fix core dump when updating a table with a sequence column * update	2022-11-03 09:02:21 +08:00
Mingyu Chen	7b4c2cabb4	[feature](new-scan) support transactional insert in new scan framework (#13858 ) Support running transactional insert operations with the new scan framework, e.g.: `admin set frontend config("enable_new_load_scan_node" = "true"); begin; insert into tbl1 values(1,2); insert into tbl1 values(3,4); insert into tbl1 values(5,6); commit;` Add some limitations to transactional insert: non-literal values in insert statements are not supported. Fix some issues with the array type: forbid casting other non-array types to a NESTED array type, which may cause a BE crash; add a getStringValueForArray() method to Expr to get a valid string-formatted array value. Add useLocalSessionState=true to the regression-test JDBC URL. Without this config, the JDBC driver sends some init commands each time it connects to the server, such as `select @@session.tx_read_only`. But in a transactional insert, after the `begin` command, Doris does not support any statement other than insert, commit, or rollback, so this config is added to keep the JDBC driver from sending commands when connecting.	2022-11-03 08:36:07 +08:00
Adonis Ling	ba918b40e2	[chore](macOS) Fix compilation errors caused by the deprecated function (#13890 )	2022-11-02 13:34:51 +08:00
Pxl	be124523f4	[enhancement](profile) add profile to show column predicates (#13862 )	2022-11-02 09:07:26 +08:00
Mingyu Chen	2fb218173e	[improvement](scan) change the max thread num and num of free blocks in new scan (#13793 ) 1. In the previous implementation, the max thread num of the olap scanner was set relatively small (e.g. 3), which slowed down some queries. In this PR, I changed the max thread num to a quarter of the scanner thread pool size (default is 12), which is less than the old scan node's max thread num but larger than the previous implementation. The old scan node's upper limit on the max thread num is unreasonably high. 2. Reduce the number of pre-allocated free blocks.	2022-10-31 14:00:06 +08:00
Ashin Gau	e0667b297f	[feature-wip](multi-catalog) reuse hdfsFs and decode parquet values in batch (#13688 ) PR https://github.com/apache/doris/pull/13404 made ParquetReader break up batch insertion when encountering null values, which leads to poor performance compared to OrcReader. This PR pushes the null map into the decode function, reducing virtual function calls when encountering null values. Furthermore, it reuses hdfsFS among file readers to reduce the time spent building connections to HDFS.	2022-10-28 15:52:52 +08:00
HappenLee	d6b72d9b89	[Bug](update) support checking the optional value of agg_sort_infos (#13732 )	2022-10-28 10:37:13 +08:00
camby	738da0b139	[bugfix](join) inner join returns wrong result (#13608 ) * bug fix for vhash join * add regression test Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-27 11:48:41 +08:00
starocean999	c874931ac8	[fix](join) output all values from the non-null side of outer join (#13655 ) * [fix](join) output all values from the non-null side of outer join * add regression test	2022-10-27 10:48:36 +08:00
Gabriel	3c95106d45	[Bug](jdbc) Fix memory leak for JDBC datasource (#13657 )	2022-10-27 00:02:25 +08:00
Zhengguo Yang	65aa863dcf	[Bugfix](bitmap) Fix incorrect function symbol for to_bitmap_with_check (#13667 ) * [Bugfix](bitmap) Fix incorrect function symbol for to_bitmap_with_check	2022-10-26 14:27:38 +08:00
Tiewei Fang	c418bbd2d1	[feature-wip](new-scan) support Json reader (#13546 ) Issue Number: close #12574 This PR adds `NewJsonReader`, which implements the GenericReader interface to support reading JSON files. TODO: 1. Modify `_scann_eof` later. 2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.	2022-10-26 12:52:21 +08:00

371 Commits