doris

Author	SHA1	Message	Date
Tiewei Fang	27549564a7	[feature](table-valued-function) Support S3 tvf (#13959 ) This PR does three things: 1. Modifies the framework of table-valued-function (tvf). 2. BE supports the `fetch_table_schema` RPC. 3. Implements the `S3(path, AK, SK, format)` table-valued-function.	2022-11-06 11:04:26 +08:00
Xinyi Zou	f87be09d69	[fix](load) Fix load channel mgr lock (#13960 ) hot fix load channel mgr lock	2022-11-05 00:48:30 +08:00
Gabriel	9869915279	[refactor](crossjoin) refactor cross join (#13896 )	2022-11-03 22:42:56 +08:00
Gabriel	bfba058ecf	[Feature](join) Support null aware left anti join (#13871 )	2022-11-03 12:11:25 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
Xin Liao	37e4a1769d	[fix](sequence) fix that update table core dump with sequence column (#13847 ) * [fix](sequence) fix that update table core dump with sequence column * update	2022-11-03 09:02:21 +08:00
Mingyu Chen	7b4c2cabb4	[feature](new-scan) support transactional insert in new scan framework (#13858 ) Support running transactional insert operations with the new scan framework, e.g.: admin set frontend config("enable_new_load_scan_node" = "true"); begin; insert into tbl1 values(1,2); insert into tbl1 values(3,4); insert into tbl1 values(5,6); commit; Add some limitations to transactional insert: non-literal values in the insert stmt are not supported. Fix some issues with the array type: Forbid casting other non-array types to a NESTED array type, since it may cause a BE crash. Add getStringValueForArray() method for Expr, to get a valid string-formatted array type value. Add useLocalSessionState=true to the regression-test jdbc url. Without this config, the jdbc driver will send some init cmds each time it connects to the server, such as select @@session.tx_read_only. But when we use transactional insert, after the begin command, Doris does not support any type of stmt other than insert, commit or rollback. So this config is added to make the jdbc driver NOT send cmds when connecting.	2022-11-03 08:36:07 +08:00
Adonis Ling	ba918b40e2	[chore](macOS) Fix compilation errors caused by the deprecated function (#13890 )	2022-11-02 13:34:51 +08:00
Pxl	be124523f4	[enhancement](profile) add profile to show column predicates (#13862 )	2022-11-02 09:07:26 +08:00
Mingyu Chen	2fb218173e	[improvement](scan) change the max thread num and num of free blocks in new scan (#13793 ) 1. In the previous implementation, the max thread num of the olap scanner was set relatively small, such as 3, which would slow down some queries. In this PR, I changed the max thread num to a quarter of the scanner thread pool (default is 12), which is less than the old scan node's max thread num, but larger than the previous implementation. The upper limit of the max thread num of the old scan node is too high, which is not reasonable. 2. Lower the number of pre-allocated free blocks.	2022-10-31 14:00:06 +08:00
Ashin Gau	e0667b297f	[feature-wip](multi-catalog) reuse hdfsFs and decode parquet values in batch (#13688 ) As introduced in PR(https://github.com/apache/doris/pull/13404), ParquetReader breaks up batch insertion when encountering null values, which leads to bad performance compared to OrcReader. So this PR pushes the null map into the decode function, reducing the time spent on virtual function calls when encountering null values. Furthermore, it reuses hdfsFS among file readers to reduce the time spent building connections to hdfs.	2022-10-28 15:52:52 +08:00
HappenLee	d6b72d9b89	[Bug](update) support to check optional value of agg_sort_infos (#13732 )	2022-10-28 10:37:13 +08:00
camby	738da0b139	[bugfix](join) inner join return wrong result (#13608 ) * bug fix for vhash join * add regression test Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-27 11:48:41 +08:00
starocean999	c874931ac8	[fix](join)output all value from no-null side of outer join (#13655 ) * [fix](join)output all value from no-null side of outer join * add regression test	2022-10-27 10:48:36 +08:00
Gabriel	3c95106d45	[Bug](jdbc) Fix memory leak for JDBC datasource (#13657 )	2022-10-27 00:02:25 +08:00
Zhengguo Yang	65aa863dcf	[Bugfix](bitmap) Fix to_bitmap_with_check function symbol is incorrect (#13667 ) * [Bugfix](bitmap) Fix to_bitmap_with_check function symbol is incorrect	2022-10-26 14:27:38 +08:00
Tiewei Fang	c418bbd2d1	[feature-wip](new-scan) support Json reader (#13546 ) Issue Number: close #12574 This PR adds `NewJsonReader`, which implements the GenericReader interface to support reading JSON format files. TODO: 1. modify `_scann_eof` later. 2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.	2022-10-26 12:52:21 +08:00
Jibing-Li	44c9163b3c	[Fix](multi-catalog)Fix partition external table query bug. (#13535 ) The index for external table columns from path is incorrect in new scanner. This is a fix for it. e.g. In the next query, nation and city columns are from path ``` mysql> select nation, city, count() from parquet_two_part group by nation, city; +--------+------------+----------+ \| nation \| city \| count() \| +--------+------------+----------+ \| cn \| beijing \| 1199969 \| \| cn \| shanghai \| 1199771 \| \| jp \| tokyo \| 599715 \| \| rus \| moscow \| 600659 \| \| us \| chicago \| 1199805 \| \| us \| washington \| 1201296 \| +--------+------------+----------+ 6 rows in set (0.39 sec) ```	2022-10-26 12:47:37 +08:00
Yongqiang YANG	295d887cf5	[improvement](thread) set name for priority thread pool (#13552 )	2022-10-26 09:32:15 +08:00
Adonis Ling	2cf89c55c2	[chore](macOS) Fix issues found on macOS x86_64 (#13583 ) 1. Use `brew --prefix` instead of `brew --repo` in scripts. 2. `sprintf` is marked as a deprecated function in MacOSX sdk (13.0).	2022-10-24 20:59:20 +08:00
starocean999	40e122e5ef	[fix](join)the build and probe expr should be calculated before converting input block to nullable (#13436 ) * [fix](join)the build and probe expr should be calculated before converting input block to nullable * remove_nullable can be called on const column	2022-10-24 14:50:06 +08:00
luozenglin	e17c2416f0	[fix](join) fix be core dump when using right join with other join predicates (#13511 )	2022-10-24 10:35:07 +08:00
Mingyu Chen	3a3def447d	[fix](csv-reader) fix bug that csv reader can not read text format hms table (#13515 ) 1. Missing field and line delimiter 2. When querying an external table with text(csv) format, we should pass the column position map to BE, otherwise the column order is wrong. TODO: 1. For now, if we query a csv file with a non-existent column, it will return null. But it should return null or the default value of that column. 2. Add regression test after hive docker is ready.	2022-10-22 22:40:03 +08:00
Gabriel	8e19b13f18	[Improvement](runtimefilter) don't allocate memory if all targets are local (#13557 )	2022-10-21 21:43:38 +08:00
Gabriel	3006b258b0	[Improvement](bloomfilter) allocate memory for BF in open phase (#13494 )	2022-10-21 17:37:26 +08:00
starocean999	5dde13fb7d	[fix](scan)extend_scan_key should not change the range parameter (#13530 ) * [fix](scan)extend_scan_key should not change the range parameter * [fix](scan)new olap scan node has the same issue	2022-10-21 15:17:12 +08:00
Gabriel	d3f65aa746	[Improvement](join) remove unnecessary state for join (#13472 )	2022-10-21 09:59:34 +08:00
Mingyu Chen	32b1456b28	[feature-wip](array) remove array config and check array nested depth (#13428 ) 1. Remove FE config `enable_array_type` 2. Limit the nested depth of array on the FE side. 3. Fix bug that when loading array from parquet, the decimal type is treated as bigint 4. Fix loading array from csv(vec-engine), handle null and "null" 5. Change the csv array loading behavior: if the array string format is invalid in csv, it will be converted to null. 6. Remove `check_array_format()`, because its logic is wrong and meaningless 7. Add stream load csv test cases and more parquet broker load tests	2022-10-20 15:52:31 +08:00
TengJianPing	b5cd167713	[fix](hashjoin) fix coredump of hash join in ubsan build (#13479 ) * [fix](hashjoin) fix coredump of hash join in ubsan build	2022-10-20 10:16:19 +08:00
Ashin Gau	f7c69ade18	[feature-wip](multi-catalog) implement predicate pushdown in native OrcReader (#13453 ) # Proposed changes Implement predicate pushdown in `OrcReader` by converting doris `ColumnValueRange` to orc `SearchArgument`. ## Remaining problems 1. Orc supports `not in`, which may have an effect on the bloom filter. However, doris `ScanNode` does not push down `not in` to the file scanner. 2. Orc supports `is null`, and the row range has a `hasNull` identifier. However, `_contain_null` in `ColumnValueRange` is ambiguous. `_contain_null = true` only means that the value can be nullable, not that it equals null. 3. `DateTimeV2` loses microsecond precision in `ColumnValueRange`, which may cause filtering errors when a min-max value equals the predicate value. 4. `DateTimeV1` is not accurate enough, and is only saved to seconds. 5. Orc supports predicate pushdown of the `float&double` types, but doris does not push down `float&double` types for precision reasons.	2022-10-20 10:07:36 +08:00
xy720	f329d33666	[chore](fix) Fix some spell errors in be's comments. #13452	2022-10-20 08:56:01 +08:00
Mingyu Chen	5423de68dd	[refactor](new-scan) remove old file scan node (#13433 ) All these files are not used anymore, can be removed.	2022-10-19 14:25:32 +08:00
Ashin Gau	21f233d7e7	[feature-wip](multi-catalog) use apache orc reader to read orc file (#13404 ) Use apache orc to read orc file, and convert ColumnVectorBatch to doris block.	2022-10-18 13:47:56 +08:00
Adonis Ling	125def5102	[enhancement](macOS M1) Support building from source on macOS (M1) (#13195 ) # Proposed changes This PR fixed lots of issues when building from source on macOS with Apple M1 chip. ## ATTENTION The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime: 1. Some errors with memory tracker occur when BE (RELEASE) starts. 2. Some UT cases fail. ... Temporarily, the following changes are made on macOS to start BE successfully. 1. Disable memory tracker. 2. Use tcmalloc instead of jemalloc. This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues. ## Use case ```shell ./build.sh -j 8 --be --clean cd output/be/bin ulimit -n 60000 ./start_be.sh --daemon ``` ## Something else It takes around _10+_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.	2022-10-18 13:10:13 +08:00
Gabriel	cd3450bd9d	[Improvement](join) optimize join probing phase (#13357 )	2022-10-18 12:37:17 +08:00
Mingyu Chen	dbf71ed3be	[feature-wip](new-scan) Support stream load with csv in new scan framework (#13354 ) 1. Refactor the file reader creation in FileFactory, for simplicity. Previously, FileFactory had too many `create_file_reader` interfaces. They are now unified into two categories: the interface used by the previous BrokerScanNode, and the interface used by the new FileScanNode. Also separate the creation methods of readers that read `StreamLoadPipe` from those of other readers that read files. 2. Modify the StreamLoadPlanner on the FE side to support using ExternalFileScanNode 3. Now for the generic reader, the file reader will be created inside the reader, not passed in from the outside. 4. Add some test cases for csv stream load; the behavior is the same as the old broker scanner.	2022-10-17 23:33:41 +08:00
xy720	c114d87d13	[Enhancement](array-type) Tuple is null predicate support array type (#13307 ) Issue Number: #12689	2022-10-17 18:50:56 +08:00
Pxl	632670a49c	[Enhancement](function) refactor of date function (#13362 ) refactor of date function	2022-10-16 14:31:26 +08:00
zhangstar333	4bc33a54a1	[Fix](agg) fix bitmap agg core dump when phmap pointer assert alignment (#13381 )	2022-10-15 10:39:23 +08:00
Gabriel	8218cfed40	[Bug](function) Fix constant predicate evaluation (#13346 )	2022-10-15 01:05:29 +08:00
Gabriel	baf2689610	[Improvement](join) compute hash values by vectorized way (#13335 )	2022-10-13 16:04:58 +08:00
Gabriel	3e84c04195	[Bug](predicate) fix nullptr in scan node (#13316 )	2022-10-13 12:14:42 +08:00
Gabriel	dfe308f501	[Improvement](join) refine prefetch strategy (#13286 )	2022-10-12 19:02:06 +08:00
slothever	4fc7a048d2	[feature-wip](parquet-reader) fix string test and support decimal64 (#13184 ) 1. Refactor arguments list of parquet min max filter, pass parquet type for min max value parsing 2. Fix the filter of string min max Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-10-12 16:52:28 +08:00
Ashin Gau	bb4414e303	[feature-wip](multi-catalog) optimize parquet profile & add null map timer (#13257 ) Use indentation to make `ParquetReader`'s profile more readable Add `ParquetReader.DecodeNullMapTime` to show the time of parsing `NullMap` for `NullableColumn` ``` VFILE_SCAN_NODE (id=0):(Active: 279.62ms, % non-child: 85.83%) - FileReadBytes: 2.36 MB - FileReadCalls: 20 - FileReadTime: 5.686ms - MaxScannerThreadNum: 1 - NewlyCreateFreeBlocksNum: 125 - NumScanners: 1 - ParquetReader: 0ns - ColumnReadTime: 259.946ms - DecodeDictTime: 0ns - DecodeHeaderTime: 437.707us - DecodeLevelTime: 30.101us - DecodeNullMapTime: 53.295ms - DecodeValueTime: 62.607ms - DecompressCount: 511 - DecompressTime: 1.159ms - FilteredBytes: 0.00 - FilteredGroups: 0 - FilteredRowsByGroup: 0 - FilteredRowsByPage: 0 - ParseMetaTime: 22.517ms - ReadBytes: 2.36 MB - ReadGroups: 20 ```	2022-10-12 16:51:06 +08:00
Tiewei Fang	b7621e1615	[feature-wip](new-scan) support csv reader (#13282 ) Issue Number: close #12574 This PR adds CsvReader, which implements the GenericReader interface to support reading CSV format files.	2022-10-12 16:22:13 +08:00
Xinyi Zou	df54c6b63a	[enhancement](memtracker) Add independent and unique scanner mem tracker for each query (#13262 )	2022-10-11 19:47:12 +08:00
Gabriel	1724a91f53	[Bug](predicate) Cover all const predicates in scan node (#13238 ) For a vectorized expression which meets the condition vexpr->is_constant(), a const column is expected to be returned. But currently not all predicates over const expressions are covered. For example, for the query SELECT col FROM tbl WHERE 'PROMOTION' LIKE 'AAA%', the like predicate will return a ColumnVector which contains a single value. This PR covers all const predicates in the scan node, whether they return a const column or not.	2022-10-11 15:49:53 +08:00
Mingyu Chen	c1ce48ffe4	[fix](new-scann) scanner may be marked close twice (#13263 )	2022-10-11 15:37:15 +08:00
Pxl	bdcb600f3d	[Bug](load) fix core dump on big block load (#13014 )	2022-10-10 12:38:32 +08:00
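
The predicate pushdown described in commit f7c69ade18 relies on min/max statistics to skip data that cannot match a range predicate. The following is a minimal Python sketch of that general idea only, not the Doris or Apache ORC implementation: `ValueRange`, `StripeStats`, `may_contain` and `stripes_to_read` are hypothetical names introduced here for illustration, and integer values stand in for any ordered column type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValueRange:
    """Hypothetical closed predicate range [low, high] on one column."""
    low: Optional[int] = None   # None means unbounded below
    high: Optional[int] = None  # None means unbounded above


@dataclass
class StripeStats:
    """Hypothetical min/max statistics kept for one stripe or row group."""
    min_value: int
    max_value: int


def may_contain(rng: ValueRange, stats: StripeStats) -> bool:
    """Return False only when the stripe provably holds no matching value."""
    if rng.low is not None and stats.max_value < rng.low:
        return False  # every value in the stripe is below the range
    if rng.high is not None and stats.min_value > rng.high:
        return False  # every value in the stripe is above the range
    return True


def stripes_to_read(rng: ValueRange, all_stats: List[StripeStats]) -> List[int]:
    """Indexes of the stripes that still have to be decoded."""
    return [i for i, s in enumerate(all_stats) if may_contain(rng, s)]


if __name__ == "__main__":
    stats = [StripeStats(0, 9), StripeStats(10, 19), StripeStats(20, 29)]
    print(stripes_to_read(ValueRange(12, 15), stats))
```

The comparisons are exact at the boundaries, which is where the precision issue listed for `DateTimeV2` in that commit would matter: if the predicate value is truncated before the comparison, a stripe whose min or max equals the predicate value could be kept or skipped incorrectly.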

338 Commits