doris

Author	SHA1	Message	Date
Pxl	245490d6b7	[Enhancement](runtime filter) optimize for runtime filter (#12856 ) optimize for runtime filter	2022-10-09 14:11:03 +08:00
weizuo93	8b03977689	fix bug that last line of data lost for stream load when line delimiter is more than one character (#13066 )	2022-10-07 16:12:05 +08:00
Tiewei Fang	b41748efa1	[feature-wip](new-scan)Add new jdbc scanner and new jdbc scan node (#12848 ) Related pr: #11582 This pr is the new jdbc scan node and scanner.	2022-10-07 09:55:17 +08:00
Mingyu Chen	d286aa7bf7	[fix](spark-load) no need to filter row group when doing spark load (#13116 ) 1. Fix issue #13115 2. Modify the method of `get_next_block` or `GenericReader`, to return "read_rows" explicitly. Some columns in block may not be filled in reader, if the first column is not filled, use `block->rows()` can not return real row numbers. 3. Add more checks for broker load test cases.	2022-10-05 23:00:56 +08:00
Lightman	7b75c2df54	[fix](BE) fix the stream load error when upgrade BE from 1.1.2 to master (#13058 )	2022-10-05 12:13:26 +08:00
zhangstar333	e167aa120f	[fix](jdbc) fix insert into date type to oracle using wrong type (#12883 ) using JDBC insert into date type to ORACLE, it's should be use to_date function convert string to java.sql.date	2022-10-04 21:24:33 +08:00
slothever	820ec435ce	[feature-wip](parquet-reader) refactor parquet_predicate (#12896 ) This change serves the following purposes: 1. use ScanPredicate instead of TCondition for external table, it can reuse old code branch. 2. simplify and delete some useless old code 3. use ColumnValueRange to save predicate	2022-09-28 21:27:13 +08:00
Mingyu Chen	d80b7b9689	[feature-wip](new-scan) support more load situation (#12953 )	2022-09-27 21:48:32 +08:00
Adonis Ling	429ac929fb	[chore](build) Support building from source on ubuntu-22.04 (aarch64) (#12813 ) Support building from source on ubuntu-22.04	2022-09-27 10:29:13 +08:00
Pxl	8731eea26e	[Chore](clang) fix some build fail on clang15 (#12882 ) remove unused variables	2022-09-26 23:13:28 +08:00
luozenglin	0fcb93aae2	[fix](parquet) fix write error data as parquet format. (#12864 ) * [fix](parquet) fix write error data as parquet format. Fix incorrect data conversion when writing tiny int and small int data to parquet files in non-vectorized engine.	2022-09-26 10:41:17 +08:00
Tiewei Fang	acd5d67355	[feature-wip](new-scan)Add new odbc scanner and new odbc scan node (#12899 )	2022-09-26 09:24:25 +08:00
yiguolei	7b230e41a8	[bugfix](scanner) olap scanner compute is wrong (#12857 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-24 09:59:59 +08:00
Yongqiang YANG	a7d42b5d81	[fix](streamload&sink) release and allocate memory in the same tracker (#12820 ) 1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker. 2. We can assume brpc allocate memory in doris thread. Above problems leads to wrong result of memtracker.	2022-09-23 17:51:44 +08:00
zhangstar333	617820b1f5	[Refactor](parquet) refactor parquet write to uniform and consistent logic (#12730 )	2022-09-23 09:12:34 +08:00
luozenglin	d68b8cce1a	[fix](intersect) fix intersect query failed in row storage code (#12712 )	2022-09-19 11:47:50 +08:00
carlvinhust2012	fa8ed2bccc	[fix](array-type) fix the invalid format load for stream load (#12424 ) this pr is used to fix the invalid format load for stream load. before the change , we will get the error when we load the invalid array format. the origin file to load : 1 [1, 2, 3] 2 [4, 5, 6] 3 \N 4 [7, \N, 8] 5 10, 11, 12 [hugo@xafj-palo]$ sh curl_cmd.sh { "TxnId": 11035, "Label": "11c9f111-188e-4616-9a50-aec8b7814513", "TwoPhaseCommit": "false", "Status": "Fail", "Message": "Array does not start with '[' character, found '1'", "NumberTotalRows": 0, "NumberLoadedRows": 0, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 55, "LoadTimeMs": 7, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 2, "ReadDataTimeMs": 0, "WriteDataTimeMs": 3, "CommitAndPublishTimeMs": 0 } 3. after this change, we will get success and the error url which report the error line. [hugo@xafj-palo]$ sh curl_cmd.sh { "TxnId": 11046, "Label": "249808ee-55f4-4c08-b671-b3d82689d614", "TwoPhaseCommit": "false", "Status": "Success", "Message": "OK", "NumberTotalRows": 5, "NumberLoadedRows": 4, "NumberFilteredRows": 1, "NumberUnselectedRows": 0, "LoadBytes": 55, "LoadTimeMs": 39, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 2, "ReadDataTimeMs": 0, "WriteDataTimeMs": 19, "CommitAndPublishTimeMs": 16, "ErrorURL": "http://10.81.85.89:8502/api/_load_error_log?file=__shard_3/error_log_insert_stmt_8d4130f0c18aeb0a-ad7ffd4233c41893_8d4130f0c18aeb0a_ad7ffd4233c41893" } the sql select result: MySQL [example_db]> select * from array_test06; +------+--------------+ \| k1 \| k2 \| +------+--------------+ \| 1 \| [1, 2, 3] \| \| 2 \| [4, 5, 6] \| \| 3 \| NULL \| \| 4 \| [7, NULL, 8] \| +------+--------------+ 4 rows in set (0.019 sec) the url page show us: "Reason: Invalid format for array column(k2). src line [10, 11, 12]; " Issue Number: #7570	2022-09-19 08:52:59 +08:00
luozenglin	3030a3606a	[fix](load) fix stream load fail when setting strict mode (#12684 )	2022-09-17 17:02:11 +08:00
zhangstar333	22a8d35999	[Feature](vectorized) support jdbc sink for insert into data to table (#12534 )	2022-09-15 11:08:41 +08:00
Mingyu Chen	c5ad989065	[refactor](reader) refactor the interface of file reader (#12574 ) Currently, Doris has a variety of readers for different file formats, such as parquet reader, orc reader, csv reader, json reader and so on. The interfaces of these readers are not unified, which makes it impossible to call them through a unified method. In this PR, I added a `GenericReader` interface class, and other Readers will implement this interface class to use the `get_next_block()` method. This PR currently only modifies `arrow_reader` and `parquet reader`. Other readers will be modified one by one in subsequent PRs.	2022-09-14 22:31:11 +08:00
HappenLee	d913ca5731	[Opt](vectorized) Speed up bucket shuffle join hash compute (#12407 ) * [Opt](vectorized) Speed up bucket shuffle join hash compute	2022-09-13 20:19:22 +08:00
AlexYue	58508aea13	[enhance](information_schema) show hll type and bitmap type instead of unknown (#12519 ) Before this pr, when querying data type of hll/bitmap column, 'unknown' would be returned instead of the correct data type of queried column.	2022-09-13 19:43:42 +08:00
Jibing-Li	dc80a993bc	[feature-wip](new-scan) New load scanner. (#12275 ) Related pr: https://github.com/apache/doris/pull/11582 https://github.com/apache/doris/pull/12048 Using new file scan node and new scheduling framework to do the load job, replace the old broker scan node. The load part (Be part) is work in progress. Query part (Fe) has been tested using tpch benchmark. Please review only the FE code in this pr, BE code has been disabled by enable_new_load_scan_node configuration. Will send another pr soon to fix be side code.	2022-09-13 13:36:34 +08:00
zhengyu	569ab30556	[bug](NodeChannel) fix OOM caused by pending queue in sink send (#12359 ) (#12362 ) Each NodeChannel has its own queue, with size up to 1/20 exec_mem_limit. User will crash into OOM if set exec_mem_limit high. This commit uses fixed number to control the total max memory used by NodeChannels. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-09-07 20:49:08 +08:00
zhangstar333	42bdde8750	[Feature](Vectorized) support jdbc scan node (#12010 )	2022-09-07 10:29:41 +08:00
Mingyu Chen	893567628e	[fix](exec-node) fix nullptr of runtime state (#12395 ) Remove default nullptr runtime state, which is very error-prone	2022-09-07 08:46:42 +08:00
camby	b8cc576cba	[fix](array-type) add data valid check for ARRAY type while insert or load (#12283 ) Add data valid check for ARRAY type while insert or load Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-06 20:48:58 +08:00
Xinyi Zou	e175a7ed63	[fix](memtracker) Fix the exceeded limit of the first query execution (#12332 ) In some cases, when the user executes the query for the first time, an error of the exceeded mem limit will be reported, and the query will be successful only after the second execution. This is because when the query is executed for the first time, the memory consumed by adding the page cache and other caches is recorded in the query mem tracker, hoping to unify the behavior of multiple queries. A temporary solution, remove the hook of scanner thread, test clickbench q13 Before removing the scanner thread hook Enable page cache: 3G for the first query, 3G for the tracker; 900M for the second query, 900M for the tracker. Turn off page cache: 1.9G for the first query, 1.9G for the tracker; 900M for the second query, 900M for the tracker After removing the scanner thread hook and fix MemTrackerLimiter::cache_consume_local bug Enable page cache: 2916M for the first query, 1147M for the tracker; 979M for the second query, 1144M for the tracker Turn off page cache: 1809M for the first query, 1147M for the tracker; 975M for the second query, 1145M for the tracker TODO, a better solution is to track storage-related memory separately, in the scanner thread. Otherwise, it is impossible to know where the process memory grows when querying.	2022-09-05 19:22:46 +08:00
Gabriel	90fb3b7783	[Improvement](load) accelerate tablet sink (#12174 )	2022-09-01 10:08:09 +08:00
Jibing-Li	ec4863b63a	[feature-wip](new-scan)Add new file scan node (#12048 ) Related pr: #11582 This is the new file scan node and scanner for external hms catalog.	2022-09-01 10:01:20 +08:00
HappenLee	573e5476dd	[Opt](load) Speed up the vectorized load (#12146 ) * [Opt](load) Speed up the vectorized load	2022-08-31 16:23:36 +08:00
Kikyou1997	9a74ad1702	[feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842 ) We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE. This PR: - add projections on each ExecNode in BE - translate PhysicalProject into projections on PlanNode in FE - do column prune on ScanNode in FE Co-authored-by: HappenLee <happenlee@hotmail.com>	2022-08-30 16:17:10 +08:00
Mingyu Chen	a16cf0e2c8	[feature-wip](scan) add profile for new olap scan node (#12042 ) Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner. Fix some blocking bug of new scan framework. TODO: Memtracker Opentelemetry spen The new framework is still disabled by default, so it will not effect other feature.	2022-08-30 10:55:48 +08:00
Xinyi Zou	8370115cf6	[enhancement](memtracker) Improve performance of tracking real physical memory of PODArray #12168	2022-08-30 10:22:12 +08:00
yiguolei	2f192019d3	[bugfix](delete hanlder) delete predicate is merged and could not find schema cause core dump (#12161 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-30 09:18:21 +08:00
Ashin Gau	dec576a991	[feature-wip](parquet-reader) generate null values and NullMap for parquet column (#12115 ) Generate null values and NullMap for the nullable column by analyzing the definition levels.	2022-08-29 09:30:32 +08:00
yiguolei	ccff3f5711	[bugfix](light weight schema change) support delete condition in schema change (#11869 ) * [bugfix](light weight schema change) support delete condition in schema change Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-26 11:45:55 +08:00
camby	0f4a1e811b	[Enhancement](table_function) table function node enhancement (#12038 ) * table function node enhancement * also avoid copy for non-vec table function node * fix table function node output slots calculation while lateral view involves subquery Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-26 10:37:15 +08:00
Xinyi Zou	1fc5515a78	[enhancement](memory) Remove unused reservation tracker (#11969 )	2022-08-24 08:49:34 +08:00
Mingyu Chen	05da3d947f	[feature-wip](new-scan) add scanner scheduling framework (#11582 ) There are currently many types of ScanNodes in Doris. And most of the logic of these ScanNodes is the same, including: Runtime filter Predicate pushdown Scanner generation and scheduling So I intend to unify the common logic of all ScanNodes. Different data sources only need to implement different Scanners for data access. So that the future optimization for scan can be applied to the scan of all data sources, while also reducing the code duplication. This PR mainly adds 4 new class: VScanner All Scanners' parent class. The subclasses can inherit this class to implement specific data access methods. VScanNode The unified ScanNode, and is responsible for common logic including RuntimeFilter, predicate pushdown, Scanner generation and scheduling. ScannerContext ScannerContext is responsible for recording the execution status of a group of Scanners corresponding to a ScanNode. Including how many scanners are being scheduled, and maintaining a producer-consumer blocks queue between scanners and scan nodes. ScannerContext is also the scheduling unit of ScannerScheduler. ScannerScheduler schedules a ScannerContext at a time, and submits the Scanners to the scanner thread pool for data scanning. ScannerScheduler Unified responsible for all Scanner scheduling tasks Test: This work is still in progress and default is disabled. I tested it with jmeter with 50 concurrency, but currently the scanner is just return without data. The QPS can reach about 9000. I can't compare it to origin implement because no data is read for now. I will test it when new olap scanner is ready. Co-authored-by: morningman <morningman@apache.org>	2022-08-23 08:45:18 +08:00
Xinyi Zou	92cef580f3	[enhancement](memory) Reduce virtual memory used by PaddedPODArray (#11816 )	2022-08-22 11:33:07 +08:00
Ashin Gau	6d925054de	[feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845 ) 1. Spark can set the timestamp precision by the following configuration: spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision. 2. If using DECIMAL V2, the BE saves the value as decimal128, and keeps the precision of decimal as (precision=27, scale=9). DECIMAL V3 can maintain the right precision of decimal	2022-08-22 10:15:35 +08:00
Xinyi Zou	b1fd701493	[fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947 )	2022-08-22 08:56:05 +08:00
Xinyi Zou	5eb5444476	[fix](memtracker) Remove useless memory exceed check #11939	2022-08-22 08:40:19 +08:00
Pxl	64dc3b360f	[Bug](function) fix dcheck fail on close vexpr ctx (#11908 )	2022-08-19 19:11:10 +08:00
Xinyi Zou	fcae979798	[fix](memtracker) Fix PartitionedAggregationNode DCHECK when mem exceed limit (#11902 )	2022-08-19 09:56:49 +08:00
Yongqiang YANG	8eb9ac3b04	[impovement](sink) print load_id when sink fails (#11893 )	2022-08-19 08:48:02 +08:00
Xinyi Zou	b300b4faa0	[enhancement](memtracker) Optimize readability of mem exceed limit error message #11877	2022-08-18 14:39:41 +08:00
Stalary	e1a1a04c2f	[Enhancement](Doe) Be query es use fe generate dsl. (#11840 )	2022-08-18 10:31:17 +08:00
HappenLee	582be130dd	[Feature] (ODBC) support read/write emoji of utf16 via odbc table (#11863 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 09:09:02 +08:00

1 2 3 4 5 ...

689 Commits