Reuse compression ctx and buffer.
Use one global context instance for each compression algorithm, and use a
thread-safe buffer pool to reuse compression buffers. The pool size equals the
maximum number of threads compressing in parallel, so it will not grow too large.
Tests show this feature speeds up data import and compaction by about 5%.
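A minimal sketch of what such a thread-safe buffer pool can look like (names and types here are illustrative, not the actual Doris implementation):
```
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Buffers are handed out to compressing threads and returned afterwards,
// so the pool never holds more buffers than the number of threads that
// compress in parallel.
class CompressionBufferPool {
public:
    std::unique_ptr<std::string> acquire() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_buffers.empty()) {
            return std::make_unique<std::string>();
        }
        auto buf = std::move(_buffers.back());
        _buffers.pop_back();
        return buf;
    }

    void release(std::unique_ptr<std::string> buf) {
        buf->clear(); // keep the allocated capacity, drop the contents
        std::lock_guard<std::mutex> lock(_mutex);
        _buffers.push_back(std::move(buf));
    }

private:
    std::mutex _mutex;
    std::vector<std::unique_ptr<std::string>> _buffers;
};
```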
Co-authored-by: yixiutt <yixiu@selectdb.com>
Store the offset rather than the length in the file for array-type data. The new file format improves seek performance; see #12246 for the performance report.
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Currently, Doris has a variety of readers for different file formats,
such as parquet reader, orc reader, csv reader, json reader and so on.
The interfaces of these readers are not unified, which makes it impossible to call them through a single method.
In this PR, I added a `GenericReader` interface class; the other readers will implement this interface
and expose the `get_next_block()` method.
This PR currently only modifies `arrow_reader` and the parquet reader.
The other readers will be migrated one by one in subsequent PRs.
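A rough sketch of the unified interface (the exact signature is an assumption; `Status` and `vectorized::Block` refer to the existing Doris types):
```
namespace vectorized { class Block; } // Doris row-batch type (assumed)
class Status;                         // Doris status type (assumed)

// Common interface implemented by every file reader (parquet, orc, csv,
// json, ...). Callers only depend on GenericReader and do not need to
// know which file format is behind it.
class GenericReader {
public:
    virtual ~GenericReader() = default;
    // Fill the next batch of rows into `block`; set `*eof` when the file
    // is exhausted.
    virtual Status get_next_block(vectorized::Block* block, bool* eof) = 0;
};
```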
At the compute level, CHAR values have their trailing padding zeros trimmed.
To keep the logic consistent with CHAR, we also trim them for ARRAY and nested ARRAY<ARRAY> types.
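For illustration, trimming the trailing padding zeros of a CHAR value can look like this (a hypothetical helper, not the actual Doris code):
```
#include <string_view>

// Drop the trailing '\0' padding of a fixed-length CHAR value, so that a
// CHAR element inside an ARRAY compares equal to the same value stored in
// a plain CHAR column.
std::string_view shrink_char_padding(std::string_view value) {
    auto len = value.size();
    while (len > 0 && value[len - 1] == '\0') {
        --len;
    }
    return value.substr(0, len);
}
```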
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
For debugging purposes:
Add session variable skip_storage_engine_merge: when set to true, tables using the aggregate key and unique key models are read as if they used the duplicate key model.
Add session variable skip_delete_predicate: when set to true, rows deleted with a DELETE statement are still returned by queries.
Related PRs:
https://github.com/apache/doris/pull/11582
https://github.com/apache/doris/pull/12048
Use the new file scan node and the new scheduling framework for load jobs, replacing the old broker scan node.
The load part (BE side) is a work in progress. The query part (FE side) has been tested with the TPC-H benchmark.
Please review only the FE code in this PR; the BE code is disabled by the enable_new_load_scan_node configuration. Another PR will follow soon to finish the BE-side code.
This PR introduces a new enum type `PushDownType`:
```
enum class PushDownType {
    // The predicate can not be pushed down to the data source.
    UNACCEPTABLE,
    // The predicate can be pushed down to the data source,
    // and the data source can fully evaluate it.
    ACCEPTABLE,
    // The predicate can be pushed down to the data source,
    // but the data source can not fully evaluate it.
    PARTIAL_ACCEPTABLE
};
```
A derived class of VScanNode can override the following methods to determine whether to accept
a binary / in / function / bloom filter / is-null predicate:
```
PushDownType _should_push_down_binary_predicate();
PushDownType _should_push_down_in_predicate();
PushDownType _should_push_down_function_filter();
PushDownType _should_push_down_bloom_filter();
PushDownType _should_push_down_is_null_predicate();
```
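For example, a derived scan node might override these hooks as sketched below (the stub base class is simplified, and the real methods take parameters describing the predicate being considered):
```
// Simplified stand-in for the real VScanNode.
class VScanNode {
public:
    virtual ~VScanNode() = default;
    virtual PushDownType _should_push_down_binary_predicate() { return PushDownType::UNACCEPTABLE; }
    virtual PushDownType _should_push_down_bloom_filter() { return PushDownType::UNACCEPTABLE; }
};

class MyScanNode : public VScanNode {
public:
    PushDownType _should_push_down_binary_predicate() override {
        // This data source can evaluate binary predicates completely on its own.
        return PushDownType::ACCEPTABLE;
    }
    PushDownType _should_push_down_bloom_filter() override {
        // A bloom filter only prunes data and may keep false positives,
        // so the predicate must still be re-evaluated above the scan.
        return PushDownType::PARTIAL_ACCEPTABLE;
    }
};
```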
When the FE sends a cancel RPC to the BE, it does not notify the thread blocked in wait_for_start(), so the fragment stays blocked and occupies an execution thread.
Add a maximum wait time to wait_for_start() so that it does not block forever.
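A minimal sketch of the bounded wait (the names here are hypothetical, not the actual fragment-execution code):
```
#include <chrono>
#include <condition_variable>
#include <mutex>

struct FragmentStartGate {
    std::mutex mutex;
    std::condition_variable cv;
    bool started = false;

    // Returns true once the fragment is started; returns false after
    // max_wait, so the execution thread is released instead of blocking
    // forever when the cancel RPC never wakes it up.
    bool wait_for_start(std::chrono::seconds max_wait) {
        std::unique_lock<std::mutex> lock(mutex);
        return cv.wait_for(lock, max_wait, [this] { return started; });
    }
};
```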
When selecting a large amount of data from a table, the profile shows:
- ScannerCtxSchedCount: 2.82664M (2826640)
But ScannerSchedCount is only 8, because most of the scanners are busy running.
After this improvement, ScannerCtxSchedCount is reduced to only 10.
Reading a parquet file with many columns (>1600) failed:
mysql> select int_col from types_sf100_r100w limit 5;
ERROR 1105 (HY000): errCode = 2, detailMessage = Couldn't deserialize thrift msg:
TProtocolException: Invalid data
parse_thrift_footer uses a fixed-length buffer (64 KB) to read the parquet footer, but the metadata of a parquet file with 1600 columns can exceed 5 MB.
Therefore, the buffer needs to be allocated according to the actual footer length.
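A sketch of sizing the buffer from the file itself (a hypothetical helper, not the actual parse_thrift_footer): a parquet file ends with a 4-byte little-endian metadata length followed by the 4-byte magic "PAR1", so the footer buffer can be allocated to exactly that length.
```
#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

std::vector<char> read_parquet_footer(std::ifstream& file, uint64_t file_size) {
    // Read the trailing 8 bytes: [metadata length (little-endian)] ["PAR1"].
    char tail[8];
    file.seekg(static_cast<std::streamoff>(file_size - 8));
    file.read(tail, 8);

    uint32_t meta_len = 0;
    std::memcpy(&meta_len, tail, 4); // assumes a little-endian host

    // Allocate exactly the footer length instead of a fixed 64 KB buffer.
    std::vector<char> footer(meta_len);
    file.seekg(static_cast<std::streamoff>(file_size - 8 - meta_len));
    file.read(footer.data(), meta_len);
    return footer;
}
```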
* [fix](threadpool) thread pool scheduling does not work correctly on a concurrent token
Assume there is a concurrent thread token whose concurrency is 2, and the 1st
task submitted on the token is dispatched to the thread pool while the 2nd is not
dispatched because the pool is busy. The token's active_threads is then 1, and the
thread pool never schedules the token again.
The patch fixes the problem.
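The corrected scheduling condition can be sketched like this (a hypothetical illustration, not the actual Doris ThreadPool code):
```
#include <cstddef>

// A CONCURRENT token must be considered schedulable whenever it still has
// queued tasks and spare concurrency, not only when it has no active
// threads; otherwise the situation above (active_threads == 1, one task
// still queued) is never rescheduled.
bool token_is_schedulable(std::size_t queued_tasks, int active_threads, int max_concurrency) {
    return queued_tasks > 0 && active_threads < max_concurrency;
}
```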
We already separated the array Offset64 and the string Offset (32-bit) in PR #12341.
Now we restrict their use: Offset only inside IColumn, Offset64 only inside ColumnArray, to avoid misuse.
If the wrong one is used, compilation fails.
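For illustration, making the two offset kinds distinct types (rather than plain integer typedefs) is what turns misuse into a compile error; the definitions below are a sketch, not the actual Doris code:
```
#include <cstdint>
#include <vector>

struct Offset   { uint32_t value; };  // 32-bit string offsets, used inside IColumn
struct Offset64 { uint64_t value; };  // 64-bit array offsets, used only inside ColumnArray

using Offsets   = std::vector<Offset>;
using Offsets64 = std::vector<Offset64>;

void append_array_offset(Offsets64& offsets, Offset64 next) {
    offsets.push_back(next);
}

// append_array_offset(string_offsets, Offset{10}); // does not compile: wrong offset kind
```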
* Use pooled connections and enlarge the brpc connection timeout and retry count
When a connection failure happens, Doris fails the queries using that connection.
We should lower the impact of a connection failure by using pooled connections
and enlarging the connection timeout and retry count.
* clang format
## Fix five bugs:
1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tried to parse the dictionary data before creating the compression codec, causing unexpected data errors.
2. `FE` doesn't resolve the array type.
3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned.
4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, ending the scanner early and resulting in data loss.
5. Typographical error in `PageReader`.
When the load channel is canceled, the memtracker does not subtract the memory released by the load channel. This will cause the memory usage counted by the memtracker of the load channel mgr to be larger than the actual memory usage.
Each NodeChannel has its own queue, whose size can grow up to 1/20 of exec_mem_limit.
Users can run into OOM if exec_mem_limit is set high. This commit uses a
fixed number to cap the total memory used by all NodeChannels.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>