doris

Author	SHA1	Message	Date
Mingyu Chen	f98ec06783	[feature-wip](new-scan) Add memtracker and span for new olap scan node (#12281 ) Add memtracker and span for new olap scan node	2022-09-09 09:39:08 +08:00
Ashin Gau	b4663062da	[feature-wip](parquet-reader) bug fix, parquet footer buffer is small when containing many columns (#12477 ) Failed when reading parquet file with many columns(>1600). mysql> select int_col from types_sf100_r100w limit 5; ERROR 1105 (HY000): errCode = 2, detailMessage = Couldn't deserialize thrift msg: TProtocolException: Invalid data parse_thrift_footer uses fixed length buffer(=64k) to read parquet footer, but the meta data of a parquet file with 1600 columns can exceed 5MB. Therefore, the buffer size needs to be applied according to the actual length.	2022-09-09 09:12:34 +08:00
Ashin Gau	3c4c4b1a87	[feature-wip](parquet-reader) add gzip compression codec (#12488 ) Query failed when reading parquet data compressed by GZIP: mysql> select * from customer limit 1; ERROR 1105 (HY000): errCode = 2, detailMessage = unknown compression type(GZIP)	2022-09-09 09:10:25 +08:00
zhengyu	22dec46f48	[fix](vectorized load) fix incomplete errmsg when find partition failed (#12485 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-09-09 09:03:06 +08:00
yinzhijian	2ccbbb5392	[fix](stream load) Fix wrong conversion of null value when vstream load json format (#12460 )	2022-09-08 16:48:35 +08:00
Jerry Hu	14221adbbd	[fix](agg) crash caused by failure of prepare (#12437 )	2022-09-08 15:03:45 +08:00
Yongqiang YANG	c3af60eff8	[fix](threadpool) threadpool schedules does not work right on concurr… (#12370 ) * [fix](threadpool) threadpool schedules does not work right on concurrent token Assuming there is a concurrent thread token whose concurrency is 2, and the 1st submit on the token is submitted to threadpool while the 2nd is not submitted due to busy. The token's active_threads is 1, then thread pool does not schedule the token. The patch fixes the problem.	2022-09-08 14:54:46 +08:00
camby	26cf2d3742	[enhancement](array-type) avoid abuse of Offset and Offset64 #12378 We already separated Array Offset64 and String Offset (32-bit) in PR #12341. Now we limit Offset to inside IColumn and Offset64 to inside ColumnArray only, to avoid abuse of them. Using the wrong one is a compile error.	2022-09-08 14:53:07 +08:00
Yongqiang YANG	53b619c487	[brpc]using pooled connection and enlarge brpc connection timeout and retry… (#10443 ) * using pooled connection and enlarge brpc connection timeout and retry times When a connection failure happens, Doris fails the queries using that connection. We should lower the impact of a connection failure by using pooled connections and enlarging the connection timeout and retry times. * clang format	2022-09-08 14:50:15 +08:00
zxealous	af0f4584d5	fix cache cleaner (#12432 )	2022-09-08 13:31:19 +08:00
yixiutt	2a64571bef	[enhancement](generic_iterator) fix num check and add some notes (#12434 ) Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-08 12:09:02 +08:00
Ashin Gau	dd2f834c79	[feature-wip](parquet-reader) bug fix, create compress codec before parsing dictionary (#12422 ) Fix five bugs: 1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tries to parse dictionary data before creating the compression codec, causing unexpected data errors. 2. `FE` doesn't resolve array type 3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned 4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, ending the scanner early and resulting in data loss 5. typographical error in `PageReader`	2022-09-08 09:54:25 +08:00
Luwei	d40a9d0555	[fix](memtracker) Fix memtracker did not subtract the memory released by load channel cancel (#12405 ) When the load channel is canceled, the memtracker does not subtract the memory released by the load channel. This will cause the memory usage counted by the memtracker of the load channel mgr to be larger than the actual memory usage.	2022-09-08 09:22:11 +08:00
Gabriel	41bc6b857d	[refactor](shuffle) remove unused code (#12442 )	2022-09-08 09:15:25 +08:00
yixiutt	018b4b7e1e	[bugfix](report) fix continuous version miss check (#12415 ) Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-08 08:39:22 +08:00
yixiutt	e7aa131506	[enhancement](tcmalloc) add aggressive_memory_decommit conf and make it disable (#12436 ) Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-08 08:37:16 +08:00
Gabriel	86e347f3bb	[Bug](doe) fix closing scanner twice (#12408 )	2022-09-07 22:45:30 +08:00
zhengyu	569ab30556	[bug](NodeChannel) fix OOM caused by pending queue in sink send (#12359 ) (#12362 ) Each NodeChannel has its own queue, with size up to 1/20 exec_mem_limit. Users will run into OOM if exec_mem_limit is set high. This commit uses a fixed number to control the total max memory used by NodeChannels. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-09-07 20:49:08 +08:00
yongjinhou	09b45f2b71	[Function](ELT)Add elt function (#12321 )	2022-09-07 15:21:08 +08:00
Gabriel	449d0c219f	[Improvement](sort) Accumulate blocks to do partial sort (#12336 )	2022-09-07 10:34:28 +08:00
zhangstar333	42bdde8750	[Feature](Vectorized) support jdbc scan node (#12010 )	2022-09-07 10:29:41 +08:00
HappenLee	54d1630c42	[Opt](vectorized) speed up hash function compute in hash partition (#12334 ) After optimizing the hash function, the time to compute siphash in HASH_PARTITION in vdata_stream_sender drops from 1s800ms to 800ms.	2022-09-07 10:11:40 +08:00
zxealous	e4b894a318	[Bug](remote) Fix BE crash because of call the future's get method twice (#12357 ) call the future's get method once and save it.	2022-09-07 10:11:27 +08:00
zhengyu	445f0882d1	[Enhancement](log) improve error msg for delta writer fail (#12121 ) (#12360 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-09-07 10:10:51 +08:00
Jerry Hu	3485dfa927	[chore](profile) add some counters in aggregation & sender (#12385 )	2022-09-07 10:09:05 +08:00
Gabriel	922b04fdc1	[Improvement](vectorized) change `static_cast` to `assert_cast` for reference (#12379 ) * [Improvement](vectorized) change `static_cast` to `assert_cast` for reference	2022-09-07 09:27:13 +08:00
Mingyu Chen	893567628e	[fix](exec-node) fix nullptr of runtime state (#12395 ) Remove default nullptr runtime state, which is very error-prone	2022-09-07 08:46:42 +08:00
camby	b8cc576cba	[fix](array-type) add data valid check for ARRAY type while insert or load (#12283 ) Add data valid check for ARRAY type while insert or load Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-06 20:48:58 +08:00
slothever	4a55b504c0	[feature-wip](parquet-reader) bug fix, get the correct group reader (#12294 ) Fix the failure to read the lineitem table of TPCH, and a memory allocation error Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-09-06 13:59:35 +08:00
camby	cf5d194fe1	[enhancement](array-type) Split Array Offsets and String Offsets (#12341 ) In old Doris versions, string offsets are 32-bit, which is not enough for the Array type. If we change string offsets from 32-bit to 64-bit, problems arise when BEs are upgraded one by one, because 32-bit and 64-bit string offsets would exist at the same time. As a result, we separate the code for Array offsets. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-06 11:18:27 +08:00
HappenLee	b8e38b9167	[Bug](load) block call clear_column_data may have ref not equal 1 (#12350 )	2022-09-05 20:40:40 +08:00
Xinyi Zou	e175a7ed63	[fix](memtracker) Fix the exceeded limit of the first query execution (#12332 ) In some cases, the first execution of a query fails with a mem-limit-exceeded error, and the query succeeds only on the second execution. This is because, on the first execution, the memory consumed by adding the page cache and other caches is recorded in the query mem tracker, hoping to unify the behavior of multiple queries. As a temporary solution, remove the hook of the scanner thread. Tested with clickbench q13. Before removing the scanner thread hook: with page cache enabled, 3G for the first query, 3G for the tracker; 900M for the second query, 900M for the tracker. With page cache turned off, 1.9G for the first query, 1.9G for the tracker; 900M for the second query, 900M for the tracker. After removing the scanner thread hook and fixing the MemTrackerLimiter::cache_consume_local bug: with page cache enabled, 2916M for the first query, 1147M for the tracker; 979M for the second query, 1144M for the tracker. With page cache turned off, 1809M for the first query, 1147M for the tracker; 975M for the second query, 1145M for the tracker. TODO: a better solution is to track storage-related memory separately, in the scanner thread. Otherwise, it is impossible to know where the process memory grows when querying.	2022-09-05 19:22:46 +08:00
Xinyi Zou	05f6e1b33d	[fix](memtracker) Fix open query profile to print the complete mem limit exceed log #12339	2022-09-05 19:21:43 +08:00
zhannngchen	38937c15d7	[typo](streamload) fix typo and remove useless method declaration #12343	2022-09-05 19:16:36 +08:00
Adonis Ling	8bfb89c100	[feature-wip](array-type) Add some regression tests for nested array (#12322 ) #11392 made _input_block in each BetaRowsetReaders sharable. However, for some types (e.g. nested array with more than 1 depth), the _column_vector_batches in RowBlockV2 can be nested which means that there is a ColumnVectorBatch inside another ColumnVectorBatch. In this case, the data of inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to the _output_block.	2022-09-05 14:05:24 +08:00
Jerry Hu	7b352c93ff	[improvement](sink) avoid frequent allocation and deallocation when serializing block (#12310 )	2022-09-05 12:23:43 +08:00
TaoZex	7929500608	[typo](docs)The table_function calling reset() function should set _eos to false #12323	2022-09-05 08:29:19 +08:00
morrySnow	7f10fa9768	[fix](compile)compile error when use clang on aarch64 platform (#12319 )	2022-09-05 08:28:51 +08:00
Gabriel	d5e5afe437	[Bug](function) disable LUT for yearweek (#12324 )	2022-09-05 08:27:43 +08:00
xy720	62561834a8	[Feature](array-type) Support is-null-predicate for array type (#12237 )	2022-09-03 11:37:57 +08:00
xy720	e7303c12c7	[Enhancement](array-type) Support Floating/Decimal type for array aggregation functions (#12271 )	2022-09-03 09:55:56 +08:00
Pxl	a8c8ebf5cf	[Enhancement](compaction) empty string optimize for binary dict code (#12259 ) Improve performance of writing empty strings.	2022-09-02 14:25:19 +08:00
Ashin Gau	202ad5c659	[feature-wip](parquet-reader) bug fix, the number of rows are different among columns in a block (#12228 ) 1. `ExprContext` is deleted in `ParquetReader::close()`, but it has not been closed, so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node, so we should not delete its pointer in `ParquetReader::close()`. 2. `RowGroupReader::next_batch` updates `_read_rows` in every column loop, and does not ensure that the number of rows in every column is equal. 3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.	2022-09-02 09:50:25 +08:00
Mingyu Chen	3ce305134a	[fix](scan) fix potential wrong cancel when sql has limit (#12224 )	2022-09-01 19:11:40 +08:00
Gabriel	3bcab8bbef	[feature](function) support now/current_timestamp functions with precision (#12219 ) * [feature](function) support now/current_timestamp functions with precision	2022-09-01 14:35:12 +08:00
pengxiangyu	c5481dfdf7	[fix](remote)Fix bug for Segment::open() in case: config::file_cache_type (#12249 ) * fix bug for Segment::open() in case: config::file_cache_type * fix bug for Segment::open() in case: config::file_cache_type	2022-09-01 14:16:41 +08:00
TengJianPing	f294d33332	[bugfix](index) index page should not be bitshuffle decoded (#12231 ) * [bugfix](index) index page should not be bitshuffle decoded * minor change	2022-09-01 11:56:44 +08:00
camby	fc05d54f0d	[fix](array-type) array_sort function with empty input #12175 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-01 10:54:09 +08:00
HappenLee	8c8078ad28	[fix](projections) get error row_descriptor when have projections on ExecNode (#12232 ) When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.	2022-09-01 10:48:10 +08:00
yixiutt	60a2fa7dea	[Improvement](compaction) copy row in batch in VCollectIterator&VGenericIterator (#12214 ) In VCollectIterator&VGenericIterator, use insert_range_from to copy continuous rows in a block as a batch, which saves CPU cost. If rows in rowsets and segments are non-overlapping, this improves compaction throughput by 30%. If rows are completely overlapping, such as when loading the same file twice, the throughput is nearly the same as before. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-01 10:20:17 +08:00

2743 Commits