Sometimes the upstream system (e.g., Hive) may create an empty ORC file
that only has a header and footer, without a schema.
If we then call `_reader->createRowReader()` with selected columns,
it will throw `ParserError: Invalid column selected xx`.
So we first check the file's number of rows and skip such files.
This fix only applies to non-vectorized load; vectorized load uses the Arrow scanner
to read ORC files and does not have this problem.
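A minimal sketch of the guard, assuming an already-opened `orc::Reader` from the ORC C++ library; this is illustrative only and not the actual Doris scanner code:

```cpp
#include <orc/OrcFile.hh>

#include <list>
#include <memory>
#include <string>

// Illustrative guard: an empty ORC file (header + footer only, no schema)
// reports zero rows, so skip it instead of calling createRowReader(),
// which would throw "ParserError: Invalid column selected".
std::unique_ptr<orc::RowReader> create_row_reader_or_skip(
        orc::Reader* reader, const std::list<std::string>& selected_columns) {
    if (reader->getNumberOfRows() == 0) {
        return nullptr;  // caller treats nullptr as "skip this file"
    }
    orc::RowReaderOptions options;
    options.include(selected_columns);  // select only the needed columns
    return reader->createRowReader(options);
}
```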
In the function `TextConverter::write_vec_column`, the statement `nullable_column->get_null_map_data().push_back(0);` should be executed for every row.
Otherwise the null map gets out of sync with the data column and causes a core dump.
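A minimal sketch of the invariant, assuming the ClickHouse-style nullable column API used in Doris's vectorized engine; the function and include path below are illustrative, not the actual `TextConverter` implementation:

```cpp
#include "vec/columns/column_nullable.h"  // assumed include path

// Illustrative only: every row written into a nullable column must also
// append exactly one entry to the null map, so the null map and the nested
// data column always have the same length.
void write_one_value(doris::vectorized::ColumnNullable* nullable_column,
                     const char* data, size_t len, bool is_null) {
    if (is_null) {
        nullable_column->insert_default();  // appends default value and null flag 1
    } else {
        nullable_column->get_nested_column().insert_data(data, len);
        // Must run for every non-null row as well; otherwise the null map
        // is shorter than the data column and later reads crash.
        nullable_column->get_null_map_data().push_back(0);
    }
}
```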
Define a new file scan node for HMS tables in BE.
This FileScanNode differs from the broker scan node as below:
1. The broker scan node defines a src slot and a dest slot, so there are two memory copies: first from file to src slot,
then from src slot to dest slot. In contrast, FileScanNode has only one memory copy, directly from file to dest slot.
2. The broker scan node reads all the fields in the file into src slots, while FileScanNode reads only the needed fields.
3. The broker scan node reads values as string type into the src slot and then casts them to the dest slot type,
while FileScanNode produces the final type directly.
For now FileScanNode is standalone code, but we will unify file scan and broker scan in the future.
Doris on ES8 cannot work because of the mapping type change. The use of types is no longer recommended in ES7,
and support for types has been removed in ES8.
1. `/_mapping` no longer supports the `include_type_name` parameter.
2. `/_search` no longer supports specifying a type in the path.
In a vectorized scenario, the query plan will generate a new tuple for the join node.
This tuple mainly describes the output schema of the join node.
Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema.
For example:
1. The case where columns from the null side of an outer join are converted to nullable.
2. The projection of the outer tuple.
SEQ_COL is used on tables with a unique key to order data within one transaction (rowset).
When there is only one rowset and that rowset has been compacted, its rows are sorted
and rows with the same keys have already been resolved by compaction, so the scanner sets direct_mode to
optimize the read iterator, avoiding sorting and aggregating, and the iterators do not need SEQ_COL.
However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator.
SegmentIterator is then called via get_next with a block that does not contain SEQ_COL, and it
creates the columns that are in return_columns but not in the block. SEQ_COL is nullable, which SegmentIterator
does not handle, so a core dump happens.
Actually, in the above case, SegmentIterator does not need to read SEQ_COL at all.
When SEQ_COL is really needed, the iterators create the SEQ_COL column in the block,
so SegmentIterator never needs to create SEQ_COL itself.
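A minimal sketch of the idea, not the actual Doris code; all names below (`direct_mode`, `sequence_col_idx`, `num_columns`) are assumptions used purely for illustration:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: when building return_columns, leave the sequence column
// out if the read path does not actually need it (direct mode, where
// compaction has already resolved duplicate keys). SegmentIterator then never
// has to create a nullable SEQ_COL column on its own.
std::vector<size_t> build_return_columns(size_t num_columns,
                                         size_t sequence_col_idx,
                                         bool direct_mode) {
    std::vector<size_t> return_columns;
    for (size_t i = 0; i < num_columns; ++i) {
        if (direct_mode && i == sequence_col_idx) {
            continue;  // SEQ_COL is not needed in this case
        }
        return_columns.push_back(i);
    }
    return return_columns;
}
```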
Currently column `Array<T>` contains an `offsets` column and a `data` column, and the type of the `offsets` column is UInt32.
Since UInt32 offsets cap the total number of elements in the column at 2^32 - 1, calling array_union repeatedly to merge arrays may overflow the offsets.
So we need to extend the offsets type before the `Array Data Type` release.
1. Fix a memory leak: when a load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix load tasks being frequently canceled by OOM due to an inaccurate `LoadChannel` mem tracker limit, and rename the `mem limit` variables in `LoadChannel`.
3. Fix a core dump: when logging out a task mem tracker, a failed phmap erase results in the same tracker being logged out repeatedly.
4. Fix a deadlock: when the mem limit is exceeded in add_child_tracker, calling log_usage deadlocks on `_child_trackers_lock`.
5. Fix frequent log printing when the thread mem tracker limit is exceeded, which hurts readability and performance.
6. Optimize some details of the mem tracker display.
When the length of `Tuple/Block data` is greater than 2GB, serialize the protobuf request, embed the
`Tuple/Block data` in the controller attachment, and transmit it through http brpc.
This avoids the error that occurs when the length of the protobuf request exceeds 2GB:
`Bad request, error_text=[E1003]Fail to compress request`.
In #7164, `Tuple/Block data` was put into the attachment and sent via the default `baidu_std brpc`,
but when the attachment exceeds 2GB it is truncated. Sending via `http brpc` has no 2GB limit.
Also, in #7921 we considered putting `Tuple/Block data` into the attachment by default, as this theoretically
saves one serialization and improves performance. However, testing found that performance did not improve,
while peak memory increased because of an additional memory copy.
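A minimal sketch of the transport setup, not the actual Doris transmit code; the service stub and request types are omitted, and only the brpc channel/attachment handling this change relies on is shown:

```cpp
#include <brpc/channel.h>
#include <brpc/controller.h>
#include <butil/iobuf.h>

// Illustrative only: open an HTTP-protocol brpc channel and carry the large
// serialized Tuple/Block data in the controller attachment, so it is never
// part of the protobuf request body that brpc tries to compress.
bool prepare_http_attachment_call(const char* server_addr,
                                  const butil::IOBuf& serialized_block,
                                  brpc::Channel* channel,
                                  brpc::Controller* cntl) {
    brpc::ChannelOptions options;
    options.protocol = "http";  // the default baidu_std protocol truncates
                                // attachments larger than 2GB; http does not
    if (channel->Init(server_addr, &options) != 0) {
        return false;
    }
    cntl->request_attachment().append(serialized_block);
    return true;
}
```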
Only one level of array nesting is supported now.
For example:
- nullable(array(nullable(tinyint))) is **supported**.
- nullable(array(nullable(array(xx)))) is **not supported**.
1. Fix the LRU cache MemTracker consumption value being negative.
2. Fix the compaction cache MemTracker not tracking memory.
3. Add a USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hooks are never stopped.
* [Refactor][Bug-Fix][Load Vec] Refactor code of BaseScanner and the vjson/vparquet/vbroker scanners
1. fix bug that the vjson scanner does not support `range_from_file_path`
2. fix vjson/vbroker scanner core dump when the src/dest slot nullability differs
3. fix vparquet filter_block bug when the column's reference count is not 1
4. refactor the code to simplify it
This only changes vectorized load, not the original row-based load.
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
Hive and Trino/Presto automatically trim trailing spaces, but Doris doesn't.
This can cause query results to differ from Hive.
Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, when reading CSV from the broker scan node, the trailing spaces of each column will be trimmed.
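A minimal sketch of the trimming behavior (illustrative only, not the actual broker scan node code):

```cpp
#include <string_view>

// Illustrative only: strip trailing spaces from a CSV field before it is
// written to the column, mirroring what Hive and Trino/Presto do implicitly.
std::string_view trim_trailing_spaces(std::string_view value) {
    while (!value.empty() && value.back() == ' ') {
        value.remove_suffix(1);
    }
    return value;
}
```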