Commit Graph

585 Commits

Author SHA1 Message Date
cff9ffa0e1 fix the inaccurate comments (#10617)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-06 17:54:43 +08:00
a7df6e3dee rename some files inside vec/sink dir (#10636)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-06 17:52:47 +08:00
8e364fb848 [fix](load) skip empty orc file (#10593)
Something the upstream system(eg, hive) may create empty orc file
which only has a header and footer, without schema.
And if we call `_reader->createRowReader()` with selected columns,
it will throw ParserError: Invalid column selected xx.
So here we first check its number of rows and skip these kind of files.

This is only a fix for non-vec load, for vec load, it use arrow scanner
to read orc file, which does not have this problem.
2022-07-05 22:18:56 +08:00
a2f74bf260 [Improvement] remove profile with poor readability (#10581) 2022-07-05 11:09:23 +08:00
73ba806046 [feature-wip](multi-catalog) Add catalog to information_schema table "columns". (#10592) 2022-07-05 09:57:19 +08:00
46bff6bba0 [fix](multi-catalog) fix the core dump on hms table (#10573)
In the funciton `TextConverter::write_vec_column`, it should execute the statement `nullable_column->get_null_map_data().push_back(0);` for every row.
Otherwise the null map will get error and cause the core dump.
2022-07-04 15:52:05 +08:00
9d4a9b95a4 [Build] fix the compile error with clang (#10570)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-04 11:13:17 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
97996c9275 [fix](Insert) fix 5 concurrent "insert...select..." OOM (#10501)
* [hotfix](dev-1.0.1) 5 concurrent insert...select... OOM

Co-authored-by: minghong <minghong.zhou@163.com>
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-07-01 15:29:26 +08:00
4ec6e3ee81 [refactor] Remove debug action since it is never used. (#10484)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-29 20:37:51 +08:00
abd10f0f3e [feature-wip](multi-catalog) Impl FileScanNode in be (#10402)
Define a new file scanner node for hms table in be.
This file scanner node is different from broker scan node as blow:
1. Broker scan node will define src slot and dest slot, there is two memory copy in it: first is from file to src slot
    and second from src to dest slot. Otherwise FileScanNode only have one stemp memory copy just from file to dest slot.
2. Broker scan node will read all the filed in the file to src slot and FileScanNode only read the need filed.
3. Broker scan node will convert type into string type for src slot and then use cast to convert to dest slot type,
    but FileScanNode will have the final type.

Now FileScanNode is a standalone code, but we will uniform the file scan and broker scan in the feature.
2022-06-29 11:04:01 +08:00
17eb8c00d3 [feature] add table valued function framework and numbers table valued function (#10214) 2022-06-28 14:01:57 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
79ad05eec6 [fix](doe) fix doe on es v8 (#10391)
doris on es8 can not work, because type change. The use of type is no longer recommended in es7,
and support for type has been removed from es8.

1. /_mapping not support include_type_name
2. /_search not support use type
2022-06-26 09:51:29 +08:00
eebfbd0c91 Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323)" (#10424)
This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.
2022-06-25 22:18:08 +08:00
7fe4b20da3 [feature-wip](multi-catalog) refactor catalog interface (#10320) 2022-06-25 21:51:54 +08:00
8abd00dcd5 [feature-wip](multi-catalog) Add catalog name to information schema. (#10349)
Information schema database need to show catalog name after multi-catalog is supported.
This part is step 1, add catalog name for schemata table.
2022-06-25 11:53:04 +08:00
2cc670dba6 [fix](vectorized) Support outer join for vectorized exec engine (#10323)
In a vectorized scenario, the query plan will generate a new tuple for the join node.
This tuple mainly describes the output schema of the join node.
Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema.
For example:
1. The case where the null side column caused by outer join is converted to nullable.
2. The projection of the outer tuple.
2022-06-24 08:59:30 +08:00
139cd3d11a [Improvement] remove olap filters when use in key ranges (#10278) 2022-06-23 09:12:29 +08:00
274a0f2603 [fix] do not read seq column when reading a compacted rowset (#10344)
SEQ_COL is used on tables with unique key to order data in one transaction(rowset),
when there is only one rowset and the rowset is compacted, rows in the rowset is sorted 
and rows with same keys are resolved by compaction, so a scanner sets direct_mode to 
optimize read iterator to avoid sorting and aggregating, and iterators does not need SEQ_COL. 
However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator.
Then segment Iterator would be called via get_next with a block without SEQ_COL, segment iterator 
creates columns included in return_columns but not in the block. SEQ_COL is nullable, segment Iterator 
does not handle it, so a core dump happen.

Actually, in the above case, segment iterator does not need to read SEQ_COL. 
When SEQ_COL is really needed, iterators creates SEQ_COL column in block,
so segment Iterator does not need do create SEQ_COL at all.
2022-06-23 08:44:43 +08:00
0e404edf54 [improvement] Change array offset type from UInt32 to UInt64 (#10070)
Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now.
If we call array_union to merge arrays repeatedly, the size of array may overflow.
So we need to extend it before `Array Data Type` release.
2022-06-19 10:24:08 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
75a7e72402 [Refactor] Use iequal to replace boost::iequals (#10146)
* [Refactor] Use iequal to replace boost::iequals

* remove unused include
2022-06-16 18:18:38 +08:00
28e8effc52 [Refactor] Refactor vectorized scan node (#9968) 2022-06-16 11:10:56 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
85362a907e [fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072)
1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`.
3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker.
4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock.
5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance.
6. Optimize some details of mem tracker display.
2022-06-14 21:38:37 +08:00
2a96d7ffde [spell] Fix spell error in row_batch.h (#10109) 2022-06-14 15:28:29 +08:00
d4d2e82bdf [typo] Fix typos in comments (#10106) 2022-06-14 08:17:19 +08:00
d58e00c49c [fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803)
When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the
`Tuple/Block data` into the controller attachment and transmit it through http brpc.

This is to avoid errors when the length of the protoBuf request exceeds 2G:
`Bad request, error_text=[E1003]Fail to compress request`.

In #7164, `Tuple/Block data` was put into attachment and sent via default `baidu_std brpc`,
but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending via `http brpc`.

Also, in #7921, consider putting `Tuple/Block data` into attachment transport by default, as this theoretically
reduces one serialization and improves performance. However, the test found that the performance did not improve,
but the memory peak increased due to the addition of a memory copy.
2022-06-13 20:41:48 +08:00
e0cf2677a0 [dependency][enhancement] support build libhdfs in arm cpus (#10018)
Supports native hdfs functionality on arm cpu
This pr mainly upgrades libdfs3 and supports running on arm,and make libhdfs3 with kerberos as default
2022-06-10 19:40:41 +08:00
1220cc147d [feature](vectorized) Support outfile on vectorized engine (#10013)
This PR supports output csv format file on vectorized engine.

** Parquet is still not supported. **
2022-06-10 09:15:53 +08:00
19bc14cf8d [feature-wip](array-type) Add array type support for vectorized parquet-orc scanner (#9856)
Only support one level array now.
for example:
- nullable(array(nullable(tinyint))) is **support**.
- nullable(array(nullable(array(xx))) is **not support**.
2022-06-09 12:11:47 +08:00
94089b9192 [Refactor] Use file factory to replace create file reader/writer (#9505)
1. Simplify code logic and improve abstraction
2. Fix the mem leak of raw pointer

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-08 15:07:39 +08:00
35c3e4e33c [Bug] runtime filter is not used as expected (#10001)
* [Bug] runtime filter is not used as expected

* update
2022-06-08 11:10:39 +08:00
c426c2e4b1 [Vectorized-Load] Support vectorized load table with materialized view (#9923)
* [Vectorized-Load] Support vectorized load table with materialized view

* fix ut

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-02 14:59:01 +08:00
Pxl
d34d631519 [bugfix]fix TableFunctionNode memory leak (#9853) 2022-05-31 19:20:22 +08:00
7199102d7c [Opt][VecLoad] Opt the vec stream load performance (#9772)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-31 11:53:32 +08:00
f377c26bf7 [refactor][be] Optimize headers (#9708) 2022-05-30 16:12:10 +08:00
4af2493c42 [Improvement] optimize scannode concurrency query performance in vectorized engine. (#9792) 2022-05-30 16:04:40 +08:00
0683181fef [API changed](parser) Remove merge join syntax (#9795)
Remove merge join sql and merge join node
2022-05-30 09:04:21 +08:00
a96b41db7a [Improvement] Simplify expressions for _vconjunct_ctx_ptr (#9816) 2022-05-29 23:05:21 +08:00
Pxl
f33ef32d92 [Bug] [Bitmap] change to_bitmap to always_not_nullable (#9716) 2022-05-28 17:33:55 +08:00
cbbda7857b [feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541) 2022-05-26 21:39:12 +08:00
f4dd3bf013 [bugfix] fix memleak in olapscannode(#9736) 2022-05-26 15:06:54 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix Lru Cache MemTracker consumption value is negative.
2. Fix compaction Cache MemTracker has no track.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
31e40191a8 [Refactor] add vpre_filter_expr for vectorized to improve performance (#9508) 2022-05-22 11:45:57 +08:00
61a60d1dcc [code style] minor update for code style (#9695) 2022-05-20 11:47:49 +08:00
8fa677b59c [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666)
* [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner
1. fix bug of vjson scanner not support `range_from_file_path`
2. fix bug of vjson/vbrocker scanner core dump by src/dest slot nullable is different
3. fix bug of vparquest filter_block reference of column in not 1
4. refactor code to simple all the code

It only changed vectorized load, not original row based load.

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 11:43:03 +08:00
5fa6e892be [fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190)
Hive and trino/presto would automatically trim the trailing spaces but Doris doesn't.
This would cause different query result with hive.

Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, when reading csv from broker scan node, it will trim the tailing space of the column
2022-05-20 09:55:13 +08:00
ef65f484df [Enhancement] improve parquet reader via arrow's prefetch and multi thread (#9472)
* add ArrowReaderProperties to parquet::arrow::FileReader

* support perfecth batch
2022-05-19 23:52:01 +08:00