Commit Graph

546 Commits

Author SHA1 Message Date
0683181fef [API changed](parser) Remove merge join syntax (#9795)
Remove merge join sql and merge join node
2022-05-30 09:04:21 +08:00
a96b41db7a [Improvement] Simplify expressions for _vconjunct_ctx_ptr (#9816) 2022-05-29 23:05:21 +08:00
Pxl
f33ef32d92 [Bug] [Bitmap] change to_bitmap to always_not_nullable (#9716) 2022-05-28 17:33:55 +08:00
cbbda7857b [feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541) 2022-05-26 21:39:12 +08:00
f4dd3bf013 [bugfix] fix memleak in olapscannode(#9736) 2022-05-26 15:06:54 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix Lru Cache MemTracker consumption value is negative.
2. Fix compaction Cache MemTracker has no track.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
31e40191a8 [Refactor] add vpre_filter_expr for vectorized to improve performance (#9508) 2022-05-22 11:45:57 +08:00
61a60d1dcc [code style] minor update for code style (#9695) 2022-05-20 11:47:49 +08:00
8fa677b59c [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666)
* [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner
1. fix bug of vjson scanner not support `range_from_file_path`
2. fix bug of vjson/vbrocker scanner core dump by src/dest slot nullable is different
3. fix bug of vparquest filter_block reference of column in not 1
4. refactor code to simple all the code

It only changed vectorized load, not original row based load.

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 11:43:03 +08:00
5fa6e892be [fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190)
Hive and trino/presto would automatically trim the trailing spaces but Doris doesn't.
This would cause different query result with hive.

Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, when reading csv from broker scan node, it will trim the tailing space of the column
2022-05-20 09:55:13 +08:00
ef65f484df [Enhancement] improve parquet reader via arrow's prefetch and multi thread (#9472)
* add ArrowReaderProperties to parquet::arrow::FileReader

* support perfecth batch
2022-05-19 23:52:01 +08:00
73c4ec7167 Fix some typos in be/. (#9681) 2022-05-19 20:55:39 +08:00
Pxl
26353ba8b5 [clang build]fix clang compile error (#9615) 2022-05-18 07:42:31 +08:00
bee5c2f8aa [feature-wip](parquet-vec) Support parquet scanner in vectorized engine (#9433) 2022-05-17 09:37:17 +08:00
cd105bee0a [refactor](es) Clean es tcp scannode and related thrift definitions (#9553)
PaloExternalSourcesService is designed for es_scan_node using tcp protocol.
But es tcp protocol need deploy a tcp jar into es code. Both es version and lucene version are upgraded,
and the tcp jar is not maintained any more.

So that I remove all the related code and thrift definitions.
2022-05-14 10:03:55 +08:00
b817efd652 [feature] add vectorized vjson_scanner (#9311)
This pr is used to add the vectorized vjson_scanner, which can support vectorized json import in stream load flow.
2022-05-14 09:50:05 +08:00
375c1bf5c0 [feature](mysql-table) support utf8mb4 for mysql external table (#9402)
This patch supports utf8mb4 for mysql external table.

if someone needs a mysql external table with utf8mb4 charset, but only support charset utf8 right now.

When create mysql external table, it can add an optional propertiy "charset" which can set character fom mysql connection, 
default value is "utf8". You can set "utf8mb4" instead of "utf8" when you need.
2022-05-11 09:39:23 +08:00
718a51a388 [refactor][style] Use clang-format to sort includes (#9483) 2022-05-10 21:25:35 +08:00
b34ed43ec9 [feature-wip] (memory tracker) (step6, End) Fix some details (#9301)
1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track.
2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability.
3. More powerful MemTracker debugging capabilities.
4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%.
5. Fix some other details.
2022-05-10 18:17:09 +08:00
e61d296486 [Refactor] Replace '#ifndef' with '#pragma once' (#9456)
* Replace '#ifndef' with '#pragma once'
2022-05-10 09:25:59 +08:00
eec1dfde3a [feature] (vec) instead of converting line to src tuple for stream load in vectorized. (#9314)
Co-authored-by: xiepengcheng01 <xiepengcheng01@xafj-palo-rpm64.xafj.baidu.com>
2022-05-09 11:24:07 +08:00
6834fb23ca [fix](s3) fix s3 Temp file may write failed because of has no space on disk (#9421) 2022-05-09 09:28:43 +08:00
a9831f87f2 [refactor]refactor lazy materialized (#8834)
[refactor]refactor lazy materialized (#8834)
2022-05-06 19:16:35 +08:00
832338c55e [improvement] set name for scanner threads and fix compile error in clang (#9336) 2022-05-05 09:53:43 +08:00
Pxl
b1870faddd [Bug] [Build] fix clang build fail (#9323)
* fix clang compile fail
2022-05-02 18:04:57 +08:00
Pxl
d2374dbd5e [fix](Lateral-View) fix outer combinator not work on non-vectorized (#9212) 2022-05-01 22:09:50 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
d330bc3806 [Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280)
Implement vectorized stream load.
Added fe configuration option `enable_vectorized_load` to enable vectorized stream load.

    Co-authored-by: tengjp@outlook.com
    Co-authored-by: mrhhsg@gmail.com
    Co-authored-by: minghong.zhou@163.com
    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
2022-04-29 09:50:51 +08:00
26bc462e1c [feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query (#9145)
1. fix track bthread
- Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS).
- This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker.
Ref: 731730da85/docs/en/server.md (bthread-local)

2. fix track vectorized query
- Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
2022-04-27 20:34:02 +08:00
47a59c7fe6 [fix](OlapScanner)fix bitmap or hll's OOM when loading too many unqualified data (#9205) 2022-04-26 10:25:56 +08:00
7cfebd05fd [fix](hierarchical-storage) Fix bug that storage medium property change back to SSD (#9158)
1. fix bug described in #9159
2. fix a `fill_tuple` bug introduced from #9173
2022-04-26 10:15:19 +08:00
c3d0fee01b [fix](broker load) sync the workflow of BrokerScanner to other Scanner to avoid oom (#9173) 2022-04-25 10:01:42 +08:00
ae680b4248 [UDF] support RPC udaf part 1: support create RPC udaf in fe (#8510) 2022-04-21 17:38:58 +08:00
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
51db4e54c0 [fix](table-function) Fix bug of table function with outer join cause nullptr of tuple (#9041) 2022-04-18 19:35:26 +08:00
f3dce9a6c1 [fix](planner) fix is-null predicate in where statement cannot be pushed down to the storage layer (#9035) 2022-04-18 19:35:02 +08:00
afce993ca7 [feature](load)(csv) CSV import and export support header (#8765)
- Add two new types to stream load boker load: **csv_with_names** and **csv_with_name_sand_types**
- Add two new types to export: **csv_with_names** and **csv_with_names_and_types**
2022-04-18 15:29:18 +08:00
9051ed7c7d Revert "[Refactor] remove some useless code (#8976)" (#9074)
This reverts commit de7dce4df84fcbfbbaf715cbac151e802321f80f.
Reverts apache/incubator-doris#8976
This cause BE ut failed: sh run-be-ut.sh --run --filter OlapTableSinkTest.*

```
==62008==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffff36867c0 in thread T0
```
2022-04-18 12:01:14 +08:00
de7dce4df8 [Refactor] remove some useless code (#8976) 2022-04-18 09:55:54 +08:00
5e95d99925 [fix](load) fix bug of infinite loop in orc scanner (#9007)
When encounter unqualified data, orc scanner may not be able
to quit correctly.
2022-04-14 11:46:48 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
2022-04-14 11:43:49 +08:00
8765881d8b [fix](load) wait _send_batch_thread_pool_token rather than shutdown. (#8970)
We can not shutdown _send_batch_thread_pool_token, because _packet_in_flight
has to be clear finally. Otherwise a never ended join on rpc would happen.

It is difficult to handle concurrent problem if a flag setter is not guaranteed to run.
2022-04-14 10:05:14 +08:00
5881e8fdc6 [refactor] use c++ 14 deprecated instaed of comment, this detect usage of deprecated var or func at compile time (#8439) 2022-04-13 10:19:04 +08:00
290366787c [refactor] refactor code, replace some file with stl libs (#8759)
1. replace ConditionVariables with std::condition_variable
2. repalace Mutex with std::mutex
3. repalce MonoTime with std::chrono
2022-04-13 09:55:29 +08:00
66d2f4e1fd [fix][mem tracker] Fix MemTracker null pointer in vectorized (#8925)
Fix ThreadMemTrackerMgr::update_tracker null pointer and some details.

Issue Number: close #8920
2022-04-12 10:17:10 +08:00
6ed59bb98b [refactor](code_style) remove useless inline #8933
1.Member functions defined in a class are inline by default (implicitly), and do not need to be added
2.inline is a keyword used for implementation, which has no effect when placed before the function declaration
2022-04-10 18:29:55 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
d51545a952 [fix](ut)(memory-leak) Fix be asan ut failed and hdfs file reader memory leak (#8905) 2022-04-08 00:07:00 +08:00
98cab78320 [refactor](schema_hash) remove schema_hash since every tablet id in be is unique (#8574) 2022-04-07 08:37:45 +08:00
f90a1a1919 [fix](ut)(compile) Fix ut failure at functions_geo and compilation bug (#8843) 2022-04-05 21:30:40 +08:00