1. Add a timer to count the time the transfer thread waits for the scanner thread to return a row batch.
2. Add a timer to count the time the scanner thread waits for available worker threads in the thread pool.
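A minimal sketch of how such wait-time counters could be wired up, using std::chrono instead of Doris's actual RuntimeProfile counters; the class and member names here are illustrative assumptions:
```
#include <atomic>
#include <chrono>

// Illustrative only: accumulate nanoseconds spent waiting into an atomic counter,
// which can later be reported in the query profile.
class WaitTimer {
public:
    explicit WaitTimer(std::atomic<int64_t>& total_ns)
            : _total_ns(total_ns), _start(std::chrono::steady_clock::now()) {}
    ~WaitTimer() {
        auto elapsed = std::chrono::steady_clock::now() - _start;
        _total_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    }
private:
    std::atomic<int64_t>& _total_ns;
    std::chrono::steady_clock::time_point _start;
};

// Hypothetical usage in the transfer thread:
// {
//     WaitTimer t(_wait_for_batch_ns);   // time spent waiting for a row batch
//     row_batch = wait_for_row_batch();
// }
```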
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
* Update fe-idea-dev.md
Use `brew install thrift@0.9` to install thrift 0.9.3.1.
`brew edit thrift090 | head` shows that thrift@0.9 uses thrift 0.9.3.1.
* [Refactor] Remove the unnecessary if statement
`Future<?> submit(Runnable task)`
Submits a Runnable task for execution and returns a Future representing that task. The Future's get method will return null upon successful completion.
* Fix null type
* add comment
Co-authored-by: tanhao <tanhao.0902@bytedance.com>
* [doris-1008] Support backup and restore directly to cloud storage via the AWS S3 protocol
* [Internal][S3DirectAccess] Support backup, restore, load, and export connecting directly to S3
1. Support loading and exporting data from/to S3 directly.
2. Add a config to automatically convert broker access to S3 access when available.
Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7
(cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201)
* [Internal][S3DirectAccess] File path glob compatible with broker
Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68
(cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f)
* [internal] [doris-1008] fix log4j class not found
Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3
(cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232)
* add poms
Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.
This CL modifies the related locks so that LoadChannel can process these requests in parallel.
In a test loading 20 GB (334 million rows) on 3 nodes, the load time dropped from 9 min to 5 min,
and with a concurrency of 2 enabled it dropped further to 3 min.
This CL also modifies the profile of the load job.
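A simplified sketch of the locking change, assuming the channel previously held one channel-wide mutex across the whole add-batch path and now only holds it long enough to look up the per-index channel; the class names are illustrative, not the exact Doris classes:
```
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

struct Batch {};  // stand-in for the real row batch type

class TabletsChannelSketch {
public:
    void add_batch(const Batch& batch) {
        std::lock_guard<std::mutex> l(_lock);  // per-index lock, independent of other indexes
        // ... write rows of `batch` to the tablets of this index ...
        (void)batch;
    }
private:
    std::mutex _lock;
};

class LoadChannelSketch {
public:
    void add_batch(int64_t index_id, const Batch& batch) {
        std::shared_ptr<TabletsChannelSketch> channel;
        {
            std::lock_guard<std::mutex> l(_lock);  // short critical section: map lookup only
            auto& slot = _tablets_channels[index_id];
            if (!slot) slot = std::make_shared<TabletsChannelSketch>();
            channel = slot;
        }
        channel->add_batch(batch);  // heavy work runs outside the channel-wide lock,
                                    // so requests for different indexes proceed in parallel
    }
private:
    std::mutex _lock;
    std::map<int64_t, std::shared_ptr<TabletsChannelSketch>> _tablets_channels;
};
```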
1. Add BlockColumnPredicate to support OR and AND column predicates in RowBlockV2.
2. Support evaluating vectorized delete predicates in the storage engine (SegmentV2) instead of in the Reader.
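Roughly, the composite predicate can be pictured as below; this is an illustrative sketch over a selection vector of row ids, not the actual Doris class hierarchy:
```
#include <cstdint>
#include <memory>
#include <set>
#include <vector>

struct BlockColumnPredicateSketch {
    virtual ~BlockColumnPredicateSketch() = default;
    // Returns the subset of `selected` row ids that pass this predicate.
    virtual std::vector<uint16_t> evaluate(const std::vector<uint16_t>& selected) const = 0;
};

struct AndPredicateSketch : BlockColumnPredicateSketch {
    std::vector<std::unique_ptr<BlockColumnPredicateSketch>> children;
    std::vector<uint16_t> evaluate(const std::vector<uint16_t>& selected) const override {
        std::vector<uint16_t> result = selected;
        for (const auto& child : children) {
            result = child->evaluate(result);  // each child further narrows the selection
        }
        return result;
    }
};

struct OrPredicateSketch : BlockColumnPredicateSketch {
    std::vector<std::unique_ptr<BlockColumnPredicateSketch>> children;
    std::vector<uint16_t> evaluate(const std::vector<uint16_t>& selected) const override {
        std::set<uint16_t> kept;  // rows passing any child are kept
        for (const auto& child : children) {
            for (uint16_t row : child->evaluate(selected)) kept.insert(row);
        }
        return {kept.begin(), kept.end()};  // row ids stay in ascending order
    }
};
```
A delete predicate with several conditions can then be represented as one AND node over per-column leaf predicates and evaluated directly on the block.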
Support conditional filtering of original data in broker load and routine load
For example:
```
LOAD LABEL `label1`
(
DATA INFILE ('bos://cmy-repo/1.csv')
INTO TABLE tbl2
COLUMNS TERMINATED BY '\t'
(event_day, product_id, ocpc_stage, user_id)
SET (
ocpc_stage = ocpc_stage + 100
)
PRECEDING FILTER user_id = 1381035
WHERE ocpc_stage > 30
)
...
```
At present, the use of VLOG in the code is quite confusing.
It inherits the VLOG_XX format from Impala, and there is also the VLOG(number) format.
The VLOG(number) format has no unified specification, so this PR standardizes the use of VLOG.
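For illustration, the idea is to define named verbosity macros on top of glog's VLOG and use them at call sites instead of scattering bare VLOG(number) calls; the macro names and level values below are assumptions, not necessarily the ones adopted:
```
#include <glog/logging.h>

// Assumed names/levels: one place defines what each verbosity level means.
#define VLOG_CRITICAL VLOG(1)
#define VLOG_NOTICE   VLOG(3)
#define VLOG_DEBUG    VLOG(7)
#define VLOG_TRACE    VLOG(10)

void example() {
    VLOG_NOTICE << "tablet meta loaded";        // instead of VLOG(3) << ...
    VLOG_TRACE << "row written, keys=" << 42;   // instead of VLOG(10) << ...
}
```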
ThreadSanitizer, aka TSAN, is a useful tool to detect multi-threading
problems, such as data races and mutex misuse.
We should detect TSAN problems for Doris BE; both the unit tests and
the server should pass in TSAN mode, to make Doris more robust.
This is the very first patch to fix TSAN problems, and some
difficult problems are suppressed in the file 'tsan_suppressions'; you
can suppress these problems by setting:
export TSAN_OPTIONS="suppressions=tsan_suppressions"
before running:
`BUILD_TYPE=tsan ./run-be-ut.sh --run`
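As a quick illustration of the kind of problem TSAN flags, the unsynchronized counter below is reported as "WARNING: ThreadSanitizer: data race" when built in TSAN mode:
```
#include <thread>

// Data race: both threads write `counter` without synchronization.
int counter = 0;

int main() {
    std::thread t1([] { for (int i = 0; i < 100000; ++i) ++counter; });
    std::thread t2([] { for (int i = 0; i < 100000; ++i) ++counter; });
    t1.join();
    t2.join();
    return 0;
}
```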
With the ignore_eovercrowded flag added, `PTabletWriterAddBatchRequest`
will not fail on `EOVERCROWDED`, avoiding load job failures caused by this error.
It only affects the NodeChannel (i.e. the load job); other RPC requests still check whether the server is overcrowded.
Scanner threads may still be running and using member variables of OlapScanNode
after the OlapScanNode has been destroyed.
We can make `_running_thread` the last member variable each scanner thread accesses,
and have `transfer_thread` wait for `_running_thread == 0`.
After `transfer_thread` is joined, `OlapScanNode::close()` can continue.
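A minimal sketch of this shutdown ordering, with illustrative names standing in for the real OlapScanNode members:
```
#include <atomic>
#include <chrono>
#include <thread>

// Illustrative sketch: each scanner thread holds the counter while it may still touch
// the scan node, and the transfer thread only returns once the counter hits zero,
// so close() never tears down members a scanner thread is still reading.
std::atomic<int> _running_thread{0};

void scanner_thread(/* OlapScanNode* node */) {
    _running_thread++;   // registered before touching any member of the node
    // ... scan rowsets, push row batches to the transfer queue ...
    _running_thread--;   // the very last access to the node's members
}

void transfer_thread() {
    // ... collect row batches until scanning is done ...
    while (_running_thread.load() != 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // wait for scanners to detach
    }
}

// OlapScanNode::close() joins transfer_thread and only then releases member state.
```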
Add a fuzzy_parse flag: if every object in the JSON file has the same keys in the same order, only the first row's keys need to be parsed; subsequent rows can read values by position instead of by key lookup.
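The idea, sketched with plain standard-library types rather than the BE's actual JSON reader (the names here are hypothetical):
```
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative sketch of fuzzy_parse: the key -> column-index mapping is built once
// from the first JSON object; later objects are assumed to share the same key order,
// so the i-th value goes straight to column key_to_column[i] without a key lookup.
struct FuzzyParseState {
    std::vector<int> key_to_column;  // position in the JSON object -> column index
    bool initialized = false;
};

void parse_object(const std::vector<std::pair<std::string, std::string>>& members,
                  const std::unordered_map<std::string, int>& schema,
                  FuzzyParseState* state,
                  std::vector<std::string>* row) {
    if (!state->initialized) {
        // First row: resolve every key against the table schema once.
        for (const auto& member : members) {
            state->key_to_column.push_back(schema.at(member.first));
        }
        state->initialized = true;
    }
    // All rows: write values by position, no per-key hashing or lookup.
    for (size_t i = 0; i < members.size(); ++i) {
        (*row)[state->key_to_column[i]] = members[i].second;
    }
}
```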
In the previous implementation, whether a subtask ended in the commit or the abort state,
we would try to update the job progress, such as the consumed Kafka offset.
Under normal circumstances, an aborted transaction does not consume any data
and all progress is 0, so even if we update the progress, it remains unchanged.
However, when the cluster load is high, a subtask may fail halfway through execution on the BE side.
In that case, although the task is aborted, part of the progress is still updated,
causing the next subtask to skip this data during consumption and resulting in data loss.
Introduced by PR #5051.
As @liutang123 said, when PlanFragmentExecutor is destructed, it calls
`close -> ExecNode::close -> OlapScanNode::close`. OlapScanNode waits for `_transfer_thread`,
and `_transfer_thread` waits for all OlapScanners to finish processing.
An OlapScanner is processed by a scanner thread. When the last scanner finishes,
`_transfer_thread` breaks out of the loop, and PlanFragmentExecutor continues to destruct.
Once it is destructed, its RuntimeProfile::Counter is destructed as well.
At this point, the ScopedTimer in the scan thread may still use this Counter when it is destructed.
So we must make sure that the timer is destructed before the runtime profile.
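The fix boils down to C++ destruction order: members are destroyed in reverse declaration order, so the counter must be declared before (and therefore destroyed after) the timer that touches it. A simplified illustration with stand-in types:
```
#include <cstdint>

// Simplified stand-ins for RuntimeProfile::Counter and ScopedTimer.
struct Counter { int64_t value = 0; };

struct ScopedTimerSketch {
    explicit ScopedTimerSketch(Counter* c) : counter(c) {}
    ~ScopedTimerSketch() { counter->value += 1; }  // touches the counter on destruction
    Counter* counter;
};

struct ScannerSketch {
    // Declared first => destroyed last: the counter is still alive when the timer's
    // destructor runs. Reversing these two declarations reproduces the use-after-free.
    Counter read_timer_counter;
    ScopedTimerSketch read_timer{&read_timer_counter};
};
```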
Also refactor ColumnRangeValue and OlapScanNode.
This patch mainly does the following:
- Fix issue #5071
- Make type_min in ColumnRangeValue static
- Add a type_limit class to make the code clearer (see the sketch after this list)
- Refactor the normalize_in_and_eq_predicate function
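A hedged sketch of what such a type_limit helper could look like; the exact specializations used in Doris may differ:
```
#include <limits>

// Illustrative type_limit: one template answers "what is the min/max value of this
// column type?" so predicate-normalization code no longer hard-codes per-type constants.
template <typename T>
struct type_limit {
    static T min() { return std::numeric_limits<T>::lowest(); }
    static T max() { return std::numeric_limits<T>::max(); }
};

// Types without a numeric_limits definition (e.g. a date or decimal wrapper)
// would get explicit specializations instead.
```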
issue:#5031
1. Support ODBC Sink for inserting data into ODBC external tables.
2. Support transactions for the ODBC sink to make sure inserted data is atomic.
3. Update the documentation for the ODBC sink.
The close method of OlapTableSink may be called twice.
In the open_internal() method of plan_fragment_executor, close is called once;
if an error occurs in that call, it is called again in fragment_mgr.
So here we use a flag to prevent repeated close operations.
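A minimal sketch of the guard, with illustrative class and member names:
```
#include <mutex>

// Illustrative sketch: the first close() does the real work; any later call
// (e.g. from fragment_mgr after open_internal() already closed the sink) is a no-op.
class OlapTableSinkSketch {
public:
    void close() {
        std::lock_guard<std::mutex> l(_close_lock);
        if (_is_closed) {
            return;          // already closed by an earlier caller
        }
        _is_closed = true;
        // ... flush node channels, release resources ...
    }
private:
    std::mutex _close_lock;
    bool _is_closed = false;
};
```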
Co-authored-by: morningman <chenmingyu@baidu.com>
1. Add a metadata table 'statistics' to store index information.
2. In the header information returned to MySQL, return the data type length according to the actual type.
When a Parquet file contains a `Map/List/Struct` structure, Doris cannot recognize the column correctly
and throws the exception 'Invalid column: xxxx', which means Doris cannot find the column.
The `Map` structure is recognized as two columns: `key` and `value`.
The following is the schema of a Parquet file as recognized by Doris. This patch tries to solve this problem.
When a user tries to load Parquet files into Doris with a path like `hdfs://hadoop/user/data/date=20201024/*`,
but the path actually contains some non-Parquet files, the error thrown is
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.`.
If the error message included the file name, we could quickly locate the problem.
Therefore, this patch tries to add the file name to the error message.
A large number of small segment files leads to low efficiency for scan operations.
Multiple small files can be merged into a large file by a compaction operation,
so we can take the tablet scan frequency into consideration when selecting a tablet for compaction
and preferentially compact those tablets that have been scanned frequently in the recent period.
Borrowing from Kudu's compaction strategy, the scan frequency of a tablet over a recent
period can be calculated and taken into consideration when calculating the compaction score.
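Conceptually, the adjusted score could look like the sketch below; the weighting and normalization here are illustrative assumptions, not the exact formula adopted:
```
#include <cstdint>

// Illustrative only: blend the usual "how much is there to merge" score with a
// normalized recent-scan frequency, so frequently scanned tablets are compacted first.
double adjusted_compaction_score(double base_score,
                                 int64_t scans_in_recent_window,
                                 int64_t max_scans_among_tablets,
                                 double scan_weight /* e.g. 0.5 */) {
    double scan_freq = max_scans_among_tablets > 0
            ? static_cast<double>(scans_in_recent_window) / max_scans_among_tablets
            : 0.0;
    return base_score * (1.0 + scan_weight * scan_freq);
}
```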
Mainly includes:
- `OLAP_SCAN_NODE` profile layering: `OLAP_SCAN_NODE`, `OlapScanner`, and `SegmentIterator`.
- Delete meaningless statistical values, mainly in scan_node.cpp.
- Add a `RowsConditionsFiltered` statistic, split out from `RowsDelFiltered`; it is the number of rows filtered by the various column indexes, and only exists in segment V2.
- Update the document based on the above and enhance readability.