doris

Author	SHA1	Message	Date
starocean999	c4341d3d43	[fix](like)prevent null pointer by unimplemented like_vec functions (#12910 ) * [fix](like)prevent null pointer by unimplemented like_vec functions * fix pushed like predicate on dict encoded column bug	2022-09-27 10:02:10 +08:00
pengxiangyu	e040dccbec	[fix](remote)fix bug for delete s3 dir and list s3 dir (#12918 ) * fix bug for delete s3 dir and list s3 dir	2022-09-27 09:54:37 +08:00
Adonis Ling	72b909b5e8	[enhancement](workflow) Enable the shellcheck workflow to comment the PRs (#12633 ) > Due to the dangers inherent to automatic processing of PRs, GitHub’s standard pull_request workflow trigger by default prevents write permissions and secrets access to the target repository. However, in some scenarios such access is needed to properly process the PR. To this end the pull_request_target workflow trigger was introduced. According to the article [Keeping your GitHub Actions and workflows secure](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/), the `pull_request` trigger condition in `shellcheck.yml` can't comment on the PR because the workflow lacks write permissions. Although the `ShellCheck` workflow checks out the source, it doesn't build or test the source code. I think it is safe to change the trigger condition from `pull_request` to `pull_request_target`, which gives the workflow the write permissions it needs to comment on the PR.	2022-09-27 09:08:12 +08:00
Xinyi Zou	b14b178928	[enhancement](memory) Trigger load channel flush based on process physical memory to avoid OOM #12960 When the physical memory of the process reaches 90% of the mem limit, trigger the load channel mgr to flush. The default value of be.conf mem_limit is changed from 90% to 80%, with stability as the priority. Fix a deadlock involving arena_locks in BufferPool::BufferAllocator::ScavengeBuffers and _lock in DebugString.	2022-09-27 09:07:38 +08:00
TengJianPing	df9dcba6db	[regression-case](improve) improve regression test case (#12979 )	2022-09-27 08:53:53 +08:00
wxy	c4b6d4d839	[enhancement](AuditLoaderPlugin): add audit queue capacity configurat… (#12887 )	2022-09-27 08:50:30 +08:00
Pxl	12d6efa92b	[Bug](function) fix substr return null on row-based engine #12906	2022-09-27 08:47:32 +08:00
Xiaocc	5790d23624	[fix](transfer_thread) fix the loss of notification. (#12988 )	2022-09-27 08:44:02 +08:00
Pxl	8731eea26e	[Chore](clang) fix some build fail on clang15 (#12882 ) remove unused variables	2022-09-26 23:13:28 +08:00
zxealous	595a5337dc	fix doc typos (#12967 )	2022-09-26 20:11:26 +08:00
Shane	35076431ab	[fix](column)fix get_shrinked_column misspell (#12961 ) Fix misspell	2022-09-26 17:32:03 +08:00
shee	7977bebfed	[feature](Nereids) constant expression folding (#12151 )	2022-09-26 17:16:23 +08:00
DingGeGe	3902b2bfad	[refactor](fe-core src test catalog): refactor and replace use NIO #12818 (#12818 )	2022-09-26 16:51:46 +08:00
TengJianPing	1bb42a7bc0	[function](hash) add support of murmur_hash3_64 (#12923 )	2022-09-26 14:23:37 +08:00
Xinyi Zou	72220440dc	[fix](memtracker) Remove mem tracker record mem pool actual memory usage #12954 To avoid inconsistent mem tracker consumption values across multiple queries/loads, and to account for the difference between the virtual memory allocated and the physical memory the process actually gains, memory allocated in PODArray and mempool is not recorded in the query/load mem tracker immediately, but is recorded gradually as the memory is used. However, the mem pool allocates memory from the chunk allocator. If a chunk is being reused, it may already be backed by physical memory. The above mechanism therefore causes the load channel memory statistics to be lower than the actual value.	2022-09-26 12:54:06 +08:00
zy-kkk	9afa3cdb19	Optimized materialized view documentation (#12798 ) Optimized materialized view documentation	2022-09-26 12:25:20 +08:00
caoliang-web	18433d7105	Spark load import kerberos parameter modification (#12924 ) Spark load import kerberos parameter modification	2022-09-26 12:24:43 +08:00
minghong	c809a21993	[feature](nereids) extract single table expression for push down (#12894 ) In TPCH q7, we have an expression like (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE'). This expression implies (n1.n_name = 'FRANCE' or n1.n_name = 'GERMANY'). The implied expression is logically redundant, but it can reduce the number of tuples output by scan(n1) if nereids pushes it down. This pr introduces a RULE to extract such expressions. NOTE: 1. we only extract expressions on a single table. 2. if the extracted expression cannot be pushed down, e.g. it is on the right table of a left outer join, we need another rule to remove all the useless expressions.	2022-09-26 11:19:37 +08:00
luozenglin	0fcb93aae2	[fix](parquet) fix write error data as parquet format. (#12864 ) * [fix](parquet) fix write error data as parquet format. Fix incorrect data conversion when writing tiny int and small int data to parquet files in non-vectorized engine.	2022-09-26 10:41:17 +08:00
jiafeng.zhang	9c03deb150	[fix](log)Audit log status is incorrect (#12824 ) Audit log status is incorrect	2022-09-26 09:57:52 +08:00
zy-kkk	978dae267e	[typo](docs)Optimized string and date function doc (#12949 )	2022-09-26 09:26:12 +08:00
zy-kkk	91134cff61	[typo](docs)Optimized date function doc order and add partial function doc #12878	2022-09-26 09:25:11 +08:00
Tiewei Fang	acd5d67355	[feature-wip](new-scan)Add new odbc scanner and new odbc scan node (#12899 )	2022-09-26 09:24:25 +08:00
Jerry Hu	56fc00cb53	[chore](config) increase minimum thread num of some thread pool (#12917 ) Too small minimum thread num will cause additional overhead for creating and recycling threads.	2022-09-26 09:00:18 +08:00
Adonis Ling	32144ccda8	[Enhancement](debugging) Add more debug info for clang build (#12845 )	2022-09-26 08:50:12 +08:00
Yongqiang YANG	7f2ea35b63	[enhancement](test) add brown cases to p2 (#12694 )	2022-09-25 23:46:45 +08:00
Yongqiang YANG	60556070bb	[enhancement](test) add github events cases to p2 (#12696 )	2022-09-25 23:46:15 +08:00
Ashin Gau	692176ec07	[feature-wip](parquet-reader) pre read page data in advance to avoid frequent seek (#12898 ) 1. Fix the bug of file position in `HdfsFileReader` 2. Reserve enough buffer for `ColumnColumnReader` to read large continuous memory	2022-09-25 21:21:06 +08:00
Gabriel	380c3f42ab	[Refactor](datev2) Update comments for datev2/datetimev2 (#12823 )	2022-09-25 18:43:32 +08:00
Gabriel	f879a51ce9	[Improvement](dict) optimize dictionary column (#12852 )	2022-09-25 18:29:10 +08:00
Gabriel	d8e8bc0e69	[Improvement](predicate) Replace for-loop by memcpy (#12867 )	2022-09-25 18:27:59 +08:00
Shane	59699a4321	[feature](JSON datatype)Support JSON datatype (#10322 ) Add the `JSON` datatype. This PR implements the following features: 1. `CREATE` tables with `JSON` type columns 2. `INSERT` values containing a `JSON` type value stored in `String`, which is represented in binary format (AKA `JSONB`) at BE 3. `SELECT` JSON columns. For the detailed design, refer to [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type) * add JSONB data storage format type * fix JsonLiteral resolve bug * add DataTypeJson case in data_type_factory * add JSON syntax check in FE * add operators for jsonb_document; comparison between JSON type values is not supported yet * add ColumnJson and DataTypeJson * add JsonField to store JsonValue * add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral * add push_json for MysqlResultWriter * JSON column need no zone_map_index * Revert "JSON column need no zone_map_index" This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79. * add JSON writer and reader, ignore zone-map for JSON column * add json_to_string for DataTypeJson * add olap_data_convertor for JSON type * add some enum * add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions * fix column_json offsets overflow bug, format code * remove useless TODOs, add CmpType cases for JSON type * add license header * format license * format be codes * resolve rebase master conflicts * fix bugs for CREATE and meta related code * refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes * modify be code following code review advice * fix rebase conflicts with master * add unit test for json_value and column_json * fix rebase error * rename json to jsonb * fix some data convert bugs, set Mysql type to JSON	2022-09-25 14:06:49 +08:00
zhannngchen	57d5f69814	[fix](load) print detailed error message (#12938 ) fix flush failure return message	2022-09-25 10:31:41 +08:00
starocean999	dd6ed5a9a7	[fix](function)fix string split function buffer overflow (#12834 )	2022-09-24 17:32:00 +08:00
Jibing-Li	f1a64ea09f	[fix](new-scan)Fix new scanner load job bugs (#12903 ) Fix bugs: 1. FE needs to send the file format (e.g. parquet, orc ...) to BE when processing load jobs using the new scanner. 2. Try to get the parquet file column type from SchemaElement.type before getting it from Logical type and Converted type.	2022-09-24 17:21:19 +08:00
zhannngchen	3bb920ba54	[Enhancement](load) Refine the load channel flush policy on mem limit (#12716 ) 1. Remove the single load channel mem limit; only use the load channel mgr mem limit. 2. Change the default load channel mgr mem limit from 50% to 80%. 3. Add a soft mem limit to the load channel mgr. When the soft limit is exceeded, other threads will not hang; only the current thread triggers a flush. 4. When the load channel mgr mem limit is exceeded, find the load channel with the largest mem usage, then find its tablet channel with the largest mem usage, and try to flush 1/3 of the mem usage of this tablet channel.	2022-09-24 10:01:13 +08:00
yiguolei	7b230e41a8	[bugfix](scanner) olap scanner compute is wrong (#12857 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-24 09:59:59 +08:00
HappenLee	d65756b504	[Bug](bucket shuffle) fix error bucket shuffle join plan in two same table (#12930 )	2022-09-24 09:59:23 +08:00
Xinyi Zou	34d6d36ff5	fix transfer to tracker (#12932 ) ~MemTrackerLimiter() repeatedly consumed _untracked_mem, resulting in an inaccurate process mem tracker.	2022-09-24 09:01:05 +08:00
jiafeng.zhang	943814a86f	build extension docs failed fix (#12915 ) build extension docs fix	2022-09-23 21:58:02 +08:00
Jeffrey	1cb43b7f38	[fix](frontend) fix peerDependencies error (#12373 ) ```npm install``` has a problem with peer dependencies in recent versions of npm (v7+). Use ```npm install --legacy-peer-deps``` to fix it. Reference: https://blog.npmjs.org/post/626173315965468672/npm-v7-series-beta-release-and-semver-major	2022-09-23 21:54:52 +08:00
Yongqiang YANG	9dc35ab534	[fix](streamload) set coord for streamLoad (#12744 ) When a stream load is canceled, status is reported to coord.	2022-09-23 20:23:19 +08:00
jakevin	7f5970d62f	[fix](Nereids): add stats in plan. (#12790 ) * [improve](Nereids): add stats for bestPlan and correct fix selectivity	2022-09-23 19:26:49 +08:00
Ashin Gau	5bfdfac387	[feature-wip](parquet-reader) add parquet reader profile (#12797 ) Add profile for parquet reader. New counters: - ParquetFilteredGroups: The number of row groups filtered by `RowGroup` min-max statistics - ParquetReadGroups: The number of row groups to read - ParquetFilteredRowsByGroup: The number of rows filtered by `RowGroup` min-max statistics - ParquetFilteredRowsByPage: The number of rows filtered by page min-max statistics - ParquetFilteredBytes: The bytes filtered by `RowGroup` min-max statistics - ParquetReadBytes: The total bytes in `ParquetReadGroups`, which may be further filtered if a page is skipped as a whole ## Result ``` ┌──────────────────────────────────────────────────────┐ │[0: VFILE_SCAN_NODE] │ │(Active: 1s29ms, non-child: 96.42) │ │ - Counters: │ │ - BytesRead: 0.00 │ │ - FileReadCalls: 1.826K (1826) │ │ - FileReadTime: 510.627ms │ │ - FileRemoteReadBytes: 65.23 MB │ │ - FileRemoteReadCalls: 1.146K (1146) │ │ - FileRemoteReadRate: 128.29331970214844 MB/sec │ │ - FileRemoteReadTime: 508.469ms │ │ - NumDiskAccess: 0 │ │ - NumScanners: 1 │ │ - ParquetFilteredBytes: 0.00 │ │ - ParquetFilteredGroups: 0 │ │ - ParquetFilteredRowsByGroup: 0 │ │ - ParquetFilteredRowsByPage: 6.600003M (6600003)│ │ - ParquetReadBytes: 2.13 GB │ │ - ParquetReadGroups: 20 │ │ - PeakMemoryUsage: 0.00 │ │ - PredicateFilteredRows: 3.399797M (3399797) │ │ - PredicateFilteredTime: 133.302ms │ │ - RowsRead: 3.399997M (3399997) │ │ - RowsReturned: 200 │ │ - RowsReturnedRate: 194 │ │ - TotalRawReadTime(*): 726.566ms │ │ - TotalReadThroughput: 0.0 /sec │ │ - WaitScannerTime: 1s27ms │ └──────────────────────────────────────────────────────┘ ```	2022-09-23 18:42:14 +08:00
HappenLee	f7e3ca29b5	[Opt](Vectorized) Support push down no grouping agg (#12803 ) Support push down no grouping agg	2022-09-23 18:29:54 +08:00
Yongqiang YANG	a7d42b5d81	[fix](streamload&sink) release and allocate memory in the same tracker (#12820 ) 1. HttpServer threads allocate bytebuffers and put them into the streamload pipe, but the scanner thread releases them with the query tracker. 2. We can assume brpc allocates memory in doris threads. The above problems lead to wrong memtracker results.	2022-09-23 17:51:44 +08:00
morrySnow	bd12a49baf	[feature](Nereids) enable bucket shuffle join on fragment without scan node (#12891 ) In the past, with the legacy planner, we could only do bucket shuffle join on a join node belonging to a fragment with at least one scan node. However, bucket shuffle join should apply to every join node whose left child's data distribution satisfies the join's requirement. In nereids, we have data distribution info on each node, so we can enable bucket shuffle join on a fragment without a scan node.	2022-09-23 15:01:50 +08:00
morrySnow	c100d24116	[enhancement](Nereids) remove unnecessary ExchangeNode under AssertNumRowsNode (#12841 ) Currently, we always add an exchange under AssertNumRowsNode. However, if its child node is unpartitioned, there is no need to add an exchange at all.	2022-09-23 14:50:27 +08:00
ElvinWei	892e53a15b	[fix](test) fix a test failure problem after merging (#12902 )	2022-09-23 14:22:29 +08:00
ElvinWei	e28e30fe71	[Improvement](statistics) collect statistics in parallel and add test cases (#12839 ) This PR mainly improves some functions of the statistics module (#6370): 1. When collecting partition statistics, filter out empty partitions in advance and do not generate statistics tasks for them. 2. The old statistics update method could have problems when updating statistics in parallel; this has been solved. 3. Optimize internal-query. 4. Add test cases related to statistics. 5. Modify some comments as prompted by CheckStyle.	2022-09-23 11:59:53 +08:00
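
The single-table extraction described in c809a21993 (#12894) can be illustrated with a small sketch. This is not the Nereids rule itself: the tuple representation of predicates and the function name `extract_single_table_predicates` are hypothetical, chosen only to show why (A1 and B1) or (A2 and B2) implies (A1 or A2) for a table that is constrained in every disjunct.

```python
# Hypothetical model: a predicate is (table_alias, column, value), meaning
# table_alias.column = value. A disjunct is a list of predicates joined by AND;
# the whole expression is the OR of its disjuncts.

def extract_single_table_predicates(disjuncts):
    """Return {table: [predicates of disjunct 1, predicates of disjunct 2, ...]}.

    A table appears in the result only if every disjunct constrains it.
    The entry reads as an OR over disjuncts of an AND over that table's
    predicates, which the original expression implies.
    """
    tables = {table for disjunct in disjuncts for (table, _, _) in disjunct}
    implied = {}
    for table in sorted(tables):
        per_disjunct = []
        for disjunct in disjuncts:
            preds = [p for p in disjunct if p[0] == table]
            if not preds:
                # This disjunct does not restrict the table, so no filter
                # on the table is implied by the whole expression.
                per_disjunct = None
                break
            per_disjunct.append(preds)
        if per_disjunct is not None:
            implied[table] = per_disjunct
    return implied


# The TPCH q7 expression quoted in the commit message.
q7 = [
    [("n1", "n_name", "FRANCE"), ("n2", "n_name", "GERMANY")],
    [("n1", "n_name", "GERMANY"), ("n2", "n_name", "FRANCE")],
]
implied = extract_single_table_predicates(q7)
```

For q7, `implied["n1"]` holds the two predicates n1.n_name = 'FRANCE' and n1.n_name = 'GERMANY', one per disjunct, which corresponds to the filter the commit message says could be pushed down to scan(n1). Whether the filter can actually be pushed down (for example past an outer join) is outside the scope of this sketch.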


6478 Commits