doris

Author	SHA1	Message	Date
slothever	b7fc17da68	[feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819 ) Issue Number: #19679	2023-06-05 22:10:08 +08:00
lihangyu	e22f5891d2	[WIP](row store) two phase opt read row store (#18654 )	2023-05-16 13:21:58 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
yiguolei	9477c48ef8	[refactor](functioncontext) remove duplicate type definition in function context (#17421 ) remove duplicate type definition in function context remove unused method in function context not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned. remove useless slot_size in all tuple or slot descriptor. remove doris_udf namespace, it is useless. remove some unused macro definitions. init v_conjuncts in vscanner, not need write the same code in every scanner. using unique ptr to manage function context since it could only belong to a single expr context. Issue Number: close #xxx --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-06 16:07:09 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.	2023-02-11 13:37:50 +08:00
lihangyu	116e17428b	[Enhancement](point query optimize) improve performace of point query on primary keys (#15491 ) 1. support row format using codec of jsonb 2. short path optimize for point query 3. support prepared statement for point query 4. support mysql binary format	2023-01-20 13:33:01 +08:00
lihangyu	3894de49d2	[Enhancement](topn) support two phase read for topn query (#15642 ) This PR optimize topn query like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`. TopN is is compose of SortNode and ScanNode, when user table is wide like 100+ columns the order by clause is just a few columns.But ScanNode need to scan all data from storage engine even if the limit is very small.This may lead to lots of read amplification.So In this PR I devide TopN query into two phase: 1. The first phase we just need to read `columnA`'s data from storage engine along with an extra RowId column called `__DORIS_ROWID_COL__`.The other columns are pruned from ScanNode. 2. The second phase I put it in the ExchangeNode beacuase it's the central node for topn nodes in the cluster.The ExchangeNode will spawn a RPC to other nodes using the RowIds(sorted and limited from SortNode) read from the first phase and read row by row from storage engine. After the second phase read, Block will contain all the data needed for the query	2023-01-19 10:01:33 +08:00
zhangstar333	42bdde8750	[Feature](Vectorized) support jdbc scan node (#12010 )	2022-09-07 10:29:41 +08:00
AlexYue	50ef6e35be	[enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835 )	2022-08-17 17:50:48 +08:00
Jibing-Li	9b9ed1aef1	[data lake](arrow scanner)Fix file arrow scanner column index out of range core. (#11691 )	2022-08-12 11:34:29 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136 ) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be ssupport that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for ligtht sc. resolve the deadlock when schema change (#124) fix columns from fe don't has bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix ligth schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repeatitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003 )	2022-06-16 10:50:08 +08:00
xueweizhang	375c1bf5c0	[feature](mysql-table) support utf8mb4 for mysql external table (#9402 ) This patch supports utf8mb4 for mysql external table. if someone needs a mysql external table with utf8mb4 charset, but only support charset utf8 right now. When create mysql external table, it can add an optional propertiy "charset" which can set character fom mysql connection, default value is "utf8". You can set "utf8mb4" instead of "utf8" when you need.	2022-05-11 09:39:23 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
Mingyu Chen	869fdff2f0	[refactor] add reference path for source file from impala (#9115 ) According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.	2022-04-20 12:29:57 +08:00
chenlinzhong	afce993ca7	[feature](load)(csv) CSV import and export support header (#8765 ) - Add two new types to stream load boker load: csv_with_names and csv_with_name_sand_types - Add two new types to export: csv_with_names and csv_with_names_and_types	2022-04-18 15:29:18 +08:00
hongbin	c71ffc01de	[Refactor] Cleanup some unused include (#9063 )	2022-04-18 09:52:31 +08:00
dataroaring	2c774f5c79	[ubsan] avoid null bit offset to be 255 (#8675 ) For now, invalid null bit offset is -1, but bit_offset in NullIndicator is be of uint8_t, so invalid null bit offset would be 255. Ubsan detects it.	2022-03-31 22:58:51 +08:00
camby	a498463ab5	[feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217 ) (#8584 ) Usage Example: 1. create table for test; ``` `CREATE TABLE `array_test` ( `k1` tinyint(4) NOT NULL COMMENT "", `k2` smallint(6) NULL COMMENT "", `k3` ARRAY<int(11)> NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`k1`) COMMENT "OLAP" DISTRIBUTED BY HASH(`k1`) BUCKETS 5 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" );` ``` 2. insert some data ``` `insert into array_test values(1, 2, [1, 2]);` `insert into array_test values(2, 3, null);` `insert into array_test values(3, null, null);` `insert into array_test values(4, null, []);` ``` 3. open vectorized `set enable_vectorized_engine=true;` 4. query array data `select * from array_test;` +------+------+--------+ \| k1 \| k2 \| k3 \| +------+------+--------+ \| 4 \| NULL \| [] \| \| 2 \| 3 \| NULL \| \| 1 \| 2 \| [1, 2] \| \| 3 \| NULL \| NULL \| +------+------+--------+ 4 rows in set (0.061 sec) Code Changes include： 1. add column_array, data_type_array codes; 2. codes about data_type creation by Field, TabletColumn, TypeDescriptor, PColumnMeta move to DataTypeFactory; 3. support create data_type for ARRAY date type; 4. RowBlockV2::convert_to_vec_block support ARRAY date type; 5. VMysqlResultWriter::append_block support ARRAY date type; 6. vectorized::Block serialize and deserialize support ARRAY date type;	2022-03-22 15:21:44 +08:00
qiye	87b96cfcd6	[feature](iceberg) Step3: Support query iceberg external table (#8179 ) 1. Add Iceberg scan node 2. Add Iceberg/Hive table type in thrift 3. Support querying Iceberg tables of format types `parquet` and `orc`	2022-02-26 17:04:11 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warings when compile with clang (#8069 )	2022-02-19 11:29:02 +08:00
EmmyMiao87	1d900d8605	(fix)[planner] Fix the right tuple ids in empty set node (#7931 ) The tuple ids of the empty set node must be exactly the same as the tuple ids of the origin root node. In the issue, we found that once the tree where the root node is located has a window function, the tuple ids of the empty set node cannot be calculated correctly. This pr mostly fixes the problem. In order to calculate the correct tuple ids, the tuple ids obtained from the SelectStmt.getMaterializedTupleIds() function in the past are changed to directly use the tuple ids of the origin root node. Although we tried to fix #7929 by modifying the SelectStmt.getMaterializedTupleIds() function, this method can't get the tuple of the last correct window function. So we use other ways to construct tupleids of empty nodes.	2022-01-29 09:46:05 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr to std::shared_ptr 2. replace all boost::scopted_ptr to std::unique_ptr 3. replace all boost::scoped_array to std::unique<T[]> 4. replace all boost:thread to std::thread	2021-11-17 10:18:35 +08:00
HappenLee	9216735cfa	[New Featrue] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
HappenLee	6597a338dc	[Feature] Support config max length of zone map index (#6293 )	2021-07-30 09:23:11 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented 1. fe array type support and implementation of array function, support array syntax analysis and planning 2. Support import array type data through insert into 3. Support select array type data 4. Only the array type is supported on the value lie of the duplicate table this pr merge some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
HappenLee	a61d0de173	[ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs (#4438 )	2020-09-25 10:19:50 +08:00
HappenLee	e6059341e8	[Spill To Disk][2/6] Add some runningtime function and change some function will be called in the future (#4152 )	2020-07-30 14:21:48 +08:00
lichaoyong	e20d905d70	Remove unused KUDU codes (#3175 ) KUDU table is no longer supported long time ago. Remove code related to it.	2020-03-24 13:54:05 +08:00
trueeyu	a340bc7a00	Remove unused LLVM related codes of directory:be/src/runtime (#2910 ) (#2985 ) Remove unused LLVM related codes of directory (step 4):be/src/runtime (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/runtime	2020-02-25 13:47:20 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
yiguolei	1f092bb9fb	Add EsTableDescriptor in be (#775 )	2019-03-19 19:33:54 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace name in BE (#268 ) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262 )	2018-11-01 09:06:01 +08:00
morningman	051aced48d	Missing many files in last commit In last commit, a lot of files has been missed	2018-10-31 16:19:21 +08:00
morningman	2419384e8a	push 3.3.19 to github (#193 ) * push 3.3.19 to github * merge to 20ed420122a8283200aa37b0a6179b6a571d2837	2018-05-15 20:38:22 +08:00
chenhao7253886	756f9bdb6c	exchange check error when parent child is union (#158 )	2017-12-18 11:20:39 +08:00
cyongli	e2311f656e	baidu palo	2017-08-11 17:51:21 +08:00

40 Commits