doris

Author	SHA1	Message	Date
Lightman	b5531c5caf	[BugFix](BE) fix condition index doesn't match (#11474 ) * [BugFix](Be) fix condition index doesn't match	2022-08-05 07:57:18 +08:00
HappenLee	77d82bb292	[Bug](MaterializedView) Fix bug of light schema change do not set right unique id cause MV coredump (#11396 ) Fix bug of light schema change do not set right unique id cause MV coredump	2022-08-03 11:21:28 +08:00
Lightman	b35daf0a04	[improvement](light-schema-change) Support tablet schema cache (#11131 )	2022-08-01 12:18:00 +08:00
Gabriel	babab5d535	[feature-wip] support datetimev2 (#11085 )	2022-07-23 16:07:59 +08:00
Lightman	a71822a74d	[refactor]remove col_unique_id (#11025 )	2022-07-20 16:35:14 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136 ) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be support that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for light sc. resolve the deadlock when schema change (#124) fix columns from fe don't has bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix light schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repetitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Pxl	f2aa5f32b8	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change (#9811 ) Some pre-refactorings or interface additions for schema change	2022-06-07 15:04:57 +08:00
Lightman	b2c2cdb122	[feature] Support compression prop (#8923 )	2022-05-27 21:52:05 +08:00
Xinyi Zou	b34ed43ec9	[feature-wip] (memory tracker) (step6, End) Fix some details (#9301 ) 1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track. 2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability. 3. More powerful MemTracker debugging capabilities. 4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%. 5. Fix some other details.	2022-05-10 18:17:09 +08:00
BePPPower	51db78d375	[refactor] modify all OLAP_LOG_WARNING to LOG(WARNING) (#9473 ) Co-authored-by: BePPPower <fangtiewei@selectdb.com>	2022-05-10 09:25:25 +08:00
Adonis Ling	bd126f0679	[improvement] Refactor type info for further optimizations. (#8786 ) ## Design: For now, there are two categories of types in Doris, one is for scalar types (such as int, char and etc.) and the other is for composite types (array and etc.). For the sake of performance, we can cache type info of scalar types globally (unique objects) due to the limited number of scalar types. When we consider the composite types, normally, the type info is generated in runtime (we can also use some cache strategy to speed up). The memory thereby should be reclaimed when we create type info for composite types. There are a lot of interfaces to get the type info of a specific type. I reorganized those as the following describes. 1. `const TypeInfo* get_scalar_type_info(FieldType field_type)` The function is used to get the type info of scalar types. Due to the cache, the caller uses the result WITHOUT considering the problems about memory reclaim. 2. `const TypeInfo* get_collection_type_info(FieldType sub_type)` The function is used to get the type info of array types with just ONE depth. Due to the cache, the caller uses the result WITHOUT considering the problems about memory reclaim. 3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)` 4. `TypeInfoPtr get_type_info(const TabletColumn* col)` These functions are used to get the type info of BOTH scalar types and composite types. The caller should be responsible to manage the resources returned. #### About the new type `TypeInfoPtr` `TypeInfoPtr` is an alias type to `unique_ptr` with a custom deleter. 1. For scalar types, the deleter does nothing. 2. For composite types, the deleter reclaims the memory. By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr: 1. `Field` 2. `ColumnReader` 3. `DefaultValueColumnIterator` Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance. 1. `ScalarColumnWriter` - holds `Field` 1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, use `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types. 2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter` and `BitmapIndexWriter` doesn't support `ArrayType`. 3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types. 2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types). 3. `ColumnVectorBatch` 1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader` 2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader` 3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`	2022-04-20 14:47:29 +08:00
caiconghui	4076c5466b	[refactor][improvement](type_info) use template and single instance to refactor get type info logic (#8680 ) 1. use const pointer instead of shared_ptr 2. Restrict array types to support only primitive types and nest up to 9 levels.	2022-04-03 10:10:36 +08:00
spaces-x	bea9a7ba4f	[feature] Support pre-aggregation for quantile type (#8234 ) Add a new column-type to speed up the approximation of quantiles. 1. The new column-type is named `quantile_state` with fixed aggregation function `quantile_union`, which stores the intermediate results of pre-aggregated approximation calculations for quantiles. 2. support pre-aggregation of new column-type and quantile_state related functions.	2022-03-24 09:11:34 +08:00
camby	9f0b93e3c6	[feature-wip](array-type) Fix conflict while merge array-type branch (#8594 )	2022-03-22 16:35:30 +08:00
camby	a498463ab5	[feature-wip](array-type)support select ARRAY data type on vectorized engine (#8217 ) (#8584 ) Usage Example: 1. create table for test; ``` `CREATE TABLE `array_test` ( `k1` tinyint(4) NOT NULL COMMENT "", `k2` smallint(6) NULL COMMENT "", `k3` ARRAY<int(11)> NULL COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(`k1`) COMMENT "OLAP" DISTRIBUTED BY HASH(`k1`) BUCKETS 5 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" );` ``` 2. insert some data ``` `insert into array_test values(1, 2, [1, 2]);` `insert into array_test values(2, 3, null);` `insert into array_test values(3, null, null);` `insert into array_test values(4, null, []);` ``` 3. open vectorized `set enable_vectorized_engine=true;` 4. query array data `select * from array_test;` +------+------+--------+ \| k1 \| k2 \| k3 \| +------+------+--------+ \| 4 \| NULL \| [] \| \| 2 \| 3 \| NULL \| \| 1 \| 2 \| [1, 2] \| \| 3 \| NULL \| NULL \| +------+------+--------+ 4 rows in set (0.061 sec) Code Changes include: 1. add column_array, data_type_array codes; 2. codes about data_type creation by Field, TabletColumn, TypeDescriptor, PColumnMeta move to DataTypeFactory; 3. support create data_type for ARRAY data type; 4. RowBlockV2::convert_to_vec_block support ARRAY data type; 5. VMysqlResultWriter::append_block support ARRAY data type; 6. vectorized::Block serialize and deserialize support ARRAY data type;	2022-03-22 15:21:44 +08:00
HappenLee	41a15ccd45	[fix](vectorized) Agg/Unique not null column outer join coredump (#8461 )	2022-03-14 10:52:17 +08:00
HappenLee	51abaa89f3	[fix](vec) Fix some bugs about vec engine (#7884 ) 1. mem leak in vcollector iter 2. query slow in agg table limit 10 3. query slow in SSB q4,q5,q6	2022-02-03 19:21:17 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS standard query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is a work in progress. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
xinghuayu007	dd36ccc3bf	[feature](storage-format) Z-Order Implement (#7149 ) Support sort data by Z-Order: ``` CREATE TABLE table2 ( siteid int(11) NULL DEFAULT "10" COMMENT "", citycode int(11) NULL COMMENT "", username varchar(32) NULL DEFAULT "" COMMENT "", pv bigint(20) NULL DEFAULT "0" COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(siteid, citycode) COMMENT "OLAP" DISTRIBUTED BY HASH(siteid) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "data_sort.sort_type" = "ZORDER", "data_sort.col_num" = "2", "in_memory" = "false", "storage_format" = "V2" ); ```	2021-12-02 11:39:51 +08:00
Zhengguo Yang	8738ce380b	Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391 )	2021-08-18 09:05:40 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented: 1. FE array type support and implementation of array function, support array syntax analysis and planning 2. Support importing array type data through insert into 3. Support selecting array type data 4. The array type is supported only on value columns of duplicate tables. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) Remove all DECIMAL V1 type code; this is part of #6073	2021-07-07 10:26:32 +08:00
Lijia Liu	4d64612b96	[ARRAY]Save array's size instead of offset. (#5983 ) * Save array's size instead of offset. * Optimize variable name * Fix comment	2021-06-10 12:32:58 +08:00
HappenLee	b423274f17	[Enhance] Make MemTracker more accurate (#5515 ) (#5516 ) * [Enhance] Make MemTracker more accurate (#5515) This PR is mainly about: 1. Improve the readability of MemTrackers' names 2. Add the MemTracker of: * Load * Compaction * SchemaChange * StoragePageCache * TabletManager 3. Change SchemaChange to a Singleton * revise some code for Code Review * change the name of mem_tracker * keep reader_context have the same lifetime of rowset_reader in schema change. * change vlog notice to log(warning) in schema change	2021-04-08 09:14:55 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Lijia Liu	b48c768dc7	[ComplexType] Restructure storage type to support complex types expending (#4905 ) This CL includes: * Change the column metadata to a tree structure. * Refactor the segment_v2.ColumnReader and segment_v2.ColumnWriter to support complex types. * Implement the reading and writing of array type.	2020-11-16 21:59:41 +08:00
Yingchun Lai	d1c2b3ed0d	[Optimize] Add an unordered_map for TabletSchema to speed up column name lookup (#4779 ) Reduce column name lookup for TabletSchema and Tablet from O(N) to O(1).	2020-11-03 19:53:44 +08:00
Youngwb	068707484d	Support sequence column for UNIQUE_KEYS Table (#4256 ) * add sequence col Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>	2020-09-04 10:10:17 +08:00
Zhengguo Yang	d61c10b761	[Delete] Support batch delete [part 1] (#4310 ) * Implements the grammar of the batch delete #4051 * Process create, alter table when table has delete sign column * Support the syntax for enabling the delete column * Automatically filtered deleted data in the select statement. * Automatically add delete sign when create rollup table TODO: * Optimize the reading and compaction logic on the be side, so that the data marked as deleted will be completely deleted during base compaction	2020-08-21 22:57:16 +08:00
Yingchun Lai	3b6a781862	[Bug] Fix a bug that tablet's _preferred_rowset_type may be modified to BETA_ROWSET after cloned (#3750 ) TabletMeta's _preferred_rowset_type is not initialized when the object is constructed and may hold a random value. The field is not updated when an ALPHA_ROWSET tablet is created, and in that case it is not serialized into pb. So when an ALPHA_ROWSET tablet is cloned from another BE, the newly created local tablet's _preferred_rowset_type may randomly be BETA_ROWSET and cannot be overwritten after the clone; new input rows are then written in BETA_ROWSET format, which is not what we expect. This patch fixes the bug by giving _preferred_rowset_type a default value and updating the field when creating any type of tablet, and adds a unit test and the related overloaded equality operator functions.	2020-06-06 11:36:28 +08:00
Yingchun Lai	c08d6e4708	[tablet meta] Do some refactor on TabletMeta (#3136 ) remove some functions' return value which always return OLAP_SUCCESS optimize some loops	2020-03-20 15:03:22 +08:00
trueeyu	3b8e9d8dcf	[UT] Fix the test case of SegmentReaderWriterTest::TestBitmapPredicate (#2961 ) The function create_int_key() creates a TableColumn instance whose data member _aggregation holds a random value. If _aggregation==OLAP_FIELD_AGGREGATION_REPLACE, SegmentWriter::init() sets opts.need_bitmap_index = false; so the test case TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) in olap/rowset/segment_v2/segment_test.cpp fails whenever the _aggregation of TableColumn == OLAP_FIELD_AGGREGATION_REPLACE. ``` TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) { TabletSchema tablet_schema = create_schemate({ create_int_key(1, true, false, true), create_int_key(2, true, false, true), create_int_value(3), create_int_value(4)}); ... ASSERT_TRUE(segment->footer().columns(0).has_bitmap_index()); ... } ```	2020-02-21 17:16:49 +08:00
kangkaisen	625411bd28	Doris support in memory olap table (#2847 )	2020-02-18 10:45:54 +08:00
yangzhg	c098178f7a	[Index] Implements create drop show index syntax for bitmap index [#2487 ] (#2573 ) ### create table with index ``` CREATE TABLE table1 ( siteid INT DEFAULT '10', citycode SMALLINT, username VARCHAR(32) DEFAULT '', pv BIGINT SUM DEFAULT '0', INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala' ) AGGREGATE KEY(siteid, citycode, username) DISTRIBUTED BY HASH(siteid) BUCKETS 10 PROPERTIES("replication_num" = "1"); ``` ### create index ``` CREATE INDEX index_name ON table1 (siteid, citycode) [USING BITMAP] COMMENT 'balabala'; or ALTER TABLE table1 ADD INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'; ``` ### drop index ``` DROP INDEX index_name ON table1; or ALTER TABLE table1 DROP INDEX index_name ``` ### show index ``` SHOW INDEX[ES] FROM table1 ``` output ``` +---------+-------------+-----------------+------------+---------+ \| Table \| Index_name \| Column_name \| Index_type \| Comment \| +---------+-------------+-----------------+------------+---------+ \| table1 \| index_name \| siteid,citycode \| BITMAP \| balabala\| +---------+-------------+-----------------+------------+---------+ ```	2020-01-03 17:41:26 +08:00
ZHAO Chun	65c3b0907a	Support aggregation type of REPLACE_IF_NOT_NULL (#2127 ) Some users require that only some columns be updated in one load operation while the others keep their original values. Doris cannot handle this today, because the user must specify a value for every column. If a column's aggregation method is REPLACE, the user must first query the original value in order to write it back, which is extra work. With this CL, the user can use REPLACE_IF_NOT_NULL instead of REPLACE. When loading data into the table, a user who does not intend to change a column's value can specify NULL for that column, and Doris will retain the column's original value.	2019-11-05 18:08:34 +08:00
kangkaisen	95a3b4ccfe	Add object type (#1948 ) Add a new type: Object. Currently, it is mainly for complex aggregate metrics (HLL, Bitmap). The Object type has the following constraints: 1. The Object type cannot be used as a key column type 2. The Object type doesn't support any indices (BloomFilter, short key, zone map, inverted index) 3. The Object type doesn't support filter and group by. In the implementation: the Object type reuses StringValue and StringVal, because in the storage engine the Object type is binary, with a pointer and a length.	2019-10-31 21:42:58 +08:00
wubiao	e43f1a2766	Fix NPE error when creating table with bool column (#1864 )	2019-09-25 14:40:13 +08:00
kangkaisen	1e4dd77d2a	Add bitmap agg type and udaf (#1610 )	2019-08-26 14:24:42 +08:00
ZHAO Chun	0805b05d81	Remove unused FieldInfo (#1540 )	2019-07-24 19:33:30 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch modifies all of a Backend's data, which makes restarting a BE take a very long time. To avoid disrupting your production environment, upgrade backends one by one. 1. Refactor BE to clarify the structure of the code. 2. Use a unique id to identify a rowset. Naming rowsets by tablet_id and version leads to many conflicts among compaction, clone, and restore. 3. Extract a rowset interface to encapsulate rowsets with different formats.	2019-07-15 21:18:22 +08:00

42 Commits