For #2589
1. date (uint24_t), datetime (int64_t), and largeint (int128_t) use frame-of-reference encoding as their dictionary encoding.
2. decimal (decimal12_t) also uses frame-of-reference encoding as its dictionary encoding.
3. float and double use bitshuffle encoding as their dictionary encoding.
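As a sketch, the per-type rule above can be read as follows (the enum names are illustrative, not Doris's actual encoding registry):
```
// Hypothetical sketch of the per-type rule described above.
enum class FieldType { DATE, DATETIME, LARGEINT, DECIMAL, FLOAT, DOUBLE };
enum class Encoding { FRAME_OF_REFERENCE, BITSHUFFLE };

Encoding dict_substitute(FieldType t) {
    switch (t) {
        case FieldType::FLOAT:
        case FieldType::DOUBLE:
            return Encoding::BITSHUFFLE;
        default:  // DATE, DATETIME, LARGEINT, DECIMAL
            return Encoding::FRAME_OF_REFERENCE;
    }
}
```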
Fix parquet arrow read batch bug
#2811
The original code determined the number of rows in a batch from the number of rows in the parquet RowGroup. But a batch now takes at most 65535 rows, so when a RowGroup holds more than 65535 rows, the number of batches no longer matches the number of RowGroups. Using the field `_current_line_of_group` as an array position can then index past the end of the array and crash.
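A hedged sketch of the indexing problem (the batch size and the modulo fix are assumptions based on the description above, not the actual patch):
```
#include <cstdint>

// Arrow delivers a RowGroup as batches of up to 65535 rows, so an index
// that counts rows across the whole RowGroup must be reduced to a
// batch-relative offset before it is used to index the batch's arrays.
constexpr int64_t kBatchSize = 65535;

int64_t batch_relative_index(int64_t current_line_of_group) {
    // Using current_line_of_group directly overruns the batch once the
    // RowGroup holds more than kBatchSize rows.
    return current_line_of_group % kBatchSize;
}
```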
* Improve comparison and printing of Version
There are two members in `Version`: `first` and `second`.
There are many places where we need to print a `Version` object or
compare two `Version` objects, but in the current code these two members
are accessed directly, which makes the code very tedious.
This patch mainly does:
1. Adds an overloaded `operator<<()` for `Version`, so
we can print a `Version` object directly;
2. Adds the `contains()` method to determine whether one version
range contains another;
3. Uses `operator==()` to determine whether two `Version` objects are equal.
Because too many places need to be modified, some raw member accesses
remain; they will be cleaned up later.
This patch also removes some unnecessary header file includes.
No functional changes in this patch.
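A minimal sketch of the new methods (`first` and `second` come from the description above; the exact signatures are illustrative):
```
#include <cstdint>
#include <iostream>

struct Version {
    int64_t first;
    int64_t second;

    // Containment: this version range fully covers `other`.
    bool contains(const Version& other) const {
        return first <= other.first && second >= other.second;
    }

    bool operator==(const Version& other) const {
        return first == other.first && second == other.second;
    }
};

// Print a Version object directly instead of accessing members by hand.
std::ostream& operator<<(std::ostream& os, const Version& v) {
    return os << "[" << v.first << "-" << v.second << "]";
}
```
With this, `std::cout << version` replaces the tedious member-by-member printing.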
Note that the methods in path_util are only related to path processing,
and do not involve any file or IO operations.
The upcoming patch will use these util methods to extract operations,
such as concatenating directory strings, out of the processing logic.
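A sketch of the kind of pure string helper this refers to (the function name is hypothetical; the real path_util API may differ):
```
#include <string>

// Joins two path segments with a single separator; pure string
// processing, no file or IO operations involved.
std::string join_path_segments(const std::string& a, const std::string& b) {
    if (a.empty()) return b;
    if (b.empty()) return a;
    if (a.back() == '/') return a + b;
    return a + "/" + b;
}

// join_path_segments("/data/doris", "tablet/123") == "/data/doris/tablet/123"
```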
Support Grouping Sets, Rollup and Cube to extend the GROUP BY statement.
Support the GROUPING SETS syntax:
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
CUBE and ROLLUP are written like:
```
SELECT a, b, c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a, b, c)
```
[ADD] Support grouping functions in expressions, e.g. grouping(a) + grouping(b) (#2039)
[FIX] Fix an analyzer error in window functions (#2039)
Fixes #2771
Main changes in this CL:
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* Roaring64Map is leveraged to support unsigned BIGINT elements in the BITMAP type
* two new formats (SINGLE64 and BITMAP64) are introduced for the BITMAP type
So far we have had three storage formats for the BITMAP type:
```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```
In order to support BIGINT elements and keep backward compatibility, this CL introduces two new formats:
```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```
Please note that SINGLE64/BITMAP64 don't replace SINGLE32/BITMAP32. Doris chooses the smaller (in terms of space) type automatically during serialization. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This also makes BE rollback possible as long as users haven't written elements larger than UINT32_MAX into a bitmap column.
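A sketch of that selection rule (the type codes match the list above; the surrounding serializer is assumed):
```
#include <cstdint>

enum class BitmapTypeCode : uint8_t {
    EMPTY = 0, SINGLE32 = 1, BITMAP32 = 2, SINGLE64 = 3, BITMAP64 = 4
};

// Pick the smallest format that can represent the bitmap's content.
BitmapTypeCode choose_format(uint64_t cardinality, uint64_t max_element) {
    if (cardinality == 0) return BitmapTypeCode::EMPTY;
    const bool fits32 = max_element <= UINT32_MAX;
    if (cardinality == 1) {
        return fits32 ? BitmapTypeCode::SINGLE32 : BitmapTypeCode::SINGLE64;
    }
    return fits32 ? BitmapTypeCode::BITMAP32 : BitmapTypeCode::BITMAP64;
}
```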
Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are:
1. RoaringBitmap doesn't define a standard for the binary format of 64-bit bitmaps. As a result, different implementations of Roaring64Map use different formats. For example, the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However, Doris requires the serialized format to be stable across versions, and forking is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element fits in 32 bits, we may need to access private members of Roaring64Map. Another example is that we may want to further customize and optimize the format for the BITMAP64 case, such as using vint64 instead of uint64 for the map size.
The `TResourceInfo` was used to help `cgroups` isolate resources,
but it is no longer used.
In fact, the `TResourceInfo` information is no longer carried in
the requests from FE to BE.
This CL changes:
1. Adds the functions bitmap_to_string and bitmap_from_string, which
convert a bitmap to/from a string containing all bits in the bitmap.
2. Adds the function murmur_hash3_32, which computes the murmur hash of
input strings.
3. Makes the float-to-string cast produce the same result that users see.
Rows could be scanned mistakenly after cumulative compaction on a singleton rowset.
Suppose there are three records: (1, 1), (2, 2), (3, 3).
After I have read (1, 1), this bug makes the returned row (2, 2)
instead of (1, 1).
Currently all singleton rowsets with data are considered overlapping upon construction, even when the rowset contains only one segment. Meanwhile, singleton rowsets can be input to base compaction (when the tablet hasn't been compacted for base_compaction_interval_seconds_since_last_operation) as long as they have been converted into non-overlapping rowsets by the cumulative compactor.
By making rowsets with one segment non-overlapping, we avoid the work of converting such rowsets to non-overlapping rowsets in cumulative compaction.
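A sketch of the new construction rule, reduced to its essence (real code works on the rowset meta):
```
// A rowset with a single segment is trivially sorted, so it can be
// marked non-overlapping at construction time and skipped by the
// cumulative compactor's conversion step.
bool is_overlapping(int num_segments) {
    return num_segments > 1;
}
```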
The num_segments should be read from the rowset meta PB,
but a bug in the previous code left this value unset in some cases.
So when initializing the rowset meta, if num_segments is 0 (not set),
we try to calculate the number of segments from AlphaRowsetExtraMetaPB
and then set the num_segments field.
This should only happen for rowsets converted from an old version;
for all newly created rowsets, the num_segments field must be set.
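A sketch of the fallback, with the proto types reduced to plain structs for illustration (real code reads RowsetMetaPB and AlphaRowsetExtraMetaPB):
```
#include <cstdint>
#include <vector>

struct SegmentGroup { int64_t num_segments; };
struct RowsetMeta {
    int64_t num_segments = 0;                  // 0 means "not set"
    std::vector<SegmentGroup> segment_groups;  // from AlphaRowsetExtraMetaPB
};

// Recover num_segments for old rowsets where the field was never set.
void fix_num_segments(RowsetMeta* meta) {
    if (meta->num_segments != 0) return;  // newly created rowsets are fine
    int64_t total = 0;
    for (const auto& sg : meta->segment_groups) total += sg.num_segments;
    meta->num_segments = total;
}
```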
time(NULL) returns a second-resolution timestamp, but all compaction-related times in Tablet are in millisecond resolution, so UnixMillis() should be used instead.
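For reference, a stand-in for the millisecond-resolution util (Doris's actual UnixMillis() lives in its util code; this sketch uses <chrono>):
```
#include <chrono>
#include <cstdint>

// Millisecond-resolution timestamp, unlike second-resolution time(NULL).
int64_t UnixMillis() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
            system_clock::now().time_since_epoch()).count();
}

// time(nullptr) and UnixMillis() differ by a factor of ~1000, so mixing
// them corrupts any compaction interval arithmetic.
```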
1. Upgrades gutil code from Impala to a new version, including `cpuinfo`, `spinlock` and `linux_syscall_support`
2. Implements the ARM version of the UTF-8 check code
3. Removes incompatible code from stopwatch
This CL supports arrow's zero-copy read interface, which makes the code
comply with arrow 0.15.
The schema change unit test has a problem, so I disabled it in run-ut.sh.
* [Alter Table] No need to check whether the table is stable when doing some kinds of alter operations.
Not all alter table operations require the table to be stable, such as rename and metadata modification.
A compaction task may sometimes consume a lot of memory and result in OOM.
Currently there is no good way to predict the memory consumption of
a compaction task, so I added a new BE config, max_compaction_concurrency,
to manually limit the max concurrency of running compaction tasks.
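For example, the cap might be set in be.conf (the value here is illustrative):
```
max_compaction_concurrency = 2
```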
For merge reads from one rowset with multiple overlapping segments,
I introduce a priority queue (a min-heap) for multiway merge sort,
replacing the old N*M time complexity algorithm (see the sketch after
the test results below).
This can significantly improve read efficiency when merging a large
number of overlapping segments.
In my test:
1. Compaction with 187 segments went from 75 seconds to 42 seconds
2. Compaction with 3574 segments took 43 seconds; with the old version, I killed the
process after waiting more than 10 minutes...
This CL only changes the reads of alpha rowsets. Beta rowsets will be changed in another CL.
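A minimal, self-contained sketch of the multiway merge (segments are modeled as sorted vectors; these are not Doris's actual reader classes):
```
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Merging N sorted segments with M total rows via a min-heap costs
// O(M log N), instead of scanning all N heads per row (the old N*M way).
std::vector<int> merge_sorted(const std::vector<std::vector<int>>& segments) {
    using Cursor = std::pair<int, std::size_t>;  // (value, segment index)
    auto cmp = [](const Cursor& a, const Cursor& b) { return a.first > b.first; };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(cmp)> heap(cmp);

    std::vector<std::size_t> pos(segments.size(), 0);
    for (std::size_t i = 0; i < segments.size(); ++i) {
        if (!segments[i].empty()) heap.push({segments[i][0], i});
    }

    std::vector<int> out;
    while (!heap.empty()) {
        auto [val, idx] = heap.top();
        heap.pop();
        out.push_back(val);
        if (++pos[idx] < segments[idx].size()) {
            heap.push({segments[idx][pos[idx]], idx});
        }
    }
    return out;
}
```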
ISSUE: #2631
Doris supports three storage models: AGG_KEYS, UNIQUE_KEYS and DUP_KEYS.
Among these, UNIQUE_KEYS and DUP_KEYS were added after AGG_KEYS.
For historical tablets, the keys_type field that indicates the storage model
may be missing for AGG_KEYS.
So when upgrading from historical tablets, this situation should be taken into
consideration and the field set to AGG_KEYS.
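A sketch of the upgrade rule, with the meta reduced to a plain struct (real code works on the tablet meta protobuf):
```
enum class KeysType { UNSET, AGG_KEYS, UNIQUE_KEYS, DUP_KEYS };

struct TabletMeta {
    KeysType keys_type = KeysType::UNSET;
};

// Historical tablets predate UNIQUE_KEYS/DUP_KEYS and may not carry a
// keys_type at all; on upgrade, a missing value is treated as AGG_KEYS.
void fill_default_keys_type(TabletMeta* meta) {
    if (meta->keys_type == KeysType::UNSET) {
        meta->keys_type = KeysType::AGG_KEYS;
    }
}
```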