doris

Author	SHA1	Message	Date
LingBin	4e151b1551	Remove boost exception when parse store path (#2861 )	2020-02-10 17:50:52 +08:00
kangkaisen	e7817053cc	[Uitls] ParseUtil::parse_mem_spec support K and T suffix (#2854 )	2020-02-07 09:31:35 +08:00
kangpinghuang	a27e89065b	Add file cache for v2 (#2782 ) Add file descriptor cache for segment v2 to solve too many open file problems	2020-02-04 00:16:01 +08:00
LingBin	c71eefa2ac	Add path util (#2747 ) Note that the methods in path_util are only related to path processing, and do not involve any file and IO operations The upcoming patch will use these util methods, used to extract operations such as concatenation of directory strings from processing logic.	2020-01-18 00:05:00 +08:00
Dayue Gao	3b24287251	Support 64 bits integers for BITMAP type (#2772 ) Fixes #2771 Main changes in this CL * RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h * leveraging Roaring64Map to support unsigned BIGINT for BITMAP type * introduces two new format (SINGLE64 and BITMAP64) for BITMAP type So far we have three storage format for BITMAP type ``` EMPTY := TypeCode(0x00) SINGLE32 := TypeCode(0x01), UInt32LittleEndian BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/) ``` In order to support BIGINT element and keep backward compatibility, introduce two new format ``` SINGLE64 := TypeCode(0x03), UInt64LittleEndian BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64 ``` Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column. Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are 1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](`35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)`). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this. 2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.	2020-01-17 14:13:38 +08:00
ZHAO Chun	a99a49a444	Add bitamp_to_string function (#2731 ) This CL changes: 1. add function bitmap_to_string and bitmap_from_string, which will convert a bitmap to/from string which contains all bit in bitmap 2. add function murmur_hash3_32, which will compute murmur hash for input strings 3. make the function cast float to string the same with user result logic	2020-01-13 12:31:37 +08:00
yangzhg	4b8f7f9c32	Use cgroups memory limit and cpu cores in container (#2710 )	2020-01-10 00:45:50 +08:00
yangzhg	de4d1778c6	Fix incompatibility with arm architecture in util and gutil (#2650 ) 1. upgrade gutil code from imapla to new verison， include `cpuinfo`, `spinlock` and `linux_syscall_support ` 2. impliments arm version utf8 check code 3. remove incompatible code from stopwatch	2020-01-06 18:39:31 +08:00
ZHAO Chun	1648226927	Adapt arrow 0.15 API (#2657 ) This CL supports arrow's zero copy read interface, which can make code comply with arrow 0.15. And the schema change unit test has some problem, I disable it in run-ut.sh	2020-01-04 15:54:29 +08:00
Seaven	4ed87964fe	Add zip util(#2348 ) (#2441 ) Support .zip file extract by minizip	2019-12-27 10:10:21 +08:00
kangkaisen	cf6d705df9	Add intersect_count UDAF (#2418 ) 1 Because we don't support array type currently, so I use variable arguments instead. 2 intersect_count directly return final count, not bitmap like bitmap_union, because intersect_count return bitmap is more complex and need more serialize. If we really need bitmap format from intersect_count, we could do that in another PR and which won't have compatibility problems.	2019-12-13 16:12:05 +08:00
Dayue Gao	83b5455be5	[Load] Fix several races in stream load that could cause BE crash (#2414 ) This CL fixes the following problems 1. check whether TabletsChannel has been closed/cancelled in `reduce_mem_usage` to avoid using a closed DeltaWriter 2. make `FlushHandle.wait` wait for all submitted tasks to finish so that memtable is deallocated before its delta writer 3. make `~MemTracker()` release its consumption bytes to accommodate situations in aggregate_func.h that bitmap and hll call `MemTracker::consume` without corresponding `MemTracker::release`, which cause the consumption of root tracker never drops to zero	2019-12-10 21:59:05 +08:00
LingBin	f635552a20	Port latest faststring (#2403 )	2019-12-06 20:39:56 +08:00
kangkaisen	ba3d16f4c7	Add BinaryPrefixPage (#2308 )	2019-11-29 07:39:11 +08:00
kangkaisen	14769b0beb	Improve to_bitmap parse int performance (#2223 )	2019-11-19 18:00:19 +08:00
shengyunyao	ccc1b9d98c	Optimize percentile_approx through radix sort (#2102 ) (#2107 )	2019-11-05 09:25:47 +08:00
Yunfeng,Wu	f53f188c5d	Add arrow IPC serialization for Doris-Spark-Connector (#2013 )	2019-10-31 10:32:06 +08:00
ZHAO Chun	05643dc403	Replace Arena with MemPool (#2012 ) After replacing Arena with MemPool, we can achieve one copy for string value read from segment v2. We can exchange MemPool's chunk between RowBlockV2 and RowBlock. This change only replace Arena, this work will be done in other change list.	2019-10-19 15:53:24 +08:00
ZHAO Chun	024348d74b	Enable auto convert when check in (#1926 ) Leverage gitattributes to enable auto convert end-of-line to LF when checking in. Convert already exist CRLF to LF by removing all files and checking out with new .gitattributes file. Except .gitattributes, all files are only modified at the end of line.	2019-10-09 22:31:27 +08:00
yiguolei	f852f50acb	Improve unique id performance (#1911 ) Remove the default constructor for UniqueID Add a gen_uid method in UniqueId. If need to generate a new uid, users should call this api explicitly. Reuse boost random generator not generate a new one every time.	2019-09-29 18:20:02 +08:00
kangkaisen	0c22d8fa08	Add frame_of_reference page (#1818 )	2019-09-28 01:10:29 +08:00
Mingyu Chen	c643cbd30c	Optimize the load performance for large file (#1798 ) The current load process is: Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed: Insert tuple into different memtables according to tablet ID When the memtable size reaches the threshold, it is written to disk. The above operations are equivalent to single thread execution for a single load task. In fact, the insertion of memtable and the flush of memtable can be executed synchronously. Perform these operation in single thread prevents the insertion of memtable from being delayed due to slow disk writing. In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads. By default, each data directory uses two worker threads for flush, which can be modified by the parameter flush_thread_num_per_store of BE. DeltaWriter will push the full memtable to MemTableFlushExecutor for flush operation and generate a new memtable for receiving new data. This design can improve the performance of load large files. In single host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.	2019-09-25 13:49:32 +08:00
Dayue Gao	65dcabf1df	Use crc32c checksum for segment v2 (#1753 )	2019-09-06 15:23:57 +08:00
Dayue Gao	f76dad289e	Basic implementation for BetaRowsetReader (#1718 )	2019-09-03 13:52:16 +08:00
kangpinghuang	6865f4238b	Add limit to show tablet stmt (#1547 ) Also add some where predicates for filtering results ISSUE #1687	2019-08-28 16:25:12 +08:00
ZHAO Chun	58801c6ab0	Support converting RowBatch and RowBlockV2 to/from Arrow (#1699 )	2019-08-27 11:30:00 +08:00
ZHAO Chun	acf868c9d0	Support page compression and checksum in BetaRowset (#1646 )	2019-08-19 09:40:47 +08:00
ZHAO Chun	c0253a17fc	Add block compression codec and remove not used codec (#1622 )	2019-08-12 20:47:16 +08:00
lichaoyong	a9e8113b82	Fix heap-buffer-overflow in split_part() function in StringFunctions (#1482 )	2019-07-15 23:00:37 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch would modify all Backend's data. And this will cause a very long time to restart be. So if you want to interferer your product environment, you should upgrade backend one by one. 1. Refactoring be is to clarify the structure the codes. 2. Use unique id to indicate a rowset. Nameing rowset with tablet_id and version will lead to many conflicts among compaction, clone, restore. 3. Extract an rowset interface to encapsulate rowsets with different format.	2019-07-15 21:18:22 +08:00
HangyuanLiu	a7390c03f4	Add percentile_approx aggregate function (#1432 )	2019-07-11 16:44:43 +08:00
worker24h	7eab12a40e	Support reading Parquet file when loading data (#1173 )	2019-07-01 18:39:27 +08:00
kangpinghuang	7f1720b632	Add rle encoding (#1326 )	2019-06-18 14:48:33 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
kangpinghuang	e9b2d30c6a	Add faststring and cpu util (#1281 )	2019-06-12 14:00:50 +08:00
ZHAO Chun	84632cd062	Add BitMapIterator (#1277 )	2019-06-11 09:23:02 +08:00
ZHAO Chun	3e1c70d1b7	Add coding function (#1264 )	2019-06-08 21:02:31 +08:00
Mingyu Chen	a08170fd50	Enhance the usabilities (#1100 ) * Enhence the usabilities 1. Add metrics to monitor transactions and steaming load process in BE. 2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s. 3. Modify FE config 'enable_metric_calculator' to true. 4. Add more log for tracing broker load process. 5. Modify the query report process, to cancel query immediately if some instance failed. * Fix bugs 1. Avoid NullPointer when enabling colocation join with broker load 2. Return immediately when pull load task coordinator execution failed	2019-05-07 15:55:04 +08:00
morningman	e8b360d193	Merge master and fix BE ut	2019-04-28 10:33:50 +08:00
Mingyu Chen	ff7d3e5878	Unify the print method of TUniqueId (#487 )	2018-12-29 16:22:38 +08:00
ZHAO Chun	e2bb86cf78	Add Md5Digest to util (#420 )	2018-12-12 20:06:35 +08:00
Mingyu Chen	9a2ad18428	Add path info of replica in catalog (#327 ) Add path info of replica in catalog Also fix a bug that when calling check_none_row_oriented_table, store is null, it cannot be used to create table. Instead, OLAPHeader can be used to get storage type information.	2018-11-19 17:42:46 +08:00
Zhao Chun	0aea149c0b	Fix core local value UT failed (#324 ) Issue: #323	2018-11-16 15:27:16 +08:00
kangpinghuang	c877b43013	Remove my aes and fix palo ns to doris (#277 )	2018-11-02 17:05:48 +08:00
kangpinghuang	d57e91db6e	Rewrite aes encryption (#264 ) Resolve #257	2018-11-02 15:26:31 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace name in BE (#268 ) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262 )	2018-11-01 09:06:01 +08:00
morningman	051aced48d	Missing many files in last commit In last commit, a lot of files has been missed	2018-10-31 16:19:21 +08:00
morningman	5d3fc80067	Added: * Add streaming load feature. You can execute 'help stream load;' to see more information. Changed: * Loading phase of a certain table can be parallelized, to reduce the load job execution time when multi load jobs to a single table. * Using RocksDB to save the header info of tablets in Backends, to reduce the IO operations and increate speeding of restarting. Fixed: * A lot of bugs fixed.	2018-10-31 14:46:22 +08:00
zhaochun	765c91bbc2	Added: change Doris build.sh to get environment variables from custom_env.sh, and add run-ut.sh and run-fe-ut.sh	2018-10-30 23:42:05 +08:00

1 2

59 Commits