doris

Author	SHA1	Message	Date
kangkaisen	aa540966c6	Output null for hll and bitmap column when select * (#2991 )	2020-03-13 11:59:30 +08:00
WingC	c5660fcb9d	[UT]Fix unit test for cgroup_util (#3094 ) Co-authored-by: wangcong18 <wangcong18@xiaomi.com>	2020-03-12 22:59:40 +08:00
Yingchun Lai	8276c6d7f8	Show BE version in 'show backends;' (#3074 ) In a large-scale cluster, we may perform rolling upgrades of BEs. This patch adds a column named 'Version' to the command 'show backends;', as well as to the web page '/system?path=//backends', to provide a way to check whether any BE has missed the upgrade.	2020-03-12 22:15:13 +08:00
LingBin	905070f4da	[CodeStyle] Fix compile warning (#3076 ) ``` be/src/olap/rowset/segment_v2/ordinal_page_index.cpp:103:22: warning: ‘ordinal’ may be used uninitialized in this function [-Wmaybe-uninitialized] _ordinals[i] = ordinal; ```	2020-03-11 18:17:29 +08:00
LingBin	bf9612e28b	[CodeStyle] Remove unnecessary forward declaration of WritableFile (#3075 )	2020-03-11 18:17:11 +08:00
Youngwb	a77515fe03	[Backup] Fix backup job block at SNAPSHOTING phase (#3058 ) This bug occurred when a BE made a snapshot after the version required by the FE had been merged into a cumulative version, so the snapshot task could not complete even if it retried. To solve this problem, the BackupJob can be set to CANCELLED, and the user can then retry the job. Fix #3057	2020-03-11 14:05:02 +08:00
LingBin	608917c04d	Use block layer to write files (#3064 ) This is the second patch following 58b8e3f574614433ea9e0c427961f2efb3476c2a. This patch uses the block layer to write files.	2020-03-11 12:11:25 +08:00
WingC	b9b9a11eae	[Bug] Fix invalid rollback for stream load txn (#3054 )	2020-03-09 22:07:36 +08:00
caiconghui	a1f5b57011	Shard tablet_map_lock into multiple smaller map locks to improve the performance of tablet management tasks (#3051 )	2020-03-09 16:29:56 +08:00
yangzhg	dc07182bd4	[Intersect] Implements intersect node (#3034 ) Implementation of the intersect node. It now supports statements like `select a from t intersect select b from t1 intersect select 1;`	2020-03-09 10:52:55 +08:00
lichaoyong	c83729435f	Write delete predicate into RowsetMeta upon upgrade from Doris-0.10 to Doris-0.11 (#3044 ) If delete predicates exist in the meta in Doris-0.10, all of these predicates should be retained. There is a confusing point in Doris-0.10: the delete predicate only exists in OLAPHeaderMessage and PPendingDelta, not in PDelta. This quirk caused the bug.	2020-03-07 11:16:48 +08:00
HangyuanLiu	1d296e907d	Fix orc load timestamp bug (#3047 ) The timestamp value loaded from an ORC file was wrong: it had an offset relative to Hive and Spark. Because the time zone of ORC's timestamp is stored inside the ORC stripe information, the timestamp obtained here is an offset timestamp, so parsing the timestamp with UTC yields the actual datetime literal.	2020-03-06 18:03:27 +08:00
kangkaisen	fca6c4e523	Fix bitmap null crash (#3042 )	2020-03-05 21:30:32 +08:00
Mingyu Chen	4ed99e3c0c	[Compile] Fix BE compile failure (#3040 ) fix BE compile failure because of BloomFilterIndexWriter bug.	2020-03-05 11:38:42 +08:00
Mingyu Chen	63051a3b37	[Bug] Fix int128 bloom filter write bug (#2995 ) std::set.insert(int128) core dumps with a segmentation fault. The reason is that the __int128 is not aligned.	2020-03-05 09:15:11 +08:00
Mingyu Chen	cc1a5fb8ea	[Function] Support '%' in date format string (#3037 ) e.g.: select str_to_date('2014-12-21 12%3A34%3A56', '%Y-%m-%d %H%%3A%i%%3A%s'); select unix_timestamp('2007-11-30 10:30%3A19', '%Y-%m-%d %H:%i%%3A%s'); This also enables us to extract column fields from an HDFS file path that contains '%'.	2020-03-05 08:56:02 +08:00
Mingyu Chen	50af594c66	[MemLimit] Normalize the setting of mem limit (#3033 ) Normalize the setting of mem limit to avoid some unexpected exceptions. For example, a user may not set the query mem limit in the query plan, which may cause a BE crash.	2020-03-05 08:47:45 +08:00
kangpinghuang	f17924650f	[Config] Modify brpc max_body_size to 200M (#3030 ) The default max size per row is 100K, and default row batch size is 2048. So we change the default brpc max_body_size to 200MB to avoid query failure.	2020-03-04 15:30:27 +08:00
Yingchun Lai	aa58cd99d9	Fix disks_total_capacity metric bug (#2988 ) Now the disks_total_capacity metric is a user-specified capacity, but disks_avail_capacity is the disk's actual available capacity, so disks_total_capacity may be less than disks_avail_capacity, and UsedPct on FE may be a negative number as a result. We'd better use the disk's actual capacity for the disks_total_capacity metric.	2020-03-02 19:09:50 +08:00
Lishi	0d1e28746e	[Function] Support null_or_empty function (#2977 ) It returns true if the string is empty or NULL. Otherwise it returns false.	2020-03-01 17:35:45 +08:00
LingBin	58b8e3f574	[Fs Block] Add block layer to storage-engine (#2983 ) The abstraction of the Block layer, inspired by Kudu, lies between the "business layer" and the "underlying file storage layer" (`Env`), making them no longer strongly coupled. In this way, for the business layer (such as `SegmentWriter`), there is no need to directly do the file operation, which will bring better encapsulation. An ideal situation in the future is: when we need to support a new file storage system, we only need to add a corresponding type of BlockManager without modifying the business code (such as `SegmentWriter`). With the Block layer, there are some benefits: 1. First and foremost, the mapping relationship between data and `Env` is more flexible. For example, in the storage engine, the data of the tablet can be placed in multiple file systems (`Env`) at the same time. That is, one-to-many relationships can be supported. For example: one on the local and one on the remote storage. 2. The mapping relationship between blocks and files can be adjusted, for example, it may not be a one-to-one relationship. For example, the data of multiple blocks can be stored in a physical file, which can reduce the number of files that need to be opened during querying. It is like `LogBlockManager` in Kudu. 3. We can move the opened-file-cache under the Block layer, which can automatically close and open the files used by the upper layer, so that the upper business level does not need to be aware of the restrictions of the file handle at all (This problem is often encountered online now). 4. Better automatic cleanup logic when there are exceptions. For example, a block that is not closed explicitly can automatically clean up its corresponding file, thereby avoiding generating most garbage files. 5. More convenient for batch file creation and deletion. Some business operations create multiple files, such as compaction. 
At present, the processing flow that these files go through is executed one by one: 1) creation; 2) writing data; 3) fsync to disk. But in fact, this is not necessary, we only need to fsync this batch of files at the end. The advantage is that it can give the operating system more opportunities to perform IO merge, thereby improving performance. However, this operation is relatively tedious, there is no need to be coupled in the business code, it is an ideal place to put it in the Block layer. This is the first patch, just add related classes, laying the groundwork for later switching of read and write logic.	2020-03-01 10:48:00 +08:00
lichaoyong	f2d2e4bffd	[Unused] Remove unused GC function in DataDir (#3019 )	2020-02-28 21:47:41 +08:00
yangzhg	3b5a0b6060	[TPCDS] Implement the planner for set operation (#2957 ) Implement intersect and except planner. This CL does not implement intersect and except node in execution level.	2020-02-27 16:03:31 +08:00
Dayue Gao	d2d95bfa84	[segment_v2] Switch to Unified and Extensible Page Format (#2953 ) Fixes #2892 IMPORTANT NOTICE: this CL makes incompatible changes to V2 storage format, developers need to create new tables for test. This CL refactors the metadata and page format for segment_v2 in order to * make it easy to extend existing page type * make it easy to add new page type while not sacrificing code reuse * make it possible to use SIMD to speed up page decoding Here we summarize the main code changes: * Page and index metadata is redesigned, please see `segment_v2.proto` * The new class `PageIO` is the single place for reading and writing all pages. This removes lots of duplicated code. `PageCompressor` and `PageDecompressor` are now useless and removed. * The type of value ordinal is changed from `rowid_t` to 64-bit `ordinal_t`, this affects ordinal index as well. * Column's ordinal index is now implemented by IndexPage, the same as IndexedColumn. * Zone map index is now implemented by IndexedColumn	2020-02-27 15:09:57 +08:00
HuangWei	de4e621427	use canonical path in DiskInfo::get_disk_devices() (#3000 )	2020-02-27 11:00:50 +08:00
trueeyu	7b39d604c3	Remove unused LLVM related codes of CMakeLists (#2910 ) (#2993 ) Remove unused LLVM-related code (step 6, the last step): CMakeLists (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in CMakeLists.	2020-02-26 15:43:22 +08:00
HangyuanLiu	e23d735bac	Fix decimal bug in orc load (#2984 )	2020-02-26 10:58:18 +08:00
trueeyu	0f98f975c7	Remove unused LLVM related codes of directory:be/src/codegen (#2910 ) (#2987 ) Remove unused LLVM-related code (step 5): be/src/codegen (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in the directory be/src/codegen.	2020-02-26 10:57:57 +08:00
trueeyu	a340bc7a00	Remove unused LLVM related codes of directory:be/src/runtime (#2910 ) (#2985 ) Remove unused LLVM-related code (step 4): be/src/runtime (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in the directory be/src/runtime.	2020-02-25 13:47:20 +08:00
trueeyu	099e0f74bd	Remove unused LLVM related codes of directory:be/src/exprs (#2910 ) (#2972 ) Remove unused LLVM-related code (step 3): be/src/exprs (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in the directory be/src/exprs.	2020-02-24 18:23:08 +08:00
Mingyu Chen	8eb413fa69	[Bug][RoutineLoad] Fix bug that routine Load encounter "label already used" exception (#2959 ) This CL modifies 2 things: 1. When submitting a routine load task fails, the task is not put back into the task queue. 2. The rpc timeout when executing a routine load task in BE is set to the `query_timeout` of the task plan. ISSUE: #2964	2020-02-22 22:01:14 +08:00
yangzhg	3e6dfa31c4	[UnitTest] Fix BE unit test randomly failed (#2970 ) * fix http server related unit test failed due to http port has been used * fix unit test failed in DEBUG build type	2020-02-21 22:21:02 +08:00
trueeyu	30549ce8f7	Remove unused LLVM related codes of directory:be/src/util,be/src/udf (#2910 ) (#2968 ) Remove unused LLVM-related code (step 2): be/src/util, be/src/udf (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in the directories be/src/util and be/src/udf.	2020-02-21 20:42:42 +08:00
trueeyu	3b8e9d8dcf	[UT] Fix the test case of SegmentReaderWriterTest::TestBitmapPredicate (#2961 ) The function create_int_key() creates a TableColumn instance with the data member _aggregation=(random value). If _aggregation==OLAP_FIELD_AGGREGATION_REPLACE, SegmentWriter::init() sets opts.need_bitmap_index = false; so the test case TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) in olap/rowset/segment_v2/segment_test.cpp fails if the _aggregation of TableColumn == OLAP_FIELD_AGGREGATION_REPLACE. ``` TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) { TabletSchema tablet_schema = create_schemate({ create_int_key(1, true, false, true), create_int_key(2, true, false, true), create_int_value(3), create_int_value(4)}); ... ASSERT_TRUE(segment->footer().columns(0).has_bitmap_index()); ... } ```	2020-02-21 17:16:49 +08:00
Mingyu Chen	35b09ecd66	[JDK] Support OpenJDK (#2804 ) Support compiling and running the Frontend process and Broker process with OpenJDK. OpenJDK 13 is tested.	2020-02-20 23:47:02 +08:00
wutiangan	ccc3412f13	Fix bug: Error of exporting double type data to hdfs (#2924 ) (#2925 )	2020-02-20 21:06:50 +08:00
trueeyu	839ec45197	Remove llvm relative code from be/src/exec (#2955 ) Remove unused LLVM-related code: be/src/exec (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in the directory be/src/exec.	2020-02-20 20:43:26 +08:00
LingBin	da945c8278	Add log to track problem in small_file_mgr_test (#2951 ) This case will occasionally fail in regression testing, so we add some logs to help to solve it.	2020-02-20 02:21:35 -06:00
HuangWei	ed299d5d8b	Create pprof_profile_dir before heap profiling (#2944 )	2020-02-20 10:41:04 +08:00
WingC	cc0d41277c	[Alter] Add more schema change to varchar type (#2777 )	2020-02-19 23:14:43 +08:00
LingBin	c617fc9064	Fix the flush_status bug in flush-executor (#2933 ) For a tablet, there may be multiple memtables, which will be flushed to disk one by one in the order of generation. If a memtable flush fails, then the load job will definitely fail, but the previous implementation would overwrite `_flush_status`, which could leave the error undetected and cause a failed load job to be reported as successful. This patch also has two other changes: 1. Use `std::bind` to replace `boost::bind`; 2. Remove some unneeded headers.	2020-02-19 20:23:19 +08:00
kangkaisen	a76f2b8211	bitmap_union_count support window function (#2902 )	2020-02-19 14:33:05 +08:00
lichaoyong	1cf0fb9117	Use ThreadPool to refactor MemTableFlushExecutor (#2931 ) 1. MemTableFlushExecutor maintains a ThreadPool to receive FlushTasks. 2. FlushToken is used to separate tasks from different tablets. Every DeltaWriter of a tablet constructs a FlushToken; tasks within a FlushToken are handled serially, and tasks in different FlushTokens are handled concurrently. 3. The thread limit on data_dir has been removed, because I/O is not the main time consumer of the flush thread. Most of the time is consumed by CPU work: decoding and compression.	2020-02-18 18:39:04 +08:00
lichaoyong	3f4e18633d	[util] Add Apache License 2.0 to Thread (#2928 )	2020-02-18 15:36:49 +08:00
LingBin	32e998f6e9	[ut] Delete files generated by UT when teardown (#2930 ) If these residual files are not deleted, the UT will fail because the corresponding files already exist when running multiple times.	2020-02-18 15:35:11 +08:00
LingBin	b3c5f0fac7	Remove unneeded headers included in agent-util (#2929 )	2020-02-18 13:18:56 +08:00
kangkaisen	625411bd28	Doris support in memory olap table (#2847 )	2020-02-18 10:45:54 +08:00
worker24h	1f844946e9	Fixbug: Invalid memory address in doris::memory_copy (#2919 ) (#2923 ) When changing the schema from char(20) to varchar(20), BE core dumps.	2020-02-17 18:48:38 +08:00
LingBin	feef077520	Some refactors on `TabletManager` (#2918 ) 1. Add some comments to make the code easier to understand; 2. Make the metric `create_tablet_requests_failed` accurate; 3. Some internal methods use naked pointers directly instead of `shared_ptr`; 4. A `using` in a `.h` file is contagious when the file is included by other files, so we should only use it in `.cpp` files; 5. Some formatting changes, such as wrapping lines that are too long; 6. Parameters that need to be modified use pointers instead of references. No functional changes in this patch.	2020-02-17 14:50:29 +08:00
lichaoyong	f20eb12457	[util] Import ThreadPool and Thread from KUDU (#2915 ) Thread pool design point: All tasks submitted directly to the thread pool enter a FIFO queue and are dispatched to a worker thread when one becomes free. Tasks may also be submitted via ThreadPoolTokens. The token wait() and shutdown() functions can then be used to block on logical groups of tasks. A token operates in one of two ExecutionModes, determined at token construction time: 1. SERIAL: submitted tasks are run one at a time. 2. CONCURRENT: submitted tasks may be run in parallel. This isn't unlike submitted without a token, but the logical grouping that tokens impart can be useful when a pool is shared by many contexts (e.g. to safely shut down one context, to derive context-specific metrics, etc.). Tasks submitted without a token or via ExecutionMode::CONCURRENT tokens are processed in FIFO order. On the other hand, ExecutionMode::SERIAL tokens are processed in a round-robin fashion, one task at a time. This prevents them from starving one another. However, tokenless (and CONCURRENT token-based) tasks can starve SERIAL token-based tasks. Thread design point: 1. It is a thin wrapper around pthread that can register itself with the singleton ThreadMgr (a private class implemented in thread.cpp entirely, which tracks all live threads so that they may be monitored via the debug webpages). This class has a limited subset of boost::thread's API. Construction is almost the same, but clients must supply a category and a name for each thread so that they can be identified in the debug web UI. Otherwise, join() is the only supported method from boost::thread. 2. Each Thread object knows its operating system thread ID (TID), which can be used to attach debuggers to specific threads, to retrieve resource-usage statistics from the operating system, and to assign threads to resource control groups. 3. Threads are shared objects, but in a degenerate way. 
They may only have up to two referents: the caller that created the thread (parent), and the thread itself (child). Moreover, the only two methods to mutate state (join() and the destructor) are constrained: the child may not join() on itself, and the destructor is only run when there's one referent left. These constraints allow us to access thread internals without any locks.	2020-02-17 11:22:09 +08:00
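
Note on c617fc9064 (flush_status fix): the failure mode described there, where a later flush result overwrites an earlier error, can be avoided by recording only the first error. The following is a minimal sketch of that idea, not the code from the patch. `FlushStatusTracker`, `record`, and `final_status` are hypothetical names, and plain integer codes (0 for success) stand in for Doris's own status type.

```cpp
#include <atomic>
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for per-tablet flush state (not Doris's real class).
// 0 means success; any other value is an error code.
class FlushStatusTracker {
public:
    // Record the result of one memtable flush. Only the first non-zero
    // result is stored; later results, including successes, cannot replace it.
    void record(int status) {
        if (status == 0) return;  // a success never clears an earlier error
        int expected = 0;
        // Succeeds only while no error has been stored yet.
        _first_error.compare_exchange_strong(expected, status);
    }

    // Returns 0 if every recorded flush succeeded, else the first error code.
    int status() const { return _first_error.load(); }

private:
    std::atomic<int> _first_error{0};
};

// Feed a sequence of flush results through a tracker and return the final status.
inline int final_status(std::initializer_list<int> results) {
    FlushStatusTracker tracker;
    for (int r : results) tracker.record(r);
    return tracker.status();
}
```

With this shape, a sequence such as "success, error 5, success" still ends with status 5, so the load job can detect the failed flush.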


798 Commits