Currently the `disks_total_capacity` metric reports a user-specified capacity, while
`disks_avail_capacity` reports the disk's actual available capacity. As a result,
`disks_total_capacity` may be less than `disks_avail_capacity`, and `UsedPct` on the FE
may be negative.
We should use the disk's actual capacity for the `disks_total_capacity` metric.
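A minimal sketch of the arithmetic, assuming `UsedPct` is computed from these two metrics (the helper below is hypothetical):
```
#include <iostream>

// Hypothetical illustration: UsedPct goes negative when the reported total
// capacity (user-specified) is smaller than the actual available capacity.
double used_pct(double disks_total_capacity, double disks_avail_capacity) {
    return (disks_total_capacity - disks_avail_capacity) / disks_total_capacity * 100.0;
}

int main() {
    // e.g. the user configured 100 GB, but the disk actually has 120 GB available
    std::cout << used_pct(100.0, 120.0) << std::endl; // prints -20
    return 0;
}
```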
The format of some docs is incorrect for building the doc website.
* Fix a bug where the `gensrc` dir cannot be built with `-j`.
* Fix a UT bug in `CreateFunctionTest`.
This CL implements 3 new operations:
```
ALTER TABLE tbl ADD TEMPORARY PARTITION ...;
ALTER TABLE tbl DROP TEMPORARY PARTITION ...;
ALTER TABLE tbl REPLACE TEMPORARY PARTITION (p1, p2, ...);
```
The user manual can be found in:
`docs/documentation/cn/administrator-guide/alter-table/alter-table-temp-partition.md`
I did not update the grammar manual `alter-table.md`; that manual is too confusing and too large, and I will reorganize it later.
This is the first part of implementing the "overwrite load" feature mentioned in issue #2663.
I will implement the "load to temp partition" feature in the next PR.
This CL also adds GSON serialization methods for the following classes (not yet used):
```
Partition.java
MaterializedIndex.java
Tablet.java
Replica.java
```
The abstraction of the Block layer, inspired by Kudu, lies between the "business
layer" and the "underlying file storage layer" (`Env`), so that they are no longer
strongly coupled.
In this way, the business layer (such as `SegmentWriter`) no longer needs to
perform file operations directly, which brings better encapsulation. An ideal
situation in the future is: when we need to support a new file storage system,
we only need to add a corresponding type of BlockManager without modifying the
business code (such as `SegmentWriter`).
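A minimal sketch of what the abstraction could look like (the interface below is hypothetical, loosely following Kudu's block manager, not the actual classes added in this patch):
```
#include <memory>
#include <string>

// Hypothetical sketch: the business layer asks a BlockManager for blocks
// instead of opening files on an Env directly.
class WritableBlock {
public:
    virtual ~WritableBlock() = default;
    virtual void append(const std::string& data) = 0; // write data to the block
    virtual void close() = 0;                         // flush and seal the block
};

class BlockManager {
public:
    virtual ~BlockManager() = default;
    // Supporting a new file storage system only requires a new BlockManager
    // implementation; SegmentWriter and friends stay unchanged.
    virtual std::unique_ptr<WritableBlock> create_block() = 0;
};
```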
With the Block layer, there are several benefits:
1. First and foremost, the mapping between data and `Env` becomes more flexible.
For example, in the storage engine, the data of a tablet can be placed on multiple
file systems (`Env`) at the same time; that is, one-to-many relationships can be
supported, e.g. one copy on local storage and one on remote storage.
2. The mapping between blocks and files can be adjusted; it need not be one-to-one.
For example, the data of multiple blocks can be stored in a single physical file,
which reduces the number of files that need to be opened during a query. This is
like the `LogBlockManager` in Kudu.
3. We can move the opened-file cache under the Block layer, where it can automatically
close and reopen the files used by the upper layer, so that the upper business
layer does not need to be aware of file handle limits at all (a problem we often
encounter in production today).
4. Better automatic cleanup logic on exceptions. For example, a block that is not
closed explicitly can automatically clean up its corresponding file, thereby
avoiding most garbage files.
5. More convenient batch file creation and deletion. Some operations, such as
compaction, create multiple files. At present each of these files goes through
the flow one by one: 1) creation; 2) writing data; 3) fsync to disk. In fact this
is not necessary: we only need to fsync the whole batch of files at the end, which
gives the operating system more opportunities to merge IO and thereby improves
performance. However, this batching is relatively tedious and need not be coupled
into the business code; the Block layer is an ideal place for it (see the sketch
after this list).
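A minimal sketch of the batched flow from item 5, using plain POSIX calls (the helper and its error handling are hypothetical):
```
#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <vector>

// Hypothetical sketch: create and write a whole batch of files first,
// then fsync them all at the end so the OS can merge the IO.
void write_batch(const std::vector<std::string>& paths, const std::string& data) {
    std::vector<int> fds;
    for (const auto& path : paths) {
        int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644); // 1) creation
        if (fd < 0) continue;                         // error handling elided
        ::write(fd, data.data(), data.size());        // 2) writing data
        fds.push_back(fd);
    }
    for (int fd : fds) {
        ::fsync(fd); // 3) fsync the whole batch at the end,
        ::close(fd); //    instead of syncing after each file
    }
}
```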
This is the first patch; it just adds the related classes, laying the groundwork
for later switching the read and write logic.
The issue is #3011.
Reset the tablet and scan range info before computing it.
The old rollup selector has already computed the tablet and scan range info,
and the new MV selector may sometimes compute it again,
so we need to reset that info here.
Before this commit, the result was doubled for a query like `select k1, k2 from aggregate_table`.
Fixes #2892
IMPORTANT NOTICE: this CL makes incompatible changes to the V2 storage format; developers need to create new tables for testing.
This CL refactors the metadata and page format for segment_v2 in order to
* make it easy to extend existing page types
* make it easy to add new page types without sacrificing code reuse
* make it possible to use SIMD to speed up page decoding
Here is a summary of the main code changes:
* Page and index metadata is redesigned; please see `segment_v2.proto`.
* The new class `PageIO` is the single place for reading and writing all pages. This removes a lot of duplicated code. `PageCompressor` and `PageDecompressor` are now unused and removed.
* The type of the value ordinal is changed from `rowid_t` to the 64-bit `ordinal_t`; this affects the ordinal index as well.
* The column's ordinal index is now implemented with `IndexPage`, the same as `IndexedColumn`.
* The zone map index is now implemented with `IndexedColumn`.
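A minimal sketch of the idea behind a single page read/write entry point (a hypothetical interface, not the actual `PageIO` API):
```
#include <cstdint>
#include <string>

// Hypothetical sketch: one class owns the cross-cutting page concerns
// (compression, checksum, footer), so individual encoders and decoders
// no longer duplicate that code.
struct PagePointer {
    uint64_t offset = 0; // where the page starts in the file
    uint32_t size = 0;   // page size on disk
};

class PageIO {
public:
    // Compress the body if beneficial, append footer and checksum, write it out.
    static PagePointer write_page(const std::string& body);
    // Read the page back, verify the checksum, decompress, return the body.
    static std::string read_page(const PagePointer& pp);
};
```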
This PR removes some unused logging for unauthorized exceptions; some unauthorized accesses, such as LVS probe requests, may cause connection exceptions that we should ignore.
`cmake` has already been checked by `check_prerequest`, so there is no need to
check it twice. Also, `CMAKE_CMD` is now a command, not a file, so checking it
with '-f' in shell will report an error.
There is a case where the META upload succeeded but the INFO upload failed. In this case the UPLOAD_INFO task retries, but the META file has already succeeded and `filename.part` has been renamed to `filename.md5sum`. The retry task will then keep failing on the rename, and the backup job can never complete. Therefore, the `file.md5sum` file needs to be deleted in advance.
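A minimal sketch of the idempotent retry, with `std::filesystem` standing in for the actual file utilities (paths and names are hypothetical):
```
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical sketch: remove a leftover .md5sum file from a previous
// successful attempt before renaming, so a retried upload task cannot
// keep failing on the rename.
void finish_upload(const std::string& part_path, const std::string& md5sum_path) {
    if (fs::exists(md5sum_path)) {
        fs::remove(md5sum_path); // delete the md5sum file in advance
    }
    fs::rename(part_path, md5sum_path); // now the rename cannot collide
}
```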
Fix #3001
This commit mainly implements the new materialized view selector, which supports SPJ<->SPJG.
Two parameters currently regulate this function:
1. test_materialized_view: When this parameter is set to true, the user can create a materialized view on a duplicate table using the 'CREATE MATERIALIZED VIEW' command.
At the same time, if the result of the new materialized view selector differs from that of the old version during a query, an error is reported. This parameter is false by default, which means the new version of the materialized view function is disabled.
2. use_old_mv_selector: When this parameter is set to true, the result of the old version selector is used. If set to false, the result of the new version selector is used. This parameter is true by default, which means the old selector is used.
If the default values of these two parameters are left unchanged, there is no behavior change in the current version.
The main steps of the new selector are as follows:
1. Predicates stage: this stage filters out all materialized views that do not meet the current query's requirements.
2. Priorities stage: this stage sorts the results of the first stage and chooses the best materialized view.
The predicates stage is divided into 6 steps:
1. Calculate the predicate gap between the current query and the view.
2. Check whether the columns in the view can satisfy the compensating predicates.
3. Determine whether the group-by columns of the view match the group-by columns of the query.
4. Determine whether the aggregate columns of the view match the aggregate columns of the query.
5. Determine whether the output columns of the view match the output columns of the query.
6. Add partial materialized views.
The priorities stage is divided into two steps:
1. Find the materialized view that matches the best prefix index.
2. Find the materialized view with the least amount of data.
The biggest difference between the current materialized view selector and the previous one is that it supports SPJ <-> SPJG.
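A minimal sketch of the two-stage flow described above (all names are hypothetical; the real selector lives in the FE):
```
#include <algorithm>
#include <vector>

// Hypothetical sketch: the predicates stage filters out views that cannot
// answer the query; the priorities stage ranks the survivors by prefix
// index match first, then by amount of data.
struct MaterializedView {
    int prefix_index_match_len = 0; // how well the view matches the prefix index
    long data_size = 0;             // amount of data in the view
    bool satisfies_query() const { return true; } // stand-in for the 6 predicate steps
};

const MaterializedView* select_view(const std::vector<MaterializedView>& views) {
    std::vector<const MaterializedView*> candidates;
    for (const auto& v : views) {               // predicates stage
        if (v.satisfies_query()) candidates.push_back(&v);
    }
    if (candidates.empty()) return nullptr;
    return *std::min_element(                   // priorities stage
            candidates.begin(), candidates.end(),
            [](const MaterializedView* a, const MaterializedView* b) {
                if (a->prefix_index_match_len != b->prefix_index_match_len) {
                    return a->prefix_index_match_len > b->prefix_index_match_len;
                }
                return a->data_size < b->data_size;
            });
}
```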
Remove unused LLVM-related code (step 6, the last step): CMakeLists (#2910)
There is a lot of LLVM-related code in the code base, but it is not really used.
Higher versions of GCC are not compatible with LLVM 3.4.2, which Doris currently uses.
This PR deletes all LLVM-related code in: CMakeLists
Remove unused LLVM-related code in directory (step 5): be/src/codegen (#2910)
There is a lot of LLVM-related code in the code base, but it is not really used.
Higher versions of GCC are not compatible with LLVM 3.4.2, which Doris currently uses.
This PR deletes all LLVM-related code in directory: be/src/codegen
In this CL, the `isAlive` field of the `FsBroker` class is persisted in metadata, to solve the
problem described in issue #2989.
Notice: this CL updates `FeMetaVersion` to 73.
Remove unused LLVM-related code in directory (step 4): be/src/runtime (#2910)
There is a lot of LLVM-related code in the code base, but it is not really used.
Higher versions of GCC are not compatible with LLVM 3.4.2, which Doris currently uses.
This PR deletes all LLVM-related code in directory: be/src/runtime
Remove unused LLVM-related code in directory (step 3): be/src/exprs (#2910)
There is a lot of LLVM-related code in the code base, but it is not really used.
Higher versions of GCC are not compatible with LLVM 3.4.2, which Doris currently uses.
This PR deletes all LLVM-related code in directory: be/src/exprs
This CL modifies 2 things:
1. When a routine load task fails to submit, it is not put back into the task queue.
2. The RPC timeout for executing a routine load task on the BE is set to the `query_timeout` of the task's plan.
ISSUE: #2964
Remove unused LLVM-related code in directory (step 2): be/src/util, be/src/udf (#2910)
There is a lot of LLVM-related code in the code base, but it is not really used.
Higher versions of GCC are not compatible with LLVM 3.4.2, which Doris currently uses.
This PR deletes all LLVM-related code in directory: be/src/util, be/src/udf
The function create_int_key() creates a `TabletColumn` instance whose data member `_aggregation` is left uninitialized (a random value).
If `_aggregation == OLAP_FIELD_AGGREGATION_REPLACE`, `SegmentWriter::init()` will set `opts.need_bitmap_index = false`,
so the test case TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) in olap/rowset/segment_v2/segment_test.cpp fails whenever the `_aggregation` of the `TabletColumn` happens to be `OLAP_FIELD_AGGREGATION_REPLACE`.
```
TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) {
    TabletSchema tablet_schema = create_schema({
            create_int_key(1, true, false, true),
            create_int_key(2, true, false, true),
            create_int_value(3),
            create_int_value(4)});
    ...
    ASSERT_TRUE(segment->footer().columns(0).has_bitmap_index());
    ...
}
```
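Presumably the fix is to initialize `_aggregation` deterministically in the test helper; a minimal sketch under that assumption (field names mirror the description above and are not verified against the actual helper):
```
// Hypothetical sketch of the fix: give _aggregation an explicit value instead
// of leaving it uninitialized, so opts.need_bitmap_index is not randomly
// disabled by OLAP_FIELD_AGGREGATION_REPLACE.
TabletColumn create_int_key(int32_t id, bool is_nullable, bool is_bf_column,
                            bool has_bitmap_index) {
    TabletColumn column;
    column._unique_id = id;
    column._type = OLAP_FIELD_TYPE_INT;
    column._is_key = true;
    column._aggregation = OLAP_FIELD_AGGREGATION_NONE; // explicit, not random
    column._is_nullable = is_nullable;
    column._is_bf_column = is_bf_column;
    column._has_bitmap_index = has_bitmap_index;
    return column;
}
```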
The `TStatusCode` enum is used by all FEs and BEs. In order to avoid errors
when identifying status codes in RPCs while upgrading Doris (updating and
restarting the servers one by one), we must ensure that each element always
has a fixed value.
If each element is not explicitly assigned a constant, the value of each
element is assigned from 0 in turn, which requires us to be very careful
when adding and removing elements so that the same element is not recognized
as different values on different machines. That is, new elements can only be
added at the end, and only elements at the end can be deleted.
Unfortunately, this implicit constraint is likely to be ignored by
programmers when coding, especially those who are new to Doris.
No functional change in this patch.
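A minimal illustration of the principle with a plain C++ enum (the real `TStatusCode` is generated from the Thrift IDL):
```
// With implicit values, inserting INTERNAL_ERROR before CANCELLED silently
// renumbers CANCELLED from 1 to 2, so during a rolling upgrade an old server
// and a new server would disagree on what wire value 1 means.
enum class ImplicitStatusCode {
    OK,        // 0
    CANCELLED, // 1 -- shifts if anything is inserted above it
};

// With explicit constants, every element keeps a fixed wire value forever;
// new elements simply take new, unused constants.
enum class ExplicitStatusCode {
    OK = 0,
    CANCELLED = 1,
    INTERNAL_ERROR = 2,
};
```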
1. Fix the bug introduced by https://github.com/apache/incubator-doris/pull/2947.
The following SQL returns 0000, which is wrong; the result should be 1601:
```
select date_format('2020-02-19 16:01:12','%H%i');
```
2. Add a constant expression plan test to ensure the FE's constant expression computation is correct.
3. Remove the `castToInt` function in `FEFunctions`, which duplicates `CastExpr::getResultValue`.
4. Implement the `getNodeExplainString` method for `UnionNode`.
Remove unused LLVM-related code in directory: be/src/exec (#2910)
There is a lot of LLVM-related code in the code base, but it is not really used.
Higher versions of GCC are not compatible with LLVM 3.4.2, which Doris currently uses.
This PR deletes all LLVM-related code in directory: be/src/exec.
For a tablet, there may be multiple memtables, which are flushed to disk
one by one in the order of generation.
If a memtable flush fails, the load job must fail, but the previous
implementation overwrote `_flush_status`, which could hide the error and
let a failed load job appear successful.
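A minimal sketch of the first-error-wins behavior (hypothetical names, with a stand-in `Status` type; the actual member is `_flush_status`):
```
#include <mutex>

// Minimal stand-in for the real status type, just for this sketch.
struct Status {
    bool is_ok = true;
    bool ok() const { return is_ok; }
};

// Hypothetical sketch: record only the first flush error instead of
// overwriting _flush_status on every flush, so a later successful
// flush cannot mask an earlier failure.
class FlushToken {
public:
    void on_flush_finished(const Status& st) {
        std::lock_guard<std::mutex> l(_lock);
        if (_flush_status.ok()) {
            _flush_status = st; // keep the first error only
        }
    }
    Status wait() {
        std::lock_guard<std::mutex> l(_lock);
        return _flush_status; // the first error, if any, is reported here
    }
private:
    std::mutex _lock;
    Status _flush_status; // defaults to OK
};
```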
This patch also has two other changes:
1. Use `std::bind` to replace `boost::bind`;
2. Remove some unneeded headers.
The logic chain is as follows:
1. `date_format(if(, NULL, `dt`), '%Y%m%d')` is used as the HASH_PARTITIONED exprs, which is not right; we should use the agg intermediate materialized slot.
2. We don't use the agg intermediate materialized slot as the HASH_PARTITIONED exprs because
```
// the parent fragment is partitioned on the grouping exprs;
// substitute grouping exprs to reference the *output* of the agg, not the input
partitionExprs = Expr.substituteList(partitionExprs,
node.getAggInfo().getIntermediateSmap(), ctx_.getRootAnalyzer(), false);
parentPartition = DataPartition.hashPartitioned(partitionExprs);
```
the partitionExprs substitution failed.
3. The partitionExprs substitution failed because partitionExprs has a cast-to-date child, but the agg info's getIntermediateSmap has a cast-to-datetime child.
4. The cast-to-date or cast-to-datetime child exists because `TupleIsNullPredicate` inserts an `if` expr; we don't have an `if date` fn, so Doris uses the `if int` expr.
5. The `date` in the cast-to-date depends on the date type of slot `dt`; the `datetime` in the cast-to-datetime depends on the datetime arg type of the `date_format` function.
So we could fix this issue by making the `if` fn support the date type, or by making the `date_format` fn support the date type.