doris

Author	SHA1	Message	Date
caiconghui	fe086ab92c	[Log] Change log level from warn to debug for unauthorized exception (#2996 ) This PR removes some unneeded logs for unauthorized exceptions. Some unauthorized access, such as LVS probe requests, may cause connection exceptions that we should ignore.	2020-02-27 09:29:06 +08:00
Yingchun Lai	1fbd34cd32	[Compile] Fix some build errors (#3005 ) `cmake` has been checked by `check_prerequest`, not needed to check it twice, and now `CMAKE_CMD` is a command not a file, check it by '-f' in shell will report an error.	2020-02-27 09:26:56 +08:00
Youngwb	e3d115af91	[Bug][Backup]Fix backup job block at UPLOAD_INFO phase (#3002 ) There is a case where the META upload succeeded but the upload INFO failed, in which case the UPLOAD_INFO task will try again, but the META file has succeeded and filename.part has been renamed to `filename.md5sum`. The retry task will keep failing with rename and cannot complete the backup job. Therefore, the `file.md5sum` file needs to be deleted in advance Fix #3001	2020-02-27 09:21:21 +08:00
EmmyMiao87	a3e588f39c	[MaterializedView] Implement new materialized view selector (#2821 ) This commit mainly implements the new materialized view selector which supports SPJ<->SPJG. Two parameters are currently used to regulate this function. 1. test_materialized_view: When this parameter is set to true, the user can create a materialized view for the duplicate table by using 'CREATE MATERIALIZED VIEW' command. At the same time, if the result of the new materialized views is different from the old version during the query, an error will be reported. This parameter is false by default, which means that the new version of the materialized view function cannot be enabled. 2. use_old_mv_selector: When this parameter is set to true, the result of the old version selector will be selected. If set to false, the result of the new version selector will be selected. This parameter is true by default, which means that the old selector is used. If the default values of the above two parameters do not change, there will be no behavior changes in the current version. The main steps for the new selector are as follows: 1. Predicates stage: This stage will mainly filter out all materialized views that do not meet the current query requirements. 2. Priorities stage: This stage will sort the results of the first stage and choose the best materialized view. The predicates phase is divided into 6 steps: 1. Calculate the predicate gap between the current query and view. 2. Whether the columns in the view can meet the needs of the compensating predicates. 3. Determine whether the group by columns of view match the group by columns of query. 4. Determine whether the aggregate columns of view match the aggregate columns of query. 5. Determine whether the output columns of view match the output columns of query. 6. Add partial materialized views The priorities phase is divided into two steps: 1. Find the materialized view that matches the best prefix index 2. Find the materialized view with the least amount of data The biggest difference between the current materialized view selector and the previous one is that it supports SPJ <-> SPJG.	2020-02-27 09:14:32 +08:00
trueeyu	7b39d604c3	Remove unused LLVM related codes of CMakeLists (#2910 ) (#2993 ) Remove unused LLVM related codes (step 6, the last step): CMakeLists (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code : CMakeLists	2020-02-26 15:43:22 +08:00
HangyuanLiu	e23d735bac	Fix decimal bug in orc load (#2984 )	2020-02-26 10:58:18 +08:00
trueeyu	0f98f975c7	Remove unused LLVM related codes of directory:be/src/codegen (#2910 ) (#2987 ) Remove unused LLVM related codes of directory (step 5):be/src/codegen (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/codegen	2020-02-26 10:57:57 +08:00
Yingchun Lai	57483ade00	[Doc] Fix typo in Chinese document (#2963 ) Fix some errors in Chinese document	2020-02-25 22:30:21 +08:00
Mingyu Chen	8f71b1025a	[Bug][Broker] Fix bug that Broker's alive status is inconsistent in different FEs In this CL, the isAlive field in FsBroker class will be persisted in metadata, to solve the problem described in ISSUE: #2989 Notice: this CL updates FeMetaVersion to 73	2020-02-25 22:27:27 +08:00
trueeyu	a340bc7a00	Remove unused LLVM related codes of directory:be/src/runtime (#2910 ) (#2985 ) Remove unused LLVM related codes of directory (step 4):be/src/runtime (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/runtime	2020-02-25 13:47:20 +08:00
trueeyu	099e0f74bd	Remove unused LLVM related codes of directory:be/src/exprs (#2910 ) (#2972 ) Remove unused LLVM related codes of directory (step 3):be/src/exprs (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/exprs	2020-02-24 18:23:08 +08:00
kangkaisen	fb5b58b75a	Add more constraints for bitmap column (#2966 )	2020-02-24 10:41:18 +08:00
Mingyu Chen	8eb413fa69	[Bug][RoutineLoad] Fix bug that routine Load encounter "label already used" exception (#2959 ) This CL modify 2 things: 1. When a routine load task submit failed, it will not be put back to the task queue. 2. The rpc timeout when executing a routine load task in BE is set to `query_timeout` of the task plan. ISSUE: #2964	2020-02-22 22:01:14 +08:00
wyb	fc2d92d68a	Update spark load doc (#2973 )	2020-02-22 12:00:50 +08:00
yangzhg	3e6dfa31c4	[UnitTest] Fix BE unit test randomly failed (#2970 ) * fix http server related unit test failed due to http port has been used * fix unit test failed in DEBUG build type	2020-02-21 22:21:02 +08:00
trueeyu	96248058a1	[Doc] Modify the default port of restore meta instance in document (#2971 )	2020-02-21 22:05:30 +08:00
trueeyu	30549ce8f7	Remove unused LLVM related codes of directory:be/src/util,be/src/udf (#2910 ) (#2968 ) Remove unused LLVM related codes of directory (step 2):be/src/util,be/src/udf (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/util,be/src/udf	2020-02-21 20:42:42 +08:00
trueeyu	3b8e9d8dcf	[UT] Fix the test case of SegmentReaderWriterTest::TestBitmapPredicate (#2961 ) function create_int_key() will create a TableColumn instance with data member: _aggregation=(random value) if _aggregation==OLAP_FIELD_AGGREGATION_REPLACE SegmentWriter::init() will set opts.need_bitmap_index = false; so the test case TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) of olap/rowset/segment_v2/segment_test.cpp will fail if the _aggregation of TableColumn == OLAP_FIELD_AGGREGATION_REPLACE. ``` TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) { TabletSchema tablet_schema = create_schemate({ create_int_key(1, true, false, true), create_int_key(2, true, false, true), create_int_value(3), create_int_value(4)}); ... ASSERT_TRUE(segment->footer().columns(0).has_bitmap_index()); ... } ```	2020-02-21 17:16:49 +08:00
LingBin	8291a45267	Assign each status type a constant explicitly (#2960 ) The `TStatusCode` struct is used in all FEs and BEs. In order to avoid errors when identifying status codes in RPC while upgrading Doris (updating and restarting the servers one by one), we must ensure that each element always has a fixed value. If each element is not explicitly assigned a constant, then the elements are numbered from 0 in declaration order, which requires us to be very careful when adding and removing elements, so that the same element is not recognized as a different value on different machines. I.e., new elements can only be added to the end, and only elements at the end can be deleted. Unfortunately, this implicit constraint is likely to be ignored by programmers when coding, especially those who are new to Doris. No functional change in this patch.	2020-02-21 01:34:12 -06:00
kangpinghuang	70d2ccf384	Add spark load design (#2856 )	2020-02-21 14:32:18 +08:00
Mingyu Chen	35b09ecd66	[JDK] Support OpenJDK (#2804 ) Support compile and running Frontend process and Broker process with OpenJDK. OpenJDK 13 is tested.	2020-02-20 23:47:02 +08:00
wutiangan	ccc3412f13	Fix bug: Error of exporting double type data to hdfs (#2924 ) (#2925 )	2020-02-20 21:06:50 +08:00
kangkaisen	ece8740c1b	Fix some function DATE type priority (#2952 ) 1. Fix the bug introduced by https://github.com/apache/incubator-doris/pull/2947. The following sql result is 0000, which is wrong. The result should be 1601 ``` select date_format('2020-02-19 16:01:12','%H%i'); ``` 2. Add constant Express plan test, ensure the FE constant Express compute result is right. 3. Remove the `castToInt ` function in `FEFunctions`, which is duplicated with `CastExpr::getResultValue` 4. Implement `getNodeExplainString` method for `UnionNode`	2020-02-20 20:45:45 +08:00
trueeyu	839ec45197	Remove llvm relative code from be/src/exec (#2955 ) Remove unused LLVM related codes of directory:be/src/exec (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/exec.	2020-02-20 20:43:26 +08:00
LingBin	da945c8278	Add log to track problem in small_file_mgr_test (#2951 ) This case will occasionally fail in regression testing, so we add some logs to help to solve it.	2020-02-20 02:21:35 -06:00
Mingyu Chen	180bf0251e	[Bug] Missing `in memory` property for restore meta info (#2950 )	2020-02-20 11:46:36 +08:00
HuangWei	ed299d5d8b	Create pprof_profile_dir before heap profiling (#2944 )	2020-02-20 10:41:04 +08:00
WingC	cc0d41277c	[Alter] Add more schema change to varchar type (#2777 )	2020-02-19 23:14:43 +08:00
LingBin	c617fc9064	Fix the flush_status bug in flush-executor (#2933 ) For a tablet, there may be multiple memtables, which will be flushed to disk one by one in the order of generation. If a memtable flush fails, then the load job will definitely fail, but the previous implementation would overwrite `_flush_status`, so the error might go undetected and a failed load job could be reported as successful. This patch also makes two other changes: 1. Use `std::bind` to replace `boost::bind`; 2. Remove some unneeded headers.	2020-02-19 20:23:19 +08:00
Mingyu Chen	cfcc29fb21	[Bug] Missing `in memory` property for old version of partition info (#2948 ) This bug is introduced by PR #2846	2020-02-19 20:19:00 +08:00
kangkaisen	147953f09e	Fix some function with date type bug (#2947 ) The logic chain is as follows: 1. `date_format(if(, NULL, `dt`), '%Y%m%d')` as HASH_PARTITIONED exprs, which is not right, we should use Agg intermediate materialized slot 2. we don't use Agg intermediate materialized slot as HASH_PARTITIONED exprs, because ``` // the parent fragment is partitioned on the grouping exprs; // substitute grouping exprs to reference the output of the agg, not the input partitionExprs = Expr.substituteList(partitionExprs, node.getAggInfo().getIntermediateSmap(), ctx_.getRootAnalyzer(), false); parentPartition = DataPartition.hashPartitioned(partitionExprs); ``` the partitionExprs substitute failed. 3. partitionExprs substitute failed because partitionExprs has a casttodate child, but agg info getIntermediateSmap has a cast in datetime child. 4. The cast to date or cast to datetime child exists because `TupleIsNullPredicate` inserts an `if` Expr. we don't have `if date` fn, so Doris uses `if int` Expr. 5. the `date` in the `casttodate` depends on slot dt date type. the `datetime` in the `casttodatetime` depends on datetime arg type in `date_format` function. So we could fix this issue by making if fn support date type or making date_format fn support date type	2020-02-19 20:16:44 +08:00
Mingyu Chen	a015cd0c8b	[Alter] Change table's state right after all rollup jobs being cancelled	2020-02-19 19:45:35 +08:00
yangzhg	ceaa790793	[Alter] Drop index when index column is dropped (#2941 )	2020-02-19 17:57:27 +08:00
WingC	3994b52f34	[Alter] Change max create replicas timeout configurable (#2945 )	2020-02-19 17:47:27 +08:00
kangkaisen	a76f2b8211	bitmap_union_count support window function (#2902 )	2020-02-19 14:33:05 +08:00
caiconghui	87a84a793e	Add more doc content about hdfs broker auth and config detail (#2935 )	2020-02-18 21:15:33 +08:00
lichaoyong	1cf0fb9117	Use ThreadPool to refactor MemTableFlushExecutor (#2931 ) 1. MemTableFlushExecutor maintains a ThreadPool to receive FlushTasks. 2. FlushToken is used to separate tasks from different tablets. Every DeltaWriter of a tablet constructs a FlushToken; tasks within a FlushToken are handled serially, and tasks in different FlushTokens are handled concurrently. 3. The thread limit on data_dir has been removed, because I/O is not the main time consumer of a flush thread. Most of the time is consumed by CPU decoding and compression.	2020-02-18 18:39:04 +08:00
lichaoyong	3f4e18633d	[util] Add Apache License 2.0 to Thread (#2928 )	2020-02-18 15:36:49 +08:00
LingBin	32e998f6e9	[ut] Delete files generated by UT when teardown (#2930 ) If these residual files are not deleted, the UT will fail because the corresponding files already exist when running multiple times.	2020-02-18 15:35:11 +08:00
yangzhg	7be2871c36	[GroupingSet] Disable column both in select list and aggregate functions when using GROUPING SETS/CUBE/ROLLUP (#2921 )	2020-02-18 13:56:56 +08:00
LingBin	b3c5f0fac7	Remove unneeded headers included in agent-util (#2929 )	2020-02-18 13:18:56 +08:00
kangkaisen	625411bd28	Doris support in memory olap table (#2847 )	2020-02-18 10:45:54 +08:00
wangbo	11b43700b9	[Alter] Fix pending AlterJobV2 replay bug (#2922 ) Call replayPending method when load pending status AlterJobV2. So that the tablet and replica won't missing in TabletInvertedIndex.	2020-02-17 23:02:18 +08:00
Mingyu Chen	0fb52c514b	[UDF] Fix bug that UDF can't handle constant null value (#2914 ) This CL modify the `evalExpr()` of ExpressionFunctions, so that it won't change the `FunctionCallExpr` to `NullLiteral` when there is null parameter in UDF. Which will fix the problem described in ISSUE: #2913	2020-02-17 22:13:50 +08:00
yangzhg	1089f09d26	[Syntax] Fix bug introduced by #2906 (#2917 )	2020-02-17 21:41:03 +08:00
worker24h	1f844946e9	Fixbug: Invalid memory address in doris::memory_copy (#2919 ) (#2923 ) When I change schema from char(20) to varchar(20), be will cause coredump.	2020-02-17 18:48:38 +08:00
LingBin	feef077520	Some refactors on `TabletManager` (#2918 ) 1. Add some comments to make the code easier to understand; 2. Make the metric `create_tablet_requests_failed` to be accurate; 3. Some internal methods use naked pointers directly instead of `shared_ptr`; 4. The `using` in `.h` files are contagious when included by other files, so we should only use it in `.cpp` files; 5. Some formatting changes: such as wrapping lines that are too long 6. Parameters that need to be modified, use pointers instead of references No functional changes in this patch.	2020-02-17 14:50:29 +08:00
lichaoyong	f20eb12457	[util] Import ThreadPool and Thread from KUDU (#2915 ) Thread pool design point: All tasks submitted directly to the thread pool enter a FIFO queue and are dispatched to a worker thread when one becomes free. Tasks may also be submitted via ThreadPoolTokens. The token wait() and shutdown() functions can then be used to block on logical groups of tasks. A token operates in one of two ExecutionModes, determined at token construction time: 1. SERIAL: submitted tasks are run one at a time. 2. CONCURRENT: submitted tasks may be run in parallel. This isn't unlike submitted without a token, but the logical grouping that tokens impart can be useful when a pool is shared by many contexts (e.g. to safely shut down one context, to derive context-specific metrics, etc.). Tasks submitted without a token or via ExecutionMode::CONCURRENT tokens are processed in FIFO order. On the other hand, ExecutionMode::SERIAL tokens are processed in a round-robin fashion, one task at a time. This prevents them from starving one another. However, tokenless (and CONCURRENT token-based) tasks can starve SERIAL token-based tasks. Thread design point: 1. It is a thin wrapper around pthread that can register itself with the singleton ThreadMgr (a private class implemented in thread.cpp entirely, which tracks all live threads so that they may be monitored via the debug webpages). This class has a limited subset of boost::thread's API. Construction is almost the same, but clients must supply a category and a name for each thread so that they can be identified in the debug web UI. Otherwise, join() is the only supported method from boost::thread. 2. Each Thread object knows its operating system thread ID (TID), which can be used to attach debuggers to specific threads, to retrieve resource-usage statistics from the operating system, and to assign threads to resource control groups. 3. Threads are shared objects, but in a degenerate way. They may only have up to two referents: the caller that created the thread (parent), and the thread itself (child). Moreover, the only two methods to mutate state (join() and the destructor) are constrained: the child may not join() on itself, and the destructor is only run when there's one referent left. These constraints allow us to access thread internals without any locks.	2020-02-17 11:22:09 +08:00
HangyuanLiu	43583e7bd2	Fix orc load bug (#2912 )	2020-02-16 19:14:42 +08:00
kangkaisen	6c33f80544	Add disable_storage_page_cache config (#2890 ) 1. when read column data page: for compaction, schema_change, check_sum: we don't use page cache for query and config::disable_storage_page_cache is false, we use page cache 2. when read column index page if config::disable_storage_page_cache is false, we use page cache	2020-02-16 19:13:30 +08:00
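
Note on commit 8291a45267 (explicit status code constants): the hazard it describes can be illustrated with a small sketch. This is only an illustration in Python, not Doris code; the member names and the helpers `implicit_codes` and `name_for` are hypothetical and do not reflect the real `TStatusCode` definition.

```python
from enum import IntEnum


def implicit_codes(names):
    """Number names from 0 in declaration order, like an enum without explicit values."""
    return {name: value for value, name in enumerate(names)}


# Hypothetical member names, chosen only for illustration.
old_build = implicit_codes(["OK", "CANCELLED", "ANALYSIS_ERROR", "TIMEOUT"])
# A newer build removes CANCELLED from the middle of the list.
new_build = implicit_codes(["OK", "ANALYSIS_ERROR", "TIMEOUT"])


def name_for(value, table):
    """Return the name a build associates with a numeric code, or None."""
    for name, code in table.items():
        if code == value:
            return name
    return None


class ExplicitStatus(IntEnum):
    """Same members as new_build, with values pinned to the old numbering."""
    OK = 0
    ANALYSIS_ERROR = 2
    TIMEOUT = 3


# An old node sends ANALYSIS_ERROR as 2 during a rolling upgrade.
sent = old_build["ANALYSIS_ERROR"]
misread = name_for(sent, new_build)   # implicit numbering has shifted
correct = ExplicitStatus(sent).name   # pinned values still agree
```

With implicit numbering the new build reads the code as a different status, while the pinned values keep both sides in agreement even though a middle element was removed.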

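Note on commit c617fc9064 (flush_status bug): the commit message says an earlier failure could be hidden when `_flush_status` was overwritten by a later flush. The sketch below illustrates that failure mode and one possible remedy, keeping the first error. It is an assumption-laden Python illustration; the class `FlushTracker`, its methods, and the status strings are hypothetical, and the message does not state how the actual patch preserves the error.

```python
class FlushTracker:
    """Hypothetical stand-in for the per-tablet flush state."""

    def __init__(self):
        self.status = "OK"

    def record_overwriting(self, result):
        # Buggy pattern: the latest result always replaces the stored one.
        self.status = result

    def record_first_error(self, result):
        # Possible remedy: once an error is stored, later results cannot clear it.
        if self.status == "OK":
            self.status = result


# Three memtables flushed in order; the second one fails.
results = ["OK", "IO_ERROR", "OK"]

buggy = FlushTracker()
sticky = FlushTracker()
for r in results:
    buggy.record_overwriting(r)
    sticky.record_first_error(r)
```

After the loop, the overwriting tracker reports success because the last flush succeeded, while the first-error tracker still reports the failure.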

17549 Commits