Commit Graph

780 Commits

Author SHA1 Message Date
aa58cd99d9 Fix disks_total_capacity metric bug (#2988)
Currently the disks_total_capacity metric is the user-specified capacity, while
disks_avail_capacity is the disk's actual available capacity, so
disks_total_capacity may be less than disks_avail_capacity, and
UsedPct on FE may be negative as a result.
We should use the disk's actual capacity for the disks_total_capacity metric.
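A minimal sketch of why the percentage can go negative, assuming the used percentage is derived as (total - avail) / total. The helper name `used_pct` is hypothetical and is not taken from the FE code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: percentage of a disk that is in use, assuming the
// value is derived from the total and available capacity metrics.
double used_pct(int64_t total_capacity, int64_t avail_capacity) {
    if (total_capacity <= 0) return 0.0;
    return 100.0 * static_cast<double>(total_capacity - avail_capacity) /
           static_cast<double>(total_capacity);
}
```

With a user-specified total of 100 and an actual available capacity of 150, the result is negative; with the actual total it stays within 0 to 100.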
2020-03-02 19:09:50 +08:00
0d1e28746e [Function] Support null_or_empty function (#2977)
It returns true if the string is empty or NULL. Otherwise it returns false.
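The semantics can be sketched in C++ as follows. This is only an illustration, not the BE implementation: `std::nullopt` stands in for SQL NULL, and the signature is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the SQL function: std::nullopt models NULL.
bool null_or_empty(const std::optional<std::string>& s) {
    return !s.has_value() || s->empty();
}
```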
2020-03-01 17:35:45 +08:00
58b8e3f574 [Fs Block] Add block layer to storage-engine (#2983)
The abstraction of the Block layer, inspired by Kudu, lies between the "business
layer" and the "underlying file storage layer" (`Env`), making them no longer
strongly coupled.

In this way, the business layer (such as `SegmentWriter`) no longer needs to
perform file operations directly, which brings better
encapsulation. An ideal situation in the future is: when we need to support a
new file storage system, we only need to add a corresponding type of
BlockManager without modifying the business code (such as `SegmentWriter`).

With the Block layer, there are some benefits:

1. First and foremost, the mapping relationship between data and `Env` is more
   flexible. For example, in the storage engine, the data of the tablet can be
   placed in multiple file systems (`Env`) at the same time. That is, one-to-many
   relationships can be supported. For example: one on the local and one on the
   remote storage.
2. The mapping relationship between blocks and files can be adjusted, for example,
   it may not be a one-to-one relationship. For example, the data of multiple
   blocks can be stored in a physical file, which can reduce the number of files
   that need to be opened during querying. It is like `LogBlockManager` in Kudu.
3. We can move the opened-file-cache under the Block layer, which can automatically
   close and open the files used by the upper layer, so that the upper business
   layer does not need to be aware of file handle limits at all
   (a problem that is often encountered in production now).
4. Better automatic cleanup logic when there are exceptions. For example, a block
   that is not closed explicitly can automatically clean up its corresponding file,
   thereby avoiding most garbage files.
5. More convenient batch file creation and deletion. Some business operations,
   such as compaction, create multiple files. At present, these files are processed
   one by one: 1) creation; 2) writing data; 3) fsync to disk. This is not
   necessary; we only need to fsync the whole batch of files at the end. The
   advantage is that it gives the operating system more opportunities to merge IO,
   thereby improving performance. However, this operation is relatively tedious and
   does not need to be coupled with the business code, so the Block layer is an
   ideal place for it.

This is the first patch; it only adds the related classes, laying the groundwork
for later switching the read and write logic.
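The automatic cleanup described in benefit 4 can be sketched with a toy, in-memory block manager. All names here (`InMemoryBlockManager`, `WritableBlock`, `create_block`) are hypothetical and are not the classes added by this patch; a map entry stands in for a file on an `Env`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

class InMemoryBlockManager;

// Hypothetical writable block: its storage is removed again unless close()
// is called before the block is destroyed.
class WritableBlock {
public:
    WritableBlock(InMemoryBlockManager* mgr, std::string id)
        : _mgr(mgr), _id(std::move(id)) {}
    ~WritableBlock();

    void append(const std::string& data);
    void close() { _closed = true; }

private:
    InMemoryBlockManager* _mgr;
    std::string _id;
    bool _closed = false;
};

// Hypothetical manager: a map entry plays the role of a physical file.
class InMemoryBlockManager {
public:
    std::unique_ptr<WritableBlock> create_block(const std::string& id) {
        _blocks[id] = "";  // "create the file"
        return std::make_unique<WritableBlock>(this, id);
    }
    bool exists(const std::string& id) const { return _blocks.count(id) > 0; }
    const std::string& read(const std::string& id) const { return _blocks.at(id); }

private:
    friend class WritableBlock;
    std::map<std::string, std::string> _blocks;
};

inline void WritableBlock::append(const std::string& data) {
    _mgr->_blocks[_id] += data;
}

inline WritableBlock::~WritableBlock() {
    // A block that was never closed cleans up its own storage.
    if (!_closed) _mgr->_blocks.erase(_id);
}
```

The caller only talks to the manager and the block, so swapping the map for a real file system would not change the calling code.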
2020-03-01 10:48:00 +08:00
f2d2e4bffd [Unused] Remove unused GC function in DataDir (#3019) 2020-02-28 21:47:41 +08:00
3b5a0b6060 [TPCDS] Implement the planner for set operation (#2957)
Implement the planner for INTERSECT and EXCEPT.
This CL does not implement the intersect and except nodes at the execution level.
2020-02-27 16:03:31 +08:00
d2d95bfa84 [segment_v2] Switch to Unified and Extensible Page Format (#2953)
Fixes #2892 

IMPORTANT NOTICE: this CL makes incompatible changes to the V2 storage format; developers need to create new tables for testing.

This CL refactors the metadata and page format for segment_v2 in order to
* make it easy to extend existing page type
* make it easy to add new page type while not sacrificing code reuse
* make it possible to use SIMD to speed up page decoding

Here we summarize the main code changes:
* Page and index metadata is redesigned, please see `segment_v2.proto`
* The new class `PageIO` is the single place for reading and writing all pages. This removes lots of duplicated code. `PageCompressor` and `PageDecompressor` are no longer needed and have been removed.
* The type of value ordinal is changed from `rowid_t` to the 64-bit `ordinal_t`; this affects the ordinal index as well.
* A column's ordinal index is now implemented by IndexPage, the same as IndexedColumn.
* Zone map index is now implemented by IndexedColumn
2020-02-27 15:09:57 +08:00
de4e621427 use canonical path in DiskInfo::get_disk_devices() (#3000) 2020-02-27 11:00:50 +08:00
7b39d604c3 Remove unused LLVM related codes of CMakeLists (#2910) (#2993)
Remove unused LLVM related codes (step 6, the last step): CMakeLists (#2910)

There is a lot of LLVM-related code in the code base, but it is not actually used.
Newer versions of GCC are not compatible with LLVM 3.4.2, the version currently used by Doris.
This PR deletes all LLVM-related code in: CMakeLists
2020-02-26 15:43:22 +08:00
e23d735bac Fix decimal bug in orc load (#2984) 2020-02-26 10:58:18 +08:00
0f98f975c7 Remove unused LLVM related codes of directory:be/src/codegen (#2910) (#2987)
Remove unused LLVM related codes of directory (step 5):be/src/codegen (#2910)

There is a lot of LLVM-related code in the code base, but it is not actually used.
Newer versions of GCC are not compatible with LLVM 3.4.2, the version currently used by Doris.
This PR deletes all LLVM-related code in the directory: be/src/codegen
2020-02-26 10:57:57 +08:00
a340bc7a00 Remove unused LLVM related codes of directory:be/src/runtime (#2910) (#2985)
Remove unused LLVM related codes of directory (step 4):be/src/runtime (#2910)

There is a lot of LLVM-related code in the code base, but it is not actually used.
Newer versions of GCC are not compatible with LLVM 3.4.2, the version currently used by Doris.
This PR deletes all LLVM-related code in the directory: be/src/runtime
2020-02-25 13:47:20 +08:00
099e0f74bd Remove unused LLVM related codes of directory:be/src/exprs (#2910) (#2972)
Remove unused LLVM related codes of directory (step 3):be/src/exprs (#2910)

There is a lot of LLVM-related code in the code base, but it is not actually used.
Newer versions of GCC are not compatible with LLVM 3.4.2, the version currently used by Doris.
This PR deletes all LLVM-related code in the directory: be/src/exprs
2020-02-24 18:23:08 +08:00
8eb413fa69 [Bug][RoutineLoad] Fix bug that routine Load encounter "label already used" exception (#2959)
This CL modifies two things:

1. When submitting a routine load task fails, the task is not put back into the task queue.
2. The RPC timeout when executing a routine load task in BE is set to the `query_timeout` of the task plan.

ISSUE: #2964
2020-02-22 22:01:14 +08:00
3e6dfa31c4 [UnitTest] Fix BE unit test randomly failed (#2970)
* Fix HTTP-server-related unit tests that failed because the HTTP port was already in use
* Fix unit tests that failed in the DEBUG build type
2020-02-21 22:21:02 +08:00
30549ce8f7 Remove unused LLVM related codes of directory:be/src/util,be/src/udf (#2910) (#2968)
Remove unused LLVM related codes of directory (step 2):be/src/util,be/src/udf (#2910)

There is a lot of LLVM-related code in the code base, but it is not actually used.
Newer versions of GCC are not compatible with LLVM 3.4.2, the version currently used by Doris.
This PR deletes all LLVM-related code in the directories: be/src/util, be/src/udf
2020-02-21 20:42:42 +08:00
3b8e9d8dcf [UT] Fix the test case of SegmentReaderWriterTest::TestBitmapPredicate (#2961)
The function create_int_key() creates a TableColumn instance whose data member _aggregation holds a random value.

If _aggregation == OLAP_FIELD_AGGREGATION_REPLACE, SegmentWriter::init() sets opts.need_bitmap_index = false;

so the test case TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) in olap/rowset/segment_v2/segment_test.cpp fails if the _aggregation of the TableColumn == OLAP_FIELD_AGGREGATION_REPLACE.

```
TEST_F(SegmentReaderWriterTest, TestBitmapPredicate) {
    TabletSchema tablet_schema = create_schemate({
        create_int_key(1, true, false, true),
        create_int_key(2, true, false, true),
        create_int_value(3),
        create_int_value(4)});
    ...
    ASSERT_TRUE(segment->footer().columns(0).has_bitmap_index());
    ...
}
```
2020-02-21 17:16:49 +08:00
35b09ecd66 [JDK] Support OpenJDK (#2804)
Support compiling and running the Frontend process and the Broker process with OpenJDK.
OpenJDK 13 has been tested.
2020-02-20 23:47:02 +08:00
ccc3412f13 Fix bug: Error of exporting double type data to hdfs (#2924) (#2925) 2020-02-20 21:06:50 +08:00
839ec45197 Remove llvm relative code from be/src/exec (#2955)
Remove unused LLVM related codes of directory:be/src/exec (#2910)

There is a lot of LLVM-related code in the code base, but it is not actually used.
Newer versions of GCC are not compatible with LLVM 3.4.2, the version currently used by Doris.
This PR deletes all LLVM-related code in the directory: be/src/exec.
2020-02-20 20:43:26 +08:00
da945c8278 Add log to track problem in small_file_mgr_test (#2951)
This case occasionally fails in regression testing, so we add
some logs to help diagnose it.
2020-02-20 02:21:35 -06:00
ed299d5d8b Create pprof_profile_dir before heap profiling (#2944) 2020-02-20 10:41:04 +08:00
cc0d41277c [Alter] Add more schema change to varchar type (#2777) 2020-02-19 23:14:43 +08:00
c617fc9064 Fix the flush_status bug in flush-executor (#2933)
For a tablet, there may be multiple memtables, which will be
flushed to disk one by one in the order of generation.

If a memtable flush fails, the load job must fail as well.
However, the previous implementation could overwrite `_flush_status`,
so the error might go undetected and a failed load job
might be reported as successful.

This patch also makes two other changes:
1. Use `std::bind` to replace `boost::bind`;
2. Remove some unneeded headers.
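One way to avoid this kind of overwrite is to keep only the first error ever reported. The sketch below illustrates that idea under stated assumptions: `Status` is a minimal stand-in for Doris's own status class, and `FlushStatusTracker` is a hypothetical name, not the code in this patch.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical minimal status type (Doris has its own Status class).
struct Status {
    bool ok = true;
    std::string msg;
};

// Remembers the first error; later results never overwrite it.
class FlushStatusTracker {
public:
    void update(const Status& s) {
        std::lock_guard<std::mutex> lock(_mu);
        if (_status.ok && !s.ok) _status = s;
    }

    Status get() const {
        std::lock_guard<std::mutex> lock(_mu);
        return _status;
    }

private:
    mutable std::mutex _mu;
    Status _status;
};
```

If the second of three memtable flushes fails, the tracker still reports that failure after the third one succeeds.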
2020-02-19 20:23:19 +08:00
a76f2b8211 bitmap_union_count support window function (#2902) 2020-02-19 14:33:05 +08:00
1cf0fb9117 Use ThreadPool to refactor MemTableFlushExecutor (#2931)
1. MemTableFlushExecutor maintains a ThreadPool to receive FlushTasks.
2. FlushToken is used to separate tasks from different tablets.
   Every DeltaWriter of a tablet constructs a FlushToken;
   tasks within a FlushToken are handled serially, while tasks in different
   FlushTokens are handled concurrently.
3. The thread limit on data_dir is removed, because I/O is not the main
   time consumer of the flush thread. Much of the time is consumed by CPU
   decoding and compression.
2020-02-18 18:39:04 +08:00
3f4e18633d [util] Add Apache License 2.0 to Thread (#2928) 2020-02-18 15:36:49 +08:00
32e998f6e9 [ut] Delete files generated by UT when teardown (#2930)
If these residual files are not deleted, the UT will fail because
the corresponding files already exist when running multiple times.
2020-02-18 15:35:11 +08:00
b3c5f0fac7 Remove unneeded headers included in agent-util (#2929) 2020-02-18 13:18:56 +08:00
625411bd28 Doris support in memory olap table (#2847) 2020-02-18 10:45:54 +08:00
1f844946e9 Fixbug: Invalid memory address in doris::memory_copy (#2919) (#2923)
Changing a column's schema from char(20) to varchar(20) causes BE to core dump.
2020-02-17 18:48:38 +08:00
feef077520 Some refactors on TabletManager (#2918)
1. Add some comments to make the code easier to understand;
2. Make the metric `create_tablet_requests_failed` accurate;
3. Some internal methods use naked pointers directly instead of `shared_ptr`;
4. `using` declarations in `.h` files are contagious when the header is included by other files,
    so we should only use them in `.cpp` files;
5. Some formatting changes, such as wrapping lines that are too long;
6. Parameters that need to be modified use pointers instead of references.

No functional changes in this patch.
2020-02-17 14:50:29 +08:00
f20eb12457 [util] Import ThreadPool and Thread from KUDU (#2915)
Thread pool design point:
  All tasks submitted directly to the thread pool enter a FIFO queue and are
dispatched to a worker thread when one becomes free. Tasks may also be
submitted via ThreadPoolTokens. The token wait() and shutdown() functions
can then be used to block on logical groups of tasks.
  A token operates in one of two ExecutionModes, determined at token
construction time:
  1. SERIAL: submitted tasks are run one at a time.
  2. CONCURRENT: submitted tasks may be run in parallel.
     This isn't unlike tasks submitted without a token, but the logical grouping that tokens
     impart can be useful when a pool is shared by many contexts (e.g. to
     safely shut down one context, to derive context-specific metrics, etc.).
Tasks submitted without a token or via ExecutionMode::CONCURRENT tokens are
processed in FIFO order. On the other hand, ExecutionMode::SERIAL tokens are
processed in a round-robin fashion, one task at a time. This prevents them
from starving one another. However, tokenless (and CONCURRENT token-based)
tasks can starve SERIAL token-based tasks.

Thread design point:
  1. It is a thin wrapper around pthread that can register itself with the singleton ThreadMgr
(a private class implemented in thread.cpp entirely, which tracks all live threads so
that they may be monitored via the debug webpages). This class has a limited subset of
boost::thread's API. Construction is almost the same, but clients must supply a
category and a name for each thread so that they can be identified in the debug web
UI. Otherwise, join() is the only supported method from boost::thread.
  2. Each Thread object knows its operating system thread ID (TID), which can be used to
attach debuggers to specific threads, to retrieve resource-usage statistics from the
operating system, and to assign threads to resource control groups.
  3. Threads are shared objects, but in a degenerate way. They may only have
up to two referents: the caller that created the thread (parent), and
the thread itself (child). Moreover, the only two methods to mutate state
(join() and the destructor) are constrained: the child may not join() on
itself, and the destructor is only run when there's one referent left.
These constraints allow us to access thread internals without any locks.
2020-02-17 11:22:09 +08:00
43583e7bd2 Fix orc load bug (#2912) 2020-02-16 19:14:42 +08:00
6c33f80544 Add disable_storage_page_cache config (#2890)
1. When reading a column data page:
    for compaction, schema_change, and check_sum, we don't use the page cache;
    for queries, we use the page cache if config::disable_storage_page_cache is false.
2. When reading a column index page:
    we use the page cache if config::disable_storage_page_cache is false.
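The rules above amount to a small decision function. The sketch below is illustrative only: the enums and the function name `should_use_page_cache` are assumptions, and only the flag name `disable_storage_page_cache` comes from the commit.

```cpp
#include <cassert>

// Hypothetical enums describing who is reading and what kind of page.
enum class ReaderType { QUERY, COMPACTION, SCHEMA_CHANGE, CHECKSUM };
enum class PageKind { DATA, INDEX };

// Returns true if the read should go through the page cache.
bool should_use_page_cache(ReaderType reader, PageKind page,
                           bool disable_storage_page_cache) {
    if (disable_storage_page_cache) return false;
    if (page == PageKind::INDEX) return true;  // index pages: cached for all readers
    return reader == ReaderType::QUERY;        // data pages: cached for queries only
}
```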
2020-02-16 19:13:30 +08:00
9ee1704859 [util] Import util tools from KUDU (#2905)
1. MonoTime/MonoDelta
   MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point.
   MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances.

2. CountDownLatch
   This is a C++ implementation of the Java CountDownLatch
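For readers unfamiliar with the Java class, a count-down latch can be sketched with the standard library as below. This is not the imported KUDU code; the method names are assumptions, and `std::chrono::steady_clock` durations play the role that MonoDelta plays in the imported utilities.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Illustrative latch: waiters block until the count reaches zero.
class CountDownLatch {
public:
    explicit CountDownLatch(uint64_t count) : _count(count) {}

    // Decrements the count; wakes all waiters when it reaches zero.
    void count_down() {
        std::lock_guard<std::mutex> lock(_mu);
        if (_count == 0) return;
        if (--_count == 0) _cv.notify_all();
    }

    // Blocks until the count is zero.
    void wait() {
        std::unique_lock<std::mutex> lock(_mu);
        _cv.wait(lock, [this] { return _count == 0; });
    }

    // Returns true if the count reached zero before the timeout expired.
    bool wait_for(std::chrono::steady_clock::duration timeout) {
        std::unique_lock<std::mutex> lock(_mu);
        return _cv.wait_for(lock, timeout, [this] { return _count == 0; });
    }

    uint64_t count() const {
        std::lock_guard<std::mutex> lock(_mu);
        return _count;
    }

private:
    mutable std::mutex _mu;
    std::condition_variable _cv;
    uint64_t _count;
};
```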
2020-02-14 18:01:16 +08:00
09a4d3e50a [gutil] import scoped_refptr smart pointer from KUDU (#2899)
scoped_refptr is used as a replacement for std::shared_ptr and is generally faster and smaller.
Advantages:
  (1) only requires a single allocation, and ref count is on the same cache line as the object
  (2) the pointer only requires 8 bytes (since the ref count is within the object)
  (3) you can manually increase or decrease reference counts when more control is required
  (4) you can convert from a raw pointer back to a scoped_refptr safely without worrying about double freeing
  (5) since we control the implementation, we can implement features, such as debug builds that capture the stack trace of every referent to help debug leaks.
Disadvantages:
  (1) the referred-to object must inherit from RefCounted
  (2) does not support the weak_ptr use cases
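The intrusive design behind these points can be sketched as follows. This is a simplified illustration, not the imported gutil code: the method names `add_ref`, `release`, and `ref_count` and the sample type `Tracked` are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Illustrative base class: the reference count lives inside the object,
// so no separate control block is allocated.
class RefCounted {
public:
    void add_ref() const { _refs.fetch_add(1, std::memory_order_relaxed); }
    void release() const {
        if (_refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
    }
    int ref_count() const { return _refs.load(std::memory_order_relaxed); }

protected:
    RefCounted() = default;
    virtual ~RefCounted() = default;

private:
    mutable std::atomic<int> _refs{0};
};

// Illustrative smart pointer: holds a single raw pointer.
template <typename T>
class scoped_refptr {
public:
    scoped_refptr() = default;
    scoped_refptr(T* p) : _ptr(p) { if (_ptr) _ptr->add_ref(); }
    scoped_refptr(const scoped_refptr& o) : _ptr(o._ptr) { if (_ptr) _ptr->add_ref(); }
    scoped_refptr(scoped_refptr&& o) noexcept : _ptr(o._ptr) { o._ptr = nullptr; }
    ~scoped_refptr() { if (_ptr) _ptr->release(); }

    // Copy-and-swap handles both copy and move assignment.
    scoped_refptr& operator=(scoped_refptr o) {
        std::swap(_ptr, o._ptr);
        return *this;
    }

    T* get() const { return _ptr; }
    T* operator->() const { return _ptr; }

private:
    T* _ptr = nullptr;
};

// Hypothetical payload type that records when it is destroyed.
struct Tracked : RefCounted {
    explicit Tracked(bool* destroyed) : _destroyed(destroyed) {}
    ~Tracked() override { *_destroyed = true; }
    bool* _destroyed;
};
```

Because the count is stored in the object, wrapping the same raw pointer a second time shares the existing count instead of creating a new one, which is what makes advantage (4) safe.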
2020-02-14 13:32:03 +08:00
d2625a26aa [env] Add env-util class (#2898)
Code submitted later will use this utility class.

Currently only factory methods for various file types are provided.
In the future, utility methods that are common to all Env types can
be added here.
2020-02-14 10:04:51 +08:00
fd492e3b6f [Doris on ES] Support escape character (#2865) 2020-02-13 11:32:48 +08:00
3c539aac54 [Refactor] Some tiny refactor on streaming-load related code (#2891)
Mainly contains the following modifications:
1. Use `std::unique_ptr` to replace some naked pointers
2. Change some methods from member methods to local static functions
3. Make private some methods that do not need to be public
4. Some formatting changes: such as wrapping lines that are too long
5. Remove some useless variables
6. Add or modify some comments for easier understanding

No functional changes in this patch.
2020-02-13 10:42:52 +08:00
3e160aeb66 [GroupingSet] fix a bug when using grouping set without all column in a grouping set item (#2877)
Fix a bug where using grouping sets without all columns in a grouping set item produces wrong values.
Fix the grouping function check not working in the GROUP BY clause.
2020-02-12 21:50:12 +08:00
e9ff40f07f Add sync_dir interface to Env (#2884)
When we need to ensure that **a newly-created file** is fully
synchronized back to disk, we should also call `fsync()` on the parent
directory, that is, the directory containing the newly-created file.
In other words, in this situation we should call `fsync()` on
both the newly-created file and its parent directory.

Unfortunately, currently in Doris, in any scenario, directories
are not fsynced.

This patch adds the `sync_dir()` interface first, laying the groundwork
for future fixes.

This patch also removes the unneeded private method `dir_exists()`.
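On POSIX systems, syncing a directory can be sketched as opening it read-only and calling `fsync()` on the descriptor. The free function below is an assumption made for illustration; the signature of the `sync_dir()` method actually added to `Env` may differ.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical free function: returns true if the directory entry data
// was flushed to disk, false if the directory could not be opened or synced.
bool sync_dir(const std::string& dirname) {
    int fd = ::open(dirname.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd < 0) return false;
    bool ok = (::fsync(fd) == 0);
    ::close(fd);
    return ok;
}
```

A caller that has just created and fsynced a file would then call `sync_dir()` on the file's parent directory.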
2020-02-12 13:55:17 +08:00
5440e19d01 Improve the triggering strategy of BE report (#2881)
Currently, the report from BE to FE is completed in the background
threads of `AgentServer` (`report_tablet_thread` and
`report_disk_stat_thread`). These two threads sleep in a standby state
after each report. If there is any need to report immediately, they are
notified and wake up immediately to report.

For example, when background thread (`disk_monitor_thread`) in
`StorageEngine` finds some tablets were deleted, it will notify
`AgentServer` to trigger a report immediately.

In the current implementation, in order to report ASAP, a local variable
(`_is_drop_tables`) and two other flags are used to record whether
reporting is needed, and `StorageEngine::disk_monitor_thread` checks
the value of this variable every time it runs to determine whether
reporting needs to be triggered. This is actually superfluous, and it
may result in untimely notifications, as shown below:

```
(thread_1)        (thread_2)
disk-monitor     disk-stat-reporter
    |                  |
    |               reporting
    |                  |
  notify_1             |
    |                  |
    |                wait_for_notify(will wait until timeout or next notification)
    |                  |
    V                  V
```

When `report_tablet_thread` has not started waiting,
`StorageEngine::disk_monitor_thread` triggers a notification, so this
notification will not be received by `report_tablet_thread`,
resulting in the BE not reporting to the FE until the lock times out
or the next round of `disk_monitor_thread` detection.

This change restructures the triggering implementation, and solves the above problem.

This change also makes some methods (that do not need to be public) private.
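A common way to avoid a lost notification like the one in the diagram is to record the request in a flag guarded by the same mutex as the condition variable, so a notification sent before the waiter starts waiting is still seen. The sketch below shows that general technique; `ReportTrigger` is a hypothetical name and this is not necessarily how the change implements it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Illustrative trigger: a pending flag makes early notifications sticky.
class ReportTrigger {
public:
    // Called by the notifier (e.g. the disk monitor).
    void notify() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _pending = true;
        }
        _cv.notify_one();
    }

    // Called by the reporter. Returns true if a notification was pending or
    // arrived before the timeout, false if the wait timed out.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mu);
        bool notified = _cv.wait_for(lock, timeout, [this] { return _pending; });
        _pending = false;
        return notified;
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    bool _pending = false;
};
```

Because the predicate is checked before blocking, a `notify()` issued while the reporter is still busy reporting makes the next `wait_for()` return immediately.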
2020-02-11 20:38:44 +08:00
3a8e783444 Compatible with python3 in build (#2876) 2020-02-10 21:50:42 +08:00
4e151b1551 Remove boost exception when parse store path (#2861) 2020-02-10 17:50:52 +08:00
c89d0a090c Fix bug that _min_percentage_of_error_disk was not initialized (#2867)
In StorageEngine, the variable _min_percentage_of_error_disk was not
initialized (so it defaults to 0), which causes the process to exit
whenever one disk fails.
What we expect is that the process exits only when the number of
failed disks reaches a certain percentage.
Also, this variable should mean the maximum percentage of
error disks allowed, not the minimum, so the configuration
name is changed to max_percentage_of_error_disk.
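The intended check can be sketched as below. The helper name `too_many_error_disks` and the strict comparison are assumptions; only the configuration name `max_percentage_of_error_disk` comes from the commit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when the share of failed disks exceeds the
// allowed maximum percentage (assumed to be a strict "greater than").
bool too_many_error_disks(std::size_t error_disks, std::size_t total_disks,
                          unsigned max_percentage_of_error_disk) {
    if (total_disks == 0) return false;
    return error_disks * 100 > total_disks * max_percentage_of_error_disk;
}
```

With a threshold of 0, a single failed disk already trips the check, which matches the behavior described for the uninitialized variable.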
2020-02-10 16:58:24 +08:00
7037754978 Fix a bug that TabletsChannel may be written after cancel (#2870)
TabletsChannel may be written after cancelation, leading to a core dump in DeltaWriter::write. We should check the state of TabletsChannel at the beginning of each operation.
2020-02-10 14:49:00 +08:00
77805e85d2 Fix lock type when clear trash (#2868)
In `TabletManager::start_trash_swee`, the modification of `_tablet_map`
should be protected by `write-lock` of `_tablet_map_lock`
2020-02-10 13:14:17 +08:00
502fa2eb50 [GroupingSet] Fix core when using grouping sets in large data (#2858)
The memory size allocated for dst_tuples was wrong.
2020-02-07 21:40:29 +08:00
e7817053cc [Utils] ParseUtil::parse_mem_spec support K and T suffix (#2854) 2020-02-07 09:31:35 +08:00
b35e8153c0 [Doris on Es] Fix lte and gte error expression (#2851)
LE should be LTE
GE should be GTE
2020-02-06 20:52:14 +08:00