Commit Graph

259 Commits

Author SHA1 Message Date
d4c1938b5c Open datetime min value limit (#3158)
the min_value in olap/type.h of datetime is 0000-01-01 00:00:00, so we don't need restrict datetime min in tablet_sink
2020-03-24 10:52:57 +08:00
47a3d5000b [UnitTest] Fix unit test bug in BetaRowset and PageCacheTest (#3157)
1. BlockManager has been added into StorageEngine.
   So StorageEngine should be initialized when starting BetaRowset unit test.

2. Cache should not use the same buf to store value, otherwise the address
   will be freed twice and crash.
2020-03-20 20:37:50 +08:00
5a8fcd263f [CodeStyle] Delete obsolete code of partition_aggregation_node and partitioned_hash_table (#3162) 2020-03-20 16:25:29 +08:00
c08d6e4708 [tablet meta] Do some refactor on TabletMeta (#3136)
remove some functions' return value which always return OLAP_SUCCESS
optimize some loops
2020-03-20 15:03:22 +08:00
d01b58bff6 Support 64 bit timestamp in from_unixtime (#3069)
Support 64 bit timestamp in from_unixtime
2020-03-17 17:30:42 +08:00
64a06ea9d4 [UT] Fix some BE unit tests (#3110)
And also support graceful exit for StorageEngine to avoid hang too long
time in unit test.
2020-03-16 13:31:44 +08:00
608917c04d Use block layer to write files (#3064)
This is the second patch following 58b8e3f574614433ea9e0c427961f2efb3476c2a,

This patch use block-layer to write files.
2020-03-11 12:11:25 +08:00
a1f5b57011 Support sharding tablet_map_lock into more small map locks to make good performance for tablet manage task (#3051)
Support sharding tablet_map_lock into more small map locks to make good performance for tablet manage task
2020-03-09 16:29:56 +08:00
1d296e907d Fix orc load timestamp bug (#3047)
The timestamp value load from orc file is error, the value has an offset with hive and spark.
Becuase the time zone of orc's timestamp is stored inside orc's stripe information, so the timestamp obtained here is an offset timestamp, so parse timestamp with UTC is actual datetime literal.
2020-03-06 18:03:27 +08:00
fca6c4e523 Fix bitmap null crash (#3042) 2020-03-05 21:30:32 +08:00
cc1a5fb8ea [Function] Support '%' in date format string (#3037)
eg:
select str_to_date('2014-12-21 12%3A34%3A56', '%Y-%m-%d %H%%3A%i%%3A%s');
select unix_timestamp('2007-11-30 10:30%3A19', '%Y-%m-%d %H:%i%%3A%s');

This also enable us to extract column fields from HDFS file path with contains '%'.
2020-03-05 08:56:02 +08:00
0d1e28746e [Function] Support null_or_empty function (#2977)
It returns true if the string is empty or NULL. Otherwise it returns false.
2020-03-01 17:35:45 +08:00
58b8e3f574 [Fs Block] Add block layer to storage-engine (#2983)
The abstraction of the Block layer, inspired by Kudu, lies between the "business
layer" and the "underlying file storage layer" (`Env`), making them no longer
strongly coupled.

In this way, for the business layer (such as `SegmentWriter`),
there is no need to directly do the file operation, which will bring better
encapsulation. An ideal situation in the future is: when we need to support a
new file storage system, we only need to add a corresponding type of
BlockManager without modifying the business code (such as `SegmentWriter`).

With the Block layer, there are some benefits:

1. First and foremost, the mapping relationship between data and `Env` is more
   flexible. For example, in the storage engine, the data of the tablet can be
   placed in multiple file systems (`Env`) at the same time. That is, one-to-many
   relationships can be supported. For example: one on the local and one on the
   remote storage.
2. The mapping relationship between blocks and files can be adjusted, for example,
   it may not be a one-to-one relationship. For example, the data of multiple
   blocks can be stored in a physical file, which can reduce the number of files
   that need to be opened during querying. It is like `LogBlockManager` in Kudu.
3. We can move the opened-file-cache under the Block layer, which can automatically
   close and open the files used by the upper layer, so that the upper business
   level does not need to be aware of the restrictions of the file handle at all
   (This problem is often encountered online now).
4. Better automatic cleanup logic when there are exceptions. For example, a block
   that is not closed explicitly can automatically clean up its corresponding file,
   thereby avoiding generating most garbage files.
5. More convenient for batch file creation and deletion. Some business operations
   create multiple files, such as compaction. At present, the processing flow that
   these files go through is executed one by one: 1) creation; 2) writing data;
   3) fsync to disk. But in fact, this is not necessary, we only need to fsync this
   batch of files at the end. The advantage is that it can give the operating system
   more opportunities to perform IO merge, thereby improving performance. However,
   this operation is relatively tedious, there is no need to be coupled in the
   business code, it is an ideal place to put it in the Block layer.

This is the first patch, just add related classes, laying the groundwork for later
switching of read and write logic.
2020-03-01 10:48:00 +08:00
d2d95bfa84 [segment_v2] Switch to Unified and Extensible Page Format (#2953)
Fixes #2892 

IMPORTANT NOTICE: this CL makes incompatible changes to V2 storage format, developers need to create new tables for test.

This CL refactors the metadata and page format for segment_v2 in order to
* make it easy to extend existing page type
* make it easy to add new page type while not sacrificing code reuse
* make it possible to use SIMD to speed up page decoding

Here we summary the main code changes
* Page and index metadata is redesigned, please see `segment_v2.proto`
* The new class `PageIO` is the single place for reading and writing all pages. This removes lots of duplicated code. `PageCompressor` and `PageDecompressor` are now useless and removed. 
* The type of value ordinal is changed from `rowid_t` to 64-bits `ordinal_t`, this affects ordinal index as well.
* Column's ordinal index is now implemented by IndexPage, the same with IndexedColumn.
* Zone map index is now implemented by IndexedColumn
2020-02-27 15:09:57 +08:00
e23d735bac Fix decimal bug in orc load (#2984) 2020-02-26 10:58:18 +08:00
a340bc7a00 Remove unused LLVM related codes of directory:be/src/runtime (#2910) (#2985)
Remove unused LLVM related codes of directory (step 4):be/src/runtime (#2910)

there are many LLVM related codes in code base, but these codes are not really used.
The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris.
The PR delete all LLVM related code of directory: be/src/runtime
2020-02-25 13:47:20 +08:00
099e0f74bd Remove unused LLVM related codes of directory:be/src/exprs (#2910) (#2972)
Remove unused LLVM related codes of directory (step 3):be/src/exprs (#2910)

there are many LLVM related codes in code base, but these codes are not really used.
The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris.
The PR delete all LLVM related code of directory: be/src/exprs
2020-02-24 18:23:08 +08:00
3e6dfa31c4 [UnitTest] Fix BE unit test randomly failed (#2970)
* fix http server related unit test failed due to http port has been used
* fix unit test failed in DEBUG build type
2020-02-21 22:21:02 +08:00
da945c8278 Add log to track problem in small_file_mgr_test (#2951)
This case will occasionally fail in regression testing, so we add
some logs to help to solve it.
2020-02-20 02:21:35 -06:00
cc0d41277c [Alter] Add more schema change to varchar type (#2777) 2020-02-19 23:14:43 +08:00
c617fc9064 Fix the flush_status bug in flush-executor (#2933)
For a tablet, there may be multiple memtables, which will be
flushed to disk one by one in the order of generation.

If a memtable flush fails, then the load job will definitely
fail, but the previous implementation will overwrite `_flush_status`,
which may make the error can not be detected, leads to an error
load job to be success.

This patch also have two other changes:
1. Use `std::bind` to replace `boost::bind`;
2. Removes some unneeded headers.
2020-02-19 20:23:19 +08:00
a76f2b8211 bitmap_union_count support window function (#2902) 2020-02-19 14:33:05 +08:00
1cf0fb9117 Use ThreadPool to refactor MemTableFlushExecutor (#2931)
1. MemTableFlushExecutor maintain a ThreadPool to receive FlushTask.
2. FlushToken is used to seperate different tasks from different tablets.
   Every DeltaWriter of tablet constructs a FlushToken,
   task in FlushToken are handle serially, task between FlushToken are
   handle concurrently.
3. I have remove thread limit on data_dir, because of I/O is not the main
   timer consumer of Flush thread. Much of time is consumed in CPU decoding
   and compress.
2020-02-18 18:39:04 +08:00
32e998f6e9 [ut] Delete files generated by UT when teardown (#2930)
If these residual files are not deleted, the UT will fail because
the corresponding files already exist when running multiple times.
2020-02-18 15:35:11 +08:00
625411bd28 Doris support in memory olap table (#2847) 2020-02-18 10:45:54 +08:00
feef077520 Some refactors on TabletManager (#2918)
1. Add some comments to make the code easier to understand;
2. Make the metric `create_tablet_requests_failed` to be accurate;
3. Some internal methods use naked pointers directly instead of `shared_ptr`;
4. The `using` in `.h` files are contagious when included by other files,
    so we should only use it in `.cpp` files;
5. Some formatting changes: such as wrapping lines that are too long
6. Parameters that need to be modified, use pointers instead of references

No functional changes in this patch.
2020-02-17 14:50:29 +08:00
f20eb12457 [util] Import ThreadPool and Thread from KUDU (#2915)
Thread pool design point:
  All tasks submitted directly to the thread pool enter a FIFO queue and are
dispatched to a worker thread when one becomes free. Tasks may also be
submitted via ThreadPoolTokens. The token wait() and shutdown() functions
can then be used to block on logical groups of tasks.
  A token operates in one of two ExecutionModes, determined at token
construction time:
  1. SERIAL: submitted tasks are run one at a time.
  2. CONCURRENT: submitted tasks may be run in parallel.
     This isn't unlike submitted without a token, but the logical grouping that tokens
     impart can be useful when a pool is shared by many contexts (e.g. to
     safely shut down one context, to derive context-specific metrics, etc.).
Tasks submitted without a token or via ExecutionMode::CONCURRENT tokens are
processed in FIFO order. On the other hand, ExecutionMode::SERIAL tokens are
processed in a round-robin fashion, one task at a time. This prevents them
from starving one another. However, tokenless (and CONCURRENT token-based)
tasks can starve SERIAL token-based tasks.

Thread design point:
  1. It is a thin wrapper around pthread that can register itself with the singleton ThreadMgr
(a private class implemented in thread.cpp entirely, which tracks all live threads so
that they may be monitored via the debug webpages). This class has a limited subset of
boost::thread's API. Construction is almost the same, but clients must supply a
category and a name for each thread so that they can be identified in the debug web
UI. Otherwise, join() is the only supported method from boost::thread.
  2. Each Thread object knows its operating system thread ID (TID), which can be used to
attach debuggers to specific threads, to retrieve resource-usage statistics from the
operating system, and to assign threads to resource control groups.
  3. Threads are shared objects, but in a degenerate way. They may only have
up to two referents: the caller that created the thread (parent), and
the thread itself (child). Moreover, the only two methods to mutate state
(join() and the destructor) are constrained: the child may not join() on
itself, and the destructor is only run when there's one referent left.
These constraints allow us to access thread internals without any locks.
2020-02-17 11:22:09 +08:00
43583e7bd2 Fix orc load bug (#2912) 2020-02-16 19:14:42 +08:00
9ee1704859 [util] Import util tools from KUDU (#2905)
1. MonoTime/MonoDelta
   MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point.
   MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances.

2. CountDownLatch
   This is a C++ implementation of the Java CountDownLatch
2020-02-14 18:01:16 +08:00
3c539aac54 [Refactor] Some tiny refactor on streaming-load related code (#2891)
Mainly contains the following modifications:
1. Use `std::unique_ptr` to replace some naked pointers
2. Modify some methods from member-method to local-static-function
3. Modify some methods do not need to be public to private
4. Some formatting changes: such as wrapping lines that are too long
5. Remove some useless variables
6. Add or modify some comments for easier understanding

No functional changes in this patch.
2020-02-13 10:42:52 +08:00
4e151b1551 Remove boost exception when parse store path (#2861) 2020-02-10 17:50:52 +08:00
e7817053cc [Uitls] ParseUtil::parse_mem_spec support K and T suffix (#2854) 2020-02-07 09:31:35 +08:00
b35e8153c0 [Doris on Es] Fix lte and gte error expression (#2851)
LE should LTE
GE should GTE
2020-02-06 20:52:14 +08:00
a27e89065b Add file cache for v2 (#2782)
Add file descriptor cache for segment v2 to solve too many open file problems
2020-02-04 00:16:01 +08:00
99ad56d1bf Support bitmap index for more type (#2630)
For #2589

1. date(uint24_t)/datetime(int64_t)/largeint(int128_t) use frame of reference code as dict.
2. decimal(decimal12_t) also uses frame of reference code as dict.
3. float/double use bitshuffle code as dict.
2020-01-31 21:09:29 +08:00
89c7234c1c Support starts_with (str, prefix) function (#2813)
Support starts_with function
2020-01-21 14:09:08 +08:00
634928e4d0 Fix typo and remove tmp file in ut (#2789) 2020-01-19 21:33:48 +08:00
7c4149cf27 Improve comparison and printing of Version (#2796)
* Improve comparison and printing of Version

There are two members in `Version`:` first` and `second`.
There are many places where we need to print one `Version` object  and
compare two `Version` objects, but in the current code, these two members
are accessed directly, which makes the code very tedious.

This patch mainly do:
1. Adds overloaded methods for `operator<<()` for `Version`, so
   we can directly print a Version object;
2. Adds the `cantains()` method to determine whether it is an containment
   relationship;
3. Uses `operator==()` to determine if two `Version` objects are equal.

Because there are too many places need to be modified, there are still some
naked codes left, which will be modified later.

This patch also removes some necessary header file references.

No functional changes in this patch.
2020-01-19 18:04:28 +08:00
c71eefa2ac Add path util (#2747)
Note that the methods in path_util are only related to path processing,
and do not involve any file and IO operations

The upcoming patch will use these util methods, used to extract operations
such as concatenation of directory strings from processing logic.
2020-01-18 00:05:00 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT element and keep backward compatibility, introduce two new format

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.
2020-01-17 14:13:38 +08:00
0ddca59d36 Add timestampadd/timestampdiff function (#2725) 2020-01-15 21:47:07 +08:00
7768629f08 Add bitmap_contains and bitmap_has_any functions (#2752) 2020-01-15 14:31:44 +08:00
a36193dfab Support decimal and timestamp type in orc load (#2759) 2020-01-15 07:40:30 +08:00
f071d5a307 Support ends_with function (#2746) 2020-01-14 22:37:20 +08:00
a99a49a444 Add bitamp_to_string function (#2731)
This CL changes:

1. add function bitmap_to_string and bitmap_from_string, which will
 convert a bitmap to/from string which contains all bit in bitmap
2. add function murmur_hash3_32, which will compute murmur hash for
input strings
3. make the function cast float to string the same with user result
logic
2020-01-13 12:31:37 +08:00
4b8f7f9c32 Use cgroups memory limit and cpu cores in container (#2710) 2020-01-10 00:45:50 +08:00
1c9cfa7e0f Fix invalid to_bitmap input lead to BE core (#2706) 2020-01-08 22:14:37 +08:00
a028c52edd Add BE function bitmap_or and bitmap_and (#2707) 2020-01-08 19:59:44 +08:00
13e5fdd512 [AlphaRowset] set num_segments field in rowset meta if missing (#2658)
the num segments should be read from rowset meta pb.
But the previous code error caused this value not to be set in some cases.
So when init the rowset meta and find that the num_segments is 0(not set),
we will try to calculate the num segments from AlphaRowsetExtraMetaPB,
and then set the num_segments field.
This should only happen in some rowsets converted from old version.
and for all newly created rowsets, the num_segments field must be set.
2020-01-07 21:46:02 +08:00
2326b478b6 Support load orc format in Apache Doris (#2554)
Support load orc format in Apache Doris
2020-01-07 14:22:43 +08:00