121 Commits

Author SHA1 Message Date
e17aef9467 [refactor] Refactor the implementation of MemTracker and related usage (#8322)
Modify the implementation of MemTracker:
1. Remove a lot of unused logic;
2. Add MemTrackerTaskPool as the ancestor of all query and import trackers. It is used to track the local memory usage of all executing tasks;
3. Add a consume/release cache: a consume/release is triggered only when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (experimental feature): throw an exception if the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP; see the parameter memory_leak_detection;
5. Add Virtual MemTracker, whose consume/release is not synced to the parent. It will be used later, when the TCMalloc Hook is introduced to record memory, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;
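
The consume/release cache in item 3 can be pictured with a minimal sketch. The class `CachedTracker` and its members are hypothetical names chosen for illustration, not the Doris `MemTracker` API; the constructor argument stands in for `mem_tracker_consume_min_size_bytes`, and the sketch is single-threaded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, single-threaded illustration; not the Doris MemTracker class.
class CachedTracker {
public:
    explicit CachedTracker(int64_t min_flush_bytes, CachedTracker* parent = nullptr)
            : _min_flush_bytes(min_flush_bytes), _parent(parent) {
        assert(min_flush_bytes > 0);
    }

    // Accumulate the delta locally; a negative delta is a release.
    // The trackers are only updated once the accumulated amount
    // reaches the threshold in either direction.
    void consume(int64_t delta) {
        _pending += delta;
        if (_pending >= _min_flush_bytes || _pending <= -_min_flush_bytes) {
            flush();
        }
    }

    void release(int64_t bytes) { consume(-bytes); }

    // Apply the accumulated delta to this tracker and all its ancestors.
    void flush() {
        for (CachedTracker* t = this; t != nullptr; t = t->_parent) {
            t->_consumption += _pending;
        }
        _pending = 0;
    }

    int64_t consumption() const { return _consumption; }
    int64_t pending() const { return _pending; }

private:
    const int64_t _min_flush_bytes;
    CachedTracker* const _parent;
    int64_t _pending = 0;
    int64_t _consumption = 0;
};
```

The trade-off in such a design is that the visible consumption may lag the true value by up to the threshold, in exchange for fewer updates along the tracker hierarchy.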

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, the PlanFragmentExecutor mem_tracker, which was previously used to track scan process memory independently, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
7cfcddd8df [fix] brpc will check required field in proto and need_gen_rollup is moved will throw exception (#8420) 2022-03-11 00:28:33 +08:00
d880559214 [refactor] remove old schema change code on BE (#8342) 2022-03-09 13:05:44 +08:00
f52d479cbc [fix](ut) fix be ut fragment_mgr_test compile failed (#8344) 2022-03-05 14:43:20 +08:00
50864aca7d [refactor] fix warnings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
aea3e4e59b [refactor] Remove version hash from BE and related test in BE (#8027) 2022-02-14 09:29:27 +08:00
82f421a019 [fix](brpc-attachment) Fix bug that may cause BE crash when enable transfer_data_by_brpc_attachment (#7921)
This PR mainly changes:

1. Fix bug when enable `transfer_data_by_brpc_attachment`

    In `data_stream_sender`, we will send a serialized PRowBatch data to multiple Channels.
    And if `transfer_data_by_brpc_attachment` is enabled, we will mistakenly clear the data in PRowBatch
    after sending PRowBatch to the first Channel.
    As a result, the following Channel cannot receive the correct data, causing an error.

    So I use a separate buffer instead of `tuple_data` in PRowBatch to store the serialized data
    and reuse it in multiple channels.

2. Fix bug that the offset in serialized row batch may overflow

    Use int64 to replace int32 offset. And for compatibility, add a new field `new_tuple_offsets` in PRowBatch.
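
The overflow in item 2 can be illustrated with a small sketch. The helper names `tuple_offsets` and `fits_in_int32` are hypothetical and only model the arithmetic; they are not the actual PRowBatch serialization code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Compute the start offset of each tuple from the tuple sizes, using 64-bit math.
std::vector<int64_t> tuple_offsets(const std::vector<int64_t>& tuple_sizes) {
    std::vector<int64_t> offsets;
    offsets.reserve(tuple_sizes.size());
    int64_t next = 0;
    for (int64_t size : tuple_sizes) {
        assert(size >= 0);
        offsets.push_back(next);
        next += size;
    }
    return offsets;
}

// Returns true only if every offset could be stored in a 32-bit signed field.
bool fits_in_int32(const std::vector<int64_t>& offsets) {
    for (int64_t off : offsets) {
        if (off > std::numeric_limits<int32_t>::max()) {
            return false;
        }
    }
    return true;
}
```

With three tuples of 1 GiB each, the third offset is 2^31, which is one past the largest int32 value, so a 32-bit offset field cannot represent it.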
2022-02-01 08:51:16 +08:00
fb6e22f4ca [Fix] fix memory leak in be unit test (#7857)
1. fix be unit test memory leak
2. ignore minidump test with ASAN test
2022-01-29 01:00:38 +08:00
ef984a6a72 [improvement](load) Improve load fault tolerance (#7674)
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.

This PR mainly changes:

1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job.
2. fix a bug introduced from #7754 that may cause BE coredump
2022-01-20 09:23:21 +08:00
948a2a738d [performance] Improve DeltaWriter's performance. (#7216)
1. Support batch write for DeltaWriter.
2. Use mutex instead of SpinLock.
2021-11-26 10:15:27 +08:00
e2d3d0134e Add a method to get doris current memory usage (#6979)
Add all memory usage check when TryConsume memory
2021-11-24 10:07:54 +08:00
a81f4da4e4 [feat](minidump) Add minidump support (#7124)
Now minidump file will be created when BE crashes.
And user can manually trigger a minidump by sending SIGUSR1 to BE process.

More details can be found in minidump.md documents
2021-11-20 21:41:26 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr with std::shared_ptr
2. replace all boost::scoped_ptr with std::unique_ptr
3. replace all boost::scoped_array with std::unique_ptr<T[]>
4. replace all boost::thread with std::thread
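
As an illustration of the standard-library counterparts for the first three items, here is a minimal sketch. The `Block` type and the factory function names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical payload type used only for this illustration.
struct Block {
    explicit Block(int id_) : id(id_) {}
    int id;
};

// Shared ownership: the counterpart of boost::shared_ptr.
std::shared_ptr<Block> make_shared_block(int id) {
    return std::make_shared<Block>(id);
}

// Sole ownership of one object: the counterpart of boost::scoped_ptr.
std::unique_ptr<Block> make_scoped_block(int id) {
    return std::make_unique<Block>(id);
}

// Sole ownership of an array: the counterpart of boost::scoped_array.
// std::make_unique<int[]> value-initializes the elements, so they start at 0.
std::unique_ptr<int[]> make_scoped_array(std::size_t n) {
    assert(n > 0);
    return std::make_unique<int[]>(n);
}
```

One difference worth keeping in mind: unlike boost::scoped_ptr, std::unique_ptr is movable, so ownership can be transferred out of the scope that created it.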
2021-11-17 10:18:35 +08:00
521fb15a9b [Bug] Fix some memory bugs (#6699)
1. Fix a memory leak in `collect_iterator.cpp` (Fix #6700)
2. Add a new BE config `max_segment_num_per_rowset` to limit the num of segment in new rowset.(Fix #6701)
3. Make the error msg of stream load more friendly.
2021-09-22 12:30:14 +08:00
3f2fdd236f Add scan thread token (#6443) 2021-08-27 10:56:17 +08:00
7e30b28f3a [Optimize] Speed up converting the data of other types to string in mysql_result_writer (#6384)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-24 22:30:58 +08:00
9216735cfa [New Feature] Support Vectorization Execution Engine Interface For Doris (#6329)
1. FE vectorized plan code
2. Function register vec function
3. Diff function nullable type
4. New thirdparty code and new thrift struct
2021-08-11 14:54:06 +08:00
d1007afe80 Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361)
* [Optimize] optimize the speed of converting integer to string

* Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
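
A minimal sketch of the conversion technique follows. It uses `std::from_chars` for parsing and, as a stand-in for the fmt library named in the commit, the standard `std::to_chars` for formatting. The function names `parse_int64` and `int64_to_string` are hypothetical, and C++17 is assumed.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Parse a whole string as a base-10 int64; reject trailing garbage and overflow.
// Note that std::from_chars does not skip whitespace or accept a leading '+'.
std::optional<int64_t> parse_int64(std::string_view s) {
    int64_t value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Format an int64 without locale handling or heap allocation during conversion.
std::string int64_to_string(int64_t v) {
    char buf[24]; // an int64 needs at most 20 characters including the sign
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), v);
    assert(ec == std::errc());
    return std::string(buf, ptr);
}
```

Both functions avoid streams and locale lookups, which is the usual reason such conversions are faster than `std::stringstream` based code.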
2021-08-04 10:55:19 +08:00
02a00cdf35 [Bug] Fix the bug in from_date_format_str function (#6273) 2021-07-21 12:31:37 +08:00
fae3eff2e6 [Bug] Fix the bug of cast string to datetime return not null (#6228) 2021-07-17 10:55:08 +08:00
ed3ff470ce [ARRAY] Support array type load and select not include access by index (#5980)
This is part of the array type support and is not yet complete.
The following functions are implemented:
1. FE array type support and implementation of array functions, including array syntax analysis and planning
2. Support importing array type data through insert into
3. Support selecting array type data
4. The array type is supported only on the value columns of duplicate tables

This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979
2021-07-13 14:02:39 +08:00
739c0268ff [refactor] Remove decimal v1 related code from code base (#6079)
remove ALL DECIMAL V1 type code , this is a part of #6073
2021-07-07 10:26:32 +08:00
1999a0c26b [optimization] open gcc strict-aliasing optimization (#6034)
* open gcc strict-aliasing optimization

* use -Werror=strict-aliasing
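
Code that reinterprets memory through a pointer cast violates the strict-aliasing rule and becomes unsafe once the optimization is enabled. A common fix is to copy the bytes with `std::memcpy`, sketched below. The function names are hypothetical, and the first one assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read the bit pattern of a float without casting float* to uint32_t*.
uint32_t float_bits(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// Load a uint32_t from an arbitrary (possibly unaligned) position in a byte buffer.
uint32_t load_u32(const char* buf, std::size_t len, std::size_t offset) {
    assert(offset + sizeof(uint32_t) <= len);
    uint32_t value;
    std::memcpy(&value, buf + offset, sizeof(value));
    return value;
}
```

Compilers typically turn such fixed-size `memcpy` calls into a single load, so the safe form usually costs nothing at runtime.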
2021-06-18 11:39:24 +08:00
63c99eb4cb [Cache][Enhancement] Assure sql cache only one version (#5793)
For PR #5792. This patch adds a new param `cache type` to distinguish sql cache and partition cache.
When updating the sql cache, we ensure that one sql key has only one cached version.
2021-05-28 13:45:47 +08:00
d0462f4383 [Bug] Fix Backend UT Problem (#5784) (#5785)
1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC
2. warning: the use of `tmpnam' is dangerous, better use `mkstemp'
3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.
2021-05-17 11:51:59 +08:00
01a45e8691 add read buffer when use s3 reader (#5791) 2021-05-17 11:46:38 +08:00
b686205b97 [Optimize] Reduce lock conflicts in ThreadResourceMgr of be (#5772)
Removed some useless code that caused lock conflicts in ThreadResourceMgr of be.
2021-05-12 10:59:53 +08:00
98e80aa65e [refactor] Replace boost::function with std::function (#5700)
Replace boost::function with std::function
2021-05-09 22:00:48 +08:00
a803ceea86 [refactor] Remove boost mutex, use std::mutex instead (#5684)
* Remove boost mutex, use std::mutex instead

* replace shared_mutex
2021-04-22 11:29:36 +08:00
c4cc681d14 remove boost_foreach, using c++ foreach instead (#5611) 2021-04-15 10:52:29 +08:00
d641a26490 [Refactor] Remove boost filesystem (#5579)
* use std::filesystem instead of boost
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2021-04-08 09:11:59 +08:00
c9a25aa29e [UT] fix memory tracker ut (#5501)
* [UT] fix memory tracker ut

* Update mem_limit_test.cpp
2021-03-12 13:45:04 +08:00
0131c33966 [Enhance] Improve the readability of memtrackers' name (#5455)
Improve the readability of memtrackers' names, which makes the page be_ip:port/mem_tracker easier to read
2021-03-11 22:33:31 +08:00
c38a1c799f [Config] Support config validating when BE bootstrap and update BE's config by API (#5379)
Some invalid config values may cause BE to behave unexpectedly.
This patch aims to support config validation when BE bootstraps and when BE's config is updated by API,
in order to reject invalid values.
This work completes PR #4423
2021-03-04 22:21:49 +08:00
51ccd44865 [Load Parallel][3/3] Support parallel delta writer (#5369)
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity problem, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.

This CL modifies the related locks so that LoadChannel can process these requests in parallel.

In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been
increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min.

Also modify the profile of load job.
2021-02-07 22:42:18 +08:00
wyb
128752b4f9 [Routine load] Fix kafka load too many task bug (#5327) 2021-02-03 13:23:30 +08:00
11c0aafa5c [UT] Speed up BE unit test (#5131)
There are some long loops and sleeps in the unit tests, so running all of
them takes a very long time, especially in TSAN mode.
This patch speeds up the unit tests by shortening the long loops and sleeps;
in my environment all unit tests finish in 1 minute. This is useful
for basic functional unit tests.
You can switch to run in this mode by adding a new environment variable
'DORIS_ALLOW_SLOW_TESTS'. For example, you can set:
export DORIS_ALLOW_SLOW_TESTS=1
and also you can disable it by setting:
export DORIS_ALLOW_SLOW_TESTS=0
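
A test can consult the variable with `std::getenv`, as in the sketch below. The helper names are hypothetical, and the interpretation is an assumption taken from the variable name: a value of `1` opts in to the long-running variants, and anything else (or an unset variable) selects the shortened ones.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Assumption: "1" enables the slow variants; unset or any other value disables them.
bool allow_slow_tests() {
    const char* v = std::getenv("DORIS_ALLOW_SLOW_TESTS");
    return v != nullptr && std::strcmp(v, "1") == 0;
}

// Pick the loop count for a test depending on the mode.
int loop_count(int slow_iterations, int fast_iterations) {
    assert(fast_iterations <= slow_iterations);
    return allow_slow_tests() ? slow_iterations : fast_iterations;
}
```

A test body would then write something like `for (int i = 0; i < loop_count(100000, 100); ++i)` instead of hard-coding the large bound.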
2020-12-27 22:19:56 +08:00
85076b5678 [UT] fix test_env & add a sample (#5085)
Easily create tests.
2020-12-27 22:14:30 +08:00
9ddf434f6b [Bug-Fix] Fix partition cache match bug (#5060)
When the cached partitions are not contiguous, a range query may fail.
For example, if partition keys 20201011 and 20201013 are cached
but the range query is between 20201011 and 20201013, the query will not hit the cache.
issue:#5059
2020-12-19 11:17:44 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
2682712349 [Bug] Fix be ut compile failed and core in delta_writer_test when ulimit < 60000. (#4941) 2020-11-24 22:21:19 +08:00
f1b57c4418 [Optimize] Avoid repeated sending of common components in Fragments (#4904)
This CL mainly changes:

1. Avoid repeated sending of common components in Fragments

    In the previous implementation, a query may generate multiple Fragments,
these Fragments contain some common information, such as DescriptorTable.
Fragment will be sent to BE in a certain order, so these public information will be sent repeatedly
and generated repeatedly on the BE side.

    In some complex SQL, these public information may be very large,
thereby increasing the execution time of Fragment.

    So I improved this. For multiple Fragments sent to the same BE, only the first Fragment will carry
these public information, and it will be cached on the BE side, and subsequent Fragments
no longer need to carry this information.

    In the local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second.

2. Add the time-consuming part of FE logic in Profile

    Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.
2020-11-22 20:38:05 +08:00
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
75e0ba32a1 Fixes some be typo (#4714) 2020-10-13 09:37:15 +08:00
5199a17a4b [cache][be]Fix the bug of cross-border access cache (#4639)
* When different partitions of a table are updated frequently, the partition key list in the cache becomes discontinuous,
and a partition key in the request may fail to match the cached key list, causing an out-of-bounds access that crashes the BE.

* Add unit test cases, including cases that miss the cache at its boundary values
2020-09-28 13:35:52 +08:00
5f43fb3bde [Cache][BE] LRU cache for sql/partition cache #2581 (#4005)
1. Find the cache node by SQL Key, then find the corresponding partition data by Partition Key, and then decide whether the Cache is hit by comparing LastVersion and LastVersionTime
2. Based on the classic LRU (least recently used) cache algorithm, implemented with a three-layer data structure
3. The Cache eviction algorithm keeps the cached partition range as contiguous as possible, because partition discontinuity reduces the hit rate of the partition Cache
4. Use two thresholds, maximum memory and elastic memory, to avoid frequent eviction of data
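
For readers unfamiliar with LRU, here is a minimal generic sketch of the basic algorithm. It is not the three-layer structure of this commit: it has a single level, counts entries instead of memory, and has no partition awareness or version checks. The class name `LruCache` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Generic single-level LRU cache, not thread-safe.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    void put(const std::string& key, std::string value) {
        auto it = _index.find(key);
        if (it != _index.end()) {
            // Existing key: update the value and mark it most recently used.
            it->second->second = std::move(value);
            _items.splice(_items.begin(), _items, it->second);
            return;
        }
        if (_items.size() == _capacity) {
            // Evict the least recently used entry, which sits at the back.
            _index.erase(_items.back().first);
            _items.pop_back();
        }
        _items.emplace_front(key, std::move(value));
        _index[key] = _items.begin();
    }

    std::optional<std::string> get(const std::string& key) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return std::nullopt;
        }
        // A hit moves the entry to the front (most recently used).
        _items.splice(_items.begin(), _items, it->second);
        return it->second->second;
    }

private:
    using Item = std::pair<std::string, std::string>;
    std::size_t _capacity;
    std::list<Item> _items;
    std::unordered_map<std::string, std::list<Item>::iterator> _index;
};
```

The list keeps recency order and the hash map gives constant-time lookup; `splice` moves a node without invalidating the iterators stored in the map.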
2020-09-20 20:50:51 +08:00
065b979f35 [Bug] behavior of function str_to_date() and date_format() on BE and FE is inconsistent (#4612)
1. add date range check in `DateLiteral` for `FEFunctions`
2. `select str_to_date(202009,'%Y%m')` and `select str_to_date(str,'%Y%m') from tb where tb.str = '202009'` will return same output `2020-09-00`.
3. add support of zero-date to function `str_to_date()`,`date_format()` 
4. fix FE can calculate negative value bug, eg: `select str_to_date('-2020', '%Y')` will return `NULL` instead of date value.

current behavior is same as MySQL **without** sql_mode `NO_ZERO_IN_DATE` and `NO_ZERO_DATE`.

**current behavior**
```
mysql> select siteid,str_to_date(siteid,'%Y%m%d') from table2  order by siteid;
+------------+---------------------------------+
| siteid     | str_to_date(`siteid`, '%Y%m%d') |
+------------+---------------------------------+
|          1 | 2001-00-00                      |
|          2 | 2002-00-00                      |
|          2 | 2002-00-00                      |
|          3 | 2003-00-00                      |
|          4 | 2004-00-00                      |
|          5 | 2005-00-00                      |
|         20 | 2020-00-00                      |
|        202 | 0202-00-00                      |
|       2020 | 2020-00-00                      |
|      20209 | 2020-09-00                      |
|     202008 | 2020-08-00                      |
|     202009 | 2020-09-00                      |
|    2020009 | 2020-00-09                      |
|   20200009 | 2020-00-09                      |
|   20201309 | NULL                            |
| 2020090909 | 2020-09-09                      |
+------------+---------------------------------+

mysql> select str_to_date('2','%Y%m%d'),str_to_date('20','%Y%m%d'),str_to_date('202','%Y%m%d'),str_to_date('2020','%Y%m%d'),str_to_date('20209','%Y%m%d'),str_to_date('202009','%Y%m%d'),str_to_date('2020099','%Y%m%d'),str_to_date('20200909','%Y%m%d'),str_to_date('2020090909','%Y%m%d'),str_to_date('2020009','%Y%m%d'),str_to_date('20200009','%Y%m%d'),str_to_date('20201309','%Y%m%d');
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+
| str_to_date('2', '%Y%m%d') | str_to_date('20', '%Y%m%d') | str_to_date('202', '%Y%m%d') | str_to_date('2020', '%Y%m%d') | str_to_date('20209', '%Y%m%d') | str_to_date('202009', '%Y%m%d') | str_to_date('2020099', '%Y%m%d') | str_to_date('20200909', '%Y%m%d') | str_to_date('2020090909', '%Y%m%d') | str_to_date('2020009', '%Y%m%d') | str_to_date('20200009', '%Y%m%d') | str_to_date('20201309', '%Y%m%d') |
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+
| 2002-00-00                 | 2020-00-00                  | 0202-00-00                   | 2020-00-00                    | 2020-09-00                     | 2020-09-00                      | 2020-09-09                       | 2020-09-09                        | 2020-09-09                          | 2020-00-09                       | 2020-00-09                        | NULL                              |
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+
```
2020-09-17 10:10:19 +08:00
b780df697a [refactor] Optimize threads usage mode in BE (#4440)
BE cannot exit gracefully because some threads run in endless
loops. This patch makes the following optimizations:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
  and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
  memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
  by submit tasks to TaskWorkerPool's queue
- Reorder the stopping and destruction of objects in main(), i.e. stop network
  services first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
  then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
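
The idea of using a latch as a loop condition can be sketched as follows. This `CountDownLatch` is a simplified stand-in written for this example, not the class used in BE, and `max_iterations` is an extra parameter added only so the loop can be bounded in a demonstration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Simplified latch: wait_for() returns true once the count has reached zero.
class CountDownLatch {
public:
    explicit CountDownLatch(int count) : _count(count) { assert(count >= 0); }

    void count_down() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_count > 0 && --_count == 0) {
            _cv.notify_all();
        }
    }

    // Block for at most `timeout`; returns true if the latch is open.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        return _cv.wait_for(lock, timeout, [this] { return _count == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _count;
};

// Periodic worker loop: the wait doubles as the sleep between rounds and as
// the stop check, so a count_down() from another thread ends the loop promptly.
int run_until_stopped(CountDownLatch& stop, std::chrono::milliseconds interval,
                      int max_iterations) {
    int iterations = 0;
    while (iterations < max_iterations && !stop.wait_for(interval)) {
        ++iterations; // the periodic work would go here
    }
    return iterations;
}
```

Compared with `while (true) { work(); sleep(interval); }`, the thread wakes up as soon as shutdown is requested instead of finishing its sleep first.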
2020-09-06 20:19:14 +08:00
5166a6c6bc [Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495)
Main CL:
1. Copy the code from BE to implement the `str_to_date()` function in FE. 
2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.
2020-09-03 17:16:19 +08:00