1111 Commits

Author SHA1 Message Date
00f25c2b77 [Bug] Tablet and Disk report thread not work (#4597)
The tablet and disk information reporting threads need to report to the FE periodically.
At the same time these two reporting threads will also be triggered by certain events.

The modification in PR #4440 caused these two threads to be triggered only by events,
so they no longer reported periodically.
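
A minimal sketch of a wait that wakes up either on an event or after a fixed interval, written in Python rather than the BE's C++. The name `wait_for_report` and the use of `threading.Event` are illustrative assumptions, not Doris code. Waiting with a timeout is what keeps the periodic report alive when no event arrives.

```python
import threading


def wait_for_report(trigger: threading.Event, interval_s: float) -> str:
    """Block until the trigger fires or the interval elapses, and say which happened."""
    # wait() returns True if the event was set, False if the timeout expired.
    fired = trigger.wait(timeout=interval_s)
    # Reset the trigger so the next round waits again.
    trigger.clear()
    return "event" if fired else "timer"
```

A reporting loop would call this once per round and send its report in both cases.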
2020-09-20 20:51:52 +08:00
5f43fb3bde [Cache][BE] LRU cache for sql/partition cache #2581 (#4005)
1. Find the cache node by SQL key, then find the corresponding partition data by partition key, and then decide whether the cache is hit based on LastVersion and LastVersionTime.
2. Follow the classic LRU (least recently used) cache algorithm, implemented with a three-layer data structure.
3. The cache eviction algorithm keeps the cached partition range as contiguous as possible, because partition discontinuity reduces the hit rate of the partition cache.
4. Use two thresholds, maximum memory and elastic memory, to avoid frequent eviction of data.
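
The lookup and eviction steps above can be sketched roughly as follows. This is a simplified Python illustration, not the BE implementation: the class name `SqlCache` is hypothetical, it counts entries instead of bytes, it treats a hit as an exact match on version and version time, and it assumes eviction starts only once the size exceeds maximum plus elastic and then prunes back to the maximum.

```python
from collections import OrderedDict


class SqlCache:
    """LRU over SQL keys; each node maps partition key -> (version, version_time, rows)."""

    def __init__(self, max_entries: int, elastic_entries: int):
        self.max_entries = max_entries
        self.elastic_entries = elastic_entries
        self.nodes = OrderedDict()  # least recently used SQL key comes first

    def put(self, sql_key, part_key, version, version_time, rows):
        node = self.nodes.setdefault(sql_key, {})
        node[part_key] = (version, version_time, rows)
        self.nodes.move_to_end(sql_key)
        # Assumed policy: tolerate growth up to max + elastic, then prune to max.
        if len(self.nodes) > self.max_entries + self.elastic_entries:
            while len(self.nodes) > self.max_entries:
                self.nodes.popitem(last=False)

    def get(self, sql_key, part_key, last_version, last_version_time):
        node = self.nodes.get(sql_key)
        if node is None or part_key not in node:
            return None
        version, version_time, rows = node[part_key]
        if (version, version_time) != (last_version, last_version_time):
            return None  # cached partition is stale
        self.nodes.move_to_end(sql_key)  # mark as most recently used
        return rows
```

Pruning in batches once the elastic margin is exhausted means a cache sitting near its limit does not evict on every insert.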
2020-09-20 20:50:51 +08:00
065b979f35 [Bug] behavior of function str_to_date() and date_format() on BE and FE is inconsistent (#4612)
1. Add a date range check in `DateLiteral` for `FEFunctions`.
2. `select str_to_date(202009,'%Y%m')` and `select str_to_date(str,'%Y%m') from tb where tb.str = '202009'` now return the same output, `2020-09-00`.
3. Add zero-date support to the functions `str_to_date()` and `date_format()`.
4. Fix a bug where FE could compute a date from a negative value, e.g. `select str_to_date('-2020', '%Y')` now returns `NULL` instead of a date value.

The current behavior is the same as MySQL **without** the sql_mode values `NO_ZERO_IN_DATE` and `NO_ZERO_DATE`.

**current behavior**
```
mysql> select siteid,str_to_date(siteid,'%Y%m%d') from table2  order by siteid;
+------------+---------------------------------+
| siteid     | str_to_date(`siteid`, '%Y%m%d') |
+------------+---------------------------------+
|          1 | 2001-00-00                      |
|          2 | 2002-00-00                      |
|          2 | 2002-00-00                      |
|          3 | 2003-00-00                      |
|          4 | 2004-00-00                      |
|          5 | 2005-00-00                      |
|         20 | 2020-00-00                      |
|        202 | 0202-00-00                      |
|       2020 | 2020-00-00                      |
|      20209 | 2020-09-00                      |
|     202008 | 2020-08-00                      |
|     202009 | 2020-09-00                      |
|    2020009 | 2020-00-09                      |
|   20200009 | 2020-00-09                      |
|   20201309 | NULL                            |
| 2020090909 | 2020-09-09                      |
+------------+---------------------------------+

mysql> select str_to_date('2','%Y%m%d'),str_to_date('20','%Y%m%d'),str_to_date('202','%Y%m%d'),str_to_date('2020','%Y%m%d'),str_to_date('20209','%Y%m%d'),str_to_date('202009','%Y%m%d'),str_to_date('2020099','%Y%m%d'),str_to_date('20200909','%Y%m%d'),str_to_date('2020090909','%Y%m%d'),str_to_date('2020009','%Y%m%d'),str_to_date('20200009','%Y%m%d'),str_to_date('20201309','%Y%m%d');
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+
| str_to_date('2', '%Y%m%d') | str_to_date('20', '%Y%m%d') | str_to_date('202', '%Y%m%d') | str_to_date('2020', '%Y%m%d') | str_to_date('20209', '%Y%m%d') | str_to_date('202009', '%Y%m%d') | str_to_date('2020099', '%Y%m%d') | str_to_date('20200909', '%Y%m%d') | str_to_date('2020090909', '%Y%m%d') | str_to_date('2020009', '%Y%m%d') | str_to_date('20200009', '%Y%m%d') | str_to_date('20201309', '%Y%m%d') |
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+
| 2002-00-00                 | 2020-00-00                  | 0202-00-00                   | 2020-00-00                    | 2020-09-00                     | 2020-09-00                      | 2020-09-09                       | 2020-09-09                        | 2020-09-09                          | 2020-00-09                       | 2020-00-09                        | NULL                              |
+----------------------------+-----------------------------+------------------------------+-------------------------------+--------------------------------+---------------------------------+----------------------------------+-----------------------------------+-------------------------------------+----------------------------------+-----------------------------------+-----------------------------------+
```
2020-09-17 10:10:19 +08:00
4f7cfee908 [compaction][config] Change default config policy to size_based (#4599)
(1) change default compaction config policy to size_based
(2) change missed version check policy when deleting stale rowsets
2020-09-16 15:04:06 +08:00
9419c73472 [Bug] Fix bug that BE will crash when querying information_schema.columns (#4595) 2020-09-14 15:47:08 +08:00
e8e5f350fe [BUG] ReAgg when adding agg mv on dup base table (#4587)
When the keys type of the mv differs from that of the base table, Doris should execute
a sorting schema change instead of a linked schema change.
Otherwise, the data size of the mv is actually the same as that of the base table,
so the mv has no pre-aggregation effect at all
and the query will not choose the mv.

This commit fixed this problem. Fixed #4586
2020-09-13 19:17:35 +08:00
4571b09dd6 [storage][compatibility] Add meta format detection to prevent data loss. (#4539)
After version 0.12, Doris removed the format conversion function that converted the hdr_ format
to the tabletmeta_ format when loading metas (commit link: 3bca253).

When Doris is upgraded and old-format meta still exists in storage,
BE will not read the old-format tablets, which can lead to data loss.

So this commit adds a meta format detection function to prevent data loss.
When old-format meta exists in olap_meta, BE can detect it and either print a log or exit.
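
A rough sketch of the detection idea, in Python for brevity. Only the `hdr_` and `tabletmeta_` prefixes come from the commit message. The function name, the sample key layout, and the `exit_on_old_format` switch are hypothetical, and the sketch raises an exception where a real process would exit.

```python
OLD_META_PREFIX = "hdr_"  # old format named in the commit message; new keys start with "tabletmeta_"


class OldMetaFormatError(RuntimeError):
    """Raised instead of exiting the process, to keep the sketch testable."""


def detect_old_format(meta_keys, exit_on_old_format=False):
    """Return the keys still stored in the old format; optionally refuse to continue."""
    old_keys = [key for key in meta_keys if key.startswith(OLD_META_PREFIX)]
    if old_keys:
        print(f"found {len(old_keys)} old-format meta key(s), e.g. {old_keys[0]}")
        if exit_on_old_format:
            raise OldMetaFormatError("old-format tablet meta must be converted first")
    return old_keys
```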
2020-09-13 11:58:22 +08:00
2c24fe80fa [SparkDpp] Support complete types (#4524)
For [Spark Load]
1 support decimal and largeint
2 add validate logic for char/varchar/decimal
3 check data load from hive with strict mode
4 support decimal/date/datetime aggregator
2020-09-13 11:57:33 +08:00
4caa6f9b33 [Bug] fix get_parsed_paths() subscript out of range (#4585) 2020-09-12 16:04:21 +08:00
e26d5d0da0 [MemTracker] show all MemTrackers on BE's website (#4580)
We can show all MemTrackers on BE's website by calling MemTracker::ListTrackers().
2020-09-12 11:18:50 +08:00
704bcec9d3 [Bug] add_batch check state fix (#4575) 2020-09-12 11:18:10 +08:00
d29bf30f74 [BUG] Fix stale path delete checking logic when current main path is missing. (#4549)
Fix stale path delete checking logic.
When the current main path has a missing version, the delete checking logic always core dumps. So the checking logic is fixed to tolerate a missing version on the current main path.
2020-09-08 18:52:53 +08:00
e55327bbc7 [Bug] Fix bug that task_worker_pool not work (#4543)
The number of threads initialized in the task worker pool was wrong.
This bug was introduced by #4440
2020-09-08 09:25:36 +08:00
64ebea2e43 [Feature] Support gzip compression for http response (#4533)
Now that tablet level metrics are supported, the http metrics API may respond with
a very large body when a BE holds a large number of tablets, causing heavy
network traffic.
This patch introduces http content compression to reduce network traffic.
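
To illustrate the idea, here is a small Python sketch of gzip content negotiation for a response body. It is not the BE's HTTP code: the function name `maybe_compress` and the `min_size` threshold are assumptions made for the example.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str, min_size: int = 1024):
    """Return (payload, extra_headers), compressing only if the client accepts gzip."""
    # Split "gzip, deflate;q=0.5" into bare encoding tokens.
    accepted = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "gzip" in accepted and len(body) >= min_size:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    # Small bodies or clients without gzip support get the body unchanged.
    return body, {}
```

Repetitive text such as a metrics dump tends to compress well, which is why the saving matters most for the large tablet level responses.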
2020-09-06 20:30:12 +08:00
69bd91b617 [BUG] Tablet is not readable and delete handler report -1903 error, when condition value contains \n (#4531) 2020-09-06 20:29:44 +08:00
b780df697a [refactor] Optimize threads usage mode in BE (#4440)
BE cannot exit gracefully because some threads run in endless
loops. This patch makes the following optimizations:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
  and std::vector<std::thread>
- Use CountDownLatch in each thread's loop condition to avoid endless loops
- Introduce a new class Daemon for daemon work, like tcmalloc_gc,
  memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
  by submitting tasks to TaskWorkerPool's queue
- Reorder the stop and destruction of objects in main(), i.e. stop network
  services first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
  so that EvHttpServer can exit gracefully with multiple threads
- Call brpc::Server's Stop() and ClearServices() explicitly
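
The latch-based loop condition can be sketched as follows. This is a Python analogue, not the BE code: `threading.Event` stands in for a CountDownLatch with a count of one, and the `Worker` class and its fields are hypothetical.

```python
import threading


class Worker:
    """Background loop that ends as soon as stop() is called."""

    def __init__(self, interval_s: float):
        self._stop = threading.Event()  # plays the role of a latch with count 1
        self.iterations = 0
        self._thread = threading.Thread(target=self._run, args=(interval_s,))

    def _run(self, interval_s: float):
        # wait() returns True once stop() has set the event, which ends the loop.
        while not self._stop.wait(timeout=interval_s):
            self.iterations += 1  # periodic work would go here

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def is_running(self) -> bool:
        return self._thread.is_alive()
```

Because the loop sleeps on the latch rather than on a plain timer, a stop request interrupts the wait immediately instead of after the current interval.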
2020-09-06 20:19:14 +08:00
068707484d Support sequence column for UNIQUE_KEYS Table (#4256)
* add sequence  col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-04 10:10:17 +08:00
8d60352737 [BUG] Fix segment group add zone map bug when schema change. (#4526)
Fix segment group add zone map bug when schema change.
(1) WrapperField null pointer check
(2) in DUP_KEYS, keep the _zone_maps index consistent with the _schema column index
2020-09-04 09:30:52 +08:00
15f3e5a775 [Bug] Fix bug of core local value (#4523)
When creating a core local value from CoreDataAllocator,
a lock is needed to protect the modification of _blocks.
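
A simplified Python sketch of why the lock matters: several threads request slots, and growing the shared block list must be serialized. The class `BlockAllocator`, its fields, and the slot layout are hypothetical. Only the idea of guarding `_blocks` with a lock comes from the commit message.

```python
import threading


class BlockAllocator:
    """Hands out (block_index, slot_index) pairs, growing _blocks on demand."""

    def __init__(self, slots_per_block: int = 4):
        self._slots_per_block = slots_per_block
        self._blocks = []
        self._used = 0
        self._lock = threading.Lock()

    def allocate(self):
        # The lock makes "compute slot, maybe append a block, bump counter" atomic.
        with self._lock:
            block_idx, slot_idx = divmod(self._used, self._slots_per_block)
            if slot_idx == 0:
                self._blocks.append([None] * self._slots_per_block)
            self._used += 1
            return block_idx, slot_idx

    def block_count(self) -> int:
        with self._lock:
            return len(self._blocks)
```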
2020-09-04 09:30:30 +08:00
5166a6c6bc [Bug] function str_to_date()'s behavior on BE and FE is inconsistent (#4495)
Main CL:
1. Copy the code from BE to implement the `str_to_date()` function in FE. 
2. `str_to_date("2020-08-08", "%Y-%m-%d %H:%i:%s")` will return `2020-08-08 00:00:00` instead of `2020-08-08`.
2020-09-03 17:16:19 +08:00
1a30bcbf36 [SQL Function][Bug] Fix parse_url() bug (#4429)
The 'part' parameter of the parse_url function does not support lower case, and the protocol is not parsed correctly.
This function also does not support parsing 'port'.
This PR tries to make the parse_url function case insensitive and add support for parsing 'port'.

The issue: #4451
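
As an illustration of the intended behavior, here is a Python sketch built on the standard library's `urlsplit`. It is not the Doris implementation, and it covers only a handful of part names chosen for the example.

```python
from urllib.parse import urlsplit


def parse_url(url: str, part: str):
    """Return one component of the URL; the part name is matched case-insensitively."""
    pieces = urlsplit(url)
    extractors = {
        "PROTOCOL": lambda: pieces.scheme,
        "HOST": lambda: pieces.hostname,
        "PORT": lambda: None if pieces.port is None else str(pieces.port),
        "PATH": lambda: pieces.path,
        "QUERY": lambda: pieces.query,
    }
    extractor = extractors.get(part.upper())
    # Unknown part names yield None in this sketch.
    return extractor() if extractor else None
```

Normalizing the part name once with `upper()` keeps the lookup table small and makes 'port', 'Port' and 'PORT' equivalent.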
2020-09-03 17:06:09 +08:00
c29d41f675 [BUG] Fix recover persistent stale rowsets bug from multi-single version rowsets in stale rowsets (#4513)
(1) fix recover persistent stale rowsets bug from multi-single version rowset in stale rowsets
(2) delete_expired_inc_rowsets check consistent version convert to [0, max_version]
2020-09-03 16:59:18 +08:00
d7ac44ac79 [Bug] Fix bug that BE will crash when querying information_schema.columns (#4511)
This bug was introduced by #4364
2020-09-03 16:57:56 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspots of a cluster, for example, hot scanned tablets or hot written tablets,
but we have no insight into the tablets in the cluster.
This patch introduces tablet level metrics to help achieve this goal. It now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request,
and tablet level metrics are not returned by default.
2020-09-02 10:39:41 +08:00
a864db03fe [Bug] Fix bug of load error hub and schema change (#4486)
1. When WITH_MYSQL is off, the load error hub does not support the MySQL load error hub,
   so we should check its return value.

2. The return value of `change_row_block` in schema_change.cpp was misjudged.
2020-08-31 23:21:50 +08:00
1d93ba027a [Compaction] Compaction show policy type and disk format (#4466)
Add more information in compaction show api
1. add cumulative policy type
2. format rowset total disk size
2020-08-30 21:09:47 +08:00
65cacbff7c [Bug] Fix bug that memory copy may overflow in MemIndex::load_segment (#4458)
Segment index file content is not zero-initialized when it is constructed in the write procedure,
so when the index is loaded from this file and a null VARCHAR cell is met,
the null field of this cell is 0, but the uninitialized length field may be a large random number,
and the memory copy may then overflow.
This patch fixes this bug and also skips useless memory copies to improve performance slightly.
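
The defensive pattern can be sketched like this. The cell layout (a 2-byte little-endian length followed by the payload), the function name, and the explicit `is_null` argument are assumptions for illustration. The real index format is not described in the commit message.

```python
import struct


def read_varchar_cell(buf: bytes, offset: int, is_null: bool, max_len: int) -> bytes:
    """Copy a VARCHAR payload out of buf, never trusting the length of a null cell."""
    if is_null:
        # A null cell may carry an uninitialized length, so skip the copy entirely.
        return b""
    (length,) = struct.unpack_from("<H", buf, offset)
    if length > max_len or offset + 2 + length > len(buf):
        raise ValueError("length field exceeds the available data")
    return buf[offset + 2 : offset + 2 + length]
```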
2020-08-30 21:08:55 +08:00
123237afb7 [Compaction] Persistence stale rowsets meta (#4454)
Persist stale rowsets meta. When BE restarts, stale rowsets meta
can be recovered, and stale versions remain readable until the stale gc time.

ISSUE: #4453
2020-08-30 21:05:48 +08:00
004b955ca4 [Bug] Fix a null pointer bug in PlanFragmentExecutor. (#4473)
Fix a null pointer bug in PlanFragmentExecutor. Add null check operation before it is used.
Detail: #4472
2020-08-28 09:28:23 +08:00
84c63f1350 [Bug] replace libltdl.so when compile the unixodbc library (#4461) 2020-08-27 20:53:28 +08:00
ad738fa198 Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388)
During the historical data transformation of materialized views, the transformation may fail because of data quality problems.
Add an error status code, OLAP_ERR_DATE_QUALITY_ERR, to determine whether a data problem is causing the failure

#3344
2020-08-27 17:52:53 +08:00
b85bb0e2e9 [Bug-Fix] Some deleted tablets are not recycled on BE (#4401) 2020-08-27 12:09:19 +08:00
f218327dd9 [Mysql Compatibility] Support convert() and signed/unsigned integer cast (#4364)
1. Support convert(expr, target_type) function, which is same as CastExpr
2. Support cast (expr as signed/unsigned int)
   This is just for compatibility, the signed/unsigned specification is meaningless.
2020-08-27 12:07:58 +08:00
8b0b120aca [Profile] Add 2 Segment related metrics in query profile (#4348)
Total number of segments and number of filtered segments
2020-08-27 12:07:21 +08:00
e4e9af4577 This PR contains three things (#4448)
1. Fix a core dump caused by a wild pointer in PlanFragmentExecutor, fix issue #4447
2. Fix a core dump caused by a wild pointer in json load, fix issue #4452
3. Change the declaration order of the ODBC type in thrift for compatibility
2020-08-26 10:53:53 +08:00
97d963468a [Code Cleanup] Template nest convert to c++11 syntax and style (#4442) 2020-08-26 10:51:52 +08:00
b4d8b3d9ba Forbidden the illegal column types on BITMAP_UNION OR HLL_UNION mv (#4432)
1. The base column of bitmap_union must be an integer type. Largeint is not supported either.
2. The base column of hll_union must not be decimal.

Check error msg of const expr in Union Node

If a user wants to insert a negative number into a bitmap mv, Doris will throw the exception 'invalid input'.
The const value in Union Node is checked in this commit.
2020-08-26 10:49:32 +08:00
664e6a5898 [Storage] "align_tag_path" and ALIGN_TAG_PREFIX is needless (#4410) 2020-08-26 10:47:21 +08:00
613c44e889 [Optimize]Optimize the disk selection strategy on BE for tablet creation (#4373)
When creating a tablet, it is necessary to select a disk from all disks that
meet the requirements on the BE node to store the tablet.

In Doris, the current disk selection strategy is to randomly select a disk
from all disks that meet the requirements for tablet creation.

After the cluster has been running for a long time, we found that the
distribution of the number of tablets on different disks in a BE node is unbalanced.

In order to solve this problem, we introduced the algorithm of "two random choices"
for disk selection when creating the tablet:
(1) Select two disks from all disks that meet the requirements on the BE node randomly;
(2) Choose the disk with the smaller number of tablets from the two disks selected in (1) for tablet creation.
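
The two steps above can be sketched in a few lines of Python. The function name `pick_disk` and the dictionary of per-disk tablet counts are illustrative assumptions, not the BE data structures.

```python
import random


def pick_disk(tablet_counts: dict, rng=random):
    """Two random choices: sample two candidate disks, keep the one with fewer tablets."""
    candidates = list(tablet_counts)
    if not candidates:
        raise ValueError("no disk meets the requirements")
    if len(candidates) == 1:
        return candidates[0]
    # Step (1): pick two distinct disks at random.
    first, second = rng.sample(candidates, 2)
    # Step (2): prefer the disk that currently holds fewer tablets.
    return first if tablet_counts[first] <= tablet_counts[second] else second
```

Compared with a single random pick, the fullest disk can no longer be chosen when counts are distinct, so the tablet distribution evens out over time without scanning all disks.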
2020-08-26 10:35:33 +08:00
c201cf6e4f Support batch delete[part 2] (#4425)
support batch delete for read compaction
2020-08-25 14:05:04 +08:00
67b842ce04 [License] Organize and modify the license of the code (#4371)
1. Disable the MySQL client and LZO library by default when building Doris.

    The MySQL client library is used for the MySQL external table feature.
    This feature will be replaced by the new ODBC external table soon.

    The LZO library is used to compress/decompress data in some old Doris data formats,
    which are no longer used.

2. Add missing licenses to some files.

3. All non-Apache-License code is explained in the NOTICE file, and the corresponding license is declared.

4. Remove the js source code from webroot; it will be downloaded as a thirdparty dependency.
2020-08-24 21:51:55 +08:00
976820ba20 [SegmentV2] Change the default storage format to SegmentV2 (#4387)
Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.

This CL mainly changes:
1. For all newly created tables, the default storage format is Segment V2.
2. For all existing tablets, the storage format remains unchanged.
3. Fix the bugs described in #4384 and #4385
2020-08-24 21:51:17 +08:00
5fc79561d7 [MemTracker][Bug-Fix] Fix core in DCHECK in memory tracker (#4421)
Fix DCHECK failure in mem_tracker, issue #4420
2020-08-23 22:41:02 +08:00
d61c10b761 [Delete] Support batch delete [part 1] (#4310)
* Implement the grammar of batch delete #4051
* Handle create and alter table when the table has a delete sign column
* Support the syntax for enabling the delete column
* Automatically filter deleted data in select statements
* Automatically add the delete sign when creating a rollup table
TODO:
 * Optimize the reading and compaction logic on the BE side, so that the data marked as deleted will be completely deleted during base compaction
2020-08-21 22:57:16 +08:00
a8fe54b7b9 [ODBC SCAN NODE] 1/4 Add unix odbc library. (#4377) 2020-08-21 21:26:14 +08:00
5976395bb6 [BUG] Remove the deduplication of LEFT SEMI/ANTI JOIN with not equal predicate (#4417)
```
SELECT *
FROM
  (SELECT cs_order_number,
          cs_warehouse_sk
   FROM catalog_sales
   WHERE cs_order_number = 125005
     AND cs_warehouse_sk = 4) cs1
LEFT SEMI JOIN
  (SELECT cs_order_number,
          cs_warehouse_sk
   FROM catalog_sales
   WHERE cs_order_number = 125005) cs2
ON cs1.cs_order_number = cs2.cs_order_number
AND cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk;
```

The above query has an equal predicate and a not-equal predicate.
If a not-equal predicate exists, the build table must be kept
as it is. So the deduplication should be removed.
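
A small Python sketch shows why deduplicating the build side is wrong once a not-equal predicate is present. The function and the two sample rows are invented for illustration. They mirror the column names of the query but are not taken from real data or from the Doris join code.

```python
def left_semi_join(probe, build, dedup_build: bool):
    """Semi join on order_number with the extra predicate warehouse_sk <> warehouse_sk."""
    table = {}
    for order_number, warehouse_sk in build:
        bucket = table.setdefault(order_number, [])
        if dedup_build and bucket:
            continue  # keep only one build row per equality key
        bucket.append(warehouse_sk)
    # A probe row survives if any build row with the same key has a different warehouse.
    return [
        (order_number, warehouse_sk)
        for order_number, warehouse_sk in probe
        if any(warehouse_sk != other for other in table.get(order_number, []))
    ]


cs1 = [(125005, 4)]
cs2 = [(125005, 4), (125005, 7)]
```

With these rows, the deduplicated build table keeps only warehouse 4, so the only row that could satisfy the not-equal predicate is discarded before the probe runs.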
2020-08-21 19:55:09 +08:00
a7422ee142 [UT][Bug-Fix] Resolve UT memory leak problem (#4406)
Fix UT memory leak. Fix #4164
2020-08-21 10:41:54 +08:00
b6859f1bd4 [JsonLoad] Fix bug that row num stat is not correct when loading json (#4379)
When all fields are null, the row is invalid and should be filtered out.
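
A minimal Python sketch of that rule, assuming one JSON object per line and a fixed column list. The function name and return shape are hypothetical and only illustrate how the loaded and filtered counts stay consistent.

```python
import json


def load_json_rows(lines, columns):
    """Return (loaded_rows, filtered_count); rows whose columns are all null are filtered."""
    loaded, filtered = [], 0
    for line in lines:
        obj = json.loads(line)
        # Missing keys are treated like explicit nulls.
        row = [obj.get(column) for column in columns]
        if all(value is None for value in row):
            filtered += 1
            continue
        loaded.append(row)
    return loaded, filtered
```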
2020-08-20 09:30:19 +08:00
60d9d31ec1 [Optimize] Optimize coding bit operation in BE (#4366)
Optimize bit operation in variable length coding. Remove unnecessary bit operation.
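
For readers unfamiliar with variable length coding, here is a generic base-128 varint sketch in Python. It illustrates the kind of bit operations involved, under the assumption of a little-endian, 7-bits-per-byte scheme. It is not the BE code and does not reproduce the specific optimization in this commit.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 data bits per byte, low bits first."""
    if value < 0:
        raise ValueError("varint encoding here supports non-negative integers only")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)  # final byte has the continuation bit clear
    return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
```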
2020-08-20 09:29:53 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is a user defined function that replaces all occurrences of an old substring in a string with a new substring, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                            |
+--------------------------------------------------+
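
The semantics shown above can be mirrored in a short Python sketch. Returning the input unchanged for an empty search string is an assumption modeled on MySQL's behavior, not something stated in this commit.

```python
def replace(text: str, old: str, new: str) -> str:
    """Replace every occurrence of old in text with new."""
    if old == "":
        # Assumed behavior: an empty search string leaves the input unchanged.
        return text
    return text.replace(old, new)
```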
2020-08-20 09:28:53 +08:00