1601 Commits

Author SHA1 Message Date
4c98596283 [MysqlProtocol] Support MySQL multiple statements protocol (#3050)
2 Changes in this CL:

## Support multiple statements in one request like:

```
select 10; select 20; select 30;
```
ISSUE: #3049 

To test this CL quickly, you can use the mysql-client command-line tool:

```
mysql> delimiter //
mysql> select 1; select 2; //
+------+
| 1    |
+------+
|    1 |
+------+
1 row in set (0.01 sec)

+------+
| 2    |
+------+
|    2 |
+------+
1 row in set (0.02 sec)

Query OK, 0 rows affected (0.02 sec)
```

I added a new class, `OriginStatement.java`, which stores the original statement text together with an index. It mainly serves the following case:

1. A user sends a multi-statement request to a non-master FE:
      `DDL1; DDL2; DDL3`

2. Currently we cannot extract the original string of a single statement from a multi-statement string, so the entire string has to be forwarded to the Master FE. I therefore added an index to the forward request: `DDL1`'s index is 0, `DDL2`'s index is 1, and so on.

3. When the Master FE handles the forwarded request, it parses the entire string, obtains the 3 DDL statements, and uses the `index` to pick the specified statement.
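
The index-based lookup described in these steps can be sketched as follows. This is a minimal Python illustration, not the Doris implementation: the `OriginStatement` dataclass here is a hypothetical stand-in for the Java class, and `split_statements` is a naive splitter that only respects single and double quotes, whereas the Master FE uses its real SQL parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OriginStatement:
    """Hypothetical stand-in: the full original text plus a statement index."""
    origin_stmt: str
    idx: int


def split_statements(sql: str) -> List[str]:
    """Naively split on ';' outside of quoted strings (illustration only)."""
    statements, current, quote = [], [], None
    for ch in sql:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def pick_forwarded(origin: OriginStatement) -> str:
    """Receiving side: re-parse the whole string, then select by index."""
    return split_statements(origin.origin_stmt)[origin.idx]


# The non-master side forwards the whole string and the index of the
# statement it wants executed.
forwarded = OriginStatement("select 10; select 20; select ';'", idx=1)
selected = pick_forwarded(forwarded)
```

Forwarding the whole string with an index avoids having to recover the exact source text of one statement from the parsed result.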

## Optimized the display of syntax errors
I have also optimized the display of syntax errors so that longer syntax errors can be fully displayed.
2020-03-13 22:21:40 +08:00
9832024995 [Insert] Fix bug that insert meet unexpected "label already exists" exception (#3103)
This CL aborts the transaction of an insert operation when an exception is thrown in the analysis phase.

ISSUE: #3102
2020-03-13 20:51:44 +08:00
5f18e99cdb [Doc] Update add fe node description (#3100) 2020-03-13 18:05:09 +08:00
aa540966c6 Output null for hll and bitmap column when select * (#2991) 2020-03-13 11:59:30 +08:00
d8c756260b Rewrite count distinct to bitmap and hll (#3096) 2020-03-13 11:44:40 +08:00
c5660fcb9d [UT]Fix unit test for cgroup_util (#3094)
Co-authored-by: wangcong18 <wangcong18@xiaomi.com>
2020-03-12 22:59:40 +08:00
8276c6d7f8 Show BE version in 'show backends;' (#3074)
In a large-scale cluster, BEs may be upgraded in a rolling fashion. This patch adds a
column named 'Version' to the command 'show backends;', as well as to the web page
'/system?path=//backends', to provide a way to check whether any BE
has missed the upgrade.
2020-03-12 22:15:13 +08:00
905070f4da [CodeStyle] Fix compile warning (#3076)
```
be/src/olap/rowset/segment_v2/ordinal_page_index.cpp:103:22: warning: ‘ordinal’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
    _ordinals[i] = ordinal;
```
2020-03-11 18:17:29 +08:00
bf9612e28b [CodeStyle] Remove unnecessary forward declaration of WritableFile (#3075) 2020-03-11 18:17:11 +08:00
c8705ccf12 [MaterializedView] Support dropping materialized view (#3068)
`DROP MATERIALIZED VIEW [ IF EXISTS ] <mv_name> ON [db_name].<table_name>`

Parameters:

  IF EXISTS: Do not throw an error if the materialized view does not exist. A notice is issued in this case.
  mv_name: The name of the materialized view to remove.
  db_name: The name of the database to which the materialized view belongs.
  table_name: The name of the table to which the materialized view belongs.
2020-03-11 18:16:24 +08:00
a77515fe03 [Backup] Fix backup job block at SNAPSHOTING phase (#3058)
This bug occurred when a BE made a snapshot after the version required by the FE had already been merged into the cumulative version, so the snapshot task could not complete even if it was retried. To solve this problem, the BackupJob is set to CANCELLED in this case, so that the user can retry the job.

Fix #3057
2020-03-11 14:05:02 +08:00
608917c04d Use block layer to write files (#3064)
This is the second patch following 58b8e3f574614433ea9e0c427961f2efb3476c2a.

This patch uses the block layer to write files.
2020-03-11 12:11:25 +08:00
cf219ddf18 [ConsistencyCheck] Support checking replica consistency of tablet manually (#3067) 2020-03-10 15:25:25 +08:00
7400535b37 [Doc] Update compaction-action_EN.md (#3060)
fix typo
2020-03-09 22:09:43 +08:00
b9b9a11eae [Bug] Fix invalid rollback for stream load txn (#3054) 2020-03-09 22:07:36 +08:00
6e46dccd39 [Doc] Update compaction-action.md (#3059)
fix typo
2020-03-09 21:14:09 +08:00
fdcbfbb793 [Bug] Fix bug that coalesce() function return null when there is constant value in parameter. (#3062)
select coalesce(1, null);

RETURNS:    NULL
EXPECTED:   1
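
For reference, the expected behavior can be modeled in a few lines. This is only a Python sketch of the usual SQL `coalesce` semantics, with `None` standing in for SQL NULL; it is not the Doris implementation.

```python
def coalesce(*args):
    """Return the first argument that is not None, or None if all are None."""
    for value in args:
        if value is not None:
            return value
    return None


# Mirrors `select coalesce(1, null);` from the report above.
result = coalesce(1, None)
```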
2020-03-09 16:38:50 +08:00
a1f5b57011 Support sharding tablet_map_lock into more small map locks to make good performance for tablet manage task (#3051)
Shard tablet_map_lock into multiple smaller map locks to improve the performance of tablet management tasks.
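
The idea of lock sharding can be sketched as below. This is a hypothetical Python illustration, not the BE code: the class name, the shard count of 16, and the modulo-based shard choice are all assumptions made for the example.

```python
import threading


class ShardedTabletMap:
    """Hypothetical sketch: one lock per shard instead of one global lock."""

    def __init__(self, num_shards=16):
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._maps = [{} for _ in range(num_shards)]

    def _shard(self, tablet_id):
        # Assumed policy: pick the shard by tablet id modulo shard count.
        return tablet_id % len(self._locks)

    def put(self, tablet_id, tablet):
        i = self._shard(tablet_id)
        with self._locks[i]:
            self._maps[i][tablet_id] = tablet

    def get(self, tablet_id):
        i = self._shard(tablet_id)
        with self._locks[i]:
            return self._maps[i].get(tablet_id)


tablets = ShardedTabletMap(num_shards=4)
tablets.put(10001, "tablet-10001")
```

Operations on tablets that fall into different shards no longer contend for the same lock.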
2020-03-09 16:29:56 +08:00
dc07182bd4 [Intersect] Implements intersect node (#3034)
Implementation of the intersect node.
Statements such as `select a from t intersect select b from t1 intersect select 1;` are now supported.
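
The result expected from such a statement can be modeled with plain sets. The sketch below assumes the standard distinct semantics of SQL `INTERSECT` and ignores NULL handling; the sample data is made up for illustration.

```python
def intersect(*inputs):
    """Rows present in every input, with duplicates removed."""
    result = set(inputs[0])
    for rows in inputs[1:]:
        result &= set(rows)
    return result


t_a = [1, 1, 2, 3]    # hypothetical values of column a in t
t1_b = [1, 3, 3, 4]   # hypothetical values of column b in t1
constant = [1]        # the `select 1` branch
rows = intersect(t_a, t1_b, constant)
```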
2020-03-09 10:52:55 +08:00
172838175f [Bug] Fix bug that index name in MaterializedViewMeta is not changed after schema change (#3048)
The index name in MaterializedViewMeta still carries the `__doris_shadow` prefix
after a schema change has finished.

In this CL, I simply remove the index name field from MaterializedViewMeta,
which makes managing name changes less error-prone.
2020-03-09 10:11:16 +08:00
765f284dcd [Doc] Add Downloads page to Doris website (#3039) 2020-03-09 09:42:46 +08:00
c8054ebe13 [Function] ifnull function supports new args (date,datetime) and (datetime, date) (#3043) 2020-03-09 09:37:26 +08:00
c83729435f Write delete predicate into RowsetMeta upon upgrade from Doris-0.10 to Doris-0.11 (#3044)
If delete predicates exist in the meta in Doris-0.10, all of these predicates should
be retained. There is a confusing point in Doris-0.10: the delete predicate
only exists in OLAPHeaderMessage and PPendingDelta, not in PDelta.
This quirk causes the bug.
2020-03-07 11:16:48 +08:00
1d296e907d Fix orc load timestamp bug (#3047)
The timestamp value loaded from an ORC file is wrong: it has an offset compared with Hive and Spark.
Because the time zone of ORC's timestamp is stored inside ORC's stripe information, the timestamp obtained here is an offset timestamp, so parsing the timestamp as UTC yields the actual datetime literal.
2020-03-06 18:03:27 +08:00
fca6c4e523 Fix bitmap null crash (#3042) 2020-03-05 21:30:32 +08:00
7b30bbea42 [MaterializedView] Support different keys type between MVs and base table (#3036)
First, add materialized index meta to the olap table.

The materialized index meta includes the index name, schema, schema hash, keys type, etc.
This information, previously scattered across separate maps, is now encapsulated in MaterializedIndexMeta.

Also, once materialized views are enabled, the keys type of an index meta may differ from the keys type of the base index.

Second, support the deduplicate MV.
If the create MV stmt contains a group by or an aggregation function, the keys type of the MV is agg,
while the keys type of the base table is duplicate.
For example:
Duplicate table (k1, k2, v1)
MV (k1, k2) group by k1, k2
The data should be aggregated when the MV is executed.
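
The effect of the example above can be illustrated with a small Python sketch. The rows are made up, and this only models the result of grouping by `k1, k2` with no aggregate column; it says nothing about how Doris executes the MV.

```python
# Hypothetical rows of the duplicate table (k1, k2, v1).
base_rows = [
    (1, "a", 10),
    (1, "a", 20),  # same (k1, k2) as the previous row
    (2, "b", 30),
]

# MV (k1, k2) group by k1, k2: with no aggregate column, grouping
# simply deduplicates the key pairs.
mv_rows = sorted({(k1, k2) for k1, k2, _ in base_rows})
```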
2020-03-05 18:19:18 +08:00
cd7207c869 Add ORC help doc (#3041) 2020-03-05 12:44:47 +08:00
c731c8b9bc [Bug] Fix bug of NPE when get replication number from olap table (#3029)
The default replication number of an olap table may not be set.
Every time we call `getReplicationNum()`, we have to check whether it returns null,
which is inconvenient and may cause problems.

So in this PR, I set a default value for the table's replication number.

This bug was introduced by #2958
2020-03-05 12:18:38 +08:00
4ed99e3c0c [Compile] Fix BE compile failure (#3040)
Fix BE compile failure caused by a BloomFilterIndexWriter bug.
2020-03-05 11:38:42 +08:00
63051a3b37 [Bug] Fix int128 bloom filter write bug (#2995)
std::set.insert(int128) core dumps with a segmentation fault.
The reason is that the __int128 value is not aligned.
2020-03-05 09:15:11 +08:00
cc1a5fb8ea [Function] Support '%' in date format string (#3037)
eg:
select str_to_date('2014-12-21 12%3A34%3A56', '%Y-%m-%d %H%%3A%i%%3A%s');
select unix_timestamp('2007-11-30 10:30%3A19', '%Y-%m-%d %H:%i%%3A%s');

This also enables us to extract column fields from an HDFS file path that contains '%'.
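
The same escaping convention, `%%` for a literal percent sign, exists in Python's `datetime.strptime`, which makes for a quick way to see the intended result. Note that this is only an analogy: Python spells minutes and seconds as `%M` and `%S`, whereas the statements above use the MySQL-style `%i` and `%s`.

```python
from datetime import datetime

# '%%' in the format matches a literal '%' in the input, so the
# URL-encoded colons ('%3A') are consumed as plain text.
parsed = datetime.strptime(
    "2014-12-21 12%3A34%3A56",
    "%Y-%m-%d %H%%3A%M%%3A%S",
)
```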
2020-03-05 08:56:02 +08:00
50af594c66 [MemLimit] Normalize the setting of mem limit (#3033)
Normalize the setting of the mem limit to avoid some unexpected exceptions.
For example, the user may not set a query mem limit in the query plan, which
may cause a BE crash.
2020-03-05 08:47:45 +08:00
f17924650f [Config] Modify brpc max_body_size to 200M (#3030)
The default max size per row is 100K, and default row batch size is 2048.
So we change the default brpc max_body_size to 200MB to avoid query failure.
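
The 200MB figure follows from multiplying the two defaults. The check below assumes that 100K means 100 * 1024 bytes and ignores any protocol overhead; the variable names are illustrative only.

```python
max_row_bytes = 100 * 1024   # default max size per row (100K)
batch_rows = 2048            # default row batch size

worst_case_bytes = max_row_bytes * batch_rows
worst_case_mb = worst_case_bytes / (1024 * 1024)
```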
2020-03-04 15:30:27 +08:00
c032d634f4 [FsBroker] Fix bug that broker cannot read file with %3A in name (#3028)
HDFS supports file names such as "2018-01-01 00%3A00%3A00",
so we should support them too.

Also change the default broker log level to INFO.
2020-03-04 11:03:01 +08:00
70cc6df415 [Doc] Fix some typo (#3024) 2020-03-02 22:13:47 +08:00
54aa0ed26b [SetOperation] Change set operation from random shuffle to hash shuffle (#3015)
Use hash shuffle instead of random shuffle in set operations, in preparation for the intersect and except operations.
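
Why hash shuffle matters for intersect and except can be seen in a small model: if equal rows from both inputs always land in the same partition, each partition can be processed locally. The Python sketch below is an assumption-laden illustration (integer rows, the built-in `hash`, 3 partitions), not the planner code.

```python
def hash_partition(rows, num_partitions):
    """Send each row to the partition chosen by its hash."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row) % num_partitions].append(row)
    return parts


left = [1, 2, 3, 4, 5, 6]
right = [4, 5, 6, 7, 8]
n = 3

left_parts = hash_partition(left, n)
right_parts = hash_partition(right, n)

# Equal values are co-located, so a per-partition intersect is enough.
local_results = [set(l) & set(r) for l, r in zip(left_parts, right_parts)]
combined = set().union(*local_results)
```

With a random shuffle, equal rows could end up in different partitions and the local intersections would miss them.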
2020-03-02 19:34:41 +08:00
d151718e98 [MaterializedView] Fix bug that preAggregation is different between old and new selector (#3018)
If there is no aggregated column in an aggregate index, the index behaves as a deduplicate table.
For example:

    aggregate table (k1, k2, v1 sum)
    mv index (k1, k2)

This kind of index is SPJG, the same as `select k1, k2 from aggregate_table group by k1, k2`.
Its grouping columns also need to be checked in the following steps.

If there is no aggregated column in a duplicate index, the index is SPJ and passes the grouping verification directly.

Also, after the index supplement step, the output columns of each new candidate index should be checked as well.
2020-03-02 19:11:10 +08:00
aa58cd99d9 Fix disks_total_capacity metric bug (#2988)
Now disks_total_capacity metric is a user specified capacity, but
disks_avail_capacity is the disk's actual available capacity, so
disks_total_capacity may be less than disks_avail_capacity, and
UsedPct on FE may be a negative number as a result.
It is better to use the disk's actual capacity for the disks_total_capacity metric.
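
A small numeric example shows how the negative percentage arises. The formula below is an assumption about how UsedPct is derived from the two metrics, and the capacities are made-up numbers.

```python
def used_pct(total_capacity, avail_capacity):
    """Assumed formula: share of total capacity that is not available."""
    return (total_capacity - avail_capacity) * 100.0 / total_capacity


# User-specified total (100) is smaller than the real available space (150).
with_user_specified_total = used_pct(100, 150)

# Using the disk's actual total (500) gives a sensible value.
with_actual_total = used_pct(500, 150)
```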
2020-03-02 19:09:50 +08:00
511c5eed50 [Doc] Modify format of some docs (#3021)
The format of some docs is incorrect for building the doc website.
* fix a bug that `gensrc` dir can not be built with -j.
* fix ut bug of CreateFunctionTest
2020-03-02 19:07:52 +08:00
21b87ee23a [Bug] Access follower FE's website got exception (#3020)
QualifiedUser field is not set in ConnectContext
2020-03-02 13:53:35 +08:00
ef4bb0c011 [RoutineLoad] Auto Resume RoutineLoadJob (#2958)
When all backends restart, the routine load job can be resumed.
2020-03-02 13:27:35 +08:00
df56588bb5 [Temp Partition] Support add/drop/replace temp partitions (#2828)
This CL implements 3 new operations:

```
ALTER TABLE tbl ADD TEMPORARY PARTITION ...;
ALTER TABLE tbl DROP TEMPORARY PARTITION ...;
ALTER TABLE tbl REPLACE TEMPORARY PARTITION (p1, p2, ...);
```

User manual can be found in document:
`docs/documentation/cn/administrator-guide/alter-table/alter-table-temp-partition.md`

I did not update the grammar manual `alter-table.md`.
That manual is too confusing and too big; I will reorganize it later.

This is the first part to implement the "overwrite load" feature mentioned in issue #2663.
I will implement the "load to temp partition" feature in next PR.

This CL also adds GSON serialization methods for the following classes (not used yet):

```
Partition.java
MaterializedIndex.java
Tablet.java
Replica.java
```
2020-03-01 21:30:34 +08:00
0d1e28746e [Function] Support null_or_empty function (#2977)
It returns true if the string is empty or NULL. Otherwise it returns false.
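
The described behavior corresponds to the following Python sketch, with `None` standing in for SQL NULL. It models the semantics stated above only; whitespace-only strings are assumed to count as non-empty.

```python
def null_or_empty(s):
    """True for None (NULL) or the empty string, False otherwise."""
    return s is None or s == ""
```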
2020-03-01 17:35:45 +08:00
078e35a62e Support Amazon S3 data source in Broker Load (#3004) 2020-03-01 12:53:50 +08:00
58b8e3f574 [Fs Block] Add block layer to storage-engine (#2983)
The abstraction of the Block layer, inspired by Kudu, lies between the "business
layer" and the "underlying file storage layer" (`Env`), making them no longer
strongly coupled.

In this way, for the business layer (such as `SegmentWriter`),
there is no need to directly do the file operation, which will bring better
encapsulation. An ideal situation in the future is: when we need to support a
new file storage system, we only need to add a corresponding type of
BlockManager without modifying the business code (such as `SegmentWriter`).

With the Block layer, there are some benefits:

1. First and foremost, the mapping relationship between data and `Env` is more
   flexible. For example, in the storage engine, the data of the tablet can be
   placed in multiple file systems (`Env`) at the same time. That is, one-to-many
   relationships can be supported. For example: one on the local and one on the
   remote storage.
2. The mapping relationship between blocks and files can be adjusted, for example,
   it may not be a one-to-one relationship. For example, the data of multiple
   blocks can be stored in a physical file, which can reduce the number of files
   that need to be opened during querying. It is like `LogBlockManager` in Kudu.
3. We can move the opened-file-cache under the Block layer, which can automatically
   close and open the files used by the upper layer, so that the upper business
   level does not need to be aware of the restrictions of the file handle at all
   (This problem is often encountered online now).
4. Better automatic cleanup logic when there are exceptions. For example, a block
   that is not closed explicitly can automatically clean up its corresponding file,
   thereby avoiding generating most garbage files.
5. More convenient for batch file creation and deletion. Some business operations
   create multiple files, such as compaction. At present, the processing flow that
   these files go through is executed one by one: 1) creation; 2) writing data;
   3) fsync to disk. But in fact, this is not necessary, we only need to fsync this
   batch of files at the end. The advantage is that it can give the operating system
   more opportunities to perform IO merge, thereby improving performance. However,
   this operation is relatively tedious and need not be coupled with the business
   code, so the Block layer is an ideal place for it.
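
The batch approach from point 5 can be sketched as follows. This is a hypothetical Python illustration of "write everything, then fsync the batch at the end"; the function name and file names are invented, and the real Block layer is C++ code in the BE.

```python
import os
import tempfile


def write_batch(directory, contents):
    """Write every file first, then fsync the whole batch at the end."""
    handles = []
    try:
        for name, data in contents.items():
            f = open(os.path.join(directory, name), "wb")
            handles.append(f)
            f.write(data)
        # Only now force the data to disk, one fsync per file, back to back.
        for f in handles:
            f.flush()
            os.fsync(f.fileno())
    finally:
        for f in handles:
            f.close()


with tempfile.TemporaryDirectory() as tmp:
    write_batch(tmp, {"seg_0.dat": b"abc", "seg_1.dat": b"defg"})
    sizes = {
        name: os.path.getsize(os.path.join(tmp, name))
        for name in sorted(os.listdir(tmp))
    }
```

Deferring the fsync calls leaves the operating system free to merge the pending writes before they are flushed.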

This is the first patch; it only adds the related classes, laying the groundwork for later
switching of the read and write logic.
2020-03-01 10:48:00 +08:00
f2d2e4bffd [Unused] Remove unused GC function in DataDir (#3019) 2020-02-28 21:47:41 +08:00
2ac07a8c07 [Doc] Fix docs mixed Chinese and English (#3017) 2020-02-28 16:36:37 +08:00
bd23f2cda2 [MaterializedView] Fix bug that result is double when new mv selector is enable (#3012)
The issue is #3011.
Reset the tablet and scan range info before computing it.
The old rollup selector has already computed the tablet and scan range info,
and the new mv selector may sometimes compute it again.
So we need to reset this info here.

Before this commit, the result was doubled when the query was "select k1 ,k2 from aggregate_table "
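
The doubling can be reproduced in a toy model in which two selector passes append to the same list. Everything in this Python sketch (class, method, and tablet names) is hypothetical and only illustrates why the reset is needed.

```python
class ScanNode:
    """Hypothetical planner node that accumulates scan ranges per tablet."""

    def __init__(self, tablets):
        self.tablets = list(tablets)
        self.scan_ranges = []

    def compute_scan_ranges(self, reset_first):
        if reset_first:
            self.scan_ranges = []
        self.scan_ranges.extend(self.tablets)


def plan(reset_first):
    node = ScanNode(["tablet_1", "tablet_2"])
    node.compute_scan_ranges(reset_first)  # old rollup selector pass
    node.compute_scan_ranges(reset_first)  # new mv selector pass
    return node.scan_ranges


without_reset = plan(reset_first=False)  # each tablet is scanned twice
with_reset = plan(reset_first=True)      # each tablet is scanned once
```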
2020-02-27 18:19:34 +08:00
3b5a0b6060 [TPCDS] Implement the planner for set operation (#2957)
Implement intersect and except planner.
This CL does not implement intersect and except node in execution level.
2020-02-27 16:03:31 +08:00
d2d95bfa84 [segment_v2] Switch to Unified and Extensible Page Format (#2953)
Fixes #2892 

IMPORTANT NOTICE: this CL makes incompatible changes to V2 storage format, developers need to create new tables for test.

This CL refactors the metadata and page format for segment_v2 in order to
* make it easy to extend existing page type
* make it easy to add new page type while not sacrificing code reuse
* make it possible to use SIMD to speed up page decoding

Here is a summary of the main code changes:
* Page and index metadata is redesigned, please see `segment_v2.proto`
* The new class `PageIO` is the single place for reading and writing all pages. This removes lots of duplicated code. `PageCompressor` and `PageDecompressor` are no longer needed and have been removed.
* The type of value ordinal is changed from `rowid_t` to the 64-bit `ordinal_t`; this affects the ordinal index as well.
* Column's ordinal index is now implemented by IndexPage, the same as IndexedColumn.
* Zone map index is now implemented by IndexedColumn
2020-02-27 15:09:57 +08:00