Mingyu Chen bb00f7e656 [Load] Fix bug of wrong file group aggregation when handling broker load job (#2824)
**Describe the bug**

**First**, in a broker load, users can specify multiple data descriptions. Each data description
describes a file (or a set of files), including the file path, delimiter, target table,
partitions to be loaded, and other information.

When the user specifies multiple data descriptions, Doris currently aggregates the data 
descriptions belonging to the same table and generates a unified load task.

The problem is that although different data descriptions may point to the same table,
they may specify different partitions. Therefore, data descriptions should be aggregated
not only at the table level but also at the partition level.

For example:

data description 1 is: 
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1")
INTO TABLE `tbl1`
PARTITION (p1, p2)
```

data description 2 is:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2")
INTO TABLE `tbl1`
PARTITION (p3, p4)
```
The user expects file1 to be loaded into partitions p1 and p2 of tbl1, and file2 into partitions
p3 and p4 of the same table. Currently, however, the two descriptions are aggregated together, which
results in both file1 and file2 being loaded into all of p1, p2, p3 and p4.
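A minimal Java sketch of partition-level aggregation, assuming a hypothetical, simplified `DataDescription` type (not Doris's actual class). Descriptions are grouped under a key made of the table name and the set of target partitions, so file1 and file2 above land in separate groups. The partition list is copied into a `TreeSet` so that `PARTITION (p1, p2)` and `PARTITION (p2, p1)` produce the same key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FileGroupAggregation {
    // Hypothetical, simplified stand-in for a broker load data description.
    record DataDescription(String filePath, String table, List<String> partitions) {}

    // Aggregation key: table name plus the order-insensitive set of target partitions.
    record GroupKey(String table, Set<String> partitions) {}

    // Groups file paths by (table, partition set) instead of by table alone,
    // preserving the order in which groups are first seen.
    static Map<GroupKey, List<String>> aggregate(List<DataDescription> descs) {
        Map<GroupKey, List<String>> groups = new LinkedHashMap<>();
        for (DataDescription d : descs) {
            GroupKey key = new GroupKey(d.table(), new TreeSet<>(d.partitions()));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(d.filePath());
        }
        return groups;
    }
}
```

With table-only keys, both files would share one group targeting p1 through p4; with the partition set in the key, each group keeps exactly the partitions its descriptions asked for.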

**Second**, the following two data descriptions are not allowed:

```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1")
INTO TABLE `tbl1`
PARTITION (p1, p2)
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2")
INTO TABLE `tbl1`
PARTITION (p2, p3)
```

They overlap on partition p2, which is not supported yet, so an exception should be thrown
to cancel this load job.
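One way to detect this, sketched in Java under stated assumptions: a hypothetical `checkNoOverlap` helper records, per table, which partition set first claimed each partition and rejects any later description that claims the same partition through a different set. It assumes every description lists its partitions explicitly, treats descriptions with an identical partition set as one group rather than a conflict, and uses `IllegalArgumentException` only as a placeholder for whatever exception Doris actually raises to cancel the job.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PartitionOverlapCheck {
    // Hypothetical, simplified stand-in for a broker load data description.
    record DataDescription(String filePath, String table, List<String> partitions) {}

    // Rejects data descriptions whose partition sets for the same table overlap.
    // Descriptions with exactly the same partition set are allowed, since they
    // are assumed to be aggregated into one group rather than conflicting.
    static void checkNoOverlap(List<DataDescription> descs) {
        // table -> (partition -> the partition set that first claimed it)
        Map<String, Map<String, Set<String>>> claimed = new HashMap<>();
        for (DataDescription d : descs) {
            Set<String> parts = new HashSet<>(d.partitions());
            Map<String, Set<String>> owners =
                    claimed.computeIfAbsent(d.table(), t -> new HashMap<>());
            for (String p : parts) {
                // putIfAbsent returns the previous owner, or null if p was unclaimed.
                Set<String> owner = owners.putIfAbsent(p, parts);
                if (owner != null && !owner.equals(parts)) {
                    throw new IllegalArgumentException(
                            "Partition " + p + " of table " + d.table()
                            + " appears in overlapping data descriptions: "
                            + owner + " and " + parts);
                }
            }
        }
    }
}
```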

**Third**, there is a problem with the code implementation. We currently pass a comma-separated
string of partition names to the constructor of `OlapTableSink` (in `OlapTableSink.java`).
At the `OlapTableSink` level, however, we should be able to pass in a list of partition ids directly
instead of names.
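A rough Java sketch of the id-based approach, assuming a plain name-to-id map in place of Doris's real catalog lookup. `resolvePartitionIds` and `SimpleTableSink` are hypothetical names for illustration, not the actual `OlapTableSink` API. Names are resolved once, before the sink is built, and the sink only ever sees ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PartitionIdResolution {
    // Hypothetical helper: resolve partition names against a table's name -> id map,
    // failing fast on any name the table does not have.
    static List<Long> resolvePartitionIds(Map<String, Long> nameToId, List<String> names) {
        List<Long> ids = new ArrayList<>(names.size());
        for (String name : names) {
            Long id = nameToId.get(name);
            if (id == null) {
                throw new IllegalArgumentException("Unknown partition: " + name);
            }
            ids.add(id);
        }
        return ids;
    }

    // Hypothetical minimal sink whose constructor takes partition ids directly,
    // instead of a comma-separated string of names it would have to split and look up.
    static final class SimpleTableSink {
        private final long tableId;
        private final List<Long> partitionIds;

        SimpleTableSink(long tableId, List<Long> partitionIds) {
            this.tableId = tableId;
            this.partitionIds = List.copyOf(partitionIds); // defensive, immutable copy
        }

        long tableId() { return tableId; }

        List<Long> partitionIds() { return partitionIds; }
    }
}
```

Resolving at planning time means an unknown partition name is reported before any sink is created, and the sink no longer needs to parse strings or touch partition names at all.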


ISSUE: #2823
2020-02-03 20:15:13 +08:00