This commit enables the avg operator in FE instead of rewriting the avg function into sum/count.
It also fixes the bug in DecimalV2 avg that caused a core dump in BE: an int128 value cannot be assigned directly.
After this enhancement, the speed of the avg function is similar to that of the sum function.
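A direct int128 store typically cores when the destination is not 16-byte aligned, which is most likely what happened in the DecimalV2 aggregation state; copying the bytes avoids any alignment requirement. A minimal C++ sketch of that pattern, with hypothetical helper names (store_int128/load_int128) not taken from the actual code:

#include <cstring>

// __int128 is a GCC/Clang extension; a direct store such as
//     *reinterpret_cast<int128_t*>(dst) = value;
// assumes dst is 16-byte aligned, which the state buffer may not guarantee.
using int128_t = __int128;

// Safe: copy the 16 bytes without any alignment requirement.
inline void store_int128(void* dst, int128_t value) {
    std::memcpy(dst, &value, sizeof(value));
}

// Likewise when reading the value back from the state buffer.
inline int128_t load_int128(const void* src) {
    int128_t value;
    std::memcpy(&value, src, sizeof(value));
    return value;
}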
In this change list:
1. Validate the HLL column when loading data; if the data is invalid, the row is filtered out.
2. Treat an invalid type of HLL data as an empty HLL when serializing. With this change, all ingested data will be valid.
3. Treat a nullptr or an invalid type of HLL data as an empty HLL when deserializing. With this change, dirty data can be handled normally (see the sketch after this list).
4. Rename the function empty_hll to hll_empty.
5. Disable memtable_flush_execute_test because it fails intermittently: during teardown some threads are not joined and may access destroyed resources.
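A minimal sketch of the rule in items 2 and 3, using illustrative type tags and a simplified HyperLogLog struct rather than the actual BE classes: a nullptr buffer or an unknown type byte yields an empty HLL instead of crashing or propagating dirty data.

#include <cstddef>
#include <cstdint>

// Illustrative HLL type tags; the real values live in the BE HLL code.
enum HllDataType : uint8_t {
    HLL_DATA_EMPTY = 0,
    HLL_DATA_EXPLICIT = 1,
    HLL_DATA_SPARSE = 2,
    HLL_DATA_FULL = 3
};

struct HyperLogLog {
    uint8_t type = HLL_DATA_EMPTY;   // registers omitted in this sketch
};

// nullptr or an invalid type byte is treated as an empty HLL.
HyperLogLog deserialize_hll(const char* buf, size_t len) {
    HyperLogLog hll;                 // defaults to empty
    if (buf == nullptr || len == 0) {
        return hll;                  // nullptr -> empty HLL
    }
    uint8_t type = static_cast<uint8_t>(buf[0]);
    if (type > HLL_DATA_FULL) {
        return hll;                  // unknown type tag -> empty HLL
    }
    hll.type = type;
    // ... decode registers according to the type (omitted) ...
    return hll;
}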
Sometimes the broker writer fails to close, but we do not handle this failure.
This may leave an empty file on remote storage that is treated as a normal one.
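A minimal sketch of the intended handling, using illustrative Status and BrokerWriter stand-ins rather than the real Doris classes: the close() status is checked and propagated instead of being dropped.

#include <string>

// Illustrative stand-ins only.
struct Status {
    bool ok = true;
    std::string msg;
};

struct BrokerWriter {
    // The real close() flushes buffered data and closes the broker
    // connection, either of which can fail; here it is just a stub.
    Status close() { return Status{}; }
};

// Propagate a failed close() instead of ignoring it, so an empty or
// incomplete file on remote storage is not treated as a successful write.
Status finish_write(BrokerWriter& writer) {
    Status st = writer.close();
    if (!st.ok) {
        st.msg = "broker writer close failed: " + st.msg;
        return st;                   // let the caller retry or abort
    }
    return Status{};
}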
Also enhance some usability points:
1. Return the latest 2000 transactions instead of the earliest ones.
2. Show on which backend the download and upload tasks are being executed.
Each load job has several load tasks, and each task is a query plan with several plan fragments.
Each plan fragment reports its query profile independently, so we need to collect each plan fragment's report separately.
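A minimal sketch of the idea, using hypothetical names (LoadProfileCollector, FragmentInstanceId, RuntimeProfile) that are not the actual implementation: reports are stored per fragment instance, so one fragment's profile never overwrites another's.

#include <cstdint>
#include <map>
#include <mutex>
#include <string>

using FragmentInstanceId = std::string;
struct RuntimeProfile { int64_t loaded_rows = 0; };

// Collects the profile reported by each plan fragment of a load task,
// keyed by fragment instance id.
class LoadProfileCollector {
public:
    void report(const FragmentInstanceId& id, const RuntimeProfile& profile) {
        std::lock_guard<std::mutex> lock(_mu);
        _profiles[id] = profile;     // keep the latest report per fragment
    }

    int64_t total_loaded_rows() const {
        std::lock_guard<std::mutex> lock(_mu);
        int64_t total = 0;
        for (const auto& kv : _profiles) total += kv.second.loaded_rows;
        return total;
    }

private:
    mutable std::mutex _mu;
    std::map<FragmentInstanceId, RuntimeProfile> _profiles;
};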
The current load process is:
Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk
In the path Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed:
1. Insert tuples into different memtables according to tablet ID.
2. When a memtable's size reaches the threshold, it is written to disk.
For a single load task, the above operations are executed in a single thread.
In fact, memtable insertion and memtable flushing can be executed concurrently;
moving the flush off the insertion thread prevents memtable insertion from being delayed by slow disk writes.
In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads.
By default, each data directory uses two worker threads for flush, which can be modified by the parameter flush_thread_num_per_store of BE.
DeltaWriter will push the full memtable to MemTableFlushExecutor for flush operation and generate a new memtable for receiving new data.
This design can improve the performance of loading large files.
In a single-host test, the time to load a 1 GB text file is reduced from 48 seconds to 29 seconds.
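A minimal sketch of the flush-executor pattern described above (not the actual MemTableFlushExecutor code): a queue drained by a fixed pool of worker threads, so the DeltaWriter thread can keep inserting into a fresh memtable while full memtables are flushed in the background. In the real design each data directory has its own flush threads (two by default, configurable via flush_thread_num_per_store).

#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in for a memtable whose flush() writes its contents to disk.
struct MemTable {
    void flush() { /* write rows to a segment file (omitted) */ }
};

class MemTableFlushExecutor {
public:
    explicit MemTableFlushExecutor(int num_threads) {
        for (int i = 0; i < num_threads; ++i) {
            _workers.emplace_back([this] { _work_loop(); });
        }
    }

    ~MemTableFlushExecutor() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _stopped = true;
        }
        _cv.notify_all();
        for (auto& t : _workers) t.join();
    }

    // Called by DeltaWriter when a memtable is full; the writer then
    // switches to a new memtable and keeps accepting data.
    void submit(std::shared_ptr<MemTable> memtable) {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _queue.push(std::move(memtable));
        }
        _cv.notify_one();
    }

private:
    void _work_loop() {
        for (;;) {
            std::shared_ptr<MemTable> memtable;
            {
                std::unique_lock<std::mutex> lock(_mu);
                _cv.wait(lock, [this] { return _stopped || !_queue.empty(); });
                if (_stopped && _queue.empty()) return;
                memtable = std::move(_queue.front());
                _queue.pop();
            }
            memtable->flush();   // disk write happens off the insertion thread
        }
    }

    std::mutex _mu;
    std::condition_variable _cv;
    std::queue<std::shared_ptr<MemTable>> _queue;
    std::vector<std::thread> _workers;
    bool _stopped = false;
};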
The function that assigns conjuncts is invoked before the aggregation plan node is created.
If an empty set node is the child of the aggregation node, the conjuncts are assigned to the empty set node, which cannot be executed correctly in the Backend.
The query then throws the exception "couldn't resolve slot descriptor" when it contains both an empty set node and an aggregation node.
For example: select sum(pv) from test where type != 1 and 1=0 group by type;
This commit fixes the bug by removing the conjuncts from the empty set node.
Reproduce:
1. Start a routine load; a routine load task is sent to BE.
2. BE executes the task successfully and commits to FE.
3. The commit request fails on FE because the database has been renamed (a "db not found" exception is thrown).
4. After the commit fails, BE sends a rollback request to FE.
5. FE receives the rollback request and mistakenly updates the routine load progress,
because the number of loaded rows in the rollback request's attachment is larger than 0.
Random distribution is no longer supported since version 0.9, so we need a way to convert random distribution to hash distribution:
ALTER TABLE db.tbl SET ("distribution_type" = "hash");
1. Support specifying a label in the INSERT INTO statement.
INSERT INTO tbl1 WITH LABEL label1 ...;
2. Return the state of the job corresponding to the existing label in the stream load result.
...
"Status": "Label Already Exists",
"ExistingJobStatus": "FINISHED"
...
3. Return the most recent 2000 transactions in SHOW PROC '/transactions'.
ISSUES-1725: The result of a union statement whose child is an outer join statement is incorrect.
Example:
sql: (select k1 from empty) union all (select b.k1 k1 from left_table a left join empty b on a.k2 = b.k2);
context: the empty table has no data.
wrong result: 0
expected result: null
Reason:
The judgment of whether column k1 of the union tuple is nullable is incorrect.
It cannot be determined from the children's slot attributes when the slot is produced by an outer join:
slot A itself is not nullable, while the outer join result that maps to slot A is nullable.
So the judgment needs to consider whether the slot comes from an outer join.