17549 Commits

Author SHA1 Message Date
b246d93128 Avoid SerDe for aggregation query with object pool (#1854) 2019-09-26 13:51:13 +08:00
7df1418ff4 Check transaction_id in TClearTransactionTaskRequest (#1872) 2019-09-26 10:15:43 +08:00
5d1165fad2 Fix direct compilation failed #1862 (#1875)
Fix direct compilation failures:

Fix thirdparty compilation on Ubuntu installing libs to the lib dir instead of lib64.
Fix a compile error in gcc5 due to a C++11 defect (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60970).
Fix the gcc version check not working on some OSes.
2019-09-26 09:34:41 +08:00
f3bbdfe7d3 Fix bug that load statistic in show load result is incorrect (#1871)
Each load job has several load tasks, and each task is a query plan
with several plan fragments. Each plan fragment reports its query profile
independently.
So we need to collect each plan fragment's report separately.
2019-09-25 22:56:59 +08:00
ce6fb1cfba Fix bug: broker load not support inline function in hll_hash (#1873)
hll_hash should support the inline function in broker load and should not support the inline function in hadoop load.
2019-09-25 22:00:02 +08:00
09482c9f52 Take segments in singleton rowset into consideration upon cumulative compaction (#1866)
Previously, compaction took only rowsets into consideration.
During stream load, a singleton rowset may be made up of many overlapping segments.
Scanning these overlapping segments results in read amplification.
To address this problem, overlapping segments are now taken into consideration
during cumulative compaction to reduce read amplification.
2019-09-25 15:27:44 +08:00
e43f1a2766 Fix NPE error when creating table with bool column (#1864) 2019-09-25 14:40:13 +08:00
eb840ecca8 Support boolean/date/datetime/decimal types in segment V2 (#1863) 2019-09-25 13:53:00 +08:00
c643cbd30c Optimize the load performance for large file (#1798)
The current load process is:

Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk

In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed:

Insert tuples into different memtables according to tablet ID.
When the memtable size reaches the threshold, it is written to disk.
For a single load task, the above operations effectively run in a single thread.
In fact, memtable insertion and memtable flush can run concurrently.
Running them in separate threads prevents memtable insertion from being delayed by slow disk writes.

In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads.
By default, each data directory uses two worker threads for flush, which can be modified by the parameter flush_thread_num_per_store of BE.
DeltaWriter will push the full memtable to MemTableFlushExecutor for flush operation and generate a new memtable for receiving new data.

This design improves the performance of loading large files.
In single host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.
2019-09-25 13:49:32 +08:00
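The flush design described in c643cbd30c can be sketched roughly as follows. This is a minimal illustration in Python, not the BE implementation: the class names FlushExecutor and DeltaWriter, the row-count threshold max_rows, and the flush_fn callback are all assumptions made for the example. It only shows the idea of handing a full memtable to worker threads while the writer keeps inserting into a fresh one.

```python
import queue
import threading


class FlushExecutor:
    """Hypothetical stand-in for MemTableFlushExecutor: one queue, N workers."""

    def __init__(self, flush_fn, num_workers=2):
        self._flush_fn = flush_fn          # assumed callback that "writes to disk"
        self._queue = queue.Queue()
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _run(self):
        while True:
            memtable = self._queue.get()
            try:
                if memtable is None:       # sentinel: stop this worker
                    return
                self._flush_fn(memtable)
            finally:
                self._queue.task_done()

    def submit(self, memtable):
        self._queue.put(memtable)

    def close(self):
        self._queue.join()                 # wait for pending flushes
        for _ in self._workers:
            self._queue.put(None)
        for worker in self._workers:
            worker.join()


class DeltaWriter:
    """Hypothetical writer: inserts rows, hands full memtables to the executor."""

    def __init__(self, executor, max_rows=3):
        self._executor = executor
        self._max_rows = max_rows          # assumed threshold, in rows
        self._memtable = []

    def write(self, row):
        self._memtable.append(row)
        if len(self._memtable) >= self._max_rows:
            self._rotate()

    def _rotate(self):
        # Swap in a new memtable first so inserts never wait on the flush.
        full, self._memtable = self._memtable, []
        self._executor.submit(full)

    def close(self):
        if self._memtable:
            self._rotate()
```

With two workers, the order in which memtables reach flush_fn is not guaranteed, so a caller that needs ordering would have to add it on top of this sketch.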
dd02382abd Check buckets limit: buckets > 0 when adding partition (#1855) 2019-09-25 13:02:09 +08:00
c2de62d6a1 Collect scanner's status when es_http_scan_node close (#1861) 2019-09-25 12:20:13 +08:00
40b9c3571b Support hll_empty function (#1825) 2019-09-25 09:28:02 +08:00
533a2e0f94 Optimize memory usage in wrapper field #1852 (#1853) 2019-09-25 09:25:54 +08:00
0b15d26b6c Fix segment V2 estimate size inaccuracy (#1858) 2019-09-24 20:13:15 +08:00
8d0fee7e64 Add default value column iterator #1834 (#1835) 2019-09-24 14:39:10 +08:00
fe27969978 add delete predicate filter(#1636) (#1745)
Delete predicate can be used to prune data by zone map.
2019-09-24 14:38:19 +08:00
b756dfd90b Fix bug: compare column with equals rather than == (#1850) 2019-09-24 09:40:11 +08:00
c3fccb7a49 Support cast datetime to decimal (#1849) 2019-09-23 19:56:20 +08:00
fded13e3cd Fix bug: Enable StringLiteral cast to Varchar (#1846)
StringLiteral could be cast to VARCHAR or CHAR.
The default value of lead and lag function could be 'String' when the column type is CHAR or VARCHAR.
2019-09-23 18:42:25 +08:00
4c7b52d077 Fix bug: Remove conjuncts for empty set node (#1840)
The function that assigns conjuncts is invoked before the aggregation plan node is created.
If the empty set node is the child of the aggregation node, the conjuncts are assigned to the empty set node, which cannot be executed correctly in the Backend.
This throws the exception "couldn't resolve slot descriptor" for queries that have both an empty set node and an aggregation node.
For example: select sum(pv) from test where type != 1 and 1=0 group by type;

This commit fixes the bug by removing the conjuncts for the empty set node.
2019-09-23 15:09:04 +08:00
93fe10a268 Reduce size of HyperLogLog struct (#1845)
The HyperLogLog struct is currently so large that rowsets end up
too small when ingesting data. In this CL, the registers in HyperLogLog are
created only when they are needed. When ingesting data, it is common
for a single HyperLogLog to hold only a few values.
2019-09-21 14:38:58 +08:00
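The lazy register allocation described in 93fe10a268 might look like the sketch below. It is an illustration only: the class name LazyHll, the precision of 14 bits, and the use of SHA-1 as the hash are assumptions for the example, not details taken from the commit. The point is that an empty sketch holds no register array, and the array is allocated on the first update.

```python
import hashlib

HLL_PRECISION = 14                    # assumed: 2**14 one-byte registers
HLL_REGISTERS = 1 << HLL_PRECISION
HASH_BITS = 64


class LazyHll:
    """Hypothetical HyperLogLog holder that allocates registers on demand."""

    def __init__(self):
        self.registers = None         # nothing allocated for an empty sketch

    def memory_bytes(self):
        return 0 if self.registers is None else len(self.registers)

    def update(self, value: bytes):
        # Take a 64-bit hash of the value (SHA-1 is an arbitrary choice here).
        h = int.from_bytes(hashlib.sha1(value).digest()[:8], "big")
        if self.registers is None:
            self.registers = bytearray(HLL_REGISTERS)
        index = h & (HLL_REGISTERS - 1)          # low bits pick the register
        rest = h >> HLL_PRECISION                # remaining high bits
        rest_bits = HASH_BITS - HLL_PRECISION
        # Rank = leading zeros in the remaining bits, plus one.
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank
```

A column with many empty or never-updated sketches then costs close to nothing per row until a value actually arrives.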
74d6d04e01 Fix two digit year bug in to_days function (#1839) 2019-09-20 22:59:05 +08:00
9036014954 Add schema change check for DUPLICATE KEY table (#1844) 2019-09-20 22:33:08 +08:00
cc36905aea Fix write file crash when using segment V2 in debug mode (#1841) 2019-09-20 20:37:29 +08:00
abd27dfcca Remove unused debug (#1836) 2019-09-20 09:31:56 +08:00
e8da855cd2 Support setting timezone for stream load and routine load (#1831) 2019-09-20 07:55:05 +08:00
7bf02d0ae7 Fix bug that routine load may mistakenly skipped some data (#1832)
Steps to reproduce:
1. Start a routine load and send a routine load task to BE.
2. BE executes the task successfully and commits to FE.
3. The commit request fails on FE because the database has been renamed (a db not found exception is thrown).
4. After the commit fails, BE sends a rollback request to FE.
5. FE receives this rollback request and mistakenly updates the routine load progress,
   because the number of loaded rows in the rollback request's attachment is larger than 0.
2019-09-20 07:54:11 +08:00
720808fda5 Remove config::max_file_descriptor_number (#1833) 2019-09-20 07:50:57 +08:00
315f762523 Seek block when starts a ScanKey (#1828)
In Doris, one block has 1024 rows.
1. The previous ScanKey scans rows spanning multiple blocks,
   and the final block holds exactly 1024 rows.
2. The current ScanKey scans fewer rows than one block.
When both conditions hold and the block is not sought, the position of the prefix short key columns is wrong.
2019-09-19 20:08:03 +08:00
aaabf97471 Split channel close operation into two phase (#1830)
In this change, channel close is split into two phases, so channels can be
closed in parallel, which makes queries faster.
2019-09-19 18:14:30 +08:00
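The two-phase close from aaabf97471 can be pictured with the sketch below. The Channel class, its close and close_wait methods, and the thread pool are assumptions for illustration and do not mirror the actual BE classes. Phase one starts the close on every channel without blocking; phase two waits for each one, so the closes overlap instead of running back to back.

```python
from concurrent.futures import ThreadPoolExecutor


class Channel:
    """Hypothetical channel whose close is split into start and wait steps."""

    def __init__(self, name, pool, log):
        self.name = name
        self._pool = pool
        self._log = log            # shared list recording call order
        self._pending = None

    def _flush_and_close(self):
        # Placeholder for the slow part (sending the last batch, closing RPC).
        return f"{self.name} closed"

    def close(self):
        """Phase 1: kick off the close and return immediately."""
        self._log.append(("close", self.name))
        self._pending = self._pool.submit(self._flush_and_close)

    def close_wait(self):
        """Phase 2: block until this channel's close has finished."""
        result = self._pending.result()
        self._log.append(("wait", self.name))
        return result


def close_all(channels):
    for channel in channels:       # phase 1 on every channel first
        channel.close()
    return [channel.close_wait() for channel in channels]   # then phase 2
```

Calling close_all on a list of channels issues every close before the first wait, which is what allows the slow parts to run in parallel.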
17e52a4bac Improve LRUCache to get better performance (#1826)
In this CL, I move the entry's deleter out of LRUCache's mutex block,
so other threads can access the cache without waiting for cache entries to be freed.
2019-09-19 17:37:02 +08:00
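The idea in 17e52a4bac, running deleters outside the cache mutex, can be sketched as follows. This is a simplified Python illustration, not the BE's LRUCache: the class layout and the deleter(key, value) callback signature are assumptions for the example. Evicted entries are collected while the lock is held, and their deleters run only after it is released.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache that frees evicted entries outside its lock."""

    def __init__(self, capacity, deleter):
        self._capacity = capacity
        self._deleter = deleter            # assumed signature: deleter(key, value)
        self._lock = threading.Lock()
        self._entries = OrderedDict()      # oldest entry first

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]

    def put(self, key, value):
        evicted = []
        with self._lock:
            if key in self._entries:
                evicted.append((key, self._entries.pop(key)))
            self._entries[key] = value
            while len(self._entries) > self._capacity:
                evicted.append(self._entries.popitem(last=False))
        # The lock is released here, so a slow deleter does not block
        # other threads calling get() or put().
        for old_key, old_value in evicted:
            self._deleter(old_key, old_value)
```

The trade-off is a short window in which an entry is gone from the cache but not yet freed, which is harmless as long as the deleter does not rely on the cache lock being held.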
e516eba940 Remove the "author" tag (#1829) 2019-09-19 16:59:08 +08:00
d1676c3c3d Check file descriptor number is larger than 65536 upon start (#1819) 2019-09-19 12:48:36 +08:00
e70e48c01e Add a ALTER operation to change distribution type from RANDOM to HASH (#1823)
Random distribution is no longer supported since version 0.9.
And we need a way to convert the random distribution to hash distribution.

    ALTER TABLE db.tbl SET ("distribution_type" = "hash");
2019-09-18 14:16:26 +08:00
714dca8699 Support table comment and column comment for view (#1799) 2019-09-18 09:45:28 +08:00
3f63bde5cb Fix 'Invalid Column Name' error when loading parquet file (#1820) 2019-09-17 21:17:55 +08:00
c4e28f0d13 Update FeConstants meta version to VERSION_62 (#1822)
This should be modified along with commit a232a56c0
2019-09-17 17:30:22 +08:00
dc813e6c61 Limit the max version to cumulative compaction (#1813) 2019-09-17 14:10:05 +08:00
054a3f48bc Add where expr in broker load (#1812)
The where predicate in broker load is responsible for filtering transformed data.
The help and operator docs have been updated.
2019-09-17 11:32:40 +08:00
ede51da777 Resolve reduce/reduce conflict in our syntax (#1811) 2019-09-16 20:25:05 +08:00
973eff26cd Fix tablet meta tool command argument bug (#1810) 2019-09-16 17:40:23 +08:00
a232a56c06 Add parallel_exchange_instance_num to set parallel after exchange (#1788) 2019-09-16 16:41:14 +08:00
86feddb5d7 Fix bug that dead lock may happen when drop table during alter table process (#1800)
The cancel() function tries to acquire the database's write lock, while its caller may already
hold the database's read lock.
2019-09-16 00:12:00 +08:00
dcea6daf4f Fix Cluster meta write error (#1802) 2019-09-13 22:06:55 +08:00
11eafe524f Add ChunkAllocator to accelerate chunk allocation (#1792)
This CL adds ChunkAllocator, which puts unused memory chunks into a chunk
pool rather than returning them to the system allocator. For now, only
MemPool's chunk allocation and free use it.

Two configurations are introduced as well. 'chunk_reserved_bytes_limit'
is the limit on how many bytes the chunk pool can reserve in total;
its default value is 2147483648 (2GB). 'use_mmap_allocate_chunk' controls
whether chunks are allocated via mmap; its default value is false.

In my test case with the default configuration and a simple query like
"select * from table limit 10", this improves throughput from 280 QPS
to 650 QPS. When I set 'chunk_reserved_bytes_limit' to 0,
which disables the pool, the throughput is the same as the original.
2019-09-13 08:27:24 +08:00
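The chunk pool described in 11eafe524f can be sketched as below. This is a toy Python model under stated assumptions: the class shape, the per-size free lists, and the use of bytearray as a "chunk" are illustrative and not how the BE allocator is written. Only the reserved-bytes limit corresponds to something named in the commit ('chunk_reserved_bytes_limit').

```python
import threading


class ChunkAllocator:
    """Hypothetical chunk pool bounded by a reserved-bytes limit."""

    def __init__(self, reserved_bytes_limit):
        self._limit = reserved_bytes_limit
        self._reserved = 0                 # bytes currently held in the pool
        self._free = {}                    # chunk size -> list of free chunks
        self._lock = threading.Lock()

    @property
    def reserved_bytes(self):
        return self._reserved

    def allocate(self, size):
        with self._lock:
            chunks = self._free.get(size)
            if chunks:
                self._reserved -= size
                return chunks.pop()        # reuse a pooled chunk
        return bytearray(size)             # pool miss: ask the system

    def free(self, chunk):
        size = len(chunk)
        with self._lock:
            if self._reserved + size <= self._limit:
                self._free.setdefault(size, []).append(chunk)
                self._reserved += size
                return
        # Over the limit: drop the chunk so it goes back to the system.
```

With a limit of 0 every free falls through to the last branch, which matches the commit's note that setting 'chunk_reserved_bytes_limit' to 0 disables the pool.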
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
dad4def708 Support estimate size for v2 segment writer (#1787) 2019-09-12 15:15:39 +08:00
f58a222da7 Fix bug that the calculation of disk usage percent is wrong (#1791)
This bug may make it impossible to load data.
2019-09-12 14:37:20 +08:00
c354f30767 Fix mistake in docs (#1796) 2019-09-12 14:15:06 +08:00
348e2129b7 Initialize tablet uid not using default constructor for performance reason (#1795) 2019-09-12 12:59:16 +08:00