Commit Graph

1021 Commits

SHA1 Message Date
4e8d728e75 Remove unused code and unnecessary check (#1918) 2019-09-30 18:35:30 +08:00
8aa8e08f27 Segment v2: support string encoding (#1766) (#1816)
major change

change data format of binary dict page: append (dict page data) and (dict page offset) to the binary dict page;
add a new decoding method for the new binary dict page format
add UT for segment test
set the elements of the initial array to 0 when calling arena.AllocateNewBlock
choose dict encoding for strings in a hard-coded way
0919 commit major change

change dict file format: when saving a binary dict page, separate the dict page from the data pages, so one dict page may serve multiple data pages; when reading a binary dict page, one ColumnReader keeps one dict page
load the dict when calling column_reader._read_page
roll back BinaryDictPage
no longer use memset(0) to initialize column_zonemap.max_value
0926 17 commit major change

initialize column_zone_map's min value (the data array of the column_zone_map slice);
set the size of column_zone_map's max value to 0 for char/varchar columns
add ut for char column zone map query hit/miss
0929 10 commit major change

allocate memory for column_zone_map's max and min values
copy content directly into column_zone_map's max and min values
2019-09-30 16:25:31 +08:00
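The dict page layout described above, where one dict page is shared by several data pages, can be illustrated with a small in-memory dictionary encoder. This is a minimal sketch under stated assumptions: DictEncoderSketch and its methods are hypothetical names, not the Doris BinaryDictPage API, and the sketch ignores page sizes, offsets, and on-disk encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: one dictionary shared by several data pages.
// The dict page stores each distinct string once; data pages store only codes.
class DictEncoderSketch {
public:
    // Returns the code for `value`, adding it to the dictionary if it is new.
    uint32_t encode(const std::string& value) {
        auto it = _codes.find(value);
        if (it != _codes.end()) return it->second;
        uint32_t code = static_cast<uint32_t>(_dict.size());
        _dict.push_back(value);
        _codes.emplace(value, code);
        return code;
    }

    // Encodes one data page's worth of values against the shared dictionary.
    std::vector<uint32_t> encode_page(const std::vector<std::string>& values) {
        std::vector<uint32_t> page;
        page.reserve(values.size());
        for (const auto& v : values) page.push_back(encode(v));
        return page;
    }

    // A reader that keeps the single dict page resolves codes back to strings.
    const std::string& decode(uint32_t code) const {
        assert(code < _dict.size());
        return _dict[code];
    }

    size_t dict_size() const { return _dict.size(); }

private:
    std::vector<std::string> _dict;                    // contents of the dict page
    std::unordered_map<std::string, uint32_t> _codes;  // value -> code
};
```

Two pages encoded with the same encoder reuse codes, which is why a single ColumnReader only needs to keep one dict page.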
69d0a34bfd Remove unused _request_columns_size from olap_scanner (#1916) 2019-09-30 15:25:10 +08:00
2cecf5901f Fix segment v2 bug (#1904) 2019-09-30 13:50:39 +08:00
262c7f4834 Make All BE UT pass in debug mode (#1913)
Fix OrdinalPageIndexTest
Fix ColumnReaderWriterTest
Fix binary_dict_page_test
Fix routine_load_task_executor_test
2019-09-29 19:37:51 +08:00
eca3b4bb8e Fix BetaRowsetTest in debug mode (#1912) 2019-09-29 18:20:20 +08:00
f852f50acb Improve unique id performance (#1911)
Remove the default constructor for UniqueId
Add a gen_uid method to UniqueId. To generate a new uid, callers must call this API explicitly.
Reuse the boost random generator instead of creating a new one every time.
2019-09-29 18:20:02 +08:00
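The UniqueId change can be sketched as follows: without a default constructor, an id can only be built from explicit parts or from gen_uid(), and the generator is created once and reused. This is a hypothetical illustration: UniqueIdSketch is not the Doris class, and a thread_local std::mt19937_64 stands in for the boost random generator mentioned in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <type_traits>

// Hypothetical id type without a default constructor: a new id must be
// requested explicitly through gen_uid().
struct UniqueIdSketch {
    int64_t hi;
    int64_t lo;

    UniqueIdSketch(int64_t hi_, int64_t lo_) : hi(hi_), lo(lo_) {}

    static UniqueIdSketch gen_uid() {
        // Seeded once per thread and then reused, rather than constructing
        // and seeding a new generator on every call.
        thread_local std::mt19937_64 gen{std::random_device{}()};
        return UniqueIdSketch(static_cast<int64_t>(gen()), static_cast<int64_t>(gen()));
    }

    bool operator==(const UniqueIdSketch& other) const {
        return hi == other.hi && lo == other.lo;
    }
};

// Accidental default construction is now a compile-time error.
static_assert(!std::is_default_constructible<UniqueIdSketch>::value,
              "ids must be created explicitly");
```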
8f016d3ab2 Make HLL able to handle invalid data (#1908)
In this change list:
1. Validate HLL columns when loading data; if the data is invalid, the row
is filtered out.
2. Treat HLL data of an invalid type as an empty HLL when serializing. With
this change, all ingested data will be valid.
3. Treat nullptr or HLL data of an invalid type as an empty HLL when deserializing.
With this change, dirty data can be handled normally.
4. Rename function empty_hll to hll_empty.
5. Disable memtable_flush_execute_test because it fails
intermittently. During teardown, some threads are not joined, and they
access destroyed resources, which is invalid.
2019-09-29 10:55:23 +08:00
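Items 1 to 3 above amount to filtering invalid rows at load time and degrading unknown input to an empty HLL at read time instead of failing. A minimal sketch, assuming a one-byte type tag at the start of a serialized HLL; the tag values and the names HllType, is_valid_hll, and classify_hll are assumptions for illustration, not the actual Doris serialization format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed one-byte type tags; illustrative, not the real on-disk values.
enum class HllType : uint8_t { EMPTY = 0, EXPLICIT = 1, SPARSE = 2, FULL = 3 };

// Load-time check: a row whose HLL payload is missing or has an unknown
// tag is considered invalid and would be filtered out.
bool is_valid_hll(const uint8_t* data, size_t size) {
    return data != nullptr && size > 0 && data[0] <= 3;
}

// Read-time handling: null or unrecognized input is treated as an empty
// HLL instead of failing the whole read.
HllType classify_hll(const uint8_t* data, size_t size) {
    if (data == nullptr || size == 0) return HllType::EMPTY;
    switch (data[0]) {
        case 1: return HllType::EXPLICIT;
        case 2: return HllType::SPARSE;
        case 3: return HllType::FULL;
        default: return HllType::EMPTY;  // tag 0 or any invalid tag
    }
}
```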
58f1d79597 Set batchEndId default value to zero instead (#1907) 2019-09-28 23:12:59 +08:00
bdd9c31766 Remove default value for HLL column (#1901)
1. Make HLL columns have no default value (#1901)
2. Don't allow INSERT statements to insert default values into Doris, except hll_empty
2019-09-28 11:19:25 +08:00
de8f273217 Add hardware info in fe httpserver home page #1894 (#1896) 2019-09-28 11:17:08 +08:00
d3a445ee09 Fix memory_scratch_sink_test in debug mode (#1906) 2019-09-28 10:33:24 +08:00
1131f53420 Fix parquet_scanner_test in debug mode (#1900) 2019-09-28 01:15:33 +08:00
cafb9f1e62 Replace Arena with MemPool first step (#1899) 2019-09-28 01:12:22 +08:00
0c22d8fa08 Add frame_of_reference page (#1818) 2019-09-28 01:10:29 +08:00
e67b398916 Fix bug that backup may create an empty file on remote storage. (#1869)
Sometimes the broker writer fails to close, but this failure was not handled.
This may create an empty file on remote storage that is treated as normal.

Also improve usability:
1. Get the latest 2000 transactions instead of the earliest.
2. Show the backends on which download and upload tasks are being executed.
2019-09-28 00:11:43 +08:00
1c229fbd92 Fix es_scan_reader_test in debug mode (#1905) 2019-09-28 00:02:30 +08:00
ec3aa03c45 Add more routine load examples (#1902) 2019-09-27 20:42:52 +08:00
2f0808137a Refactor FrontendHelper (#1888) 2019-09-27 13:21:14 +08:00
ee59b18daa Change atomic_int64_t to atomic<int64_t> (#1890)
atomic_int64_t is not available in gcc5
2019-09-26 20:57:13 +08:00
b970290ae4 Reduce memory usage of View object (#1878) 2019-09-26 14:57:46 +08:00
2ea7de8b5e Update some docs (#1882) 2019-09-26 14:43:55 +08:00
b246d93128 Avoid SerDe for aggregation query with object pool (#1854) 2019-09-26 13:51:13 +08:00
7df1418ff4 Check transaction_id in TClearTransactionTaskRequest (#1872) 2019-09-26 10:15:43 +08:00
5d1165fad2 Fix direct compilation failure #1862 (#1875)
Fix direct compilation failure:

fix: compiling thirdparty on Ubuntu installs libs to the lib dir instead of lib64
fix compile error with gcc5 caused by a C++11 defect (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60970)
fix gcc version check not working on some OSes
2019-09-26 09:34:41 +08:00
f3bbdfe7d3 Fix bug that load statistics in SHOW LOAD result are incorrect (#1871)
Each load job has several load tasks, and each task is a query plan
with several plan fragments. Each plan fragment reports its query profile
independently.
So we need to collect each plan fragment's report separately.
2019-09-25 22:56:59 +08:00
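Because every plan fragment reports independently, keying reports by fragment keeps one fragment's report from overwriting another's. A hypothetical sketch, assuming each report carries that fragment's cumulative loaded-row count; LoadStatisticSketch and on_report are illustrative names, not FE classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical aggregator: a later report from the same fragment replaces
// its earlier one, and the job-level total is summed across fragments.
class LoadStatisticSketch {
public:
    void on_report(const std::string& fragment_instance_id, int64_t loaded_rows) {
        _rows_by_fragment[fragment_instance_id] = loaded_rows;
    }

    int64_t total_loaded_rows() const {
        int64_t total = 0;
        for (const auto& kv : _rows_by_fragment) total += kv.second;
        return total;
    }

private:
    std::map<std::string, int64_t> _rows_by_fragment;  // fragment id -> latest count
};
```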
ce6fb1cfba Fix bug: broker load does not support inline function in hll_hash (#1873)
hll_hash should support inline functions in broker load, but not in hadoop load.
2019-09-25 22:00:02 +08:00
09482c9f52 Take segments in singleton rowset into consideration upon cumulative compaction (#1866)
Previously, compaction only took rowsets into consideration.
During streaming load, a singleton rowset may be made up of many overlapping segments.
Scanning these overlapping segments results in read amplification.
To address this problem, overlapping segments are taken into consideration
during cumulative compaction to reduce read amplification.
2019-09-25 15:27:44 +08:00
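One way to take overlapping segments into account is to let an overlapping rowset contribute one unit of compaction score per segment, while a non-overlapping rowset contributes one unit in total. This is a sketch under that assumption; RowsetInfo and cumulative_compaction_score are hypothetical and do not reproduce the actual Doris scoring policy.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a rowset for scoring purposes.
struct RowsetInfo {
    int num_segments;
    bool segments_overlapping;  // e.g. produced by a streaming load
};

// A non-overlapping rowset is read like one sorted run, so it counts once.
// An overlapping rowset must merge all of its segments at read time, so each
// segment counts toward the cumulative compaction score.
int64_t cumulative_compaction_score(const std::vector<RowsetInfo>& rowsets) {
    int64_t score = 0;
    for (const auto& rs : rowsets) {
        score += rs.segments_overlapping ? rs.num_segments : 1;
    }
    return score;
}
```

Under this scoring, a single streaming-load rowset with many overlapping segments can trigger compaction on its own, which is the read-amplification case the commit targets.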
e43f1a2766 Fix NPE error when creating table with bool column (#1864) 2019-09-25 14:40:13 +08:00
eb840ecca8 Support boolean/date/datetime/decimal types in segment V2 (#1863) 2019-09-25 13:53:00 +08:00
c643cbd30c Optimize the load performance for large files (#1798)
The current load process is:

Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk

In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed:

Insert tuples into different memtables according to tablet ID
When the memtable size reaches the threshold, it is written to disk.
For a single load task, these operations effectively run in a single thread.
In fact, memtable insertion and memtable flush can be executed concurrently.
Performing both in a single thread means that memtable insertion is delayed whenever disk writing is slow.

In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads.
By default, each data directory uses two worker threads for flush, which can be modified by the parameter flush_thread_num_per_store of BE.
DeltaWriter will push the full memtable to MemTableFlushExecutor for flush operation and generate a new memtable for receiving new data.

This design can improve the performance of loading large files.
In single host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.
2019-09-25 13:49:32 +08:00
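The producer/consumer structure of MemTableFlushExecutor can be sketched with a queue and a fixed pool of worker threads per data directory. This is a minimal, hypothetical sketch: FlushExecutorSketch and submit are illustrative names, and a flush job is modeled as a plain std::function rather than a real memtable.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical flush executor for one data directory: pending flush jobs
// go into a queue that a fixed number of worker threads drain.
class FlushExecutorSketch {
public:
    explicit FlushExecutorSketch(int num_threads = 2) {  // 2 mirrors the default per store
        for (int i = 0; i < num_threads; ++i) {
            _workers.emplace_back([this] { worker_loop(); });
        }
    }

    FlushExecutorSketch(const FlushExecutorSketch&) = delete;
    FlushExecutorSketch& operator=(const FlushExecutorSketch&) = delete;

    // Signals shutdown; workers finish any queued jobs, then exit and are joined.
    ~FlushExecutorSketch() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _stopping = true;
        }
        _cv.notify_all();
        for (auto& t : _workers) t.join();
    }

    // Called by the writer with a full memtable's flush job. Returns at once,
    // so the writer can keep inserting into a fresh memtable.
    void submit(std::function<void()> flush_job) {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _jobs.push(std::move(flush_job));
        }
        _cv.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(_mu);
                _cv.wait(lock, [this] { return _stopping || !_jobs.empty(); });
                if (_jobs.empty()) return;  // stopping and the queue is drained
                job = std::move(_jobs.front());
                _jobs.pop();
            }
            job();  // the slow disk write runs outside the lock
        }
    }

    std::mutex _mu;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _jobs;
    bool _stopping = false;
    std::vector<std::thread> _workers;  // declared last: started after the members above exist
};
```

Keeping the disk write outside the lock is the point of the design: insertion into the next memtable only waits on the brief queue push, not on the flush itself.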
dd02382abd Check buckets limit: buckets > 0 when adding partition (#1855) 2019-09-25 13:02:09 +08:00
c2de62d6a1 Collect scanner's status when es_http_scan_node close (#1861) 2019-09-25 12:20:13 +08:00
40b9c3571b Support hll_empty function (#1825) 2019-09-25 09:28:02 +08:00
533a2e0f94 Optimize memory usage in wrapper field #1852 (#1853) 2019-09-25 09:25:54 +08:00
0b15d26b6c Fix segment V2 estimate size inaccuracy (#1858) 2019-09-24 20:13:15 +08:00
8d0fee7e64 Add default value column iterator #1834 (#1835) 2019-09-24 14:39:10 +08:00
fe27969978 add delete predicate filter (#1636) (#1745)
Delete predicate can be used to prune data by zone map.
2019-09-24 14:38:19 +08:00
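Zone map pruning with a delete predicate works on min/max bounds alone. A hypothetical sketch for a delete condition of the form column < bound: if a page's max is below the bound, every row in it is deleted and the page can be skipped; if its min is at or above the bound, the predicate cannot match any row there. ZoneMap and both functions are illustrative, not the Doris implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical zone map of one page for an integer column.
struct ZoneMap {
    int64_t min;
    int64_t max;
};

// Delete condition "column < bound": if even the max satisfies it, every row
// in the page is deleted, so the page can be skipped without reading it.
bool fully_deleted_by_less_than(const ZoneMap& zm, int64_t bound) {
    return zm.max < bound;
}

// If even the min fails it, no row in the page is deleted, so the predicate
// need not be evaluated row by row for this page.
bool untouched_by_less_than(const ZoneMap& zm, int64_t bound) {
    return zm.min >= bound;
}
```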
b756dfd90b Fix bug: compare column with equals rather than == (#1850) 2019-09-24 09:40:11 +08:00
c3fccb7a49 Support cast datetime to decimal (#1849) 2019-09-23 19:56:20 +08:00
fded13e3cd Fix bug: Enable StringLiteral cast to Varchar (#1846)
StringLiteral can be cast to VARCHAR or CHAR.
The default value of the lead and lag functions can be a string when the column type is CHAR or VARCHAR.
2019-09-23 18:42:25 +08:00
4c7b52d077 Fix bug: Remove conjuncts for empty set node (#1840)
The function that assigns conjuncts is invoked before the aggregation plan node is created.
If the empty set node is a child of the aggregation node, the conjuncts are assigned to the empty set node, which cannot execute them correctly in the Backend.
This throws the exception "couldn't resolve slot descriptor" for queries that have both an empty set node and an aggregation node.
For example: select sum(pv) from test where type != 1 and 1=0 group by type;

This commit fixes the bug by removing the conjuncts from the empty set node.
2019-09-23 15:09:04 +08:00
93fe10a268 Reduce size of HyperLogLog struct (#1845)
The HyperLogLog struct is currently so large that rowsets become
too small when ingesting data. In this CL, registers in HyperLogLog are
only created when they are needed. When ingesting data, it is normal
for one HyperLogLog to hold only a few values.
2019-09-21 14:38:58 +08:00
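Creating registers only when needed can be sketched by keeping exact hashes for small sets and allocating the register array only after a threshold is crossed. The register count (2^14), the threshold of 160, and the class name LazyHll are assumptions for this sketch, not values taken from the Doris source; cardinality estimation is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical HLL with lazily allocated registers: small sets keep exact
// hashes, and the register array exists only after promotion.
class LazyHll {
public:
    static constexpr int kRegisters = 16384;       // 2^14 registers (assumed)
    static constexpr size_t kExplicitLimit = 160;  // promotion threshold (assumed)

    void update(uint64_t hash) {
        if (!_registers) {
            _explicit.insert(hash);
            if (_explicit.size() <= kExplicitLimit) return;
            // Promote: allocate zeroed registers and replay the stored hashes.
            _registers = std::make_unique<uint8_t[]>(kRegisters);
            for (uint64_t h : _explicit) set_register(h);
            _explicit.clear();
            return;
        }
        set_register(hash);
    }

    bool has_registers() const { return _registers != nullptr; }
    size_t explicit_size() const { return _explicit.size(); }

private:
    void set_register(uint64_t hash) {
        int idx = static_cast<int>(hash % kRegisters);  // low 14 bits pick the register
        uint64_t rest = hash >> 14;
        // Rank = 1-based position of the lowest set bit in the remaining 50 bits.
        uint8_t rank = 1;
        while (rank <= 50 && (rest & 1) == 0) {
            rest >>= 1;
            ++rank;
        }
        if (rank > _registers[idx]) _registers[idx] = rank;
    }

    std::set<uint64_t> _explicit;           // exact hashes while the set is small
    std::unique_ptr<uint8_t[]> _registers;  // null until promoted
};
```

With few values per HLL, which the commit describes as the normal ingestion case, each object stays at the size of a small set instead of a 16 KB register array.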
74d6d04e01 Fix two digit year bug in to_days function (#1839) 2019-09-20 22:59:05 +08:00
9036014954 Add schema change check for DUPLICATE KEY table (#1844) 2019-09-20 22:33:08 +08:00
cc36905aea Fix write file crash when using segment V2 in debug mode (#1841) 2019-09-20 20:37:29 +08:00
abd27dfcca Remove unused debug (#1836) 2019-09-20 09:31:56 +08:00
e8da855cd2 Support setting timezone for stream load and routine load (#1831) 2019-09-20 07:55:05 +08:00
7bf02d0ae7 Fix bug that routine load may mistakenly skip some data (#1832)
Reproduce:
1. Start a routine load, which sends a routine load task to BE.
2. BE executes the task successfully and commits to FE.
3. The commit request fails on FE because the database has been renamed (a "db not found" exception is thrown).
4. After the commit fails, BE sends a rollback request to FE.
5. FE receives this rollback request and mistakenly updates the routine load progress,
   because the number of loaded rows in the rollback request's attachment is larger than 0.
2019-09-20 07:54:11 +08:00
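The invariant behind this fix is that progress must advance only for committed transactions, regardless of the row count carried by a rollback. A hypothetical sketch of that rule; TxnOutcome, RoutineLoadProgressSketch, and the offset model are illustrative and do not mirror the FE code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical transaction outcome delivered to the job after a task ends.
enum class TxnOutcome { COMMITTED, ABORTED };

// Hypothetical job state tracking the next source offset to consume.
struct RoutineLoadProgressSketch {
    int64_t next_offset = 0;

    // A rollback may still carry loaded_rows > 0 (the BE did load the rows
    // before the commit failed), so the row count alone must not decide.
    void on_txn_finished(TxnOutcome outcome, int64_t loaded_rows, int64_t end_offset) {
        if (outcome != TxnOutcome::COMMITTED) return;  // data was not persisted
        if (loaded_rows > 0) next_offset = end_offset + 1;
    }
};
```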
720808fda5 Remove config::max_file_descriptor_number (#1833) 2019-09-20 07:50:57 +08:00