Commit Graph

354 Commits

SHA1 Message Date
2cb82c57bb Fix bug that the <=> and IN operators get wrong results (#1516)
* Fix bug that the <=> and IN operators get wrong results

* Add some comments to get_result_for_null

* Add a new binary operator to replace is_safe_for_null for handling the '<=>' operator

* Add EQ_FOR_NULL to TExprOpcode

* Remove the trailing backslash from the macro definition
2019-07-30 11:17:53 +08:00
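For reference, the semantics that a dedicated EQ_FOR_NULL operator gives `<=>` can be contrasted with ordinary `=` as below. This is a minimal C++17 sketch, not the BE code: it models a nullable value as `std::optional<T>`, and the function names `eq` and `eq_for_null` are hypothetical. Ordinary `=` yields NULL when either side is NULL, while `<=>` always yields true or false and treats two NULLs as equal.

```cpp
#include <optional>

// Hypothetical sketch: a nullable SQL value is modeled as std::optional<T>.
// The real BE evaluates expressions on its own value types, not std::optional.

// Ordinary '=': if either operand is NULL the result is NULL (unknown).
template <typename T>
std::optional<bool> eq(const std::optional<T>& a, const std::optional<T>& b) {
    if (!a || !b) return std::nullopt;
    return *a == *b;
}

// Null-safe '<=>': never NULL. Two NULLs are equal; NULL vs non-NULL is false.
template <typename T>
bool eq_for_null(const std::optional<T>& a, const std::optional<T>& b) {
    if (!a && !b) return true;
    if (!a || !b) return false;
    return *a == *b;
}
```

Because `eq_for_null` always returns a plain `bool`, a predicate built on it can never be "unknown", which is why it needs handling separate from the NULL-propagating comparison path.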
97718a35a2 Do not get file size in Broker openReader() method (#1560)
The file size is already obtained when listing files.
Getting the file size again in openReader() is unnecessary and inefficient.
2019-07-29 23:05:01 +08:00
0694b6a6fa Fix bugs of Broker load (#1546)
Use the same UUID as the query ID and load ID of a load execution plan.
Each load execution plan has a load ID and, being a plan, also a query ID.
We can use the same UUID for both, which makes the load process easier to trace.

Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled; otherwise
it may occupy the worker thread for a long time.

Remove the unnecessary query report when doing load execution plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPCs of the tablet sink. The default is 600 seconds, which is long enough for flushing
about 6GB of data. A longer timeout reduces the chance of hitting the "fail to send batch" error when loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs to make tracing a broker load process easier.
2019-07-27 20:17:05 +08:00
6c8d34fa70 Fix bug which makes BE crash when loading HLL type (#1552) 2019-07-26 11:22:08 +08:00
e8561d71a6 Add dict page (#1409)
Add a dict encoding page for binary/string type data.
Construct a dict for the original data, and save the encoded IDs instead of
the original data to save space. If the dict becomes too big, the page automatically falls
back to plain encoding.
2019-07-26 09:47:11 +08:00
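The dict-page idea in #1409 can be sketched in a few lines. This is a minimal C++ sketch under stated assumptions: "too big" is taken to mean the dictionary's total byte size exceeds a fixed budget, and the class name `DictPageBuilderSketch` and its members are hypothetical; the real page format is not modeled. Each distinct value gets a `uint32_t` ID, and once adding a new word would exceed the budget the builder switches to plain encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary page builder with a plain-encoding fallback.
class DictPageBuilderSketch {
public:
    explicit DictPageBuilderSketch(size_t max_dict_bytes) : _max_dict_bytes(max_dict_bytes) {}

    void add(const std::string& value) {
        _plain.push_back(value);  // keep raw values so a fallback loses nothing
        if (_fallback) return;
        auto it = _dict.find(value);
        if (it == _dict.end()) {
            if (_dict_bytes + value.size() > _max_dict_bytes) {
                _fallback = true;  // dict too big: switch to plain encoding
                return;
            }
            uint32_t id = static_cast<uint32_t>(_dict_words.size());
            it = _dict.emplace(value, id).first;
            _dict_words.push_back(value);
            _dict_bytes += value.size();
        }
        _codes.push_back(it->second);  // store the encoded ID, not the string
    }

    bool is_plain() const { return _fallback; }
    const std::vector<uint32_t>& codes() const { return _codes; }
    const std::vector<std::string>& dict() const { return _dict_words; }
    const std::vector<std::string>& plain_values() const { return _plain; }

private:
    size_t _max_dict_bytes;
    size_t _dict_bytes = 0;
    bool _fallback = false;
    std::unordered_map<std::string, uint32_t> _dict;  // value -> ID
    std::vector<std::string> _dict_words;             // ID -> value
    std::vector<uint32_t> _codes;
    std::vector<std::string> _plain;
};
```

Buffering the raw values alongside the codes makes the fallback trivial in this sketch; a real builder would more likely re-encode the buffered values only at the moment it falls back.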
dbc912d2df Unify ColumnSchemaV2 and ColumnSchema to one (#1545)
Currently, we have two versions of ColumnSchema; this patch unifies
these two classes into one class.
2019-07-25 10:48:16 +08:00
8160232097 Fix missing delete predicate when cloning (#1541)
Related issue: #1539
2019-07-25 09:16:44 +08:00
0805b05d81 Remove unused FieldInfo (#1540) 2019-07-24 19:33:30 +08:00
fde3941185 Remove unused code (#1537) 2019-07-24 14:48:01 +08:00
a6f0b5c789 Change RowsetWriter num_rows() to return int64_t (#1535) 2019-07-24 10:44:38 +08:00
9e2b93a8e2 Fix rowset build validate failure (#1532)
Validation fails because the cloned files' names may conflict. Loading
a segment reads the file through the file handle cache, whose key is the
file name, so the index may be read from the wrong file. The solution is
to load the index without using the file handle cache.
2019-07-24 10:08:39 +08:00
68782be7a6 Refactor storage aggregate framework (#1529)
Add AggregateInfo to encapsulate all functions used to aggregate value
columns.
2019-07-24 10:02:35 +08:00
a88b55e649 Add more logs and metrics to trace the broker load process (#1530)
The operator wants to know when the job is scheduled into the PENDING
and LOADING states, and how long it takes to finish each of these sub-states.

Also add 2 metrics on BE to monitor memtable flush time:
`memtable_flush_total` and `memtable_flush_duration_us`.
2019-07-23 21:42:44 +08:00
69040572fb Use different ID instead of table ID for base index of an OLAP table (#1524) 2019-07-23 15:48:45 +08:00
c34b35e6c4 Add ALTER_TABLET task in BE (#1497)
This is for the new implementation of the alter table process.
2019-07-23 15:16:21 +08:00
4aedaea84e Support TIME type and timediff function (#1505) 2019-07-23 13:42:39 +08:00
0c8e91adf4 Add storage rowwise iterator (#1515)
Use RowwiseIterator to unify all data fetching in the storage engine.
All objects in the storage engine can be read through an iterator,
for example Segment and Rowset.

This patch implements two generic iterators: UnionRowwiseIterator and
MergeRowwiseIterator. Both classes take other iterators as their inputs.

To implement the iterators, we define a new class, RowBlockV2; all data read
from an iterator is in this format. We define a new class instead of reusing
the old RowBlock because we want to keep the old code working
normally.
2019-07-22 14:35:11 +08:00
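The difference between the two generic iterators in #1515 can be sketched as follows. This hedged C++17 sketch stands in sorted `std::vector<int>` keys for the rows produced by a Segment or Rowset; `union_rows` and `merge_rows` are hypothetical names, not the UnionRowwiseIterator/MergeRowwiseIterator API. A union drains each input in turn, while a merge assumes each input is already sorted and performs a k-way merge with a min-heap so the combined output stays sorted.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one input iterator: a sorted list of row keys.
using Input = std::vector<int>;

// Union: emit every input one after another; no ordering across inputs.
std::vector<int> union_rows(const std::vector<Input>& inputs) {
    std::vector<int> out;
    for (const auto& in : inputs) out.insert(out.end(), in.begin(), in.end());
    return out;
}

// Merge: k-way merge of individually sorted inputs using a min-heap,
// so the output is globally sorted.
std::vector<int> merge_rows(const std::vector<Input>& inputs) {
    // Heap entry: (current key, input index, position within that input).
    using Entry = std::tuple<int, size_t, size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (size_t i = 0; i < inputs.size(); ++i) {
        if (!inputs[i].empty()) heap.emplace(inputs[i][0], i, 0);
    }
    std::vector<int> out;
    while (!heap.empty()) {
        auto [key, src, pos] = heap.top();  // copy the smallest head
        heap.pop();
        out.push_back(key);
        // Advance the input that supplied this key, if it has more rows.
        if (pos + 1 < inputs[src].size()) heap.emplace(inputs[src][pos + 1], src, pos + 1);
    }
    return out;
}
```

The heap holds at most one entry per input, so each output row costs O(log k) for k inputs, which is why a merge is only worth paying for when ordered output is actually required.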
7b019ab37f Fix bug that WrapperField does not consider HLL column type when creating (#1514)
This bug may cause BE crash when handling HLL column in some process.
This bug was introduced by a code merge. Version 0.10 does not have this bug.
2019-07-19 18:19:23 +08:00
74eb43206d Fix segment group add zone check bug and remove unused meta log (#1513) 2019-07-19 17:03:19 +08:00
227af49331 Fix rollup bug when init RowCursor in MergeContext (#1510)
When doing a rollup, seek_columns equals the complete set of the tablet's columns,
so there is no need to set it.
Related to commit 36df6ebe4e5f0abd3f07c1e454710590f1de23c7
2019-07-19 14:32:58 +08:00
6c1f95c3a0 Fix bug that BE may crash when closing OlapTableSink (#1507)
The `_profile` in OlapTableSink may not be initialized if the `prepare()`
method is not called. So when closing the OlapTableSink, we should
check whether `_profile` is initialized.
2019-07-19 10:30:44 +08:00
556299aae9 Remove query status report from BE when query is cancelled normally (#1489)
When the query result reaches the limit, the Coordinator in FE sends a cancel
request to BE to cancel the query. When cancelled, BE reports the
query status to FE for debugging purposes. But this is actually unnecessary
and generates too many logs.

So I add a CancelReason to distinguish 'normal'
cancellation from 'internal error' cancellation. If the query is cancelled 'normally',
BE reports no status.

When a query reaches its limit, or a user actively cancels it, it is cancelled 'normally'.
Otherwise, the query is cancelled due to an internal error, which requires
a report from BE.
2019-07-19 09:36:01 +08:00
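The decision described in #1489 reduces to a small mapping from cancel reason to "report or not". This is a hedged C++ sketch: the enumerator names and the function `should_report_status` are hypothetical and do not reflect the actual Doris types; it only encodes the rule stated above.

```cpp
// Hypothetical cancel reasons; names are illustrative only.
enum class CancelReason {
    kLimitReached,   // the query result reached its LIMIT
    kUserCancel,     // the user actively cancelled the query
    kInternalError,  // any other cause, e.g. a fragment failed
};

// Only an internal-error cancellation needs a status report from BE to FE.
bool should_report_status(CancelReason reason) {
    switch (reason) {
        case CancelReason::kLimitReached:
        case CancelReason::kUserCancel:
            return false;  // 'normal' cancellation: skip the report and the logs
        case CancelReason::kInternalError:
            return true;
    }
    return true;  // not reachable for valid enumerators
}
```

Using an explicit `switch` without a `default` lets the compiler warn if a new reason is added later without deciding whether it should be reported.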
36df6ebe4e Fix rollup bug when init RowCursor (#1502)
When doing a rollup, seek_columns equals the complete set of the tablet's columns,
so there is no need to set it.
2019-07-18 18:06:17 +08:00
41499061ac Refactor types.h to reduce code and add UT (#1498) 2019-07-18 12:24:41 +08:00
24592e1124 Add log to trace writer validate failure (#1496)
AlphaRowsetWriter fails to validate the rowset when building it
because the rowset's num_rows is not equal to the segment groups'
num_rows when the add_rowset API is called. So add some logs to
trace the process and debug the problem. These logs will be
removed in the future.
2019-07-18 11:24:41 +08:00
755b12cd75 Add partition id to tablet meta in BE (#1490)
FE uses partition_id to publish a version, and BE should check whether all tablets related to this partition have that version. But a Tablet in BE does not have the partition ID in its metadata, so BE cannot perform this check.

This patch adds the partition ID to the tablet meta during the report task.
At most 10k tablets are synced when setting tablet meta.
2019-07-17 14:07:55 +08:00
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load now uses the stream load framework, but we should keep
mini load's return behavior and result JSON format the same as before.
So a PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters to the OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
2019-07-16 19:15:41 +08:00
a9e8113b82 Fix heap-buffer-overflow in split_part() function in StringFunctions (#1482) 2019-07-15 23:00:37 +08:00
6c246418fb Add timeout in stream load planner (#1480)
The mini load timeout needs to be added to the plan options;
otherwise, the mini load timeout has no effect.
The timeout property has been added to the request of process put.

Add logs of the label, txn ID, and query ID in mini load.
2019-07-15 22:14:59 +08:00
d61a2daeea Remove unused code (#1483) 2019-07-15 21:59:06 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies the data of all Backends,
which makes restarting BE take a very long time.
So to avoid disrupting your production environment,
you should upgrade the Backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use a unique ID to identify a rowset.
   Naming rowsets by tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
ae6f2d99c5 Fix bug when using SELECT * FROM TABLE LIMIT 1 (#1469) 2019-07-13 23:57:14 +08:00
aff1559c4d FixBug: if a Doris table has fewer columns than the Parquet file, BE will crash (#1464) 2019-07-12 15:23:13 +08:00
734032d917 Fix the wrong unit of create timestamp in mini load (#1460)
The old create timestamp was in microseconds, while the create timestamp in FE is in milliseconds.
2019-07-11 19:29:18 +08:00
a7390c03f4 Add percentile_approx aggregate function (#1432) 2019-07-11 16:44:43 +08:00
b9c79d4b1b Fix importing non-Parquet format file causing BE crash (#1454) 2019-07-11 16:04:36 +08:00
941dec215b Add utc_timestamp function (#1456) 2019-07-11 11:09:08 +08:00
51c92a0bec Validate the UTF-8 encode of loading data (#1457)
Currently, Doris only supports UTF-8 encoded data, and all data is
shown to the user in UTF-8. So if data loaded into Doris is not
UTF-8 encoded, the user will see garbled data when querying.

I introduce a fast UTF-8 validator from

    https://github.com/lemire/fastvalidate-utf-8

This validator is so highly optimized that it takes only about 0.7 CPU cycles
per byte to validate a 64k string. In a test loading 1GB of data into Doris, the
validator had no impact on performance.
2019-07-11 09:46:38 +08:00
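The checks a UTF-8 validator performs can be illustrated with a plain scalar version. This C++ sketch is not the SIMD validator from lemire/fastvalidate-utf-8 that #1457 adopts, and `validate_utf8` is a hypothetical name. It follows the Unicode Standard's table of well-formed byte sequences, rejecting overlong encodings, surrogates (U+D800..U+DFFF), code points above U+10FFFF, and truncated sequences.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar UTF-8 validator following the well-formed byte-sequence table:
// the lead byte fixes the sequence length and the allowed range of byte 2;
// bytes 3 and 4 (if any) must be plain continuation bytes 0x80..0xBF.
bool validate_utf8(const uint8_t* s, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t c = s[i];
        if (c < 0x80) { ++i; continue; }  // ASCII fast path
        size_t n;                         // number of continuation bytes
        uint8_t lo = 0x80, hi = 0xBF;     // allowed range of the 2nd byte
        if (c >= 0xC2 && c <= 0xDF) { n = 1; }
        else if (c == 0xE0) { n = 2; lo = 0xA0; }  // reject overlong 3-byte
        else if (c >= 0xE1 && c <= 0xEC) { n = 2; }
        else if (c == 0xED) { n = 2; hi = 0x9F; }  // reject surrogates
        else if (c >= 0xEE && c <= 0xEF) { n = 2; }
        else if (c == 0xF0) { n = 3; lo = 0x90; }  // reject overlong 4-byte
        else if (c >= 0xF1 && c <= 0xF3) { n = 3; }
        else if (c == 0xF4) { n = 3; hi = 0x8F; }  // reject > U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if (len - i <= n) return false;  // truncated sequence
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (size_t k = 2; k <= n; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
        }
        i += n + 1;
    }
    return true;
}
```

A SIMD validator enforces the same rules but checks many bytes per instruction, which is how it reaches well under one cycle per byte.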
98bd4b4565 Add string function split_part (#1451) 2019-07-10 09:47:33 +08:00
615c979727 Fix bug that BE crashes when inserting null value to non-nullable columns (#1447) 2019-07-10 09:20:09 +08:00
67b370a1ed Add ColumnBlock (#1450)
Use ColumnBlock to read data from Page.
2019-07-09 21:52:27 +08:00
ded60e59f9 Add a configuration to modify the reserve time of load error log (#1433)
Currently, the load error log on BE is cleaned along with the
intermediate data of load, as configured by 'load_data_reserve_hours'.
Sometimes users want to keep the error log for a longer time.
2019-07-09 10:36:13 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
b0af97d8aa Change error msg of mini load when PUBLISH_TIMEOUT (#1415) 2019-07-01 16:05:49 +08:00
8a10bf0f89 Fix binary plain page relocate bug (#1410) 2019-06-29 11:19:43 +08:00
1ff1722d93 Fix the core dump in dpp sink caused by sum of int128 (#1412) 2019-06-28 23:30:33 +08:00
5c1b4f641e Add report version for publish task (#1401) 2019-06-28 20:15:08 +08:00
4747bed306 Add rle page (#1379) 2019-06-27 22:25:19 +08:00
b17d1c5348 Fix a bug of v2 ColumnReader when reading not-null column (#1398) 2019-06-26 22:58:30 +08:00
e046f7b05a Add plain page (#1341) 2019-06-26 00:50:50 +08:00