Commit Graph

116 Commits

SHA1 Message Date
3f22238012 Add check for to_bitmap function argument (#1747) 2019-09-05 18:11:38 +08:00
0dc0dadad1 Reduce unnecessary memory allocation and copy in OlapScanNode (#1742) 2019-09-04 21:05:12 +08:00
9f5e5717d4 Unify the msg of 'Memory exceed limit' (#1737)
The new message format for limit exceeded is: "Memory exceed limit. %msg, Backend:%ip, fragment:%id Used:% , Limit:%. xxx".
This commit unifies the 'Memory exceed limit' message across call sites such as check_query_state, RETURN_IF_LIMIT_EXCEEDED and LIMIT_EXCEEDED.
2019-09-03 10:42:16 +08:00
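For illustration, a single helper that renders the unified message might look like the following minimal sketch; the function name, parameters, and formatting call are assumptions, not the actual Doris code.

```
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative helper: format the "Memory exceed limit" message in one place so
// every call site (e.g. check_query_state, RETURN_IF_LIMIT_EXCEEDED) produces
// the same wording. Names and parameters are assumptions.
std::string format_mem_limit_exceeded(const std::string& msg,
                                      const std::string& backend_ip,
                                      const std::string& fragment_id,
                                      int64_t used_bytes,
                                      int64_t limit_bytes) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "Memory exceed limit. %s, Backend:%s, fragment:%s Used:%lld, Limit:%lld.",
                  msg.c_str(), backend_ip.c_str(), fragment_id.c_str(),
                  (long long)used_bytes, (long long)limit_bytes);
    return std::string(buf);
}
```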
8034d83e20 Add scroll keepalive and http timeout configuration (#1731) 2019-09-02 19:04:30 +08:00
81ca3e3abf Free olap scanner out of lock (#1733)
Close the scanner outside of OlapScanner's batch lock; closing it while
holding the lock makes all scanners wait for one scanner to finish.
2019-09-02 16:49:28 +08:00
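A simplified sketch of this pattern, with illustrative class and member names rather than the real OlapScanNode/OlapScanner code: finished scanners are moved out while holding the lock and closed only after it is released.

```
#include <memory>
#include <mutex>
#include <vector>

// Illustrative scanner type; the real OlapScanner does more work in close().
struct Scanner {
    void close() { /* release storage resources, possibly slow */ }
};

struct ScanNodeSketch {
    std::mutex batch_lock;                       // protects the shared batch state
    std::vector<std::unique_ptr<Scanner>> done;  // scanners that have finished

    void release_finished_scanners() {
        std::vector<std::unique_ptr<Scanner>> to_close;
        {
            std::lock_guard<std::mutex> l(batch_lock);
            to_close.swap(done);  // only move pointers while holding the lock
        }
        // Close outside the lock so other scanners are not blocked by a slow close().
        for (auto& s : to_close) {
            s->close();
        }
    }
};
```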
3a33f3d350 Make bitmap_union agg column support insert into and broker load (#1721) 2019-08-30 14:44:51 +08:00
c6dfe83b6d Add particular log info for Doris on ES (#1711) 2019-08-27 22:16:28 +08:00
dc2d49fe07 Make StringValue's memory layout the same as Slice's (#1712)
In our storage engine's code, we cast StringValue to Slice. Because
their memory layouts differ, this may cause the BE process to crash.

This patch makes their memory layouts the same to resolve the problem
temporarily. We should improve it some day.
2019-08-27 22:15:46 +08:00
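The idea can be illustrated schematically as below; the field names and the static_assert checks are illustrative, not the exact Doris definitions.

```
#include <cstddef>

// Schematic layouts only; the real StringValue/Slice have more members and methods.
struct Slice {
    char*  data;
    size_t size;
};

struct StringValue {
    char*  ptr;   // same offset and width as Slice::data
    size_t len;   // same offset and width as Slice::size
};

static_assert(sizeof(Slice) == sizeof(StringValue),
              "casting between the two is only safe if the layouts match");
static_assert(offsetof(Slice, data) == offsetof(StringValue, ptr), "pointer first");
static_assert(offsetof(Slice, size) == offsetof(StringValue, len), "length second");
```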
a1b92768dd Add loaded rows to the SHOW LOAD result (#1686)
Loaded rows is updated periodically by the query report, so the
user can see whether a load job is still running or blocked.
2019-08-27 14:13:47 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
4449316d85 Add error msg when memory limit exceeded (#1685) 2019-08-23 11:13:01 +08:00
0a27ef030b Reduce the number of partition info in BrokerScanNode param (#1675)
If the user has already set target partitions to load, we should put only those
partitions' info into the BrokerScanNode param instead of adding all partitions' info;
otherwise the RPC packet becomes too large.
2019-08-20 19:30:57 +08:00
cd2b8373c2 Fix Stream load double NumberTotalRows (#1664) 2019-08-19 12:23:43 +08:00
ba6d728f26 Enable parsing columns from file path for Broker Load (#1582) (#1635)
Currently, we do not support parsing encoded/compressed columns from the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.

This patch enables parsing columns from the file path, as in Spark's Partition Discovery.

This patch parses the partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so the broker reader on BE can read the value of the specified partition column.
2019-08-19 09:39:21 +08:00
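A rough sketch of extracting a `k1=1` style value from a file path, as a generic illustration only (the real parsing lives in BrokerScanNode.java and the value is carried in TBrokerRangeDesc):

```
#include <optional>
#include <string>

// Extract the value of a partition column encoded in the path as ".../<col>=<value>/...".
// Generic illustration only, not the Doris implementation.
std::optional<std::string> column_from_path(const std::string& path,
                                            const std::string& column) {
    const std::string key = "/" + column + "=";
    const size_t start = path.find(key);
    if (start == std::string::npos) {
        return std::nullopt;  // column not encoded in this path
    }
    const size_t value_begin = start + key.size();
    const size_t value_end = path.find('/', value_begin);
    return path.substr(value_begin, value_end == std::string::npos
                                        ? std::string::npos
                                        : value_end - value_begin);
}

// column_from_path("/path/to/dir/k1=1/xxx.csv", "k1") -> "1"
```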
6d73658207 Support checking error data row when doing INSERT (#1597)
If strict mode is true and at least one row is filtered, the insert operation will fail and a URL will be given for retrieving the error rows.

```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```

If all rows are good, insert will return OK with affected rows:

```
Query OK, 1 row affected (0.26 sec)
```

If strict mode is false and at least one row is good, the insert operation will return OK with affected rows and warnings. If there are error rows, a label will also be returned:

```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
2019-08-16 21:40:29 +08:00
57a1a718c7 Print logs when parsing the scroll result fails (#1661) 2019-08-16 17:48:23 +08:00
85e89b79d5 Print src tuple in error_sample file (#1641)
The src tuple could not be printed to the error_sample file when the value was filtered by strict mode.
This commit fixes the issue.
2019-08-14 19:58:09 +08:00
69af50aa8c Time zone related BE function (#1598)
Details can be found in time-zone.md document
2019-08-12 20:57:59 +08:00
e3348c46a9 Expose data pruned-filter-scan ability (#1527) 2019-08-11 12:59:24 +08:00
a6d3099a68 Fix bug: localtime is not thread-safe; changed to localtime_r. (#1614) 2019-08-08 22:00:43 +08:00
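For reference, a minimal usage sketch of the thread-safe variant, which writes into a caller-provided buffer instead of shared static storage (not the patched Doris code itself):

```
#include <ctime>

// localtime() writes into shared static storage and is not thread-safe;
// localtime_r() fills a caller-provided struct tm instead (POSIX).
bool to_local_tm(time_t ts, struct tm* out) {
    return localtime_r(&ts, out) != nullptr;  // safe to call from multiple threads
}
```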
f4ad2381e6 Fix error DCHECK for partition_columns (#1606) 2019-08-08 16:29:08 +08:00
dc4a5e6c10 Support Decimal Type when load Parquet File (#1595) 2019-08-07 19:52:23 +08:00
9402456f5b Fix parquet directory have empty file (#1593) 2019-08-07 15:08:22 +08:00
93a3577baa Support multi partition column when creating table (#1574)
When creating a table with the OLAP engine, the user can specify multiple partition columns,
e.g.:

PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
)

Note that loading via a Hadoop cluster does not support tables with multiple partition columns.
2019-08-05 16:16:43 +08:00
9128af6499 Broker load hang when rpc failed (#1567)
Broker load hangs on the broker reader when the thrift request between the broker and BE fails.
2019-07-31 19:03:38 +08:00
c5edf9dae0 Unify Field and ColumnSchema in Storage (#1561)
Currently, we have Field and ColumnSchema to access column data in a
row. These two classes are mostly the same, so we should unify them into
one class. Field currently has offset information, which is a row attribute,
so we remove the offset from Field.

RowCursor has some logic that belongs to Schema, so in this patch I
add a Schema attribute to RowCursor to keep RowCursor simple. After this
change, only Schema will handle Field/ColumnSchema.

I extract some logic from RowCursor into be/src/olap/row.h, so the
same logic can handle different types of rows. Each row type has the
same function to get a Cell of the row. A Cell represents a column's
content together with a null indicator.
2019-07-30 14:01:57 +08:00
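A rough sketch of the Cell idea described above; the names and members are illustrative, not the actual be/src/olap/row.h code.

```
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: a Cell couples a column's content with its null indicator.
struct Cell {
    bool        is_null;
    const void* data;   // points into the row's memory for this column
};

// Any row type that can produce a Cell per column id can be handled by the
// same generic routines (compare, copy, hash, ...).
struct SimpleRow {
    std::vector<uint8_t>     null_bits;
    std::vector<const void*> column_ptrs;

    Cell cell(size_t column_id) const {
        return Cell{null_bits[column_id] != 0, column_ptrs[column_id]};
    }
};

// A generic helper templated on the row type, in the spirit of row.h.
template <typename RowType>
bool is_null(const RowType& row, size_t column_id) {
    return row.cell(column_id).is_null;
}
```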
97718a35a2 Do not get file size in Broker openReader() method (#1560)
The file size is already obtained when listing files.
Getting the file size again in openReader() is unnecessary and inefficient.
2019-07-29 23:05:01 +08:00
0694b6a6fa Fix bugs of Broker load (#1546)
Use the same UUID as the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID and, as a plan, also a query ID.
Using the same UUID for both makes it easier to trace the load process.

Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When the user cancels a broker load, the running loading task should also be cancelled, or
it may occupy the worker thread for a long time.

Remove the unnecessary query report when doing load execution plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPC of the tablet sink. The default is 600 seconds, which is long enough to flush
about 6GB of data. The long timeout reduces the possibility of encountering a 'fail to send batch' error when loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs for tracing a broker load process easily.
2019-07-27 20:17:05 +08:00
a88b55e649 Add more logs and metrics to trace the broker load process (#1530)
The operator wants to know when the job is scheduled as PENDING
and LOADING, and how long it takes to finish these sub-states.

Also add 2 metrics on BE to monitor the memtable flush time:
`memtable_flush_total` and `memtable_flush_duration_us`.
2019-07-23 21:42:44 +08:00
69040572fb Use different ID instead of table ID for base index of an OLAP table (#1524) 2019-07-23 15:48:45 +08:00
6c1f95c3a0 Fix bug that BE may crash when closing OlapTableSink (#1507)
The `_profile` in OlapTableSink may not be initialized if the `prepare()`
method has not been called. So when closing the OlapTableSink, we should
check whether `_profile` is initialized.
2019-07-19 10:30:44 +08:00
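The guard described above amounts to a null check before touching `_profile` in close(); a minimal sketch under that assumption (not the actual OlapTableSink code):

```
#include <memory>

// Minimal sketch: close() must tolerate the case where prepare() never ran
// and _profile was never created.
class SinkSketch {
public:
    void prepare() { _profile = std::make_unique<int>(0); }  // stand-in for the profile object
    void close() {
        if (_profile) {  // guard: prepare() may not have been called
            // ... flush counters into *_profile ...
        }
    }
private:
    std::unique_ptr<int> _profile;  // stays null until prepare() runs
};
```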
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load now uses the stream load framework, but we should keep the
mini load return behavior and result JSON format the same as before.
So the PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters to the OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
2019-07-16 19:15:41 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all Backends' data,
and it will take a very long time to restart a BE.
So to avoid disrupting your production environment,
you should upgrade the Backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use a unique ID to identify a rowset.
   Naming a rowset with tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
ae6f2d99c5 Fix bug when use SELECT * FROM TABLE LIMIT 1 (#1469) 2019-07-13 23:57:14 +08:00
aff1559c4d Fix bug: if the Doris table has fewer columns than the Parquet file, BE will crash (#1464) 2019-07-12 15:23:13 +08:00
b9c79d4b1b Fix importing a non-Parquet format file causing BE crash (#1454) 2019-07-11 16:04:36 +08:00
51c92a0bec Validate the UTF-8 encode of loading data (#1457)
Currently, Doris only supports UTF-8 encoded data. All data is
shown to the user in UTF-8 format. So if the data loaded into Doris is not
UTF-8 encoded, the user will see garbled data when querying.

I introduce a fast UTF-8 validator from

    https://github.com/lemire/fastvalidate-utf-8

This validator is highly optimized: it takes only 0.7 CPU cycles
to validate a 64k string. And in a test loading 1GB of data into Doris, the
validator had no impact on performance.
2019-07-11 09:46:38 +08:00
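The imported validator is SIMD-based; for illustration only, a plain scalar check of the same UTF-8 rules might look like this (not the lemire code or the Doris integration):

```
#include <cstddef>
#include <cstdint>

// Scalar illustration of UTF-8 validation: rejects bad leading bytes, truncated
// sequences, overlong encodings, surrogates, and out-of-range code points.
bool is_valid_utf8(const uint8_t* s, size_t len) {
    size_t i = 0;
    while (i < len) {
        const uint8_t b = s[i];
        size_t n = 0;    // number of continuation bytes expected
        uint32_t cp = 0; // decoded code point
        if (b < 0x80)               { i += 1; continue; }      // ASCII
        else if ((b >> 5) == 0x6)   { n = 1; cp = b & 0x1F; }  // 110xxxxx
        else if ((b >> 4) == 0xE)   { n = 2; cp = b & 0x0F; }  // 1110xxxx
        else if ((b >> 3) == 0x1E)  { n = 3; cp = b & 0x07; }  // 11110xxx
        else return false;                                     // invalid leading byte
        if (i + n >= len) return false;                        // truncated sequence
        for (size_t k = 1; k <= n; ++k) {
            if ((s[i + k] >> 6) != 0x2) return false;          // not a continuation byte
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000)) return false;            // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += n + 1;
    }
    return true;
}
```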
615c979727 Fix bug that BE crashes when inserting null value to non-nullable columns (#1447) 2019-07-10 09:20:09 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
1422414e43 Add varchar column name to stream load error msg (#1366) 2019-06-24 14:52:59 +08:00
687d57be66 Fix bug that query statistics in audit log are wrong (#1354) 2019-06-21 19:16:05 +08:00
ba44249f80 Remove unused code (#1320) 2019-06-15 20:41:48 +08:00
30028bc35b Deny specifying partitions for an unpartitioned table (#1319) 2019-06-15 18:19:56 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
53062122ea Change strategy of incorrect data (#1255)
This change adds a load property named strict_mode which is used to reject incorrect data.
When it is set to false, incorrect data is loaded as NULL, just like before.
When it is set to true, incorrect data that belongs to a column without an expr is filtered.
strict_mode is supported in broker load v2 now; it will be supported in stream load later.
2019-06-10 20:39:45 +08:00
e4e04e8203 Make LZO support optional (#1263) 2019-06-07 22:26:54 +08:00
934ca2481a Make MySQL support optional (#1248) 2019-06-05 12:28:15 +08:00
9f5f44ec48 Reduce memory RowBlock needed (#1238)
Before, RowBlock reserved memory for all columns in the schema, even those
not queried, which caused bad performance when querying a wide
table.

In this patch, RowBlock reserves memory only for the needed columns. In one
case, this reduced ConvertBatchTime from 10s to 60ms when querying a wide
table with 178 columns.

 #1236
2019-06-04 12:58:41 +08:00
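A simplified sketch of the idea, with illustrative types only (not the actual RowBlock): buffers are reserved just for the column ids the query needs.

```
#include <cstddef>
#include <map>
#include <vector>

// Illustrative only: reserve per-column buffers just for the columns a query needs,
// instead of for every column in a possibly very wide schema.
struct NarrowBlockSketch {
    std::map<size_t, std::vector<char>> buffers;  // column id -> column buffer

    void reserve(const std::vector<size_t>& needed_column_ids,
                 const std::vector<size_t>& column_byte_widths,
                 size_t rows) {
        for (size_t cid : needed_column_ids) {
            buffers[cid].resize(column_byte_widths[cid] * rows);
        }
    }
};
```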
85b4619d54 Change insert into to streaming (#1191)
INSERT INTO without the streaming hint now uses the streaming plan, which is the same as the plan of a streaming insert.
It also records the load info and returns the label of the insert stmt.
Partitions are supported in the INSERT INTO stmt; rows that match the target partitions will be loaded.
The introduction and examples have been updated, especially for non-streaming insert.
Also, a partition_names param is added to the SQL syntax to declare the target partitions in the target table.

Change META_VERSION to 50
2019-05-23 20:53:30 +08:00
02f36c23ed Set tablet as bad when loading index failed (#1146)
The bad tablet will be reported to FE and handled there.

Also add a config auto_recover_index_loading_failure to control index loading failure processing.
2019-05-13 10:22:04 +08:00