Commit Graph

1973 Commits

Author SHA1 Message Date
86d235a76a [Extension] Logstash Doris output plugin (#3800)
This plugin is used to output data to Doris for logstash
Use the HTTP protocol to interact with the Doris FE Http interface
Load data through Doris's stream load
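A minimal sketch of how such a stream load request might be assembled, written in Java rather than as plugin code. The class name `StreamLoadRequest`, the host, port, database, table and label values are all assumptions for illustration; the sketch only builds the URL and headers and sends nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the plugin): collects the URL and headers
// of a stream load request without performing any network I/O.
final class StreamLoadRequest {
    final String url;
    final Map<String, String> headers = new LinkedHashMap<>();

    StreamLoadRequest(String feHost, int feHttpPort, String db, String table,
                      String user, String password, String label) {
        // Stream load endpoint exposed by the FE HTTP interface.
        this.url = "http://" + feHost + ":" + feHttpPort
                + "/api/" + db + "/" + table + "/_stream_load";
        // HTTP basic authentication with the Doris user and password.
        String credentials = user + ":" + password;
        headers.put("Authorization", "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8)));
        headers.put("Expect", "100-continue");
        // The label identifies this load batch.
        headers.put("label", label);
    }

    public static void main(String[] args) {
        // All values below are placeholders chosen for the example.
        StreamLoadRequest req = new StreamLoadRequest(
                "127.0.0.1", 8030, "example_db", "example_tbl", "root", "", "logstash_batch_1");
        System.out.println("PUT " + req.url);
        req.headers.forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```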
2020-06-11 08:54:51 +08:00
8caedadb67 use scoped_refptr to new HashIndex (#3818) 2020-06-10 23:47:10 +08:00
cd402a6827 [Restore] Fix error message not match of restore job when job is time out (#3798)
In the current code, a restore job that times out is reported as canceled by the user. This error message is very misleading.
2020-06-10 23:12:04 +08:00
ef94c25773 [Bug]fix the crash of checksum task #3735 (#3738)
1. The table includes key columns of double/float type.
2. The checksum task uses all key columns for comparison.
3. schema.column(idx) is NULL for columns of double/float type.

#3735
2020-06-10 22:59:15 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00
de91037d8c [Doc]Add some routine load docs (#3796)
Add some documentation about using routine load in the cloud environment
2020-06-10 22:57:00 +08:00
4cb5f7a535 [Config]Remove max_user_connections from config (#3805)
Update max_user_connections by user property:

```
set property `user` max_user_connections=100;
```
2020-06-10 22:56:05 +08:00
8c608bbad5 [Doris On ES] Skip function_call expr when process predicate (#3813)
Fixed #3801
Do not push down function_call expressions such as split_xxx when processing predicates; Doris BE is responsible for evaluating these predicates.

All rows in table:

```
+------+------+------+------------+------------+
| k1   | k2   | k3   | UpdateTime | ArriveTime |
+------+------+------+------------+------------+
| NULL | NULL | kkk1 |  123456789 |       NULL |
| kkk1 | NULL | NULL |  123456789 |       NULL |
| NULL | kkk2 | NULL |  123456789 |       NULL |
+------+------+------+------------+------------+
```

The following predicates cannot be pushed down to ES:

```
SQL 1:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk is not null;
+------+
| kk   |
+------+
| kkk  |
+------+
1 row in set (0.02 sec)

SQL 2:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > 'a';
+------+
| kk   |
+------+
| kkk  |
+------+

SQL 3:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > '2';
+------+
| kk   |
+------+
| kkk  |
+------+
1 row in set (0.03 sec)
```
2020-06-10 11:22:53 +08:00
4fa9d8cbe9 [Spark load][Fe 3/5] Fe create job (#3715)
* Add create spark load job

* Remove unused import
2020-06-09 21:57:46 +08:00
5b1589498a [Bug] Fix SchemaChangeJobV2's meta persist bug (#3804)
1. Missing field `partitionIndexMap` in SchemaChangeJobV2
2. Pair in field `indexSchemaVersionAndHashMap` can not be persisted by GSON
3. Exit the FE process when replaying the edit log fails.

Fix: #3802
2020-06-09 21:55:46 +08:00
acd7a58875 [Doris On ES] [1/3] Add ES QueryBuilders for debug mode (#3774) 2020-06-09 16:45:16 +08:00
8ada2559b7 [Bug] Fix bug that checkpoint thread failed to start (#3795)
1. Set thread id before starting the checkpoint thread
2. Init the CHECKPOINT catalog instance before visiting it.
2020-06-08 23:00:36 +08:00
559714f3d4 Fix largeint max min bug (#3793) 2020-06-08 21:01:30 +08:00
e4dc2ec440 [StorageEngine] Make StorageEngine::open return more detailed info (#3761)
StorageEngine::open returns only a very vague status when it fails, so
we have to check the logs to find the root cause, which is not
convenient when unit tests run in CI Docker containers.
It would be better to return more detailed failure info that points out
the root cause; for example, it may return an error status with the message
"file descriptors limit is too small".
2020-06-07 10:21:33 +08:00
928379c5d8 [Bug] Fix colocate group replay NPE (#3790)
Group id should also be persisted for replaying
2020-06-07 10:20:22 +08:00
ea5b3b2d4c [Bug] Fix bug that should not use "!=" to judge the equivalence of Type (#3786)
org.apache.doris.catalog.Type is not an enum, so the equivalence of two
Type values should not be judged with "==" or "!=".
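A minimal sketch of why reference comparison is unsafe for a non-enum class. `ColumnType` is a hypothetical stand-in, not the real org.apache.doris.catalog.Type; it only assumes the type overrides equals().

```java
import java.util.Objects;

// Hypothetical stand-in for a non-enum type class.
final class ColumnType {
    private final String name;
    private final int precision;

    ColumnType(String name, int precision) {
        this.name = name;
        this.precision = precision;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ColumnType)) return false;
        ColumnType other = (ColumnType) o;
        return precision == other.precision && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, precision);
    }

    public static void main(String[] args) {
        ColumnType a = new ColumnType("DECIMAL", 9);
        ColumnType b = new ColumnType("DECIMAL", 9);
        // Two distinct objects: reference comparison says "different".
        System.out.println(a == b);      // false
        // Value comparison says "same logical type".
        System.out.println(a.equals(b)); // true
    }
}
```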
2020-06-06 11:38:32 +08:00
a7bf006b51 Use BackendStatus to show BE's infomation in show backends; (#3713)
The information is displayed in JSON format. For example:
{"lastTabletReportTime":"2020-05-28 15:29:01"}
2020-06-06 11:37:48 +08:00
3b6a781862 [Bug] Fix a bug that tablet's _preferred_rowset_type may be modified to BETA_ROWSET after cloned (#3750)
TabletMeta's _preferred_rowset_type is not initialized when the object is constructed and
may hold a random value. The field is not updated when an ALPHA_ROWSET tablet is created,
and in that case it is not serialized into pb. So when an ALPHA_ROWSET tablet is cloned
from another BE, the newly created local tablet's _preferred_rowset_type may randomly
be BETA_ROWSET and cannot be overwritten after the clone; new input rows are then
written in BETA_ROWSET format, which is not what we expect.
This patch fixes the bug by giving _preferred_rowset_type a default value and updating
the field when any type of tablet is created. It also adds a unit test and the related
equality operator overloads.
2020-06-06 11:36:28 +08:00
c51f20bb7a Disable Bitmap or Hll type in keys or in values with incorrect agg-type (#3768)
Bitmap and Hll types cannot be used with incorrect aggregate functions; doing so causes BE to crash.
Add some logical checks in FE's ColumnDef#analyze to avoid creating tables or changing schemas incorrectly.

Keys can never be of bitmap or hll type.
Values of bitmap or hll type must be associated with bitmap_union or hll_union.
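The two rules above can be sketched as a small check. This is an illustration only, not the code in ColumnDef#analyze; the class `ColumnDefCheck` and its enums are hypothetical names.

```java
// Hypothetical sketch of the two rules; not the real ColumnDef#analyze.
final class ColumnDefCheck {
    enum Kind { INT, VARCHAR, BITMAP, HLL }
    enum Agg { NONE, SUM, REPLACE, BITMAP_UNION, HLL_UNION }

    // Returns true when the column definition satisfies both rules.
    static boolean isAllowed(Kind type, boolean isKey, Agg agg) {
        if (type != Kind.BITMAP && type != Kind.HLL) {
            return true; // the rules only constrain bitmap and hll columns
        }
        if (isKey) {
            return false; // rule 1: keys can never be bitmap or hll
        }
        // Rule 2: the value column must use the matching union aggregate.
        Agg required = (type == Kind.BITMAP) ? Agg.BITMAP_UNION : Agg.HLL_UNION;
        return agg == required;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Kind.BITMAP, true, Agg.NONE));          // false
        System.out.println(isAllowed(Kind.BITMAP, false, Agg.BITMAP_UNION)); // true
        System.out.println(isAllowed(Kind.HLL, false, Agg.SUM));             // false
    }
}
```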
2020-06-06 11:36:06 +08:00
173dd3953d [Code Refactor] Remove Catalog.getInstance() method (#3784)
Use Catalog.getCurrentCatalog() instead, to avoid potential meta operation error.
2020-06-06 11:35:01 +08:00
4cbce687b7 Add getValueFn and removeFn to properties (#3782) 2020-06-06 11:34:32 +08:00
0f6e74f3f9 [BUG] Fix location url in agg_fn_evaluator (#3780) 2020-06-06 11:34:12 +08:00
5abef19be4 [Doris On ES] Add more detailed error message when fail to create es table (#3758) 2020-06-05 23:06:46 +08:00
ed9022a908 Ignore broken disk when BE starts up (#3741) 2020-06-05 10:26:07 +08:00
73719f263d Fix document (#3773) 2020-06-05 10:19:17 +08:00
cdd17333ba Add some log to make it easier to find out bug (#3770)
Added some logs that record which BE a query was sent to,
making problems easier to trace.
2020-06-05 10:18:58 +08:00
wyb
fdf3415d06 [Website] Fix CREATE RESOURCE sidebar text and link not right bug (#3777) 2020-06-05 09:20:36 +08:00
0a748661c1 Fix the error selectedIndexId when keysType of table is UNIQUE (#3772)
Candidate indexes should also be compensated for a UNIQUE table,
for the same reason as for an AGG table.

Fixed #3771.
Change-Id: Ic04b0360a0b178cb0b6ee635e56f48852092ec09
2020-06-04 19:26:50 +08:00
9b2cf1c18e [Bug] Clear Txn when load been cancelled (#3766)
If a load task encounters an error, it is cancelled.
At this time, FE clears the Txn according to the DbName.
In FE, the DbName should include the cluster name.
If the cluster name is missing, a NullPointerException is thrown.
As a result, the Txn still exists until it times out.
2020-06-04 18:18:37 +08:00
484e7de3c5 [Doirs On ES] fix bug for sparse docvalue context and remove the mistake usage of total (#3751)
Another PR, https://github.com/apache/incubator-doris/pull/3513 (https://github.com/apache/incubator-doris/issues/3479), tried to resolve the `inner hits node is not an array` error. The error occurs when a query (of batch-size) runs against a new segment that lacks the field: because the filter_path only takes `hits.hits.fields` and `hits.hits._source` into account, the inner hits node comes back null:
```
{
   "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
   "hits": {
      "total": 1
   }
}
```

Unfortunately, that PR introduced another serious problem: results are inconsistent across different batch_size values because `total` is misused.

To avoid both problems, we simply add `hits.hits._score` to the filter_path when `docvalue_mode` is true. `_score` is always `null`, and it populates the inner hits node:

```
{
   "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
   "hits": {
      "total": 1,
      "hits": [
         {
            "_score": null
         }
      ]
   }
}
```

related issue: https://github.com/apache/incubator-doris/issues/3752
2020-06-04 16:31:18 +08:00
01c1de1870 [Load] Add more metric to trace the time cost in stream load and make brpc_num_threads configurable (#3703) 2020-06-04 13:37:28 +08:00
27046c5b61 [Enhancement] Improve the performance of query with IN predicate (#3694)
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to storage engine.

2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config values in BE.
2020-06-04 11:39:00 +08:00
fc33ee3618 [Plugin] Add timeout of connection when downloading the plugins from URL (#3755)
If no timeout is set, the download process may be blocked forever.
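A minimal sketch of setting connect and read timeouts on a download connection with the standard java.net API. The helper `openWithTimeouts`, the URL and the timeout values are assumptions for illustration, not the plugin manager's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper; not the code changed by this commit.
final class PluginDownload {
    // Prepares a connection with explicit timeouts. Nothing is sent until
    // the caller connects or reads from the returned object.
    static URLConnection openWithTimeouts(String url, int connectMs, int readMs) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            // A timeout of 0 (the default) means "wait forever".
            conn.setConnectTimeout(connectMs);
            conn.setReadTimeout(readMs);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL and timeout values chosen for the example.
        URLConnection conn = openWithTimeouts("http://example.invalid/plugin.zip", 5000, 10000);
        System.out.println("connect timeout ms: " + conn.getConnectTimeout());
        System.out.println("read timeout ms: " + conn.getReadTimeout());
    }
}
```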
2020-06-04 11:37:18 +08:00
791f8fee49 [Bug][Outfile] Fix bug that column separater is missing in output file. (#3765)
When the result of a query is output using the `OUTFILE` statement and some output
column is null, the column separator that should follow it is missing.
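A minimal sketch of the expected behavior: every column, null or not, is followed by the separator when another column comes after it. The class `OutfileRow` and the `\N` null marker are assumptions for illustration, not the actual writer code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row formatter; not the real OUTFILE writer.
final class OutfileRow {
    // Assumed textual marker for NULL values.
    static final String NULL_MARKER = "\\N";

    // Joins all columns with the separator; a null column is replaced by
    // the marker so the separators around it are never lost.
    static String format(List<String> columns, String separator) {
        return columns.stream()
                .map(c -> c == null ? NULL_MARKER : c)
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList("a", null, "c"), ",")); // a,\N,c
    }
}
```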
2020-06-04 10:35:32 +08:00
a8c95e7369 [Bug] Fix binaryPredicte's equals function ignore op (#3753)
BinaryPredicate's equals function compares by opcode,
but the opcode may not be initialized yet,
so it returns true whenever the children are the same; for example, `a>1` and `a<1` are considered equal.
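A minimal sketch of an equals method that compares the operator itself, so it does not depend on a lazily initialized opcode. `SimplePredicate` is a hypothetical class, not Doris's BinaryPredicate.

```java
import java.util.Objects;

// Hypothetical predicate of the form "<column> <op> <value>".
final class SimplePredicate {
    enum Op { EQ, LT, GT }

    private final Op op;
    private final String column;
    private final long value;

    SimplePredicate(Op op, String column, long value) {
        this.op = op;
        this.column = column;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimplePredicate)) return false;
        SimplePredicate other = (SimplePredicate) o;
        // Compare the operator directly, together with the children.
        return op == other.op && value == other.value && column.equals(other.column);
    }

    @Override
    public int hashCode() {
        return Objects.hash(op, column, value);
    }

    public static void main(String[] args) {
        SimplePredicate gt = new SimplePredicate(Op.GT, "a", 1);
        SimplePredicate lt = new SimplePredicate(Op.LT, "a", 1);
        System.out.println(gt.equals(lt)); // false: same children, different op
    }
}
```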
2020-06-04 09:29:19 +08:00
wyb
7f6a7c6807 Remove unused import 2020-06-03 22:32:52 +08:00
7f6271c637 [Bug]Fix Query failed when fact table has no data in join case (#3604)
Major work:
1. Correct the values of `numNodes` and `cardinality` when `OlapTableScan` runs computeStats, so that the `broadcast cost` and `partition join cost` can be calculated correctly.
2. Find an input fragment with higher parallelism for the shuffle fragment to assign backends.
2020-06-03 22:01:55 +08:00
2ad1b20b24 [Config] Add new BE config for tcmalloc (#3732)
Add a new BE config tc_max_total_thread_cache_bytes
2020-06-03 21:58:13 +08:00
wyb
edfa6683fc Add create spark load job 2020-06-03 21:27:27 +08:00
73c3de4313 [refactor] Simple refactor on class Reader (#3691)
This is a simple refactor patch on class Reader without any functional changes.
Main refactor points:
- Remove some useless return value
- Use range loop
- Use empty() instead of size() for some STL containers size judgement
- Use in-class initialization instead of initialize in constructor function
- Some other small refactor
2020-06-03 19:55:53 +08:00
ed886a485d [HttpServer] capture convert exception (#3736)
If the parameter str is an empty string, an exception is thrown too. Maybe we can add a UT for parsing parameters in the HTTP server.
2020-06-03 19:54:41 +08:00
e16873a6c1 Fix large string val allocation failure (#3724)
A large bitmap needs StringVal to allocate a large block of memory, which can be larger than MAX_INT.
The overflow causes serialization of the bitmap to fail.
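The same class of 32-bit overflow can be illustrated in Java. This is an analogy under stated assumptions, not the fix made in this commit; the class `AllocSize` and the 3,000,000,000-byte size are hypothetical.

```java
// Hypothetical illustration of narrowing a 64-bit size to 32 bits.
final class AllocSize {
    // Unchecked narrowing: silently wraps for sizes above Integer.MAX_VALUE.
    static int unchecked(long bytes) {
        return (int) bytes;
    }

    // Checked narrowing: throws ArithmeticException instead of wrapping.
    static int checked(long bytes) {
        return Math.toIntExact(bytes);
    }

    public static void main(String[] args) {
        long size = 3_000_000_000L; // larger than Integer.MAX_VALUE
        System.out.println(unchecked(size)); // -1294967296
        try {
            checked(size);
        } catch (ArithmeticException e) {
            System.out.println("size does not fit in 32 bits: " + e.getMessage());
        }
    }
}
```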

Fixed #3600
2020-06-03 17:07:54 +08:00
70aa9d6ca8 [Memory Engine] Add MemTabletScan (#3734) 2020-06-03 15:42:38 +08:00
wyb
ad7270b7ca [Spark load][Fe 1/5] Add spark etl job config (#3712)
Add spark etl job config, includes:

1. Schema of the load tables, including columns, partitions and rollups
2. Infos of the source file, including split rules, corresponding columns, and conversion rules
3. ETL output directory and file name format
4. Job properties
5. Version for further extension
2020-06-03 11:23:09 +08:00
3194aa129d Add a link to Tablet Meta URL (#3745) 2020-06-03 10:10:32 +08:00
60f93b2142 Fix bitmap type (#3749) 2020-06-03 10:07:58 +08:00
761a0ccd12 [Bug] Fix bug that runningprofile show time problem in FE web page and add the runingprofile doc (#3722) 2020-06-02 11:07:15 +08:00
fdf66b8102 [MemTracker] add log depth & auto unregister (#3701) 2020-06-01 23:16:25 +08:00
ee260d5721 [Bug][FsBroker] NPE throw when username is empty (#3731)
When using Broker with an empty username, an NPE is thrown, which is
not expected.
2020-06-01 21:03:21 +08:00
wyb
8e71c0787c [Spark load][Fe 2/5] Update push task thrift interface (#3718)
1. Add TBrokerScanRange and TDescriptorTable used by ParquetScanner
2. Add new TPushType LOAD_V2 for spark load
2020-06-01 18:21:43 +08:00