Commit Graph

1998 Commits

Author SHA1 Message Date
c6f2b5ef0d [Doris On ES][Docs] refactor documentation for doe (#3867) 2020-06-17 10:54:28 +08:00
d659167d6d [Planner] Set MysqlScanNode's cardinality to avoid unexpected shuffle join (#3886) 2020-06-17 10:53:36 +08:00
a2df29efe9 [Bug][RoutineLoad] Fix bug that an exception is thrown when the txn of a routineload task becomes visible (#3890) 2020-06-17 10:52:51 +08:00
bfbe22526f Show create table result with bitmap column should not return default value (#3882) 2020-06-17 09:43:17 +08:00
ae7028bee4 [Enhancement] Replace N/A with NULL in ShowStmt result (#3851) 2020-06-17 09:41:51 +08:00
0224d49842 [Fix][Bug] Fix compile bug (#3888)
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-06-16 18:42:04 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Support a RESTful interface to get the statistics.
2020-06-15 18:16:54 +08:00
2211cb0ee0 [Metrics] Add metrics document and 2 new metrics of TCP (#3835) 2020-06-15 09:48:09 +08:00
3c09e1e1d8 [trace] Adapt trace util to compaction module (#3814)
The trace util helps diagnose compaction performance problems.
A trace log for base compaction looks like:
```
W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace:
0610 11:23:03.727535 (+     0us) storage_engine.cpp:554] start to perform base compaction
0610 11:23:03.728961 (+  1426us) storage_engine.cpp:560] found best tablet 546859
0610 11:23:03.728963 (+     2us) base_compaction.cpp:40] got base compaction lock
0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:24:51.784818 (+   379us) compaction.cpp:74] prepare finished
0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished
0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built
0610 11:26:33.484482 (+     1us) compaction.cpp:106] check correctness finished
0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished
0610 11:26:33.513300 (+   103us) base_compaction.cpp:49] compaction finished
0610 11:26:33.513441 (+   141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6}
```
A trace log for cumulative compaction looks like:
```
W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace:
0610 11:14:08.068484 (+     0us) storage_engine.cpp:520] start to perform cumulative compaction
0610 11:14:08.069844 (+  1360us) storage_engine.cpp:526] found best tablet 547083
0610 11:14:08.069846 (+     2us) cumulative_compaction.cpp:42] got cumulative compaction lock
0610 11:14:08.069947 (+   101us) cumulative_compaction.cpp:46] calculated cumulative point
0610 11:14:08.070141 (+   194us) cumulative_compaction.cpp:50] rowsets picked
0610 11:14:08.070143 (+     2us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:14:08.070518 (+   375us) compaction.cpp:74] prepare finished
0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished
0610 11:14:15.390916 (+  1023us) compaction.cpp:102] output rowset built
0610 11:14:15.390917 (+     1us) compaction.cpp:106] check correctness finished
0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished
0610 11:14:15.409496 (+    36us) cumulative_compaction.cpp:55] compaction finished
0610 11:14:15.410138 (+   642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1}
```
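Each `(+Nus)` field in the traces above is the time elapsed since the previous trace entry, as the timestamps confirm. A minimal sketch of how such a trace could be scanned for its slowest step follows. The class and method names are hypothetical and not part of Doris; only the line layout is taken from the logs above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Doris: finds the trace entry with the largest delta.
final class TraceSlowestStepSketch {
    // Matches "(+  1426us) storage_engine.cpp:560] found best tablet 546859".
    private static final Pattern STEP = Pattern.compile("\\(\\+\\s*(\\d+)us\\)\\s+(.*)$");

    // Returns "<delta>us <rest of line>" for the slowest step, or null if no entry matches.
    static String slowestStep(List<String> lines) {
        long worst = -1;
        String label = null;
        for (String line : lines) {
            Matcher m = STEP.matcher(line);
            if (!m.find()) {
                continue; // header and "Metrics:" lines carry no delta
            }
            long us = Long.parseLong(m.group(1));
            if (us > worst) {
                worst = us;
                label = m.group(2);
            }
        }
        return label == null ? null : worst + "us " + label;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
            "0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked",
            "0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction",
            "0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished");
        System.out.println(slowestStep(trace));
    }
}
```

On the base compaction trace, this points at the wait for the concurrency lock (about 108 s), which is slightly longer than the merge itself (about 102 s).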
2020-06-13 19:31:51 +08:00
b3811f910f [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819)
* Add hive external table and update hive table syntax in loadstmt

* Move check hive table from SelectStmt to FromClause and update doc

* Update hive external table en sql reference
2020-06-13 16:28:24 +08:00
414a0a35e5 [Dynamic Partition] Use ZonedDateTime to support set timezone (#3799)
This CL mainly adds time zone support to dynamic partition:
1. Use the new Java Time API to replace Calendar.
2. Support setting the time zone in dynamic partition parameters.
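As an illustration of why the time zone matters here, the sketch below uses `ZonedDateTime` to derive a day-granularity partition suffix for the same instant in two zones. The class name, the `partitionDayFor` helper, and the `yyyyMMdd` suffix format are assumptions made for this example, not code from the commit.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch, not the Doris implementation.
final class DynamicPartitionTimeZoneSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Day suffix for `now` shifted by `offsetDays`, evaluated in the given time zone.
    static String partitionDayFor(Instant now, String zoneId, int offsetDays) {
        ZonedDateTime local = ZonedDateTime.ofInstant(now, ZoneId.of(zoneId));
        return local.plusDays(offsetDays).format(DAY);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-06-12T18:30:00Z");
        System.out.println(partitionDayFor(now, "UTC", 0));           // prints 20200612
        System.out.println(partitionDayFor(now, "Asia/Shanghai", 0)); // prints 20200613
    }
}
```

The same instant falls on different calendar days in the two zones, so a partition computed without an explicit zone can end up one day off.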
2020-06-13 16:27:09 +08:00
b8ee84a120 [Doc] Add docs to OLAP_SCAN_NODE query profile (#3808) 2020-06-13 16:25:40 +08:00
6928c72703 Optimize the logic for getting TabletMeta from TabletInvertedIndex to reduce frequency of getting read lock (#3815)
This PR optimizes the logic for getting TabletMeta from TabletInvertedIndex to reduce the frequency of acquiring the read lock.
2020-06-13 12:46:59 +08:00
61be7132a9 fix for be server crash which throwing syntax error when parse json … (#3846)
Fix a BE server crash caused by a syntax error thrown when parsing JSON from a Kafka message.
2020-06-13 12:45:16 +08:00
38b6d291f1 [Bug] fix uninitialized member vars (#3848)
This fix is based on the UBSAN unit test run. If class objects are created and used in a different way, the warning "runtime error: load of value XX, which is not a valid value for type 'YYY'" may appear again.

Unit tests should be built in DEBUG or XXSAN mode (at least DEBUG). RELEASE mode adds -DNDEBUG, which turns off dchecks/asserts/debug.
2020-06-13 12:44:49 +08:00
83d39ff9c9 Avoid pass NULL to memcmp() (#3844)
Executing "StringVal(len=0, ptr="") == StringVal(len=0,ptr=NULL)" passes a NULL pointer to memcmp(), which should be avoided.
2020-06-13 12:43:41 +08:00
dac156b6b1 [Spill To Disk] Analytic_Eval_Node Support Spill Disk and Del Some Useless Code (#3820)
* 1. Add an enable-spilling query option and support spilling to disk in Analytic_Eval_Node. FE can enable spilling with

         set enable_spilling = true;

Now Sort Node and Analytic_Eval_Node can spill to disk.
2. Delete the merge_sorter code that is no longer used.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node to support spilling to disk. Delete the useless code of buffered_block_mgr and buffered_tuple_stream.
4. Add a DataStreamRecvr profile. Move the counters that belong to DataStreamRecvr from the fragment to the DataStreamRecvr profile to make the running profile clearer.

* Change some hints in the code

* Replace disable_spill with enable_spill, which is more compatible with FE
2020-06-13 10:19:02 +08:00
wyb
44dbdf4986 Update hive external table en sql reference 2020-06-12 21:38:05 +08:00
88a5429165 [FE] Add db&tbl info in broker load log (#3837)
The stream load log in FE has db & tbl info, so the broker load log should have it too.
2020-06-12 20:54:41 +08:00
7591527977 [Bug] Fix a bug that insert null bitmap crashes BE (#3830)
INSERT INTO VALUES to_bitmap('xx') may insert null into bitmap column, which may cause dirty data to be written.
2020-06-12 18:03:02 +08:00
75f4df400e Stop travis building when an error occurred (#3838)
Co-authored-by: fariel huang <farielclaire@gmail.com>
2020-06-12 09:16:01 +08:00
wyb
7f7ee63723 Move check hive table from SelectStmt to FromClause and update doc 2020-06-11 16:53:41 +08:00
2ce2cf78ac Remove unused import (#3826)
Change-Id: Ic6ef5a0d372a9b17ffa21cffb9027d2d7e856474
2020-06-11 11:44:51 +08:00
8d11ad3a16 [Doc] Fix website doc error (#3823) 2020-06-11 10:01:54 +08:00
86d235a76a [Extension] Logstash Doris output plugin (#3800)
This plugin outputs data from Logstash to Doris.
It uses the HTTP protocol to interact with the Doris FE HTTP interface
and loads data through Doris's stream load.
2020-06-11 08:54:51 +08:00
8caedadb67 use scoped_refptr to new HashIndex (#3818) 2020-06-10 23:47:10 +08:00
cd402a6827 [Restore] Fix error message not match of restore job when job is time out (#3798)
In the current code, a restore job that times out is reported as canceled by the user. This error message is very misleading.
2020-06-10 23:12:04 +08:00
ef94c25773 [Bug]fix the crash of checksum task #3735 (#3738)
1. The table includes a key column of double/float type.
2. The checksum task uses all key columns for comparison.
3. schema.column(idx) is NULL for a double/float column.

#3735
2020-06-10 22:59:15 +08:00
4adc9d45c2 [Doc] Update ALTER TABLE.md 2020-06-10 22:58:29 +08:00
de91037d8c [Doc]Add some routine load docs (#3796)
Add some documentation about using routine load in the cloud environment
2020-06-10 22:57:00 +08:00
4cb5f7a535 [Config]Remove max_user_connections from config (#3805)
Update max_user_connections by user property:

```
set property `user` max_user_connections=100;
```
2020-06-10 22:56:05 +08:00
wyb
4c2e73a5fe Add hive external table and update hive table syntax in loadstmt 2020-06-10 16:32:32 +08:00
8c608bbad5 [Doris On ES] Skip function_call expr when process predicate (#3813)
Fixed #3801
Do not push down function_call expressions such as split_xxx when processing predicates; Doris BE is responsible for processing these predicates.

All rows in table:

```
+------+------+------+------------+------------+
| k1   | k2   | k3   | UpdateTime | ArriveTime |
+------+------+------+------------+------------+
| NULL | NULL | kkk1 |  123456789 |       NULL |
| kkk1 | NULL | NULL |  123456789 |       NULL |
| NULL | kkk2 | NULL |  123456789 |       NULL |
+------+------+------+------------+------------+
```

The following predicates cannot be pushed down to ES.

```
SQL 1:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk is not null;
+------+
| kk   |
+------+
| kkk  |
+------+
1 row in set (0.02 sec)

SQL 2:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > 'a';      
+------+
| kk   |
+------+
| kkk  |
+------+

SQL 3:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > '2';
+------+
| kk   |
+------+
| kkk  |
+------+
1 row in set (0.03 sec)
```
2020-06-10 11:22:53 +08:00
4fa9d8cbe9 [Spark load][Fe 3/5] Fe create job (#3715)
* Add create spark load job

* Remove unused import
2020-06-09 21:57:46 +08:00
5b1589498a [Bug] Fix SchemaChangeJobV2's meta persist bug (#3804)
1. Missing field `partitionIndexMap` in SchemaChangeJobV2
2. Pair in field `indexSchemaVersionAndHashMap` can not be persisted by GSON
3. Exit the FE process when replaying the edit log fails.

Fix: #3802
2020-06-09 21:55:46 +08:00
acd7a58875 [Doris On ES] [1/3] Add ES QueryBuilders for debug mode (#3774) 2020-06-09 16:45:16 +08:00
8ada2559b7 [Bug] Fix bug that checkpoint thread failed to start (#3795)
1. Set thread id before starting the checkpoint thread
2. Init the CHECKPOINT catalog instance before visiting it.
2020-06-08 23:00:36 +08:00
559714f3d4 Fix largeint max min bug (#3793) 2020-06-08 21:01:30 +08:00
e4dc2ec440 [StorageEngine] Make StorageEngine::open return more detailed info (#3761)
StorageEngine::open returns only a vague status when it fails, so
we have to check the logs to find the root cause, which is not
convenient when running unit tests in CI dockers.
It would be better to return more detailed failure info that points to
the root cause. For example, it may return an error status with the message
"file descriptors limit is too small".
2020-06-07 10:21:33 +08:00
928379c5d8 [Bug] Fix colocate group replay NPE (#3790)
Group id should also be persisted for replaying
2020-06-07 10:20:22 +08:00
ea5b3b2d4c [Bug] Fix bug that should not use "!=" to judge the equivalence of Type (#3786)
org.apache.doris.catalog.Type is not an enum, so the equivalence of
Type should not be judged using "==" or "!=".
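The pitfall can be shown with a small stand-in class. `ColumnTypeSketch` below is hypothetical and is not the real `org.apache.doris.catalog.Type`; it only illustrates that `!=` on a non-enum class compares object references, while `equals()` compares the logical value.

```java
import java.util.Objects;

// Hypothetical stand-in for a non-enum type descriptor.
// It is NOT the real org.apache.doris.catalog.Type.
final class ColumnTypeSketch {
    private final String name;
    private final int length;

    ColumnTypeSketch(String name, int length) {
        this.name = name;
        this.length = length;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ColumnTypeSketch)) {
            return false;
        }
        ColumnTypeSketch that = (ColumnTypeSketch) other;
        return length == that.length && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, length);
    }

    public static void main(String[] args) {
        ColumnTypeSketch a = new ColumnTypeSketch("VARCHAR", 10);
        ColumnTypeSketch b = new ColumnTypeSketch("VARCHAR", 10);
        System.out.println(a != b);        // prints "true": two distinct objects
        System.out.println(!a.equals(b));  // prints "false": same logical type
    }
}
```

With an enum, each constant is a singleton and reference comparison is safe; with an ordinary class, two equal values can live in separate objects, so `!=` reports a difference that does not exist.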
2020-06-06 11:38:32 +08:00
a7bf006b51 Use BackendStatus to show BE's information in show backends (#3713)
The information is displayed in JSON format. For example:
{"lastTabletReportTime":"2020-05-28 15:29:01"}
2020-06-06 11:37:48 +08:00
3b6a781862 [Bug] Fix a bug that tablet's _preferred_rowset_type may be modified to BETA_ROWSET after cloned (#3750)
TabletMeta's _preferred_rowset_type is not initialized when the object is constructed, so it
may hold a random value. The field is not updated when an ALPHA_ROWSET tablet is created,
and in that case it is not serialized into pb. So when an ALPHA_ROWSET tablet is cloned
from another BE, the newly created local tablet's _preferred_rowset_type may randomly be
BETA_ROWSET and cannot be overwritten after the clone. New input rows are then written
in BETA_ROWSET format, which is not what we expect.
This patch fixes the bug by giving _preferred_rowset_type a default value and updating
the field when any type of tablet is created. It also adds a unit test and the related
overloaded equality operator functions.
2020-06-06 11:36:28 +08:00
c51f20bb7a Disable Bitmap or Hll type in keys or in values with incorrect agg-type (#3768)
Bitmap and Hll types cannot be used with incorrect aggregate functions, which would cause BE to crash.
Add some logical checks in FE's ColumnDef#analyze to avoid creating tables or changing schemas incorrectly.

Keys can never be of bitmap or hll type.
Values of bitmap or hll type must be associated with bitmap_union or hll_union.
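The two rules above can be sketched as a small validation routine. Everything in this block (the class, the enums, and the `check` method) is hypothetical and only mirrors the rules as stated in the commit message; it is not the actual `ColumnDef#analyze` code.

```java
// Hypothetical sketch of the two rules; not the real ColumnDef#analyze.
final class ColumnDefCheckSketch {
    enum ColType { INT, VARCHAR, BITMAP, HLL }
    enum AggType { NONE, SUM, REPLACE, BITMAP_UNION, HLL_UNION }

    // Returns null when the column definition is acceptable, otherwise an error message.
    static String check(ColType type, boolean isKey, AggType agg) {
        if (type != ColType.BITMAP && type != ColType.HLL) {
            return null; // the rules only constrain bitmap and hll columns
        }
        if (isKey) {
            return type + " column must not be a key";
        }
        AggType required = (type == ColType.BITMAP) ? AggType.BITMAP_UNION : AggType.HLL_UNION;
        if (agg != required) {
            return type + " value column requires " + required;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check(ColType.BITMAP, true, AggType.NONE));
        System.out.println(check(ColType.HLL, false, AggType.SUM));
        System.out.println(check(ColType.BITMAP, false, AggType.BITMAP_UNION));
    }
}
```

Rejecting such definitions at analysis time means the invalid combination never reaches BE.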
2020-06-06 11:36:06 +08:00
173dd3953d [Code Refactor] Remove Catalog.getInstance() method (#3784)
Use Catalog.getCurrentCatalog() instead, to avoid potential meta operation error.
2020-06-06 11:35:01 +08:00
4cbce687b7 Add getValueFn and removeFn to properties (#3782) 2020-06-06 11:34:32 +08:00
0f6e74f3f9 [BUG] Fix location url in agg_fn_evaluator (#3780) 2020-06-06 11:34:12 +08:00
5abef19be4 [Doris On ES] Add more detailed error message when fail to create es table (#3758) 2020-06-05 23:06:46 +08:00
ed9022a908 Ignore broken disk when BE starts up (#3741) 2020-06-05 10:26:07 +08:00
73719f263d Fix document (#3773) 2020-06-05 10:19:17 +08:00