Commit Graph

333 Commits

SHA1 Message Date
ffe3eaa1a7 Implement adddate, days_add and from_unixtime functions in FE (#1149) 2019-05-13 16:59:52 +08:00
6117227754 FE supports assigning constant conjuncts and calculating expressions (#1126) 2019-05-11 23:10:58 +08:00
15c9be4dfe Fix bug that balance task always chooses the high-usage path (#1143) 2019-05-11 22:07:17 +08:00
ae18cebe0b Improve colocate table balance logic when a backend is added (#1139)
1. Improve colocate table balance logic when a backend is added
2. Add more comments
3. Break loop early
2019-05-11 21:49:51 +08:00
debb58c278 Add SHOW FUNCTION and update docs for UDF (#1140) 2019-05-11 21:46:37 +08:00
f499759f15 Revise the exception message (#1141) 2019-05-10 19:52:11 +08:00
4039985729 Fix some bugs about decommission (#1138)
1. Print the last few tablets of the decommissioning Backend in fe.log for debugging.
2. OlapTableSink should get replicas on alive Backends, not only on available Backends.
3. When decommissioning multiple Backends, we should drop the redundant replicas before creating new ones.
4. Replicas on decommissioning Backends should not be added to the catalog again.
5. Decommissioning Backends should not be chosen as destination of tablet repairing.
2019-05-10 17:41:48 +08:00
79ab7f4413 Change label of broker load txn (#1134)
* Change label of broker load txn

1. put broker load label into txn label
2. fix the bug of `label is already used`
3. fix partition error of new broker load

* Fix count error in mini load and broker load

There are three params (num_rows_load_total, num_rows_load_filtered, num_rows_load_unselected) which are used to count dpp.norm.ALL and dpp.abnorm.ALL.
num_rows_load_total is the number of rows in the source file.
num_rows_load_unselected is the number of rows in num_rows_load_total that do not satisfy the WHERE conjuncts.
num_rows_load_filtered is the number of rows in (num_rows_load_total - num_rows_load_unselected) that are filtered out because their quality is not good enough.
2019-05-10 16:53:46 +08:00
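A minimal Java sketch of how the three counters relate. The class and method names are hypothetical, and the mapping to dpp.norm.ALL (rows that pass both the WHERE conjuncts and the quality check) and dpp.abnorm.ALL (rows filtered out for quality) is an assumption, not taken from the Doris source.

```java
/**
 * Hypothetical holder for the three load counters described above.
 * The mapping to dpp.norm.ALL / dpp.abnorm.ALL is an assumption.
 */
final class LoadRowCounters {
    final long numRowsLoadTotal;      // rows in the source file
    final long numRowsLoadUnselected; // rows rejected by the WHERE conjuncts
    final long numRowsLoadFiltered;   // selected rows rejected for poor quality

    LoadRowCounters(long total, long unselected, long filtered) {
        if (total < 0 || unselected < 0 || filtered < 0 || unselected + filtered > total) {
            throw new IllegalArgumentException("inconsistent row counters");
        }
        this.numRowsLoadTotal = total;
        this.numRowsLoadUnselected = unselected;
        this.numRowsLoadFiltered = filtered;
    }

    /** Rows that satisfy the WHERE conjuncts: total - unselected. */
    long selectedRows() {
        return numRowsLoadTotal - numRowsLoadUnselected;
    }

    /** Assumed dpp.norm.ALL: selected rows that also pass the quality check. */
    long normalRows() {
        return selectedRows() - numRowsLoadFiltered;
    }

    /** Assumed dpp.abnorm.ALL: rows filtered out for poor quality. */
    long abnormalRows() {
        return numRowsLoadFiltered;
    }
}
```

For example, 100 source rows with 10 unselected and 5 filtered leave 90 selected rows, of which 85 would count as normal under this assumed mapping.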
fdc0c40549 Fix malformed at AM bug (#1136) 2019-05-10 16:23:14 +08:00
6635c36cc5 Fix decommission BE bug (#1122)
The pre-check of the replica number should check the databases of the current cluster
2019-05-09 18:53:10 +08:00
1eeb5ea891 Add str_to_date function in fe (#1118) 2019-05-09 17:20:44 +08:00
93d2dd5f82 Fix bug of routine load job (#1116)
Fix null pointer exception when sending routine load task
2019-05-09 12:56:53 +08:00
77a1b31baa Add SHOW LOAD for loadv2 (#1113)
This change adds SHOW LOAD support for loadv2 and fixes some loadv2 bugs.
First, SHOW LOAD displays information for both load and loadv2 jobs. For loadv2, the ETL progress is shown as N/A while the job is loading.
Second, a loadv2 job is created when the version property is set to v2.
This is a temporary property which does not affect the old broker load.
Once loadv2 is complete, the default load will switch to loadv2.
Finally, this change fixes some bugs in LoadingTaskPlanner.
2019-05-09 10:27:30 +08:00
7699c76df2 Fix NullPointerException encountered in transaction process (#1112)
* Fix NullPointerException encountered in transaction process
* Do not choose unavailable BE when repairing tablets
2019-05-08 20:30:34 +08:00
965ecedd5d Fix compile bug (#1106) 2019-05-07 17:36:34 +08:00
a08170fd50 Enhance usability (#1100)
* Enhance usability

1. Add metrics to monitor transactions and the stream load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more logs for tracing the broker load process.
5. Modify the query report process to cancel the query immediately if any instance fails.

* Fix bugs
1. Avoid NullPointerException when enabling colocation join with broker load
2. Return immediately when the pull load task coordinator execution fails
2019-05-07 15:55:04 +08:00
0c62cb888f Support negative keyword in Broker Load (#1101) 2019-05-06 22:15:27 +08:00
11be24df40 Add new scheduler of load in fe (#1076)
* Add new scheduler of load in fe

1. The new scheduler only supports broker load for now.
2. A load job goes through the stages PENDING -> LOADING -> FINISHED.
3. The LoadScheduler turns each job into a pending task, which does the preparation work the job needs.
4. OnPendingTaskFinished is invoked after the pending task finishes. It submits the loading task, which is created from the pending task's attachment.
5. OnLoadingTaskFinished is invoked after the loading task finishes. It records the commit info and commits the txn once all tasks have finished.

* Combine pendingTask and loadingTask into loadTask

1. The load task callback includes two methods: onTaskFinished and onTaskFailed.

* Add txn callback of load job

1. isCommitting is set to true in the txn's beforeCommitted callback.
2. A job cannot be cancelled while isCommitting is true.
3. A job is finished after its txn becomes visible.
4. Old jobs are cleaned up when (CurrentTS - FinishedTs) / 1000 > Config.label_keep_seconds.
5. LoadTimeoutChecker cancels timed-out jobs.
2019-05-06 13:49:06 +08:00
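A single-threaded Java sketch of the job lifecycle above, assuming hypothetical names (LoadJobSketch, LoadJobState) and an extra CANCELLED state for cancelled jobs. It models the isCommitting guard, the FINISHED transition on txn visibility, and the label_keep_seconds cleanup rule; measuring the timeout from job creation is also an assumption.

```java
/** PENDING/LOADING/FINISHED come from the commit message; CANCELLED is an assumption. */
enum LoadJobState { PENDING, LOADING, FINISHED, CANCELLED }

/** Hypothetical, single-threaded model of the load job lifecycle. */
final class LoadJobSketch {
    private final long createTsMs;
    private final long timeoutSeconds;
    private LoadJobState state = LoadJobState.PENDING;
    private boolean isCommitting = false;
    private long finishedTsMs = -1L;

    LoadJobSketch(long createTsMs, long timeoutSeconds) {
        this.createTsMs = createTsMs;
        this.timeoutSeconds = timeoutSeconds;
    }

    LoadJobState state() {
        return state;
    }

    /** Pending task done; a real job would submit its loading task here. */
    void onPendingTaskFinished() {
        if (state != LoadJobState.PENDING) {
            throw new IllegalStateException("expected PENDING but was " + state);
        }
        state = LoadJobState.LOADING;
    }

    /** Txn callback: from this point on the job may not be cancelled. */
    void beforeCommitted() {
        isCommitting = true;
    }

    /** Txn callback: the job finishes only once its txn is visible. */
    void onTxnVisible(long nowMs) {
        state = LoadJobState.FINISHED;
        finishedTsMs = nowMs;
    }

    /** Returns false if the job is committing or already in a final state. */
    boolean cancel(long nowMs) {
        if (isCommitting || state == LoadJobState.FINISHED || state == LoadJobState.CANCELLED) {
            return false;
        }
        state = LoadJobState.CANCELLED;
        finishedTsMs = nowMs;
        return true;
    }

    /** One timeout-checker pass for this job (timeout measured from creation: an assumption). */
    boolean cancelIfTimedOut(long nowMs) {
        return (nowMs - createTsMs) / 1000 > timeoutSeconds && cancel(nowMs);
    }

    /** Cleanup rule: (CurrentTS - FinishedTs) / 1000 > label_keep_seconds. */
    boolean isExpired(long nowMs, long labelKeepSeconds) {
        boolean done = state == LoadJobState.FINISHED || state == LoadJobState.CANCELLED;
        return done && (nowMs - finishedTsMs) / 1000 > labelKeepSeconds;
    }
}
```

Because cancel() checks isCommitting first, a timeout that fires after beforeCommitted leaves the job to finish normally instead of racing the commit.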
ba78adae94 Fix bugs when using function in both stream load request and routine load job (#1091) 2019-05-05 20:51:30 +08:00
588aa7bed3 Fix date_format function in fe (#1082) 2019-05-01 22:20:49 +08:00
e373aa51e6 Fix bug of prematurely updating the schema hash of a replica when reporting (#1084)
The new schema hash should only be updated when the schema change has finished.
2019-05-01 20:51:10 +08:00
afa3aa9069 Add some pre-calculated metrics (#1079)
1. max io util of disks
2. max network send/receive bytes rate of all network devices
3. base/cumulative compaction request counter and failure counter
2019-04-30 11:12:23 +08:00
310a375aec Fix bug that NULL values are not correctly handled when loading data (#1070)
When a partition column's value is NULL, the row should be loaded into the partition that includes MIN VALUE.
2019-04-29 13:55:28 +08:00
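A minimal Java sketch of the routing rule for a single BIGINT range-partition column. The router class is hypothetical, and MIN VALUE is modeled as Long.MIN_VALUE: NULL is routed as if it were the smallest value, so it lands in the partition whose range starts at MIN VALUE, or nowhere if no such partition exists.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical range-partition router over one BIGINT partition column. */
final class RangePartitionRouter {
    private static final class Partition {
        final String name;
        final long upperExclusive;

        Partition(String name, long upperExclusive) {
            this.name = name;
            this.upperExclusive = upperExclusive;
        }
    }

    // Keyed by inclusive lower bound; each partition covers [lower, upper).
    private final TreeMap<Long, Partition> byLowerBound = new TreeMap<>();

    void addPartition(String name, long lowerInclusive, long upperExclusive) {
        byLowerBound.put(lowerInclusive, new Partition(name, upperExclusive));
    }

    /** Returns the partition name, or null if no partition covers the value. */
    String route(Long value) {
        // NULL sorts below every non-NULL value, so it is looked up at MIN VALUE.
        long key = (value == null) ? Long.MIN_VALUE : value;
        Map.Entry<Long, Partition> entry = byLowerBound.floorEntry(key);
        if (entry == null || key >= entry.getValue().upperExclusive) {
            return null;
        }
        return entry.getValue().name;
    }
}
```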
77ceef6391 Support insert values (#1067)
* Support insert into values

* Fix hll import bug

* Fix failure of insert with subquery
2019-04-29 10:39:01 +08:00
1662d91877 Change the logic of RoutineLoadTaskScheduler (#1061)
1. TaskScheduler processes one task per round
2. TaskScheduler blocks until it takes a new task from the queue
3. TaskScheduler submits tasks when the queue is empty
4. Add an example of creating a broker table with BOS
5. Change the syntax of SHOW ROUTINE LOAD
2019-04-28 20:05:48 +08:00
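Points 1 and 2 can be sketched in Java with a BlockingQueue, whose take() blocks until an element is available. The class and field names are hypothetical, and point 3 is not modeled because the message does not say which tasks are submitted when the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: one task per round, blocking until a task is queued. */
final class TaskSchedulerSketch {
    private final BlockingQueue<String> needScheduleTasks = new LinkedBlockingQueue<>();
    private final List<String> submitted = new ArrayList<>();

    void addTask(String taskId) {
        needScheduleTasks.add(taskId);
    }

    /**
     * One scheduling round: take() blocks until a task is available, then exactly
     * one task is handled. Returns false if interrupted while waiting.
     */
    boolean runOneRound() {
        String task;
        try {
            task = needScheduleTasks.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        submitted.add(task); // a real scheduler would pick a BE and send the task here
        return true;
    }

    List<String> submitted() {
        return submitted;
    }
}
```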
9c82d41981 Support Doris querying ES via HTTP (#925) 2019-04-28 17:14:44 +08:00
4559bc3558 Change transaction timeout default configuration (#1060) 2019-04-28 16:42:19 +08:00
60df7cdb8d fix ut bug (#1051) 2019-04-28 10:33:50 +08:00
5e36a769a0 Change the way to calculate task num (#1049) 2019-04-28 10:33:50 +08:00
0adb150da7 Fix ut bugs (#1046)
Also fix a metrics collection bug
2019-04-28 10:33:50 +08:00
4e5197ba52 Modify transaction proc info (#1029)
1. Add running/finished state proc to show txns in the specified state.
2. Add max disk used percent info in backends proc dir.

* add missing file

* fix bug

* Update fe/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java

Co-Authored-By: morningman <morningman@163.com>

* Update fe/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java

Co-Authored-By: morningman <morningman@163.com>
2019-04-28 10:33:50 +08:00
4a95c53f07 Fix bug of listener (#1017)
* Fix bug of listener

* Change txnStateChangeListener to txnStateChangeCallback

* Fix the logic of beforeAborted
1. If a task does not belong to the job, the txn attachment is set to null.
* The txn is aborted normally without an attachment.
* The job is not updated by a task whose attachment is null.
2019-04-28 10:33:50 +08:00
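A minimal Java sketch of the beforeAborted rule, using hypothetical TxnStateSketch and RoutineJobCallbackSketch classes in place of the real txn state and job types: a foreign task's attachment is cleared, the abort itself still proceeds, and the job ignores aborts that arrive without an attachment.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical txn state carrying an optional attachment from the owning task. */
final class TxnStateSketch {
    final String taskId;
    Object attachment;

    TxnStateSketch(String taskId, Object attachment) {
        this.taskId = taskId;
        this.attachment = attachment;
    }
}

/** Hypothetical job acting as the txn state change callback. */
final class RoutineJobCallbackSketch {
    private final Set<String> taskIds = new HashSet<>();
    private int abortedTasks = 0;

    void addTask(String taskId) {
        taskIds.add(taskId);
    }

    /** Before abort: clear the attachment if the task is not one of ours. */
    void beforeAborted(TxnStateSketch txn) {
        if (!taskIds.contains(txn.taskId)) {
            txn.attachment = null; // the txn itself still aborts normally
        }
    }

    /** After abort: only txns that kept their attachment update the job. */
    void afterAborted(TxnStateSketch txn) {
        if (txn.attachment == null) {
            return;
        }
        abortedTasks++;
    }

    int abortedTasks() {
        return abortedTasks;
    }
}
```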
7f39738b08 Change log tips (#1002) 2019-04-28 10:33:50 +08:00
3409ed41ac Reset commit offset if task aborted due to runtime error (#994) 2019-04-28 10:33:50 +08:00
a79bd0c771 Add doc of auto creator of kafka topic (#985)
* Add annotation of show routine load
2019-04-28 10:33:50 +08:00
1b5643c6fb Fix some bugs (#979)
1. Add Config.max_routine_load_concurrent_task_num to replace the old one
2. Fix a bug that SHOW ALTER TABLE COLUMN may throw NullPointerException
3. Fix some misspellings in docs
2019-04-28 10:33:50 +08:00
56bec6f22a Add routine load manual (#967) 2019-04-28 10:33:50 +08:00
b7b66527ce Fix some load bugs (#961)
1. Use load job's timeout as its txn timeout
2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN stmt
2019-04-28 10:33:50 +08:00
e352a08339 Change tips of show routine load task (#959)
1. Add pauseTimestamp
2. It is set when the job is paused and removed when the job is resumed
2019-04-28 10:33:50 +08:00
178757d46c Revert "Use http redirect method instead of old way (#948)" (#949)
This reverts commit 84c720e9e7f4123864a1068cf17d3736468ea528.
2019-04-28 10:33:50 +08:00
1787e7bb05 Use http redirect method instead of old way (#948) 2019-04-28 10:33:50 +08:00
24d22c6f6b Forward some stmts to master (#944)
1. SHOW PROC
2. SHOW PROC web action
3. ADMIN SHOW stmt
4. SHOW ROUTINE LOAD stmt
2019-04-28 10:33:50 +08:00
19b34129bb Fix bug when resuming a running job (#942) 2019-04-28 10:33:50 +08:00
0579540ba2 Fix routine load bugs (#940)
1. Plan each task separately in case the table schema has changed
2. Add more detail info for txn
2019-04-28 10:33:50 +08:00
f49c53ee5b Change some throwable to userException (#939) 2019-04-28 10:33:50 +08:00
2b4d02b2fa Add error load log url for routine load job (#938) 2019-04-28 10:33:50 +08:00
8e0512e88d Move lock of routine load job (#934)
1. Move the lock of the routine load job from inside the txn lock to outside it.
2. The process of committing or aborting a routine load task is as follows:
* lock job
    check task
    lock txn
        commit txn
    unlock txn
    commit task
* unlock job
3. The timeout check of a txn is skipped while the txn still has a related task.
4. The relationship between a task and its txn is removed when the task times out.
2019-04-28 10:33:50 +08:00
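The lock order in point 2 can be sketched in Java with two ReentrantLocks (hypothetical names). Taking the job lock first and the txn lock only inside it gives a single, consistent lock order, which is the usual way to rule out job/txn lock deadlocks; the trace list only records the steps for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the job-lock-outside, txn-lock-inside commit path. */
final class LockOrderSketch {
    private final ReentrantLock jobLock = new ReentrantLock();
    private final ReentrantLock txnLock = new ReentrantLock();
    final List<String> trace = new ArrayList<>();

    /** Returns false if the task check fails; the txn lock is then never taken. */
    boolean commitTask(String taskId, boolean taskStillValid) {
        jobLock.lock(); // always acquired first
        try {
            trace.add("check task " + taskId);
            if (!taskStillValid) {
                return false;
            }
            txnLock.lock(); // only ever acquired while holding jobLock
            try {
                trace.add("commit txn");
            } finally {
                txnLock.unlock();
            }
            trace.add("commit task " + taskId);
            return true;
        } finally {
            jobLock.unlock();
        }
    }
}
```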
0cccb5cc9c Fix bugs of routine load job (#917)
1. An uninitialized counter caused endless data consuming.
2. Null values in column mapping were handled incorrectly.

* fix bug
2019-04-28 10:33:50 +08:00
75674753c2 Add unit test for RoutineLoadManager and RoutineLoadJob (#881)
1. Add ut
2. Show historical jobs even when the table has been deleted. Check auth whether or not the table name is null.
2019-04-28 10:33:50 +08:00
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group to share a single stream load pipe among multiple data consumers. This increases the consuming speed of Kafka messages and reduces the number of tasks of a routine load job.

Test results:

* 1 consumer, 1 partition:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

The last 2 cases show that we can achieve higher speed by adding more consumers. However, the bottleneck then shifts from the Kafka consumers to Doris ingestion, as the blocking times indicate: with 1 consumer most of the time is spent in blocking get, while with 3 consumers most is spent in blocking put. So 3 consumers in a group is enough.

I also added a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group; the default value is 3.

In my test (1 Backend, 2 tablets, 1 replica), 1 routine load task can achieve 10M/s, which is the same as raw stream load.

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
2019-04-28 10:33:50 +08:00
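A rough Java sketch of a data consumer group, assuming in-memory lists in place of Kafka partitions and an ArrayBlockingQueue in place of the stream load pipe (all class and method names are hypothetical; only max_consumer_num_per_group comes from the message). Partitions are spread round-robin over at most max_consumer_num_per_group consumers, which all put into one shared pipe drained by a single reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: N consumers feeding one shared pipe, one reader draining it. */
final class DataConsumerGroupSketch {
    /** Round-robin partitions over min(numPartitions, maxConsumerNumPerGroup) consumers. */
    static List<List<Integer>> assign(int numPartitions, int maxConsumerNumPerGroup) {
        if (maxConsumerNumPerGroup < 1) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        int consumers = Math.min(numPartitions, maxConsumerNumPerGroup);
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            groups.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            groups.get(p % consumers).add(p);
        }
        return groups;
    }

    /** Returns how many messages the single reader received from the shared pipe. */
    static long consumeAll(List<List<String>> partitions, int maxConsumerNumPerGroup) {
        List<List<Integer>> groups = assign(partitions.size(), maxConsumerNumPerGroup);
        if (groups.isEmpty()) {
            return 0L;
        }
        BlockingQueue<String> pipe = new ArrayBlockingQueue<>(1024);
        long expected = 0L;
        for (List<String> partition : partitions) {
            expected += partition.size();
        }
        ExecutorService pool = Executors.newFixedThreadPool(groups.size());
        try {
            for (List<Integer> group : groups) {
                pool.submit(() -> {
                    for (int p : group) {
                        for (String msg : partitions.get(p)) {
                            pipe.put(msg); // blocks while the pipe is full
                        }
                    }
                    return null; // Callable, so put() may throw InterruptedException
                });
            }
            long received = 0L;
            while (received < expected) {
                pipe.take(); // blocks while the pipe is empty
                received++;
            }
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining the pipe", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the blocking put() and take() calls play the role of the blocking put and get times reported above.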