Commit Graph

235 Commits

Author SHA1 Message Date
b2a022b348 Add money_format function (#1064) 2019-04-29 18:31:24 +08:00
310a375aec Fix bug that null value is not correctly handled when loading data (#1070)
When the partition column's value is NULL, the row should be loaded into
    the partition that includes MIN VALUE
2019-04-29 13:55:28 +08:00
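The NULL-routing rule above amounts to treating NULL as smaller than every non-NULL key during range-partition lookup. This is a minimal C++ sketch that assumes a simplified model: `RangePartition` and `find_partition` are hypothetical names, not Doris's actual partition classes. A partition whose lower bound is MIN VALUE is represented by an empty `lower`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical range partition holding keys in [lower, upper).
// lower == std::nullopt models a partition that starts at MIN VALUE.
struct RangePartition {
    std::optional<int64_t> lower;
    int64_t upper;  // exclusive upper bound
    int id;
};

// Returns the id of the partition a key belongs to, or -1 if none matches.
// A NULL key (std::nullopt) goes to the partition containing MIN VALUE,
// i.e. NULL sorts before every non-NULL value.
int find_partition(const std::vector<RangePartition>& parts,
                   std::optional<int64_t> key) {
    for (const auto& p : parts) {
        if (!key.has_value()) {
            if (!p.lower.has_value()) return p.id;
            continue;
        }
        bool above_lower = !p.lower.has_value() || *key >= *p.lower;
        if (above_lower && *key < p.upper) return p.id;
    }
    return -1;  // no partition covers this key; a real loader would filter the row
}
```

With partitions `[MIN, 10)` and `[10, 20)`, a NULL key lands in the first one, just like the key 5.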
9c82d41981 Support Doris query ES by HTTP way (#925) 2019-04-28 17:14:44 +08:00
cf1e7aa844 Add close tablet writer log (#1014) 2019-04-28 10:33:50 +08:00
3409ed41ac Reset commit offset if task aborted due to runtime error (#994) 2019-04-28 10:33:50 +08:00
b7b66527ce Fix some load bugs (#961)
1. Use load job's timeout as its txn timeout
2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN stmt
2019-04-28 10:33:50 +08:00
2b4d02b2fa Add error load log url for routine load job (#938) 2019-04-28 10:33:50 +08:00
0cccb5cc9c Fix bugs of routine load job (#917)
1. An uninitialized counter caused endless data consumption.
2. Null values were handled incorrectly in column mapping.

* fix bug
2019-04-28 10:33:50 +08:00
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group so that multiple data consumers share a single stream load pipe. This increases the consumption speed of Kafka messages and reduces the number of tasks of a routine load job.

Test results:

* 1 consumer, 1 partition:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

The last two cases show that we can achieve higher speed by adding more consumers. However, the bottleneck then shifts from the Kafka consumers to Doris ingestion, so 3 consumers in a group is enough.

I also added a Backend config, `max_consumer_num_per_group`, which sets the number of consumers in a data consumer group. The default value is 3.

In my test (1 Backend, 2 tablets, 1 replica), a single routine load task achieved 10M/s, the same as a raw stream load.

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
2019-04-28 10:33:50 +08:00
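The consumer-group design in item 1 boils down to several writer threads feeding one bounded pipe, which a single stream load then reads. Here is a minimal C++ sketch under stated assumptions: `SharedPipe`, `run_group` and the capacity of 16 are hypothetical and are not Doris's actual pipe or data consumer group classes. The writers stand in for Kafka consumers. Time spent blocked in `put()` and `get()` corresponds to the "blocking put/get time" figures reported above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded pipe shared by several writers and a single reader.
// put() blocks while the pipe is full; get() blocks while it is empty.
class SharedPipe {
public:
    SharedPipe(size_t capacity, size_t num_writers)
        : capacity_(capacity), num_writers_(num_writers) {
        assert(capacity_ > 0);
    }

    void put(std::string msg) {
        std::unique_lock<std::mutex> lock(mu_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(msg));
        not_empty_.notify_one();
    }

    // Each writer calls this once when it has no more messages.
    void writer_done() {
        std::lock_guard<std::mutex> lock(mu_);
        ++done_writers_;
        not_empty_.notify_all();
    }

    // Returns std::nullopt once every writer is done and the pipe is drained.
    std::optional<std::string> get() {
        std::unique_lock<std::mutex> lock(mu_);
        not_empty_.wait(lock, [this] { return !q_.empty() || done_writers_ == num_writers_; });
        if (q_.empty()) return std::nullopt;
        std::string msg = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();  // one slot freed, so wake one blocked writer
        return msg;
    }

private:
    const size_t capacity_;
    const size_t num_writers_;
    size_t done_writers_ = 0;
    std::mutex mu_;
    std::condition_variable not_full_, not_empty_;
    std::queue<std::string> q_;
};

// Starts `num_consumers` writer threads that each push `msgs_per_consumer`
// messages into one shared pipe, and counts what the single reader receives.
size_t run_group(size_t num_consumers, size_t msgs_per_consumer) {
    SharedPipe pipe(16, num_consumers);
    std::vector<std::thread> consumers;
    for (size_t c = 0; c < num_consumers; ++c) {
        consumers.emplace_back([&pipe, c, msgs_per_consumer] {
            for (size_t i = 0; i < msgs_per_consumer; ++i) {
                pipe.put("consumer-" + std::to_string(c) + "-msg-" + std::to_string(i));
            }
            pipe.writer_done();
        });
    }
    size_t received = 0;
    while (pipe.get()) ++received;
    for (auto& t : consumers) t.join();
    return received;
}
```

For example, `run_group(3, 100)` should return 300, because every message from the three writers is read exactly once through the shared pipe.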
e8b360d193 Merge master and fix BE ut 2019-04-28 10:33:50 +08:00
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* Limit the maximum number of routine load tasks per Backend to 10
* Fix bug that some partitions were not assigned
2019-04-28 10:33:50 +08:00
8d2de42b36 Fix some routine load bugs (#787)
1. Preserve the column order in the load stmt.
2. Fix some replay bugs of routine load task.
2019-04-28 10:33:50 +08:00
9fa5e1b768 Add a cleaner bg thread to clean idle data consumer (#776) 2019-04-28 10:33:50 +08:00
8f781f95c7 Add persist operations for routine load job (#754) 2019-04-28 10:33:50 +08:00
8b52787114 Stream load with no data will abort txn (#735)
1. The stream load executor aborts the txn when the task contains no valid data
2. Change the txn label to DebugUtil.print(UUID), which matches the task id printed by the BE
3. Print UUIDs in hi-lo format
2019-04-28 10:33:50 +08:00
062f827b60 Add attachment in rollback txn (#725)
1. Initialize the commit offset in the stream load context
2. Set the default max error number to 5000 rows per 10000 rows
3. Add a log builder for routine load jobs and tasks
4. Clone the plan fragment params for every task
5. The BE no longer reports too many filtered rows when the initial max error ratio is 1
2019-04-28 10:33:50 +08:00
192c8c5820 Fix bug: a data consumer should be removed from the pool when it is acquired (#723) 2019-04-28 10:33:50 +08:00
8474061d63 Add some logs (#711) 2019-04-28 10:33:50 +08:00
fbbe0d19ba Change the relationship between txn and task (#703)
1. Check whether properties is null before checking routine load properties
2. Change transactionStateChange reason to string
3. Calculate current num by beId
4. Add Kafka offset properties
5. Prefer to use the previous BE id
6. Add a before-commit listener to the txn: if the txn is committed after its task has been aborted, the commit is aborted
7. Set the queryId of the stream load plan to the taskId
2019-04-28 10:33:50 +08:00
a2c3fb1507 Fix missing auth code setting (#699) 2019-04-28 10:33:50 +08:00
10cee6ecff Add missing files (#696) 2019-04-28 10:33:50 +08:00
567d5de2de Add a data consumer pool to reuse the data consumer (#691) 2019-04-28 10:33:50 +08:00
20b2b2c37f Modify interface (#684)
1. Add batch submit interface
2. Add Kafka Event callback to catch Kafka events
2019-04-28 10:33:50 +08:00
9618d20a72 Add unit test (#675) 2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00
da308da17c Fix bug that empty stream load return unexpected error msg (#1052) 2019-04-28 09:36:19 +08:00
e8397d17f3 Change length of CHAR type upon schema change (#1035)
CHAR is a fixed-length string type. When the length of a CHAR
column is modified, the slice size should equal the new length.
2019-04-25 21:28:32 +08:00
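The rule that a CHAR slice must match the column's new length can be shown with a small value-conversion helper. This is a minimal C++ sketch. It assumes that CHAR values are padded with `'\0'` and that the schema change only widens the column. `widen_char_value` is a hypothetical name, not a Doris function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: rebuild a CHAR(old_len) value as CHAR(new_len) so the
// slice size equals the new column length. Padding with '\0' is an assumption.
std::string widen_char_value(const std::string& value, size_t new_len) {
    assert(new_len >= value.size() && "sketch only handles widening");
    std::string out = value;
    out.resize(new_len, '\0');  // extend to the new fixed length
    return out;
}
```

Under these assumptions, converting the 3-byte value `"ab\0"` to CHAR(5) yields a 5-byte slice whose trailing bytes are `'\0'`.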
8031ab773b Support adding HLL column in schema change (#1033) 2019-04-25 16:09:19 +08:00
95a06dcd2a Change the buffer length for FloatToBuffer() method (#1019) 2019-04-24 20:11:06 +08:00
f852aa1cff Support enable_insert_strict (#1013)
Add a variable enable_insert_strict, whose default value is false. When
this value is set to true, an insert fails if any data is filtered.
When it is false, the insert ignores filtered data and
succeeds
2019-04-24 17:06:53 +08:00
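The `enable_insert_strict` behaviour described above is a single decision made once the filtered-row count is known. This is a minimal C++ sketch. `InsertResult`, `finish_insert` and the message strings are hypothetical and do not come from Doris.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical outcome of an INSERT statement.
struct InsertResult {
    bool ok;
    std::string message;
};

// With strict mode on, any filtered row fails the insert; with it off,
// filtered rows are ignored and the insert succeeds.
InsertResult finish_insert(bool enable_insert_strict, int64_t loaded_rows, int64_t filtered_rows) {
    assert(loaded_rows >= 0 && filtered_rows >= 0);
    if (enable_insert_strict && filtered_rows > 0) {
        return {false, "insert failed: " + std::to_string(filtered_rows) + " row(s) filtered"};
    }
    return {true, "inserted " + std::to_string(loaded_rows) + " row(s)"};
}
```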
7933cc2209 Make add_pending_segment_group() idempotent (#1001) 2019-04-23 17:33:45 +08:00
1cc2926a40 Fix wrong result in decimal type multiplication when carry is required (#980) 2019-04-21 17:12:58 +08:00
487c881092 Fix bug that a query caused a BE core dump when collecting query statistics (#970) 2019-04-19 17:08:49 +08:00
b9054f537b Support UDAF (#943) 2019-04-17 19:08:37 +08:00
b48a4ab6a0 Eliminate LOG(FATAL) when error occurs in ColumnData (#935) 2019-04-16 18:27:31 +08:00
e5a5b6da16 Fix concat_ws returning NULL when an argument is NULL (#923)
#918
2019-04-15 19:54:02 +08:00
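For reference, MySQL's `CONCAT_WS` returns NULL only when the separator is NULL and skips NULL arguments. The sketch below assumes Doris aims for that same behaviour after this fix. It is a minimal C++ illustration in which `concat_ws` is a hypothetical free function, not Doris's built-in implementation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// MySQL-style CONCAT_WS: a NULL separator yields NULL, while NULL arguments
// are skipped instead of making the whole result NULL.
std::optional<std::string> concat_ws(const std::optional<std::string>& sep,
                                     const std::vector<std::optional<std::string>>& args) {
    if (!sep) return std::nullopt;
    std::string out;
    bool first = true;
    for (const auto& arg : args) {
        if (!arg) continue;  // skip NULL arguments
        if (!first) out += *sep;
        out += *arg;
        first = false;
    }
    return out;
}
```

Under these semantics, `concat_ws(",", {"a", NULL, "b"})` yields `"a,b"`, not NULL.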
d3251a19f7 Modify the method to obtain some metrics (#904) 2019-04-10 19:37:48 +08:00
9c1de6ce38 Fix HLL compaction bug. (#901)
1. Cumulative compaction of HLL data could core dump because of a null pointer
2019-04-10 10:37:23 +08:00
29f6665826 Change pass-by-value to pass-by-reference (#884) 2019-04-08 10:44:23 +08:00
92f17afb69 Fix a problem which may cause nullptr in ResultBufferMgr (#882)
Because no lock is held between the two calls to findBlock(),
even if the first call returns non-null, the second call
may still return a null block
2019-04-06 14:18:34 +08:00
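The race described above is a classic check-then-act bug: one lookup tests for null and a second lookup uses the result, while another thread can remove the entry in between. The usual remedy is to look the block up once under the lock and hold the returned `shared_ptr`. This is a minimal C++ sketch of that pattern. `ToyResultBufferMgr`, `find_block` and `fetch` are hypothetical names and do not reproduce Doris's actual `ResultBufferMgr`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct BufferControlBlock {
    std::string query_id;
};

// Hypothetical manager mapping query ids to result buffers.
class ToyResultBufferMgr {
public:
    void create(const std::string& id) {
        std::lock_guard<std::mutex> lock(mu_);
        blocks_[id] = std::make_shared<BufferControlBlock>(BufferControlBlock{id});
    }

    void cancel(const std::string& id) {
        std::lock_guard<std::mutex> lock(mu_);
        blocks_.erase(id);
    }

    // Single locked lookup; the returned shared_ptr keeps the block alive.
    std::shared_ptr<BufferControlBlock> find_block(const std::string& id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = blocks_.find(id);
        return it == blocks_.end() ? nullptr : it->second;
    }

    // Racy pattern avoided here: `if (find_block(id)) return find_block(id)->query_id;`
    // Instead, call find_block() once and use the held result.
    std::string fetch(const std::string& id) {
        std::shared_ptr<BufferControlBlock> block = find_block(id);
        if (block == nullptr) return "";
        return block->query_id;  // still valid even if cancel(id) runs concurrently
    }

private:
    std::mutex mu_;
    std::map<std::string, std::shared_ptr<BufferControlBlock>> blocks_;
};
```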
c0fbc84381 Fix bug that ScanBytes is when collect executing query's infos (#869) 2019-04-03 18:27:50 +08:00
71380d436f Remove unnecessary check (#862) 2019-04-02 15:54:48 +08:00
a1bfc90320 Support hll_raw_agg in Aggregate Function (#832)
The hll_raw_agg function aggregates HLL values and returns an HLL value
2019-04-01 16:17:56 +08:00
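Aggregating HyperLogLog values into a single HLL (rather than a cardinality estimate) is, in the standard algorithm, a register-wise maximum across the sketches. This is a minimal C++ sketch of that idea. It assumes a plain dense register array, which is not Doris's actual HLL serialization. `HllRegisters`, `hll_merge` and this `hll_raw_agg` helper are hypothetical illustrations of the technique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed dense HLL representation: one uint8_t register per bucket.
using HllRegisters = std::vector<uint8_t>;

// Union of two HLL sketches with the same register count: register-wise max.
HllRegisters hll_merge(const HllRegisters& a, const HllRegisters& b) {
    assert(a.size() == b.size());
    HllRegisters out(a.size());
    for (size_t i = 0; i < a.size(); ++i) out[i] = std::max(a[i], b[i]);
    return out;
}

// Aggregate a group of HLL values into one HLL value (not a count).
HllRegisters hll_raw_agg(const std::vector<HllRegisters>& values, size_t num_registers) {
    HllRegisters acc(num_registers, 0);
    for (const auto& v : values) acc = hll_merge(acc, v);
    return acc;
}
```

Because the result is still an HLL value, it can be stored or merged again later. A cardinality estimate is computed only when one is actually needed.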
e409c2edc3 Optimize error handling of rocksdb (#841) 2019-04-01 14:42:59 +08:00
348c61c69f Fix doris on es bug (#826)
* Get in pred from hybridset

* Ignore new_filter_in when pushing down

* Ignore cast case in to_ext_literal
2019-03-28 12:54:17 +08:00
f4a63b29d8 Fix doris on es bug (#791) 2019-03-22 19:03:27 +08:00
c34b306b4f Decimal optimize branch #695 (#727) 2019-03-22 17:22:16 +08:00
e60b71da8c Release SegmentGroup reference count (#790)
In streaming ingestion, the segment group's reference count is set to one on creation.
Upon closing, this reference should be released. Otherwise, the
file descriptor and the in-memory segment group object cannot be freed.
2019-03-22 14:17:05 +08:00
4d8f0dc203 Fix add_version() core dump on acquiring delta (#788)
SchemaChange converts segment groups in reverse order,
so the SegmentGroup with segment_group_id = 1 may be handled
before the SegmentGroup with segment_group_id = 0.
As a result, the delta being acquired may not have been allocated yet,
which causes a core dump with SIGSEGV.
2019-03-21 20:58:01 +08:00
11307b23c8 Fix bug: stream load ignores the last line when it has no trailing newline (#785)
#783
2019-03-21 19:18:22 +08:00
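The bug above is a common line-splitting mistake: emitting only the text that ends in `'\n'` silently drops a final line that has no terminator. This is a minimal C++ sketch of the corrected behaviour on an in-memory buffer. `split_lines` is a hypothetical helper. A real stream load reader processes data in chunks, so it would also need to carry a partial line across chunk boundaries and flush it at end of stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits newline-delimited data, keeping a final line that lacks a trailing '\n'.
std::vector<std::string> split_lines(const std::string& data) {
    std::vector<std::string> lines;
    size_t start = 0;
    while (start < data.size()) {
        size_t nl = data.find('\n', start);
        if (nl == std::string::npos) {
            lines.push_back(data.substr(start));  // last line with no newline
            break;
        }
        lines.push_back(data.substr(start, nl - start));
        start = nl + 1;
    }
    return lines;
}
```

With this approach, `"1,a\n2,b"` and `"1,a\n2,b\n"` both produce two rows.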