Commit Graph

57 Commits

Author SHA1 Message Date
7550b2f09b Convert mini load to streaming mini load (#1323)
* This commit has brought contribution to streaming mini load
The operation of streaming mini load is sames as previous. Also, user can check the load by frontend.
The difference is that streaming mini load finish the task before reply of REST API while the non-streaming only register a load.

* When updating doris
Updating fe or be firstly are also supported. After fe and be are updated, the streaming mini load will take effect.

* For multi mini load
The non-streaming mini load still has been used by multi mini load. The behavior of multi mini load has not been changed.

* Add a interface named isSupportedFunction
This function is used to protect the correctness of new feature which consists of be and fe during updaing.
2019-06-21 19:34:50 +08:00
6afedb88a8 Add bitshuffle page (#1304) 2019-06-19 21:57:06 +08:00
7f1720b632 Add rle encoding (#1326) 2019-06-18 14:48:33 +08:00
a0294b8f40 Add Env for file operation (#1321) 2019-06-17 10:18:16 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
e9b2d30c6a Add faststring and cpu util (#1281) 2019-06-12 14:00:50 +08:00
84632cd062 Add BitMapIterator (#1277) 2019-06-11 09:23:02 +08:00
d1b1fce92f Change LICENSE file (#1265) 2019-06-09 15:55:46 +08:00
3e1c70d1b7 Add coding function (#1264) 2019-06-08 21:02:31 +08:00
934ca2481a Make MySQL support optional (#1248) 2019-06-05 12:28:15 +08:00
77a1b31baa Add show load of loadv2 (#1113)
This change include the show load of loadv2 and some bug fix of loadv2.
Firstly, the show load will perform both load and loadv2 info. According to loadv2, the ETL progress of loadv2 is N/A during the period of loading.
Secondly, the loadv2 will be created when version of property is v2.
This is a temporary property which will not influence the old broker load.
After the loadv2 is finished, the default load will be changed to loadv2.
Finally, there are some bug in LoadingTaskPlanner fixed by this change.
2019-05-09 10:27:30 +08:00
a08170fd50 Enhance the usabilities (#1100)
* Enhence the usabilities

1. Add metrics to monitor transactions and steaming load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more log for tracing broker load process.
5. Modify the query report process, to cancel query immediately if some instance failed.

* Fix bugs
1. Avoid NullPointer when enabling colocation join with broker load
2. Return immediately when pull load task coordinator execution failed
2019-05-07 15:55:04 +08:00
afa3aa9069 Add some pre-calculated metrics (#1079)
1. max io util of disks
2. max network send/receive bytes rate of all network devices
3. base/cumulative compaction request counter and failure counter
2019-04-30 11:12:23 +08:00
9c82d41981 Support Doris query ES by HTTP way (#925) 2019-04-28 17:14:44 +08:00
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group to share a single stream load pipe with multi data consumers. This will increase the consuming speed of Kafka messages, as well as reducing the task number of routine
load job. 

Test results:

* 1 consumer, 1 partitions:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

The next 2 cases show that we can achieve higher speed by adding more consumers. But the bottle neck transfers from Kafka consumer to Doris ingestion, so 3 consumers in a group is enough.

I also add a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group, and default value is 3.

In my test(1 Backend, 2 tablets, 1 replicas), 1 routine load task can achieve 10M/s, which is same as raw stream load.

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
2019-04-28 10:33:50 +08:00
8b52787114 Stream load with no data will abort txn (#735)
1. stream load executor will abort txn when no correct data in task
2. change txn label to DebugUtil.print(UUID) which is same as task id printed by be
3. change print uuid to hi-lo
2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00
95a06dcd2a Change the buffer length for FloatToBuffer() method (#1019) 2019-04-24 20:11:06 +08:00
d3251a19f7 Modify the method to obtain some metrics (#904) 2019-04-10 19:37:48 +08:00
c34b306b4f Decimal optimize branch #695 (#727) 2019-03-22 17:22:16 +08:00
a9e9aef3ca Fix sync and trash bug (#570)
1. no need to save header when header has no incremental delta
2. make fsync tablet_meta configurable
3. add metric for meta operation
2019-01-22 20:13:09 +08:00
0dd4c6e0a0 Fix ASAN compilation issue (#561) 2019-01-21 13:09:45 +08:00
79dc521893 Fix UnixMicros function (#544) 2019-01-16 17:26:43 +08:00
a51ce03595 Enhance the usability of Load operation (#490)
1. Add broker load error hub
A broker load error hub will collect error messages in load process and saves them as a file to the specified remote storage via broker. In case that in broker/min/streaming load process, user may not be able to access the error log file in Backend directly.
We also add a new header option: 'enable_hub' in streaming load request, and default is false. Because if we enable the broker load error hub, it will significantly slow down the processing speed of streaming load, due to the visit of remote storage via broker. So use can disable the error load hub using this header option, to avoid slowing down the load speed.

2. Show load error logs by using SHOW LOAD WARNINGS stmt
We also provide a more easy way to get load error logs. We implement 'SHOW LOAD WARNINGS ON 'url'' stmt to show load error logs directly. The 'url' in stmt is provided in 'SHOW  LOAD' stmt.
eg:
show load warnings on "http://192.168.1.1:8040/api/_load_error_log?file=__shard_2/error_log_xxx";

3. Support now() function in broker load
User can mapping a column to now() in broker load stmt, which means this column will be filled with time when the ETL started.

4. Support more types of wildcard in broker load
Currently, we only support wildcard '*' to match the file names. wildcard like '/path/to/20190[1-4]*' is not support.
2019-01-03 19:07:27 +08:00
ff7d3e5878 Unify the print method of TUniqueId (#487) 2018-12-29 16:22:38 +08:00
c74c915441 Fix license issue of gutil (#471) 2018-12-27 13:53:27 +08:00
b037466d56 Get rid of choosing one tablet by compaction (#433)
1. Get rid of choosing one tablet by compaction.
2. Change PREFER_READER to PREFER_WRITING from _tablet_map_lock.
3. Change license of murmur_hash
2018-12-24 16:55:39 +08:00
f1db289934 Fix compile error (#461) 2018-12-24 10:26:03 +08:00
90d71508ff Add UserFunctionCache to cache UDF's library (#453)
* Add UserFunctionCache to cache UDF's library

This patch replace LibCache with UserFunctionCache. LibCache use HDFS
URL to identify a UDF's Library, and when BE process restart all of
downloaded library should be loaded another time. We use function id
corresponding to a library, and when process restart, all downloaded
libraries can be loaded without another downloading.

* update
2018-12-21 22:07:21 +08:00
e177c23787 Update the increased frequency of priority for remaining tasks in BlockingPriorityQueue (#434) 2018-12-17 21:05:57 +08:00
e2bb86cf78 Add Md5Digest to util (#420) 2018-12-12 20:06:35 +08:00
6b4049e21c Unify Slice code path (#380) 2018-12-03 18:11:47 +08:00
9a2ad18428 Add path info of replica in catalog (#327)
Add path info of replica in catalog

Also fix a bug that when calling check_none_row_oriented_table,
store is null, it cannot be used to create table.
Instead, OLAPHeader can be used to get storage type information.
2018-11-19 17:42:46 +08:00
a2b299e3b9 Reduce UT binary size (#314)
* Reduce UT binary size

Almost every module depend on ExecEnv, and ExecEnv contains all
singleton, which make UT binary contains all object files.

This patch seperate ExecEnv's initial and destory to anthor file to
avoid other file's dependence. And status.cc include debug_util.h which
depend tuple.h tuple_row.h, and I move get_stack_trace() to
stack_util.cpp to reduce status.cc's dependence.

I add USE_RTTI=1 to build rocksdb to avoid linking librocksdb.a

Issue: #292

* Update
2018-11-15 16:17:23 +08:00
063f7d7a9a Fix code LICENSE for file modified from LevelDB. (#300) 2018-11-12 16:09:40 +08:00
2081b7fea5 Be compatible with old RPC (#296)
Add palo.PInternalService which can server old version palo's client.

Issue: #293
2018-11-10 15:46:45 +08:00
c877b43013 Remove my aes and fix palo ns to doris (#277) 2018-11-02 17:05:48 +08:00
d57e91db6e Rewrite aes encryption (#264)
Resolve #257
2018-11-02 15:26:31 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
In last commit, a lot of files has been missed
2018-10-31 16:19:21 +08:00
5d3fc80067 Added:
* Add streaming load feature. You can execute 'help stream load;' to see more information.

Changed:
* Loading phase of a certain table can be parallelized, to reduce the load job execution time when multi load jobs to a single table.
* Using RocksDB to save the header info of tablets in Backends, to reduce the IO operations and increate speeding of restarting.

Fixed:
* A lot of bugs fixed.
2018-10-31 14:46:22 +08:00
765c91bbc2 Added: change Doris build.sh to get environment variables from
custom_env.sh, and add run-ut.sh and run-fe-ut.sh
2018-10-30 23:42:05 +08:00
ae9ce81453 Changed: change build.sh to use environment variable to get thirdparty's
path, and change PALO_HOME to DORIS_HOME
2018-10-30 16:29:06 +08:00
4f6f8572de Added: Add 3 new metrics of Backends: host_fd_metrics, process_fd_metrics and process_thread_metrics, to monitor open file number and thread number.
Added: Support getting column size and precision info of table or view using JDBC.

Updated: Change the promethues type name GAUGE to lowercase, to fit the latest promethues version.
Updated: Backend ip saved in FE will be compared with BE's local ip when doing heartbeat, to avoid false positive heartbeat response.
Updated: Using version_num of tablet instead of calculating nice value to select cumulative compaction candicates.

Fixed: Predicates should not be pushed down to subquery which contains limit clause.
Fixed: Fix the formula of calculating BE load score.
Fixed: Fix a bug that in some edge cases, non-master Fontend may wait for a unnecessary long timeout after forwarding cmd to Master FE.
Fixed: A bug that granting privs on more than one table does not work.
Fixed: Support 'Insert into' table which contains HLL columns.
Fixed: ExportStmt' toSql() method may throw NullPointer Exception if table does not exist.
Fixed: Remove unnecessary 'get capacity' operation to avoid IO impact.

Internal commit id: merge to c16bd603a53dfe2089ff95704c698a738c317792
2018-10-26 14:48:21 +08:00
cc74efb3c5 merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224)
1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication.
2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS.
3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user.
4. A lot of bugs fixed.
5. Performance improvement.
2018-08-24 17:12:26 +08:00
19997510a6 merge to 9625ef157dd44c58802d63cb7547f037b75fd710 (#208)
1. Implement Backend http server using libevent instead of mongoose.
2. Remove Old Hypertable rpc framework, use brpc instead.
3. Change rpc from FE to BE to brpc.
4. Fs broker support HDFS HA.
5. add more metrics to monitor.
6. Lots of bug fixed.
2018-07-17 09:20:30 +08:00
bfe1bc7cf3 remove mysql_dtoa 2018-06-13 08:33:48 +08:00
7e2a3aa1b3 modify the license (#203)
some license is replaced not correctly.
2018-06-09 19:12:16 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00