Commit Graph

61 Commits

Author SHA1 Message Date
2b4d02b2fa Add error load log url for routine load job (#938) 2019-04-28 10:33:50 +08:00
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* Limit the max number of routine load tasks in backend to 10
* Fix bug that some partitions will not be assigned
2019-04-28 10:33:50 +08:00
8474061d63 Add some logs (#711) 2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00
da308da17c Fix bug that empty stream load return unexpected error msg (#1052) 2019-04-28 09:36:19 +08:00
c0fbc84381 Fix bug that ScanBytes is when collect executing query's infos (#869) 2019-04-03 18:27:50 +08:00
348c61c69f Fix doris on es bug (#826)
* Get in pred from hybridset

* ignore new_filter_in when push down

* Ignore cast case in to_ext_literal
2019-03-28 12:54:17 +08:00
f4a63b29d8 Fix doris on es bug (#791) 2019-03-22 19:03:27 +08:00
c34b306b4f Decimal optimize branch #695 (#727) 2019-03-22 17:22:16 +08:00
11307b23c8 Fix bug: stream load ignores the last line when it has no trailing newline (#785)
#783
2019-03-21 19:18:22 +08:00
7965a7129a Add esquery function (#652) 2019-03-08 09:27:41 +08:00
397747af2c Fix bug that push down the predicates past AggregateNode (#658) 2019-02-26 10:55:14 +08:00
aba1b9e5d6 Reopen the thrift client when an exception occurs (#610)
This avoids reusing a broken connection.
2019-01-31 16:54:49 +08:00
af445b6cc2 Optimize something (#607)
1. Unify the thrift rpc timeout from BE to FE.
    Add a BE config 'thrift_rpc_timeout_ms', which defaults to 5000
2. Add hostname in "show proc '/frontends';" stmt result.
3. Fix a lock order bug in Load.java
2019-01-31 13:30:45 +08:00
4f3954fd77 Fix bug that recvr thread updates sub plan's QueryStatistics when it is destructed (#573) 2019-01-23 17:00:40 +08:00
f7155217bf Remove build rows counter in PartitionHashJoinNode (#557)
* Remove build rows counter in PartitionHashJoinNode
* Fix unit test failure in RuntimeProfileTest
* Add check for result type length in cast_to_string_val
2019-01-21 14:08:59 +08:00
717285db1e Remove unused code about showing current queries (#552) 2019-01-18 09:53:40 +08:00
4d5f92cce7 Add EsScanNode (#450) 2019-01-17 17:59:33 +08:00
0e5b193243 Add cpu and io indicators to audit log (#531) 2019-01-17 12:43:15 +08:00
e8360f5eee Add counters to OlapScanNode (#538)
Converting a VectorRowBatch to a RowBatch has a non-negligible cost.
When seeking a block, we now read only one row from the engine to
minimize this conversion cost.

This patch reduces the time of some queries from 5s to 2s.
2019-01-16 18:57:04 +08:00
d372b04e42 Revert "Add cpu and io indicators to audit log (#513)" (#520)
This reverts commit 5192e2f010308eefffa5271b0bdc947dfd9168ae.
2019-01-10 12:44:09 +08:00
5192e2f010 Add cpu and io indicators to audit log (#513)
Record query consumption in the FE audit log. It works as follows: one instance of each parent plan accumulates the consumption of its sub plans and sends the total to its own parent. The BE coordinator obtains the total consumption because it is a single instance.
2019-01-09 22:28:20 +08:00
92b138121b Support io and cpu indicators for current query (#497)
Help locate big queries when the system is overloaded by checking the consumption of the running parts of all current queries or of one specified query. It works as follows: first, trigger BE to report RuntimeProfiles and wait a moment; second, calculate the consumption from the RuntimeProfiles reported by BE. The consumption reported is the cost of the ExecNodes running in the query at the time of the call.
2019-01-08 10:59:42 +08:00
a51ce03595 Enhance the usability of Load operation (#490)
1. Add broker load error hub
A broker load error hub collects error messages during the load process and saves them as a file to the specified remote storage via broker, because in a broker/mini/streaming load process the user may not be able to access the error log file on the Backend directly.
We also add a new header option, 'enable_hub', to the streaming load request; its default is false. Enabling the broker load error hub significantly slows down streaming load because it accesses remote storage via broker, so users can disable the error hub with this header option to avoid slowing down the load.

2. Show load error logs by using SHOW LOAD WARNINGS stmt
We also provide an easier way to get load error logs: the 'SHOW LOAD WARNINGS ON 'url'' stmt shows load error logs directly. The 'url' in the stmt is provided by the 'SHOW LOAD' stmt.
eg:
show load warnings on "http://192.168.1.1:8040/api/_load_error_log?file=__shard_2/error_log_xxx";

3. Support now() function in broker load
Users can map a column to now() in a broker load stmt, which means the column will be filled with the time when the ETL started.

4. Support more types of wildcard in broker load
Currently, we only support the wildcard '*' to match file names. A wildcard like '/path/to/20190[1-4]*' is not supported.
2019-01-03 19:07:27 +08:00
74cc5c5404 Write summary line in load error file anyway (#425)
The summary line should be written regardless of the error number limit
2018-12-13 12:35:19 +08:00
530bdec020 Fix bug that null value will be loaded to non-nullable column (#401)
* Fix bug that null value will be loaded to non-nullable column

* Optimize performance
2018-12-06 19:55:55 +08:00
e913e45343 Fix bug that null value will be loaded to non-nullable column (#397) 2018-12-06 10:09:34 +08:00
6b4049e21c Unify Slice code path (#380) 2018-12-03 18:11:47 +08:00
33873f2446 Fix wrong query result when column value is Null (#344) 2018-11-26 13:36:23 +08:00
fec3c58655 Change log verbose level to vlog(3) (#325)
* Transform row-oriented table to columnar-oriented table

* Transform row-oriented table to columnar-oriented table

* change log verbose level
2018-11-16 17:17:39 +08:00
1ba8a4ee4e Transform row-oriented table to columnar-oriented table (#311) 2018-11-16 16:03:56 +08:00
a2b299e3b9 Reduce UT binary size (#314)
* Reduce UT binary size

Almost every module depends on ExecEnv, and ExecEnv contains all
singletons, which makes each UT binary contain all object files.

This patch separates ExecEnv's initialization and destruction into another
file to avoid dependencies from other files. Also, status.cc includes
debug_util.h, which depends on tuple.h and tuple_row.h, so I move
get_stack_trace() to stack_util.cpp to reduce status.cc's dependencies.

I add USE_RTTI=1 to build rocksdb to avoid linking librocksdb.a

Issue: #292

* Update
2018-11-15 16:17:23 +08:00
2081b7fea5 Be compatible with old RPC (#296)
Add palo.PInternalService, which can serve clients of old Palo versions.

Issue: #293
2018-11-10 15:46:45 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
A lot of files were missed in the last commit
2018-10-31 16:19:21 +08:00
5d3fc80067 Added:
* Add streaming load feature. You can execute 'help stream load;' to see more information.

Changed:
* The loading phase of a table can be parallelized, to reduce load job execution time when multiple load jobs target a single table.
* Use RocksDB to save the header info of tablets in Backends, to reduce IO operations and speed up restarts.

Fixed:
* A lot of bugs fixed.
2018-10-31 14:46:22 +08:00
4f6f8572de Added: Add 3 new metrics of Backends: host_fd_metrics, process_fd_metrics and process_thread_metrics, to monitor open file number and thread number.
Added: Support getting column size and precision info of table or view using JDBC.

Updated: Change the Prometheus type name GAUGE to lowercase, to fit the latest Prometheus version.
Updated: Backend ip saved in FE will be compared with BE's local ip when doing heartbeat, to avoid false positive heartbeat response.
Updated: Using version_num of tablet instead of calculating nice value to select cumulative compaction candidates.

Fixed: Predicates should not be pushed down to subquery which contains limit clause.
Fixed: Fix the formula of calculating BE load score.
Fixed: Fix a bug that in some edge cases, a non-master Frontend may wait for an unnecessarily long timeout after forwarding cmd to Master FE.
Fixed: A bug that granting privs on more than one table does not work.
Fixed: Support 'Insert into' table which contains HLL columns.
Fixed: ExportStmt's toSql() method may throw NullPointerException if table does not exist.
Fixed: Remove unnecessary 'get capacity' operation to avoid IO impact.

Internal commit id: merge to c16bd603a53dfe2089ff95704c698a738c317792
2018-10-26 14:48:21 +08:00
65fe7f65c1 Fixed: privilege logic error:
    1. No one can set the root password except the root user itself.
    2. NODE_PRIV cannot be granted.
    3. ADMIN_PRIV and GRANT_PRIV can only be granted or revoked on *.*
    4. No one can modify privs of default role 'operator' and 'admin'.
    5. No user can be granted to role 'operator'.
Fixed: the running load limit should not be applied to replay logic. Doing so causes replay or image loading to fail.
Changed: optimize the problem of too many directories under mini load directory.
Fixed: missing password and auth check when handling mini load request in Frontend.
Fixed: DomainResolver should start after Frontends transfer to a certain ROLE, not in Catalog construction methods.
Fixed: a bug that no one can set the password for the root user. Now only the root user can set the root password.
Fixed: read null data twice
    When reading data with a null value, in some cases the same data will be read twice by the storage engine,
    resulting in a wrong result. The cause is that, when splitting
    and the start key is the minimum value, the data with null is read.
Fixed: add a flag to prevent the DomainResolver thread from starting twice.
Fixed: fixed a mem leak of using ByteBuf when parsing auth info of http request.
Fixed: add a new config 'disable_hadoop_load', default is false, set to true to disable hadoop load.
Changed: add detail error msg of submitting hadoop load job in show load result.
Fixed: Backend process should crash if it fails to save the header.
Added: expose backend info to the user when an error is encountered on Backend, to make debugging more convenient.
Fixed: Should remove fd from map when inputstream or outputstream is closed in Broker process.
Fixed: Change all files' LF to unix format.

Internal commit id: merge from dfcd0aca18eed9ff99d188eb3d01c60d419be1b8
2018-10-01 19:58:41 +08:00
bea10e4f06 1. hide password and other sensitive information in log and audit log
2. add 2 new proc '/current_queries' and '/current_backend_instances' to monitor the current running queries.
3. add a manual compaction api on Backend to trigger cumulative or base compaction manually.
4. add Frontend config 'max_bytes_per_broker_scanner' to limit the bytes per broker scanner. This is to limit the memory cost of a single broker load job
5. add Frontend config 'max_unfinished_load_job' to limit load job number: if the number of running load jobs exceeds the limit, no more load jobs are allowed to be submitted.
6. a lot of bugs fixed
2018-09-19 20:04:01 +08:00
cc74efb3c5 merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224)
1. Apache HDFS broker supports HDFS HA and Hadoop kerberos authentication.
2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS.
3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user.
4. A lot of bugs fixed.
5. Performance improvement.
2018-08-24 17:12:26 +08:00
19997510a6 merge to 9625ef157dd44c58802d63cb7547f037b75fd710 (#208)
1. Implement Backend http server using libevent instead of mongoose.
2. Remove Old Hypertable rpc framework, use brpc instead.
3. Change rpc from FE to BE to brpc.
4. Fs broker supports HDFS HA.
5. add more metrics to monitor.
6. Lots of bugs fixed.
2018-07-17 09:20:30 +08:00
9f7b1ea6d4 merge to 87fd4ebd9977afb1e1193429dd75c7c82caab204 (#202)
1. fix bugs in query layer.
2. remove some redundant code in BE
3. support specifying multiple helper nodes when starting FE
4. add proc 'cluster_load_statistic' to show load balance situation of Palo
2018-06-08 08:42:23 +08:00
c4be57150b merge to 6da2dd322d34810ef6f12ebc0f870d89f55df140 (#200)
1. rewrite metric module, add disk capacity metric
2. add Cluster load statistic proc to observe cluster load balance status
3. fix bug: show table status from db throws NullPointerException
4. performance: change push_down table size to 1024 in HashJoinNode
2018-06-04 14:51:11 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00
5de798fdd6 Merge code to github (#187)
* merge to 95787f8be1fd0ff215708fb0f49997b632876586
* Bugs fixed
2018-03-23 14:04:55 +08:00
838290db88 use synchronized to protect fs stream in broker (#171) 2018-01-04 19:30:48 +08:00
32c5570771 fix some bugs (#169)
* change broker's default log level to INFO. fix some log error

* change exporting data via broker in batch
2018-01-02 19:50:15 +08:00
4b4a52369e improve logic of sending to broker (#166)
1. Increase default timeout to 5s to avoid errors in network jitter scenarios;
2. do not re-send the request on TTransportException
2017-12-27 21:05:15 +08:00
756f9bdb6c exchange check error when parent child is union (#158) 2017-12-18 11:20:39 +08:00