79 Commits

Author SHA1 Message Date
615c979727 Fix bug that BE crashes when inserting null value to non-nullable columns (#1447) 2019-07-10 09:20:09 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
1422414e43 Add varchar column name to stream load error msg (#1366) 2019-06-24 14:52:59 +08:00
687d57be66 Fix bug that query statistics in audit log are wrong (#1354) 2019-06-21 19:16:05 +08:00
ba44249f80 Remove unused code (#1320) 2019-06-15 20:41:48 +08:00
30028bc35b Deny specifying a partition for an unpartitioned table (#1319) 2019-06-15 18:19:56 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
53062122ea Change strategy of incorrect data (#1255)
This change adds a load property named strict_mode, which controls how incorrect data is handled.
When it is set to false, incorrect data is loaded as NULL, just like before.
When it is set to true, incorrect data that belongs to a column without an expr is filtered out.
strict_mode is supported in broker load v2 now. It will be supported in stream load later.
2019-06-10 20:39:45 +08:00
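The strict_mode behaviour described in this commit can be sketched as follows. This is a minimal Python illustration, not Doris code: `load_rows`, `converters`, and `expr_columns` are hypothetical names, and treating a failed type conversion as "incorrect data" (and filtering the whole row) is an assumption made for the example.

```python
def load_rows(rows, converters, expr_columns, strict_mode):
    """Return (loaded, filtered) for a batch of raw string rows.

    Hypothetical sketch: a value counts as "incorrect" when its converter raises.
    """
    loaded, filtered = [], []
    for row in rows:
        converted = {}
        reject = False
        for column, raw in row.items():
            try:
                converted[column] = converters[column](raw)
            except (TypeError, ValueError):
                # Incorrect data becomes NULL (None), as in the old behaviour.
                converted[column] = None
                # In strict mode, a bad value in a column without an expr
                # causes the row to be filtered (assumption: whole row).
                if strict_mode and column not in expr_columns:
                    reject = True
        if reject:
            filtered.append(row)
        else:
            loaded.append(converted)
    return loaded, filtered


rows = [{"id": "1", "age": "abc"}, {"id": "2", "age": "30"}]
converters = {"id": int, "age": int}

lenient = load_rows(rows, converters, expr_columns=set(), strict_mode=False)
strict = load_rows(rows, converters, expr_columns=set(), strict_mode=True)
```

With strict_mode off, the first row is loaded with `age` set to None; with it on, that row ends up in the filtered list instead.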
e4e04e8203 Make LZO support optional (#1263) 2019-06-07 22:26:54 +08:00
934ca2481a Make MySQL support optional (#1248) 2019-06-05 12:28:15 +08:00
9f5f44ec48 Reduce memory RowBlock needed (#1238)
Previously, RowBlock reserved memory for all columns in the schema, even
those that were not queried, which caused bad performance when querying
wide tables.

In this patch, RowBlock reserves memory only for the needed columns. In one
case, this reduces ConvertBatchTime from 10s to 60ms when querying a wide
table that has 178 columns.

 #1236
2019-06-04 12:58:41 +08:00
85b4619d54 Change insert into to streaming (#1191)
The non-streaming hint of insert into now uses the streaming plan, which is the same as the plan of stream insert.
It also records the load info and returns the label of the insert stmt.
Partitions are supported in the insert into stmt. Only result rows that match the target partitions are loaded.
The example introduction has been changed, especially for non-streaming insert.
Also, the partition_names param is added to the SQL syntax; it declares the target partitions in the target table.

Change META_VERSION to 50
2019-05-23 20:53:30 +08:00
02f36c23ed Set tablet as bad when loading index failed (#1146)
The bad tablet will be reported to FE and handled.

Also add a config, auto_recover_index_loading_failure, to control how index loading failures are processed.
2019-05-13 10:22:04 +08:00
79ab7f4413 Change label of broker load txn (#1134)
* Change label of broker load txn

1. put broker load label into txn label
2. fix the bug of `label is already used`
3. fix partition error of new broker load

* Fix count error in mini load and broker load

Three params (num_rows_load_total, num_rows_load_filtered, num_rows_load_unselected) are used to compute dpp.norm.ALL and dpp.abnorm.ALL.
num_rows_load_total is the number of rows in the source file.
num_rows_load_unselected is the number of rows in num_rows_load_total that do not satisfy the where conjuncts.
num_rows_load_filtered is the number of rows in (num_rows_load_total - num_rows_load_unselected) whose quality is not good enough.
2019-05-10 16:53:46 +08:00
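The relationship between the three counters can be sketched in Python. The commit message does not spell out the exact formulas, so the two methods below are assumptions: normal rows are taken to be the selected rows that passed the quality check, and abnormal rows are taken to be the filtered ones. `LoadCounters` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class LoadCounters:
    num_rows_load_total: int        # rows in the source file
    num_rows_load_unselected: int   # rows rejected by the where conjuncts
    num_rows_load_filtered: int     # selected rows with insufficient quality

    def dpp_norm_all(self) -> int:
        # ASSUMPTION: normal rows = total - unselected - filtered.
        return (self.num_rows_load_total
                - self.num_rows_load_unselected
                - self.num_rows_load_filtered)

    def dpp_abnorm_all(self) -> int:
        # ASSUMPTION: abnormal rows = rows filtered for quality reasons.
        return self.num_rows_load_filtered


counters = LoadCounters(num_rows_load_total=100,
                        num_rows_load_unselected=20,
                        num_rows_load_filtered=5)
```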
afa3aa9069 Add some pre-calculated metrics (#1079)
1. max io util of disks
2. max network send/receive bytes rate of all network devices
3. base/cumulative compaction request counter and failure counter
2019-04-30 11:12:23 +08:00
310a375aec Fix bug that null value is not correctly handled when loading data (#1070)
When the partition column's value is NULL, the row should be loaded into
    the partition that includes MIN VALUE
2019-04-29 13:55:28 +08:00
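The rule in this fix can be illustrated with a small range-partition lookup. This is a Python sketch under stated assumptions, not the Doris implementation: `pick_partition` is a hypothetical helper, partitions are described only by sorted exclusive upper bounds, and the first partition is assumed to start at MIN VALUE.

```python
import bisect


def pick_partition(value, upper_bounds, names):
    """Pick a range partition for `value`.

    upper_bounds: sorted exclusive upper bounds, one per partition.
    names: partition names in the same order.
    ASSUMPTION: the first partition's lower bound is MIN VALUE.
    """
    if value is None:
        # NULL is routed to the partition that includes MIN VALUE.
        return names[0]
    index = bisect.bisect_right(upper_bounds, value)
    if index == len(names):
        raise ValueError(f"no partition for value {value!r}")
    return names[index]


upper_bounds = [10, 20]   # p1: [MIN VALUE, 10), p2: [10, 20)
names = ["p1", "p2"]
```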
9c82d41981 Support Doris query ES by HTTP way (#925) 2019-04-28 17:14:44 +08:00
b7b66527ce Fix some load bugs (#961)
1. Use load job's timeout as its txn timeout
2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN stmt
2019-04-28 10:33:50 +08:00
2b4d02b2fa Add error load log url for routine load job (#938) 2019-04-28 10:33:50 +08:00
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* Limit the max number of routine load tasks in backend to 10
* Fix bug that some partitions will not be assigned
2019-04-28 10:33:50 +08:00
8474061d63 Add some logs (#711) 2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00
da308da17c Fix bug that empty stream load return unexpected error msg (#1052) 2019-04-28 09:36:19 +08:00
c0fbc84381 Fix bug that ScanBytes is when collect executing query's infos (#869) 2019-04-03 18:27:50 +08:00
348c61c69f Fix doris on es bug (#826)
* Get in pred from hybridset

* ignore new_filter_in when push down

* Ignore cast case in to_ext_literal
2019-03-28 12:54:17 +08:00
f4a63b29d8 Fix doris on es bug (#791) 2019-03-22 19:03:27 +08:00
c34b306b4f Decimal optimize branch #695 (#727) 2019-03-22 17:22:16 +08:00
11307b23c8 Fix bug: stream load ignore last line with no-newline (#785)
#783
2019-03-21 19:18:22 +08:00
7965a7129a Add esquery function (#652) 2019-03-08 09:27:41 +08:00
397747af2c Fix bug that push down the predicates past AggregateNode (#658) 2019-02-26 10:55:14 +08:00
aba1b9e5d6 Reopen the thrift client when got exception (#610)
This avoids reusing a broken connection.
2019-01-31 16:54:49 +08:00
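The idea of reopening a client after an exception can be sketched generically. The Python below is an illustration only and does not use Thrift: `ReopeningClient` and `FakeConnection` are hypothetical names, and re-raising the original error after reopening is an assumption about the desired behaviour.

```python
class FakeConnection:
    """Stand-in for a real RPC connection (hypothetical, for the demo)."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.broken = False

    def send(self, request):
        if self.broken or request == "boom":
            self.broken = True
            raise ConnectionError("simulated failure")
        return f"ok:{request}"

    def close(self):
        pass


class ReopeningClient:
    def __init__(self, open_connection):
        self._open_connection = open_connection
        self._connection = open_connection()

    def call(self, request):
        try:
            return self._connection.send(request)
        except Exception:
            # The connection may be broken: replace it before anyone reuses
            # it, then let the caller see the original error.
            self._reopen()
            raise

    def _reopen(self):
        try:
            self._connection.close()
        except Exception:
            pass  # closing a broken connection may itself fail
        self._connection = self._open_connection()
```

After a failed call, the next call goes through a fresh connection rather than the one that raised.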
af445b6cc2 Optimize something (#607)
1. Unify the thrift rpc timeout from BE to FE.
    Add a BE config 'thrift_rpc_timeout_ms', default is 5000
2. Add hostname in "show proc '/frontends';" stmt result.
3. Fix a lock order bug in Load.java
2019-01-31 13:30:45 +08:00
4f3954fd77 Fix bug that recvr thread update sub plan's QueryStatistics when it is destructed (#573) 2019-01-23 17:00:40 +08:00
f7155217bf Remove build rows counter in PartitionHashJoinNode (#557)
* Remove build rows counter in PartitionHashJoinNode
* Fix unit test fail in RuntimeProfileTest
* Add check for result type length in cast_to_string_val
2019-01-21 14:08:59 +08:00
717285db1e Remove unused code about showing current queries (#552) 2019-01-18 09:53:40 +08:00
4d5f92cce7 Add EsScanNode (#450) 2019-01-17 17:59:33 +08:00
0e5b193243 Add cpu and io indicates to audit log (#531) 2019-01-17 12:43:15 +08:00
e8360f5eee Add counters to OlapScanNode (#538)
There is a non-negligible cost to convert VectorRowBatch to RowBatch.
When we seek a block, we read only one row from the engine to minimize
this conversion cost.

This patch can reduce some queries' time from 5s to 2s
2019-01-16 18:57:04 +08:00
d372b04e42 Revert "Add cpu and io indicates to audit log (#513)" (#520)
This reverts commit 5192e2f010308eefffa5271b0bdc947dfd9168ae.
2019-01-10 12:44:09 +08:00
5192e2f010 Add cpu and io indicates to audit log (#513)
Record query consumption in the FE audit log. It works as follows: one instance of the parent plan is responsible for accumulating the sub plan's consumption and sending it to its parent. The BE coordinator gets the total consumption because it is a single instance.
2019-01-09 22:28:20 +08:00
92b138121b Support io and cpu indicates for current query (#497)
Helps locate big queries when the system is overloaded, by checking the consumption of the running parts of all current queries or of one specified query. It works as follows: first, trigger BE to report RuntimeProfiles and wait a moment; second, calculate consumption from the RuntimeProfiles reported by BE. The consumption reported is the cost of the ExecNodes running in the query at the time of the call.
2019-01-08 10:59:42 +08:00
a51ce03595 Enhance the usability of Load operation (#490)
1. Add broker load error hub
A broker load error hub collects error messages during the load process and saves them as a file to the specified remote storage via broker. This covers the case where, during a broker/mini/streaming load, the user cannot access the error log file on the Backend directly.
We also add a new header option, 'enable_hub', to the streaming load request; the default is false. Enabling the broker load error hub significantly slows down streaming load processing because of the access to remote storage via broker, so users can disable the error hub with this header option to avoid slowing down the load.

2. Show load error logs by using SHOW LOAD WARNINGS stmt
We also provide an easier way to get load error logs. We implement the 'SHOW LOAD WARNINGS ON 'url'' stmt to show load error logs directly. The 'url' in the stmt is provided by the 'SHOW LOAD' stmt.
eg:
show load warnings on "http://192.168.1.1:8040/api/_load_error_log?file=__shard_2/error_log_xxx";

3. Support now() function in broker load
Users can map a column to now() in a broker load stmt, which means the column will be filled with the time when the ETL started.

4. Support more types of wildcard in broker load
Currently, we only support the wildcard '*' to match file names. Wildcards like '/path/to/20190[1-4]*' are not supported.
2019-01-03 19:07:27 +08:00
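To illustrate what a bracket-range pattern such as the one in item 4 is meant to select, here is a sketch using the glob rules of Python's `fnmatch` module. This is for illustration only: the broker may implement matching differently, and the file names are made up.

```python
from fnmatch import fnmatchcase

pattern = "/path/to/20190[1-4]*"

# Hypothetical file names; only months 01 to 04 of 2019 should match.
candidates = [
    "/path/to/201901.csv",
    "/path/to/201904_part2.csv",
    "/path/to/201905.csv",
    "/path/to/201812.csv",
]

matched = [path for path in candidates if fnmatchcase(path, pattern)]
```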
74cc5c5404 Write summary line in load error file anyway (#425)
The summary line should be written regardless of the error number limit
2018-12-13 12:35:19 +08:00
530bdec020 Fix bug that null value will be loaded to non-nullable column (#401)
* Fix bug that null value will be loaded to non-nullable column

* Optimize performance
2018-12-06 19:55:55 +08:00
e913e45343 Fix bug that null value will be loaded to non-nullable column (#397) 2018-12-06 10:09:34 +08:00
6b4049e21c Unify Slice code path (#380) 2018-12-03 18:11:47 +08:00
33873f2446 Fix wrong query result when column value is Null (#344) 2018-11-26 13:36:23 +08:00
fec3c58655 Change log verbose level to vlog(3) (#325)
* Transform row-oriented table to columnar-oriented table

* Transform row-oriented table to columnar-oriented table

* change log verbose level
2018-11-16 17:17:39 +08:00
1ba8a4ee4e Transform row-oriented table to columnar-oriented table (#311) 2018-11-16 16:03:56 +08:00
a2b299e3b9 Reduce UT binary size (#314)
* Reduce UT binary size

Almost every module depends on ExecEnv, and ExecEnv contains all
singletons, which makes the UT binary contain all object files.

This patch separates ExecEnv's initialization and destruction into another
file to avoid other files depending on them. Also, status.cc includes
debug_util.h, which depends on tuple.h and tuple_row.h, so I move
get_stack_trace() to stack_util.cpp to reduce status.cc's dependencies.

I add USE_RTTI=1 to build rocksdb to avoid linking librocksdb.a

Issue: #292

* Update
2018-11-15 16:17:23 +08:00