Commit Graph

5755 Commits

Author SHA1 Message Date
4f39d405ee Fix some load bugs (#2384)
For #2383 
1. Limit the number of concurrent transactions of a routine load job.
2. Create a new routine load task when the txn becomes VISIBLE, not after it is COMMITTED.

For #2267 
1. All non-master daemon threads should also be started after the catalog is ready.

For #2354 
1. `fixLoadJobMetaError()` should be called after all meta data is read, including image and edit logs.
2. A mini load job should be set to CANCELLED, instead of UNKNOWN, when its corresponding
transaction is not found.
2019-12-05 13:41:04 +08:00
102a845131 Support convert date to datetime through alter table (#2385) 2019-12-05 07:37:45 +08:00
92536272d3 Fixed bdbje heartbeat timeout config format bug (#2369)
The heartbeat config value should be formatted like "30 s", not "30".
This CL is related to commit 261072ecdda7e8eb3ce685c557c6dab15488d1f3
2019-12-04 13:28:08 +08:00
0f00febd21 Optimize Doris On Elasticsearch performance (#2237)
Pure DocValue optimization for doris-on-es

Future todo:
Currently, we check whether pure_docvalue is enabled for every tuple scanned. This is unnecessary: the check should be done once for the whole scan, outside the per-tuple path. I will address this TODO in the future.
2019-12-04 12:57:45 +08:00
f0c0a715d1 Add bdbje heartbeat timeout as a configuration of FE (#2366)
The timeline of this issue is as follows:

1. For some reason, the master lost contact with the other two followers.
Judging from the master's log, the master printed nothing for almost 40 seconds.
We suspect it was stuck due to a full GC or some other reason, causing the
other two followers to consider the master disconnected.

2. After the other two followers elected a new master, they continued to provide service.

3. The old master node was then restarted manually. On the first restart,
it had to roll back some committed logs, so it had to be shut down and restarted again.
After the second restart, it returned to normal.

The main cause is that the master was stuck for 40 seconds for an unknown reason.
This requires further observation.

Meanwhile, to alleviate this problem, we make bdbje's heartbeat timeout
configurable. The default is 30 seconds; it can be set to 1 minute
to avoid this problem for now.
2019-12-04 08:56:37 +08:00
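Together with #2369 above, which requires the timeout to reach bdbje as a duration string with a unit such as "30 s", the configurable timeout can be sketched in Python as below. This is a minimal sketch, not the FE code: the config key `bdbje_heartbeat_timeout_second` and the helper `heartbeat_timeout_param` are hypothetical names chosen for illustration.

```python
# Sketch only: the config key and helper name are hypothetical, not taken from Doris.
DEFAULT_HEARTBEAT_TIMEOUT_SECOND = 30  # default stated in the commit message


def heartbeat_timeout_param(conf: dict) -> str:
    """Read the heartbeat timeout (seconds) from a config dict and render it
    as a duration string with an explicit unit, e.g. "30 s"."""
    seconds = int(conf.get("bdbje_heartbeat_timeout_second",
                           DEFAULT_HEARTBEAT_TIMEOUT_SECOND))
    if seconds <= 0:
        raise ValueError("heartbeat timeout must be positive")
    # A bare "30" was the format bug fixed in #2369; always append the unit.
    return f"{seconds} s"


print(heartbeat_timeout_param({}))                                      # 30 s
print(heartbeat_timeout_param({"bdbje_heartbeat_timeout_second": 60}))  # 60 s
```

Keeping the value in whole seconds and adding the unit in one place avoids repeating the "30" vs "30 s" mistake at each call site.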
c8cff85c94 Fixed a bug that HttpServer in unit test does not start correctly. (#2361)
The HTTP client in the unit test tried to connect to the server
before the server was ready.
2019-12-03 20:34:16 +08:00
086bb82fd2 Fixed a bug that Load job's state is incorrect when upgrading from 0.10.x to 0.11.x (#2356)
There is a bug in Doris 0.10.x. When a load job in PENDING or LOADING
state was replayed from the image (not through the edit log), we forgot to add
the corresponding callback id to the CallbackFactory. As a result, the
subsequent finish-txn edit logs could not properly finish the job during
replay. So when the FE restarts, load jobs that should have been
completed re-enter the PENDING state, and their load tasks are submitted
repeatedly.

These faulty images cannot be recovered, so when the FE restarts we have to
reset every load job in PENDING or LOADING state according to the status of
its corresponding txn, to avoid submitting jobs repeatedly.

If the corresponding txn exists, the load job's state is set according to the txn's status.
If the txn does not exist, it may have been removed because its label expired,
so we cannot tell whether it was aborted or visible. In that case the job's state
is set to UNKNOWN and must be handled manually.
2019-12-03 16:02:50 +08:00
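The reset rule in #2356 above, plus the later mini load change in #2384, can be sketched as a small state-mapping function. This is a hedged illustration, not the FE implementation: the enums, the function name `reset_job_state`, and the mapping of VISIBLE to FINISHED and ABORTED to CANCELLED are assumptions, since the commit only says the state "depends on txn's status".

```python
from enum import Enum
from typing import Optional


class TxnStatus(Enum):  # hypothetical subset of txn states
    PREPARE = "PREPARE"
    COMMITTED = "COMMITTED"
    VISIBLE = "VISIBLE"
    ABORTED = "ABORTED"


class JobState(Enum):  # hypothetical subset of load job states
    PENDING = "PENDING"
    LOADING = "LOADING"
    FINISHED = "FINISHED"
    CANCELLED = "CANCELLED"
    UNKNOWN = "UNKNOWN"


def reset_job_state(state: JobState, txn_status: Optional[TxnStatus],
                    is_mini_load: bool = False) -> JobState:
    """Decide a replayed job's state on FE restart (txn_status None = txn not found)."""
    if state not in (JobState.PENDING, JobState.LOADING):
        return state  # only PENDING/LOADING jobs are reset
    if txn_status is None:
        # Txn removed (e.g. label expired): aborted vs. visible is unknowable.
        # #2384 later set mini load jobs to CANCELLED in this case.
        return JobState.CANCELLED if is_mini_load else JobState.UNKNOWN
    if txn_status is TxnStatus.VISIBLE:
        return JobState.FINISHED   # assumed mapping
    if txn_status is TxnStatus.ABORTED:
        return JobState.CANCELLED  # assumed mapping
    return state  # txn still in progress: keep the current state
```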
875790eb13 Remove VersionHash used for comparison in FE (#2335) 2019-12-02 19:59:13 +08:00
d90995c410 Make node info metrics available on all FE nodes (#2353)
Previously, only the Master FE had node info metrics indicating which nodes are alive.
But this info should be available on every FE, so that the monitor system
can get all metrics from any FE.
2019-12-02 17:31:32 +08:00
698d93a077 Support converting float to double and datetime to date by "alter table modify column type" (#2310) 2019-12-02 15:55:14 +08:00
725468f8a2 Fix bug of getting ES host error (#2342) 2019-12-01 13:06:32 +08:00
5ac4f3468e Remove old decommission job (#2326)
DecommissionJob is also a type of AlterJob.
When AlterJobV2 was introduced earlier, DecommissionJob was not updated accordingly.

In fact, the Decommission operation does not need to generate a Job. It only needs to mark the corresponding Backend as decommissioned. After that, the tablet repair logic will try to migrate the tablets off that Backend, and SystemHandler only needs to check all nodes marked as decommissioned and drop those that have been emptied.
2019-11-29 21:02:53 +08:00
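The job-less decommission flow in #2326 above reduces to a periodic check over backend state. A minimal sketch, assuming each backend exposes a decommission flag and a tablet count; the `Backend` class and `backends_to_drop` helper are hypothetical, not SystemHandler's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:  # hypothetical view of a backend's state
    id: int
    decommissioned: bool  # set by the DECOMMISSION operation; no job is created
    tablet_num: int       # decreases as tablet repair migrates replicas away


def backends_to_drop(backends: List[Backend]) -> List[int]:
    """One round of the periodic check: return ids of decommissioned, emptied backends."""
    return [be.id for be in backends if be.decommissioned and be.tablet_num == 0]
```

A backend that is marked but still holds tablets is simply revisited on the next round, so no per-decommission job state has to be persisted.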
ba76504fdc Fixed a bug that the FE config 'async_load_task_pool_size' was missing the @ConfField annotation (#2339)
Without the annotation, 'async_load_task_pool_size' cannot be configured.
2019-11-29 14:11:51 +08:00
8bf00afa25 Create table columns as nullable by default (#2256)
Change the default null property of columns to nullable.
2019-11-29 11:11:31 +08:00
6e33308472 Show tablet lists in EXPLAIN OlapScanNode (#2316) 2019-11-29 07:38:47 +08:00
814e486113 Support ifnull fe builtin function (#2292) (#2327) 2019-11-28 22:10:08 +08:00
e7b05f7eb3 Date format support java date style "yyyy-MM-dd HH:mm:ss" (#2309) 2019-11-28 14:34:31 +08:00
a2d7c42042 Add a variable to specifically limit the memory usage of the load part in the insert operation (#2305)
This variable is mainly for INSERT operations, because an INSERT operation has both a query part and a load part.
The exec_mem_limit variable alone cannot set separate memory limits for the two parts.
2019-11-28 13:03:11 +08:00
ccbd65daeb Ensure ES endpoint without http prefix can work (#2303) 2019-11-26 22:52:10 +08:00
78cee0050d Fix IFNULL constants compute error (#2290) (#2291) 2019-11-25 18:47:36 +08:00
d5aeb9a6b7 Add document for session variables. (#2284)
Also make the variable effective in the current session when setting it globally.
2019-11-24 22:47:05 +08:00
46181c0880 Fix some bugs related to load labels (#2241) 2019-11-23 00:04:45 +08:00
79ff0ad2a4 Add pipes_as_concat_mode (#2252)
This commit adds a new SQL mode named MODE_PIPES_AS_CONCAT.
Description:
1. When this mode is active, '||' is no longer treated as the same symbol as 'or' (the original behavior in Doris). Instead, it concatenates two expressions and returns a new string. For example, 'a' || 'b' = 'ab' and 1 || 0 = '10'.
2. Users can activate this mode with "SET sql_mode = PIPES_AS_CONCAT" and deactivate it with "SET sql_mode = '' ".
2019-11-22 15:01:53 +08:00
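The two meanings of '||' described in #2252 above can be sketched as a mode-dependent evaluator. This is a hedged illustration, not Doris's expression code: `eval_pipes` is a hypothetical name, treating sql_mode as a comma-separated list is an assumption borrowed from MySQL, and NULL handling is omitted.

```python
def eval_pipes(left, right, sql_mode: str):
    """Evaluate `left || right` under the given sql_mode (NULL handling omitted)."""
    modes = {m.strip().upper() for m in sql_mode.split(",") if m.strip()}
    if "PIPES_AS_CONCAT" in modes:
        # Concatenate the string forms: 'a' || 'b' -> 'ab', 1 || 0 -> '10'.
        return f"{left}{right}"
    # Original behavior: '||' is logical OR (numeric operands assumed here).
    return left != 0 or right != 0


print(eval_pipes("a", "b", "PIPES_AS_CONCAT"))  # ab
print(eval_pipes(1, 0, "PIPES_AS_CONCAT"))      # 10
print(eval_pipes(1, 0, ""))                     # True
```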
297542bd3f Delay start master only daemon threads (#2268)
These daemon threads should be started after the catalog is ready;
otherwise they may cause undefined behavior.
2019-11-22 14:39:37 +08:00
f7d3af1f0a Fix export job bug (#2250)
The query type of export job plan is SELECT, not LOAD.
We need to remove the assertion.
2019-11-21 22:00:39 +08:00
4fb498a1dc fix unit test failure for show columns from unknown table (#2261) 2019-11-21 21:38:36 +08:00
9c85a04580 Add schema hash to tablet proc info (#2257) 2019-11-21 10:06:30 +08:00
88236de63e Fix bug that showing columns from a non-existent table does not report an error (#2254) 2019-11-20 19:02:34 +08:00
46005bf6ba Fix bug for show create table statement with unique key types (#2231) 2019-11-20 10:02:04 +08:00
9b5eeaec19 Fix bug: DeployManager should start working after the catalog is ready. (#2244)
Otherwise, it cannot get the master IP/port from a catalog that is not ready yet.
2019-11-20 09:49:09 +08:00
db8819d365 Change sqlmode 'required' to 'optional' in forward master request (#2236) 2019-11-19 17:32:37 +08:00
4984be9d76 Persist sqlmode in load metadata and add sqlmode to forward master request (#2216) 2019-11-19 16:58:10 +08:00
d8cfbbedf7 Support bitmap_empty function (#2227) 2019-11-18 20:37:00 +08:00
c5ce72215d Optimize tablet report with expired transaction. (#2215)
When there are many expired transactions on a BE and a large number of
tablets, the report thread may become too slow, because it has to
iterate over the whole transaction map for each tablet.

This is unnecessary. We first build an expired-transaction map keyed
by tablet id. Then, for each tablet, we only need a single lookup in
that map by tablet id, instead of traversing the whole transaction map.
2019-11-15 23:03:21 +08:00
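The optimization in #2215 above is a classic inverted index: one pass builds a tablet-id-keyed map, after which each tablet costs a single lookup instead of a full scan. A minimal sketch, assuming the expired transactions are given as a mapping from txn id to the tablet ids it touched; this data layout and the function names are hypothetical, not the BE's actual structures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_expired_txn_index(expired_txns: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Invert txn_id -> [tablet_id] into tablet_id -> [expired txn_id]."""
    index: Dict[int, List[int]] = defaultdict(list)
    for txn_id, tablet_ids in expired_txns.items():
        for tablet_id in tablet_ids:
            index[tablet_id].append(txn_id)
    return index


def expired_txns_per_tablet(tablet_ids: Iterable[int],
                            expired_txns: Dict[int, List[int]]) -> Dict[int, List[int]]:
    # Built once: cost is proportional to the total number of (txn, tablet) pairs.
    index = build_expired_txn_index(expired_txns)
    # One lookup per tablet instead of iterating over all transactions per tablet.
    return {tid: index.get(tid, []) for tid in tablet_ids}
```

With T tablets and N expired transactions, the per-tablet scan costs roughly O(T × N), while the indexed version is about O(N + T) plus the size of the pairs.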
84c1fa88b8 Add node dead num metrics for all types of nodes (#2191)
The following metrics show the number of nodes that are down:

frontend_down_num
backend_down_num
broker_down_num
2019-11-13 23:25:51 +08:00
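The three metrics in #2191 above are per-type counts of dead nodes. A minimal sketch of that aggregation, assuming node liveness is available as (node_type, is_alive) pairs; the input shape and `down_num_metrics` are hypothetical, not the FE metrics code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

NODE_TYPES = ("frontend", "backend", "broker")


def down_num_metrics(nodes: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Count dead nodes per type and name the results like the metrics above."""
    down = Counter(node_type for node_type, alive in nodes if not alive)
    # Every type is reported, including those with zero dead nodes.
    return {f"{t}_down_num": down[t] for t in NODE_TYPES}
```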
11872d5cf6 Send clear txn task explicitly after a transaction is aborted (#2182) 2019-11-13 11:22:45 +08:00
b4d630137a Fix DB meta lost bug 2 (#2174) 2019-11-12 09:35:27 +08:00
1695d8ffc7 Clean the fe/target directory before building (#2173)
Otherwise, the FE output directory will contain some deprecated libraries.
2019-11-11 22:04:17 +08:00
06befc45ed Support decreasing the edit_log_roll_num config (#2171) 2019-11-11 14:20:32 +08:00
288cf1ec80 Fix DB meta lost bug (#2167) 2019-11-11 11:02:21 +08:00
9eaba67606 Limit the FE log file number (#2163)
1. Upgrade log4j to 2.12.1.
2. Add 2 new FE configs:
        'sys_log_delete_age', default '7d', for the sys log.
        'audit_log_delete_age', default '30d', for the audit log.

   This means a log file is deleted once its last modification is 7 days old (sys log) or 30 days old (audit log).
2019-11-11 09:12:57 +08:00
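The retention rule in #2163 above compares a file's last modification time with the configured age. The sketch below only illustrates that rule, not how the FE wires it into its logging; the supported unit letters (d, h, m, s) and the helper names are assumptions.

```python
import time
from typing import Optional

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}  # assumed unit letters


def parse_age(age: str) -> int:
    """Parse a retention age such as '7d' or '30d' into seconds."""
    return int(age[:-1]) * _UNIT_SECONDS[age[-1]]


def should_delete(mtime: float, delete_age: str, now: Optional[float] = None) -> bool:
    """True if a file last modified at `mtime` (epoch seconds) is at least `delete_age` old."""
    now = time.time() if now is None else now
    return now - mtime >= parse_age(delete_age)


# With the defaults, a 7-day-old file is removed from the sys log but kept in the audit log.
print(should_delete(0, "7d", now=7 * 86400))   # True
print(should_delete(0, "30d", now=7 * 86400))  # False
```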
48d9318d07 Support date_add function for partition pruning (#2154)
Currently, in the date_add/date_sub functions (DATE_ADD(DATETIME date, INTERVAL expr type)), the expr parameter is the interval you want to add.
Doris converts these functions to xxx_add/xxx_sub. However, FE only has the days_add function, so other date_add forms, such as select date_add('2010-11-30 23:59:59', INTERVAL 2 DAY), cannot be used for partition pruning.

So I've added the other functions to support partition pruning in FE.
2019-11-08 18:57:21 +08:00
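The rewrite described in #2154 above maps an INTERVAL unit to a unit-specific xxx_add/xxx_sub function. A hedged sketch of that mapping as a string rewrite: `rewrite_date_arith` is hypothetical, and apart from days_add, which the commit names, the function names in the table are assumptions that follow the xxx_add/xxx_sub pattern.

```python
# Assumed unit -> function-prefix table; only days_add is named in the commit.
_UNIT_TO_PREFIX = {
    "YEAR": "years", "MONTH": "months", "WEEK": "weeks", "DAY": "days",
    "HOUR": "hours", "MINUTE": "minutes", "SECOND": "seconds",
}


def rewrite_date_arith(fn: str, date_expr: str, amount: int, unit: str) -> str:
    """Rewrite DATE_ADD/DATE_SUB(date, INTERVAL amount unit) into an xxx_add/xxx_sub call."""
    suffix = {"DATE_ADD": "add", "DATE_SUB": "sub"}[fn.upper()]
    return f"{_UNIT_TO_PREFIX[unit.upper()]}_{suffix}({date_expr}, {amount})"


print(rewrite_date_arith("date_add", "'2010-11-30 23:59:59'", 2, "DAY"))
# days_add('2010-11-30 23:59:59', 2)
```

Once every unit has a matching function that FE can evaluate on constants, a predicate using any of these forms can be folded and compared against partition ranges.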
3886503c02 Fix core dump in multi-join queries (#2164)
The predicates returned by getHashLookupJoinConjuncts() already have the order of their left and right children adjusted.
2019-11-08 18:55:38 +08:00
42395d2455 Change Null-safe equal operator from cross join to hash join (#2156)
* Change Null-safe equal operator from cross join to hash join
ISSUE-2136

This commit changes the join method from cross join to hash join when the equal operator is the null-safe '<=>'.
This speeds up queries that contain the null-safe equal operator.
The finds_nulls field records which equal operators are null-safe:
finds_nulls[i] is true if the i-th equal operator is null-safe.
When finds_nulls[i] is true, the equal function of the hash table returns true if both val and loc are NULL.
2019-11-08 12:43:48 +08:00
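The key-comparison rule in #2156 above can be sketched as a tiny hash join in Python, using None for NULL. This is an illustration of the technique under those assumptions, not the BE's hash table: `keys_equal` and `hash_join` are hypothetical names, and `finds_nulls` mirrors the field described in the commit.

```python
from collections import defaultdict
from typing import Any, List, Optional, Sequence, Tuple

Row = Tuple[Optional[Any], ...]


def keys_equal(val: Row, loc: Row, finds_nulls: Sequence[bool]) -> bool:
    """Compare join keys; NULL matches NULL only where finds_nulls[i] is true ('<=>')."""
    for i, (a, b) in enumerate(zip(val, loc)):
        if a is None or b is None:
            if finds_nulls[i] and a is None and b is None:
                continue  # null-safe equal: NULL <=> NULL is true
            return False  # plain '=' never matches NULL
        if a != b:
            return False
    return True


def hash_join(build: List[Row], probe: List[Row],
              finds_nulls: Sequence[bool]) -> List[Tuple[Row, Row]]:
    buckets = defaultdict(list)
    for row in build:
        buckets[tuple(row)].append(row)  # None hashes consistently, so NULL keys share a bucket
    out = []
    for p in probe:
        for b in buckets.get(tuple(p), []):
            if keys_equal(b, p, finds_nulls):  # final check applies the NULL rule
                out.append((b, p))
    return out
```

Because NULL keys hash to the same bucket either way, the per-column finds_nulls flag in the equality check decides whether they join, which is what lets '<=>' use a hash join instead of a cross join.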
af79485eb2 Ignore --helper start argument if not first time to start FE (#2159) 2019-11-08 08:48:11 +08:00
89dc461f91 Fix UT and remove unused code (#2160) 2019-11-08 08:47:48 +08:00
d461a451d7 Add log info for QueryPlanAction (#2152) 2019-11-07 22:48:20 +08:00
2efd9e54ea Optimize the query plan so that UnionNode can be executed in a distributed manner (#2150) 2019-11-07 19:41:06 +08:00
5a4908e99a Forward stmt with stmt id generated on origin FE. (#2129)
Some stmts, such as DDL and DML stmts, are forwarded from a non-master FE
to the Master FE. But these stmts are logged in the non-master FE's audit log
with the origin stmt id generated on that non-master FE.

So we also pass this origin stmt id to the Master, so that we can track
the stmt's execution more easily.
2019-11-07 10:28:15 +08:00
7b4ae7df06 Merge pull request #2141 from morningman/modify_routine_load_log
Add some log to detect deadlock of routine load job
2019-11-06 16:09:56 +08:00