FIX: Redis fallback handler refactoring (#8771)

* DEV: Add a fake Mutex for concurrency testing with Fibers

* DEV: Support running in sleep order in concurrency tests

* FIX: A separate FallbackHandler should be used for each redis pair

This commit refactors the FallbackHandler and Connector:

 * There were two different ways to determine whether the redis master
   was up. There is now one way and it is the responsibility of the
   new RedisStatus class.

 * A background thread would be created whenever `verify_master` was
   called unless the thread already existed. The thread would
   periodically check the status of the redis master. However, checking
   that a thread is `alive?` is an ineffective way of determining
   whether it will continue to check the redis master in the future
   since the thread may be in the process of winding down.

   Now, this thread is created when the recorded master status goes from
   up to down. Since this thread runs the only part of the code that is
   able to bring the recorded status up again, we ensure that only one
   thread is probing the redis master at a time and that there is always
   a thread probing redis master when it is recorded as being down.

 * Each time the status of the redis master was checked periodically, it
   would spawn a new thread and immediately join on it. I assume this
   happened to isolate the check from the current execution, but since
   the join rethrows exceptions in the parent thread, this was not
   effective.

 * The logic for falling back was spread over the FallbackHandler and
   the Connector. The connector is now a dumb object that delegates
   responsibility for determining the status of redis to the
   FallbackHandler.

 * Previously, failing to connect to a master redis instance when it was
   not recorded as down would raise an exception. Now, this exception is
   passed to `Discourse.warn_exception` and the connection is made to
   the slave.
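The single-prober rule described above can be sketched in plain Ruby. In this sketch, only the caller that flips the recorded status from up to down starts a prober, and only that prober flips the status back. It is an illustration under stated assumptions, not the code in this commit. `ProbeStateSketch` is a hypothetical name. The injected `probe` lambda stands in for `RedisStatus#master_alive?`. A raw `Thread` replaces the `Concurrency` execution object that the diff uses through `@execution.spawn` and `@execution.sleep 5`.

```ruby
# Hypothetical sketch, not the code in this commit. A plain Thread replaces
# the Concurrency execution object, and `probe` stands in for
# RedisStatus#master_alive?.
class ProbeStateSketch
  attr_reader :probers_started

  def initialize(probe, interval: 5)
    @probe = probe            # callable returning true when the master is healthy
    @interval = interval      # seconds between probes while the master is down
    @mutex = Mutex.new
    @master = true            # recorded status: true means "master is up"
    @probers_started = 0
  end

  def master?
    @mutex.synchronize { @master }
  end

  # Returns true when callers should connect to the master.
  def use_master?
    return false unless master?
    return true if safe_probe

    spawn_prober if start_reset
    false
  end

  private

  # Atomically flips the recorded status from up to down. Only the caller
  # that performs the flip gets true, so only one prober is started.
  def start_reset
    @mutex.synchronize do
      next false unless @master

      @master = false
      @probers_started += 1
      true
    end
  end

  # The prober is the only code path that records the master as up again.
  def spawn_prober
    Thread.new do
      loop do
        sleep @interval
        next unless safe_probe

        @mutex.synchronize { @master = true }
        break
      end
    end
  end

  # Treats any StandardError raised by the probe as "master is down".
  def safe_probe
    @probe.call
  rescue StandardError
    false
  end
end

healthy = false
state = ProbeStateSketch.new(-> { healthy }, interval: 0.01)
3.times { state.use_master? }  # every call falls back; one prober is started
healthy = true                  # the prober notices on its next iteration
sleep 0.01 until state.master?
```

`start_reset` checks and clears the flag under one lock. As a result, concurrent callers that all see a failed probe still start at most one prober per outage.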

This commit introduces the FallbackHandlers singleton:

 * It is responsible for holding the set of FallbackHandlers.

 * It adds callbacks to the fallback handlers for when a redis master
   comes up or goes down. Main redis and message bus redis may be hosted
   on the same redis server or on different ones, so these callbacks may
   all live on the same FallbackHandler or on separate ones.
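The registry behaviour can be sketched as follows. This is an illustration, not the commit's code. `HandlerSketch` and `RegistrySketch` are hypothetical, and the callbacks are plain symbols. The main and message bus configs are passed to the constructor instead of being read from `GlobalSetting`; the real `FallbackHandlers` is a `Singleton`. Handlers are keyed by `[host, port]`, so two configs that point at the same server share one handler, and both sets of callbacks attach to it.

```ruby
# Hypothetical sketch of a registry that shares one fallback handler per
# redis [host, port]. Names and callback symbols are illustrative only.
class HandlerSketch
  attr_reader :callbacks

  def initialize
    @callbacks = []
  end

  def add_callbacks(callback)
    @callbacks << callback
  end
end

class RegistrySketch
  # The real FallbackHandlers is a Singleton that reads GlobalSetting; here
  # the two configs are injected so the sketch can run on its own.
  def initialize(main_config:, bus_config:)
    @main_config = main_config
    @bus_config = bus_config
    @mutex = Mutex.new
    @handlers = {}
  end

  def handler_for(config)
    config = config.dup.freeze unless config.frozen?
    @mutex.synchronize do
      # Callbacks are attached only when a handler is first created.
      @handlers[[config[:host], config[:port]]] ||= begin
        handler = HandlerSketch.new
        handler.add_callbacks(:main_readonly) if config == @main_config
        handler.add_callbacks(:message_bus) if config == @bus_config
        handler
      end
    end
  end
end

shared = { host: "redis-a", port: 6379 }  # hypothetical host name
registry = RegistrySketch.new(main_config: shared, bus_config: shared)
registry.handler_for(shared).callbacks  # → [:main_readonly, :message_bus]
```

The diff behaves the same way: later lookups for an already-known `[host, port]` return the cached handler without adding callbacks again.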

These objects are tested using fake concurrency provided by the
Concurrency module:

 * An `around(:each)` hook is used to cause each test to run inside a
   Scenario so that the test body, mocking cleanup and `after(:each)`
   callbacks are run in a different Fiber.

 * Therefore, halting an Execution abruptly (so that its fibers aren't
   run to completion) prevents the mock cleanup and `after(:each)`
   callbacks from running. I have tried to prevent this by recovering
   from all exceptions during an Execution.
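Fake concurrency with Fibers can be illustrated with a tiny round-robin scheduler and a mutex that yields control instead of blocking. This is a hypothetical sketch. `FiberSchedulerSketch` and `FakeMutexSketch` are invented names and are not the `Concurrency` module's API. Like the Execution described above, `run` recovers from errors so that the remaining fibers still run to completion. Unlike it, `run` rescues only `StandardError`.

```ruby
require "fiber"

# Hypothetical sketch (not the Concurrency module's API): cooperative
# "threads" run as Fibers on a round-robin scheduler, and the fake mutex
# hands control back to the scheduler instead of blocking.
class FiberSchedulerSketch
  def initialize
    @runnable = []
  end

  def spawn(&block)
    @runnable << Fiber.new(&block)
  end

  # Called from inside a fiber to simulate a context switch.
  def yield_control
    Fiber.yield
  end

  # Runs fibers until all have finished. Errors are collected rather than
  # re-raised so one failing fiber does not stop the others (only
  # StandardError is rescued here).
  def run
    errors = []
    until @runnable.empty?
      fiber = @runnable.shift
      begin
        fiber.resume
      rescue StandardError => e
        errors << e
      end
      @runnable << fiber if fiber.alive?
    end
    errors
  end
end

class FakeMutexSketch
  def initialize(scheduler)
    @scheduler = scheduler
    @locked = false
  end

  def synchronize
    @scheduler.yield_control while @locked  # wait cooperatively for the lock
    @locked = true
    begin
      yield
    ensure
      @locked = false
    end
  end
end

scheduler = FiberSchedulerSketch.new
mutex = FakeMutexSketch.new(scheduler)
log = []
2.times do |i|
  scheduler.spawn do
    mutex.synchronize do
      log << "enter #{i}"
      scheduler.yield_control  # a switch inside the critical section
      log << "exit #{i}"
    end
  end
end
scheduler.spawn { raise "boom" }
errors = scheduler.run
log                    # → ["enter 0", "exit 0", "enter 1", "exit 1"]
errors.map(&:message)  # → ["boom"]
```

Every switch point is an explicit `Fiber.yield`, so the interleaving is fully determined by the scheduler's order. This determinism is what makes such tests repeatable.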

* FIX: Create frozen copies of passed in config where possible

* FIX: extract start_reset method and remove method used by tests

Co-authored-by: Daniel Waterworth <me@danielwaterworth.com>
Krzysztof Kotlarek
2020-01-23 13:39:29 +11:00
committed by GitHub
parent 1b3b0708c0
commit 4f677854d3
4 changed files with 720 additions and 235 deletions


@@ -3,139 +3,248 @@
 #
 # A wrapper around redis that namespaces keys with the current site id
 #
 require_dependency 'cache'
+require_dependency 'concurrency'
 class DiscourseRedis
-  class FallbackHandler
-    include Singleton
+  class RedisStatus
     MASTER_ROLE_STATUS = "role:master".freeze
     MASTER_LOADING_STATUS = "loading:1".freeze
     MASTER_LOADED_STATUS = "loading:0".freeze
     CONNECTION_TYPES = %w{normal pubsub}.each(&:freeze)
-    def initialize
-      @master = true
-      @running = false
-      @mutex = Mutex.new
-      @slave_config = DiscourseRedis.slave_config
-      @message_bus_keepalive_interval = MessageBus.keepalive_interval
+    def initialize(master_config, slave_config)
+      master_config = master_config.dup.freeze unless master_config.frozen?
+      slave_config = slave_config.dup.freeze unless slave_config.frozen?
+      @master_config = master_config
+      @slave_config = slave_config
     end
-    def verify_master
-      synchronize do
-        return if @thread && @thread.alive?
-        @thread = Thread.new do
-          loop do
-            begin
-              thread = Thread.new { initiate_fallback_to_master }
-              thread.join
-              break if synchronize { @master }
-              sleep 5
-            ensure
-              thread.kill
-            end
-          end
-        end
-      end
-    end
-    def initiate_fallback_to_master
-      success = false
+    def master_alive?
+      master_client = connect(@master_config)
       begin
-        redis_config = DiscourseRedis.config.dup
-        redis_config.delete(:connector)
-        master_client = ::Redis::Client.new(redis_config)
-        logger.warn "#{log_prefix}: Checking connection to master server..."
         info = master_client.call([:info])
-        if info.include?(MASTER_LOADED_STATUS) && info.include?(MASTER_ROLE_STATUS)
-          begin
-            logger.warn "#{log_prefix}: Master server is active, killing all connections to slave..."
-            self.master = true
-            slave_client = ::Redis::Client.new(@slave_config)
-            CONNECTION_TYPES.each do |connection_type|
-              slave_client.call([:client, [:kill, 'type', connection_type]])
-            end
-            MessageBus.keepalive_interval = @message_bus_keepalive_interval
-            Discourse.clear_readonly!
-            Discourse.request_refresh!
-            success = true
-          ensure
-            slave_client&.disconnect
-          end
-        end
-      rescue => e
-        logger.warn "#{log_prefix}: Connection to Master server failed with '#{e.message}'"
+      rescue Redis::ConnectionError, Redis::CannotConnectError, RuntimeError => ex
+        raise ex if ex.class == RuntimeError && ex.message != "Name or service not known"
+        warn "Master not alive, error connecting"
+        return false
       ensure
-        master_client&.disconnect
+        master_client.disconnect
       end
-      success
+      unless info.include?(MASTER_LOADED_STATUS)
+        warn "Master not alive, status is loading"
+        return false
+      end
+      unless info.include?(MASTER_ROLE_STATUS)
+        warn "Master not alive, role != master"
+        return false
+      end
+      true
     end
-    def master
-      synchronize { @master }
-    end
+    def fallback
+      warn "Killing connections to slave..."
-    def master=(args)
-      synchronize do
-        @master = args
+      slave_client = connect(@slave_config)
-        # Disables MessageBus keepalive when Redis is in readonly mode
-        MessageBus.keepalive_interval = 0 if !@master
+      begin
+        CONNECTION_TYPES.each do |connection_type|
+          slave_client.call([:client, [:kill, 'type', connection_type]])
+        end
+      rescue Redis::ConnectionError, Redis::CannotConnectError, RuntimeError => ex
+        raise ex if ex.class == RuntimeError && ex.message != "Name or service not known"
+        warn "Attempted a redis fallback, but connection to slave failed"
+      ensure
+        slave_client.disconnect
       end
     end
     private
-    def synchronize
-      @mutex.synchronize { yield }
-    end
-    def logger
-      Rails.logger
+    def connect(config)
+      config = config.dup
+      config.delete(:connector)
+      ::Redis::Client.new(config)
     end
     def log_prefix
-      "#{self.class}"
+      @log_prefix ||= begin
+        master_string = "#{@master_config[:host]}:#{@master_config[:port]}"
+        slave_string = "#{@slave_config[:host]}:#{@slave_config[:port]}"
+        "RedisStatus master=#{master_string} slave=#{slave_string}"
+      end
     end
+    def warn(message)
+      Rails.logger.warn "#{log_prefix}: #{message}"
+    end
   end
+  class FallbackHandler
+    def initialize(log_prefix, redis_status, execution)
+      @log_prefix = log_prefix
+      @redis_status = redis_status
+      @mutex = execution.new_mutex
+      @execution = execution
+      @master = true
+      @event_handlers = []
+    end
+    def add_callbacks(handler)
+      @mutex.synchronize do
+        @event_handlers << handler
+      end
+    end
+    def start_reset
+      @mutex.synchronize do
+        if @master
+          @master = false
+          trigger(:down)
+          true
+        else
+          false
+        end
+      end
+    end
+    def use_master?
+      master = @mutex.synchronize { @master }
+      if !master
+        false
+      elsif safe_master_alive?
+        true
+      else
+        if start_reset
+          @execution.spawn do
+            loop do
+              @execution.sleep 5
+              info "Checking connection to master"
+              if safe_master_alive?
+                @mutex.synchronize do
+                  @master = true
+                  @redis_status.fallback
+                  trigger(:up)
+                end
+                break
+              end
+            end
+          end
+        end
+        false
+      end
+    end
+    private
+    attr_reader :log_prefix
+    def trigger(event)
+      @event_handlers.each do |handler|
+        begin
+          handler.public_send(event)
+        rescue Exception => e
+          Discourse.warn_exception(e, message: "Error running FallbackHandler callback")
+        end
+      end
+    end
+    def info(message)
+      Rails.logger.info "#{log_prefix}: #{message}"
+    end
+    def safe_master_alive?
+      begin
+        @redis_status.master_alive?
+      rescue Exception => e
+        Discourse.warn_exception(e, message: "Error running master_alive?")
+        false
+      end
+    end
+  end
+  class MessageBusFallbackCallbacks
+    def down
+      @keepalive_interval, MessageBus.keepalive_interval =
+        MessageBus.keepalive_interval, 0
+    end
+    def up
+      MessageBus.keepalive_interval = @keepalive_interval
+    end
+  end
+  class MainRedisReadOnlyCallbacks
+    def down
+    end
+    def up
+      Discourse.clear_readonly!
+      Discourse.request_refresh!
+    end
+  end
+  class FallbackHandlers
+    include Singleton
+    def initialize
+      @mutex = Mutex.new
+      @fallback_handlers = {}
+    end
+    def handler_for(config)
+      config = config.dup.freeze unless config.frozen?
+      @mutex.synchronize do
+        @fallback_handlers[[config[:host], config[:port]]] ||= begin
+          log_prefix = "FallbackHandler #{config[:host]}:#{config[:port]}"
+          slave_config = DiscourseRedis.slave_config(config)
+          redis_status = RedisStatus.new(config, slave_config)
+          handler =
+            FallbackHandler.new(
+              log_prefix,
+              redis_status,
+              Concurrency::ThreadedExecution.new
+            )
+          if config == GlobalSetting.redis_config
+            handler.add_callbacks(MainRedisReadOnlyCallbacks.new)
+          end
+          if config == GlobalSetting.message_bus_redis_config
+            handler.add_callbacks(MessageBusFallbackCallbacks.new)
+          end
+          handler
+        end
+      end
+    end
+    def self.handler_for(config)
+      instance.handler_for(config)
+    end
+  end
   class Connector < Redis::Client::Connector
     def initialize(options)
+      options = options.dup.freeze unless options.frozen?
       super(options)
-      @slave_options = DiscourseRedis.slave_config(options)
-      @fallback_handler = DiscourseRedis::FallbackHandler.instance
+      @slave_options = DiscourseRedis.slave_config(options).freeze
+      @fallback_handler = DiscourseRedis::FallbackHandlers.handler_for(options)
     end
-    def resolve(client = nil)
-      if !@fallback_handler.master
-        @fallback_handler.verify_master
-        return @slave_options
-      end
-      begin
-        options = @options.dup
-        options.delete(:connector)
-        client ||= Redis::Client.new(options)
-        loading = client.call([:info, :persistence]).include?(
-          DiscourseRedis::FallbackHandler::MASTER_LOADING_STATUS
-        )
-        loading ? @slave_options : @options
-      rescue Redis::ConnectionError, Redis::CannotConnectError, RuntimeError => ex
-        raise ex if ex.class == RuntimeError && ex.message != "Name or service not known"
-        @fallback_handler.master = false
-        @fallback_handler.verify_master
-        raise ex
-      ensure
-        client.disconnect
+    def resolve
+      if @fallback_handler.use_master?
+        @options
+      else
+        @slave_options
       end
     end
   end
@@ -159,10 +268,6 @@ class DiscourseRedis
     @namespace = namespace
   end
-  def self.fallback_handler
-    @fallback_handler ||= DiscourseRedis::FallbackHandler.instance
-  end
   def without_namespace
     # Only use this if you want to store and fetch data that's shared between sites
     @redis
@@ -176,7 +281,6 @@ class DiscourseRedis
         STDERR.puts "WARN: Redis is in a readonly state. Performed a noop"
       end
-      fallback_handler.verify_master if !fallback_handler.master
       Discourse.received_redis_readonly!
       nil
     else
@@ -302,5 +406,4 @@ class DiscourseRedis
   def remove_namespace(key)
     key[(namespace.length + 1)..-1]
   end
 end
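The callback objects added in this diff can be exercised without Redis. The sketch below is hypothetical:

* `FakeBus` stands in for `MessageBus`.
* `FailingCallbacksSketch` is an invented callback that raises.
* `trigger_sketch` mirrors the shape of `FallbackHandler#trigger`. It rescues only `StandardError` and records messages, whereas the diff rescues `Exception` and reports through `Discourse.warn_exception`.

The sketch shows three things. The keepalive interval is saved and zeroed on `down` and restored on `up`. A failing callback does not prevent the others from running.

```ruby
# Hypothetical stand-ins: FakeBus models only MessageBus.keepalive_interval.
module FakeBus
  class << self
    attr_accessor :keepalive_interval
  end
  self.keepalive_interval = 60
end

# Same shape as MessageBusFallbackCallbacks in the diff.
class KeepaliveCallbacksSketch
  def down
    # Parallel assignment: save the current interval, then disable keepalive.
    @keepalive_interval, FakeBus.keepalive_interval = FakeBus.keepalive_interval, 0
  end

  def up
    FakeBus.keepalive_interval = @keepalive_interval
  end
end

# An invented callback that always fails on :down.
class FailingCallbacksSketch
  def down
    raise "boom"
  end

  def up; end
end

# Mirrors FallbackHandler#trigger: a failing callback is recorded and the
# remaining callbacks still run.
def trigger_sketch(handlers, event, errors)
  handlers.each do |handler|
    begin
      handler.public_send(event)
    rescue StandardError => e
      errors << e.message
    end
  end
end

errors = []
handlers = [FailingCallbacksSketch.new, KeepaliveCallbacksSketch.new]
trigger_sketch(handlers, :down, errors)
FakeBus.keepalive_interval  # → 0, even though the first callback raised
trigger_sketch(handlers, :up, errors)
FakeBus.keepalive_interval  # → 60
```

As in `MessageBusFallbackCallbacks`, `up` assumes that `down` has already run on the same object; otherwise it would restore `nil`.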