FEATURE: Site setting for blocking onebox of URLs that redirect (#16881)

Meta topic: https://meta.discourse.org/t/prevent-to-linkify-when-there-is-a-redirect/226964/2?u=osama.

This commit adds a new site setting `block_onebox_on_redirect` (default off) that blocks oneboxes (full and inline) of URLs that redirect. An initial http → https redirect is still allowed if the redirect location is identical to the source apart from the scheme. For example, if a user includes a link to `http://example.com/page` and the link resolves to `https://example.com/page`, the link will still onebox (assuming it can be oneboxed) even when the setting is enabled. The reason is that a user may type out a URL by hand with http (e.g. because the URL is short and memorable), and since many sites support TLS and automatically redirect http traffic to https, such a URL should still be allowed to onebox.
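
The http → https exemption described above can be sketched roughly as follows. This is a minimal illustration, not the code in `FinalDestination`: the helpers `https_upgrade_only?` and `resolve` are hypothetical, a Hash stands in for real HTTP responses, and it assumes the setting translates to a redirect budget of zero (`max_redirects: 0`). The comparison is deliberately strict, so a redirect that also adds a trailing slash would not count as a pure scheme upgrade here.

```ruby
# Hypothetical helper: true only when `location` is `source` with the
# scheme switched from http to https and nothing else changed.
def https_upgrade_only?(source, location)
  source.start_with?("http://") && location == source.sub(/\Ahttp:/, "https:")
end

# Hypothetical redirect follower. `redirects` maps a URL to the URL it
# redirects to (standing in for the network). Returns the final URL, or
# nil once the redirect budget is exhausted.
def resolve(url, redirects, max_redirects:, initial_https_redirect_ignore_limit: false)
  remaining = max_redirects
  first_hop = true

  while (location = redirects[url])
    # Only the very first hop can be exempt from the budget.
    free = first_hop &&
      initial_https_redirect_ignore_limit &&
      https_upgrade_only?(url, location)

    unless free
      return nil if remaining <= 0
      remaining -= 1
    end

    url = location
    first_hop = false
  end

  url
end

redirects = {
  "http://example.com/page" => "https://example.com/page",
  "https://short.example/x" => "https://example.com/other"
}

# Scheme upgrade only: still resolves with a zero budget.
upgraded = resolve(
  "http://example.com/page", redirects,
  max_redirects: 0, initial_https_redirect_ignore_limit: true
)

# A redirect to a different URL: blocked with a zero budget.
blocked = resolve(
  "https://short.example/x", redirects,
  max_redirects: 0, initial_https_redirect_ignore_limit: true
)
```

With these inputs, `upgraded` is the https URL and `blocked` is `nil`. Without the exemption flag, the first call would also return `nil`, which is the case the commit message argues against.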
This commit is contained in:
Osama Sayegh
2022-05-23 13:52:06 +03:00
committed by GitHub
parent a03ae9b323
commit d15867463f
8 changed files with 206 additions and 21 deletions

@@ -3,8 +3,12 @@
 module RetrieveTitle
   CRAWL_TIMEOUT = 1
-  def self.crawl(url)
-    fetch_title(url)
+  def self.crawl(url, max_redirects: nil, initial_https_redirect_ignore_limit: false)
+    fetch_title(
+      url,
+      max_redirects: max_redirects,
+      initial_https_redirect_ignore_limit: initial_https_redirect_ignore_limit
+    )
   rescue Exception => ex
     raise if Rails.env.test?
     Rails.logger.error(ex)
@@ -53,8 +57,14 @@ module RetrieveTitle
   end
   # Fetch the beginning of a HTML document at a url
-  def self.fetch_title(url)
-    fd = FinalDestination.new(url, timeout: CRAWL_TIMEOUT, stop_at_blocked_pages: true)
+  def self.fetch_title(url, max_redirects: nil, initial_https_redirect_ignore_limit: false)
+    fd = FinalDestination.new(
+      url,
+      timeout: CRAWL_TIMEOUT,
+      stop_at_blocked_pages: true,
+      max_redirects: max_redirects,
+      initial_https_redirect_ignore_limit: initial_https_redirect_ignore_limit
+    )
     current = nil
     title = nil