FEATURE: Add experimental tracking of 'real browser' pageviews (#26647)

Our 'page_view_crawler' / 'page_view_anon' metrics are based purely on the User Agent sent by clients. This means that 'badly behaved' bots which are imitating real user agents are counted towards 'anon' page views.

This commit introduces a new method of tracking visitors. When an initial HTML request is made, we assume it is a 'non-browser' request (i.e. a bot). Then, once the JS application has booted, we notify the server to count it as a 'browser' request. This reliance on a JavaScript-capable browser matches up more closely to dedicated analytics systems like Google Analytics.

Existing data collection and graphs are unchanged. Data collected via the new technique is available in a new 'experimental' report.
This commit is contained in:
David Taylor
2024-04-25 11:00:01 +01:00
committed by GitHub
parent 52e8d57293
commit 2f2da72747
16 changed files with 412 additions and 7 deletions

View File

@ -66,6 +66,7 @@ RSpec.describe Middleware::RequestTracker do
CachedCounting.flush
expect(ApplicationRequest.page_view_anon.first.count).to eq(2)
expect(ApplicationRequest.page_view_anon_browser.first.count).to eq(2)
end
it "can log requests correctly" do
@ -119,6 +120,23 @@ RSpec.describe Middleware::RequestTracker do
expect(ApplicationRequest.page_view_anon_mobile.first.count).to eq(1)
expect(ApplicationRequest.page_view_crawler.first.count).to eq(1)
expect(ApplicationRequest.page_view_anon_browser.first.count).to eq(1)
end
it "logs deferred pageviews correctly" do
data =
Middleware::RequestTracker.get_data(
env(:path => "/message-bus/abcde/poll", "HTTP_DISCOURSE_DEFERRED_TRACK_VIEW" => "1"),
["200", { "Content-Type" => "text/html" }],
0.1,
)
Middleware::RequestTracker.log_request(data)
expect(data[:deferred_track]).to eq(true)
CachedCounting.flush
expect(ApplicationRequest.page_view_anon_browser.first.count).to eq(1)
end
it "logs API requests correctly" do