Commit Graph

49 Commits

Author SHA1 Message Date
95a82d608d SECURITY: Prevent Onebox cache overflow by limiting downloads and URL lengths 2023-11-09 13:39:18 +11:00
d78357917c SECURITY: Onebox templates' HTML injections.
The use of triple-curlies on Mustache templates opens the possibility for HTML injections.
2023-11-09 13:39:11 +11:00
b2a5f5802a DEV: Replace custom Onebox symbolize_keys implementation with ActiveSupport (#23828)
We have a custom implementation of #symbolize_keys in our Onebox helpers. This is likely a legacy from when Onebox was a standalone gem. This change replaces all usages with either #deep_symbolize_keys from ActiveSupport, or appropriate option to the JSON parser gem used.
2023-10-09 09:32:09 +02:00
d10e9a6c1d FEATURE: Onebox and Download for WEBP and AVIF (#23235)
This adds support for oneboxing WEBP and AVIF images in posts and fixing
oneboxing fixes download remote images for those formats too.

Reported in https://meta.discourse.org/t/-/276433?u=falco
2023-08-24 16:44:06 -03:00
df7dab9dce FIX: ensures generic onebox has width/height for thumbnail (#23040)
Prior to this fix we would output an image with no width/height which would then bypass a large part of `CookedProcessorMixin` and have no aspect ratio. As a result, an image with no size would cause layout shift.

It also removes a fix for oneboxes in chat messages due to this case.
2023-08-09 20:31:11 +02:00
3eac47443f FEATURE: Add audio.com onebox provider (#22936)
* Audio.com provider added to onebox
* added specs for audio.com onebox provider
2023-08-08 16:55:04 +10:00
44a104dff8 FIX: Update "Embed Motoko" Onebox URLs (#22198)
Embed Motoko service's primary URL is transiting from embed.smartcontracts.org to embed.motoko.org, this PR updates the Onebox logic to work for either domain.
2023-07-26 09:41:01 +08:00
9e8010df8b DEV: Use thumbnail url for wikimedia onebox image (#22620)
Wikimedia provides a thumbnail url for its images, so we should use that
for oneboxes instead of the full-size image. Because the size of the
  onebox image we display is quite small anyways the thumbnail wikimedia
  provides should suffice and will save bandwidth.

See: https://meta.discourse.org/t/264039
2023-07-14 12:20:18 -06:00
3fd327c458 FEATURE: Basic support for threads.net onebox (#22471) 2023-07-06 16:02:49 -03:00
0f4beab0fb DEV: Update the rubocop-discourse gem
This enables cops related to RSpec `subject`.

See https://github.com/discourse/rubocop-discourse/pull/32
2023-06-26 11:41:52 +02:00
24c90534fb FIX: Use Twitter API v2 for oneboxes and restore OpenGraph fallback (#22187) 2023-06-22 14:39:02 -03:00
2b7c677a8c Fix tests 2023-05-15 16:45:33 +02:00
8b67a534a0 FIX: Allow floats for zoom level in Google Maps onebox
Sometimes we get Maps URL containing a zoom level as a float (17.5z and
not 17z) but this doesn’t work with our current onebox implementation.

While Google accepts those float zoom levels, it removes automatically
the floating part in the URL (thus when visiting a Maps URL containing
17.5z, the URL will be rewritten shortly after as 17z). When putting a
float zoom level in an embedded URL, this actually breaks (Maps API
returns a 400 error).

This patch addresses the issue by allowing the onebox engine to match on
a zoom level expressed as a float but we only keep the integer part thus
rendering properly maps.
2023-03-01 12:45:33 +01:00
14d97f9cf1 FEATURE: Show more context in Discourse topic oneboxes
Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.

This patch addresses this issue by introducing a new onebox engine
dedicated to display this information when available. Indeed to get this
new information, categories and tags are exposed in the topic metadata
as opengraph tags.
2023-01-11 14:22:53 +01:00
cb932d6ee1 DEV: Apply syntax_tree formatting to spec/* 2023-01-09 11:49:28 +00:00
e6439e89cf FEATURE: Onebox for Embed Motoko (#19293) 2022-12-16 09:59:40 -05:00
d247e5d37c FEATURE: Youtube Short onebox support (#19335)
* FEATURE: Youtube Shorts onebox support

Co-authored-by: Canapin <canapin@gmail.com>
2022-12-06 11:56:48 -03:00
266e165885 FIX: Use only first line from commit message (#18724)
Linking a commit from a GitHub pull request included the complete commit
message, instead of just the first line. The rest of the commit message
will be added to the body of the Onebox.
2022-10-24 22:26:48 +03:00
73e9875a1d FEATURE: Handle oneboxes for complex GitHub URLs (#18474)
GitHub PR URLs can link to a commit of the PR, a comment or a review
discussion.
2022-10-06 20:26:04 +03:00
08e63ddab2 DEV: Fix spec file name (#18227)
Match the impl file name
2022-09-12 14:03:23 +02:00
626d50c15c FIX: Disable Twitter onebox without API support (#17519)
Twitter removed OpenGraph tags from their pages. We can no longer
extract all the information (for example, the quoted tweet) we need
to render Oneboxes without using their API.
2022-08-17 18:32:48 +03:00
3eaac56797 DEV: Use proper wording for contexts in specs 2022-08-04 11:05:02 +02:00
493d437e79 Add RSpec 4 compatibility (#17652)
* Remove outdated option

04078317ba

* Use the non-globally exposed RSpec syntax

https://github.com/rspec/rspec-core/pull/2803

* Use the non-globally exposed RSpec syntax, cont

https://github.com/rspec/rspec-core/pull/2803

* Comply to strict predicate matchers

See:
 - https://github.com/rspec/rspec-expectations/pull/1195
 - https://github.com/rspec/rspec-expectations/pull/1196
 - https://github.com/rspec/rspec-expectations/pull/1277
2022-07-28 10:27:38 +08:00
f0c6dd5682 Add support for JSON LD in Onebox (#17007)
* FIX: Fix a bug that is accessing the values in a hash wrongly and write tests

I decided to write tests in order to be confident in my refactor that's in the next commit.
Meanwhile I have discovered a potential bug. The `title_attr` key was accessed as a string,
but all the keys are actually symbols so it was never evaluated to be true.

irb(main):025:0> d = {key: 'value'}
=> {:key=>"value"}
irb(main):026:0> d['key']
=> nil
irb(main):027:0> d[:key]
=> "value"

* DEV: Extract methods for readability

I will be adding a new method following the conventions in place for adding a new normalizer. And this will make the readability of the `raw` block even more difficult; so I am extracting self contained private methods beforehand.

* FEATURE: Parse JSON-LD and introduce Movie object

JSON LD data is very easily transferable to Ruby objects because they contain types. If these types are mapped to Ruby objects, it is also better to make all the parsed data very explicit and easily extendable.

JSON-LD has many more standardized item types, with a full list here: https://schema.org/docs/full.html
However in order to decrease the scope, I only adapted the movie type.

* DEV: Change inheritance between normalizers

Normalizers are not supposed to have an inheritance relationships amongst each other. They are all normalizers, but all normalizing separate protocols. This is why I chose to extract a parent class and relieve Open Graph off that responsibility. Removing the parent class altogether could also a possibility, but I am keeping the scope limited to having a more accurate representation of the normalizers while making it easier to add a new one.

* Lint changes

* Bring back the Oembed OpenGraph inheritance

There is one test that caught that this inheritance was necessary. I still think modelling wise this inheritance shouldn't exist, but this can be tackled separately.

* Return empty hash if the json received is invalid

Before this change if there was a parsing error with JSON it would throw an exception. The goal of this commit is to rescue that exception and then log a warning. I chose to use Discourse's logger wrapper `warn_exception` to have the backtrace and not just used Rails logger. I considered raising an `InvalidParameters` error however if the JSON here is invalid it should not block showing of the Onebox, so logging is enough.

* Prep to support more JSONLD schema types with case

* Extract mustache template object created from JSONLD
2022-06-13 17:32:34 +02:00
ff93833fdf UX: Use committed date for GitHub oneboxes (#16318)
Our copy says 'committed {date}`, but we were previously using the commit's authored date
2022-03-30 09:16:28 +08:00
fc30669db2 FIX: Support new layout on Amazon product pages (#16091)
Some product pages on Amazon are using a new HTML structure, meaning the previous Onebox engine was unable to gather the price and/or description. This change should allow these pages to be Oneboxed.
2022-03-04 18:31:53 -05:00
c9dab6fd08 DEV: Automatically require 'rails_helper' in all specs (#16077)
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.

By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
2022-03-01 17:50:50 +00:00
7afe768d60 DEV: Add tests for wistia onebox. (#15860)
Follow-up to 4ef56b0ca4d997e5d0b331c488c83b7f574721df
2022-02-08 13:04:32 +08:00
5b5cbbfe5c FEATURE: Onebox for news.ycombinator.com (#15781) 2022-02-03 13:39:21 -03:00
376799b1a4 FIX: Hide excerpt of binary files in GitHub onebox (#15639)
Oneboxer did not know if a file is binary or not and always tried to
show an excerpt of the file.
2022-01-19 14:45:36 +02:00
31b27b3712 FIX: Broken GitHub folder onebox logic (#15612)
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
2022-01-17 18:32:07 +01:00
6e925fee6f FIX: Use basic meta description if other description tags are missing (#15356)
When attempting to Onebox a page if there is no `meta property="og:description"` tag but there is a  `meta name="description"` tag, Onebox should try to use that value.
2021-12-17 19:36:54 -05:00
aec125b617 FIX: Display Instagram Oneboxes in an iframe (#14789)
We are no longer able to display the image returned by Instagram directly within a Discourse site (either in the composer, or within a cooked post within a topic), so:

- Display an image placeholder in the composer preview
- A cooked post should use an iframe to display the Instagram 'embed' content
2021-11-02 14:34:51 -04:00
745b99edbf TEST: Adds test for urls with url-encoded section hash 2021-08-12 10:43:50 -04:00
6b8ee4d5ef TEST: Adds test for urls with section hash 2021-08-12 10:43:50 -04:00
2f28ba318c FEATURE: Onebox can match engines based on the content_type (#13876)
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00
76a11e6dc9 DEV: fix test (missed a reference to master) 2021-07-19 12:47:45 -04:00
aa12d12c0b discourse/discourse change from 'master' to 'main': update fixture data 2021-07-19 11:46:15 -04:00
8b89787426 SECURITY: Sanitize YouTube Onebox data (#13748)
CVE-2021-32764
2021-07-15 19:31:50 +01:00
a64aea38b7 FIX: Don’t use user_generated images as avatar images in Oneboxed Twitter content (#13712)
By default, Twitter will return the URL for the avatar image of the tweet poster as the `og:image` value.

However, if the `user_generated` attribute is true, we should not use this as the avatar URL as this will be an URL of an image in the tweet itself (e.g., an image belonging to a tweeted news story).
2021-07-13 14:54:28 -04:00
05bdbd9f97 SECURITY: Onebox canonical links bypassing FinalDestination checks (#13605) 2021-07-01 20:09:29 +05:30
09bc95d46b FIX: Quoting Oneboxed content should exclude formatting (#13296)
* FIX: Quoting Oneboxed content should exclude formatting

When a post is quoted that includes Oneboxed content, we should not include the formatting generated by the Onebox. Rather, we should attempt to collapse the link referenced by the Onebox to a single line text link.

* DEV: fix tests
2021-06-07 13:03:53 -04:00
2e4f07678e FIX: IMDb links were being oneboxed as posters (#13310)
IMDb movie links were being rendered as posters. This was because
IMDb was sending `og:type` as `image` randomly in some cases. To
fix this we'll now default all IMDb links as article type. This will
ensure that the IMDb onebox link includes all the information instead
of showing just a poster without any context.
2021-06-07 18:45:59 +05:30
461a2c334b FIX: return an empty result if response from Amazon is missing expected attributes (#13173)
* FIX: return an empty result if response from Amazon is missing attributes

Check we have the basic attributes requires to construct a Onebox for Amazon.

This is an attempt to handle scenarios where we receive a valid 200-status response from an Amazon request that does not include the data we’re expecting.

* Update lib/onebox/engine/amazon_onebox.rb

Co-authored-by: Régis Hanol <regis@hanol.fr>

Co-authored-by: Régis Hanol <regis@hanol.fr>
2021-06-01 16:23:18 -04:00
3df928d609 DEV: Fix flaky specs (#13226)
Some specs failed when `LOAD_PLUGINS=1` was set while migrating the test DB and the narrative-bot plugin disabled the `send_welcome_message` site setting.
2021-06-01 14:38:55 +02:00
06e1af2b1d FIX: Giphy oneboxing when the response is an image (#13199) 2021-05-28 15:10:32 -04:00
47e09700fe FIX: Support pausing GIFs for giphy/tenor oneboxes (#13194) 2021-05-28 08:40:30 -04:00
723d7de18c Various GitHub Onebox improvements (#13163)
* FIX: Improve GitHub folder regexp in Onebox

It used to match any GitHub URL that was not matched by the other GitHub
Oneboxes and it did not do a good job at handling those. With this
change, the generic Onebox will handle the remaining URLs.

* FEATURE: Add Onebox for GitHub Actions

* FEATURE: Add Onebox for PR check runs

* FIX: Remove image from GitHub folder Oneboxes

It is a generic, auto-generated image which does not provide any value.

* DEV: Add tests

* FIX: Strip HTML comments from PR body
2021-05-27 12:38:42 +03:00
283b08d45f DEV: Absorb onebox gem into core (#12979)
* Move onebox gem in core library

* Update template file path

* Remove warning for onebox gem caching

* Remove onebox version file

* Remove onebox gem

* Add sanitize gem

* Require onebox library in lazy-yt plugin

* Remove onebox web specific code

This code was used in standalone onebox Sinatra application

* Merge Discourse specific AllowlistedGenericOnebox engine in core

* Fix onebox engine filenames to match class name casing

* Move onebox specs from gem into core

* DEV: Rename `response` helper to `onebox_response`

Fixes a naming collision.

* Require rails_helper

* Don't use `before/after(:all)`

* Whitespace

* Remove fakeweb

* Remove poor unit tests

* DEV: Re-add fakeweb, plugins are using it

* Move onebox helpers

* Stub Instagram API

* FIX: Follow additional redirect status codes (#476)

Don’t throw errors if we encounter 303, 307 or 308 HTTP status codes in responses

* Remove an empty file

* DEV: Update the license file

Using the copy from https://choosealicense.com/licenses/gpl-2.0/#

Hopefully this will enable GitHub to show the license UI?

* DEV: Update embedded copyrights

* DEV: Add Onebox copyright notice

* DEV: Add MIT license, convert COPYRIGHT.txt to md

* DEV: Remove an incorrect copyright claim

Co-authored-by: Jarek Radosz <jradosz@gmail.com>
Co-authored-by: jbrw <jamie@goatforce5.org>
2021-05-26 15:11:35 +05:30