Commit Graph

169 Commits

Author SHA1 Message Date
7fd63b34b1 DEV: Make it obvious that joined translation is used by onebox (#20158)
This also moves the date as interpolation key into the string which makes translation easier.
2023-02-03 10:02:14 +08:00
aecd8b1eff FIX: Add support for multiple TikTok aspect ratios (#20064) 2023-01-30 18:12:01 -03:00
d0c820e816 FEATURE: Add better TikTok onebox support (#19934) 2023-01-23 09:49:02 -03:00
666536cbd1 DEV: Prefer \A and \z over ^ and $ in regexes (#19936) 2023-01-20 12:52:49 -06:00
14d97f9cf1 FEATURE: Show more context in Discourse topic oneboxes
Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.

This patch addresses this issue by introducing a new onebox engine
dedicated to display this information when available. Indeed to get this
new information, categories and tags are exposed in the topic metadata
as opengraph tags.
2023-01-11 14:22:53 +01:00
6417173082 DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
e6439e89cf FEATURE: Onebox for Embed Motoko (#19293) 2022-12-16 09:59:40 -05:00
d247e5d37c FEATURE: Youtube Short onebox support (#19335)
* FEATURE: Youtube Shorts onebox support

Co-authored-by: Canapin <canapin@gmail.com>
2022-12-06 11:56:48 -03:00
68b4fe4cf8 SECURITY: Expand and improve SSRF Protections (#18815)
See https://github.com/discourse/discourse/security/advisories/GHSA-rcc5-28r3-23rr

Co-authored-by: OsamaSayegh <asooomaasoooma90@gmail.com>
Co-authored-by: Daniel Waterworth <me@danielwaterworth.com>
2022-11-01 16:33:17 +00:00
266e165885 FIX: Use only first line from commit message (#18724)
Linking a commit from a GitHub pull request included the complete commit
message, instead of just the first line. The rest of the commit message
will be added to the body of the Onebox.
2022-10-24 22:26:48 +03:00
73e9875a1d FEATURE: Handle oneboxes for complex GitHub URLs (#18474)
GitHub PR URLs can link to a commit of the PR, a comment or a review
discussion.
2022-10-06 20:26:04 +03:00
2e00d4d024 DEV: Fix flaky twitter onebox behavior (#18141)
The order in which Onebox engines are loaded is not guaranteed. Occasionally during tests, the twitter engine would be loaded before the instagram engine, and cause the Instagram Onebox spec to fail due to the lack of `Onebox.options.twitter_client`.

This commit makes the load order of Onebox engines consistent, and fixes the issue in the twitter_status_onebox.
2022-08-31 08:42:55 +08:00
626d50c15c FIX: Disable Twitter onebox without API support (#17519)
Twitter removed OpenGraph tags from their pages. We can no longer
extract all the information (for example, the quoted tweet) we need
to render Oneboxes without using their API.
2022-08-17 18:32:48 +03:00
e7f04a8674 FIX: Use URI#merge to merge base and relative URLs (#17454)
The old implementation did not handle all cases, such as the case when
`src` is a relative URL that starts with `..`.
2022-07-18 14:17:54 +03:00
d0a4bc636f FIX: Vimeo regex pattern (#17277)
Vimeo has two url structure:

- Normal /video_id

- Private/Unlisted /video_id/hash_string

This changes change the regex pattern thus it would be able to

catch both. Also it tolerate trailing slash.

This shall fixes:

https://meta.discourse.org/t/vimeo-embed-urls-parsed-incorrectly-in-email/231042
2022-06-30 13:13:25 -03:00
f130ec35d9 FEATURE: Use full post width for Vimeo embeds (#17289) 2022-06-30 13:08:24 -03:00
9874fe3fb3 FIX: Improve mixcloud oneboxing (#17237)
- Sets `https://www.mixcloud.com` as a `requires_iframe_origins` to allow the iframe content to be displayed
- Attempts to render something approximating the Mixcloud content in the preview pane of the Composer, rather than just displaying a large version of the artwork associated with the link
2022-06-27 08:32:24 +10:00
3baefa25b5 FIX: Use first supported type item when JSON-LD returns array (#17217) 2022-06-23 13:02:01 -04:00
dfba188bd6 UX: remove extra whitespace in github onebox (#17111) 2022-06-17 11:09:45 +10:00
f723b4c322 FIX: Handle sites with more than 1 JSON-LD element (#17095)
A followup to #17007
2022-06-15 02:55:55 +02:00
f0c6dd5682 Add support for JSON LD in Onebox (#17007)
* FIX: Fix a bug that is accessing the values in a hash wrongly and write tests

I decided to write tests in order to be confident in my refactor that's in the next commit.
Meanwhile I have discovered a potential bug. The `title_attr` key was accessed as a string,
but all the keys are actually symbols so it was never evaluated to be true.

irb(main):025:0> d = {key: 'value'}
=> {:key=>"value"}
irb(main):026:0> d['key']
=> nil
irb(main):027:0> d[:key]
=> "value"

* DEV: Extract methods for readability

I will be adding a new method following the conventions in place for adding a new normalizer. And this will make the readability of the `raw` block even more difficult; so I am extracting self contained private methods beforehand.

* FEATURE: Parse JSON-LD and introduce Movie object

JSON LD data is very easily transferable to Ruby objects because they contain types. If these types are mapped to Ruby objects, it is also better to make all the parsed data very explicit and easily extendable.

JSON-LD has many more standardized item types, with a full list here: https://schema.org/docs/full.html
However in order to decrease the scope, I only adapted the movie type.

* DEV: Change inheritance between normalizers

Normalizers are not supposed to have an inheritance relationships amongst each other. They are all normalizers, but all normalizing separate protocols. This is why I chose to extract a parent class and relieve Open Graph off that responsibility. Removing the parent class altogether could also a possibility, but I am keeping the scope limited to having a more accurate representation of the normalizers while making it easier to add a new one.

* Lint changes

* Bring back the Oembed OpenGraph inheritance

There is one test that caught that this inheritance was necessary. I still think modelling wise this inheritance shouldn't exist, but this can be tackled separately.

* Return empty hash if the json received is invalid

Before this change if there was a parsing error with JSON it would throw an exception. The goal of this commit is to rescue that exception and then log a warning. I chose to use Discourse's logger wrapper `warn_exception` to have the backtrace and not just used Rails logger. I considered raising an `InvalidParameters` error however if the JSON here is invalid it should not block showing of the Onebox, so logging is enough.

* Prep to support more JSONLD schema types with case

* Extract mustache template object created from JSONLD
2022-06-13 17:32:34 +02:00
99b0578b4c FIX: escape youtube title when constructing onebox preview html (#16999) 2022-06-08 13:42:37 +08:00
8fe3934856 UX: Make YouTube playlist onebox full width to match video onebox (#16936) 2022-05-27 10:39:12 +01:00
46176b7dd7 DEV: Don’t patch Sanitize::Config
Currently we’re reopening the `Sanitize::Config` class (which is part of
the `sanitize` gem) to put our custom config for Onebox in it. This is
unnecessary as we can simply create a dedicated module to hold our
custom configuration.
2022-04-06 17:10:51 +02:00
ff93833fdf UX: Use committed date for GitHub oneboxes (#16318)
Our copy says 'committed {date}`, but we were previously using the commit's authored date
2022-03-30 09:16:28 +08:00
528c3e311a FIX: Only display the first listed price (#16138)
Multiple prices may be returned by Amazon (e.g. for new, and also for used). We should only display the first price.
2022-03-08 15:24:45 -05:00
fc30669db2 FIX: Support new layout on Amazon product pages (#16091)
Some product pages on Amazon are using a new HTML structure, meaning the previous Onebox engine was unable to gather the price and/or description. This change should allow these pages to be Oneboxed.
2022-03-04 18:31:53 -05:00
2fc70c5572 DEV: Correctly tag heredocs (#16061)
This allows text editors to use correct syntax coloring for the heredoc sections.

Heredoc tag names we use:

languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
2022-02-28 20:50:55 +01:00
7afe768d60 DEV: Add tests for wistia onebox. (#15860)
Follow-up to 4ef56b0ca4d997e5d0b331c488c83b7f574721df
2022-02-08 13:04:32 +08:00
4ef56b0ca4 FIX: Explicitly set allowfullscreen on Wistia Oneboxes (#15828) 2022-02-08 13:02:32 +11:00
5b5cbbfe5c FEATURE: Onebox for news.ycombinator.com (#15781) 2022-02-03 13:39:21 -03:00
aac9f43038 Only block domains at the final destination (#15689)
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
847c77de65 FIX: Add another method to check binary file (#15648)
This method looks for a NULL byte that is not usually contained in text
files. Follow up to 376799b1a49306be500be3419a327af8b03819ec.
2022-01-20 23:47:18 +02:00
376799b1a4 FIX: Hide excerpt of binary files in GitHub onebox (#15639)
Oneboxer did not know if a file is binary or not and always tried to
show an excerpt of the file.
2022-01-19 14:45:36 +02:00
2909b8b820 FIX: origins_to_regexes should always return an array (#15589)
If the SiteSetting `allowed_onebox_iframes` contains a value of `*`, it will use the values of `all_iframe_origins` during the Oneboxing process. If `all_iframe_origins` itself contains a value of `*`, `origins_to_regexes` will try to return a "catch-all" regex.

Other code assumes `origins_to_regexes`will return an array, so this change ensures the `*` case will return an array containing only the catch-all regex.
2022-01-17 12:48:41 -05:00
31b27b3712 FIX: Broken GitHub folder onebox logic (#15612)
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
2022-01-17 18:32:07 +01:00
f56eff2303 FIX: limits pre-line impact to tweet text (#15583) 2022-01-14 10:44:21 +01:00
6e925fee6f FIX: Use basic meta description if other description tags are missing (#15356)
When attempting to Onebox a page if there is no `meta property="og:description"` tag but there is a  `meta name="description"` tag, Onebox should try to use that value.
2021-12-17 19:36:54 -05:00
4c46c7e334 DEV: Remove xlink hrefs (#15059) 2021-11-25 15:22:43 +11:00
aec125b617 FIX: Display Instagram Oneboxes in an iframe (#14789)
We are no longer able to display the image returned by Instagram directly within a Discourse site (either in the composer, or within a cooked post within a topic), so:

- Display an image placeholder in the composer preview
- A cooked post should use an iframe to display the Instagram 'embed' content
2021-11-02 14:34:51 -04:00
69f0f48dc0 DEV: Fix rubocop issues (#14715) 2021-10-27 11:39:28 +03:00
3fbfec06fc Update replit onebox to accept .com 2021-10-19 16:37:33 -04:00
ba81d1853b FIX: Disable previews if diffhtml is enabled (#14537)
diffhtml should not rerender video and audio elements so there is no
point in having these.
2021-10-08 15:57:08 +03:00
fbe9cd49b6 FIX: Vimeo private video oneboxes were broken (#14510) 2021-10-05 15:46:58 +05:30
02a6b991fe FIX: Correct the play icon position (#14295) 2021-09-09 15:10:32 +02:00
11a07b37e1 FIX: ignore canonical link for medium.com oneboxes (#14278)
https://meta.discourse.org/t/bug-in-onebox-link-being-rendered-as-a-gist-when-it-isnt/202463
2021-09-08 20:19:57 +05:30
d27d7c8cca FIX: Unescapes hash section with present to account for url-encoded chars
Sections with unreserverd characters will appear url-encoded and need to
be unescaped before using it.

Wikipedia generates 2 different spans in this case in the same page, one
with an id resulting of replacing the % symbols with . and the other with
the decoded version of the string. For example, for /wiki/foo#A%C3%A1A it
will generate:

<span id="A.C3.A1A"></span>
<span id="AáA">AáA</span>

Unescaping the `m_url_hash_name` should work in all cases to target the
proper section span.
2021-08-12 10:43:50 -04:00
bb2c48b065 FIX: update iframe url for simplecast onebox (#13957)
https://meta.discourse.org/t/onebox-regression-simplecast-com/187911
2021-08-05 18:29:04 +05:30
a341dba5d9 FIX: update oEmbed URL for simplecast onebox (#13956) 2021-08-05 17:42:38 +05:30
2f28ba318c FEATURE: Onebox can match engines based on the content_type (#13876)
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00