Commit Graph

108 Commits

Author SHA1 Message Date
70fc39211b FIX: topic embed blank tags or passed with nil do not blank out existing topic tags (#27699)
When a topic embed is run with either no tags argument or a nil tag argument
this should not affect any existing tags.

Only update topic tags when tags argument is explicitly empty.
2024-07-03 14:50:59 -07:00
099cf71bcc FIX: Topic embedding importer should accept string tags (#27624)
* FIX: Embedding importer should accept string tags
2024-06-26 12:34:55 -04:00
489aac3fdd FIX: Disallow table cells to be weighted actual articles can be main content (#27508)
For Topic Embeds, we would prefer <article> to be the main article in a topic, rather than a table cell <td> with potentially a lot of data. However, in an example URL like here, the table cell (the very large code snippet) is seen as the Topic Embed's article due to the determined content weight by the Readability library we use.

In the newly released 0.7.1 cantino/ruby-readability#94, the library has a new option to exclude the library's default <td> element into content weighting. This is more in line with the original library where they only weighted <p>. So this PR excludes the td, as seen in the tests, to allow the actual article to be seen as the article. This PR also adds the details tag into the allow-list.
2024-06-19 09:50:49 +08:00
766231b102 FIX: Prevent crash importing topics on a tagged embeddable host (#27254) 2024-05-30 12:04:36 -03:00
2a28cda15c DEV: Update to lastest rubocop-discourse 2024-05-27 18:06:14 +02:00
63b7a36fac FEATURE: Extend embeddable hosts with Individual tags and author assignments (#26868)
* FEATURE: Extend embeddable hosts with tags and author assignments
2024-05-16 15:47:01 -04:00
175d2451a9 DEV: Add topic_embed_import_create_args plugin modifier (#26527)
* DEV: Add `topic_embed_import_create_args` plugin modifier

This modifier allows a plugin to change the arguments used when creating
 a new topic for an imported article.

For example: let's say you want to prepend "Imported: " to the title of
every imported topic. You could use this modifier like so:

```ruby
# In your plugin's code
plugin.register_modifier(:topic_embed_import_create_args) do |args|
  args[:title] = "Imported: #{args[:title]}"
  args
end
```

In this example, the modifier is prepending "Imported: " to the `title` in the `create_args` hash. This modified title would then be used when the new topic is created.
2024-04-05 12:37:53 -03:00
7dc552c9cc DEV: Add import_embed_unlisted site setting (#26222) 2024-03-27 08:57:43 -04:00
13735f35fb FEATURE: Cache embed contents in the database (#25133)
* FEATURE: Cache embed contents in the database

This will be useful for features that rely on the semantic content of topics, like the many AI features



Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2024-01-05 10:09:31 -03:00
95c61b88dc Apply embed unlisted setting consistently (#24294)
Applies the embed_unlisted site setting consistently across topic embeds, including those created via the WP Discourse plugin. Relatedly, adds a embed exception to can_create_unlisted_topic? check. Users creating embedded topics are not always staff.
2023-12-12 09:35:26 -05:00
5f20748e40 SECURITY: SSRF vulnerability in TopicEmbed
Block redirects when making the final request in TopicEmbed to prevent Server Side Request Forgery (SSRF)
2023-11-09 13:39:08 +11:00
644dded000 SECURITY: Use canonical url for topic embeddings (#22085)
This prevents duplicate topics from being created when using embed_urls
that only differ on query params.
2023-06-13 11:08:08 -06:00
Sam
2ccc5fc66e FEATURE: add support for figure and figcaption tags in embeddings (#21276)
Many blog posts use these to illustrate and images were previously omitted

Additionally strip superfluous HTML and BODY tags from embed HTML.

This was incorrectly returned from server.
2023-04-27 19:57:06 +10:00
f3f30d6865 SECURITY: Encode embed url (#21133)
The embed_url in "This is a companion discussion..." could be used for
XSS.

Co-authored-by: Blake Erickson <o.blakeerickson@gmail.com>
2023-04-18 15:05:29 +08:00
ccb345bd88 FEATURE: Update topic/comment embedding parameters (#20181)
This commit implements many changes to topic and comments embedding. It
deprecates the class_name field from EmbeddableHost and suggests using
the className parameter. discourse_username parameter has been
deprecated and it will fetch it from embedded site from the author or
discourse-username meta.

See the updated code sample from Admin > Customize > Embedding page.

* FEATURE: Add className parameter for Discourse embed

* DEV: Hide class_name from EmbeddableHost

* DEV: Deprecate class_name field of EmbeddableHost

* FEATURE: Use either author or discourse-username meta tag

* DEV: Deprecate discourse_username parameter

* DEV: Improve embed code sample
2023-02-28 14:31:59 +02:00
25a226279a DEV: Replace #pluck_first freedom patch with AR #pick in core (#19893)
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.

There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
2023-02-13 12:39:45 +08:00
666536cbd1 DEV: Prefer \A and \z over ^ and $ in regexes (#19936) 2023-01-20 12:52:49 -06:00
5a003715d3 DEV: Apply syntax_tree formatting to app/* 2023-01-09 14:14:59 +00:00
3c81683955 DEV: Rename UriHelper.escape_uri to .normalized_encode
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
2022-08-09 11:55:25 +01:00
f3b2ee8e1b FIX: Use default locale for footer of embedded topics (#17760)
The content from the remote site and the footer get cached for 10 minutes, so Discourse should use the default locale instead of the user locale for the footer. Otherwise Discourse might cache the message in a different language.
2022-08-02 20:49:28 +02:00
357011eb3b DEV: Clean up freedom patches
This patch removes some of our freedom patches that have been deprecated
for some time now.
Some of them have been updated so we’re not shipping code based on an
old version of Rails.
2022-04-06 10:07:14 +02:00
e4e37257cc FIX: Handle malformed URLs in TopicEmbed.absolutize_urls. 2022-01-21 11:18:54 +08:00
cc1b45f58b FIX: Convert URLs embedded topics to absolute form (#14975)
Sometimes the expanded post contained broken relative URLs because they
were not converted to their absolute form.
2021-11-17 16:39:49 +11:00
69f0f48dc0 DEV: Fix rubocop issues (#14715) 2021-10-27 11:39:28 +03:00
2944d2cdd6 FIX: Ruby 3 does not freeze interpolated string (#14567)
Ruby 2.7 or earlier `+contents` returns self.dup
when `frozen_string_literal: true`. However, Ruby 3.0 returns self
because this string is interpolated one, which is not frozen anymore.

This commit uses self.dup to return duplicated string regardless Ruby
versions.
https://bugs.ruby-lang.org/issues/17104
2021-10-11 13:20:18 +11:00
4c2d5158c5 FIX: Follow the canonical URL when importing a remote topic. (#14489)
FinalDestination now supports the `follow_canonical` option, which will perform an initial GET request, parse the canonical link if present, and perform a HEAD request to it.

We use this mode during embeds to avoid treating URLs with different query parameters as different topics.
2021-10-01 12:48:21 -03:00
2e0992c757 DEV: Allow TopicEmbed.import to optionally receive a list of tags (#14301)
This will be used by the rss-polling plugin
2021-09-13 17:01:59 -03:00
ef0471d7ae DEV: Allow passing cook_method to TopicEmbed.import to override default (#14209)
DEV: Allow passing cook_method to TopicEmbed.import to override default

This will be used in the rss-polling plugin when we want to have
oneboxes on feed content, like youtube for example.
2021-09-01 15:46:39 -03:00
560c13211a DEV: Allow passing a category parameter when importing a topic (#14069)
This will be used in the rss pooling plugin to address the feature
request at https://meta.discourse.org/t/-/200644?u=falco
2021-08-17 18:17:07 -03:00
59097b207f DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
ed818a4a19 FIX: prevents malformed href to crash TopicEmbed (#12910)
If the associated page of a remote url passed to `TopicEmber.new(remote_url)` contained a malformed link like: `<a href="(http://foo.bar)">Baz</a>` it would raise an uncaught exception:

```
Job exception: Invalid scheme format: (http
```
2021-04-30 11:10:19 +02:00
27386ba714 FIX: Kernel.open is deprecated (#12387) 2021-03-12 12:42:32 -06:00
3d9eb3e085 Rename options passed to Readability::Document back to whitelist and blacklist (#10340) 2020-07-30 12:56:48 -07:00
e0d9232259 FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
a41476800b FIX: Don't raise an exception if a topic cannot be retrieved (#9906) 2020-05-28 11:59:20 -03:00
da839e6d26 SECURITY: Use FinalDestination for topic embeds 2020-05-27 09:26:09 -06:00
d9a02d1336 Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse""
This reverts commit 20780a1eeed56b321daf18ee6bbfe681a51d1bf4.

* SECURITY: re-adds accidentally reverted commit:
  03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
  instead of the 03d26cd6 parent (which contains security fixes)
2020-05-23 00:56:13 -04:00
20780a1eee Revert "Merge branch 'master' of https://github.com/discourse/discourse"
This reverts commit e62a85cf6fd81a2a34aff6144bd36b9ac459964a, reversing
changes made to 2660c2e21d84bea667e1ea339f91cda352328062.
2020-05-22 20:25:56 -07:00
03d26cd6f0 SECURITY: ensure embed_url contains valid http(s) uri 2020-05-22 14:54:56 -06:00
9bff0882c3 FEATURE: Nokogumbo (#9577)
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00
98729c8e6e FIX: Allow embed updates of just the title 2020-04-20 14:31:24 -04:00
56a23c68f1 FIX: Embedded topics couldn't update their titles 2020-04-20 14:27:43 -04:00
b6b92a562c FEATURE: New site setting embed_unlisted (#9391)
If enabled, posts imported to discourse via embeddings will default to
unlisted until they receive a reply.
2020-04-13 15:17:02 -04:00
55b8620b43 FIX: TopicEmbed#absolutize_urls was trying to modify a frozen string 2020-03-25 12:57:54 -03:00
99fd65328c FIX: Skip absolutizing URLs when source URI is invalid 2020-02-07 10:54:24 -05:00
0fb497eb23 DEV: use Discourse.cache over Rails.cache
Discourse.cache is a more consistent method to use and offers clean fallback
if you are skipping redis

This is part of a larger change that both optimizes Discoruse.cache and omits
use of setex on $redis in favor of consistently using discourse cache

Bench does reveal that use of Rails.cache and Discourse.cache is 1.25x slower
than redis.setex / get so a re-implementation will follow prior to porting
2019-11-27 12:36:19 +11:00
55a1394342 DEV: pluck_first
Doing .pluck(:column).first is a very common pattern in Discourse and in
most cases, a limit cause isn't being added. Instead of adding a limit
clause to all these callsites, this commit adds two new methods to
ActiveRecord::Relation:

pluck_first, equivalent to limit(1).pluck(*columns).first

and pluck_first! which, like other finder methods, raises an exception
when no record is found
2019-10-21 12:08:20 +01:00
427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
d01c938e1a Revert "FIX: Use #dup instead of #+@ since content could be an instance of Nokogiri::XML::Element."
This reverts commit 50afe59306ec41a4fd74b01ca001734840adca8d.
2019-08-09 11:35:22 -03:00
50afe59306 FIX: Use #dup instead of #+@ since content could be an instance of Nokogiri::XML::Element. 2019-08-09 11:13:09 -03:00