Commit Graph

641 Commits

Author SHA1 Message Date
24532653e6 FIX: A typo bug in an import script (#24553) 2023-11-25 18:10:42 +01:00
8a5d97ef3f DEV: Update importers from PostUpload to UploadReference (#23681)
Discourse stopped using PostUpload in 9db8f00b3d. Since then, these importers have been writing to the table, but any data was totally unused. This commit updates the easy cases to use UploadReference, and adds an error to the discourse_merger import script, which needs more significant work.
2023-09-27 15:01:04 +01:00
94649565ce DEV: Correct Style/RedundantReturn rubocop issues (#23052) 2023-08-10 02:03:38 +02:00
26a8e7da28 DEV: Remove redundant case in import script (#22882) 2023-07-31 14:52:06 -04:00
c90e399003 FIX: user_id arg override in Slack import (#22713)
Fixes `user_id` arg override during mention parsing.
2023-07-20 11:22:43 +00:00
e09ce99884 DEV: Slack import script (#22386)
It's very simple import script and currently imports only the following content:
* Users
* Messages as Discourse topics/posts
* Attachments

Each channel can be mapped to a category and tags. It uses regular expressions to convert formatted messages ("rich text") into Markdown used by Discourse. In the future we could convert the `blocks` attribute from each message into Markdown instead of applying regular expressions on the `text` attribute.
2023-07-04 21:37:45 +02:00
c83914e2e5 FIX: fix normalize_raw method for nil inputs in migration scripts (#22304)
Various migration scripts define a normalize_raw method to do custom processing of post contents before storing it in the Post.raw and other fields.

They normally do not handle nil inputs, but it's a relatively common occurrence in data dumps.

Since this method is used from various points in the migration script, as it stands, the experience of using a migration script is that it will fail multiple times at different points, forcing you to fix the data or apply logic hacks every time then restarting.

This PR generalizes handling of nil input by returning a <missing> string.

Pros:

    no more messy repeated crashes + restarts
    consistency

Cons:

    it might hide data issues
        OTOH we can't print a warning on that method because it will flood the console since it's called from inside loops.

* FIX: zendesk import script: support nil inputs in normalize_raw
* FIX: return '<missing>' instead of empty string; do it for all methods
2023-06-29 13:22:47 -03:00
911539c1b2 FIX: Removing arbitrary limit in a Discuz importer script query (#21686) 2023-05-23 17:07:09 -04:00
b24c35d887 DEV: phpBB3 importer should get quoted username from actual post (#20979)
The actual code in the import script didn't match the expected behavior as defined by the specs. This fixes it and is a follow-up to ad32fa56
2023-04-05 15:43:20 +02:00
ad32fa5690 FIX: phpBB3 importer created invalid quote for posts (#20646)
for proper quotes (those including a valid reference to a source post), the
importer failed to yield a username, and the imported quote was broken.
2023-04-05 13:50:01 +02:00
29e2e3ff3b DEV: Fix random typos (#20937) 2023-04-03 19:27:32 +02:00
ccb345bd88 FEATURE: Update topic/comment embedding parameters (#20181)
This commit implements many changes to topic and comments embedding. It
deprecates the class_name field from EmbeddableHost and suggests using
the className parameter. discourse_username parameter has been
deprecated and it will fetch it from embedded site from the author or
discourse-username meta.

See the updated code sample from Admin > Customize > Embedding page.

* FEATURE: Add className parameter for Discourse embed

* DEV: Hide class_name from EmbeddableHost

* DEV: Deprecate class_name field of EmbeddableHost

* FEATURE: Use either author or discourse-username meta tag

* DEV: Deprecate discourse_username parameter

* DEV: Improve embed code sample
2023-02-28 14:31:59 +02:00
f63d18d6ca DEV: Update syntax_tree to 6.0.1 (#20466) 2023-02-27 17:20:00 +01:00
f7c57fbc19 DEV: Enable unless cops
We discussed the use of `unless` internally and decided to enforce
available rules from rubocop to restrict its most problematic uses.
2023-02-21 10:30:48 +01:00
25a226279a DEV: Replace #pluck_first freedom patch with AR #pick in core (#19893)
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.

There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
2023-02-13 12:39:45 +08:00
d1e844841d Fix occasional bug in order of imported comments (#20204)
This bug is actually a Drupal issue where some edited posts have their `created` and `changed` timestamps set to the same value. But even when that happens in Drupal it still maintains the correct post order in an affected thread. This PR makes the Discourse importer also maintain the original Drupal comment order by sorting comments in the source DB by their `cid`, which is sequential and never changes. More details from this post onward:
https://meta.discourse.org/t/large-drupal-forum-migration-importer-errors-and-limitations/246939/24?u=rahim123
2023-02-08 22:20:46 -05:00
c3e978ada9 DEV: Add import script for Yammer (#20074)
Co-authored-by: Jay Pfaffman <jay@literatecomputing.com>
2023-01-31 10:12:01 +01:00
436b3b392b DEV: Apply syntax_tree formatting to script/* 2023-01-09 11:13:22 +00:00
d5491b13f5 DEV: Fix syntax/formatting in xenforo import script (#19761)
Followup to 7dfe85fc
2023-01-05 12:47:05 +00:00
cc5b4cd49a FIX: change drupal permalink creation to use /node/
Drupal URL scheme for nodes begins with `/node/` , not `/topic/` .
2022-12-02 16:03:00 +11:00
7c321d3aad PERF: Update Group#user_count counter cache outside DB transaction (#19256)
While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.

Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.

Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>

Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
2022-11-30 11:52:08 -03:00
bfecbde837 Fixes for vBulletin bulk importer (#17618)
* Allow taking table prefix from env var

* FIX: remove unused column references

The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.

* FIX: report progress for tables without imported_id

* FIX: effectively check for AR validation errors

NOTE: other migration scripts also have this problem; see /t/58202

* FIX: properly count Posts when importing attachments

* FIX: improve logging

* Remove leftover comment

* FIX: show progress when exporting Permalink file

* PERF: stream Permalink file

The current way results in tons of memory usage; write once per line instead

* Document fixes needed

* WIP - deduplicate category names

* Ignore non alphanumeric chars for grouping

* FIX: properly deduplicate user emails by merging accounts

* FIX: don't merge empty UserEmails

* Improve logging

* Merge users AFTER fixing primary key sequences

* Parallelize user merging

* Save duplicated users structure for debugging purposes

* Add progress logging for the (multiple hour) user merging step
2022-11-28 16:30:19 -03:00
7dfe85fcc7 DEV: Xenforo importer improvements (#18457)
* Fix: make expressions non-greedy
* Feature: import Xenforo avatars
* Feature: import Xenforo likes
* Feature: import Xenforo private messages
* Feature: Xenforo create permalinks
* Feature: Xenforo migrate view counts
* Fix: Xenforo list regexes
* Fix: Xenforo import all attachments
2022-11-28 16:42:39 +01:00
9e9235ca62 FEATURE: Add import script for Elgg (#19140) 2022-11-28 16:28:08 +01:00
067c4deb4c Fix comment to include phpbb 3.3, which is now supported (#18006) 2022-08-19 16:42:32 -04:00
ef842a4b29 FEATURE: Adding a simple CSV importer (#17993) 2022-08-19 13:09:30 -04:00
8836c8bcdf FIX: the phpbbb import script was not parsing youtube tags (#17787) 2022-08-05 15:20:32 -04:00
603f36ca4a DEV: Support phpBB 3.3 imports (#17641)
* handle polls with duplicate items
* handle polls with incorrect poll_option_total values
* handle group IDs in personal messages
* support for version 3.3
2022-07-25 22:07:03 +02:00
7ab5dcf82f FEATURE: my_bb import supports avatars (#17617) 2022-07-25 15:22:25 +02:00
b9ac8e5748 Adding 3.2 to the versions of phpbb supported by the migration script (#17483) 2022-07-14 18:06:47 +05:30
fcc2e7ebbf FEATURE: Promote polymorphic bookmarks to default and migrate (#16729)
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.

No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
2022-05-23 10:07:15 +10:00
222c8d9b6a FEATURE: Polymorphic bookmarks pt. 3 (reminders, imports, exports, refactors) (#16591)
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:

* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility

This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.

Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
2022-05-09 09:37:23 +10:00
3e5faffb0d DEV: mbox importer improvements (#16557)
* FIX: support specifying parent_category_id in mbox import metadata
* FIX: elide tabs from topic titles
* FIX: optionally fix Mailman from: addresses
* DEV: optionally elide anything up to the last = in email addresses
* Fix Mailmain broken from: detection
2022-04-29 13:24:29 -03:00
2fc70c5572 DEV: Correctly tag heredocs (#16061)
This allows text editors to use correct syntax coloring for the heredoc sections.

Heredoc tag names we use:

languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
2022-02-28 20:50:55 +01:00
6f6406ea03 DEV: Fix random typos (#16066) 2022-02-28 10:20:58 +08:00
3bf3b9a4a5 DEV: pull email address validation out to a new EmailAddressValidator
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.

Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.

Deprecated the old location in case someone is using it in a plugin.

No functionality change is in this commit.

Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
2022-02-17 21:49:22 -05:00
6394d7cddf DEV: Improve phpBB3 import script (#15956)
* Optional import of custom user fields from phpBB 3.1+
* Optional import of likes from phpBB3
  Requires the phpBB "Thanks for posts" extension
* Fix import of bookmarks from phpBB3
* Update `created_at` of existing user
* Support mapping of phpBB forums to existing Discourse categories
  This is in addition to the ability of merging phpBB forums and importing into newly created Discourse categories.
2022-02-16 13:04:31 +01:00
33d6ed60a4 DEV: Don't import year of birth (#15937)
The cakeday plugin doesn't use the year.
2022-02-14 18:10:35 +01:00
6a41ec179c FIX: Default settings for phpBB3 import were broken (#15913) 2022-02-11 18:18:54 +01:00
ea2fd75d10 DEV: Fix some regexes in phpBB3 import script (#15829)
1. bbcode hashes don't always have exactly 8 characters.

2. colors aren't always hex values, it can be a color string ("red", "blue", etc).

3. The closing tag of smileys doesn't always include a `:` character (the start of the regex was already right for this particular issue)
2022-02-07 16:16:46 +01:00
c5fd8c42db DEV: Fix methods removed in Ruby 3.2 (#15459)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
2022-01-05 18:45:08 +01:00
48a08cc397 FIX: Vanilla importer fixes (#14699)
Import script was out of date
2021-10-27 14:22:37 +02:00
69f0f48dc0 DEV: Fix rubocop issues (#14715) 2021-10-27 11:39:28 +03:00
46d96c9feb DEV: Apply rubocop to script/import_scripts/phorum.rb (#14727)
Followup to b24002018a10e12b7bdee3a8f5218345716af6ff
2021-10-26 19:16:52 +01:00
b24002018a Update phorum.rb
Add attachment/file/upload handling to bring them in from phorum to discourse
2021-10-26 12:41:50 -04:00
97178cd777 FIX: phpbb import - attachments not embedded in posts (#14570) 2021-10-11 14:27:54 +02:00
a413a1e015 DEV: process image uploads in the Zendesk API import script (#14524) 2021-10-06 12:24:12 -04:00
cd9262b7d3 DEV: minor improvements in the vanilla import script. (#14026)
We're parsing the post raw based on the record format now.
2021-08-12 15:07:44 +05:30
f9aaed7020 FIX: MyBB importer exposes deleted posts (#13700)
The MyBB importer exposes posts marked as deleted in MyBB. This may lead to a privacy issue.
2021-07-22 09:55:02 +02:00
37b8ce79c9 FEATURE: Add last visit indication to topic view page. (#13471)
This PR also removes grey old unread bubble from the topic badges by
dropping `TopicUser#highest_seen_post_number`.
2021-07-05 14:17:31 +08:00