Commit Graph

565 Commits

Author SHA1 Message Date
869f9b20a2 PERF: Dematerialize topic_reply_count (#9769)
* PERF: Dematerialize topic_reply_count

It's only ever used for trust level promotions that run daily, or compared to 0. We don't need to track it on every post creation.

* UX: Add symbol in TL3 report if topic reply count is capped

* DEV: Drop user_stats.topic_reply_count column
2020-05-14 15:42:00 -07:00
c6f68e4006 Fix syntax error in fluxbb.rb (#9727) 2020-05-11 11:07:57 -04:00
9bff0882c3 FEATURE: Nokogumbo (#9577)
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00
867bc3b48e FIX: Change base importer to create new Bookmark records (#9603)
Also add a spec and fixture with a mock importer that we can use to test the create_X methods of the base importer
2020-05-01 11:34:55 +10:00
d0d5a138c3 DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
094ddb1c1f VBulletin5 importer improvements (#9477)
- no more hard coded contenttypes
- permalinks for topics, categories, subcategories
- better uploads handling
- tag support
2020-04-22 22:04:59 +02:00
3c72cbc5de FIX: Google groups import changed login URL (#9432)
I'm not clear why changing only the `wait_for_url` address was necessary and not also the `get` a few lines above, but this change seems to work for me on both literatecomputing.com Groups and a public group.
2020-04-15 16:45:14 -04:00
2b2584912a Improve Telligent import script
* Imports private messages
* Replaces internal links for topics and replies
* Allows incremental import of accepted answers
2020-04-03 18:10:52 +02:00
739430c01e FIX: mbox import failed if no tags were configured 2020-03-26 16:41:11 +01:00
d216483c53 FIX: Importing with pgbouncer failed
Checking if all records have been imported uses a temp table in PostgreSQL. This fails when pgbouncer is used unless the temp table is created inside a transaction.
2020-03-26 16:41:09 +01:00
c94b63bc75 DEV: Improve import of attachments from Telligent 2020-03-26 16:37:55 +01:00
5b2b769eb7 DEV: Ensure uploads aren't deleted during imports
Sidekiq might delete uploads if you, for some reason, create upload records before using them in posts.
2020-03-24 17:14:16 +01:00
445b35381d Improve Telligent import script
* Detects mostly all attachments and it's a lot faster
* Parses user properties in Ruby instead of the DB, because that's less errorprone
* Imports user avatars
* Imports topic views by users
* Better handling of quotes and YouTube links
2020-03-23 09:18:12 +01:00
d27ece9ded FIX: Method from Telligent import script was deleted by accident 2020-03-14 22:10:40 +01:00
e825f47daa DEV: Better handling of incremental scrapes for Google Groups 2020-03-14 00:00:36 +01:00
0a88232e87 DEV: Improve mbox import script
* Better documentation of settings
* Add option to exclude trimmed parts of emails (enabled by default) to not revail email addresses
2020-03-14 00:00:36 +01:00
36062f43c8 DEV: Improve Telligent import script
* Adds ability to map forums to categories and tags as well as ignore forums.
* Fixes regular expression for detecting attachments in posts.
* Handles "remote attachments" 😮 by inserting a link.
* Imports view counts for topics.
* Handles incorrect references of parent posts.
* Better handling of quotes.
* Finds a lot more attachments by trying to replace various Unicode characters in filenames.
2020-03-14 00:00:36 +01:00
ba1b840816 DEV: Don't deactivate suspended users during import
Otherwise a cleanup job might delete those deactivated users.
2020-03-14 00:00:36 +01:00
6c948f27ea FIX: Missing constant in SMF2 importer (#9178) 2020-03-11 10:19:59 -05:00
0e752db411 DEV: Improve mbox import script
* Customizable email subject prefixes to remove "Re" and "Fwd" as well as localized prefixes.
* Configuration option for prefixes like [FOO] or (BAR) which can be replaced with tags during import.
* Bugfix: Import script might have skipped some users due to missing ORDER BY.
2020-03-09 10:26:45 +01:00
edc8d58ac3 FEATURE: Add site setting to disable staged user cleanup
... and disabled the cleanup during imports, otherwise a running Sidekiq might delete users before posts are created
2020-03-09 10:26:41 +01:00
f7ea2fdea5 FIX: Import posts of missing users from phpbb3 (#9085)
Posts without a user probably shouldn't happen unless there was some direct database tampering, but data like that has been seen in the wild.

The importer will assign those posts to the "system" user.
2020-03-06 22:54:40 +01:00
d7ccb58559 FIX: Google Groups scraper failed to login 2020-03-02 17:24:48 +01:00
ff5ff8d0d2 fix invalid byte sequence in UTF-8 (ArgumentError) (#9077) 2020-02-28 10:26:18 -05:00
f35ee5e887 DEV: Improvements to SMF2 script (#9006) 2020-02-24 12:51:45 -06:00
0843e3e6ce FIX: add support for sub-sub-categories in base_importer
Also delegates 'post_already_imported?' and 'user_already_imported?' to the base importer.
2020-02-05 10:40:28 +01:00
1e4a83cc2a FEATURE: add mybb.ru import script (#8609) 2019-12-20 11:10:18 -05:00
edbc356593 FIX: Replace deprecated URI.encode, URI.escape, URI.unescape and URI.unencode (#8528)
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:

URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.

I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.

Addressable is now also an explicit gem dependency.
2019-12-12 12:49:21 +10:00
0c52537f10 DEV: update rubocop to version 0.77
We like to stay as close as possible to latest with rubocop cause the cops
get better.

This update required some code changes, specifically the default is to avoid
explicit returns where implicit is done

Also this renames a few rules
2019-12-10 11:48:39 +11:00
c218036107 FIX: Make Google Groups scraper work for G Suite users 2019-11-28 02:09:51 +01:00
3bb7ad4be1 FEATURE: remove support for 'suppress_from_latest' category setting. (#8308) 2019-11-18 12:28:35 +05:30
067696df8f DEV: Apply Rubocop redundant return style 2019-11-14 15:10:51 -05:00
2d4c9bbaac import_scripts: add fluxbb prefix to missing query (#8163)
Signed-off-by: Jelle van der Waa <jelle@archlinux.org>
2019-10-08 11:46:00 +11:00
427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
b48ca9dee9 DEV: Simplify username validation in base importer
The `UsernameValidator` does already all the hard work. No need to do any additional checks in the import script. The checks were out-of-date anyway.
2019-10-01 20:33:09 +02:00
ed1e5ef6cc FIX: By default, don't abort Google Groups crawling on error 2019-09-18 18:14:04 +02:00
ab96239f2a FIX: Google Groups crawler failed to login
Trying to automate the login into a Google account is quite hard. This makes the crawler use the content of a cookies.txt file instead. It also removes a couple of deprecation warnings and adds some color to the output.
2019-09-18 13:09:20 +02:00
888b635cfc Import avatars and likes in the Zendesk AP importer
Co-authored-by: Justin DiRose <justin@justindirose.com>
2019-08-14 10:42:52 +02:00
4ed517a344 DEV: Make Rubocop happy
Follow-up to 6cc9fe42
2019-08-12 23:10:58 +02:00
6cc9fe42ce add mongo adapter to nodebb importer (#8000) 2019-08-12 14:15:11 -04:00
dcb47d902b REFACTOR: Rename SiteSetting.disable_edit_notifications to disable_system_edit_notifications (#7958)
* REFACTOR: Rename SiteSetting.disable_edit_notifications to disable_system_edit_notifications

- The older name could cause some confusion because the setting does not disable all edit notifications, only system ones.

* FIX: Add frozen_string_literal: true in the migration

* DEV: Deprecate 'disable_edit_notifications'
2019-07-31 20:20:41 +05:30
fd12c414e7 DEV: Refactor helper methods for upload markdown
Follow-up to a61ff167
2019-07-25 16:36:35 +02:00
a61ff16740 DEV: Make attachment markdown reusable 2019-07-25 14:04:18 +02:00
f0fea5991f FIX: Latest Selenium gem broke Google Groups import script
Selenium uses Keep-Alive since version 3.141, so the net-http-persistent gem shouldn't be needed anymore.
2019-07-10 09:45:33 +02:00
6d30be1f94 Improve XenForo import script.
- ensure only active, unbanned users are imported.
- ensure only visible threads/posts are imported.
2019-06-18 15:52:34 +05:30
77f5577e30 DEV: Improvements to AnswerHub import script. 2019-06-13 11:46:17 +05:30
36c0cfa890 FIX: Use new attachment markdown format in ImportScripts::Uploader. 2019-06-11 14:49:28 +08:00
0955d9ece9 create answerhub importer (#7671) 2019-06-03 12:17:22 +10:00
0f3c3bc309 Make import scripts work with frozen strings 2019-05-30 22:22:24 +02:00
c70d0c6659 Use an invalid domain for fake email addresses in importers 2019-05-30 22:22:24 +02:00