Commit Graph

30 Commits

Author SHA1 Message Date
3e5faffb0d DEV: mbox importer improvements (#16557)
* FIX: support specifying parent_category_id in mbox import metadata
* FIX: elide tabs from topic titles
* FIX: optionally fix Mailman from: addresses
* DEV: optionally elide anything up to the last = in email addresses
* Fix Mailmain broken from: detection
2022-04-29 13:24:29 -03:00
69f0f48dc0 DEV: Fix rubocop issues (#14715) 2021-10-27 11:39:28 +03:00
59097b207f DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
e7892df10d FIX: handle charset=windows-1252 in mbox import script (#12832)
Co-authored-by: Loïc Dachary <loic@dachary.org>
2021-04-27 15:43:31 +02:00
d0d5a138c3 DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
739430c01e FIX: mbox import failed if no tags were configured 2020-03-26 16:41:11 +01:00
0a88232e87 DEV: Improve mbox import script
* Better documentation of settings
* Add option to exclude trimmed parts of emails (enabled by default) to not revail email addresses
2020-03-14 00:00:36 +01:00
0e752db411 DEV: Improve mbox import script
* Customizable email subject prefixes to remove "Re" and "Fwd" as well as localized prefixes.
* Configuration option for prefixes like [FOO] or (BAR) which can be replaced with tags during import.
* Bugfix: Import script might have skipped some users due to missing ORDER BY.
2020-03-09 10:26:45 +01:00
0f3c3bc309 Make import scripts work with frozen strings 2019-05-30 22:22:24 +02:00
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
2349ba3bc4 Improve Google Groups scraper
* Better error detection during login phase
* Experimental support for 2FA and SMS codes
* Detect missing permissions to scrape email addresses
2019-03-24 23:15:13 +01:00
65db9326b4 FEATURE: Add download script for Google Groups 2018-10-31 01:12:05 +01:00
341836eb42 Fix the rake task and importer instead 2018-10-17 16:48:09 +02:00
ee18d9ace0 FIX: mbox importer and rake task were broken 2018-10-17 16:34:18 +02:00
ac743dab10 Improve mbox import script
* emails weren't sorted in correct order
* better default regex for splitting mbox files
* output Message-ID if email is skipped because it doesn't have a Date
2018-08-23 09:46:28 +02:00
f2d00e5eff FEATURE: Use Message-ID for detecting email replies to group
Ignores the site setting "find_related_post_with_key" and always tries to honor the `In-Reply-To` and `References` header for emails sent to a group.

The senders email address must be included in the `To` or `CC` header of a previous email sent to the group and the `Message-ID` of that email must be included in the current email's `In-Reply-To` or `References` header.
2018-04-05 11:00:38 +02:00
9b651adadb FIX: mbox importer should ignore emails without date 2018-03-13 13:42:57 +01:00
dc32ee5cbf Improvements to mbox import script
* Ignore errors during indexing and show information about the message causing the problem
* Always activate imported users if they aren't staged
2018-03-06 11:32:12 +01:00
479f7ed18f Ignore case when removing mailing list name from subject 2018-02-12 21:41:58 +01:00
6500343431 FIX: mbox importer didn't detected already indexed files 2018-01-17 17:03:53 +01:00
bb54eb1192 Improvements to mbox importer
* store time it took to index message in DB (to find performance issues)
* ignore listserv specific files
* better examples for split_regex
* first email in mbox shouldn't contain the split string
* always lock the DB in exclusive mode
* save email within transaction
* messages can be grouped by subject and use original order (for Listserv)
* adds option to index emails without running the import
2018-01-17 12:04:57 +01:00
77a92e8878 Allow user staging via setting (#5468) 2018-01-04 09:17:35 +01:00
cafe69caac Refactor mbox import script 2017-12-13 22:03:31 +01:00
3190c13c22 import staged users as inactive in mbox import 2017-12-13 08:45:43 +05:30
16738cfb1b FEATURE: convert plain text emails to markdown 2017-12-06 01:47:51 +01:00
32dd1e66be improvements to the mbox import script
* ignores dot-files and empty emails
* new setting to prefer HTML over plaintext emails during import
* restore original site settings at the end of import
* elided content of HTML mails was not put inside details block
2017-11-18 17:16:44 +01:00
06a6ddc3ba handle plaintext and HTML emails in mbox importer 2017-11-15 20:22:11 +01:00
6c829c24d7 escaping the subject isn't needed in the mbox importer 2017-10-19 15:25:20 +02:00
c41880ab19 Improvements to the experimental mbox importer
* Disable journaling to improve performance in Docker
* Use the email cooking method
* Store IncomingEmail in order find related posts by Message-ID
* Escape HTML in imported messages
2017-10-19 14:27:40 +02:00
8299e7e8c3 Add new, experimental version of mbox importer 2017-05-29 20:59:18 +02:00