mirror of
https://github.com/discourse/discourse.git
synced 2025-05-22 22:43:33 +08:00
FEATURE: better email reply parsing
2
Gemfile
@@ -47,8 +47,6 @@ gem 'aws-sdk', require: false
 gem 'excon', require: false
 gem 'unf', require: false
 
-gem 'email_reply_parser'
-
 # note: for image_optim to correctly work you need to follow
 # https://github.com/toy/image_optim
 # pinned due to https://github.com/toy/image_optim/pull/75, docker image must be upgraded to upgrade
@@ -115,7 +115,6 @@ GEM
       railties
     docile (1.1.5)
     dotenv (2.0.2)
-    email_reply_parser (0.5.8)
     ember-data-source (1.0.0.beta.16.1)
       ember-source (~> 1.8)
     ember-handlebars-template (0.1.5)
@@ -451,7 +450,6 @@ DEPENDENCIES
   byebug
   certified
   discourse-qunit-rails
-  email_reply_parser
   ember-rails
   ember-source (= 1.12.1)
   excon
278
lib/email/email_reply_parser.rb
Normal file
@@ -0,0 +1,278 @@
+require 'strscan'
+
+# https://github.com/github/email_reply_parser/blob/master/lib/email_reply_parser.rb
+#
+# EmailReplyParser is a small library to parse plain text email content. The
+# goal is to identify which fragments are quoted, part of a signature, or
+# original body content. We want to support both top and bottom posters, so
+# no simple "REPLY ABOVE HERE" content is used.
+#
+# Beyond RFC 5322 (which is handled by the [Ruby mail gem][mail]), there aren't
+# any real standards for how emails are created. This attempts to parse out
+# common conventions for things like replies:
+#
+#     this is some text
+#
+#     On <date>, <author> wrote:
+#     > blah blah
+#     > blah blah
+#
+# ... and signatures:
+#
+#     this is some text
+#
+#     --
+#     Bob
+#     http://homepage.com/~bob
+#
+# Each of these are parsed into Fragment objects.
+#
+# EmailReplyParser also attempts to figure out which of these blocks should
+# be hidden from users.
+#
+# [mail]: https://github.com/mikel/mail
+class EmailReplyParser
+
+  # Public: Splits an email body into a list of Fragments.
+  #
+  # text - A String email body.
+  #
+  # Returns an Email instance.
+  def self.read(text)
+    Email.new.read(text)
+  end
+
+  # Public: Get the text of the visible portions of the given email body.
+  #
+  # text - A String email body.
+  #
+  # Returns a String.
+  def self.parse_reply(text)
+    self.read(text).visible_text
+  end
+
+  ### Emails
+
+  # An Email instance represents a parsed body String.
+  class Email
+    # Emails have an Array of Fragments.
+    attr_reader :fragments
+
+    def initialize
+      @fragments = []
+    end
+
+    # Public: Gets the combined text of the visible fragments of the email body.
+    #
+    # Returns a String.
+    def visible_text
+      fragments.select{|f| !f.hidden?}.map{|f| f.to_s}.join("\n").rstrip
+    end
+
+    # Splits the given text into a list of Fragments. This is roughly done by
+    # reversing the text and parsing from the bottom to the top. This way we
+    # can check for 'On <date>, <author> wrote:' lines above quoted blocks.
+    #
+    # text - A String email body.
+    #
+    # Returns this same Email instance.
+    def read(text)
+      # in 1.9 we want to operate on the raw bytes
+      text = text.dup.force_encoding('binary') if text.respond_to?(:force_encoding)
+
+      # Normalize line endings.
+      text.gsub!("\r\n", "\n")
+
+      # Check for "On DATE, NAME <EMAIL> wrote:"
+      # or "---- Original Message ----" and strip
+      # email content after that part
+      if text =~ /^(On\s.+wrote:.*)$/nm || text =~ /^([\s_-]+Original (?i)message?[\s_-]+$.*)/nm
+        text.gsub!($1, "")
+      end
+
+      # Some users may reply directly above a line of underscores.
+      # In order to ensure that these fragments are split correctly,
+      # make sure that all lines of underscores are preceded by
+      # at least two newline characters.
+      text.gsub!(/([^\n])(?=\n_{7}_+)$/m, "\\1\n")
+
+      # The text is reversed initially due to the way we check for hidden
+      # fragments.
+      text = text.reverse
+
+      # This determines if any 'visible' Fragment has been found. Once any
+      # visible Fragment is found, stop looking for hidden ones.
+      @found_visible = false
+
+      # This instance variable points to the current Fragment. If the matched
+      # line fits, it should be added to this Fragment. Otherwise, finish it
+      # and start a new Fragment.
+      @fragment = nil
+
+      # Use the StringScanner to pull out each line of the email content.
+      @scanner = StringScanner.new(text)
+      while line = @scanner.scan_until(/\n/n)
+        scan_line(line)
+      end
+
+      # Be sure to parse the last line of the email.
+      if (last_line = @scanner.rest.to_s).size > 0
+        scan_line(last_line)
+      end
+
+      # Finish up the final fragment. Finishing a fragment will detect any
+      # attributes (hidden, signature, reply), and join each line into a
+      # string.
+      finish_fragment
+
+      @scanner = @fragment = nil
+
+      # Now that parsing is done, reverse the order.
+      @fragments.reverse!
+      self
+    end
+
+    private
+    EMPTY = "".freeze
+    SIGNATURE = '(?m)(--\s*$|__\s*$)|(^(\w+\s*){1,3} ym morf tneS$)'
+
+    begin
+      require 're2'
+      SIG_REGEX = RE2::Regexp.new(SIGNATURE)
+    rescue LoadError
+      SIG_REGEX = Regexp.new(SIGNATURE)
+    end
+
+    ### Line-by-Line Parsing
+
+    # Scans the given line of text and figures out which fragment it belongs
+    # to.
+    #
+    # line - A String line of text from the email.
+    #
+    # Returns nothing.
+    def scan_line(line)
+      line.chomp!("\n")
+      line.lstrip! unless SIG_REGEX.match(line)
+
+      # We're looking for leading `>`'s to see if this line is part of a
+      # quoted Fragment.
+      is_quoted = !!(line =~ /(>+)$/n)
+
+      # Mark the current Fragment as a signature if the current line is empty
+      # and the Fragment starts with a common signature indicator.
+      if @fragment && line == EMPTY
+        if SIG_REGEX.match @fragment.lines.last
+          @fragment.signature = true
+          finish_fragment
+        end
+      end
+
+      # If the line matches the current fragment, add it. Note that a common
+      # reply header also counts as part of the quoted Fragment, even though
+      # it doesn't start with `>`.
+      if @fragment &&
+          ((@fragment.quoted? == is_quoted) ||
+           (@fragment.quoted? && (quote_header?(line) || line == EMPTY)))
+        @fragment.lines << line
+
+      # Otherwise, finish the fragment and start a new one.
+      else
+        finish_fragment
+        @fragment = Fragment.new(is_quoted, line)
+      end
+    end
+
+    # Detects if a given line is a header above a quoted area. It is only
+    # checked for lines preceding quoted regions.
+    #
+    # line - A String line of text from the email.
+    #
+    # Returns true if the line is a valid header, or false.
+    def quote_header?(line)
+      line =~ /^:etorw.*nO$/n
+    end
+
+    # Builds the fragment string and reverses it, after all lines have been
+    # added. It also checks to see if this Fragment is hidden. The hidden
+    # Fragment check reads from the bottom to the top.
+    #
+    # Any quoted Fragments or signature Fragments are marked hidden if they
+    # are below any visible Fragments. Visible Fragments are expected to
+    # contain original content by the author. If they are below a quoted
+    # Fragment, then the Fragment should be visible to give context to the
+    # reply.
+    #
+    #     some original text (visible)
+    #
+    #     > do you have any two's? (quoted, visible)
+    #
+    #     Go fish! (visible)
+    #
+    #     > --
+    #     > Player 1 (quoted, hidden)
+    #
+    #     --
+    #     Player 2 (signature, hidden)
+    #
+    def finish_fragment
+      if @fragment
+        @fragment.finish
+        if !@found_visible
+          if @fragment.quoted? || @fragment.signature? ||
+              @fragment.to_s.strip == EMPTY
+            @fragment.hidden = true
+          else
+            @found_visible = true
+          end
+        end
+        @fragments << @fragment
+      end
+      @fragment = nil
+    end
+  end
+
+  ### Fragments
+
+  # Represents a group of paragraphs in the email sharing common attributes.
+  # Paragraphs should get their own fragment if they are a quoted area or a
+  # signature.
+  class Fragment < Struct.new(:quoted, :signature, :hidden)
+    # This is an Array of String lines of content. Since the content is
+    # reversed, this array is backwards, and contains reversed strings.
+    attr_reader :lines,
+
+    # This is reserved for the joined String that is build when this Fragment
+    # is finished.
+      :content
+
+    def initialize(quoted, first_line)
+      self.signature = self.hidden = false
+      self.quoted = quoted
+      @lines = [first_line]
+      @content = nil
+      @lines.compact!
+    end
+
+    alias quoted? quoted
+    alias signature? signature
+    alias hidden? hidden
+
+    # Builds the string content by joining the lines and reversing them.
+    #
+    # Returns nothing.
+    def finish
+      @content = @lines.join("\n")
+      @lines = nil
+      @content.reverse!
+    end
+
+    def to_s
+      @content
+    end
+
+    def inspect
+      to_s.inspect
+    end
+  end
+end
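Reviewer's sketch, not part of the commit: the parser above reverses the whole body before scanning, which is why its patterns look backwards (`/(>+)$/` for a leading `>`, `/^:etorw.*nO$/` for "On ... wrote:"). The snippet below is a minimal illustration of that idea only. The helper name `classify_reversed_lines` and its labels are hypothetical, and it deliberately skips fragment grouping, signatures, and the hidden/visible pass.

```ruby
require 'strscan'

# Hypothetical helper, not part of this commit: labels each line of a plain
# text body the way scan_line sees it, i.e. on the reversed text.
def classify_reversed_lines(text)
  scanner = StringScanner.new(text.gsub("\r\n", "\n").reverse)
  labels = []

  until scanner.eos?
    # Take one line; the final line has no trailing newline.
    line = scanner.scan_until(/\n/) || scanner.scan(/.+/m)
    line = line.chomp("\n").lstrip

    label =
      if line =~ />+$/             # reversed "> quoted" ends with ">"
        :quoted
      elsif line =~ /^:etorw.*nO$/ # reversed "On ... wrote:"
        :quote_header
      elsif line.empty?
        :blank
      else
        :text
      end

    labels << [label, line.reverse]
  end

  labels.reverse # back to top-to-bottom order
end

body = "Thanks!\n\nOn Monday, Bob wrote:\n> hello"
classify_reversed_lines(body).each { |label, line| puts "#{label}: #{line}" }
```

Because the scan runs bottom-up, the quoted block is seen before the header that introduces it, so the header can be attached to an already-open quoted fragment.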
@@ -1,5 +1,6 @@
 require_dependency 'new_post_manager'
 require_dependency 'email/html_cleaner'
+require_dependency 'email/email_reply_parser'
 
 module Email
 
@@ -100,53 +100,6 @@ It will also be my *only* reply."
       )
     end
 
-    it "handles inline reply" do
-      expect(test_parse_body(fixture_file("emails/inline_reply.eml"))).
-        to eq(
-"On Wed, Oct 8, 2014 at 11:12 AM, techAPJ <info@unconfigured.discourse.org> wrote:
-
-> techAPJ <https://meta.discourse.org/users/techapj>
-> November 28
->
-> Test reply.
->
-> First paragraph.
->
-> Second paragraph.
->
-> To respond, reply to this email or visit
-> https://meta.discourse.org/t/testing-default-email-replies/22638/3 in
-> your browser.
-> ------------------------------
-> Previous Replies codinghorror
-> <https://meta.discourse.org/users/codinghorror>
-> November 28
->
-> We're testing the latest GitHub email processing library which we are
-> integrating now.
->
-> https://github.com/github/email_reply_parser
->
-> Go ahead and reply to this topic and I'll reply from various email clients
-> for testing.
-> ------------------------------
->
-> To respond, reply to this email or visit
-> https://meta.discourse.org/t/testing-default-email-replies/22638/3 in
-> your browser.
->
-> To unsubscribe from these emails, visit your user preferences
-> <https://meta.discourse.org/my/preferences>.
->
-
-The quick brown fox jumps over the lazy dog. The quick brown fox jumps over
-the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown
-fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
-The quick brown fox jumps over the lazy dog. The quick brown fox jumps over
-the lazy dog. The quick brown fox jumps over the lazy dog."
-      )
-    end
-
     it "can retrieve the first part of multiple replies" do
       expect(test_parse_body(fixture_file("emails/inline_mixed.eml"))).to eq(
 "The quick brown fox jumps over the lazy dog. The quick brown fox jumps over
@@ -173,6 +126,14 @@ the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown"
       expect(test_parse_body(fixture_file("emails/iphone_signature.eml"))).not_to match(/Sent from my iPhone/)
     end
 
+    it "strips regular signature" do
+      expect(test_parse_body(fixture_file("emails/signature.eml"))).not_to match(/Arpit/)
+    end
+
+    it "strips 'original message' context" do
+      expect(test_parse_body(fixture_file("emails/original_message_context.eml"))).not_to match(/Context/)
+    end
+
     it "properly renders email reply from gmail web client" do
       expect(test_parse_body(fixture_file("emails/gmail_web.eml"))).
         to eq(
30
spec/fixtures/emails/original_message_context.eml
vendored
Normal file
@@ -0,0 +1,30 @@
+Delivered-To: test@mail.com
+Return-Path: <walter.white@googlemail.com>
+From: Walter White <walter.white@googlemail.com>
+Content-Type: multipart/alternative;
+	boundary=Apple-Mail-8E182EEF-9DBC-41DE-A593-DF2E5EBD3975
+Content-Transfer-Encoding: 7bit
+Mime-Version: 1.0 (1.0)
+Subject: Re: Signature in email replies!
+Date: Thu, 23 Oct 2014 14:43:49 +0530
+References: <1234@mail.gmail.com>
+In-Reply-To: <1234@mail.gmail.com>
+To: Arpit Jalan <test@mail.com>
+X-Mailer: iPhone Mail (12A405)
+
+
+--Apple-Mail-8E182EEF-9DBC-41DE-A593-DF2E5EBD3975
+Content-Type: text/plain;
+	charset=us-ascii
+Content-Transfer-Encoding: 7bit
+
+This post should not include signature.
+----Original Message----
+
+Context here.
+
+> On 23-Oct-2014, at 9:45 am, Arpit Jalan <test@mail.com> wrote:
+>
+> Signature in email replies!
+
+--Apple-Mail-8E182EEF-9DBC-41DE-A593-DF2E5EBD3975
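Reviewer's sketch, not part of the commit: this fixture exercises the pre-processing step in `Email#read` that drops everything from an "On ... wrote:" line or an "Original Message" separator to the end of the body. The snippet below is a simplified approximation of that step under stated assumptions. The helper name and constants are hypothetical, the `/n` flag is dropped, and `(?i:message)` stands in for the commit's `(?i)message?`.

```ruby
# Hypothetical helper, not part of this commit: approximates the step in
# Email#read that removes trailing reply context before fragment parsing.
REPLY_HEADER     = /^(On\s.+wrote:.*)$/m
ORIGINAL_MESSAGE = /^([\s_\-]+Original (?i:message)[\s_\-]+$.*)/m

def strip_trailing_context(text)
  text = text.gsub("\r\n", "\n")
  # With /m the dot also matches newlines, so the capture runs from the
  # matching line to the end of the body.
  match = REPLY_HEADER.match(text) || ORIGINAL_MESSAGE.match(text)
  match ? text.sub(match[1], "") : text
end

body = "This post should not include signature.\n" \
       "----Original Message----\n\nContext here.\n"
puts strip_trailing_context(body)
```

One consequence worth keeping in mind when reading the regex: any line that starts with "On " and is followed anywhere later by "wrote:" triggers the cut, so a reply whose own text begins that way would lose content.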
29
spec/fixtures/emails/signature.eml
vendored
Normal file
@@ -0,0 +1,29 @@
+Delivered-To: test@mail.com
+Return-Path: <walter.white@googlemail.com>
+From: Walter White <walter.white@googlemail.com>
+Content-Type: multipart/alternative;
+	boundary=Apple-Mail-8E182EEF-9DBC-41DE-A593-DF2E5EBD3975
+Content-Transfer-Encoding: 7bit
+Mime-Version: 1.0 (1.0)
+Subject: Re: Signature in email replies!
+Date: Thu, 23 Oct 2014 14:43:49 +0530
+References: <1234@mail.gmail.com>
+In-Reply-To: <1234@mail.gmail.com>
+To: Arpit Jalan <test@mail.com>
+X-Mailer: iPhone Mail (12A405)
+
+
+--Apple-Mail-8E182EEF-9DBC-41DE-A593-DF2E5EBD3975
+Content-Type: text/plain;
+	charset=us-ascii
+Content-Transfer-Encoding: 7bit
+
+This post should not include signature.
+
+----Arpit
+
+> On 23-Oct-2014, at 9:45 am, Arpit Jalan <test@mail.com> wrote:
+>
+> Signature in email replies!
+
+--Apple-Mail-8E182EEF-9DBC-41DE-A593-DF2E5EBD3975
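Reviewer's sketch, not part of the commit: the `SIGNATURE` pattern in the parser is written against reversed lines, which is why it contains `ym morf tneS` ("Sent from my" backwards). The snippet below shows how a single line such as `----Arpit` from this fixture would match. The helper name `signature_line?` is hypothetical, and the inline `(?m)` flag is omitted because the pattern contains no dot for it to affect.

```ruby
# Hypothetical helper, not part of this commit. The pattern is the SIGNATURE
# string from the diff without its (?m) prefix. It runs on the reversed line,
# so "--" at the end of the reversed line means the original starts with "--".
SIGNATURE_PATTERN = /(--\s*$|__\s*$)|(^(\w+\s*){1,3} ym morf tneS$)/

def signature_line?(line)
  SIGNATURE_PATTERN.match?(line.reverse)
end

["----Arpit", "-- ", "Sent from my iPhone",
 "This post should not include signature."].each do |line|
  puts "#{line.inspect}: #{signature_line?(line)}"
end
```

In the parser itself a match alone is not enough: the fragment is flagged as a signature only when a blank line is reached while the last collected line matches, and it is hidden only if no visible fragment has been found below it.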