DEV: Update nokogiri to 1.18.1 (#30554)

Nokogiri/libxml is now stricter about the params it receives. It uses kwargs instead of an options object (I fixed an issue there in #30545), doesn't accept nil/blank HTML (fixed here), and, most importantly, handles encoding differently: it seems to require explicitly specifying UTF-8.

* Build(deps): Bump nokogiri from 1.16.8 to 1.18.1

Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.16.8 to 1.18.1.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.16.8...v1.18.1)

---
updated-dependencies:
- dependency-name: nokogiri
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
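
The two input-side requirements described above (no nil/blank HTML, explicit UTF-8) can be illustrated with a minimal plain-Ruby sketch. The helper name `prepare_html_for_parsing` is hypothetical and not part of this commit. The sketch swaps Rails' `blank?` for a stdlib equivalent and assumes the input string carries a correct encoding label.

```ruby
# Hypothetical helper (not part of this commit): normalizes input before it
# is handed to a parser that rejects nil/blank HTML and expects UTF-8.
def prepare_html_for_parsing(html)
  # Plain-Ruby stand-in for ActiveSupport's `blank?`:
  # treat nil, empty, and whitespace-only input as "nothing to parse".
  return nil if html.nil? || html.strip.empty?

  # Transcode to UTF-8 so the encoding passed to the parser matches the data.
  # Assumption: the string's current encoding label is accurate.
  html.encode(Encoding::UTF_8)
end

latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
prepared = prepare_html_for_parsing(latin1)
puts prepared.encoding
puts prepare_html_for_parsing("   ").inspect
```

A caller would return early when the helper yields `nil`, and otherwise pass the result to a parser built with an explicit encoding, as the diff does with `Nokogiri::HTML4::SAX::Parser.new(me, Encoding::UTF_8)`.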
2025-05-23 19:11:14 +08:00 · 2025-01-07 12:05:39 +01:00
parent c1a46995a7
commit affe26f0dd
7 changed files with 14 additions and 11 deletions
--- a/lib/excerpt_parser.rb
+++ b/lib/excerpt_parser.rb
@@ -27,10 +27,11 @@ class ExcerptParser < Nokogiri::XML::SAX::Document
  end

  def self.get_excerpt(html, length, options)
-    html ||= ""
+    return "" if html.blank?
+
    length = html.length if html.include?("excerpt") && CUSTOM_EXCERPT_REGEX === html
    me = self.new(length, options)
-    parser = Nokogiri::HTML::SAX::Parser.new(me)
+    parser = Nokogiri::HTML4::SAX::Parser.new(me, Encoding::UTF_8)
    catch(:done) { parser.parse(html) }
    excerpt = me.excerpt.strip
    excerpt = excerpt.gsub(/\s*\n+\s*/, "\n\n") if options[:keep_onebox_source] ||