FEATURE: Direct S3 multipart uploads for backups (#14736)

This PR introduces a new `enable_experimental_backup_uploads` site setting (hidden, default false) which, when enabled alongside `enable_direct_s3_uploads`, allows direct S3 multipart uploads of backup .tar.gz files.

To make multipart external uploads work with both the S3BackupStore and the S3Store, I've had to move several methods out of S3Store and into S3Helper, including:

* presigned_url
* create_multipart
* abort_multipart
* complete_multipart
* presign_multipart_part
* list_multipart_parts
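
As a rough illustration of how the methods listed above fit together, here is a minimal sketch of the multipart lifecycle. `FakeMultipartHelper`, `fake_upload_part`, `all_parts`, the `Part` and `PartsPage` structs, and the example key are all hypothetical: the fake is an in-memory stand-in that only mirrors the method signatures from this commit and never talks to S3. The page fields (`parts`, `is_truncated`, `next_part_number_marker`) follow the names used on the AWS SDK's ListParts response.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for S3Helper. It mirrors the method
# signatures from this commit but does not contact S3.
Part = Struct.new(:part_number, :etag)
PartsPage = Struct.new(:parts, :is_truncated, :next_part_number_marker)

class FakeMultipartHelper
  def initialize
    @uploads = {}
  end

  def create_multipart(key, content_type, metadata: {})
    upload_id = SecureRandom.hex(8)
    @uploads[upload_id] = { key: key, parts: {}, completed: false }
    { upload_id: upload_id, key: key }
  end

  # The real method returns a presigned URL; this one only builds a placeholder.
  def presign_multipart_part(upload_id:, key:, part_number:)
    "https://example.invalid/#{key}?partNumber=#{part_number}&uploadId=#{upload_id}"
  end

  # Simulates the client uploading a part to its presigned URL.
  def fake_upload_part(upload_id, part_number, etag)
    @uploads.fetch(upload_id)[:parts][part_number] = etag
  end

  def list_multipart_parts(upload_id:, key:, max_parts: 1000, start_from_part_number: nil)
    all = @uploads.fetch(upload_id)[:parts].sort.map { |n, etag| Part.new(n, etag) }
    all = all.select { |part| part.part_number > start_from_part_number } if start_from_part_number
    page = all.first(max_parts)
    truncated = all.size > page.size
    PartsPage.new(page, truncated, truncated ? page.last.part_number : nil)
  end

  def complete_multipart(upload_id:, key:, parts:)
    @uploads.fetch(upload_id)[:completed] = true
    { key: key, part_count: parts.size }
  end
end

# Collects every part by following the part number marker across pages.
def all_parts(helper, upload_id:, key:, max_parts: 1000)
  parts = []
  marker = nil
  loop do
    page = helper.list_multipart_parts(
      upload_id: upload_id, key: key, max_parts: max_parts, start_from_part_number: marker
    )
    parts.concat(page.parts)
    break unless page.is_truncated
    marker = page.next_part_number_marker
  end
  parts
end

helper = FakeMultipartHelper.new
upload = helper.create_multipart("backups/site.tar.gz", "application/gzip")

(1..5).each do |n|
  helper.presign_multipart_part(upload_id: upload[:upload_id], key: upload[:key], part_number: n)
  helper.fake_upload_part(upload[:upload_id], n, "etag-#{n}")
end

# A small page size forces several list calls, as a large upload would.
parts = all_parts(helper, upload_id: upload[:upload_id], key: upload[:key], max_parts: 2)
result = helper.complete_multipart(
  upload_id: upload[:upload_id],
  key: upload[:key],
  parts: parts.map { |part| { part_number: part.part_number, etag: part.etag } }
)
```

The order is the point here: create the upload, presign and upload each part, list the parts (paging if needed), then complete. `abort_multipart` would take the place of the final step if the upload is cancelled.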

S3Store and S3BackupStore then either delegate directly to S3Helper or wrap these S3Helper methods in store-specific methods of their own. FileStore.temporary_upload_path no longer depends on upload_path, so it can be used interchangeably between the stores. A similar change was made in the frontend: the multipart-related JS code was moved out of ComposerUppyUpload into a mixin of its own, so that UppyUploadMixin can use it too.
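
The two delegation styles can be sketched roughly as follows. This is an illustration only, not the code in this commit: `SketchS3Helper`, `SketchStore`, and `SketchBackupStore` are hypothetical classes, the `backups/` key prefix is an assumed example of store-specific behaviour, and Ruby's standard `Forwardable` module is used here purely to keep the sketch self-contained.

```ruby
require "forwardable"

# Hypothetical stand-in for S3Helper; it returns plain values instead of
# calling S3.
class SketchS3Helper
  def abort_multipart(key:, upload_id:)
    "aborted #{key} (#{upload_id})"
  end

  def create_multipart(key, content_type, metadata: {})
    { upload_id: "upload-1", key: key }
  end
end

# Style 1: forward the calls straight to the helper.
class SketchStore
  extend Forwardable
  def_delegators :@s3_helper, :abort_multipart, :create_multipart

  def initialize(s3_helper)
    @s3_helper = s3_helper
  end
end

# Style 2: a store-specific wrapper that adjusts the arguments before
# calling the helper (the "backups/" prefix is an assumption).
class SketchBackupStore
  extend Forwardable
  def_delegators :@s3_helper, :abort_multipart

  def initialize(s3_helper)
    @s3_helper = s3_helper
  end

  def create_multipart(file_name, content_type)
    @s3_helper.create_multipart("backups/#{file_name}", content_type)
  end
end

helper = SketchS3Helper.new
store_upload = SketchStore.new(helper).create_multipart("a.png", "image/png")
backup_upload = SketchBackupStore.new(helper).create_multipart("site.tar.gz", "application/gzip")
```

With the multipart logic living in one helper, both stores stay thin and each can still decide how keys are built.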

Some changes to ExternalUploadManager had to be made here as well. The backup direct uploads do not need an Upload record made for them in the database, so they can be moved to their final S3 resting place when completing the multipart upload.

This changeset is not perfect: it introduces special cases in UploadController for backup handling that previously lived in BackupController, because UploadController is where the multipart routes are located. A subsequent pull request will pull these routes into a module or some other sharing pattern, along with hooks, so the backup controller and the upload controller (and any future controllers that may need them) can include these routes in a nicer way.

Author: Martin Brennan
Date: 2021-11-11 08:25:31 +10:00
Committed by: GitHub
Parent: d4e35f50c2
Commit: e4350bb966
22 changed files with 725 additions and 356 deletions

@@ -78,6 +78,10 @@ class S3Helper
    [path, etag.gsub('"', '')]
  end

  def path_from_url(url)
    URI.parse(url).path.delete_prefix("/")
  end

  def remove(s3_filename, copy_to_tombstone = false)
    s3_filename = s3_filename.dup
@@ -282,6 +286,92 @@ class S3Helper
    get_path_for_s3_upload(path)
  end

  def abort_multipart(key:, upload_id:)
    s3_client.abort_multipart_upload(
      bucket: s3_bucket_name,
      key: key,
      upload_id: upload_id
    )
  end

  def create_multipart(key, content_type, metadata: {})
    response = s3_client.create_multipart_upload(
      acl: "private",
      bucket: s3_bucket_name,
      key: key,
      content_type: content_type,
      metadata: metadata
    )
    { upload_id: response.upload_id, key: key }
  end

  def presign_multipart_part(upload_id:, key:, part_number:)
    presigned_url(
      key,
      method: :upload_part,
      expires_in: S3Helper::UPLOAD_URL_EXPIRES_AFTER_SECONDS,
      opts: {
        part_number: part_number,
        upload_id: upload_id
      }
    )
  end

  # Important note from the S3 documentation:
  #
  # This request returns a default and maximum of 1000 parts.
  # You can restrict the number of parts returned by specifying the
  # max_parts argument. If your multipart upload consists of more than 1,000
  # parts, the response returns an IsTruncated field with the value of true,
  # and a NextPartNumberMarker element.
  #
  # In subsequent ListParts requests you can include the part_number_marker arg,
  # using the NextPartNumberMarker field value from the previous response, to
  # get more parts.
  #
  # See https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Client.html#list_parts-instance_method
  def list_multipart_parts(upload_id:, key:, max_parts: 1000, start_from_part_number: nil)
    options = {
      bucket: s3_bucket_name,
      key: key,
      upload_id: upload_id,
      max_parts: max_parts
    }

    if start_from_part_number.present?
      options[:part_number_marker] = start_from_part_number
    end

    s3_client.list_parts(options)
  end

  def complete_multipart(upload_id:, key:, parts:)
    s3_client.complete_multipart_upload(
      bucket: s3_bucket_name,
      key: key,
      upload_id: upload_id,
      multipart_upload: {
        parts: parts
      }
    )
  end

  def presigned_url(
    key,
    method:,
    expires_in: S3Helper::UPLOAD_URL_EXPIRES_AFTER_SECONDS,
    opts: {}
  )
    Aws::S3::Presigner.new(client: s3_client).presigned_url(
      method,
      {
        bucket: s3_bucket_name,
        key: key,
        expires_in: expires_in,
      }.merge(opts)
    )
  end

  private

  def fetch_bucket_cors_rules