diff --git a/docs/content/s3.md b/docs/content/s3.md
index e1620fd5a..f7033d1ce 100644
--- a/docs/content/s3.md
+++ b/docs/content/s3.md
@@ -435,6 +435,83 @@ If you are doing a server-side copy, you can also increase the number of transfe
 You will need to experiment with these values to find the optimal settings for
 your setup.
 
+### Data integrity
+
+Rclone does its best to verify every part of an upload or download to
+the s3 provider using various hashes.
+
+Every HTTP transaction to/from the provider has an
+`X-Amz-Content-Sha256` or a `Content-Md5` header to guard against
+corruption of the HTTP body. The HTTP header is protected by the
+signature passed in the `Authorization` header.
+
+All communication with the provider is done over HTTPS for encryption
+and additional error protection.
+
+#### Single part uploads
+
+- Rclone uploads single part uploads with a `Content-Md5` using the
+  MD5 hash read from the source. The provider checks this is correct
+  on receipt of the data.
+
+- Rclone then does a HEAD request (disable with `--s3-no-head`) to
+  read the `ETag` back, which is the MD5 of the file, and checks it
+  against what it sent.
+
+Note that if the source does not have an MD5 then the single part
+uploads will not have hash protection. In this case it is recommended
+to use `--s3-upload-cutoff 0` so all files are uploaded as multipart
+uploads.
+
+#### Multipart uploads
+
+For files above `--s3-upload-cutoff` rclone splits the file into
+multiple parts for upload.
+
+- Each part is protected with both an `X-Amz-Content-Sha256` and a
+  `Content-Md5`
+
+When rclone has finished the upload of all the parts it then completes
+the upload by sending:
+
+- The MD5 hash of each part
+- The number of parts
+- This info is all protected with an `X-Amz-Content-Sha256`
+
+The provider checks the MD5 for all the parts it has received against
+what rclone sends and if it is good it returns OK.
+
+Rclone then does a HEAD request (disable with `--s3-no-head`) and
+checks the ETag is what it expects (in this case it should be the MD5
+sum of all the MD5 sums of all the parts with the number of parts on
+the end).
+
+If the source has an MD5 sum then rclone will attach it as
+`X-Amz-Meta-Md5chksum` metadata, because the `ETag` for a multipart
+upload can't easily be checked against the file: the chunk size must
+be known in order to calculate it.
+
+#### Downloads
+
+Rclone checks the MD5 hash of the data downloaded against either the
+ETag or the `X-Amz-Meta-Md5chksum` metadata (if present) which rclone
+uploads with multipart uploads.
+
+#### Further checking
+
+At each stage rclone and the provider are sending and checking hashes of
+**everything**. Rclone deliberately HEADs each object after upload to
+check it arrived safely for extra security. (You can disable this with
+`--s3-no-head`.)
+
+If you require further assurance that your data is intact you can use
+`rclone check` to compare the local hashes against those on the remote.
+
+And if you are feeling especially paranoid, use `rclone check --download`,
+which will download the files and check them against the local copies.
+(Note that this doesn't use disk to do this - it streams them in
+memory.)
+
 ### Versions
 
 When bucket versioning is enabled (this can be done with rclone with
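
Note on the multipart ETag described in the hunk above: the "MD5 sum of all the MD5 sums of all the parts with the number of parts on the end" can be sketched as below. This is a minimal illustration, not rclone's own code. It assumes the common S3 convention that the binary (not hex) part digests are concatenated before hashing and that the part count is appended after a `-`; other providers may differ. The function name `multipart_etag` and its parameters are hypothetical.

```python
import hashlib


def multipart_etag(data: bytes, chunk_size: int) -> str:
    """Return the ETag expected for a multipart upload of `data`.

    Assumption: the provider follows the usual S3 convention of
    md5(concatenated binary part digests) + "-" + number of parts.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        raise ValueError("this sketch only covers non-empty payloads")

    # Split the payload into parts of at most chunk_size bytes.
    parts = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # MD5 of each part, kept as raw bytes.
    digests = [hashlib.md5(part).digest() for part in parts]

    # MD5 over all the part digests, then the part count on the end.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(parts)}"


if __name__ == "__main__":
    payload = b"a" * 10
    print(multipart_etag(payload, 4))  # three parts
    print(multipart_etag(payload, 5))  # two parts, different ETag
```

The same payload gives a different value for each chunk size, which is why the ETag of a multipart upload can't easily be checked against the file and why the added text stores the whole-file MD5 in `X-Amz-Meta-Md5chksum`.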