This CL adds the SSE2 optimized implementation for fully connected
(FC) layers. The change includes a weights re-alignment op done once
at construction time. It is required in order to optimize the load op
to fill 128 bit registers.
This CL also includes unit test adaptations and a benchmark test
(disabled by default).
Bug: webrtc:10480
Change-Id: I5ed87f0a629faaaf4c8bffbce1cea5557518f8c8
Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/141862
Commit-Queue: Alessio Bazzica <alessiob@webrtc.org>
Reviewed-by: Gustaf Ullberg <gustaf@webrtc.org>
Cr-Commit-Position: refs/heads/master@{#29712}