Reason for revert:
Compile Error.
Original issue's description:
> The simulator puts into action the schedule of speech turns encoded in a MultiEndCall instance. The output is a set of audio track pairs. There is one set for each speaker and each set contains one near-end and one far-end audio track. The tracks are directly written into wav files instead of creating them in memory. To speed up the creation of the output wav files, *all* the source audio tracks (i.e., the atomic speech turns) are pre-loaded.
>
> The ConversationalSpeechTest.MultiEndCallSimulator unit test defines a conversational speech sequence and creates two wav files (with pure tones at 440 and 880 Hz) that are used as atomic speech turn tracks.
>
> This CL also patches MultiEndCall to only allow input audio tracks that share the same sample rate and have a single channel.
>
> BUG=webrtc:7218
>
> Review-Url: https://codereview.webrtc.org/2790933002
> Cr-Commit-Position: refs/heads/master@{#18480}
> Committed: 6b648c4697
TBR=minyue@webrtc.org,alessiob@webrtc.org
# Skipping CQ checks because original CL landed less than 1 day ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=webrtc:7218
Review-Url: https://codereview.webrtc.org/2925123003
Cr-Commit-Position: refs/heads/master@{#18481}
Conversational Speech generator tool
Tool to generate multiple-end audio tracks to simulate conversational speech with two or more participants.
The input to the tool is a directory containing a number of audio tracks and a text file indicating how to time the sequence of speech turns (see the Example section).
Since the timing of the speaking turns is specified by the user, the generated tracks may not be suitable for testing scenarios in which there is unpredictable network delay (e.g., end-to-end RTC assessment).
Instead, the generated pairs can be used when the delay is constant (obviously including the case in which there is no delay). For instance, echo cancellation in the APM module can be evaluated using two-end audio tracks as input and reverse input.
By indicating negative and positive time offsets, one can reproduce cross-talk and silence in the conversation.
IMPORTANT: the whole code has not been landed yet.
Example
For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4 (speaker A) and b1, b2 (speaker B). The text file with the timing information may look like this:
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
The first column indicates the speaker name, the second contains the audio track file names, and the third the offsets (in milliseconds) used to concatenate the chunks.
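The three-column format above is simple to parse; a minimal sketch (the function name is illustrative, not part of the tool):

```python
# Hypothetical parser for one timing-file line, e.g. "B b2 -200".
# Columns: speaker name, audio track file name, offset in milliseconds.
def parse_timing_line(line):
    speaker, track, offset = line.split()
    return speaker, track, int(offset)

turns = [parse_timing_line(line) for line in [
    "A a1 0",
    "B b1 0",
    "A a2 100",
    "B b2 -200",
    "A a3 0",
    "A a4 0",
]]
```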
Assume that all the audio tracks in the example above are 1000 ms long. The tool will then generate two tracks (A and B) that look like this:
Track A
a1 (1000 ms)
silence (1100 ms)
a2 (1000 ms)
silence (800 ms)
a3 (1000 ms)
a4 (1000 ms)
Track B
silence (1000 ms)
b1 (1000 ms)
silence (900 ms)
b2 (1000 ms)
silence (2000 ms)
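The layouts above follow from a simple rule: each turn starts when the previous turn ends, shifted by its offset (negative offsets produce cross-talk, positive ones extra silence). A minimal sketch of that scheduling rule, under the assumed semantics just stated (the function name is illustrative, not the tool's actual API):

```python
# Compute per-speaker speech intervals from the parsed timing file.
# durations_ms maps each track file name to its duration in milliseconds.
def schedule(turns, durations_ms):
    t = 0  # end time of the previous turn
    out = {}
    for speaker, track, offset in turns:
        start = t + offset
        end = start + durations_ms[track]
        out.setdefault(speaker, []).append((start, end, track))
        t = end
    return out

turns = [("A", "a1", 0), ("B", "b1", 0), ("A", "a2", 100),
         ("B", "b2", -200), ("A", "a3", 0), ("A", "a4", 0)]
durations = {name: 1000 for name in ("a1", "a2", "a3", "a4", "b1", "b2")}
intervals = schedule(turns, durations)
```

With 1000 ms tracks this reproduces the example: a1 at 0-1000 ms, a2 at 2100-3100 ms, b2 at 2900-3900 ms (overlapping a2 by 200 ms), and so on.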
The two tracks can also be visualized as follows (one character represents 100 ms, "." is silence and "*" is speech).
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
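The ASCII timeline can be regenerated from the per-speaker speech intervals; a minimal sketch (the helper is hypothetical, not part of the tool):

```python
# Render one speaker's timeline as in the figure above:
# one character per 100 ms, '*' = speech, '.' = silence.
def render(intervals_ms, total_ms, step_ms=100):
    chars = []
    for t in range(0, total_ms, step_ms):
        speaking = any(start <= t < end for start, end in intervals_ms)
        chars.append("*" if speaking else ".")
    return "".join(chars)

# Speech intervals (start, end) in ms, taken from the example above.
track_a = [(0, 1000), (2100, 3100), (3900, 4900), (4900, 5900)]
track_b = [(1000, 2000), (2900, 3900)]
line_a = render(track_a, 5900)
line_b = render(track_b, 5900)
```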