platform-external-webrtc/webrtc/modules/audio_processing/test/conversational_speech
alessiob 8d23c050f2 MultiEndCall::CheckTiming() verifies that a set of audio tracks and timing information is valid to simulate conversational speech. Unordered turns are rejected. Self cross-talk and cross-talk with 3 or more speakers are not permitted since it would require mixing at the simulation step.
This CL includes extensive tests to match accept or reject decisions on several different timing setups. The setups are simulated using mocks (by far more light-weight than using actual timing and audio track files).

The client code, the unit tests in this case, passes information about the fake audio tracks to MockWavReaderFactory. MockWavReader instances are then created using the parameters defined in the client code. To improve the readability of the tests, generator_unittest.cc includes a docstring explaining how each MultiEndCallSetup* test is documented.

Run tests as follows:
$ out/Default/modules_unittests --gtest_filter=ConversationalSpeechTest.*

BUG=webrtc:7218

Review-Url: https://codereview.webrtc.org/2781573002
Cr-Commit-Position: refs/heads/master@{#17592}
2017-04-07 19:05:08 +00:00

Conversational Speech generator tool

Tool to generate multiple-end audio tracks to simulate conversational speech with two or more participants.

The input to the tool is a directory containing a number of audio tracks and a text file indicating how to time the sequence of speech turns (see the Example section).

Since the timing of the speaking turns is specified by the user, the generated tracks may not be suitable for testing scenarios in which there is unpredictable network delay (e.g., end-to-end RTC assessment).

Instead, the generated pairs can be used when the delay is constant (including the case in which there is no delay). For instance, echo cancellation in the APM can be evaluated using two-end audio tracks as input and reverse input.

By indicating negative and positive time offsets, one can reproduce cross-talk and silence in the conversation.

IMPORTANT: not all of the code has landed yet.

Example

For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4 (speaker A) and b1, b2 (speaker B). The text file with the timing information may look like this:

A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0

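The first column indicates the speaker name, the second the audio track file name, and the third the offset (in milliseconds) used to concatenate the chunks. In this example, the offset is applied relative to the end of the previous turn: a positive offset inserts silence before the turn, while a negative offset makes the turn overlap with the previous one (cross-talk).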
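To make the three-column format concrete, here is a minimal Python sketch of a parser for it. It is not the tool's own parser: the names `Turn` and `parse_timing` are hypothetical, and the sketch assumes whitespace-separated columns with blank lines ignored.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    speaker: str    # first column: speaker name
    track: str      # second column: audio track file name
    offset_ms: int  # third column: offset in milliseconds


def parse_timing(text: str) -> List[Turn]:
    """Parse timing text into a list of turns (hypothetical helper)."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got: {line!r}")
        speaker, track, offset = fields
        turns.append(Turn(speaker, track, int(offset)))
    return turns


EXAMPLE = """\
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
"""

turns = parse_timing(EXAMPLE)
for turn in turns:
    print(turn.speaker, turn.track, turn.offset_ms)
```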

Assume that all the audio tracks in the example above are 1000 ms long. The tool will then generate two tracks (A and B) that look like this:

Track A

  a1 (1000 ms)
  silence (1100 ms)
  a2 (1000 ms)
  silence (800 ms)
  a3 (1000 ms)
  a4 (1000 ms)

Track B

  silence (1000 ms)
  b1 (1000 ms)
  silence (900 ms)
  b2 (1000 ms)
  silence (2000 ms)
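The layout above can be reproduced with a short Python sketch. It rests on an assumption inferred from the example rather than taken from the tool's source: each offset is applied relative to the end of the previous turn in the file. The function name `layout_tracks` and the data shapes are hypothetical.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, str, int]  # (speaker, track name, offset in ms)
Segment = Tuple[str, int]    # (track name or "silence", length in ms)


def layout_tracks(turns: List[Turn],
                  duration_ms: Dict[str, int]) -> Dict[str, List[Segment]]:
    """Compute per-speaker segments (hypothetical helper)."""
    # Step 1: absolute begin/end time of every turn.
    # Assumption: the offset is relative to the end of the previous turn.
    intervals = []
    prev_end = 0
    for speaker, track, offset in turns:
        begin = prev_end + offset
        end = begin + duration_ms[track]
        intervals.append((speaker, track, begin, end))
        prev_end = end
    total = max((end for _, _, _, end in intervals), default=0)

    # Step 2: fill the gaps of each speaker's track with silence.
    layout: Dict[str, List[Segment]] = {}
    cursor: Dict[str, int] = {}
    for speaker, track, begin, end in intervals:
        segments = layout.setdefault(speaker, [])
        position = cursor.get(speaker, 0)
        if begin < position:
            raise ValueError(f"self cross-talk for speaker {speaker}")
        if begin > position:
            segments.append(("silence", begin - position))
        segments.append((track, end - begin))
        cursor[speaker] = end

    # Step 3: pad every track with silence up to the common total length.
    for speaker, segments in layout.items():
        if cursor[speaker] < total:
            segments.append(("silence", total - cursor[speaker]))
    return layout


turns = [("A", "a1", 0), ("B", "b1", 0), ("A", "a2", 100),
         ("B", "b2", -200), ("A", "a3", 0), ("A", "a4", 0)]
durations = {name: 1000 for name in ("a1", "a2", "a3", "a4", "b1", "b2")}
layout = layout_tracks(turns, durations)
for speaker in sorted(layout):
    print("Track", speaker, layout[speaker])
```

Both tracks come out 5900 ms long, matching the sums of the segments listed above.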

The two tracks can also be visualized as follows (one character represents 100 ms, "." is silence and "*" is speech).

t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
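The same picture can be derived from the segment lists of the two tracks. The Python sketch below is only an illustration (the `render` helper is hypothetical and assumes every segment length is a multiple of 100 ms); it also counts the cross-talk and the silence shared by both tracks.

```python
def render(segments, ms_per_char=100):
    """Draw a track: '.' for silence and '*' for speech (hypothetical helper)."""
    return "".join(
        ("." if name == "silence" else "*") * (length_ms // ms_per_char)
        for name, length_ms in segments
    )


track_a = [("a1", 1000), ("silence", 1100), ("a2", 1000),
           ("silence", 800), ("a3", 1000), ("a4", 1000)]
track_b = [("silence", 1000), ("b1", 1000), ("silence", 900),
           ("b2", 1000), ("silence", 2000)]

line_a = render(track_a)
line_b = render(track_b)
print("A: " + line_a)
print("B: " + line_b)

# Cross-talk: both speakers are active in the same 100 ms slot.
cross_talk_ms = 100 * sum(a == "*" and b == "*" for a, b in zip(line_a, line_b))
# Shared silence: neither speaker is active in the slot.
both_silent_ms = 100 * sum(a == "." and b == "." for a, b in zip(line_a, line_b))
print("cross-talk:", cross_talk_ms, "ms")      # 200 ms
print("shared silence:", both_silent_ms, "ms")  # 100 ms
```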