Reason for revert:
Compile Error.
Original issue's description:
> The simulator puts into action the schedule of speech turns encoded in a MultiEndCall instance. The output is a set of audio track pairs. There is one set for each speaker and each set contains one near-end and one far-end audio track. The tracks are directly written into wav files instead of creating them in memory. To speed up the creation of the output wav files, *all* the source audio tracks (i.e., the atomic speech turns) are pre-loaded.
>
> The ConversationalSpeechTest.MultiEndCallSimulator unit test defines a conversational speech sequence and creates two wav files (with pure tones at 440 and 880 Hz) that are used as atomic speech turn tracks.
>
> This CL also patches MultiEndCall to only allow input audio tracks that share the same sample rate and have a single channel.
>
> BUG=webrtc:7218
>
> Review-Url: https://codereview.webrtc.org/2790933002
> Cr-Commit-Position: refs/heads/master@{#18480}
> Committed: 6b648c4697
TBR=minyue@webrtc.org,alessiob@webrtc.org
# Skipping CQ checks because original CL landed less than 1 day ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=webrtc:7218
Review-Url: https://codereview.webrtc.org/2925123003
Cr-Commit-Position: refs/heads/master@{#18481}
Conversational Speech generator tool
Tool to generate multiple-end audio tracks to simulate conversational speech with two or more participants.
The input to the tool is a directory containing a number of audio tracks and a text file indicating how to time the sequence of speech turns (see the Example section).
Since the timing of the speaking turns is specified by the user, the generated tracks may not be suitable for testing scenarios in which there is unpredictable network delay (e.g., end-to-end RTC assessment).
Instead, the generated pairs can be used when the delay is constant (obviously including the case in which there is no delay). For instance, echo cancellation in the APM module can be evaluated using two-end audio tracks as input and reverse input.
By indicating negative and positive time offsets, one can reproduce cross-talk and silence in the conversation.
IMPORTANT: the whole code has not been landed yet.
Example
For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4 (speaker A) and b1, b2 (speaker B). The text file with the timing information may look like this:
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
The first column indicates the speaker name, the second contains the audio track file names, and the third the offsets (in milliseconds) used to concatenate the chunks.
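The three-column format above is simple to parse; a minimal sketch (the function name is illustrative, not part of the tool):

```python
# Hypothetical parser for one timing-file line, e.g. "B b2 -200".
# Columns: speaker name, audio track file name, offset in milliseconds.
def parse_timing_line(line):
    speaker, track, offset = line.split()
    return speaker, track, int(offset)

turns = [parse_timing_line(line) for line in [
    "A a1 0",
    "B b1 0",
    "A a2 100",
    "B b2 -200",
    "A a3 0",
    "A a4 0",
]]
```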
Assume that all the audio tracks in the example above are 1000 ms long. The tool will then generate two tracks (A and B) that look like this:
Track A
a1 (1000 ms)
silence (1100 ms)
a2 (1000 ms)
silence (800 ms)
a3 (1000 ms)
a4 (1000 ms)
Track B
silence (1000 ms)
b1 (1000 ms)
silence (900 ms)
b2 (1000 ms)
silence (2000 ms)
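The layouts above follow from a simple rule: each turn starts when the previous turn ends, shifted by its offset (negative offsets produce cross-talk, positive ones extra silence). A minimal sketch of that scheduling rule, under the assumed semantics just stated (the function name is illustrative, not the tool's actual API):

```python
# Compute per-speaker speech intervals from the parsed timing file.
# durations_ms maps each track file name to its duration in milliseconds.
def schedule(turns, durations_ms):
    t = 0  # end time of the previous turn
    out = {}
    for speaker, track, offset in turns:
        start = t + offset
        end = start + durations_ms[track]
        out.setdefault(speaker, []).append((start, end, track))
        t = end
    return out

turns = [("A", "a1", 0), ("B", "b1", 0), ("A", "a2", 100),
         ("B", "b2", -200), ("A", "a3", 0), ("A", "a4", 0)]
durations = {name: 1000 for name in ("a1", "a2", "a3", "a4", "b1", "b2")}
intervals = schedule(turns, durations)
```

With 1000 ms tracks this reproduces the example: a1 at 0-1000 ms, a2 at 2100-3100 ms, b2 at 2900-3900 ms (overlapping a2 by 200 ms), and so on.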
The two tracks can also be visualized as follows (one character represents 100 ms, "." is silence and "*" is speech).
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
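The ASCII timeline can be regenerated from the per-speaker speech intervals; a minimal sketch (the helper is hypothetical, not part of the tool):

```python
# Render one speaker's timeline as in the figure above:
# one character per 100 ms, '*' = speech, '.' = silence.
def render(intervals_ms, total_ms, step_ms=100):
    chars = []
    for t in range(0, total_ms, step_ms):
        speaking = any(start <= t < end for start, end in intervals_ms)
        chars.append("*" if speaking else ".")
    return "".join(chars)

# Speech intervals (start, end) in ms, taken from the example above.
track_a = [(0, 1000), (2100, 3100), (3900, 4900), (4900, 5900)]
track_b = [(1000, 2000), (2900, 3900)]
line_a = render(track_a, 5900)
line_b = render(track_b, 5900)
```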