Readme file with example that explains how the conversational speech generator tool should work.

BUG=webrtc:7218
NOTRY=True
Review-Url: https://codereview.webrtc.org/2722173003
Cr-Commit-Position: refs/heads/master@{#17010}
@ -0,0 +1,6 @@
alessiob@webrtc.org
henrik.lundin@webrtc.org
peah@webrtc.org

per-file *.gn=*
per-file *.gni=*
@ -0,0 +1,64 @@
# Conversational Speech generator tool

Python tool to generate multiple-end audio tracks to simulate conversational
speech with two or more participants.

The input to the tool is a directory containing a number of audio tracks and
a text file indicating how to time the sequence of speech turns (see the
Example section).

Since the timing of the speaking turns is specified by the user, the generated
tracks may not be suitable for testing scenarios in which there is
unpredictable network delay (e.g., end-to-end RTC assessment).

Instead, the generated pairs can be used when the delay is constant (obviously
including the case in which there is no delay).
For instance, echo cancellation in the APM module can be evaluated using
two-end audio tracks as input and reverse input.

By indicating negative and positive time offsets, one can reproduce cross-talk
and silence in the conversation.

IMPORTANT: **the whole code has not been landed yet.**

### Example

For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A)
and b1, b2 (speaker B).
The text file with the timing information may look like this:

```
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
```

The first column indicates the speaker name, the second contains the audio
track file names, and the third the offsets (in milliseconds) used to
concatenate the chunks.
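
To make the file format concrete, here is a minimal parsing sketch. The
`Turn` record and the function name are illustrative assumptions, not the
tool's actual API:

```
import collections

# Hypothetical record for one line of the timing file:
# <speaker name> <audio track file name> <offset in ms>.
Turn = collections.namedtuple('Turn', ['speaker', 'audio_track', 'offset_ms'])

def parse_timing_file(path):
  """Parses the timing file into a list of Turn instances (sketch)."""
  turns = []
  with open(path) as f:
    for line in f:
      line = line.strip()
      if not line:
        continue  # Skip empty lines.
      speaker, track, offset = line.split()
      turns.append(Turn(speaker, track, int(offset)))
  return turns
```
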
Assume that all the audio tracks in the example above are 1000 ms long.
The tool will then generate two tracks (A and B) that look like this:

```
Track A:
  a1 (1000 ms)
  silence (1100 ms)
  a2 (1000 ms)
  silence (800 ms)
  a3 (1000 ms)
  a4 (1000 ms)
```

```
Track B:
  silence (1000 ms)
  b1 (1000 ms)
  silence (900 ms)
  b2 (1000 ms)
  silence (2000 ms)
```
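
The offset arithmetic behind these tracks can be summarized as: each turn
starts where the previous turn ends, shifted by its offset, so a positive
offset adds silence and a negative offset produces cross-talk. A sketch of
this scheduling step, building on the hypothetical `Turn` record above (not
the tool's actual implementation):

```
def schedule_turns(turns, durations_ms):
  """Returns (speaker, track, start_ms, end_ms) for each turn (sketch)."""
  schedule = []
  cursor_ms = 0  # End time of the previous turn.
  for turn in turns:
    # A negative offset starts the turn before the previous one ends.
    start_ms = cursor_ms + turn.offset_ms
    end_ms = start_ms + durations_ms[turn.audio_track]
    schedule.append((turn.speaker, turn.audio_track, start_ms, end_ms))
    cursor_ms = end_ms
  return schedule

# With the example above and 1000 ms long tracks:
#   a1: 0-1000, b1: 1000-2000, a2: 2100-3100,
#   b2: 2900-3900 (200 ms of cross-talk with a2),
#   a3: 3900-4900, a4: 4900-5900.
```
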
The two tracks can also be visualized as follows (one character represents
100 ms, "." is silence and "*" is speech):

```
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
```
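
A small companion sketch (again hypothetical) showing how this kind of ASCII
view could be rendered from the schedule computed above, one character per
100 ms:

```
def render_ascii(schedule, total_ms):
  """Renders one '.'/'*' line per speaker; one character is 100 ms."""
  lines = []
  for speaker in sorted({entry[0] for entry in schedule}):
    chars = ['.'] * (total_ms // 100)  # Default to silence.
    for spk, _track, start_ms, end_ms in schedule:
      if spk == speaker:
        for i in range(start_ms // 100, end_ms // 100):
          chars[i] = '*'  # Mark 100 ms of speech.
    lines.append('{}: {}'.format(speaker, ''.join(chars)))
  return '\n'.join(lines)
```
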