DoubleTalk data and location specification
Overview
EMATviewer uses the HDF5 EMA position data files and the WAV audio files created from the data gathered by the DoubleTalk project. Each HDF5 position file and its associated audio file contain the data recorded during one section of an experiment, such as the subject reading a passage of text.
The directory structure for the HDF5 and audio files should mirror that of the DoubleTalk repository available on DataStore in the folder PlanArt/DoubleTalk. The data can be copied to a local folder to speed up access. Only the data in the ‘ema’ and ‘audio’ directories is required, so the other data files, in directories whose names contain ‘chunk’ or ‘textgrid’, do not need to be copied.
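A minimal Python sketch of making such a local copy is shown here. It assumes the DataStore folder PlanArt/DoubleTalk is reachable as a local path and that each recording folder (for example R0033_cs5) sits directly under it. The function name `copy_required_data` is hypothetical. Because it copies only the ‘ema’ and ‘audio’ subfolders, any ‘chunk’ or ‘textgrid’ folders are skipped automatically.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Only these subfolders of each recording folder are needed by EMATviewer.
REQUIRED_SUBDIRS = ("ema", "audio")


def copy_required_data(source_root: Path, dest_root: Path) -> list[Path]:
    """Copy the 'ema' and 'audio' folders of every recording folder found
    directly under source_root into dest_root, keeping the same layout.

    Hypothetical helper; returns the destination folders that were copied.
    """
    copied = []
    for recording_dir in sorted(p for p in source_root.iterdir() if p.is_dir()):
        for name in REQUIRED_SUBDIRS:
            src = recording_dir / name
            if src.is_dir():
                dst = dest_root / recording_dir.name / name
                # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run
                # to refresh an existing local mirror.
                shutil.copytree(src, dst, dirs_exist_ok=True)
                copied.append(dst)
    return copied
```

Copying whole subfolders, rather than filtering individual files, keeps the local copy identical in layout to the repository, which is what EMATviewer expects.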
Directory Structure
The HDF5 and audio files are arranged in a directory structure similar to these examples:
\R0033_cs5\ema
\R0033_cs5\audio
Here the parent directory name is composed of “R00” + Recording number + “_” + Speaker. For example, in R0033_cs5 the “Recording number” is 33 and the “Speaker” is cs5.
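The naming rule can be sketched in Python as follows. The function names are hypothetical, and the rule “R00” + Recording number is applied literally, which assumes two-digit recording numbers such as 33.

```python
from pathlib import Path


def recording_folder_name(recording_number: int, speaker: str) -> str:
    """Build a parent directory name such as 'R0033_cs5'.

    Applies 'R00' + Recording number + '_' + Speaker literally, so it
    assumes two-digit recording numbers.
    """
    return f"R00{recording_number}_{speaker}"


def data_dirs(root: str, recording_number: int, speaker: str) -> tuple:
    """Return the (ema, audio) directories for one recording under root."""
    folder = Path(root) / recording_folder_name(recording_number, speaker)
    return folder / "ema", folder / "audio"
```

For example, `data_dirs("DoubleTalk", 33, "cs5")` gives the paths DoubleTalk\R0033_cs5\ema and DoubleTalk\R0033_cs5\audio (shown with forward slashes on non-Windows systems).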
EMA datafile name
The filenames of the HDF5 EMA data files have a structure similar to this example:
R0033_cs5_jaw_0026.h5
Here the filename is composed of ‘R00’ + Recording number + ‘_’ + Speaker + ‘_’ + Sensor + ‘_’ + File Number + ‘.h5’. For example, in R0033_cs5_jaw_0026.h5 the “Recording number” is 33, the “Speaker” is cs5, the “Sensor” is jaw and the “File Number” is 26 (written as 0026).
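A minimal sketch of splitting an EMA filename back into its parts with a regular expression. The names `parse_ema_filename` and `EmaFileInfo` are hypothetical. The pattern assumes the speaker contains no underscore; the sensor part is matched greedily, so it does not assume sensor names are free of underscores.

```python
import re
from typing import NamedTuple

# 'R00' + Recording number + '_' + Speaker + '_' + Sensor + '_' + File Number + '.h5'
# The greedy sensor group backtracks to the last '_<digits>.h5' in the name.
EMA_NAME = re.compile(
    r"^R00(?P<rec>\d+)_(?P<speaker>[^_]+)_(?P<sensor>.+)_(?P<file>\d+)\.h5$"
)


class EmaFileInfo(NamedTuple):
    recording_number: int
    speaker: str
    sensor: str
    file_number: int


def parse_ema_filename(name: str) -> EmaFileInfo:
    """Parse a name such as 'R0033_cs5_jaw_0026.h5' (hypothetical helper)."""
    m = EMA_NAME.match(name)
    if m is None:
        raise ValueError(f"not a DoubleTalk EMA file name: {name!r}")
    return EmaFileInfo(int(m["rec"]), m["speaker"], m["sensor"], int(m["file"]))
```

Converting the numeric parts with `int` turns the zero-padded ‘0026’ into the File Number 26 used elsewhere in this specification.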
Audio file name
The audio filenames have the same structure as the EMA data filenames, but without the sensor name:
R0033_cs5_0026.wav
Here the filename is composed of ‘R00’ + Recording number + ‘_’ + Speaker + ‘_’ + File Number + ‘.wav’. For example, in R0033_cs5_0026.wav the “Recording number” is 33, the “Speaker” is cs5 and the “File Number” is 26 (written as 0026).
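Since the EMA and audio names share everything except the sensor, the two files for one item can be located from the same values. The following is a sketch under two assumptions taken from the examples: the File Number is zero-padded to four digits, and “R00” + Recording number is applied literally (two-digit recording numbers). All function names are hypothetical.

```python
from pathlib import Path


def ema_filename(recording_number: int, speaker: str, sensor: str, file_number: int) -> str:
    """'R00' + Recording number + '_' + Speaker + '_' + Sensor + '_' + File Number + '.h5'."""
    return f"R00{recording_number}_{speaker}_{sensor}_{file_number:04d}.h5"


def audio_filename(recording_number: int, speaker: str, file_number: int) -> str:
    """'R00' + Recording number + '_' + Speaker + '_' + File Number + '.wav'."""
    return f"R00{recording_number}_{speaker}_{file_number:04d}.wav"


def item_paths(root: str, recording_number: int, speaker: str, sensor: str, file_number: int):
    """Return (ema_path, audio_path) for one sensor file and its audio file."""
    folder = Path(root) / f"R00{recording_number}_{speaker}"
    ema = folder / "ema" / ema_filename(recording_number, speaker, sensor, file_number)
    wav = folder / "audio" / audio_filename(recording_number, speaker, file_number)
    return ema, wav
```

For example, `item_paths("DoubleTalk", 33, "cs5", "jaw", 26)` yields R0033_cs5\ema\R0033_cs5_jaw_0026.h5 and R0033_cs5\audio\R0033_cs5_0026.wav under DoubleTalk.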
Columns in the data spreadsheet
These are the required and optional columns used in the spreadsheet for DoubleTalk data. The columns can be arranged in any order within the spreadsheet. Other columns may also be present; an extra column is ignored unless its name is used for outputting data.
For an example of the required and optional columns, see DoubleTalk-example-data-spreadsheet.xlsx.
Required Columns
| Column Name | Description | Type |
|---|---|---|
| RecNo | Recording number of the experiment, e.g. for experiment R0033 the recording number is 33 | number |
| Speaker | The subject speaking in the experiment, either cs5 or cs6. Only the last 3 characters are used, e.g. ‘2cs5’ is taken as ‘cs5’ | text |
| File Number | The file number within the given experiment | number |
| Beg | The time in seconds at which the item starts within the data file | number |
| End | The time in seconds at which the item ends within the data file | number |
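A sketch of reading the required columns in Python. It assumes the spreadsheet has been exported to CSV (reading the .xlsx file directly would need a third-party library, which is not shown here). `read_items` is a hypothetical name. Columns are looked up by header name, so their order does not matter, and extra columns are kept as text in each item.

```python
import csv
from typing import Dict, Iterable, List

REQUIRED_COLUMNS = ("RecNo", "Speaker", "File Number", "Beg", "End")


def read_items(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read items from CSV text exported from the DoubleTalk spreadsheet."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    items = []
    for row in reader:
        item: Dict[str, object] = dict(row)  # extra columns stay as text
        # int(float(...)) also accepts values exported as '33.0'.
        item["RecNo"] = int(float(row["RecNo"]))
        item["File Number"] = int(float(row["File Number"]))
        # Only the last 3 characters of Speaker are used, e.g. '2cs5' -> 'cs5'.
        item["Speaker"] = row["Speaker"].strip()[-3:]
        item["Beg"] = float(row["Beg"])
        item["End"] = float(row["End"])
        items.append(item)
    return items
```

With `open("items.csv", newline="")` as the input, each returned item carries the recording number, speaker and file number needed to locate its EMA and audio files.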
Optional Columns
| Column Name | Description | Type |
|---|---|---|
| A_Offset_Time | The time in seconds at which the first movement, ‘A’, ends within the data file. If this column is omitted, the items are assumed to have only one movement | number |
| B_Onset_Time | The time in seconds at which the second movement, ‘B’, starts within the data file. This can be left blank if the B movement starts immediately after, or overlaps with, the preceding A movement. If this column is omitted, the items are assumed to have only one movement | number |
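One way to turn these columns into movement intervals is sketched below; `movement_intervals` is a hypothetical name and the input is a row dictionary such as one produced by `csv.DictReader`. Two behaviours are assumptions rather than part of this specification: a blank A_Offset_Time is treated like an omitted column, and a blank B_Onset_Time is taken to mean B starts exactly where A ends. An overlapping B cannot be located from a blank cell, so in that case the result is only an approximation.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]


def _time(row: Dict[str, str], column: str) -> Optional[float]:
    """Return the column value in seconds, or None if it is blank."""
    value = row.get(column)
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def movement_intervals(row: Dict[str, str]) -> List[Interval]:
    """Return [(beg, end)] for one movement, or [A, B] for two movements."""
    beg, end = float(row["Beg"]), float(row["End"])
    # Either optional column omitted from the sheet: one movement.
    if "A_Offset_Time" not in row or "B_Onset_Time" not in row:
        return [(beg, end)]
    a_offset = _time(row, "A_Offset_Time")
    if a_offset is None:
        # Assumption: a blank A_Offset_Time also means one movement.
        return [(beg, end)]
    b_onset = _time(row, "B_Onset_Time")
    if b_onset is None:
        # Assumption: blank B_Onset_Time means B starts where A ends.
        b_onset = a_offset
    return [(beg, a_offset), (b_onset, end)]
```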
HDF5 EMA data file format
Each HDF5 EMA data file contains the Carstens x, y and z axis position data gathered for one sensor. The HDF5 container has a single dataset, ‘/position’, which is an array of floating-point numbers containing the position data for the 3 axes.
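A minimal sketch of loading one sensor file with the h5py and NumPy libraries (both third-party packages). The function names are hypothetical. This specification does not state whether the ‘/position’ array is stored as samples × 3 or 3 × samples, so `split_axes` checks the shape and, as an assumption, prefers the samples × 3 layout when both dimensions are 3.

```python
import h5py
import numpy as np


def read_positions(path: str) -> np.ndarray:
    """Return the '/position' dataset of one EMA file as a float array."""
    with h5py.File(path, "r") as f:
        # [()] reads the whole dataset into memory before the file closes.
        return np.asarray(f["/position"][()], dtype=float)


def split_axes(positions: np.ndarray):
    """Split positions into x, y and z arrays.

    Assumption: the array is 2-D with one dimension of length 3.
    """
    if positions.ndim != 2 or 3 not in positions.shape:
        raise ValueError(f"unexpected position array shape {positions.shape}")
    if positions.shape[1] == 3:  # samples x 3
        return positions[:, 0], positions[:, 1], positions[:, 2]
    return positions[0], positions[1], positions[2]  # 3 x samples
```

For example, `x, y, z = split_axes(read_positions(r"R0033_cs5\ema\R0033_cs5_jaw_0026.h5"))` gives the jaw sensor trajectory for file 26 of recording 33.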