Smart Speaker – Embedded Microphone Test Sequence
A test sequence to measure the microphone array performance in a smart speaker.

Final display for Smart Speaker – Embedded Microphone Sequence
This sequence demonstrates how SoundCheck can measure the performance of a microphone embedded in a so-called “smart speaker”. The example assumes that the DUT is an Amazon Echo, but it can be adapted to virtually any other smart speaker by replacing the Echo’s voice activation phrase WAV file (“Alexa”) with one specific to the desired make and model.
The sequence begins by playing a voice activation phrase out of a source speaker, prompting the DUT to record both the voice command and the stepped sine sweep stimulus that follows it. A message step then prompts the operator to retrieve this recording from the DUT’s cloud storage system. To do this, the operator plays the recording back from the cloud and a Triggered Record step in the SoundCheck test sequence captures it. The Recorded Time Waveform is then windowed (to remove the voice command) and frequency-shifted prior to analysis, and the result (Frequency Response) is shown on the final display step.
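The windowing step described above can be pictured as a simple time gate on the recorded waveform. The following is a minimal Python sketch of that idea, not SoundCheck's implementation: the function name `gate_waveform` and its parameters are hypothetical, and the gate start and length are assumed to be known from the duration of the activation phrase and of the sweep.

```python
def gate_waveform(samples, sample_rate, start_s, duration_s):
    """Keep only the part of a recording that lies inside a time gate.

    samples     -- sequence of audio samples (hypothetical input)
    sample_rate -- samples per second
    start_s     -- gate start in seconds (assumed: end of the voice command)
    duration_s  -- gate length in seconds (assumed: length of the sweep)
    """
    if start_s < 0 or duration_s < 0:
        raise ValueError("start_s and duration_s must be non-negative")
    start = int(round(start_s * sample_rate))
    stop = start + int(round(duration_s * sample_rate))
    # Slicing clips silently if the gate runs past the end of the recording.
    return samples[start:stop]


# Toy example: 10 samples at 2 Hz, gate from 1.5 s lasting 2 s.
gated = gate_waveform(list(range(10)), sample_rate=2, start_s=1.5, duration_s=2.0)
```

A rectangular gate like this only discards the voice command; it does not perform the frequency shift, which is a separate post-processing step in the sequence.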
We recommend reading our AES paper on this subject prior to continuing as it contains additional details on the test methods devised for this sequence.
Hardware
- Listen AudioConnect audio interface part #4050 or similar
- Listen SCM-3 reference microphone part #4002 or similar
- Calibrated mouth simulator or source speaker
- Listen SCAmp audio test amplifier part #4060 or similar (not needed if mouth/speaker is self-powered)
Software
- SoundCheck Basic or above, version 15.0 or later
- Post Processing module part #2004
- Virtual Audio Cable (setup instructions below)
Setup & Calibration
- Calibrate the mouth or source speaker per the instructions in the SoundCheck manual
- Set up the hardware per the system diagram below
Virtual Audio Cable Setup
Virtual Audio Cable (VAC) is a third-party software application which is essentially a WDM driver that allows you to route audio streams between applications or devices. Since Windows recognizes it as a WDM device, it is easy to configure in SoundCheck. The default VAC settings (single cable) are sufficient for running this sequence. Note: VAC is not available for macOS.
Once you have installed VAC on your SoundCheck computer, take the following steps to configure it:
- Windows: go to Control Panel > Sound and make VAC the default Playback and Recording device.
- SoundCheck – Hardware: go to Setup > Hardware. Add an Input and an Output WDM Channel, assign each to VAC, and configure them as follows:
  - Select Ch: Mono
  - Vp: 1
  - A/D: Digital
  - Sampling rate: same as your Source Speaker’s Output Channel
  - Bit depth: 16 bit (minimum)
  - Input Latency: 0
- SoundCheck – Calibration: When you first open this example sequence, you will likely see a Relink Dialog informing you that the Input Signal Path “VAC In 1” is not present on your system. In the dialog’s Input relink table, under System Signal Path to Use, open the dropdown menu and select Add to System Calibration. Click OK, then click Yes when asked whether to import the calibrated device files associated with this new signal path. The Calibration table then opens; assign the Hardware Input channel you created for VAC to VAC In 1 and confirm that the Unity Digital In (AES17) calibrated device file is assigned to this Signal Path. Note that VAC is only used as an input in this example sequence.
Relink Dialog – Add VAC In 1 to System Calibration
You are ready to start the sequence.
Notes:
- Step 1, WAV – The default activation phrase is for Amazon Echo. If testing another device, replace this WAV file with one containing the proper command phrase for the make/model of the desired DUT. The WAV sample rate is 44.1 kHz, so you will need to resample it if your Output Channel sample rate is different.
- Step 5, Triggered Record – The default trigger threshold is 100m FS. It may need to be adjusted so that the step triggers properly from the playback level of the DUT recording. If the trigger threshold is not exceeded within 60 seconds, the step times out and the sequence stops running.
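The two notes above can be illustrated with a short Python sketch. It is not SoundCheck code: the helper names `resample_ratio` and `find_trigger` are hypothetical, and the threshold default assumes that 100m FS means 0.1 of digital full scale (about -20 dBFS). The first helper reduces the ratio between the WAV file's rate and the Output Channel rate to integer up/down factors; the second mimics a level trigger with a timeout on an already captured buffer.

```python
from fractions import Fraction


def resample_ratio(src_rate, dst_rate):
    """Return integer (up, down) factors that convert src_rate to dst_rate."""
    ratio = Fraction(dst_rate, src_rate)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


def find_trigger(samples, sample_rate, threshold_fs=0.1, timeout_s=60.0):
    """Return the index of the first sample whose magnitude exceeds the
    threshold, or None if that does not happen within timeout_s seconds.

    threshold_fs -- fraction of full scale (assumed: 100m FS == 0.1)
    """
    limit = min(len(samples), int(timeout_s * sample_rate))
    for i in range(limit):
        if abs(samples[i]) > threshold_fs:
            return i
    return None  # timed out (or ran out of samples) without triggering


# 44.1 kHz activation phrase played through a 48 kHz Output Channel.
up, down = resample_ratio(44100, 48000)

# Toy buffer: the third sample is the first one above 0.1 FS.
trigger_index = find_trigger([0.0, 0.05, 0.2, 0.5], sample_rate=4)
```

Integer factors of this kind are what a polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, up, down)` expects. If the trigger function returns `None` for a real recording, the playback level is below the threshold and the threshold would need to be lowered.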
References:
- Glenn Hess et al., “Challenges of IoT Smart Speaker Testing”, presented at the AES 143rd Convention, October 2017.