I’ve finally spent some time on implementing CoreAudio taps in SoundPusher 1.5. These let you listen in on (“tap”) the output that is generated by certain processes or sent to certain devices.
So far, SoundPusher has done this with a loopback device that reflects any output it receives back into an input stream that looks to the system like a (6-channel) microphone. This has required increasingly more permissions from the system to access microphone input (since the OS doesn’t know what the SoundPusher Audio
device records), and results in a big orange blob in the menu bar to warn the user that they’re being recorded.
Apple has started to provide APIs for recording system audio through audio “taps”, and if they’re good enough for Rogue Amoeba than maybe they’re good enough for me as well.
Unfortunately, Apple’s developer documentation is terrible as usual even though the actual C headers contain some more information. So this is mostly what I’ve figured out.
Taps work by attaching them to an aggregate device using the kAudioAggregateDeviceTapListKey
in the dictionary provided to AudioHardwareCreateAggregateDevice()
. See https://github.com/q-p/SoundPusher/blob/v1.5.1/SoundPusher/AudioTap.mm#L52 for an example.
The aggregate device with the tap (and nothing else — initially I put the device I was tapping into the aggregate device as well but that only confused matters) will then provide an input stream that contains the tapped data. This is similar to reading from a (virtual) microphone, so I didn’t have to change the rest of the code much. Starting to read from such a tap requires a TCC / privacy authorization and results in a less obnoxious purple dot compared to the orange microphone blob. There doesn’t seem to be a way to query this authorization status as is possible for the microphone access using [AVCaptureDevice authorizationStatusForMediaType:AVMediaTypeAudio]
, so if the user denies this request all you will get is silence (and you cannot inform them about the authorization status since you don’t know).
Do note that if the device you’re tapping also contains input streams (like my SoundPusher Audio
loopback device did), then you will in addition still need microphone access to tap that device. I have tried all sorts of ways to disable the input streams to avoid the microphone permissions when tapping a device with inputs, but no such luck. That’s why I have removed all input (and therefore loopback) support from SoundPusher Audio
.
Why is this dummy device then needed at all? Because I need 5.1 channels of input data, and unless the user has a real 6 channel audio device, there won’t be anything good to tap. The system can provide a mono or stereo mix-down when tapping, but if you want something more specific than that, then you need a real (fake) device to prescribe the shape and format you want tapped…