Developers: Voice Control Agent

written by: 0beron

Nice work! Amazing result after such a short time. Now I'm going to have to seriously think about getting the microphone for the Alpha...

written by: jsn

Thu, 26 Jan 2012 17:41:53 +0000 GMT

So JohnL threw out the "I'd love to be able to talk to the Eigenharp" idea at the DevCon, and I was somewhat skeptical, having done stuff with voice-control systems and found it extremely frustrating. Having said that, I thought I'd give it a go, and got a fairly plausible non-Agent-based prototype listening and spitting out Belcanto.

However, there are a number of things to discuss about the best way of doing it before I endeavor to make it a real agent:
- I don't have an Alpha with a microphone, only a Tau & Pico, so I can't test it effectively. Even so, I would suggest (on principle) that the Agent should be available to Tau/Pico players too. Is there an Agent that will take the microphone buffer from the host computer and stream it like other audio?
- If it's an Agent, can it send the Belcanto phrase straight to the interpreter via a connection (i.e. does it have an output port)? If so, what's the data structure? Can someone point me in the right direction?
- There need to be at least two other outputs: a recognition status (did it understand?) and a success/fail (did the phrase succeed?). Are these just statusdata_t again?

Started a wiki page for the spec


written by: geert

Thu, 26 Jan 2012 18:03:25 +0000 GMT

Hi John,

If the agent just has an audio input, you can use the audio agent to get audio into EigenD (it supports both in and out now). This could allow you to use your Mac microphone and wire it into the agent, just as you would wire the Alpha microphone into it. The audio agent allows you to create channels that it streams audio from; if those channels can be found on your actual audio device, you'll get the audio.

How to send the Belcanto, how to determine the context, etc. is probably going to take quite a bit of discussion. I'm sure that John L has some good ideas about that.

The result of the operation is probably something that you don't want to wrap in a statusdata_t, as that's tailored towards indicators on a 2D plane. I suspect this will tie into the better error reporting we're going to add to executing Belcanto phrases and using Workbench. That's still under discussion here, though; we'll keep you posted.

Take care,

Geert


written by: jsn

Thu, 26 Jan 2012 18:09:39 +0000 GMT

Aha! Hadn't spotted that audio did both.

I have no problem with statusdata_t using row 0 to indicate a linear (or enum) range of status values. In fact, I can see it being used to drive either a light or a keypress for a tone generator.

What is the current best way to send out a signal that can be handled by another agent to create a light or tone output?


written by: jsn

Thu, 26 Jan 2012 18:34:30 +0000 GMT

Can't seem to get that working. Can you send me a wiring diagram of a working version (or a script)?


written by: stbohne

Thu, 26 Jan 2012 18:35:09 +0000 GMT

The Eigenharp should answer in Belcanto ... just a thought.


written by: john

Thu, 26 Jan 2012 19:20:28 +0000 GMT

Stefan - Yes, we think so too. We'd actually like to get error feedback happening from Agents using Belcanto. This could then be either spoken (via speech synthesis) to the musician in their in-ear monitoring, or played as the note sequences (which are fairly easily learned) in the same way. There's no particular need for that feedback to be syntactically correct so the lexicon could be used pretty freely.

John - Belcanto can be input directly into the Interpreter as notes on its keygroup input, which would be the neat way to do it from a speech Agent; one could do all sorts of fun learning-type things then as well. It would also make the data stream routable via the normal mechanisms in Workbench.

John


written by: mikemilton

Thu, 26 Jan 2012 19:44:15 +0000 GMT

as long as it does not speak G, A, F, (oct down) F, C

;-p - pffft


written by: jsn

Fri, 27 Jan 2012 09:39:56 +0000 GMT

Agent sending out Belcanto phrase as note sequence. Nice! Didn't think of that - cheers Stefan/John

(have changed spec on wiki)


written by: jsn

Fri, 27 Jan 2012 10:57:56 +0000 GMT

I am perilously close to doing this. It's the audio stream I'm struggling with. Above Geert says:
"...you can use the audio agent to get audio into EigenD (it supports both in and out now). This could allow you to use your Mac microphone and wire it into the agent, just as you would wire the Alpha microphone into it. The audio agent allows you to create channels that it streams audio from, if those channels can be found on your actual audio device, you'll get the audio."

So how do I do this? Tried adding outputs - nothing. (used VU meter agent to check levels - haha!)

Can't find any example/comments/code.


written by: geert

Fri, 27 Jan 2012 11:43:57 +0000 GMT

Hi John,

I forgot that Apple split the built-in input and output into separate devices. A way to get this to work is to use the Audio MIDI Setup application to create an aggregate device in the 'Audio Devices' section.

Here are some screenshots that should make this easier to set up.

http://eigenzone.org/eigend_aggregate_1.png
http://eigenzone.org/eigend_aggregate_2.png
http://eigenzone.org/eigend_aggregate_3.png

Hope this helps,

Geert


written by: jsn

Fri, 27 Jan 2012 12:00:56 +0000 GMT

Perfect - works as expected.

NOTE: I was trying to use the 'audio 1' unit on the Workbench for setting the audio port, etc. and it wasn't working. Had to use the 'Window > Audio Settings' from the EigenD app itself.


written by: jsn

Fri, 27 Jan 2012 12:39:15 +0000 GMT

Added a 'HowTo Tip' to the 2.0 Documentation wiki on this technique for audio capture.


written by: jsn

Fri, 27 Jan 2012 19:25:15 +0000 GMT

What is the best technique for a start/stop behaviour?

- Implement start/stop verbs and have a talker connected to a button? If so, doesn't the talker have to change its own script once pressed to create toggle-like behaviour? (Or should one implement 'toggle' as a verb?)

- Use a pressure input (say) with a threshold: over = on, under = off. (It seems wrong to use a continuous signal for a binary operation, but I quite like the possible mad uses of this... yaw any key to speak a command.)
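The two options above can be compared with a small, framework-independent sketch. Neither class is part of EigenD: EdgeToggle, ThresholdGate and the 0.6/0.4 levels are hypothetical, assuming the button state arrives as a bool and the pressure as a float in 0..1. The threshold version uses two levels (hysteresis) so that a pressure hovering near a single threshold does not flicker the recogniser on and off.

```python
class EdgeToggle:
    """Flip between on and off each time a button goes from released to pressed."""

    def __init__(self):
        self.on = False
        self._was_pressed = False

    def update(self, pressed: bool) -> bool:
        # Only a rising edge (released -> pressed) flips the state,
        # so holding the button down does not retrigger it.
        if pressed and not self._was_pressed:
            self.on = not self.on
        self._was_pressed = pressed
        return self.on


class ThresholdGate:
    """Map a continuous signal (e.g. key pressure in 0..1) to on/off with hysteresis."""

    def __init__(self, on_level: float = 0.6, off_level: float = 0.4):
        if off_level >= on_level:
            raise ValueError("off_level must be below on_level")
        self.on_level = on_level
        self.off_level = off_level
        self.on = False

    def update(self, value: float) -> bool:
        # Switch on above on_level, off below off_level; in between, keep the
        # previous state so a signal near one level does not chatter.
        if not self.on and value >= self.on_level:
            self.on = True
        elif self.on and value <= self.off_level:
            self.on = False
        return self.on
```

With the toggle, the key only has to be tapped; with the gate, speaking lasts as long as the key is held down firmly enough, which is closer to a push-to-talk feel.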



written by: jim

Fri, 27 Jan 2012 19:21:13 +0000 GMT

We tend to use the 'toggle start' idiom for that.

I'll dig out an example from the code next week. You'll want it fast from end to end to ensure a snappy response to the 'talk' key.


written by: jsn

Fri, 27 Jan 2012 19:25:27 +0000 GMT

Cool.

And how do I trigger something in the slow thread from the fast thread? I don't want to do the voice analysis in the fast thread and spoil the audio, so what's the best way of triggering work in the slow thread?


written by: jim

Fri, 27 Jan 2012 19:26:21 +0000 GMT

And I would have thought a helper thread (not the main slow thread) would be the best place for the actual analysis. You don't want to hang up the slow thread for too long either.
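The general shape of that hand-off can be sketched with the Python standard library: the fast (audio) thread pushes captured chunks into a bounded queue without ever waiting, and a helper thread does the slow recognition. This does not use EigenD's own worker classes; RecognitionWorker, push_from_audio_thread and the analyse callback are hypothetical names for illustration.

```python
import queue
import threading


class RecognitionWorker:
    """Runs slow analysis on a helper thread, fed without waiting by the audio thread."""

    def __init__(self, analyse, max_chunks=64):
        self._analyse = analyse                      # hypothetical recogniser callback
        self._chunks = queue.Queue(maxsize=max_chunks)
        self.dropped = 0                             # chunks lost because the queue was full
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def push_from_audio_thread(self, chunk):
        # Called from the fast thread: never wait; drop the chunk if the helper is behind.
        try:
            self._chunks.put_nowait(chunk)
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            chunk = self._chunks.get()               # waiting is fine on the helper thread
            if chunk is None:                        # sentinel: shut down
                break
            self._analyse(chunk)

    def stop(self):
        # Called from the slow thread: queue the sentinel and wait for the helper to finish.
        self._chunks.put(None)
        self._thread.join()
```

Python's queue.Queue still takes a short internal lock, so in a real C++ audio thread one would normally use a lock-free ring buffer instead; the sketch only illustrates the structure (bounded buffer, drop rather than block, analysis off the audio path).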


written by: jim

Fri, 27 Jan 2012 19:29:32 +0000 GMT

We have the safe_worker_t class for auxiliary processing.


written by: jim

Fri, 27 Jan 2012 20:35:54 +0000 GMT

Oh, and also, it's possible for talker actions to control the talker light directly. An example is how the metronome toggle start can flash the light on the beat.

So having the light go from standby to working to succeeded/failed is perfectly doable.
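The standby, working, succeeded/failed sequence is a small state machine. The sketch below is a hypothetical, framework-independent illustration: LightState and TalkerLight are invented names, the WORKING to STANDBY transition (for a cancelled command) is an assumption, and how the state actually reaches the talker light in EigenD is not shown.

```python
from enum import Enum


class LightState(Enum):
    STANDBY = "standby"
    WORKING = "working"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions for the talker light during one voice command.
# WORKING -> STANDBY is an assumed "command cancelled" path.
_TRANSITIONS = {
    LightState.STANDBY: {LightState.WORKING},
    LightState.WORKING: {LightState.SUCCEEDED, LightState.FAILED, LightState.STANDBY},
    LightState.SUCCEEDED: {LightState.STANDBY, LightState.WORKING},
    LightState.FAILED: {LightState.STANDBY, LightState.WORKING},
}


class TalkerLight:
    def __init__(self):
        self.state = LightState.STANDBY

    def move_to(self, new_state):
        # Reject transitions that make no sense, e.g. FAILED straight to SUCCEEDED.
        if new_state not in _TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        return self.state
```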

I'm less sure about feeding the interpreter via notes. If we want a success/fail from the interpreter to be reflected in the light, it might be easier just to RPC the interpreter.


written by: jim

Fri, 27 Jan 2012 20:41:03 +0000 GMT

And finally for tonight, the unfortunately named 'thing' class will let your aux worker thread signal back to the slow or fast threads.
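The general pattern of a helper thread signalling back is a mailbox that the helper writes to and the owning thread polls. This sketch does not use or reproduce the EigenD 'thing' class; ResultMailbox, post and drain are hypothetical names, and it assumes the slow thread has some periodic callback from which drain() can be called.

```python
import queue


class ResultMailbox:
    """Lets a helper thread post results that the owning thread collects later."""

    def __init__(self):
        self._results = queue.SimpleQueue()

    def post(self, result):
        # Called on the helper thread, e.g. with the recognised phrase or an error.
        self._results.put(result)

    def drain(self):
        # Called periodically on the slow thread; returns everything posted so far
        # and never waits if nothing is there.
        items = []
        while True:
            try:
                items.append(self._results.get_nowait())
            except queue.Empty:
                return items
```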


written by: john

Fri, 27 Jan 2012 21:21:29 +0000 GMT

Hi John

It occurs to me that momentarily diverting an audio signal from one path to another could usefully be done by another, independent agent. It'd be a really simple agent, with one audio input, two audio outputs and a control input, but I think it would have wider uses than just diverting the mic signal for your agent, and it might be worth implementing it separately for that reason rather than building something like it into a voice recogniser. One could also use it to talk to the mix engineer or another band member, for example. I'm sure there are a multitude of possible uses if you did it that way.
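A minimal sketch of that diverter's processing, assuming audio arrives in blocks of float samples and the control input is a boolean per block. The class name, the ramp length and the linear crossfade are hypothetical choices, not an existing EigenD agent; the short ramp avoids the click that a hard switch between outputs would produce.

```python
class AudioDiverter:
    """One audio input, two audio outputs, one control input.

    The control value picks the output; the gain moves towards its target by a
    fixed step per sample, so switching is a short crossfade instead of a click.
    """

    def __init__(self, ramp_samples=64):
        self._step = 1.0 / ramp_samples
        self._gain2 = 0.0            # 0.0 = all to output 1, 1.0 = all to output 2

    def process(self, block, to_second):
        target = 1.0 if to_second else 0.0
        out1, out2 = [], []
        for sample in block:
            # Move the gain one step towards the target, without overshooting.
            if self._gain2 < target:
                self._gain2 = min(target, self._gain2 + self._step)
            elif self._gain2 > target:
                self._gain2 = max(target, self._gain2 - self._step)
            out1.append(sample * (1.0 - self._gain2))
            out2.append(sample * self._gain2)
        return out1, out2
```

Because the two gains always sum to 1, the input is never lost or doubled while switching; it is just handed over from one output to the other over ramp_samples samples.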

John


