Developers: Voice Control Agent

written by: jsn

Sat, 28 Jan 2012 00:19:48 +0000 GMT

I think that is a good idea. There is a question of design philosophy: shouldn't it have 1 input and N outputs, taking a continuous signal as the control input with a threshold band per output (e.g. 0-0.1 is output 1, 0.1-0.5 is output 2, etc.)? This way creativity can ensue but the norm is accommodated (it is simple to do 2 outputs on the pressure signal from a key, but you could also switch outputs on the RMS power of an audio signal).
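To make the banding idea concrete, here is a rough Python sketch. It is not EigenD code: `select_output`, `rms` and the threshold list are names made up for illustration, and the band edges are the ones from the example above.

```python
import math
from bisect import bisect_right


def select_output(value, thresholds):
    """Return the 0-based index of the band that contains value.

    thresholds is an ascending list of band edges (hypothetical layout):
    [0.1, 0.5] gives three bands, below 0.1, 0.1 up to 0.5, and 0.5 upward.
    """
    return bisect_right(thresholds, value)


def rms(samples):
    """Root-mean-square level of one block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    edges = [0.1, 0.5]
    # A key pressure of 0.3 falls in the second band (index 1).
    print(select_output(0.3, edges))
    # The same selector driven by the RMS level of an audio block.
    print(select_output(rms([0.5, -0.5, 0.5, -0.5]), edges))
```

The same selector works for any continuous control; only the source of the value changes.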

I need a simple start/stop marker to do the simplest thing for phrase marking, so I'll do that then see how effective I can make continuous listening.


written by: john

Sat, 28 Jan 2012 15:27:47 +0000 GMT

We have a concept of a 'key activation' in Eigenland and I would think that this would be the best thing to use. It's binary on/off (though I think, and perhaps Jim or Geert could enlighten me on this, that it also has a soft/hard activation and that those thresholds are set in the instrument agent at the moment) and I think the logical way for a momentary switch Agent to work would be to take an incoming keygroup and use those key activations as binary switches. I think the detection of thresholds and generation of activations would be properly done in a separate Agent; it's one of those little 'mathematical' style agents we talked about at the Devcon. One could imagine a variety of these, and one of them could indeed produce a kind of ladder of keygroup key activations in the style you mentioned. I do think that keeping that separate from the switching would make it more flexible. I don't think that you need that to do this right now though, as key activation thresholds are already generated elsewhere.

Several nice possible configurable behaviours spring to mind. The immediately useful one is a momentary switch for just one active output, i.e. activate the key for that output and the signal goes there, I guess 'highest key wins' or some such. We could have a straight-through output that is active when no other output is selected in this mode, which gives you the toggle switching behaviour you need now as the case when key one of the incoming switch control keygroup is pressed.

The second mode could be multiple momentary switching, ie the signal goes to every output whose key is held down. The third is toggle for each one, press on, press off.
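The three modes could be sketched roughly like this in plain Python. None of this is the EigenD agent API: the `KeySwitch` class, the mode names and the convention that output 0 is the straight-through output are all assumptions for illustration.

```python
class KeySwitch:
    """Tracks which outputs receive the signal, given key activations.

    Keys are numbered 1..n_outputs (hypothetical convention). In "single"
    mode output 0 is the straight-through output used when no key is held.
    """

    def __init__(self, n_outputs, mode="single"):
        if mode not in ("single", "multi", "toggle"):
            raise ValueError("unknown mode: %r" % mode)
        self.n_outputs = n_outputs
        self.mode = mode
        self.held = set()     # keys currently pressed
        self.latched = set()  # outputs switched on in toggle mode

    def key(self, index, down):
        """Feed one key activation (down=True) or release (down=False)."""
        if not 1 <= index <= self.n_outputs:
            raise ValueError("key index out of range")
        if down:
            if index not in self.held and self.mode == "toggle":
                self.latched ^= {index}  # press on, press off
            self.held.add(index)
        else:
            self.held.discard(index)

    def active(self):
        """Return the set of outputs the signal should go to right now."""
        if self.mode == "single":
            # Highest key wins; fall back to straight-through output 0.
            return {max(self.held)} if self.held else {0}
        if self.mode == "multi":
            return set(self.held)
        return set(self.latched)
```

Starting with one in and two outs is then just `KeySwitch(1)` in single mode: output 0 when the key is up, output 1 while it is held.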

I could see a lot of neat setup options being enabled by an agent like this and it could start very simply with just one in and two outs and get more functional as time went on without breaking backwards compatibility, always a concern with these things.

Of course the most awkward thing about this is dealing with audio, which can't just be switched or you'll get clicks. A fast crossfade is usually the best (followed by zero crossed switching, which is harder than crossfading usually as finding a zero crossing is not as straightforward as it sounds since actual crossings don't correspond to zero data values, we have to keep in mind we're in a sampled world), so there's a bit of signal processing involved.
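As a minimal sketch of both ideas, assuming audio arrives as plain lists of float samples (the function names and the linear fade shape are illustrative choices, not anything taken from EigenD):

```python
def crossfade(old, new, fade_len):
    """Linearly fade from the old stream to the new one over fade_len samples.

    The gain on the new stream ramps from 1/fade_len up to 1.0 and then
    stays there, so the old stream is fully out after fade_len samples.
    """
    out = []
    for i, (a, b) in enumerate(zip(old, new)):
        g = min(1.0, (i + 1) / fade_len)
        out.append((1.0 - g) * a + g * b)
    return out


def first_zero_crossing(samples):
    """Return the index of the first sample after a sign change, or None.

    In sampled audio the crossing usually lies between two samples, so we
    look for a change of sign rather than a sample equal to zero.
    """
    for i in range(len(samples) - 1):
        if (samples[i] < 0) != (samples[i + 1] < 0):
            return i + 1
    return None
```

A fade of a few milliseconds is usually enough to hide the switch; the right length is a matter of taste and would need tuning by ear.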

I wouldn't bother with any of this right now though - I am so keen to see if the idea functions at all that I'm chomping at the bit to see it work - the convenience of being able to switch the mic in and out of it is icing on the cake in my book! If you just have a keygroup input and take one key press (using its activation signal) as a 'listen' button, we can arrange all the nice stuff later I think.

John


written by: jsn

Sat, 28 Jan 2012 17:41:57 +0000 GMT

Agree with most of that, and have got on with it.

Thanks for the pointers, Jim. That was exactly what I needed. Simple use of a safe_worker_t made the offloading really simple. As usual it's knowing the names of things that helps.

And I am pleased to say I have a working version that listens to audio streams inside EigenD and generates plausible text from it. Hoorah!

Now need to do the laborious bit of setting up the grammar and pronunciation dictionary more extensively. Then I'll YouTube a demo.


written by: jsn

Sun, 29 Jan 2012 17:12:55 +0000 GMT

Success !!!

see my demo on YouTube

That's done using the open mic on my MacBook Pro. Works surprisingly well.

The way I have implemented it also allows for users to alter the pronunciation dictionary so you can customize it to your accent be it foreign or domestic. (see this for a truly excellent rendition of the problem with being Scottish and dealing with voice recognition)

Potentially this could also handle translation from spoken French to Belcanto for example (!)

Now I just need to extend the dictionary to cover it all, clean up the code, figure out how to package it, figure out where the resources should sit....blah blah...


written by: barnone

Sun, 29 Jan 2012 17:13:20 +0000 GMT

Bravo!


written by: alistair

Sun, 29 Jan 2012 17:37:08 +0000 GMT

fantastic!


written by: keyman

Sun, 29 Jan 2012 17:43:07 +0000 GMT

Awesome !!!
Only now I truly understand the importance of that whisky dispenser

So many breakthroughs...


written by: geert

Sun, 29 Jan 2012 17:49:18 +0000 GMT

Amazing!!!!


written by: NothanUmber

Sun, 29 Jan 2012 18:32:59 +0000 GMT

Very cool!
I am awaiting the day where my Eigenharp tells me:

Ferdinand listen
Eigenharp think so
Eigenharp exist do


written by: john

Sun, 29 Jan 2012 19:26:18 +0000 GMT

That is seriously cool.

John


written by: mikemilton

Sun, 29 Jan 2012 19:46:50 +0000 GMT

*very* neat


written by: 0beron

Mon, 30 Jan 2012 09:56:13 +0000 GMT

Nice work! Amazing result after such a short time. Now I'm going to have to seriously think about getting the microphone for the Alpha...


