
Making LISA Speak: Unity, Threads, and Mild Suffering

April 7, 2026 · 7 min read · The LISA experience

We all do it. We start a project, need some temporary asset or script to plug a gap, and we think to ourselves: "yeah, I'll replace that later." Let's say it all together:

There is nothing as permanent as a temporary solution.


Let's recap!

So, what's the goal of this project?

It's to build my own AI companion that can point and laugh at me every time I accidentally revert my changes because I forgot to press save, and to be a little source of fun chaos, bringing smiles and laughter to all with quick quips.
We already established some requirements and selected some tools in the intro article.

  • Control Panel → WinUI 3

  • LISA → Unity

  • Speech → Azure

  • Brain → OpenAI / Azure

Okay. Nice list; sounds easy enough. Now let's begin with a prototype.

Beginning at the end - Unity

For reasons I can't fully explain, I wanted to start with the Unity side. Probably because the entire project hinges on whether I can get this part working at all! If I can't connect the wires, there is no point in moving forward. So let's first break down what we need, other than the model we picked earlier.

When working with complex systems, I like to keep the technical side stupidly simple; that means starting with an "entry point." At what point in our process of talking and responding does Unity come into play? The answer to that is: at the very end. That means we would already have received something like an audio file or stream from Azure Speech Service. I made sure to take a quick glance at the documentation about the results I would be working with and figured that audio files would be easiest: I would just store them somewhere. So let's run through a quick breakdown of what we need to do now.

  • Add an AudioSource to the Unity scene we're working with.

  • Take the newest audio file from a specific location.

  • Speak!... do magic

The first point is as easy as adding a ready-made component, the AudioSource. It represents an object that plays sound; it's honestly not very interesting.

The second point isn't rocket science either... yet. Lucky for me, Unity works with .NET Framework and Core.

You start... with a FileSystemWatcher.

_watcher = new FileSystemWatcher();

// Watch the "SpeechSynthesis" folder inside the user's Music library.
_watcher.Path = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyMusic), "SpeechSynthesis");

// Which kinds of changes we care about...
_watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
_watcher.IncludeSubdirectories = true;

// ...but only for .wav files.
_watcher.Filter = "*.wav";
_watcher.Created += OnCreated;

// Start raising events.
_watcher.EnableRaisingEvents = true;

It may look a bit intimidating, but we are simply asking Windows to inform us whenever a ".wav" file (an audio file) is created in that folder. When that happens, the watcher raises the Created event, which invokes our 'OnCreated' method.

private void OnCreated(object sender, FileSystemEventArgs e) 
{     
    // We get to do something here! 
} 

From here you would say "file -> AudioSource -> done!". We have all the data we need when we get into the "OnCreated" method. Unfortunately, there is a problem here... and it is called Threading.

Threading - the end-boss of headaches

In order to explain the problem, I first need to explain what a Thread is and what it does: A Thread is a thing that does work.
Imagine a conveyor belt in a production line in a factory. There are things on that belt that need something done to them, and there are machines or people along the belt manipulating the stuff on it to get the expected result. A Thread is the belt carrying our `FileSystemEventArgs`: the information about the newly created audio file that we want to play.

But not all belts (threads) are equal; certain threads come in certain shapes that only fit in certain holes. The code you saw earlier runs on a .NET thread-pool thread, because that is where the FileSystemWatcher raises its events. But the "AudioSource" that will eventually play our audio can only be touched from the Unity main thread. That means if we want to work with our audio file, we need to bring that information over to the Unity thread in order to use it. To achieve that, we use another tool at our disposal, the "Heap."

If a Thread is a belt with parts that receive work, then the "Heap" is a... heap of things that everybody needs on occasion and then puts back. Imagine a library: a library is a heap of books, everyone has access to these. You take one or more to get whatever you need from them and then put them back for someone else to use. People are the Threads, books are the things the Threads work on (or with) and the library is the Heap. The library lends out the books to whoever needs them.

The other part of the problem has to do with timing. Unity doesn’t just do things when you ask it to; work is orchestrated into different parts that run at certain times and our .NET thread does not adhere to this timing.
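To convince yourself that the watcher callback really does arrive on a different "belt", here's a standalone .NET sketch with no Unity involved (the thread pool stands in for FileSystemWatcher's callback thread, which is where its events are raised):

```csharp
using System;
using System.Threading;

class ThreadShapeDemo
{
    // Returns the IDs of the "main belt" and the callback's belt.
    public static (int main, int callback) MeasureThreads()
    {
        int mainId = Thread.CurrentThread.ManagedThreadId;
        int callbackId = -1;
        using var done = new ManualResetEventSlim();

        // FileSystemWatcher raises its events on a thread-pool thread,
        // so the pool stands in for OnCreated here.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            callbackId = Thread.CurrentThread.ManagedThreadId;
            done.Set();
        });

        done.Wait();
        return (mainId, callbackId);
    }

    static void Main()
    {
        var (main, callback) = MeasureThreads();
        Console.WriteLine($"main thread: {main}, callback thread: {callback}");
    }
}
```

The two printed IDs differ, which is exactly the situation we're in inside Unity: our data arrives on a belt the AudioSource can't see.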

So back to our problem. Even though our code is defined in a Unity script, it is the .NET watcher thread that owns the information when OnCreated fires. So we need to transfer it from one thread to the other. That might sound like a really complex problem to solve, but in our case, it is not so bad. We can solve it by adding a couple of fields to store the needed information, and then wait for Unity to pick them up at some point.

private void OnCreated(object sender, FileSystemEventArgs e)
{
    _filePath = e.FullPath;
    _fileTriggered = true;
}

The two fields "_filePath" and "_fileTriggered" exist outside the thread, on the heap. "_filePath" is the file we want to play, "_fileTriggered" is used to let Unity know there is a file to be played.
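Outside Unity, the same handoff pattern can be sketched as a small console program. This is a minimal illustration, not the project's actual code: the polling loop stands in for Unity's Update, and marking the flag `volatile` is a safety measure I'd add so the background thread's write is guaranteed to become visible to the poller:

```csharp
using System;
using System.Threading;

class HandoffDemo
{
    // Shared state on the heap; 'volatile' makes the flag write on the
    // watcher thread visible to the thread that polls it.
    static volatile bool _fileTriggered;
    static string _filePath;

    public static string SimulateHandoff()
    {
        // This thread plays the role of FileSystemWatcher's OnCreated callback.
        new Thread(() =>
        {
            _filePath = "clip.wav";   // write the data first...
            _fileTriggered = true;    // ...then raise the flag
        }).Start();

        // This loop plays the role of Unity's Update(): poll the flag each "frame".
        while (!_fileTriggered) Thread.Sleep(10);
        return _filePath;
    }

    static void Main() => Console.WriteLine($"Picked up: {SimulateHandoff()}");
}
```

Note the order: the data is written before the flag is raised, so whoever sees the flag is guaranteed to see valid data.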

public void Update()
{
    if(_fileTriggered)
    {
        StartCoroutine(nameof(ProcessFile), _filePath);
        _fileTriggered = false;
    }
}

The "Update" method is one of many Unity-specific methods, and gets invoked every frame on the Unity main thread. Because our information is on the Heap, we can read it from here and hand it to a coroutine, "ProcessFile".

"ProcessFile" is a bit more specific to Unity, but I will add it below for completeness, with a shorthand list of what is happening.

private IEnumerator ProcessFile(string filePath)
{
    using (var unityTypedRequest = UnityWebRequestMultimedia.GetAudioClip("file:///" + filePath, AudioType.WAV))
    {
        // Give the synthesizer a moment to finish writing the file.
        yield return new WaitForSeconds(3);
        yield return unityTypedRequest.SendWebRequest();

        if(unityTypedRequest.result == UnityWebRequest.Result.ConnectionError || unityTypedRequest.result == UnityWebRequest.Result.ProtocolError)
        {
            Debug.LogError(unityTypedRequest.error);
        }
        else
        {
            // Convert the downloaded data into an AudioClip and play it.
            var clip = DownloadHandlerAudioClip.GetContent(unityTypedRequest);
            audioSource.clip = clip;
            audioSource.Play();

            // Keep the coroutine alive until the clip has finished playing.
            yield return new WaitForSeconds(clip.length);
        }
    }
}

Here is what is happening:

  • We prepare a request to get the audio file from our source (our music library...)

  • We have the request fetch the file and convert it so Unity can properly work with it.

  • We assign the resulting audio clip as the clip that the AudioSource needs to play.

  • Call `Play()` on the AudioSource, which actually plays the assigned clip.

Oooof, that was quite something; a bit of a headache, but we got there! But wait...
As things stand, we can now play audio, getting LISA to talk... but she isn't really speaking; her lips aren't moving!


Lipsync - moving lips by sound

Did you know that most games no longer use manually animated mouths? It's true! Instead of animators going through each line, animating the face for every split second, they instead use "phonemes". I am not at all sure about the exact mechanics, but I think the idea is that each sound we hear corresponds to a certain position of the lips, so detecting the sounds lets you mimic the matching speaking gesture.
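I haven't verified how any real lip-sync package does this internally, but the general idea can be sketched as a lookup from detected phonemes to mouth shapes. Every name and weight below is made up purely for illustration:

```csharp
using System;
using System.Collections.Generic;

class VisemeSketch
{
    // Hypothetical phoneme -> mouth-openness weights; a real phoneme set
    // and real values would come from the lip-sync package's configuration.
    public static readonly Dictionary<string, float> MouthShapes = new()
    {
        ["A"] = 1.0f,  // jaw wide open
        ["O"] = 0.7f,  // rounded lips
        ["M"] = 0.0f,  // lips closed
    };

    static void Main()
    {
        // Pretend the audio analysis detected this phoneme sequence.
        foreach (var phoneme in new[] { "M", "A", "O" })
            Console.WriteLine($"{phoneme} -> openness {MouthShapes[phoneme]}");
    }
}
```

In a game, those weights would drive the face model's blend shapes frame by frame instead of being printed.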

For us, that means using this GitHub package: hecomi/uLipSync: MFCC-based LipSync plug-in for Unity using Job System and Burst Compiler

As you can see, this person also uses the Unity-chan model in their example! Great minds think alike!
Sometimes, you try to make something work without really knowing what is happening. Truth be told, that is what is happening here. I just follow the instructions, wire it up in some places, and tweak it until I am happy with the result.


That was quite a lot of information, wasn't it? Don't worry, you have reached the end! Have a cookie!
Next time we will dive into the Control Panel where all the real magic happens!

Be sure to leave a rating down below, and you can check the result in the video!

Wiring things by feel and ‘tweak until happy’ — beautiful, chaotic, and probably on fire in five minutes. Bring on the Control Panel; I’m armed with coffee and gloriously bad decisions.
