
UNDERSTANDING CORTICAL NETWORKS RELATED TO SPEECH USING DEEP LEARNING ON ECOG DATA

OUR PAPER WAS PUBLISHED at ISBI 2020 as a Best Paper Finalist: LINK

ECoG Project Summary

In this project I work with Ran Wang and Amirhossein Khalilian-Gourtani, supervised by Prof. Yao Wang and Prof. Adeen Flinker.

The ECoG (electrocorticography) project aims to use computational tools to understand the dynamics by which neural activity propagates across the cortex as we think of a word and produce it, a process that remains poorly understood. One of the main goals of the project is to develop neural decoders for language processing: deep-learning architectures that learn a transformation between neural signals and the speech heard by the patient, the speech produced by the patient, or the semantic concept represented by the stimulus word. A minimal sketch of this decoding setup is shown below.
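The sketch maps ECoG features to a mel spectrogram in PyTorch. The shapes, layer sizes, and names here are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class ECoGDecoder(nn.Module):
    """Toy decoder: ECoG features (batch, electrodes, time) -> mel
    spectrogram (batch, mels, time). All sizes are illustrative only."""
    def __init__(self, n_electrodes: int = 64, n_mels: int = 80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_electrodes, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, n_mels, kernel_size=1),  # project to mel bins
        )

    def forward(self, ecog: torch.Tensor) -> torch.Tensor:
        return self.net(ecog)

decoder = ECoGDecoder()
ecog = torch.randn(8, 64, 200)       # dummy batch, e.g. high-gamma features
spec_pred = decoder(ecog)            # (8, 80, 200) predicted spectrogram
loss = nn.functional.mse_loss(spec_pred, torch.rand(8, 80, 200))
```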

Recent Progress

  • VAE based transfer learning with GAN

In our recent work, we proposed a transfer learning approach with a pre-trained GAN that disentangles the representation and generation layers for ECoG-to-audio decoding. The approach achieves high reconstruction accuracy.
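As a rough illustration of the disentangling idea, the sketch below freezes a stand-in pre-trained generator (the "generation layers") and trains only an ECoG encoder into its latent space (the "representation layers"). All module names, sizes, and the toy data are assumptions for illustration; this is not the published model.

```python
import torch
import torch.nn as nn

latent_dim = 128

# Stand-in for a GAN generator pre-trained on speech spectrograms
# (here it emits a flattened 80-mel x 100-frame spectrogram).
generator = nn.Sequential(
    nn.Linear(latent_dim, 512),
    nn.ReLU(),
    nn.Linear(512, 80 * 100),
)
for p in generator.parameters():
    p.requires_grad = False  # freeze the pre-trained "generation layers"

# Only this encoder (ECoG -> GAN latent) is optimized during transfer.
ecog_encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 200, 512),
    nn.ReLU(),
    nn.Linear(512, latent_dim),
)
opt = torch.optim.Adam(ecog_encoder.parameters(), lr=1e-4)

ecog = torch.randn(8, 64, 200)       # dummy ECoG batch (batch, elec, time)
target = torch.randn(8, 80 * 100)    # dummy target spectrograms, flattened

z = ecog_encoder(ecog)               # representation layers (trained)
recon = generator(z)                 # generation layers (frozen)
loss = nn.functional.mse_loss(recon, target)

opt.zero_grad()
loss.backward()
opt.step()
```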

By visualizing the attention mask embedded in the encoder, we observe brain dynamics consistent with findings from previous studies of the superior temporal gyrus (STG), precentral gyrus (motor cortex), and inferior frontal gyrus (IFG).
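For intuition, one hypothetical way to inspect such a per-electrode attention mask is to plot it as a heatmap over electrodes and time; the tensor below is a random stand-in, not data from the trained encoder.

```python
import matplotlib.pyplot as plt
import torch

# Random stand-in for an encoder attention mask over electrodes and time;
# in practice this would be read out of the trained model.
attn = torch.softmax(torch.randn(64, 200), dim=0)   # (electrodes, time)

plt.imshow(attn.numpy(), aspect="auto", cmap="viridis")
plt.xlabel("time (frames)")
plt.ylabel("electrode index")
plt.title("Encoder attention over electrodes (illustrative)")
plt.colorbar(label="attention weight")
plt.show()
```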

For more detailed progress, please see this page: Stimulus Speech Decoding From Human Cortex With Generative Adversarial Network Transfer Learning (ISBI 2020 Best Paper Finalist), and the DEMO.

We also explored this speech decoding architecture across multiple language tasks (audio repetition, audio naming, sentence completion, word reading, and picture naming) using the above approach. A summary of the decoding results is shown below.

  • Differentiable Audio-Synthesizer-Based Model

To improve ECoG-to-audio decoding, we realized that VAE- and GAN-based models are not sufficient. Autoencoders tend to predict an average spectrogram, losing the fine details that keep the decoded audio intelligible. A GAN makes the spectrogram look more realistic, but there is still a trade-off between overall decoding accuracy and intelligibility. Another approach to this issue is a differentiable speech synthesizer that maps audio to an acoustic latent space (instead of a less regularized latent space). Such a model uses far fewer parameters yet achieves more realistic spectrograms, both in audio-to-audio reconstruction and in the downstream ECoG-to-audio reconstruction. By leveraging speech-processing knowledge, we substantially improved the ECoG-to-audio decoding results compared with the VAE-GAN-based model. This method is still under development; a sketch of the synthesizer idea and some preliminary results follow:
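Below is a toy sketch of the differentiable-synthesizer idea: a network predicts a few acoustic parameters (pitch, loudness, and a coarse spectral envelope), and a fixed, fully differentiable synthesis stage renders a spectrogram from them, so gradients flow back through the synthesizer into the network. The parameterization here is a crude stand-in for illustration, not the synthesizer actually used in the project.

```python
import torch
import torch.nn as nn

N_MELS, T = 80, 200

def synthesize(f0, loudness, envelope):
    """Differentiable toy synthesis: f0, loudness (batch, 1, T) in [0, 1],
    envelope (batch, N_MELS, T) -> spectrogram (batch, N_MELS, T)."""
    bins = torch.arange(N_MELS, dtype=torch.float32).view(1, N_MELS, 1)
    # A crude Gaussian excitation peak near the predicted pitch, shaped by
    # the spectral envelope and scaled by loudness.
    excitation = torch.exp(-0.5 * ((bins - f0 * N_MELS) / 2.0) ** 2)
    return loudness * excitation * torch.sigmoid(envelope)

class AcousticParamNet(nn.Module):
    """Predicts a compact set of acoustic parameters from ECoG."""
    def __init__(self, n_electrodes: int = 64):
        super().__init__()
        self.backbone = nn.Conv1d(n_electrodes, 128, kernel_size=5, padding=2)
        self.f0 = nn.Conv1d(128, 1, kernel_size=1)
        self.loud = nn.Conv1d(128, 1, kernel_size=1)
        self.env = nn.Conv1d(128, N_MELS, kernel_size=1)

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        # Pitch and loudness constrained to [0, 1]; the synthesizer
        # interprets them as normalized acoustic controls.
        return torch.sigmoid(self.f0(h)), torch.sigmoid(self.loud(h)), self.env(h)

net = AcousticParamNet()
ecog = torch.randn(8, 64, T)                 # dummy ECoG batch
f0, loud, env = net(ecog)
spec = synthesize(f0, loud, env)             # (8, 80, 200)
loss = nn.functional.mse_loss(spec, torch.rand(8, N_MELS, T))
loss.backward()  # gradients flow through the synthesizer into the network
```

Because the latent space is constrained to interpretable acoustic controls rather than a free-form code, the decoder has far fewer degrees of freedom to produce blurry or unrealistic spectrograms.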

The work was supported by the National Science Foundation.

Thanks for reading!