Rita Singh

Webpages for the Machine Learning for Signal Processing (MLSP) group are currently broken. A consolidated description of my current research activities, along with my publications, is now at the Center for Voice Intelligence and Security (CVIS). The MLSP group predates CVIS and is now part of it anyway.

Here is my bio: Short: pdf, Longer: pdf

CURRENT RESEARCH

My focus is on developing technology for the automated discovery, measurement, representation and learning of the information encoded in the voice signal, for optimal voice intelligence.

I began working in the area of computer speech recognition and general audio processing in 1997. Until 2014, I worked on a wide range of topics. These included algorithms that made speech processing systems completely generalizable (agnostic to language), algorithms that enabled automated discovery and learning of information from speech, and algorithms that could process speech using minimal external (human-generated) knowledge. My goal was to enable greater automation, to create more powerful search strategies and more scalable learning algorithms for voice processing systems, and to find ways to make them work more accurately in high-noise and other complex acoustic environments.

In December 2014, I began building up the science of profiling humans from their voice. This involves the concurrent deduction of myriad human parameters from voice. Like DNA and fingerprints, every human's voice is unique. It carries more information than we realize (or can hear). It carries signatures of the speaker's physical, physiological, medical, psychological, sociological, behavioral and environmental parameters, among other things. Profiling is based on the quantitative discovery of information from the voice signal, guided by the intricacies of the physics and biomechanics of human voice production. Because it focuses on the voice signal, and not on its semantic or pragmatic content, it is agnostic to language.

Currently my work includes the design of powerful AI systems to explore the depths of information in the human voice. Examples include systems for genetic discovery, systems for biomarker discovery and systems for other kinds of explorations into the human physical state and psyche, including emotions and personality, through the portal of voice.

More about this work

In addition to my work on human profiling, I am working on core designs for general AI systems that are capable of universal speech and audio processing. The goal of this endeavor is to eventually build a system that can do everything our brain does in response to multisensory input from the world. The obvious platforms for these technologies are embodied AI systems, so my work has extended to embodied AI systems that incorporate these capabilities.

The next step in this trajectory is driven by the fact that systems with such superhuman capabilities will require much more powerful computing than is available on classical computing platforms. This is especially true of embodied AI systems, which must not only run such systems in real time but do so independently of all tethers, while mapping them to mobility and nuanced responses. The solution lies in quantum computing. Realizing this in 2019, I entered the nascent field of quantum computing. Today, as I continue to teach the subject and follow advances in the area, my optimism for a phenomenal technological future only grows. Lately, I have also begun research on some aspects of quantum computing.

Media coverage

Some of my presentations

Teaching Spring 2025 (Graduate Courses)

11-785: Introduction to Deep Learning, CMU

ExecEd: Large Scale Multimedia Analysis, CMU

11-860: Quantum Computing, Cryptography and Machine Learning Lab, CMU

Current Students

A full list is available here

Technical Publications: Books

	Profiling Humans from their Voice, Rita Singh. First published: July 2019. Publisher: Springer, Singapore. Copyright 2019 Springer-Nature, Switzerland. ISBN: 978-981-13-8402-8. Also available on springer.com, other bookstores and eBay. Chapters of this book are separately available from Springer. Click this link to see the list.
	Techniques for Noise Robustness in Automatic Speech Recognition, Tuomas Virtanen, Rita Singh, Bhiksha Raj (Eds). First published: 5 October 2012. Copyright 2013 John Wiley & Sons, Ltd. Print ISBN: 9781119970880 | Online ISBN: 9781118392683 | DOI: 10.1002/9781118392683

Online book: Deep Learning

(To be completed.) Book chapters are here

This book is being written in tandem with the CMU graduate-level course Introduction to Deep Learning, taught by Prof. Bhiksha Raj, and is meant as an accompaniment to that course. (It is enormously delayed.)

Literary creations and Art

Books: Click here for details

Art: My paintings

Recent Research Publications

A full list is available here.

A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker's Voice, Rita Singh, Entropy 25, No. 6: 897. 2023. pdf

Some Older Publications

Optimizing neural network embeddings using a pair-wise loss for text-independent speaker verification Hira Dhamyal, Tianyan Zhou, Bhiksha Raj, and Rita Singh, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 742-748. IEEE, 2019. pdf

Face reconstruction from voice using generative adversarial networks, Yandong Wen, Bhiksha Raj, Rita Singh, Advances in Neural Information Processing Systems (NeurIPS 2019), 2019, pp. 7344-7348. pdf (This paper created a furore in social and mainstream media over totally unrelated transgender issues.)

Disjoint mapping network for cross-modal matching of voices and faces, Yandong Wen, Mahmoud Al Ismail, Wenbo Liu, Bhiksha Raj, Rita Singh, International Conference on Learning Representations (ICLR), 2019. pdf

Detecting gender differences in perception of emotion in crowdsourced data, Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh (arXiv:1910.11386), 2020. pdf

Neural Regression Trees, Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh, (IJCNN), 2019. pdf

The phonetic bases of vocal expressed emotion: natural versus acted, Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh (INTERSPEECH), 2020. pdf

Voice impersonation using generative adversarial networks, Yang Gao, Rita Singh, Bhiksha Raj, Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15-20 April 2018. pdf

A corrective training approach for text-independent speaker verification, Yandong Wen, Tianyan Zhou, Rita Singh, Bhiksha Raj, Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15-20 April 2018. pdf

Voice disguise by mimicry: deriving statistical articulometric evidence to evaluate claimed impersonation, Rita Singh, Abelino Jiminez and Anders Oland, IET Biometrics, January 2017. pdf

More below.

Some more publications (much older, by topic)

Note: Most older papers that made a difference back then are now obsolete. I have removed them from my Google Scholar page, which I use as my own quick reference for tracking a few current papers. The list below contains some older papers.

Forensics Papers
General theme: Forensic deductions from human voice. Speech and audio forensics are included.

General audio analysis, microphone array processing, denoising, dereverberation, signal restoration Papers
General theme: Our approach is to model the effect of highly nonstationary noise and reverberation as compositional phenomena. Clean signals can then be recomposed from the bases of the composition. This approach differs from those that model audio phenomena using dynamic generative models.

Semi-supervised learning, structure discovery, statistical pattern recognition, classification Papers
These papers cover different topics, such as learning basic units of sound from data, discovering pronunciations for words in terms of these units, and iteratively selecting better classifiers using weaker classifiers in a gradient ascent solution to training good acoustic models from completely untranscribed data. They also include general developments in classification techniques.

Acoustic modeling, decoding, speech processing, speech recognition, adaptation, keyword spotting Papers
These papers relate to core and peripheral issues in speech recognition and processing for HMM-based ASR systems.

Systems, applications, projects Papers
These papers describe systems developed or deployed for specific tasks. They also include papers from short-term student projects, technical reports and other writeups.

Miscellaneous Papers
Patents, and papers on other topics such as chaos theory, radar signal design and geodynamics. I worked on these topics from 1993 to 1998. Chaos and complexity theory remain my favorite hobby subjects.

Other activities

Associate Editor, IEEE Signal Processing Letters (Retired!)
Sphinx-4
LDC And other things for me...

Earlier Teaching

(Graduate level courses)

INFSCI-2595: Introduction to Machine Learning, Fall 2024, University of Pittsburgh.

11-785: Introduction to Deep Learning, Fall and Spring: 2020, 2021, 2022, 2023, 2024, Website
Co-instructor. I am writing this book in tandem with the course for students to read.
11-860: Quantum Computing, Cryptography and Machine Learning Lab Spring 2024 Old website
Artificial Intelligence in Digital Multimedia and Cyber Forensics, Fall 2023 at the University of Pittsburgh.
Concepts in Digital Multimedia and Cyber Forensics, Spring 2022, Old website
Computational Forensics and AI, Spring 2020, Spring 2021, Old website
Advanced Topics: Quantum Computing Lab, Spring 2020, Spring 2021, Old website
Advanced Topics: Quantum Computing Theory and Lab, Spring 2022, Old website
Generative AI for Software Implementations in Quantum Computing and Machine Learning Summer 2023.
17-620: Quantum Machine Learning Fall 2023, 2024, Old website
11-775: Large-Scale Multimedia Analysis (2 versions: grad level and exec-ed), Spring 2020, Spring 2021, Spring 2022, Fall 2024, Old website
Computational Forensics and Investigative Intelligence, Taught in Spring 2017 and Spring 2018, simultaneously at
- CMU Pittsburgh
- Hamad Bin Khalifa University (HBKU), Qatar
- CMU Qatar
- CMU Africa

An Introduction to Knowledge based Deep Learning and Socratic Coaches
11-364 CMU Pittsburgh. This course was taught in person by Prof. James Karl Baker at the CMU Pittsburgh location. I was nominally co-instructor but couldn't help Jim much.

Design and Implementation of Speech Recognition Systems
Last taught many years ago. The earliest version was co-taught with Prof. James Karl Baker.

About me: I'm happiest where I come from. I like simple things. I admire art. When I have time I spend much of it looking at art. I write poetry. I collect comics (the Harvey Pekar and Blake and Mortimer kind) and puzzles (the Charles Wysocki and Jane Wooster Scott kind). I read mysteries. I don't watch TV or movies; I haven't switched on my TV for years. I don't know if my TV works. I don't use a cellphone; I have one, but it's mostly lost anyway. I'd rather watch the clouds in the sky, and the birds and the leaves. A groundhog lives in a grand home under the deck stairs just outside my window. It even has a solar-powered lamp outside its home. I can tell you all about its likes and dislikes, habits, friends and daily routine. In the summer I wake up to the song of the cardinal. I want nothing more from life or the world, except for medical science to hurry up and make everyone well. Other than that, I am content.

Some hi_res pictures of me My very brief travel log
