AI Model Mimics Human Sound Simulation, Inspired by Larynx Mechanisms

Edited by: Vera Mo

Researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT have developed an artificial intelligence (AI) system capable of producing and interpreting human-like sound simulations. The work draws inspiration from cognitive science research on human communication.

The AI model simulates how sounds are shaped by the larynx, throat, tongue, and lips, and it generates sound simulations without any prior training on, or exposure to, human-produced sounds. By reflecting these nuances of human sound production, the model creates realistic simulations, such as mimicking an ambulance siren or a crow's call.
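
The article does not spell out how the vocal-tract simulation works, but the general idea can be pictured with a toy source-filter synthesizer: a glottal-style pulse train standing in for the larynx, passed through resonant filters standing in for throat, tongue, and lip shapes. Everything below, including all parameter values, is an illustrative assumption rather than the CSAIL model itself.

```python
# Illustrative sketch only, NOT the CSAIL system: a toy source-filter voice.
import numpy as np
from scipy.signal import lfilter

def resonator(freq_hz, bandwidth_hz, sample_rate):
    """Second-order IIR coefficients for a single vocal-tract resonance."""
    r = np.exp(-np.pi * bandwidth_hz / sample_rate)
    theta = 2 * np.pi * freq_hz / sample_rate
    a = [1.0, -2 * r * np.cos(theta), r * r]   # denominator (resonance)
    b = [1.0 - r]                              # rough gain normalization
    return b, a

def synthesize(pitch_hz=120.0, formants=((700, 80), (1200, 90), (2600, 120)),
               duration_s=0.5, sample_rate=16000):
    n = int(duration_s * sample_rate)
    # Impulse train at the pitch period approximates vocal-fold pulses.
    source = np.zeros(n)
    period = int(sample_rate / pitch_hz)
    source[::period] = 1.0
    # Cascade formant filters: each resonance mimics a vocal-tract constriction.
    signal = source
    for freq, bw in formants:
        b, a = resonator(freq, bw, sample_rate)
        signal = lfilter(b, a, signal)
    return signal / np.max(np.abs(signal))

waveform = synthesize()  # hypothetical parameters chosen for illustration
```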

The system works in both directions: it generates sound simulations, and it can also infer which real sound a human vocal imitation refers to. For instance, it accurately distinguishes a person imitating a cat's 'meow' from one imitating its 'purr'.
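
The article gives no detail on how this inverse direction is computed. As a loose stand-in, one can imagine matching an imitation's spectral signature against a small library of reference sounds; the function names and features below are hypothetical and only illustrate the idea of inferring a sound from an imitation.

```python
# Illustrative sketch only, NOT the published model: nearest-neighbor matching
# of a human imitation against a library of reference sounds.
import numpy as np

def spectral_signature(waveform, frame=1024, hop=512):
    """Average log-magnitude spectrum across frames: a crude timbre summary."""
    frames = [waveform[i:i + frame] for i in range(0, len(waveform) - frame, hop)]
    spectra = [np.abs(np.fft.rfft(f * np.hanning(frame))) for f in frames]
    return np.log1p(np.mean(spectra, axis=0))

def identify_imitation(imitation, reference_library):
    """Return the name of the reference sound closest to the imitation."""
    query = spectral_signature(imitation)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    return max(reference_library,
               key=lambda name: cosine(query, spectral_signature(reference_library[name])))

# Usage with hypothetical data: library = {"meow": meow_audio, "purr": purr_audio}
# identify_imitation(human_clip, library) -> "meow" or "purr"
```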

The research team developed three increasingly sophisticated versions of the model. The first version aimed only at producing output acoustically close to real-world sounds, but its imitations did not match how humans actually behave. The second version, termed the 'communication model,' accounted for which characteristics of a sound stand out most to a listener.
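
One way to picture the spirit of such a communication model, purely as an assumption and not the paper's formulation, is a similarity measure that weights the spectral features a listener would find most distinctive.

```python
# Illustrative sketch only: listener-weighted similarity between an imitation
# and a target sound. Salience weights are assumed values, not learned ones.
import numpy as np

def communicative_similarity(imitation_spectrum, target_spectrum, salience_weights):
    """Higher is better: weighted match emphasizing listener-salient features."""
    diff = np.asarray(imitation_spectrum) - np.asarray(target_spectrum)
    return -float(np.sum(np.asarray(salience_weights) * diff ** 2))

# Hypothetical usage: weight the mid-frequency bands most heavily.
weights = np.array([0.2, 1.0, 1.0, 0.5])
print(communicative_similarity([0.1, 0.8, 0.6, 0.2], [0.1, 0.7, 0.9, 0.1], weights))
```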

The final iteration added a reasoning layer that accounts for the effort involved in producing a sound. This version avoids imitations that are too quick, too loud, or too exaggerated, which results in more human-like simulations.
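
A rough way to picture this effort trade-off, with weights and effort terms that are assumptions rather than the model's actual objective, is a score that rewards a close perceptual match while penalizing loud, fast, or exaggerated productions.

```python
# Illustrative sketch only: an effort-regularized score in the spirit of the
# third model version. All terms and weights here are assumptions.
def imitation_score(similarity, loudness, speed, exaggeration, effort_weight=0.3):
    """Higher is better: perceptual match minus a penalty on effortful speech."""
    effort = loudness ** 2 + speed ** 2 + exaggeration ** 2
    return similarity - effort_weight * effort

# A candidate that matches well but is shouted quickly scores worse than a
# calmer candidate with a slightly poorer match.
print(imitation_score(similarity=0.9, loudness=1.5, speed=1.4, exaggeration=1.2))
print(imitation_score(similarity=0.8, loudness=0.6, speed=0.5, exaggeration=0.4))
```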

The research could lead to more expressive sound interfaces for artists and help filmmakers and content creators generate AI sounds suited to a given context. Future applications may include studies of language development, of how infants learn to speak, and of mimicry behaviors in birds.

Despite these advances, the model still struggles with certain consonants, which leads to inaccuracies when simulating sounds such as buzzing bees. It also cannot yet replicate how humans imitate speech, music, or sounds across different languages.

Professor Robert Hawkins from Stanford University remarked on the complexity of translating real sounds into words, highlighting the intricate interplay of physiology, social reasoning, and communication in language evolution. This model represents a significant step in formalizing and validating theories surrounding these processes.
