Neural Audio Theory

By Eduardo J. Barrios

Neural Audio Theory is an open educational reference for both people making music with AI and people engineering the systems behind it.

Choose Your Path

User Guides

For musicians, producers, and curious readers who want plain-language explanations, better prompts, practical workflows, troubleshooting, and responsible release guidance. No machine-learning background is required.

Engineering Docs

For developers, researchers, and technical readers who want signal processing, representations, architectures, training methods, evaluation, APIs, and system-design details.

Both paths describe the same field at different levels of abstraction. You can switch between them whenever a practical question needs a technical explanation, or an engineering concept needs a musical example.

What You Will Learn

This handbook covers AI music from foundational audio theory through production workflows to current research:

Foundations

Audio Fundamentals — digital audio, psychoacoustics, music theory, and codecs
Concepts — embeddings, latent spaces, neural codecs, text-audio alignment, and music representations

Engineering

Mathematics — FFT, mel spectrograms, attention math, loss functions, and signal processing
Architecture — transformers, diffusion models, VAEs, GANs, and U-Nets for audio
Training — dataset curation, augmentation, training strategies, and evaluation metrics

Systems

Model Zoo — MusicLM, MusicGen, Stable Audio, Jukebox, Suno, and Udio
Advanced Topics — multimodal generation, real-time inference, fine-tuning, and controllable generation

Practice

Producer Handbook — workflows, troubleshooting, genre prompting, mixing, stem separation, and vocal synthesis
Prompt Engineering Guide — conditioning embeddings and prompt structure
Tools & Ecosystem — DAW integration, open-source tools, and API patterns

Reference

Ethics & Legal — copyright, training data rights, and responsible use
Glossary — comprehensive A–Z reference of AI music terminology

Core Engineering View

Most AI music systems follow a practical pipeline:

Data preparation: normalize, segment, and annotate large music/audio corpora
Representation: transform audio into spectrograms, codec tokens, or latents
Modeling: train sequence or diffusion networks with conditional inputs
Inference control: steer generation with prompts, structure tags, and guidance scales
Post-processing: mix, master, and run quality assurance on the generated audio
Evaluation: combine objective metrics and human listening tests
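
As a small illustration of the representation step above, the following sketch turns a waveform into a magnitude spectrogram using a framed FFT. It is a minimal example under stated assumptions, not how any particular system does it: the function name `magnitude_spectrogram`, the frame and hop sizes, the Hann window, and the 1 kHz test tone are all illustrative choices that do not come from this handbook.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_size=1024, hop_size=256):
    """Return a (num_frames, frame_size // 2 + 1) magnitude spectrogram.

    Hypothetical helper for illustration; frame_size and hop_size are assumed values.
    """
    window = np.hanning(frame_size)
    # Number of full frames that fit in the signal (no padding at the edges).
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(num_frames)
    ])
    # rfft keeps only the non-negative frequency bins of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 1 kHz sine at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

spec = magnitude_spectrogram(tone)
# Average over time, then find the strongest frequency bin.
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 1024
print(spec.shape, peak_bin, peak_hz)
```

Each row of `spec` is one time frame and each column is one frequency bin, so the energy of the test tone concentrates in the bin nearest 1 kHz. Mel spectrograms, codec tokens, and learned latents build on or replace this kind of time-frequency grid.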

If You Just Want to Make AI Music

If your main goal is creating songs quickly, start with the beginner page:

For Dummies: AI Music in Plain Language

It translates the same engineering foundations into plain language while staying technically accurate, so you can move from "just prompting" to more consistent, controllable outputs.

Then dive into the Producer Handbook for practical workflows, genre-specific prompting tips, and mixing techniques for AI-generated audio.

Example Mathematical Building Blocks

Continuous Fourier transform for a signal $x(t)$:

$$
X(f) = \int_{-\infty}^{\infty} x(t) \, e^{-j2\pi ft} \, dt
$$
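
To make the integral concrete, the sketch below approximates it with a Riemann sum on a finite time grid. The test signal is an assumption chosen for convenience: the Gaussian $x(t) = e^{-\pi t^2}$, whose transform is known in closed form as $X(f) = e^{-\pi f^2}$. The function name `fourier_transform_riemann` and the grid limits are hypothetical and only serve the illustration.

```python
import numpy as np

def fourier_transform_riemann(x, t, f):
    """Approximate X(f) = integral of x(t) * exp(-j 2 pi f t) dt by a Riemann sum.

    Assumes t is a uniformly spaced grid wide enough that x is ~0 at both ends.
    """
    dt = t[1] - t[0]
    return np.sum(x * np.exp(-2j * np.pi * f * t)) * dt

# Gaussian test signal; it decays fast enough that [-10, 10] covers it fully.
t = np.linspace(-10.0, 10.0, 20001)
x = np.exp(-np.pi * t**2)

for f in (0.0, 0.5, 1.0):
    approx = fourier_transform_riemann(x, t, f)
    exact = np.exp(-np.pi * f**2)
    print(f"f={f}: approx={approx.real:.6f}, exact={exact:.6f}")
```

In practice, sampled audio is analyzed with the discrete Fourier transform (for example `np.fft.rfft`) rather than a direct sum like this one; the sketch only shows what the continuous definition computes.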

Cosine similarity for embedding vectors $\mathbf{a}$ and $\mathbf{b}$:

$$
\text{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}
$$
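
The formula translates almost directly into NumPy. The sketch below is a minimal version; the function name `cosine_similarity`, the choice to raise an error for zero vectors, and the toy input vectors are illustrative assumptions rather than anything prescribed by this handbook.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return (a . b) / (||a|| * ||b||) for two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # The ratio is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

# Toy vectors standing in for embeddings (assumed values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the result depends only on the angle between the vectors, scaling either embedding leaves the similarity unchanged, which is why the first pair scores approximately 1 and the perpendicular pair scores 0.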

Use the sidebar to explore all sections for a full engineering-level understanding.
