AI can recognize cats in photos, understand speech, and write natural sentences. To do so, a computer must look at complex data made of many numbers and discover meaningful rules within it. Data that looks complex, such as a photo made of thousands of pixels, often follows simple rules. Cat photos in every pose and color all share the common characteristic of "cat." This is the idea of a "manifold": high-dimensional data actually lies on or near a simpler, lower-dimensional structure. Manifold learning is the process by which AI discovers these hidden structures on its own.

In this chapter we examine AI technologies that understand data and create new data based on manifold concepts.

Autoencoders: Extracting Only the Core from Complex Data

An autoencoder extracts only the most important characteristics from complex data and learns to recreate the original data from them, like reducing a book to a summary and then restoring the content from that summary. It consists of an encoder, which compresses the input to a latent vector, and a decoder, which restores the input from that vector. Because the latent vector is smaller than the input, the network cannot simply memorize the data; it has to learn the essential characteristics and rules that run through it. This is unsupervised learning: no labels are needed, because the input itself serves as the target. Applications include data compression, image restoration, noise removal, and anomaly detection.
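
The encoder and decoder idea can be sketched with a deliberately tiny example. The code below is an illustration under stated assumptions, not a practical implementation: the data is synthetic (2-D points near the line y = 2x), the encoder and decoder are single linear maps, and the names `make_data`, `encode`, `decode`, `mse`, and `train` are hypothetical helpers introduced here. It shows that a 1-number latent value is enough to reconstruct points that lie near a 1-D structure.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic 2-D points close to the line y = 2x (a 1-D structure)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        data.append((t + rng.gauss(0, 0.01), 2.0 * t + rng.gauss(0, 0.01)))
    return data

def encode(x, w):
    """Encoder: compress a 2-D point to a single latent number."""
    return w[0] * x[0] + w[1] * x[1]

def decode(z, v):
    """Decoder: restore a 2-D point from the latent number."""
    return (v[0] * z, v[1] * z)

def mse(data, w, v):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        r = decode(encode(x, w), v)
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total / len(data)

def train(data, steps=500, lr=0.05, seed=1):
    """Plain gradient descent on the reconstruction error."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    v = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    n = len(data)
    for _ in range(steps):
        gw, gv = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(x, w)
            r = decode(z, v)
            e = (r[0] - x[0], r[1] - x[1])          # reconstruction error
            dz = 2.0 * (e[0] * v[0] + e[1] * v[1])  # gradient w.r.t. latent z
            for i in range(2):
                gv[i] += 2.0 * e[i] * z / n         # decoder gradient
                gw[i] += dz * x[i] / n              # encoder gradient
        for i in range(2):
            w[i] -= lr * gw[i]
            v[i] -= lr * gv[i]
    return w, v

data = make_data()
w, v = train(data)
print("error with zero weights:", mse(data, [0.0, 0.0], [0.0, 0.0]))
print("error after training:   ", mse(data, w, v))
```

No labels appear anywhere in the training loop: the target of each reconstruction is the input point itself, which is what makes the setup unsupervised.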

Variational Autoencoders: Extension for Creating New Data

A standard autoencoder compresses each input to one fixed vector, which makes it difficult to generate new data. A Variational Autoencoder (VAE) solves this by representing each input not as a single point but as a probability distribution, a range in which similar data is likely to exist. New data can then be generated by sampling from within this range and decoding the sample. A VAE simultaneously optimizes two objectives: reconstructing the data accurately and keeping the latent distribution close to a standard normal distribution. This marks a major shift from data restoration to data generation.
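
The two objectives can be written down compactly. The sketch below assumes the common setup in which the encoder outputs a mean and a log-variance per latent dimension (a diagonal Gaussian); that parameterization, the squared-error reconstruction term, and the function names `kl_to_standard_normal`, `reparameterize`, and `vae_loss` are assumptions made for illustration, not details given in the text. The encoder and decoder networks themselves are omitted.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.

    Closed form for diagonal Gaussians:
    0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps the randomness in eps, so gradients
    can flow through mu and sigma during training.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction error plus the KL term that shapes the latent space."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
mu, log_var = [2.0, -1.0], [math.log(0.25), 0.0]
z = reparameterize(mu, log_var, rng)
print("sampled latent vector:", z)
print("KL penalty:", kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when the encoder outputs exactly a standard normal (mean 0, variance 1), so it acts as a pull toward that distribution while the reconstruction term pulls toward accuracy.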

Generative Adversarial Networks: Creating Data Through Competition

A GAN uses two competing neural networks: a Generator that creates realistic fake data from random values, and a Discriminator that distinguishes real data from fake. Through continuous competition both improve: the generator creates increasingly realistic data while the discriminator becomes increasingly discerning. Eventually the generator creates data nearly indistinguishable from the real thing. Applications include photorealistic face generation, artwork creation, and deepfakes. Challenges include training instability and mode collapse, where the generator produces only a limited variety of outputs. How far the generated distribution is from the real one can be measured with KL Divergence, Jensen-Shannon Divergence, or Wasserstein Distance; Wasserstein GAN uses the last of these for more stable training.
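
A small numeric comparison suggests why the choice of distance measure matters for training stability. The sketch below works on toy discrete distributions rather than real GAN outputs; the histograms, the assumption that bins sit at unit-spaced positions, and the function names `kl`, `js`, and `wasserstein_1d` are all illustrative choices introduced here.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

def js(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log 2."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(p, q):
    """Wasserstein-1 distance for histograms on bins at positions 0, 1, 2, ...

    In one dimension it equals the summed absolute difference of the CDFs.
    """
    cdf_p = cdf_q = total = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        total += abs(cdf_p - cdf_q)
    return total

real = [1.0, 0.0, 0.0, 0.0, 0.0]       # all "real" mass in bin 0
fake_near = [0.0, 1.0, 0.0, 0.0, 0.0]  # generator output one bin away
fake_far = [0.0, 0.0, 0.0, 0.0, 1.0]   # generator output four bins away

for name, fake in (("near", fake_near), ("far", fake_far)):
    print(name, kl(real, fake), js(real, fake), wasserstein_1d(real, fake))
```

When the two distributions do not overlap, KL is infinite and Jensen-Shannon is stuck at log 2 whether the fake data is near or far, so neither tells the generator which way to move. The Wasserstein distance is 1 for the near case and 4 for the far case, which gives a usable signal.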

Normalizing Flow Models: Calculating Probability While Changing Structure

Neither VAE nor GAN can calculate exactly how probable a given data point is: a VAE only optimizes a lower bound on that probability, and a GAN provides no probability at all. Normalizing Flow models solve this by starting from a simple known distribution and transforming it through precisely calculable, reversible steps into a complex distribution. Because every step can be inverted, the exact probability of any data point can be calculated, which is useful for anomaly detection and data selection. However, transforming complex data all at once makes the calculations difficult, raising the question: "Must everything be changed at once?"
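
The exact probability calculation rests on the change-of-variables rule: map the data point back to the simple distribution and correct for how much each step stretched or squeezed space. The sketch below uses the simplest possible reversible step, a 1-D affine map; real flow models use richer transformations, and the class `AffineFlow` and function `log_prob` are hypothetical names introduced for illustration.

```python
import math

def standard_normal_logpdf(z):
    """Log density of the simple base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

class AffineFlow:
    """One reversible step: x = scale * z + shift (scale must be nonzero)."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_abs_det(self):
        # Log of how much this step stretches (or shrinks) volume.
        return math.log(abs(self.scale))

def log_prob(x, flows):
    """Exact log density of x under base N(0, 1) pushed through `flows`.

    Change of variables: log p(x) = log p_base(z) - sum of log|det| terms,
    where z is obtained by undoing the steps in reverse order.
    """
    z, log_det_sum = x, 0.0
    for flow in reversed(flows):
        z = flow.inverse(z)
        log_det_sum += flow.log_abs_det()
    return standard_normal_logpdf(z) - log_det_sum

# Two stacked steps: x = 3 * (2 * z + 1) - 3 = 6 * z, i.e. x ~ N(0, 36).
flows = [AffineFlow(2.0, 1.0), AffineFlow(3.0, -3.0)]
print("log p(3.0) =", log_prob(3.0, flows))
```

Because the two steps compose to x = 6z, the result can be checked by hand against the density of a normal distribution with standard deviation 6.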

Diffusion Models: The Magic of Slowly Blurring Then Becoming Clear Again

Diffusion models think in the opposite direction: rather than directly creating complex data, they start from clean data and gradually add small amounts of noise (making it blurry) until it becomes pure noise. Then they learn the reverse process — gradually restoring noisy data to clean data. After sufficient learning, they can restore realistic images starting from pure noise.
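
The noising half of this process needs no learning and can be sketched directly. The code below assumes a linear noise schedule with 1000 steps and per-step noise levels from 1e-4 to 0.02, which are commonly used values rather than anything specified in the text; the function names `linear_beta_schedule`, `cumulative_alphas`, and `add_noise` are hypothetical. It uses the standard shortcut that the blurred version at step t can be drawn in one go as sqrt(a_t) * x0 + sqrt(1 - a_t) * noise, where a_t is the fraction of the original signal remaining.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise amounts, growing linearly (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)]

def cumulative_alphas(betas):
    """a_t = product of (1 - beta_s) for s <= t: signal left after t steps."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight from clean data x0 to its noisy version at step t.

    Returns the noisy data and the noise that was mixed in. During training,
    a network would be asked to predict that noise from the noisy data.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -0.5, 0.25]  # stand-in for a clean image's pixel values

for t in (0, 250, 999):
    xt, eps = add_noise(x0, t, alpha_bars, rng)
    print(f"step {t}: signal fraction {alpha_bars[t]:.5f}, sample {xt}")
```

At the first step almost all of the signal remains, and by the last step almost none does, which is the sense in which clean data gradually becomes pure noise. The learned reverse process, which is the hard part, is not shown here.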

This step-by-step approach is much more stable to train than a GAN, which generates data in a single shot. Popular tools such as Stable Diffusion, DALL·E 2, and Midjourney all operate on this principle. Diffusion models are now expanding beyond images to video generation.

The Journey of Learning Structures Hidden Within Data

Autoencoders map manifolds, VAEs enable generation, GANs learn through competition, Normalizing Flows calculate exact probabilities, and Diffusion Models generate stably, step by step. Each approach has different strengths and limitations, but all share a common goal: discovering the essential structure within complex data and creating something new based on it. Going forward, AI will handle increasingly complex kinds of data, including emotions, situations, and abstract concepts, making these generative model foundations ever more important.