DEEP LEARNING

1. What Is Deep Learning?

Short definition – Deep Learning (DL) is the part of Machine Learning (ML) that learns patterns directly from data using artificial neural networks with many layers of neurons.
Why “deep” – Each extra layer lets the network capture progressively higher‑level patterns, loosely analogous to how the human brain processes visual, auditory, and linguistic data in stages.
From human intuition to computer code – A computer can’t “see a cat” until it’s trained on thousands of cat images, while a human can identify a cat instantly. DL teaches the computer this skill.
Common applications
- Computer vision – facial recognition, object detection, autonomous driving.
- Speech recognition – transcribing calls or dictation.
- Natural Language Processing – chatbots, language translation, sentiment analysis.
- Image & text detection – automatically extracting labels or captions from photos or documents.

Building Block	What it Does	How it Works
Neurons	Basic decision units	Each neuron computes a weighted sum of its inputs, adds a bias, then passes the result through an “activation function” that determines how strongly it fires.
Layers	Structural organization	• Input Layer – takes raw data (pixel values, word tokens, etc.). • Hidden Layers – one or more intermediate layers that learn features and patterns. • Output Layer – delivers the final prediction (a probability, a class label, a real number, etc.).
Weights	Connection strengths	Numbers that scale the influence of each input on the neuron’s output. They’re learned during training.
Bias	Offset term	A learned constant added to the weighted sum before the activation; it shifts the neuron’s threshold, so the neuron can fire even if all its inputs are zero.
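
To make the table concrete, here is a minimal sketch of a single neuron in plain Python. The sigmoid activation and the example numbers are assumptions chosen for illustration; the notes above do not prescribe a particular activation function.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# With all-zero inputs the weighted sum vanishes, so the bias alone sets the output.
zero_input_output = neuron([0.0, 0.0], weights=[0.5, -0.3], bias=2.0)
print(zero_input_output)
```

Changing the weights has no effect on this particular output, because every input is zero; only the bias does, which is the role described in the last row of the table.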

Send the data into the first layer – the raw image pixels or text tokens become inputs.
Move through the hidden layers – each neuron multiplies its incoming signals by their weights, sums the results, adds its bias, and then applies an activation function (e.g., a sigmoid, which squashes the raw sum into a value between 0 and 1).
Reach the output layer – the final neuron(s) produce the prediction (e.g., a probability between 0 and 1 for “is a cat?”).
Evaluate the result – calculate a loss by comparing the prediction with the known correct answer. Common choices are mean squared error for regression and cross‑entropy for classification.
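
The four steps above can be sketched as a forward pass through a tiny network with one hidden layer. All inputs, weights, and biases below are made-up values, and the sigmoid activation is an assumption; a real network would learn its weights during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One layer: each row of `weights` holds the weights of one neuron."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def binary_cross_entropy(p, y):
    """Classification loss for one predicted probability p and a 0/1 label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mean_squared_error(preds, targets):
    """Regression loss: average squared difference."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Step 1: raw inputs (hypothetical feature values).
x = [0.2, 0.8, 0.5]

# Step 2: hidden layer with two neurons.
hidden = dense(x, weights=[[0.1, -0.4, 0.3], [0.7, 0.2, -0.5]], biases=[0.0, 0.1])

# Step 3: output layer with one neuron, read as P("is a cat").
p_cat = dense(hidden, weights=[[0.6, -0.9]], biases=[0.05])[0]

# Step 4: compare with the known answer (label 1 means "cat").
loss = binary_cross_entropy(p_cat, y=1)
print(p_cat, loss)
```

The cross-entropy loss grows as the predicted probability for the correct class approaches 0, and shrinks toward 0 as it approaches 1.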

Compute the gradient – figure out how tiny changes to each weight would change the loss. Think of the gradient as a slope that tells the network which way to adjust.
Adjust the weights – move each weight (and bias) a tiny step in the direction opposite to its slope, since that is the direction in which the loss decreases. The size of the step is governed by the learning rate (a small number like 0.0001).
Repeat for many batches and epochs – a “batch” is a small subset of the training data used for one weight update, and an “epoch” means the entire training set has passed through the network once. Over many epochs the network’s predictions improve and the loss shrinks, ideally settling at a low minimum in the loss landscape.
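
As a rough illustration of this loop, the sketch below fits a one-weight, one-bias model to toy data using mini-batch gradient descent on the mean squared error. The data (drawn from y = 2x + 1), the batch size, the learning rate, and the number of epochs are all assumptions picked so the example converges quickly; gradients are written out by hand rather than computed by a framework.

```python
# Toy training set generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]

def mse(w, b):
    """Mean squared error of the model y = w*x + b over the whole training set."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b = 0.0, 0.0          # start from arbitrary parameters
learning_rate = 0.02
batch_size = 2
initial_loss = mse(w, b)

for epoch in range(500):                          # one epoch = one pass over all data
    for start in range(0, len(xs), batch_size):   # one update per batch
        bx = xs[start:start + batch_size]
        by = ys[start:start + batch_size]
        n = len(bx)
        # Gradient of the batch MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(bx, by)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(bx, by)) / n
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, final_loss)
```

On this noise-free data the parameters should end up close to w = 2 and b = 1, with the final loss far below the initial one.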

Quick tip – Too high a learning rate → huge, unstable jumps; too low → painfully slow progress.
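
The tip can be checked on the simplest possible loss, f(w) = w², whose gradient is 2w and whose minimum is at w = 0. This is a sketch under that assumption only; the three learning rates are illustrative and the thresholds for "too high" depend on the loss being minimized.

```python
def descend(learning_rate, steps=50, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w -= learning_rate * grad
    return w

too_high = descend(1.1)        # each step overshoots and the jumps grow
too_low = descend(0.0001)      # barely moves away from the starting point
reasonable = descend(0.1)      # steadily approaches the minimum at 0

print(too_high, too_low, reasonable)
```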