Neural Networks

Outcomes

Links:

What Is a Neural Network?

A neural network is a computational model loosely inspired by the brain. It learns to map inputs to outputs by adjusting the strength of connections between layers of simple processing units called neurons.

The key insight: instead of writing rules by hand, you show the network many examples and let it figure out the rules itself.

Core Concepts

Neurons and Layers

A network is organized into layers:

  - Input layer: receives the raw feature values
  - Hidden layers: transform the data through learned weights and activations
  - Output layer: produces the final prediction

Each neuron computes a weighted sum of its inputs, adds a bias term, then passes the result through an activation function to introduce non-linearity.

output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)
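A minimal sketch of that formula in plain Python (the inputs, weights, and bias below are made-up illustrative numbers):

```python
def relu(x):
    # the activation used here; others (sigmoid, softmax) are covered below
    return max(0.0, x)

def neuron(inputs, weights, bias):
    # weighted sum of inputs, plus bias, passed through the activation
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)

out = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
# 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1, and relu(0.1) = 0.1
```

A real layer is just many of these neurons computing in parallel over the same inputs, which is why frameworks implement it as a single matrix multiplication.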

Activation Functions

Function   Formula          Use case
--------   --------------   --------------------------------
ReLU       max(0, x)        Hidden layers (most common)
Sigmoid    1 / (1 + e⁻ˣ)    Binary classification output
Softmax    eˣᵢ / Σeˣⱼ       Multi-class classification output
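
The three functions from the table, sketched in plain Python (PyTorch provides optimized versions of all of these; this is just to show the math):

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # subtract the max before exponentiating for numerical stability;
    # this does not change the result
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # three class scores -> three probabilities
```

Note that softmax outputs always sum to 1, which is what lets them be read as class probabilities.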

Learning: Forward Pass → Loss → Backpropagation

  1. Forward pass — data flows through the network to produce a prediction
  2. Loss function — measures how wrong the prediction is (e.g., mean squared error, cross-entropy)
  3. Backpropagation — computes gradients of the loss with respect to each weight
  4. Optimizer — adjusts weights to reduce loss (e.g., stochastic gradient descent, Adam)

This cycle repeats for many epochs (full passes over the training data).
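
The four-step cycle above can be sketched without any framework. Here a single weight w is fit to toy data (y = 2x) using mean squared error and plain gradient descent; the gradient is computed analytically rather than by automatic differentiation:

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # true relationship: y = 2x

w = 0.0                 # initial weight
lr = 0.05               # learning rate

for epoch in range(100):
    # 1. forward pass: predictions from the current weight
    preds = [w * x for x in xs]
    # 2. loss: mean squared error
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. "backpropagation": dLoss/dw, computed by hand for this tiny model
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # 4. optimizer step: move w against the gradient
    w -= lr * grad

# w converges toward 2.0
```

The PyTorch example below runs exactly this loop, except `loss.backward()` computes the gradients automatically and `optimizer.step()` applies the update.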

PyTorch Example: Classifying the Iris Dataset

PyTorch is the dominant framework in deep-learning research. The example below trains a small network on the classic Iris dataset: 150 flowers, 4 numeric features, 3 species.

You may need to run the pip install command below to get the required libraries.

#%pip install torch scikit-learn
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# --- Load and prepare data ---
iris = load_iris()
X, y = iris.data, iris.target

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train = torch.tensor(X_train, dtype=torch.float32)
X_test  = torch.tensor(X_test,  dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test  = torch.tensor(y_test,  dtype=torch.long)

# --- Define the network ---
class IrisNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 16),   # 4 features → 16 hidden neurons
            nn.ReLU(),
            nn.Linear(16, 8),   # 16 → 8
            nn.ReLU(),
            nn.Linear(8, 3),    # 8 → 3 classes
        )

    def forward(self, x):
        return self.net(x)

model = IrisNet()

# --- Loss and optimizer ---
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# --- Training loop ---
for epoch in range(200):
    model.train()
    optimizer.zero_grad()           # clear old gradients
    outputs = model(X_train)        # forward pass
    loss = criterion(outputs, y_train)
    loss.backward()                 # backpropagation
    optimizer.step()                # update weights

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1:3d} | Loss: {loss.item():.4f}")

# --- Evaluation ---
model.eval()
with torch.no_grad():
    preds = model(X_test).argmax(dim=1)
    accuracy = (preds == y_test).float().mean()
    print(f"\nTest accuracy: {accuracy:.1%}")


Epoch  10 | Loss: 0.8406
Epoch  20 | Loss: 0.4869
Epoch  30 | Loss: 0.2794
Epoch  40 | Loss: 0.1609
Epoch  50 | Loss: 0.0929
Epoch  60 | Loss: 0.0647
Epoch  70 | Loss: 0.0547
Epoch  80 | Loss: 0.0501
Epoch  90 | Loss: 0.0472
Epoch 100 | Loss: 0.0453
Epoch 110 | Loss: 0.0438
Epoch 120 | Loss: 0.0426
Epoch 130 | Loss: 0.0415
Epoch 140 | Loss: 0.0403
Epoch 150 | Loss: 0.0390
Epoch 160 | Loss: 0.0375
Epoch 170 | Loss: 0.0359
Epoch 180 | Loss: 0.0340
Epoch 190 | Loss: 0.0321
Epoch 200 | Loss: 0.0301

Test accuracy: 100.0%

Practical Advice

Data preprocessing matters more than architecture. Normalize inputs (zero mean, unit variance) before training — neural networks are sensitive to scale.

Start simple. One or two hidden layers with 16–64 neurons usually beat a complex architecture on small tabular datasets.

Watch for overfitting. If training loss drops but validation loss rises, add dropout (nn.Dropout(0.3)) or reduce model size.
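
As a sketch, dropout slots in after each activation in the network defined earlier; the 0.3 rate is a starting point, not a tuned value:

```python
import torch
import torch.nn as nn

# Same shape as IrisNet above, with dropout after each ReLU
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Dropout(0.3),   # randomly zeroes 30% of activations during training
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(8, 3),
)

model.train()  # dropout active: different units zeroed on each forward pass
model.eval()   # dropout disabled: deterministic outputs for evaluation
```

The `model.train()` / `model.eval()` calls in the training loop above are what toggle this behavior, which is why they matter even in a script without dropout.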

Use a validation set. Never tune hyperparameters on the test set.
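
One way to set this up with scikit-learn is to split twice; the 0.2 / 0.25 fractions below are illustrative, chosen to give a 60/20/20 split on Iris:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve out the test set, then split the remainder into train/validation
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=42)

# Tune hyperparameters against (X_val, y_val);
# touch (X_test, y_test) exactly once, at the very end
```

With 150 rows this yields 90 training, 30 validation, and 30 test samples.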

Common hyperparameters to tune:

Parameter           Typical starting values
-----------------   -------------------------
Hidden layer size   16, 32, 64, 128
Learning rate       0.001 (Adam), 0.01 (SGD)
Batch size          32, 64
Epochs              100–500 for small data
Dropout rate        0.2–0.5

When to Use Neural Networks

Neural networks are not always the right tool. On small tabular datasets (< 10,000 rows), gradient-boosted trees (XGBoost, LightGBM) often win. Neural networks shine when:

  - The data is unstructured: images, audio, or free-form text
  - The dataset is large enough to support learning complex patterns
  - You can leverage transfer learning from pretrained models

Further Reading