Summary: A computational graph of nodes (neurons) connected by weighted edges, trained to approximate functions by adjusting its weights and biases.

Core abstraction

A neural network is a directed graph where:

  • Nodes (neurons) hold scalar values called activations — the output of the neuron’s activation-function, and the number that gets passed forward along outgoing edges.
  • Edges carry weights — each weight scales the activation flowing along that connection.
  • Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through a nonlinear activation-function to produce its activation.
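The per-neuron computation above is small enough to write out directly. A minimal sketch in plain Python, using the sigmoid as the example activation function (the names `neuron` and `sigmoid` are illustrative, not from any library):

```python
import math

def sigmoid(z):
    # Squashes the pre-activation z into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of incoming activations, plus bias,
    # passed through the nonlinear activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Two incoming activations, two weights, one bias -> one outgoing activation.
a = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
```

Here `z = 0.4·1.0 + (−0.2)·0.5 + 0.1 = 0.4`, so the activation is sigmoid(0.4) ≈ 0.6.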

The network as a whole is a parameterised function f(x; θ), where θ is the collection of all weights and biases. Training adjusts θ to minimise a cost-function via gradient-descent, with gradients computed by backpropagation.
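The training loop this describes can be sketched at toy scale. The model below has a single parameter, so the cost gradient can be written by hand; in a real network backpropagation supplies the same gradients for every weight and bias. All names and values here are illustrative:

```python
# Gradient-descent sketch: fit one weight w so that w * x ≈ y.
x, y = 2.0, 6.0   # a single training example (the ideal w is 3.0)
w = 0.0           # initial parameter
lr = 0.1          # learning rate

for _ in range(100):
    pred = w * x
    # Quadratic cost C = (pred - y)^2; its derivative with respect to w:
    grad = 2.0 * (pred - y) * x
    w -= lr * grad  # step downhill along the gradient
```

Each update moves w toward 3.0; the loop is the whole of gradient descent, and backpropagation is "only" the machinery for computing `grad` when there are millions of parameters instead of one.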

Single neuron

Historically, two types of artificial neuron are foundational (Nielsen, Ch. 1): the perceptron, which uses a step function (output is 0 or 1 depending on whether the weighted sum exceeds a threshold), and the sigmoid neuron, which uses the sigmoid function to produce a continuous output in (0, 1). The sigmoid neuron’s smooth, differentiable output is what makes gradient-descent and backpropagation possible — you can’t take useful gradients through a hard step. Modern networks generalise this idea with other activation functions (ReLU, GELU, etc.), but the sigmoid neuron is the historical bridge from the perceptron to trainable deep networks.
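The two neuron types differ only in the function applied to the same weighted sum. A side-by-side sketch (function names are illustrative):

```python
import math

def perceptron(inputs, weights, bias):
    # Hard step: fires (1) only if the weighted sum plus bias exceeds 0.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def sigmoid_neuron(inputs, weights, bias):
    # Smooth version: output varies continuously in (0, 1), so a small
    # change in a weight produces a small change in the output.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

perceptron([1, 1], [0.6, 0.6], -1.0)       # 1: weighted sum 1.2 clears the threshold
sigmoid_neuron([1, 1], [0.6, 0.6], -1.0)   # ≈ 0.55: a graded answer, not a binary one
```

The graded output is the whole point: gradient descent needs to know not just *whether* the neuron fired, but *how close* it was to the boundary.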

Terminology

Term          Meaning
Activation    The scalar value a neuron holds
Weight        Scalar multiplier on a connection between two neurons
Bias          Additive offset — shifts how large the weighted sum must be before the neuron activates
Layer         A group of neurons at the same depth in the graph
Hidden layer  Any layer between input and output

Architectures

Different ways of wiring neurons give rise to different architectures, each suited to different data and tasks:

  • multilayer-perceptron — Fully-connected feed-forward network. Every neuron in one layer connects to every neuron in the next. The simplest architecture and the foundation for understanding all others.
  • CNN (convolutional neural network) — Weight-sharing and local connectivity for spatial data (images). (page TBD)
  • RNN (recurrent neural network) — Connections that loop back, giving the network memory over sequences. (page TBD)
  • transformer-architecture — Attention-based architecture; no recurrence, processes sequences in parallel.
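The multilayer-perceptron at the top of that list is compact enough to sketch as a full forward pass: each layer applies the single-neuron computation to every neuron, and activations flow layer to layer. The shapes and parameter values below are arbitrary illustrations:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Fully connected: every output neuron sees every input activation.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def mlp(x, params):
    # Feed activations forward, one layer at a time.
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x

random.seed(0)
# 2 inputs -> 3 hidden neurons -> 1 output, with random weights and biases.
params = [
    ([[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)],
     [random.uniform(-1, 1) for _ in range(3)]),
    ([[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)],
     [random.uniform(-1, 1) for _ in range(1)]),
]
out = mlp([0.5, -0.2], params)  # a single output activation in (0, 1)
```

Every architecture further down the list (CNN, RNN, transformer) can be read as a constraint or rearrangement of this fully-connected wiring: shared weights, looping connections, or attention in place of fixed edges.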