Math Solver AI

Using handwritten images, we extract spatial features of expressions and recognize sequences to solve a basic mathematical expression.

Component	Purpose
CNN	Extract spatial features from image
BiLSTM	Capture sequence relationships left-to-right
CTC	Predict characters without needing aligned labels

The task breakdown includes the architecture, labelling strategy, and a plan to train it from our dataset.

Architecutre Breakdown: CRNN for Handwritten Math Expression Recognition

Input

Full expression image (e.g. 120 + 324 rendered as one horizontal image).
Size: Say $32 * 32$ grayscale. (height x varable width)

CNN Feature Extractor

We extract spatial features from the image. After this, flatten the height dimension and treat the width as a time sequence.

Sequence Model (BiLSTM)

After feature extraction, we process the feature sequence and capture temporal relationships.

Input: T x B x F (time steps, batch size, features)
Output: For each time step, we get a prediction for a character.

Linear Projection

Mapping LSTM output to our character set.

num_classes = len(vocab) + 1 (for CTC blank)
Vocab: ['0'-'9', '+', '-', '×', '÷', '='] → 14 tokens

CTC Loss

We have used CTC loss because we don't know exact position of each character in the image. CTC allows the model to align predicted character sequences with actual text labels like "$3+4$" without needing character bounding boxes.

Labelling Strategy

We need to encode text labels into integer sequences (training, test, validation labels).

{
    'add': 0,         → '+'
    'divide': 1,      → '÷'
    'eight': 2,       → '8'
    'five': 3,        → '5'
    'four': 4,        → '4'
    'multiply': 5,    → '×'
    'nine': 6,        → '9'
    'one': 7,         → '1'
    'seven': 8,       → '7'
    'six': 9,         → '6'
    'subtract': 10,   → '-'
    'three': 11,      → '3'
    'two': 12,        → '2'
    'zero': 13        → '0'
}

Character decoding map from your numeric labels to output expression

label_to_char = {
    0: '+',   1: '÷',  2: '8',  3: '5',
    4: '4',   5: '×',  6: '9',  7: '1',
    8: '7',   9: '6', 10: '-', 11: '3',
   12: '2',  13: '0'
}

Training Steps

Expression Image Generator: Generate a dataset of synthetic expression images and labels.
Build the CRNN model using PyTorch (as above).
Training loop (data loading, label encoding, CTC loss)
Inference logic to decode expression back to text. During inference, use CTC greedy decode or beam search to get best character sequence.

CRNN Output Class

For CTC, we need:

1 class for each label (14 total)
1 additional "blank" class (for CTC) → total of 15 classes

Label Sequences for CTC Training

If our expression is 3 + 5, we’ll encode it like:

label_sequence = [11, 0, 3]  # 'three', 'add', 'five'

During inference, we’ll decode back using label_to_char and remove duplicate/blank predictions from the CTC output.

Expression Generator (Pseudocode)

def create_expression_image(digit_imgs, operator_imgs):
    # Randomly pick: '4 + 7' → [img4, img+, img7]
    imgs = [get_digit(4), get_operator('+'), get_digit(7)]
    # Concatenate horizontally
    expr_img = concatenate_images(imgs)
    # Label = '4+7'
    return expr_img, "4+7"

We can add random spacing, jitter, and noise to mimic real handwriting variation.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
model		model
references		references
.gitignore		.gitignore
LICENSE		LICENSE
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Math Solver AI

Architecutre Breakdown: CRNN for Handwritten Math Expression Recognition

Input

CNN Feature Extractor

Sequence Model (BiLSTM)

Linear Projection

CTC Loss

Labelling Strategy

Training Steps

CRNN Output Class

Label Sequences for CTC Training

Expression Generator (Pseudocode)

About

Uh oh!

Releases

Packages

Languages

License

harshrajhrj/math-expression-recognition

Folders and files

Latest commit

History

Repository files navigation

Math Solver AI

Architecutre Breakdown: CRNN for Handwritten Math Expression Recognition

Input

CNN Feature Extractor

Sequence Model (BiLSTM)

Linear Projection

CTC Loss

Labelling Strategy

Training Steps

CRNN Output Class

Label Sequences for CTC Training

Expression Generator (Pseudocode)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages