A containerised version of the tools required to train or fine-tune Tesseract for a new font.
Based on: https://www.youtube.com/watch?v=TpD76k2HYms
- Clone this repo (`git clone https://github.com/artdevgame/tesseract-trainer.git`)
- Copy your selected font into the `src/fonts` directory
- Configure `docker-compose.yml` with your preferences (see below)
- Download and install Docker for your OS (https://www.docker.com/products/docker-desktop)
- From the project root directory, run `docker-compose up`
- After the process has finished, you will have a `final.traineddata` in the `src/output` directory. Use this in your Tesseract project
Change the following environment values in docker-compose.yml:
| Property | Example | Description |
|---|---|---|
| TESSTRAIN_FONT | Agency FB Condensed | The name of the font (not the filename) |
| TESSTRAIN_LANG | eng | The language of the training data |
| TESSTRAIN_MAX_PAGES | 10 | Size of the training text, as a maximum number of pages |
| TESSTRAIN_MAX_ITERATIONS | 400 | Number of training iterations for the neural network. More iterations generally give a better result, but too many can lead to overfitting |
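
As a rough illustration of the table above, the environment values might look something like the fragment below. This is a sketch, not a copy of the repository's file: it assumes the variables live in an `environment` map under the existing service definition, and it reuses the example values from the table. Keep the service name, image, and volumes that the shipped `docker-compose.yml` already defines.

```yaml
# Fragment only: place under the existing service in docker-compose.yml.
# The map layout is an assumption; the repository's file may use the
# list form (- KEY=value) instead, which works equally well.
environment:
  TESSTRAIN_FONT: "Agency FB Condensed"  # font name as reported by the font, not the filename
  TESSTRAIN_LANG: "eng"                  # language of the training data
  TESSTRAIN_MAX_PAGES: "10"              # training text size
  TESSTRAIN_MAX_ITERATIONS: "400"        # raise cautiously; too many may overfit
```

Values are quoted so that Compose passes them to the container as plain strings.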