I think it would be nice to break the code up and organize it better into separate files and packages. This would make code modifications clearer, since each change would touch a smaller, more focused file. It may also be worth renaming some things. Maybe the project itself should be called "corpus search", or something else, instead.