Version 3.9
NOTE/IMPORTANT: some PRs in this version are not drop-in compatible with the previous version because their parameters have changed (supporting multiple classes for sequence tagging required the class annotation type parameter to be a list instead of a single value). Models may also be incompatible: a model trained with a previous version may not work with this version, because there have been changes to which features get generated and how they are named.
CHANGES (in reverse chronological order):
- Fix issue #32
- Fix exception handling when the trainer class cannot be instantiated
- Fix bug: incorrect trainer class for MALLET_CL_NAIVE_BAYES
- Better handling of escaping/cleaning strings when exporting to tsv/csv format
- Better handling of missing values for number-coded nominal features when exporting to ARFF
- Properly escape empty strings when exporting to ARFF; treat null like an empty string
- More runtime parameters are optional
- Add JSON exporter
- Implement START/STOP symbols
- In many places, move from assuming files to using URLs
- Major refactoring: the Engine instance now knows which CorpusRepresentation to use and returns it
- Add support for using previous target(s) as a feature
- Make classification using a sequence algorithm work
- Implement PRs for word shape features and affix features generation
- Implement multi-class sequence tagging
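
For readers unfamiliar with the word shape and affix features mentioned above, the following is a minimal sketch of what such features commonly look like. It is not the code of the PRs themselves: the function names, the A/a/9 symbol mapping, the `collapse` option and the `prefixN`/`suffixN` feature names are all assumptions made for illustration, and the actual PRs may name and compute their features differently.

```python
def word_shape(token: str, collapse: bool = False) -> str:
    """Map each character to a class symbol (hypothetical mapping):
    uppercase -> 'A', lowercase -> 'a', digit -> '9', anything else kept as-is.
    With collapse=True, runs of the same symbol are reduced to one symbol."""
    shape = []
    for ch in token:
        if ch.isupper():
            sym = "A"
        elif ch.islower():
            sym = "a"
        elif ch.isdigit():
            sym = "9"
        else:
            sym = ch
        # Skip the symbol if it repeats the previous one and collapsing is on.
        if collapse and shape and shape[-1] == sym:
            continue
        shape.append(sym)
    return "".join(shape)


def affixes(token: str, max_len: int = 3) -> dict:
    """Return prefixes and suffixes of length 1..max_len.
    Affixes as long as the whole token are skipped (an assumption of this sketch)."""
    feats = {}
    for n in range(1, max_len + 1):
        if n >= len(token):
            break
        feats[f"prefix{n}"] = token[:n]
        feats[f"suffix{n}"] = token[-n:]
    return feats


if __name__ == "__main__":
    print(word_shape("McDonald"))
    print(word_shape("McDonald", collapse=True))
    print(affixes("tagging", max_len=2))
```

Collapsing repeated symbols trades detail for generality: it lets tokens of different lengths share one shape value, which may help when training data is sparse.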