This issue was reported against my team's .NET port of your library, but I have confirmed that it occurs here as well.
The example in the README says the N-Gram code should produce 0.416666 and 0.97222. However, the code gives different results when it is actually run. I am not sure whether this is a bug in the code or whether the README comment is outdated or incorrect.
I created a unit test for the README example, and sure enough it fails:
@Test
public void exampleFromReadme() {
    // produces 0.416666
    NGram twogram = new NGram(2);
    assertEquals(0.416666, twogram.distance("ABCD", "ABTUIO"), 0.001);

    // produces 0.97222
    String s1 = "Adobe CreativeSuite 5 Master Collection from cheap 4zp";
    String s2 = "Adobe CreativeSuite 5 Master Collection from cheap d1x";
    NGram ngram = new NGram(4);
    assertEquals(0.97222, ngram.distance(s1, s2), 0.001);
}
Results:
java.lang.AssertionError:
Expected :0.416666
Actual :0.5833333134651184
This result (0.583) is the same as what we get on the .NET side. As I am not an expert in these algorithms, I am unsure whether this is a bug in the code or whether the README needs to be updated.
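One observation that may help triage: the expected and actual values for the first example add up to almost exactly 1. The sketch below only checks that arithmetic, using the two numbers from the failure output above. It does not call the library, and the class and method names are made up for illustration. It does not show which value is correct. It only shows that the actual result matches `1 - expected` within the test's tolerance, which would be consistent with (but does not prove) a distance/similarity mix-up somewhere.

```java
public class ReadmeValueCheck {
    // Values copied from the failing assertion in this report.
    static final double EXPECTED = 0.416666;
    static final double ACTUAL = 0.5833333134651184;
    static final double DELTA = 0.001;

    // Same tolerance rule the test relies on: values match if they differ by at most delta.
    static boolean withinDelta(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        // The README value itself is not within tolerance of the actual result.
        System.out.println(withinDelta(EXPECTED, ACTUAL, DELTA));       // false
        // The complement of the README value is within tolerance.
        System.out.println(withinDelta(1.0 - EXPECTED, ACTUAL, DELTA)); // true
    }
}
```

The failure output above only includes the first assertion, so I cannot say whether the second example (0.97222) shows the same pattern.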