Class StringSimilarity


  • @Immutable
    public class StringSimilarity
    extends java.lang.Object
    The Jaro-Winkler distance metric is designed for, and best suited to, short strings such as person names, and for detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of two close characters is considered less important than the substitution of two characters that are far from each other. Jaro-Winkler was developed in the area of record linkage (duplicate detection) (Winkler, 1990). It returns a value in the interval [0.0, 1.0]. The distance is computed as 1 - Jaro-Winkler similarity.
    • Constructor Summary

      Constructors
      Constructor                         Description
      StringSimilarity()                  Instantiate with default threshold (0.7).
      StringSimilarity(double threshold)  Instantiate with given threshold to determine when Winkler bonus should be used.
    • Method Summary

      Modifier and Type  Method                                                Description
      double             distance(java.lang.String s1, java.lang.String s2)    Return 1 - similarity.
      double             getThreshold()                                        Returns the current value of the threshold used for adding the Winkler bonus.
      double             similarity(java.lang.String s1, java.lang.String s2)  Compute Jaro-Winkler similarity.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StringSimilarity

        public StringSimilarity()
        Instantiate with default threshold (0.7).
      • StringSimilarity

        public StringSimilarity(double threshold)
        Instantiate with given threshold to determine when Winkler bonus should be used. Set threshold to a negative value to get the Jaro distance.
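
How a threshold might gate the Winkler bonus can be sketched as follows. This is a minimal illustration, not this class's source: the class name WinklerBonusSketch, the method withBonus, the prefix scale of 0.1, the prefix cap of 4, and the rule that the bonus applies only when the Jaro score exceeds the threshold are all assumptions taken from Winkler's usual formulation.

```java
class WinklerBonusSketch {
    // Assumed constants; the documented class may use different values.
    static final double PREFIX_SCALE = 0.1;
    static final int MAX_PREFIX = 4;

    /**
     * Hypothetical helper: adds the Winkler prefix bonus to a Jaro score,
     * but only when that score is above the threshold (assumed rule).
     */
    static double withBonus(double jaro, int commonPrefix, double threshold) {
        if (jaro <= threshold) {
            return jaro; // score too low: no bonus
        }
        int prefix = Math.min(commonPrefix, MAX_PREFIX);
        return jaro + prefix * PREFIX_SCALE * (1.0 - jaro);
    }

    public static void main(String[] args) {
        System.out.println(withBonus(0.60, 2, 0.7));  // below 0.7: returned unchanged
        System.out.println(withBonus(0.90, 2, 0.7));  // above 0.7: bonus added
        System.out.println(withBonus(0.90, 2, 0.95)); // stricter threshold: unchanged
    }
}
```

Under these assumptions, raising the threshold restricts the bonus to pairs that are already very similar. Check the class's actual behavior before relying on a particular threshold value.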
    • Method Detail

      • getThreshold

        public final double getThreshold()
        Returns the current value of the threshold used for adding the Winkler bonus. The default value is 0.7.
        Returns:
        the current value of the threshold
      • similarity

        public final double similarity(java.lang.String s1,
                                       java.lang.String s2)
        Compute Jaro-Winkler similarity.
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        The Jaro-Winkler similarity, in the range [0.0, 1.0].
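
The computation behind a Jaro-Winkler similarity can be sketched in a self-contained form. The code below is an illustration under stated assumptions, not this class's implementation: the class JaroWinklerSketch and its method names are hypothetical, the threshold is passed as a parameter instead of being stored, and the prefix scale (0.1), prefix cap (4), and matching window follow the commonly published formulation.

```java
class JaroWinklerSketch {
    // Assumed constants from the usual formulation; the documented class may differ.
    static final double PREFIX_SCALE = 0.1;
    static final int MAX_PREFIX = 4;

    /** Plain Jaro similarity in [0.0, 1.0]. */
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;
        // Two equal characters match only if at most `window` positions apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both match sequences in order; count positions that disagree.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // each transposition is counted twice above
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    /** Jaro score plus a common-prefix bonus when the score exceeds the threshold. */
    static double similarity(String s1, String s2, double threshold) {
        double j = jaro(s1, s2);
        if (j <= threshold) return j;
        int limit = Math.min(MAX_PREFIX, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * PREFIX_SCALE * (1.0 - j);
    }

    /** Distance is the complement of the similarity. */
    static double distance(String s1, String s2, double threshold) {
        return 1.0 - similarity(s1, s2, threshold);
    }

    public static void main(String[] args) {
        // Classic textbook pair: Jaro is 17/18, and the shared prefix "MAR" raises it further.
        System.out.println(jaro("MARTHA", "MARHTA"));
        System.out.println(similarity("MARTHA", "MARHTA", 0.7));
        System.out.println(distance("MARTHA", "MARHTA", 0.7));
    }
}
```

For "MARTHA" and "MARHTA" all six characters match with one transposition, so the Jaro score is (1 + 1 + 5/6) / 3, roughly 0.944. With a three-character common prefix, the sketch's bonus lifts the similarity to roughly 0.961.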
      • distance

        public final double distance(java.lang.String s1,
                                     java.lang.String s2)
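        Return the Jaro-Winkler distance, computed as 1 - similarity(s1, s2).
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        The Jaro-Winkler distance, 1 - similarity(s1, s2), in the range [0.0, 1.0].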