Excel VBA: Jaro Winkler Distance (Fuzzy Matching)

Here’s a very nice implementation of the Jaro-Winkler algorithm in Excel VBA.

I use it for fuzzy matching, clustering data and looking for duplicates.

The code is modified from Jay Tracewell’s post here. I simplified the code by taking out the clean_string function and renaming the function to JW.

You can read more about the Jaro-Winkler Distance on Wikipedia.

Example:

JWtable

Here comes the code: (Since it’s a function, remember to insert it in a module)

Function JW(ByVal str1 As String, ByVal str2 As String) As Double
Dim l1, l2, lmin, lmax, m, i, j As Integer
Dim common As Integer
Dim tr As Double
Dim a1, a2 As String
l1 = Len(str1)
l2 = Len(str2)
If l1 > l2 Then
    aux = l2
    l2 = l1
    l1 = aux
    auxstr = str1
    str1 = str2
    str2 = auxstr
End If
lmin = l1
lmax = l2
Dim f1(), f2() As Boolean
ReDim f1(l1), f2(l2)
For i = 1 To l1
    f1(i) = False
Next i
For j = 1 To l2
    f2(j) = False
Next j
m = Int((lmax / 2) - 1)
common = 0
tr = 0
For i = 1 To l1
    a1 = Mid(str1, i, 1)
    If m >= i Then
        f = 1
        L = i + m
    Else
        f = i - m
        L = i + m
    End If
    If L > lmax Then
        L = lmax
    End If
    For j = f To L
        a2 = Mid(str2, j, 1)
        If (a2 = a1) And (f2(j) = False) Then
            common = common + 1
            f1(i) = True
            f2(j) = True
            GoTo linea_exit
        End If
    Next j
linea_exit:
Next i
Dim wcd, wrd, wtr As Double
L = 1
For i = 1 To l1
    If f1(i) Then
        For j = L To l2
            If f2(j) Then
                L = j + 1
                a1 = Mid(str1, i, 1)
                a2 = Mid(str2, j, 1)
            If a1 <> a2 Then
                tr = tr + 0.5
            End If
            Exit For
            End If
        Next j
    End If
Next i
wcd = 1 / 3
wrd = 1 / 3
wtr = 1 / 3
If common <> 0 Then
    JW = wcd * common / l1 + wrd * common / l2 + wtr * (common - tr) / common
Else
    JW = 0
End If
End Function




3 thoughts on “Excel VBA: Jaro Winkler Distance (Fuzzy Matching)

  1. Anna

    The code posted above calculates the Jaro distance and not the Jaro Winkler distance. Add in these lines of code between lines 76 and 77 above, and you’ll get the Jaro Winkler distance (I have found a few slight discrepancies when I compare to the JW function in R’s RecordLinkage package, but only when the strings have not been cleansed and special symbols/numbers come into play). Hope this helps:

    myl = 0 ‘myl is used in JW equation
    For i = 1 To 4 ‘only look at first 4 characters in string per definition
    If Left(str1, i) = Left(str2, i) Then
    myl = i
    Else
    Exit For
    End If
    Next i

    JW = JW + (myl * 0.1 * (1 – JW)) ‘ 0.1 is the standard p weight

    Reply
    1. admin Post author

      Hi Anna.

      Thank you very much for your comment. I am grateful, you took the time to comment :-)
      As of now, I haven’t had much time to study the algorithm, but I will definitely take a look at the ‘JW distance’ in the recordlinkage package! I love R and I try to pull it out every chance, I get! :-)

      Right now, I am mostly using R for geocoding data and mapping it on Google Maps. Other than that, I do some text mining and machine learning.

      Have a nice day.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *