Here’s a very nice implementation of the Jaro-Winkler algorithm in Excel VBA.

I use it for **fuzzy matching, clustering data and looking for duplicates.**

The code is modified from Jay Tracewell’s post here. I simplified the code by taking out the clean_string function and renaming the function to JW.

You can read more about the Jaro-Winkler Distance on Wikipedia.

Example:

Here comes the code: (Since it’s a function, remember to insert it in a module)

Function JW(ByVal str1 As String, ByVal str2 As String) As Double Dim l1, l2, lmin, lmax, m, i, j As Integer Dim common As Integer Dim tr As Double Dim a1, a2 As String l1 = Len(str1) l2 = Len(str2) If l1 > l2 Then aux = l2 l2 = l1 l1 = aux auxstr = str1 str1 = str2 str2 = auxstr End If lmin = l1 lmax = l2 Dim f1(), f2() As Boolean ReDim f1(l1), f2(l2) For i = 1 To l1 f1(i) = False Next i For j = 1 To l2 f2(j) = False Next j m = Int((lmax / 2) - 1) common = 0 tr = 0 For i = 1 To l1 a1 = Mid(str1, i, 1) If m >= i Then f = 1 L = i + m Else f = i - m L = i + m End If If L > lmax Then L = lmax End If For j = f To L a2 = Mid(str2, j, 1) If (a2 = a1) And (f2(j) = False) Then common = common + 1 f1(i) = True f2(j) = True GoTo linea_exit End If Next j linea_exit: Next i Dim wcd, wrd, wtr As Double L = 1 For i = 1 To l1 If f1(i) Then For j = L To l2 If f2(j) Then L = j + 1 a1 = Mid(str1, i, 1) a2 = Mid(str2, j, 1) If a1 <> a2 Then tr = tr + 0.5 End If Exit For End If Next j End If Next i wcd = 1 / 3 wrd = 1 / 3 wtr = 1 / 3 If common <> 0 Then JW = wcd * common / l1 + wrd * common / l2 + wtr * (common - tr) / common Else JW = 0 End If End Function

AnnaThe code posted above calculates the Jaro distance and not the Jaro Winkler distance. Add in these lines of code between lines 76 and 77 above, and you’ll get the Jaro Winkler distance (I have found a few slight discrepancies when I compare to the JW function in R’s RecordLinkage package, but only when the strings have not been cleansed and special symbols/numbers come into play). Hope this helps:

myl = 0 ‘myl is used in JW equation

For i = 1 To 4 ‘only look at first 4 characters in string per definition

If Left(str1, i) = Left(str2, i) Then

myl = i

Else

Exit For

End If

Next i

JW = JW + (myl * 0.1 * (1 – JW)) ‘ 0.1 is the standard p weight

adminPost authorHi Anna.

Thank you very much for your comment. I am grateful, you took the time to comment :-)

As of now, I haven’t had much time to study the algorithm, but I will definitely take a look at the ‘JW distance’ in the recordlinkage package! I love R and I try to pull it out every chance, I get! :-)

Right now, I am mostly using R for geocoding data and mapping it on Google Maps. Other than that, I do some text mining and machine learning.

Have a nice day.

OpHi,

It’s showing syntex error at JW = JW + (myl * 0.1 * (1 – JW)) ‘0.1 is the standard p weight

Please help me out.

Thanks

investeringsportföljHi! I would like to supply a huge thumb up with the wonderful info you will have here concerning this write-up. I’ll be returning your web blog site for much more shortly.

Jeewan GargI Think it is going to be very effective. But i am not sure how to implement it. If you can guide us how to implement it (step by step) , lot of people can get benefit from this.

Alberthi all,

I’ve copied the whole vba code into the file..my only problem is what is the function for this?

is it =JW(?.?)

So sorry, im still a newbie

thank you

adminPost authorHi Albert. Yes, you use the function like this: =JW(stringA, stringB). BR Herbert

GouravI want to use this in a way where I specify a string as the first argument, a table array as the second argument and it returns the value in the cell from the table array where the score is the highest.

Is it possible to do this? Please guide.

Alternatively, is there any other way to use this for a fuzzy lookup?