ColdFusion Levenshtein Distance: String comparison and highlighting

This is a fun project I put out there a while back. I recently went through and optimized the performance a bit so I could officially blog it. It is an implementation of the Levenshtein Distance Algorithm in CFScript that I based on a C# version written by Siderite Zackwehdex. Finding the "distance" between two strings is a way of measuring how similar they are. This can be done by finding the Longest Common String, or LCS. It is as much a brain bender as it is occasionally useful.

The basic gist of the concept is this: iterate over two strings, making a note of how many characters were inserted, deleted, or transposed from one string to the other. When a difference is found, "bookmark" where you are and start looking ahead in each string to see if the strings are going to start matching up again down the line. How far down the line you look is controlled by an input called maxOffset. The LCS is the number of characters that were identical between the two strings. The "distance" of the two strings is simply the average string length minus the LCS. The similarity of the two strings can be expressed as a percentage: 1 minus the strings' distance divided by the length of the longest string. Enough theory. Let's look at an example:

I love to go ride my go-kart. (29 chars)
I love to ride my go-cart outside. (34 chars)

  • There is a deletion of the word "go "
  • There is a substitution of "c" for "k" in go-cart
  • There is an addition of the word "outside"

Overall, the average length of the strings is 31.5 characters.
There are 25 characters in the two strings that are identical. This is our LCS.
String 1 has 4 characters that differ from string 2, and string 2 has 9 characters that differ from string 1.
So on average, we will say 6.5 characters would need to be changed to make the strings identical. This is our distance. ((4 + 9) / 2 = 6.5)
Divide that distance by the length of our longest string, subtract the result from 1, and we come to the conclusion that the strings are an 81% match. (1 - (6.5 / 34) = .8088 = 81%)
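
The arithmetic above can be sketched in a few lines of CFScript. This is only a minimal sketch: the helper name similarityFromLcs is made up for illustration and is not part of the function in this post, and it assumes the LCS has already been counted and that at least one string is non-empty.

```cfml
<cfscript>
    // Hypothetical helper (not part of stringSimilarity): derives the distance
    // and similarity from the two string lengths and an already-counted LCS.
    // Assumes at least one length is greater than zero.
    function similarityFromLcs(len1, len2, lcs)
    {
        var result = structNew();
        // Distance is the average string length minus the LCS
        result.distance = (len1 + len2) / 2 - lcs;
        // Similarity is 1 minus the distance divided by the longest length
        result.similarity = 1 - (result.distance / max(len1, len2));
        return result;
    }

    // Numbers from the go-kart example: lengths 29 and 34, LCS of 25
    example = similarityFromLcs(29, 34, 25);
    writeOutput("Distance: " & example.distance & "<br>");
    writeOutput("Match: " & numberFormat(example.similarity * 100) & "%");
</cfscript>
```

With the go-kart numbers this works out to the same distance of 6.5 and the same 81% match as the hand calculation.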

Here's another example:

String 1:
The rain in Spain stays mainly on the plains
The rain in Spain stays mainly on the plains
The rain in Spain stays mainly on the plains
Lorum Ipsum, yadda yadda.
Lorum Ipsum, yadda yadda.
La La La La Luke, I am your father.

String 2:
The rain in Madrid stays totally on the plains
The rain in Spain stays mainly on the plains
The rain in Barcelona stays entirely in the air
Lorum Ipsum, Yabba dabba doo.
Whatcha eatin? Nutin' Honey.
Da Da Da Duke, I am your father.

  • Roughly 67 characters are different between the two strings.
  • The Longest Common String (LCS) is 180.
  • The strings are a 73% match.

In addition to porting the logic over to CFScript, there are a few things I added to my function:

  • If both strings are empty, I short circuit and return a 0 for distance and LCS. Similarity is 100%.
  • If either string is empty, I short circuit and return the length of the non-empty string as the distance. The LCS and similarity are 0.
  • To better detect differences at the start of the string, I check the first three characters. That way a matching first letter isn't mistaken for a match in two strings like "top hat" and "the hat".
  • When looking ahead in the strings to reconcile a difference, I search up to maxOffset characters ahead until I find THREE contiguous matching characters. This is to try to eliminate false positives.
  • My function will highlight the differences between the strings by wrapping the deviations in the HTML tag of your choice. Default is <span style="background: yellow;"></span>
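
The first-three-characters check can be illustrated in isolation. This is a rough sketch using only the built-in mid() and compare() functions; the variable names are for illustration only and do not appear in the function itself.

```cfml
<cfscript>
    a = "top hat";
    b = "the hat";

    // Comparing only the first character reports a match ("t" vs "t")
    firstCharMatches = (compare(mid(a, 1, 1), mid(b, 1, 1)) eq 0);

    // Comparing the first three characters does not ("top" vs "the")
    firstThreeMatch = (compare(mid(a, 1, 3), mid(b, 1, 3)) eq 0);

    writeOutput("First character matches: " & yesNoFormat(firstCharMatches) & "<br>");
    writeOutput("First three characters match: " & yesNoFormat(firstThreeMatch));
</cfscript>
```

The single-character comparison says the strings start out the same, while the three-character comparison catches the difference right away.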

The function is fairly effective at finding basic insertions, deletions, and transpositions between two strings ranging from a few words to a few paragraphs. Since the algorithm iterates through two strings without looking back, it WON'T find sentences that had their order rearranged. The maxOffset controls how hard the code tries to look ahead and reconcile an insertion or deletion. If you expect entire sentences to be inserted, your maxOffset needs to be at least as big as the largest insertion. Of course, the larger the offset you allow, the more performance will be impacted and the more likely you are to get a false positive when looking ahead.
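
One way to get a feel for that trade-off is to run the same pair of strings through the function with several maxOffset values and compare the results. The sketch below assumes the stringSimilarity function from this post is already defined on the page; the sample strings and the offsets tried are arbitrary choices, not recommendations.

```cfml
<cfscript>
    // Assumes stringSimilarity() (listed later in this post) is already defined.
    original = "I love to go ride my go-kart.";
    revised  = "I love to ride my go-cart outside.";

    // Arbitrary offsets to try; larger values look further ahead
    offsets = listToArray("2,5,10,25");

    for (n = 1; n lte arrayLen(offsets); n = n + 1)
    {
        outcome = stringSimilarity(original, revised, offsets[n]);
        writeOutput("maxOffset " & offsets[n] & ": "
            & numberFormat(outcome.similarity * 100) & "% match, distance "
            & outcome.distance & "<br>");
    }
</cfscript>
```

If the numbers stop changing as the offset grows, a bigger maxOffset is probably only costing you time.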

Have fun with it. If you can think of a way to improve the code I would love to hear about it. I have included the code for the function, an example way to call it, and a zipped version of both below.

<cfscript>

    /*

        StringSimilarity
        Brad Wood
        brad@bradwood.com
        May 2007
        Code adapted from Siderite Zackwehdex's Blog
            http://siderite.blogspot.com/2007/04/super-fast-and-accurate-string-distance.html

        Parameters:
            s1:           First string to be compared
            s2:           Second string to be compared
            maxOffset:    Average number of characters that s1 will deviate from s2 at any given point.
                          This is used to control how far ahead the function looks to try and find the
                          end of a piece of inserted text. Play with it to suit.

    */

    function stringSimilarity(s1,s2,maxOffset)
    {
        var c = 0;
        var offset1 = 0;
        var offset2 = 0;
        var lcs = 0;
        // These two strings will contain the "highlighted" version
        var _s1 = createObject("java","java.lang.StringBuffer").init(javacast("int",len(s1)*3));
        var _s2 = createObject("java","java.lang.StringBuffer").init(javacast("int",len(s2)*3));
        // These characters will surround differences in the strings
        // (Inserted into _s1 and _s2)
        var h1 = "<span style=""background: yellow;"">";
        var h2 = "</span>";
        var return_struct = structNew();
        // Working variables, var-scoped so they stay local to the function
        var i = 0;
        var next_s1 = "";
        var next_s2 = "";
        var len_next_s1 = 0;
        var len_next_s2 = 0;
        var bookmarked_s1 = "";
        var bookmarked_s2 = "";
        var old_offset1 = 0;
        var old_offset2 = 0;
        var _s1_deviation = "";
        var _s2_deviation = "";
        var added_offset1 = 0;
        var added_offset2 = 0;
        var distance = 0;
        var maxLen = 0;
        var similarity = 0;

        // If both strings are empty
        if (not len(trim(s1)) and not len(trim(s2)))
        {
            return_struct.lcs = 0;
            return_struct.similarity = 1;
            return_struct.distance = 0;
            return_struct.s1 = "";
            return_struct.s2 = "";
            return return_struct;
        }

        // If s2 is empty, but s1 isn't
        if (len(trim(s1)) and not len(trim(s2)))
        {
            return_struct.lcs = 0;
            return_struct.similarity = 0;
            return_struct.distance = len(s1);
            return_struct.s1 = h1 & s1 & h2;
            return_struct.s2 = "";
            return return_struct;
        }
        // If s1 is empty, but s2 isn't
        else if (len(trim(s2)) and not len(trim(s1)))
        {
            return_struct.lcs = 0;
            return_struct.similarity = 0;
            return_struct.distance = len(s2);
            return_struct.s1 = "";
            return_struct.s2 = h1 & s2 & h2;
            return return_struct;
        }

        // Examine the strings, one character at a time, ending at the shortest string
        // The offset adjusts for extra characters in either string.
        while ((c + offset1 lt len(s1))
            and (c + offset2 lt len(s2)))
        {
            // Pull the next characters out of s1 and s2
            next_s1 = mid(s1,c + offset1+1,iif(not c,3,1)); // First time through check the first three
            next_s2 = mid(s2,c + offset2+1,iif(not c,3,1)); // First time through check the first three

            // If they are equal
            if (compare(next_s1,next_s2) eq 0)
            {
                // Our Longest Common String just got one bigger
                lcs = lcs + 1;
                // Append the characters onto the "highlighted" version
                _s1.append(left(next_s1,1));
                _s2.append(left(next_s2,1));
            }
            else
            {
                // The next two characters did not match
                // Now we will go into a sub-loop while we attempt to
                // find our place again. We will only search as long as
                // our maxOffset allows us to.

                // Don't reset the offsets, just back them up so you
                // have a point of reference
                old_offset1 = offset1;
                old_offset2 = offset2;
                _s1_deviation = "";
                _s2_deviation = "";

                // Loop for as long as allowed by our offset
                // to see if we can match up again
                for (i = 0; i lt maxOffset; i=i+1)
                {
                    next_s1 = mid(s1,c + offset1 + i+1,3); // Increments each time through.
                    len_next_s1 = len(next_s1);
                    bookmarked_s1 = mid(s1,c + offset1+1,3); // stays the same
                    next_s2 = mid(s2,c + offset2 + i+1,3); // Increments each time through.
                    len_next_s2 = len(next_s2);
                    bookmarked_s2 = mid(s2,c + offset2+1,3); // stays the same

                    // If we reached the end of both of the strings
                    if(not len_next_s1 and not len_next_s2)
                    {
                        // Quit
                        break;
                    }

                    // These variables keep track of how far we have deviated in the
                    // string while trying to find our match again.
                    _s1_deviation = _s1_deviation & left(next_s1,1);
                    _s2_deviation = _s2_deviation & left(next_s2,1);

                    // It looks like s1 has a match down the line which fits
                    // where we left off in s2
                    if (compare(next_s1,bookmarked_s2) eq 0)
                    {
                        // s1 is now offset THIS far from s2
                        offset1 = offset1+i;
                        // Our Longest Common String just got bigger
                        lcs = lcs + 1;
                        // Now that we match again, break to the main loop
                        break;
                    }

                    // It looks like s2 has a match down the line which fits
                    // where we left off in s1
                    if (compare(next_s2,bookmarked_s1) eq 0)
                    {
                        // s2 is now offset THIS far from s1
                        offset2 = offset2+i;
                        // Our Longest Common String just got bigger
                        lcs = lcs + 1;
                        // Now that we match again, break to the main loop
                        break;
                    }
                }

                // This is the number of inserted characters that were found
                added_offset1 = offset1 - old_offset1;
                added_offset2 = offset2 - old_offset2;

                // We reached our maxOffset and couldn't match up the strings
                if(added_offset1 eq 0 and added_offset2 eq 0)
                {
                    _s1.append(h1 & left(_s1_deviation,added_offset1+1) & h2);
                    _s2.append(h1 & left(_s2_deviation,added_offset2+1) & h2);
                }
                // s2 had extra characters
                else if(added_offset1 eq 0 and added_offset2 gt 0)
                {
                    _s1.append(left(_s1_deviation,1));
                    _s2.append(h1 & left(_s2_deviation,added_offset2) & h2 & right(_s2_deviation,1));
                }
                // s1 had extra characters
                else if(added_offset1 gt 0 and added_offset2 eq 0)
                {
                    _s1.append(h1 & left(_s1_deviation,added_offset1) & h2 & right(_s1_deviation,1));
                    _s2.append(left(_s2_deviation,1));
                }
            }
            c=c+1;
        }

        // Anything left at the end of s1 is extra
        if(c + offset1 lt len(s1))
        {
            _s1.append(h1 & right(s1,len(s1)-(c + offset1)) & h2);
        }
        // Anything left at the end of s2 is extra
        if(c + offset2 lt len(s2))
        {
            _s2.append(h1 & right(s2,len(s2)-(c + offset2)) & h2);
        }

        // Distance is the average string length minus the Longest Common String
        distance = (len(s1) + len(s2))/2 - lcs;
        // Which string was longest?
        maxLen = iif(len(s1) gt len(s2),de(len(s1)),de(len(s2)));
        // Similarity is 1 minus the distance divided by the max length
        similarity = iif(maxLen eq 0,1,1-(distance/maxLen));

        // Return what we found.
        return_struct.lcs = lcs;
        return_struct.similarity = similarity;
        return_struct.distance = distance;
        return_struct.s1 = _s1.toString(); // "highlighted" version
        return_struct.s2 = _s2.toString(); // "highlighted" version
        return return_struct;
    }

</cfscript>

<cfset string1 = "The rain in Spain stays mainly on the plains
                 The rain in Spain stays mainly on the plains
                 The rain in Spain stays mainly on the plains
                 Lorum Ipsum, yadda yadda.
                 Lorum Ipsum, yadda yadda.
                 La La La La Luke, I am your father.">

<cfset string2 = "The rain in Madrid stays totally on the plains
                 The rain in Spain stays mainly on the plains
                 The rain in Barcelona stays entirely in the air
                 Lorum Ipsum, Yabba dabba doo.
                 Whatcha eatin? Nutin' Honey.
                 Da Da Da Duke, I am your father.">

<cfset comparison_result = stringSimilarity(string1,string2,10)>

<cfoutput>
Roughly #comparison_result.distance# characters are different between the two strings.<br>
The strings are a #numberformat(comparison_result.similarity*100)#% match.<br>
The Longest Common String is #comparison_result.lcs#.<br>
<br>
<table border="1" cellpadding="10" cellspacing="0">
    <tr>
        <td>
            #replacenocase(comparison_result.s1,chr(10),"<br>","all")#
        </td>
        <td>
            #replacenocase(comparison_result.s2,chr(10),"<br>","all")#
        </td>
    </tr>
</table>
</cfoutput>

Comments
Excellent work!

I was about to add a feature to a project that needs to tell the user that two pieces of text aren't different enough (for SEO purposes) and this looks like it'll work a treat.

Many thanks.
# Posted By Adrian Lynch | 8/1/08 2:45 PM
Glad it's useful to you!
# Posted By Brad Wood | 8/1/08 9:27 PM
Thanks, this worked perfectly and was very useful.
# Posted By Micah | 9/9/08 12:32 AM
You know there is a Java implementation of the Levenshtein algorithm under StringUtils in the Jakarta Commons Lang package that you ought to be able to access directly from ColdFusion? That being said, I'm very interested in how you did your highlighting, and almost wish I was using ColdFusion instead of JSP so I could just use yours ;)
# Posted By George Jempty | 11/5/08 11:19 PM
Legend! You just saved me a monkey load of work.
I need to correct an error in some software that has led to a database being out of touch with another reference database - don't ask about the bad DB design.

If this works, I'll owe you a cupcake!
# Posted By Topper | 12/16/08 4:39 AM
@Topper: I hope it helps you. I like chocolate :)

Seriously though, if you need to compare two databases, Redgate software makes some very kick butt tools for that. They will show you line by line comparisons of stored procs and stuff, but when it comes to the differences in data I think it just tells you they aren't the same.
# Posted By Brad Wood | 12/16/08 3:23 PM
Very nice work, Brad. I was all set to dive into this when I found your blog post. Thanks much!
# Posted By Hal Helms | 1/8/09 10:27 AM
Hey, after a year I actually find this post :) Thanks for linking my blog and for using my algorithm! Your explanations are so much cooler and visually nice than mine. Good job!
# Posted By Siderite | 5/7/09 2:22 PM
Hey,
Nice script! ...but you need to var the return_struct too.
/ Andy
www.cfwheels.org
# Posted By Andy Bellenie | 3/29/10 6:46 AM
Good call, Andy.

I updated the post and the download.
# Posted By Brad Wood | 3/30/10 1:00 AM
Hey guys, I know I'm late to the game here, but I wanted to extend my thanks. I was looking for a ColdFusion implementation of string comparison highlighting, and Brad's solution works great! I did add a couple of function arguments for the highlighting markup used, but other than that the original function was left untouched.
# Posted By Kerr | 12/13/10 8:44 AM
Hi,
Does anyone know how to compare two strings and then return the diff?
For example:
s1 = 'GIT-04 (INT) AMT DR'
s2 = 'GIT-04 (INT)'
It would return 'AMT DR'.
I spent hours using JavaScript and ColdFusion, but had no luck.
Any advice, anyone?
# Posted By HP | 3/10/11 8:36 PM
Great post. One comment:
distance = (len(s1) + len(s2))/2 - lcs;
// Which string was longest?
maxLen = iif(len(s1) gt len(s2),de(len(s1)),de(len(s2)));
// Similarity is 1 minus the distance divided by the max length
similarity = iif(maxLen eq 0,1,1-(distance/maxLen));
Should/could be
var distance = (len(s1) + len(s2))/2 - lcs;
// Which string was longest?
var maxLen = iif(len(s1) gt len(s2),de(len(s1)),de(len(s2)));
// Similarity is 1 minus the distance divided by the max length
var similarity = iif(maxLen eq 0,1,1-(distance/maxLen));

You might also be interested in http://cfdiff.googlecode.com/ . Keep up the good work.
# Posted By Nebu | 4/20/11 7:19 AM
The script is great for comparing char by char. Is there a script to highlight the whole word if one char is different?
# Posted By Aiming Xu | 7/27/11 2:58 PM
I needed this function to be in C# so I converted the code from the original CF, but I am getting an index out of bounds error. Attached is the code. Can someone please try this out in C# and help me figure out why it is crashing? Perhaps my manual conversion is not adequate.

Val


public string GetFixedLengthString(string input, int length)
{
input = input ?? string.Empty;
input = input.Length > length ? input.Substring(0, length) : input;
return string.Format("{0,-" + length + "}", input);
}

// define the resulting variables
private int return_struct_lcs;
      private float return_struct_similarity;
      private int return_struct_distance;
      private string return_struct_s1;
      private string return_struct_s2;
/*
      StringSimilarity
      Brad Wood
      brad@bradwood.com

      May 2007

      Code adopted from Siderite Zackwehdex's Blog
         http://siderite.blogspot.com/2007/04/super-fast-an...

      Parameters:
         s1:         First string to be compared
         s2:         Second string to be compared
         maxOffset:   Average number of characters that s1 will deviate from s2 at any given point.
                  This is used to control how far ahead the function looks to try and find the
                  end of a peice of inserted text. Play with it to suit.
   */
/// <summary>
/// Val converted manually form the Cold Fusion code at
/// http://www.codersrevolution.com/index.cfm/2008/7/2...
/// </summary>
/// <param name="s1"></param>
/// <param name="s2"></param>
/// <param name="maxOffset"></param>
public void stringSimilarity(string s1, string s2, int maxOffset)
{
int c = 0;
int offset1 = 0;
int offset2 = 0;
int lcs = 0;

         // These two strings will contain the "highlighted" version
         // was: string _s1 = createObject("java","java.lang.StringBuffer").init(javacast("int",s1.Length*3));
         // Was: string _s2 = createObject("java","java.lang.StringBuffer").init(javacast("int",s2.Length*3));

string _s1 = ""; // Val
string _s2 = ""; // Val
_s1 = GetFixedLengthString(_s1, s1.Length * 3); // Val
_s2 = GetFixedLengthString(_s2, s2.Length * 3); // Val

         // These charactes will surround differences in the strings
         // (Inserted into _s1 and _s2)
//was: string h1 = "<span style="/"background: yellow;"/">";
//was: string h2 = "</span>";
string h1 = "<font color=red>";
string h2 = "</font>";

         // was: var return_struct = structNew();

s1 = s1.Trim();
s2 = s2.Trim();
         // If both strings are empty
if (String.IsNullOrEmpty(s1) && String.IsNullOrEmpty(s2)) // was: (!s1.Length && !s2.Length)
            {   
               return_struct_lcs = 0;
               return_struct_similarity = 1;
               return_struct_distance = 0;
               return_struct_s1 = "";
               return_struct_s2 = "";

       //return return_struct;
            }

         // If s2 is empty, but s1 isn't
if (!String.IsNullOrEmpty(s1) && String.IsNullOrEmpty(s2))// was:(s1.Length && !s2.Length)
            {
               return_struct_lcs = 0;
               return_struct_similarity = 0;
               return_struct_distance = s1.Length;
               return_struct_s1 = h1 + s1 + h2;
               return_struct_s2 = "";

       //return return_struct;
            }

         // If s1 is empty, but s2 isn't
else if (String.IsNullOrEmpty(s1) && !String.IsNullOrEmpty(s2))// was:(s2.Length && !s1.Length)
            {
               return_struct_lcs = 0;
               return_struct_similarity = 0;
               return_struct_distance = s2.Length;
               return_struct_s1 = "";
               return_struct_s2 = h1 + s2 + h2;

       //return return_struct;
            }

         
         // Examine the strings, one character at a time, anding at the shortest string
         // The offset adjusts for extra characters in either string.

while ((c + offset1 < s1.Length)
&& (c + offset2 < s2.Length))
{
            // Pull the next charactes out of s1 and s2
//was: string next_s1 = mid(s1,c + offset1+1,iif(!c,3,1)); // First time through check the first three
//was: string next_s2 = mid(s2,c + offset2+1,iif(!c,3,1)); // First time through check the first three

int iif; // Val
if (c == 0) iif = 3; else iif = 1; // Val

string next_s1 = s1.Substring(c + offset1 + 1, iif); // First time through check the first three
string next_s2 = s2.Substring(c + offset2 + 1, iif); // First time through check the first three

            // If they are equal
if (next_s1 == next_s2) //was:(compare(next_s1,next_s2) == 0)
               {
                  // Our longeset Common String just got one bigger
                  lcs = lcs + 1;

                  // Append the characters onto the "highlighted" version
                  // was: _s1.append(left(next_s1,1));
                  // was: _s2.append(left(next_s2,1));
_s1 = _s1 + next_s1.Substring(0,1);
_s2 = _s2 + next_s2.Substring(0,1);
               }

            // The next two charactes did not match
            // Now we will go into a sub-loop while we attempt to
            // find our place again. We will only search as long as
            // our maxOffset allows us to.

else
    {
                  // Don't reset the offsets, just back them up so you
                  // have a point of reference
    int old_offset1 = offset1;
    int old_offset2 = offset2;
                  string _s1_deviation = "";
                  string _s2_deviation = "";

                  // Loop for as long as allowed by our offset
                  // to see if we can match up again
    for (int i = 0; i < maxOffset; i++)
    {
                     //was: next_s1 = mid(s1,c + offset1 + i+1,3); // Increments each time through.
next_s1 = s1.Substring(c + offset1 + i + 1, 3); // Increments each time through.

int len_next_s1 = next_s1.Length;
                     // was: string bookmarked_s1 = mid(s1,c + offset1+1,3); // stays the same
string bookmarked_s1 = s1.Substring(c + offset1 + 1,3);

                     //was: next_s2 = mid(s2,c + offset2 + i+1,3); // Increments each time through.
next_s2 = s2.Substring(c + offset2 + i + 1, 3); // Increments each time through.

int len_next_s2 = next_s2.Length;
                     //was: string bookmarked_s2 = mid(s2,c + offset2+1,3); // stays the same
string bookmarked_s2 = s2.Substring(c + offset2 + 1, 3); // stays the same


                     // If we reached the end of both of the strings
if ((len_next_s1 == 0) && (len_next_s2 == 0)) // was: (!len_next_s1 && !len_next_s2)
                        {
                           // Quit
                           break;
                        }

                     // These variables keep track of how far we have deviated in the
                     // string while trying to find our match again.
_s1_deviation = _s1_deviation + next_s1.Substring(0, 1); // was: left(next_s1,1);
_s2_deviation = _s2_deviation + next_s2.Substring(0, 1); // was; left(next_s2,1);

                     // It looks like s1 has a match down the line which fits
                     // where we left off in s2

if (next_s1 == bookmarked_s2)// was: (compare(next_s1,bookmarked_s2) == 0)
       {
                           // s1 is now offset THIS far from s2
       offset1 = offset1+i;

                           // Our longeset Common String just got bigger
                           lcs = lcs + 1;

                           // Now that we match again, break to the main loop
       break;
       }
                        

                     // It looks like s2 has a match down the line which fits
                     // where we left off in s1

if (next_s2 == bookmarked_s1) // was: (compare(next_s2,bookmarked_s1) == 0)
       {
                           // s2 is now offset THIS far from s1
       offset2 = offset2+i;

                           // Our longeset Common String just got bigger
                           lcs = lcs + 1;

                           // Now that we match again, break to the main loop
       break;
       }
    }

                  //This is the number of inserted characters were found
                  int added_offset1 = offset1 - old_offset1;
                  int added_offset2 = offset2 - old_offset2;

                  
                  // We reached our maxoffset and couldn't match up the strings
                  if(added_offset1 == 0 && added_offset2 == 0)
                     {
                        // was: _s1.append(h1 + left(_s1_deviation,added_offset1+1) + h2);
                        // was: _s2.append(h1 + left(_s2_deviation,added_offset2+1) + h2);
_s1 = _s1 + h1 + _s1_deviation.Substring(0, added_offset1 + 1) + h2;
_s2 = _s2 + h1 + _s2_deviation.Substring(0, added_offset2 + 1) + h2;   
}

                  // s2 had extra characters
                  else if(added_offset1 == 0 && added_offset2 > 0)
{
// was: _s1.append(left(_s1_deviation,1));
// was: _s2.append(h1 + left(_s2_deviation,added_offset2) + h2 + right(_s2_deviation,1));
_s1 = _s1 + _s1_deviation.Substring(0, 1);
_s2 = _s2 + h1 + _s2_deviation.Substring(0, added_offset2) + h2 + _s2_deviation.Substring(_s2_deviation.Length - 1, 1); // ?? Length -1

}

                  // s1 had extra characters
                  else if(added_offset1 > 0 && added_offset2 == 0)
                     {
// was: _s1.append(h1 + left(_s1_deviation,added_offset1) + h2 + right(_s1_deviation,1));
// was: _s2.append(left(_s2_deviation,1));
_s1 = _s1 + h1 + _s1_deviation.Substring(0, added_offset1) + h2 + _s1_deviation.Substring(_s1_deviation.Length - 1, 1);
_s2 = _s2 + _s2_deviation.Substring(0, 1);
}
    }
c++;
// was: c=c+1;   
}

         // Anything left at the end of s1 is extra

         if(c + offset1 < s1.Length)
            {
               // was: _s1.append(h1 + right(s1,s1.Length-(c + offset1)) + h2);
_s1 = _s1 + h1 + s1.Substring(s1.Length - 1, s1.Length - (c + offset1)) + h2;
}

         // Anything left at the end of s2 is extra
         if(c + offset2 < s2.Length)
            {
               // was: _s2.append(h1 + right(s2,s2.Length-(c + offset2)) + h2);
_s2 = _s2 + h1 + s2.Substring(s2.Length - 1, s2.Length - (c + offset2)) + h2;
}

         // Distance is the average string length minus the longest common string
         int distance = (s1.Length + s2.Length)/2 - lcs;

         // Which string was longest?
         // was: int maxLen = iif(s1.Length > s2.Length,de(s1.Length),de(s2.Length));
int maxLen;
if(s1.Length > s2.Length) maxLen = s1.Length;
else maxLen = s2.Length;

         // Similarity is the distance divided by the max length
// was: similarity = iif(maxLen eq 0,1,1-(distance/maxLen));
         float similarity;
if(maxLen == 0) similarity =1;
else similarity = 1-(distance/maxLen);

         // Return what we found.
         return_struct_lcs = lcs;
         return_struct_similarity = similarity;
         return_struct_distance = distance;
         return_struct_s1 = _s1; // "highlighted" version
         return_struct_s2 = _s2; // "highlighted" version

//return return_struct;
}
# Posted By Valeriy Nenov | 8/3/11 8:41 PM
@Valeriy: Sorry, I'm not much of a C# guru, but one thing to keep in mind is that ColdFusion uses 1-based arrays instead of 0-based arrays. In other words, an array with only 1 item in it is accessed as myArray[1].
# Posted By Brad Wood | 8/3/11 9:14 PM
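
For anyone porting the function, the same difference shows up in the string functions. The CFScript code leans on mid() being 1-based and forgiving: asking for characters past the end of the string returns whatever is left (possibly an empty string) rather than raising an error, which is how the function detects that it has reached the end of both strings. A small sketch of that behavior, with a throwaway variable name:

```cfml
<cfscript>
    s = "abc";

    // Positions are 1-based: this is the first character
    writeOutput("[" & mid(s, 1, 1) & "]<br>");

    // Asking for 3 characters starting at the last one returns only what is left
    writeOutput("[" & mid(s, 3, 3) & "]<br>");

    // Starting past the end returns an empty string instead of throwing
    writeOutput("[" & mid(s, 4, 3) & "]<br>");
</cfscript>
```

A direct port to a 0-based, bounds-checked substring call (such as C#'s Substring) would need to drop the "+1" from the start position and clamp the requested length to what remains in the string.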
Lol! The original code was C#. You are converting something back. Although it would make an interesting analysis of how C# to CF to C# changes code.
# Posted By Siderite | 8/8/11 1:57 AM
Wonderful. Thanks!
# Posted By Sam | 1/31/12 10:56 AM

