Fuzzy String Compare
Geschrieben von: Peter Hellinger Montag, den 22. Januar 2007 um 15:29 Uhr
You know the problem: A simple type mismatch and the search in the database goes fail, because the search strings did not match… This function compares the single characters, counts the identical characters and calculates a procentual value, how similar the both strings are.To bring in a little fuzzy factor, the function checks if any identical characters is in the near of the actual compare position. This is calculated in a formula depending on the length of the strings (diff).
Some example results:
"John" and "Jon" = 75%
"Jim" and "James" = 40%
"Luke Skywalker" and "Darth Vader" = 0% (Hmm…)
function StrSimilar (s1, s2: string): Integer;
var hit: Integer; // Number of identical chars
p1, p2: Integer; // Position count
l1, l2: Integer; // Length of strings
pt: Integer; // for counter
diff: Integer; // unsharp factor
hstr: string; // help var for swap strings
test: array [1..255] of Boolean; // Array shows if position is tested
begin
// Test Length and swap, if s1 is smaller we alway search along the longer string
if Length(s1) < Length(s2) then begin
hstr:= s2; s2:= s1; s1:= hstr;
end;
// store length of strings to speed up the function
l1:= Length (s1);
l2:= Length (s2);
p1:= 1; p2:= 1; hit:= 0;
// calc the unsharp factor depending on the length of the strings. Its about a
// third of the whole length
diff:= Max (l1, l2) div 3 + ABS (l1 - l2);
// init the test array
for pt:= 1 to l1 do test[pt]:= False;
// loop through the string
repeat
// position tested?
if not test[p1] then begin
// found a matching character?
if (s1[p1] = s2[p2]) and (ABS(p1-p2) <= diff) then begin
test[p1]:= True;
Inc (hit); // increment the hit count next positions
Inc (p1); Inc (p2);
if p1 > l1 then p1:= 1;
end else begin
// Set test array
test[p1]:= False;
Inc (p1);
// Loop back to next test position if end of the string
if p1 > l1 then begin
while (p1 > 1) and not (test[p1]) do Dec (p1);
Inc (p2)
end;
end;
end else begin
Inc (p1);
// Loop back to next test position if end of string
if p1 > l1 then begin
repeat Dec (p1); until (p1 = 1) or test[p1];
Inc (p2);
end;
end;
until p2 > Length(s2);
// calc procentual value
Result:= 100 * hit DIV l1;
end;
| Zurück: Strings in case statements | Weiter: A litte bit of bits... |
|---|











You may feel disappointed somehow, as most of my pages are written in German. And it’s true, they should be understandable to much more people than they are now. Well, the truth is, I would provide two versions if I had the time to maintain both of them…