twitter

Fuzzy String Compare

Achtung, öffnet in einem neuen Fenster. PDFDruckenE-Mail
You know the problem: A simple type mismatch and the search in the database goes fail, because the search strings did not match… This function compares the single characters, counts the identical characters and calculates a procentual value, how similar the both strings are.

To bring in a little fuzzy factor, the function checks if any identical characters is in the near of the actual compare position. This is calculated in a formula depending on the length of the strings (diff).

Some example results:

"John" and "John" = 100%
"John" and "Jon" = 75%
"Jim" and "James" = 40%
"Luke Skywalker" and "Darth Vader" = 0% (Hmm…)
function StrSimilar (s1, s2: string): Integer;
var hit: Integer; // Number of identical chars
p1, p2: Integer; // Position count
l1, l2: Integer; // Length of strings
pt: Integer; // for counter
diff: Integer; // unsharp factor
hstr: string; // help var for swap strings
test: array [1..255] of Boolean; // Array shows if position is tested
begin
// Test Length and swap, if s1 is smaller we alway search along the longer string
if Length(s1) < Length(s2) then begin
hstr:= s2; s2:= s1; s1:= hstr;
end;
// store length of strings to speed up the function
l1:= Length (s1);
l2:= Length (s2);
p1:= 1; p2:= 1; hit:= 0;
// calc the unsharp factor depending on the length of the strings. Its about a
// third of the whole length
diff:= Max (l1, l2) div 3 + ABS (l1 - l2);
// init the test array
for pt:= 1 to l1 do test[pt]:= False;
// loop through the string
repeat
// position tested?
if not test[p1] then begin
// found a matching character?
if (s1[p1] = s2[p2]) and (ABS(p1-p2) <= diff) then begin
test[p1]:= True;
Inc (hit); // increment the hit count next positions
Inc (p1); Inc (p2);
if p1 > l1 then p1:= 1;
end else begin
// Set test array
test[p1]:= False;
Inc (p1);
// Loop back to next test position if end of the string
if p1 > l1 then begin
while (p1 > 1) and not (test[p1]) do Dec (p1);
Inc (p2)
end;
end;
end else begin
Inc (p1);
// Loop back to next test position if end of string
if p1 > l1 then begin
repeat Dec (p1); until (p1 = 1) or test[p1];
Inc (p2);
end;
end;
until p2 > Length(s2);
// calc procentual value
Result:= 100 * hit DIV l1;
end;

Über helli.de

Planet HelliNeulich stellte mir eine Freundin die Frage, warum jemand eigentlich eine private Homepage hat. So auf Anhieb wusste ich nicht was ich darauf antworten sollte - schließlich betreibe ich diese Site schon seit bald zehn Jahren, und die Gründe haben sich im Lauf der Zeit auch gewandelt.

Weiterlesen...
 

English spoken?

You may feel disappointed somehow, as most of my pages are written in German. And it’s true, they should be understandable to much more people than they are now. Well, the truth is, I would provide two versions if I had the time to maintain both of them…

Weiterlesen...
 

Ich mach was mit Büchern

Ich mach was mit Büchern Bücher haben ein enormes Potenzial, Menschen zu verbinden. Im Internet sind jene Menschen, die sich mit Büchern beruflich oder in ihrer Freizeit intensiv beschäftigen, vielfach nur schwach verbunden. Daher hat Leander Wattig im August 2009 eine Initiative angestoßen, die dazu beitragen soll, die Buchbranche noch besser sichtbar zu machen und die Buchmenschen noch stärker miteinander zu vernetzen.
Unterstützen Sie diese Aktion!