rapidfuzz.string_metric
levenshtein
- rapidfuzz.string_metric.levenshtein(s1, s2, *, weights=(1, 1, 1), processor=None, max=None)
Calculates the Levenshtein distance: the minimum total cost of the insertions, deletions, and substitutions required to change one sequence into the other, using custom costs for insertion, deletion and substitution.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
weights (Tuple[int, int, int] or None, optional) – The weights for the three operations in the form (insertion, deletion, substitution). Default is (1, 1, 1), which gives all three operations a weight of 1.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
max (int or None, optional) – Maximum distance between s1 and s2 that is considered as a result. If the distance is bigger than max, max + 1 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
distance – distance between s1 and s2
- Return type:
int
- Raises:
ValueError – If unsupported weights are provided.
Deprecated since version 2.0.0: Use rapidfuzz.distance.Levenshtein.distance() instead. This function will be removed in v3.0.0.
Examples
Find the Levenshtein distance between two strings:
>>> from rapidfuzz.string_metric import levenshtein
>>> levenshtein("lewenstein", "levenshtein")
2
Setting a maximum distance allows the function to select a more efficient algorithm. Here the distance of 2 exceeds max=1, so max + 1 is returned:
>>> levenshtein("lewenstein", "levenshtein", max=1)
2
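It is possible to select different weights by passing a weight tuple. With a substitution cost of 2, replacing "w" with "v" costs as much as a deletion plus an insertion:
>>> levenshtein("lewenstein", "levenshtein", weights=(1,1,2))
3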
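The weighted distance described above can be computed with the classic Wagner-Fischer dynamic program. The following is a minimal pure-Python sketch, not rapidfuzz's own implementation: the function name `weighted_levenshtein` is hypothetical, and its `max_dist` parameter stands in for the documented `max` (renamed here to avoid shadowing the builtin). It reproduces the three example results above.

```python
from typing import Hashable, Optional, Sequence, Tuple


def weighted_levenshtein(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    weights: Tuple[int, int, int] = (1, 1, 1),
    max_dist: Optional[int] = None,  # hypothetical stand-in for `max`
) -> int:
    """Weighted edit distance via the Wagner-Fischer dynamic program."""
    ins, dele, sub = weights
    # prev[j] = cost of turning the current prefix of s1 into s2[:j]
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [i * dele] + [0] * len(s2)
        for j, c2 in enumerate(s2, start=1):
            curr[j] = min(
                prev[j] + dele,  # delete c1
                curr[j - 1] + ins,  # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub),  # keep or substitute
            )
        prev = curr
    dist = prev[-1]
    # Mirror the documented cutoff: anything above the limit becomes limit + 1
    if max_dist is not None and dist > max_dist:
        return max_dist + 1
    return dist


print(weighted_levenshtein("lewenstein", "levenshtein", weights=(1, 1, 2)))
```

Because the recurrence always takes the cheapest of the three moves, a substitution weight larger than insertion plus deletion is handled automatically: the sketch simply falls back to delete-then-insert.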
normalized_levenshtein
- rapidfuzz.string_metric.normalized_levenshtein(s1, s2, *, weights=(1, 1, 1), processor=None, score_cutoff=None)
Calculates a normalized Levenshtein similarity between 0 and 100, using custom costs for insertion, deletion and substitution.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
weights (Tuple[int, int, int] or None, optional) – The weights for the three operations in the form (insertion, deletion, substitution). Default is (1, 1, 1), which gives all three operations a weight of 1.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the similarity is below score_cutoff, 0 is returned instead. Default is None (equivalent to 0), which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
- Raises:
ValueError – If unsupported weights are provided.
Deprecated since version 2.0.0: Use rapidfuzz.distance.Levenshtein.normalized_similarity() instead. This function will be removed in v3.0.0.
See also
levenshtein
Levenshtein distance
Examples
Find the normalized Levenshtein distance between two strings:
>>> from rapidfuzz.string_metric import normalized_levenshtein
>>> normalized_levenshtein("lewenstein", "levenshtein")
81.81818181818181
Setting a score_cutoff allows the function to select a more efficient algorithm. Here the similarity of about 81.8 is below 85, so 0.0 is returned:
>>> normalized_levenshtein("lewenstein", "levenshtein", score_cutoff=85)
0.0
It is possible to select different weights by passing a weight tuple:
>>> normalized_levenshtein("lewenstein", "levenshtein", weights=(1,1,2))
85.71428571428571
When a processor is used, s1 and s2 do not have to be strings:
>>> normalized_levenshtein(["lewenstein"], ["levenshtein"], processor=lambda s: s[0])
81.81818181818181
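The normalization can be sketched as 100 * (1 - distance / max_possible_distance). The pure-Python sketch below is not rapidfuzz's code: the name `normalized_levenshtein_sketch` and the helpers are hypothetical, and the formula for the maximum possible distance (the cheaper of "delete everything, insert everything" and "substitute the overlap, then insert or delete the length difference") is an assumption. It does, however, reproduce all four example values above, including 85.71 for weights=(1,1,2), where the maximum is 21 rather than 11.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple


def _distance(s1, s2, weights: Tuple[int, int, int]) -> int:
    # Weighted Wagner-Fischer edit distance
    ins, dele, sub = weights
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [i * dele] + [0] * len(s2)
        for j, c2 in enumerate(s2, start=1):
            curr[j] = min(prev[j] + dele, curr[j - 1] + ins,
                          prev[j - 1] + (0 if c1 == c2 else sub))
        prev = curr
    return prev[-1]


def _max_distance(len1: int, len2: int, weights: Tuple[int, int, int]) -> int:
    # Assumed worst case: replace everything, or substitute the overlap
    ins, dele, sub = weights
    worst = len1 * dele + len2 * ins
    if len1 >= len2:
        return min(worst, len2 * sub + (len1 - len2) * dele)
    return min(worst, len1 * sub + (len2 - len1) * ins)


def normalized_levenshtein_sketch(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    weights: Tuple[int, int, int] = (1, 1, 1),
    processor: Optional[Callable] = None,
    score_cutoff: Optional[float] = None,
) -> float:
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    max_dist = _max_distance(len(s1), len(s2), weights)
    if max_dist == 0:
        sim = 100.0  # assumption: two empty sequences count as identical
    else:
        sim = 100.0 * (1.0 - _distance(s1, s2, weights) / max_dist)
    # Results below the cutoff are reported as 0
    if score_cutoff is not None and sim < score_cutoff:
        return 0.0
    return sim


print(round(normalized_levenshtein_sketch("lewenstein", "levenshtein"), 2))
```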
hamming
- rapidfuzz.string_metric.hamming(s1, s2, *, processor=None, max=None)
Calculates the Hamming distance between two strings. The Hamming distance is defined as the number of positions where the two strings differ. It describes the minimum number of substitutions required to transform s1 into s2.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
max (int or None, optional) – Maximum distance between s1 and s2 that is considered as a result. If the distance is bigger than max, max + 1 is returned instead. Default is None, which deactivates this behaviour.
- Returns:
distance – distance between s1 and s2
- Return type:
int
- Raises:
ValueError – If s1 and s2 have a different length
Deprecated since version 2.0.0: Use rapidfuzz.distance.Hamming.distance() instead. This function will be removed in v3.0.0.
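Since the Hamming distance only counts differing positions, it reduces to a single pass over paired elements. This is a minimal pure-Python sketch of the documented behaviour, not rapidfuzz's implementation; `hamming_sketch` is a hypothetical name and `max_dist` stands in for the documented `max` parameter.

```python
from typing import Callable, Hashable, Optional, Sequence


def hamming_sketch(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Optional[Callable] = None,
    max_dist: Optional[int] = None,  # hypothetical stand-in for `max`
) -> int:
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    # Hamming distance is only defined for sequences of equal length
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must have the same length")
    dist = sum(1 for c1, c2 in zip(s1, s2) if c1 != c2)
    if max_dist is not None and dist > max_dist:
        return max_dist + 1
    return dist


print(hamming_sketch("karolin", "kathrin"))
```

For "karolin" and "kathrin" the positions r/t, o/h and l/r differ, so the distance is 3.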
normalized_hamming
- rapidfuzz.string_metric.normalized_hamming(s1, s2, *, processor=None, score_cutoff=None)
Calculates a normalized Hamming similarity as a float between 0 and 100.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the similarity is below score_cutoff, 0 is returned instead. Default is None (equivalent to 0), which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
- Raises:
ValueError – If s1 and s2 have a different length
See also
hamming
Hamming distance
Deprecated since version 2.0.0: Use rapidfuzz.distance.Hamming.normalized_similarity() instead. This function will be removed in v3.0.0.
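Normalizing the Hamming distance is simple because, for equal-length inputs, the maximum possible distance is the sequence length. The sketch below assumes similarity = 100 * (1 - distance / length); `normalized_hamming_sketch` is a hypothetical name, and treating two empty sequences as fully similar is an assumption rather than documented behaviour.

```python
from typing import Callable, Hashable, Optional, Sequence


def normalized_hamming_sketch(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    processor: Optional[Callable] = None,
    score_cutoff: Optional[float] = None,
) -> float:
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must have the same length")
    if len(s1) == 0:
        sim = 100.0  # assumption: two empty sequences count as identical
    else:
        dist = sum(c1 != c2 for c1, c2 in zip(s1, s2))
        # The largest possible Hamming distance equals the length
        sim = 100.0 * (1.0 - dist / len(s1))
    if score_cutoff is not None and sim < score_cutoff:
        return 0.0
    return sim


print(round(normalized_hamming_sketch("karolin", "kathrin"), 2))
```

With 3 of 7 positions differing, the similarity is 100 * 4/7, roughly 57.14.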
jaro_similarity
- rapidfuzz.string_metric.jaro_similarity(s1, s2, *, processor=None, score_cutoff=None)
Calculates the Jaro similarity.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the similarity is below score_cutoff, 0 is returned instead. Default is None (equivalent to 0), which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
Deprecated since version 2.0.0: Use rapidfuzz.distance.Jaro.similarity() instead. This function will be removed in v3.0.0.
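The Jaro similarity counts characters that match within a sliding window and penalizes matched characters that appear in a different order (transpositions). Below is a minimal sketch of the textbook definition, not rapidfuzz's optimized code; `jaro_sketch` is a hypothetical name, the window of max(len1, len2) // 2 - 1 is the standard choice, and returning 100 for two empty inputs is an assumption.

```python
from typing import Hashable, Optional, Sequence


def jaro_sketch(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    score_cutoff: Optional[float] = None,
) -> float:
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 100.0  # assumption: two empty sequences count as identical
    # Elements only match if they are at most `window` positions apart
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c1 in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c1:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        sim = 0.0
    else:
        # Matched elements in their original order; mismatches are half-transpositions
        m1 = [c for c, m in zip(s1, matched1) if m]
        m2 = [c for c, m in zip(s2, matched2) if m]
        t = sum(a != b for a, b in zip(m1, m2)) / 2
        sim = 100.0 * (matches / len1 + matches / len2 + (matches - t) / matches) / 3
    if score_cutoff is not None and sim < score_cutoff:
        return 0.0
    return sim


print(round(jaro_sketch("MARTHA", "MARHTA"), 2))
```

For "MARTHA" and "MARHTA" all six characters match but T and H are swapped (one transposition), giving (1 + 1 + 5/6) / 3, about 94.44.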
jaro_winkler_similarity
- rapidfuzz.string_metric.jaro_winkler_similarity(s1, s2, *, prefix_weight=0.1, processor=None, score_cutoff=None)
Calculates the Jaro-Winkler similarity.
- Parameters:
s1 (Sequence[Hashable]) – First string to compare.
s2 (Sequence[Hashable]) – Second string to compare.
prefix_weight (float, optional) – Weight used for the common prefix of the two strings. Has to be between 0 and 0.25. Default is 0.1.
processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the similarity is below score_cutoff, 0 is returned instead. Default is None (equivalent to 0), which deactivates this behaviour.
- Returns:
similarity – similarity between s1 and s2 as a float between 0 and 100
- Return type:
float
- Raises:
ValueError – If prefix_weight is not between 0 and 0.25.
Deprecated since version 2.0.0: Use rapidfuzz.distance.JaroWinkler.similarity() instead. This function will be removed in v3.0.0.
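Jaro-Winkler boosts the Jaro similarity for strings that share a common prefix: sim + prefix_len * prefix_weight * (1 - sim), with the prefix length capped at 4. The sketch below follows that textbook definition and is not rapidfuzz's code. The names `_jaro` and `jaro_winkler_sketch` are hypothetical, and applying the boost only when the Jaro similarity exceeds 0.7 is an assumption based on the usual Winkler threshold. The cap of 4 also explains the documented upper bound of 0.25 for prefix_weight: 4 * 0.25 = 1, so the boost can never push the score above 100.

```python
from typing import Hashable, Optional, Sequence


def _jaro(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> float:
    """Plain Jaro similarity in [0, 1]."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0  # assumption: two empty sequences count as identical
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c1 in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c1:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    m1 = [c for c, m in zip(s1, matched1) if m]
    m2 = [c for c, m in zip(s2, matched2) if m]
    t = sum(a != b for a, b in zip(m1, m2)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler_sketch(
    s1: Sequence[Hashable],
    s2: Sequence[Hashable],
    *,
    prefix_weight: float = 0.1,
    score_cutoff: Optional[float] = None,
) -> float:
    if not 0.0 <= prefix_weight <= 0.25:
        raise ValueError("prefix_weight has to be between 0 and 0.25")
    sim = _jaro(s1, s2)
    # Length of the common prefix, capped at 4 elements
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if prefix == 4 or c1 != c2:
            break
        prefix += 1
    # Assumption: the prefix boost is only applied above the usual 0.7 threshold
    if sim > 0.7:
        sim += prefix * prefix_weight * (1.0 - sim)
    sim *= 100.0
    if score_cutoff is not None and sim < score_cutoff:
        return 0.0
    return sim


print(round(jaro_winkler_sketch("MARTHA", "MARHTA"), 2))
```

"MARTHA" and "MARHTA" share the prefix "MAR", so the Jaro score of about 94.44 is raised to about 96.11.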