rapidfuzz.fuzz

ratio

rapidfuzz.fuzz.ratio(s1, s2, *, processor=None, score_cutoff=None)

Calculates the normalized Indel similarity between two sequences, scaled to the range 0 to 100.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

See also

rapidfuzz.string_metric.normalized_levenshtein

Normalized Levenshtein distance

Notes

../_images/ratio.svg

Examples

>>> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
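
For intuition, the score can be reproduced in pure Python. The following is a minimal sketch, not the library's implementation. It assumes the normalized Indel similarity equals 100 * 2 * LCS / (len(s1) + len(s2)), where LCS is the length of the longest common subsequence, and that two empty sequences count as identical. The names lcs_length and indel_ratio_sketch are hypothetical helpers introduced only for this illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_ratio_sketch(s1, s2, *, processor=None, score_cutoff=None):
    """Illustrative stand-in for fuzz.ratio (hypothetical helper)."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    total = len(s1) + len(s2)
    # Assumption of this sketch: two empty sequences are a perfect match.
    score = 100.0 if total == 0 else 100.0 * 2 * lcs_length(s1, s2) / total
    # Scores below the cutoff are reported as 0, as described for score_cutoff.
    if score_cutoff is not None and score < score_cutoff:
        return 0.0
    return score


print(indel_ratio_sketch("this is a test", "this is a test!"))
# 0.0, because the score is below the cutoff of 97
print(indel_ratio_sketch("this is a test", "this is a test!", score_cutoff=97))
```

The quadratic dynamic program is chosen for readability only; it says nothing about how the library computes the score internally.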

partial_ratio

rapidfuzz.fuzz.partial_ratio(s1, s2, *, processor=None, score_cutoff=None)

Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio for this alignment.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

Depending on the length of the needle (shorter string) different implementations are used to improve the performance.

short needle (length ≤ 64):

When the needle is short, fuzz.ratio is calculated for every alignment that could be optimal, so the optimal alignment is guaranteed to be found. This is very fast for short needles, because fuzz.ratio runs in O(N) time for them, which results in a worst-case performance of O(NM).
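
The idea of scoring every candidate alignment can be illustrated with a brute-force sketch in pure Python. This is not the library's implementation: it simply slides a window of the needle's length over the longer string (including windows truncated at either end) and keeps the best score. The helpers lcs_length, ratio_sketch and partial_ratio_bruteforce are hypothetical names, and the handling of an empty needle is an assumption of the sketch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ratio_sketch(a, b):
    """Normalized Indel similarity in the range 0 to 100 (illustrative)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def partial_ratio_bruteforce(s1, s2):
    """Best ratio of the shorter string against any window of the longer one."""
    needle, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    m, n = len(needle), len(longer)
    if m == 0:
        # Assumption of this sketch, not documented library behaviour.
        return 100.0 if n == 0 else 0.0
    best = 0.0
    # Negative starts and starts near the end yield truncated windows.
    for start in range(1 - m, n):
        window = longer[max(start, 0):min(start + m, n)]
        best = max(best, ratio_sketch(needle, window))
    return best


print(partial_ratio_bruteforce("this is a test", "this is a test!"))  # 100.0
```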

../_images/partial_ratio_short_needle.svg

long needle (length > 64):

For long needles, an implementation similar to FuzzyWuzzy's is used. It only considers alignments that start at one of the longest common substrings, which results in a worst-case performance of O(N[N/64]M). However, most of the alignments can usually be skipped. The following Python code shows the concept:

from difflib import SequenceMatcher
from rapidfuzz import fuzz

# needle is the shorter string, longer is the string it is searched in
blocks = SequenceMatcher(None, needle, longer, False).get_matching_blocks()
score = 0
for block in blocks:
    # align the needle's start with the matching block, clamped to index 0
    long_start = max(block[1] - block[0], 0)
    long_end = long_start + len(needle)
    long_substr = longer[long_start:long_end]
    score = max(score, fuzz.ratio(needle, long_substr))

This is a lot faster than checking all possible alignments. However, it finds only one of the best alignments, not necessarily the optimal one.

../_images/partial_ratio_long_needle.svg

Examples

>>> fuzz.partial_ratio("this is a test", "this is a test!")
100.0

partial_ratio_alignment

rapidfuzz.fuzz.partial_ratio_alignment(s1, s2, *, processor=None, score_cutoff=None)

Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio and the corresponding alignment.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, None is returned instead. Default is None, which deactivates this behaviour.

Returns:

alignment – alignment between s1 and s2 with the score as a float between 0 and 100

Return type:

ScoreAlignment, optional

Examples

>>> s1 = "a certain string"
>>> s2 = "cetain"
>>> res = fuzz.partial_ratio_alignment(s1, s2)
>>> res
ScoreAlignment(score=83.33333333333334, src_start=2, src_end=8, dest_start=0, dest_end=6)

Using the alignment information, it is possible to calculate the same fuzz.ratio:

>>> fuzz.ratio(s1[res.src_start:res.src_end], s2[res.dest_start:res.dest_end])
83.33333333333334
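
To make the meaning of the alignment fields concrete, here is a pure-Python sketch that returns the best window together with its score. It is an illustration under stated assumptions, not the library's code: AlignmentSketch and partial_ratio_alignment_sketch are hypothetical names, both inputs are assumed to be non-empty, ties are resolved in favour of the earliest window, and src refers to s1 while dest refers to s2.

```python
from typing import NamedTuple


class AlignmentSketch(NamedTuple):
    score: float
    src_start: int
    src_end: int
    dest_start: int
    dest_end: int


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ratio_sketch(a, b):
    """Normalized Indel similarity in the range 0 to 100 (illustrative)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def partial_ratio_alignment_sketch(s1, s2):
    """Brute-force search for the best window; assumes non-empty inputs."""
    swapped = len(s1) < len(s2)
    needle, longer = (s1, s2) if swapped else (s2, s1)
    m, n = len(needle), len(longer)
    best = (-1.0, 0, 0)
    for start in range(1 - m, n):
        lo, hi = max(start, 0), min(start + m, n)
        score = ratio_sketch(needle, longer[lo:hi])
        if score > best[0]:  # strict comparison keeps the earliest best window
            best = (score, lo, hi)
    score, lo, hi = best
    if swapped:
        # s1 is the needle, so the window lies in s2 (the destination).
        return AlignmentSketch(score, 0, m, lo, hi)
    return AlignmentSketch(score, lo, hi, 0, m)


res = partial_ratio_alignment_sketch("a certain string", "cetain")
print(res.src_start, res.src_end, res.dest_start, res.dest_end)  # 2 8 0 6
```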

token_set_ratio

rapidfuzz.fuzz.token_set_ratio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Compares the words in the strings based on the unique and common words between them, using fuzz.ratio.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/token_set_ratio.svg

Examples

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
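
The comparison of unique and common words can be sketched in pure Python. This sketch follows the token-set approach popularised by FuzzyWuzzy and is an assumption about the general idea, not the library's implementation: preprocessing is skipped, words are split on whitespace, and inputs without any words are not treated specially. The names lcs_length, ratio_sketch and token_set_ratio_sketch are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ratio_sketch(a, b):
    """Normalized Indel similarity in the range 0 to 100 (illustrative)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def token_set_ratio_sketch(s1, s2):
    """Compare the shared words and the words unique to each string."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    # Shared words followed by the words unique to each side.
    combined1 = f"{common} {only1}".strip()
    combined2 = f"{common} {only2}".strip()
    return max(
        ratio_sketch(common, combined1),
        ratio_sketch(common, combined2),
        ratio_sketch(combined1, combined2),
    )


# Duplicate words collapse in the sets, so both strings reduce to the same words.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100.0
```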

partial_token_set_ratio

rapidfuzz.fuzz.partial_token_set_ratio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Compares the words in the strings based on the unique and common words between them, using fuzz.partial_ratio.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/partial_token_set_ratio.svg

token_sort_ratio

rapidfuzz.fuzz.token_sort_ratio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Sorts the words in the strings and calculates the fuzz.ratio between them.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/token_sort_ratio.svg

Examples

>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0
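
The effect of sorting the words first can be shown with a short pure-Python sketch. It is illustrative only and not the library's code: preprocessing is skipped, words are assumed to be separated by whitespace, and lcs_length, ratio_sketch and token_sort_ratio_sketch are hypothetical helper names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ratio_sketch(a, b):
    """Normalized Indel similarity in the range 0 to 100 (illustrative)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def token_sort_ratio_sketch(s1, s2):
    """Sort the whitespace-separated words, then compare the joined strings."""
    sorted1 = " ".join(sorted(s1.split()))
    sorted2 = " ".join(sorted(s2.split()))
    return ratio_sketch(sorted1, sorted2)


# Word order no longer matters once both strings are sorted.
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100.0
```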

partial_token_sort_ratio

rapidfuzz.fuzz.partial_token_sort_ratio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Sorts the words in the strings and calculates the fuzz.partial_ratio between them.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/partial_token_sort_ratio.svg

token_ratio

rapidfuzz.fuzz.token_ratio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Helper method that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio. It is faster than calling the two functions separately.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/token_ratio.svg

partial_token_ratio

rapidfuzz.fuzz.partial_token_ratio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Helper method that returns the maximum of fuzz.partial_token_set_ratio and fuzz.partial_token_sort_ratio. It is faster than calling the two functions separately.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/partial_token_ratio.svg

WRatio

rapidfuzz.fuzz.WRatio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Calculates a weighted ratio based on the other ratio algorithms.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Notes

../_images/WRatio.svg

QRatio

rapidfuzz.fuzz.QRatio(s1, s2, *, processor=utils.default_process, score_cutoff=None)

Calculates a quick ratio between two strings using fuzz.ratio. The only difference from fuzz.ratio is that the strings are preprocessed by default.

Parameters:
  • s1 (Sequence[Hashable]) – First string to compare.

  • s2 (Sequence[Hashable]) – Second string to compare.

  • processor (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is utils.default_process.

  • score_cutoff (float, optional) – Optional score threshold as a float between 0 and 100. If the ratio is below score_cutoff, 0 is returned instead. Default is None, which deactivates this behaviour.

Returns:

similarity – similarity between s1 and s2 as a float between 0 and 100

Return type:

float

Examples

>>> fuzz.QRatio("this is a test", "THIS is a test!")
100.0
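
Why preprocessing turns this comparison into a perfect match can be seen in a pure-Python sketch. It rests on an assumption about what the default processor does (replace non-alphanumeric characters with spaces, trim surrounding whitespace, lower-case the result) and is not the library's code. The names default_process_sketch, ratio_sketch and qratio_sketch are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ratio_sketch(a, b):
    """Normalized Indel similarity in the range 0 to 100 (illustrative)."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100.0 * 2 * lcs_length(a, b) / total


def default_process_sketch(s):
    """Assumed preprocessing: strip punctuation, trim, lower-case."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in s)
    return cleaned.strip().lower()


def qratio_sketch(s1, s2, processor=default_process_sketch):
    """Preprocess both strings, then compute the plain ratio."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    return ratio_sketch(s1, s2)


print(default_process_sketch("THIS is a test!"))           # this is a test
print(qratio_sketch("this is a test", "THIS is a test!"))  # 100.0
```

Passing processor=None in this sketch skips the preprocessing step, in which case the case and punctuation differences lower the score.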