LeetCode Problem

How to Solve Count Pairs Of Similar Strings

The clean solution for Count Pairs Of Similar Strings is to reduce each word to its unique character set, then count how many earlier words share that same signature. A bitmask works especially well because each lowercase letter maps to one bit, so equivalent words like aba and aabb collapse into the same key. This turns repeated pairwise comparison into one array scan plus hash lookup.

GhostInterview Help

Need help with Count Pairs Of Similar Strings without spending extra time grinding it?

GhostInterview can read Count Pairs Of Similar Strings from a screenshot, generate the answer path, explain the complexity, and support solver-first interview workflows when you need direct help fast.

Screenshot Input

Capture the prompt fast instead of rewriting the problem by hand.

Answer + Complexity

Get the solution path, trade-offs, and complexity summary in one pass.

Stealth Workflow

Stay outside captured layers on supported screen-share workflows.

Problem #2506Array scanning plus hash lookupReviewed 2026-03-08
Difficulty
Easy
Primary pattern
Array scanning plus hash lookup
Answer-first problem summary
Step-by-step approach and complexity
GhostInterview solver workflow

The clean solution for Count Pairs Of Similar Strings is to reduce each word to its unique character set, then count how many earlier words share that same signature. A bitmask works especially well because each lowercase letter maps to one bit, so equivalent words like aba and aabb collapse into the same key. This turns repeated pairwise comparison into one array scan plus hash lookup.

Problem Statement

You get a 0-indexed array called words. Two words are considered similar when they contain exactly the same distinct letters, even if the counts and order are different. For example, aba and aabb are similar because both use only a and b, while abcd is not similar to bac because one has d and the other does not.

Your task is to count how many index pairs (i, j) satisfy i < j and produce similar words. A direct comparison of every pair works on small input, but this problem is designed to reward collapsing each word into a reusable signature of its character set before counting matches.

Examples

Example 1

Input: words = ["aba","aabb","abcd","bac","aabc"]

Output: 2

There are 2 pairs that satisfy the conditions:

  • i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
  • i = 3 and j = 4 : both words[3] and words[4] only consist of characters 'a', 'b', and 'c'.

Example 2

Input: words = ["aabb","ab","ba"]

Output: 3

There are 3 pairs that satisfy the conditions:

  • i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
  • i = 0 and j = 2 : both words[0] and words[2] only consist of characters 'a' and 'b'.
  • i = 1 and j = 2 : both words[1] and words[2] only consist of characters 'a' and 'b'.

Example 3

Input: words = ["nba","cba","dba"]

Output: 0

Since there does not exist any pair that satisfies the conditions, we return 0.

Constraints

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 100
  • words[i] consist of only lowercase English letters.

Solution Approach

Build a signature for each word

Scan each word and record which lowercase letters appear. The key detail is that frequency does not matter in Count Pairs Of Similar Strings, only presence. You can store the set as a 26-bit mask or as a normalized character-set string, but the mask is simpler and faster.

Count matches while scanning the array

For each word, compute its signature and look it up in a hash map. If that signature has already appeared k times, then the current word forms k new similar pairs with earlier words. After adding k to the answer, increment the stored count for that signature.

Why this beats pairwise comparison

The common failure mode is comparing every word against every other word and rebuilding sets again and again. That repeats the same work. By converting each word once and using hash lookup, you turn the problem into grouping equal signatures during one pass through the array.

Complexity Analysis

MetricValue
TimeDepends on the final approach
SpaceDepends on the final approach

Let n be the number of words and m be the maximum word length. Computing one signature takes O(m), so the full scan costs O(n * m). Hash lookups are O(1) on average, and the extra space is O(n) in the worst case when every word produces a different signature.

What Interviewers Usually Probe

  • They ask how to check whether two strings are similar without sorting entire words.
  • They hint that character frequency is irrelevant, so duplicates inside one word should collapse away.
  • They want you to replace repeated pairwise set construction with a reusable signature plus hash counting.

Common Pitfalls or Variants

Common pitfalls

  • Using raw words as map keys instead of a character-set signature, which misses pairs like ab and ba.
  • Counting letter frequency instead of letter presence, which incorrectly separates aba from aabb.
  • Running nested comparisons over all pairs and rebuilding sets every time, which adds avoidable work for this grouping problem.

Follow-up variants

  • Return the grouped words instead of only the pair count by storing index lists for each signature.
  • Change the alphabet size, which keeps the same hash-grouping idea but may replace the 26-bit mask representation.
  • Count pairs where one word's character set is a subset of another, which changes equality lookup into subset matching.

How GhostInterview Helps

  • GhostInterview identifies that Count Pairs Of Similar Strings is a signature-grouping problem, not a frequency-matching problem.
  • GhostInterview shows why a 26-bit mask is the safest key for collapsing words like aba, aabb, and ab into one bucket.
  • GhostInterview catches pair-counting mistakes by mapping each new word to existing signature frequency before incrementing the bucket.

Topic Pages

FAQ

What is the main pattern in Count Pairs Of Similar Strings?

The main pattern is array scanning plus hash lookup. Convert each word into a signature of its distinct letters, then count how many earlier words already have that same signature.

Why is a bitmask a good fit here?

The words use only lowercase English letters, so each letter can map to one of 26 bits. That makes it easy to represent the full character set of a word in one integer and compare similar words quickly.

Do repeated letters matter in this problem?

No. Similarity depends only on whether a letter appears at least once. That is why aba, aabb, and ab all share the same signature.

Could I solve it by sorting each word?

You could sort and deduplicate characters in each word to build a canonical key, but that does extra work compared with a direct presence-based mask. The bitmask approach matches the problem definition more directly.

Why not compare every pair of words directly?

Direct comparison works functionally, but it repeats character-set construction across many pairs. Grouping by signature is cleaner because each word is processed once, and every match is counted through the hash map.

GhostInterview Solver

Need direct help with Count Pairs Of Similar Strings instead of spending more time grinding it?

Download GhostInterview when you want a LeetCode solver, not another long practice loop. Capture Count Pairs Of Similar Strings from a screenshot, get the answer path and complexity, and use supported stealth workflows that stay outside captured layers.

Screenshot Input

Capture the prompt fast instead of rewriting the problem by hand.

Answer + Complexity

Get the solution path, trade-offs, and complexity summary in one pass.

Stealth Workflow

Stay outside captured layers on supported screen-share workflows.