Difficult Name Pairs: Six Cases the Generator Cannot Solve Alone

Name: Ship Name Lab
Author: Ship Name Lab Research Desk

Published on July 3, 2026

A case study of short, compound, accented, and non-Latin name pairs, including current outputs and the human review each one still needs.

A trustworthy naming tool should publish where it struggles. We selected six cases from the Name Pair Benchmark because each exposes a different limit in deterministic blending.

The results below use engine 2026.07.03-3. They are observations from one version, not permanent recommendations.

Case 1: Bo + Jo

Current leader: Bojo
Structural score: 77

Two-letter names leave one internal cut each. The engine can preserve both full inputs, but almost every alternative either repeats a letter or hides half of a source.
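
The counting argument can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: the function names `internal_cuts` and `candidate_blends` are hypothetical, and the enumeration scheme (a prefix of one name joined to a suffix of the other, plus both full joins) is an assumption about how candidates might be built.

```python
def internal_cuts(name: str) -> list[int]:
    """Cut positions that leave at least one character on each side."""
    return list(range(1, len(name)))


def candidate_blends(first: str, second: str) -> list[str]:
    """Full joins plus every prefix+suffix combination, in both orders."""
    results = []
    for a, b in ((first, second), (second, first)):
        results.append(a + b.lower())  # full join keeps both inputs whole
        for i in internal_cuts(a):
            for j in internal_cuts(b):
                results.append(a[:i] + b[j:].lower())
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(results))


print(internal_cuts("Bo"))           # [1]
print(candidate_blends("Bo", "Jo"))  # ['Bojo', 'Bo', 'Jobo', 'Jo']
```

Under these assumptions the only candidates besides the two full joins are the original names themselves, each of which hides the other source entirely.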

Human decision: Decide whether the four-letter full join is actually better than Bo/Jo. The slash form is only one character longer and may be clearer.

Case 2: Ann + Lee

Current leader: Anee
Structural score: 94

Short names create another issue: trimming letters to smooth a join can weaken recognition. Anee is compact, but it keeps only "An" and "ee", so a reader may interpret it as an ordinary given name rather than a pairing.

Human decision: Test recognition without showing the inputs. If nobody sees both sources, clarity should beat the score.

Case 3: Chloe + Mason

Current leader: Masloe
Structural score: 97

The first engine version overrated Masohloe because "hloe" preserved four of Chloe's five letters. That was a useful failure: retaining more of a source is not the same as being pronounceable. Version 2026.07.03-3 penalizes suffix fragments that begin with a difficult consonant pair, such as the "hl" in "hloe".

Masloe is structurally cleaner, but it is still not an obvious spoken word.
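
The suffix rule described above could look roughly like the following sketch. Everything here is an assumption made for illustration: the names `starts_with_consonant_pair` and `join_penalty`, the penalty size of 10, the choice to treat every basic Latin consonant pair as "difficult", and the choice to count "y" as a consonant. The engine's actual list of difficult pairs is not published in this article.

```python
VOWELS = set("aeiou")  # assumption: basic Latin only, "y" counts as a consonant


def starts_with_consonant_pair(fragment: str) -> bool:
    """True when the first two characters are both basic Latin consonants."""
    head = fragment[:2].lower()
    return len(head) == 2 and all(
        ch.isascii() and ch.isalpha() and ch not in VOWELS for ch in head
    )


def join_penalty(suffix: str, penalty: int = 10) -> int:
    """Hypothetical penalty applied to a suffix fragment at the join."""
    return penalty if starts_with_consonant_pair(suffix) else 0


print(join_penalty("hloe"))  # 10, the fragment used in Masohloe
print(join_penalty("loe"))   # 0, the fragment used in Masloe
```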

Human decision: Compare Masloe with Chason, Chlason, and plain Chloe/Mason. Ask a reader to pronounce each before revealing the source names.

Case 4: Mary-Jane + O'Connor

Current leader: Marnor
Structural score: 97

The engine removes punctuation before making cuts. That keeps comparison deterministic, but it also erases information: Mary-Jane is compound and O'Connor contains a meaningful apostrophe.
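
A short sketch shows what the normalization discards and what a unit-preserving split would keep. The function names `strip_punctuation` and `source_units` are hypothetical, and the exact characters the engine removes are an assumption (here: anything that is not a letter).

```python
import re


def strip_punctuation(name: str) -> str:
    """Keep letters of any script; drop hyphens, apostrophes, and spaces."""
    return "".join(ch for ch in name if ch.isalpha())


def source_units(name: str) -> list[str]:
    """Split on hyphens, apostrophes, and whitespace so a reviewer can pick units."""
    return [part for part in re.split(r"[-'\u2019\s]+", name) if part]


print(strip_punctuation("Mary-Jane"))  # MaryJane
print(strip_punctuation("O'Connor"))   # OConnor
print(source_units("Mary-Jane"))       # ['Mary', 'Jane']
print(source_units("O'Connor"))        # ['O', 'Connor']
```

The first function makes inputs comparable but loses the boundary between Mary and Jane. The second keeps that boundary visible, which is the information a human needs when choosing which units to blend.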

Human decision: Choose which source units matter before generating. Mary + Connor, Jane + Connor, or a surname-based pairing may be more faithful than normalized full inputs.

Case 5: Zoë + Chloé

Current leader: Zoloé
Structural score: 93
Warnings: 1

Unicode letters and diacritics are retained. The warning exists because the engine's vowel and consonant rules are intentionally limited to basic Latin letters. It can measure character contribution but cannot claim French pronunciation knowledge.
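
One way to produce such a warning is to flag any letter outside basic Latin while leaving the input unchanged. This is a hedged sketch: the function name `pair_warnings` and the message text are hypothetical, and the engine's real warning logic may differ.

```python
import unicodedata


def pair_warnings(first: str, second: str) -> list[str]:
    """Return one warning if either name has letters outside basic Latin."""
    # NFC composes "e" + combining diaeresis into a single letter, so an
    # accented letter is detected however the input was typed.
    text = unicodedata.normalize("NFC", first + second)
    uncovered = sorted({ch for ch in text if ch.isalpha() and not ch.isascii()})
    if not uncovered:
        return []
    return ["vowel/consonant rules do not cover: " + " ".join(uncovered)]


print(len(pair_warnings("Zoë", "Chloé")))  # 1
print(pair_warnings("Bo", "Jo"))           # []
```

The names themselves are never modified here; the check only records that the scoring rules have nothing reliable to say about the flagged letters.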

Human decision: Ask a speaker familiar with the names. Do not remove accents solely to satisfy an English-centric score.

Case 6: 小明 + 小红

Current leader: 小明小红
Structural score: 76
Warnings: 1

The engine can slice and join Unicode characters, but its current readability model cannot evaluate Chinese morphology, pronunciation, wordplay, or name-order conventions. A full join wins because structural heuristics have too little legitimate information.

Human decision: Treat the output as evidence that this engine is the wrong authority for the task. Use a relevant-language speaker and community conventions.

What changed because of this review

This case study caused three concrete engine changes:

Three-letter consonant runs now reduce readability.
Suffix fragments beginning with a two-consonant cut receive a join penalty.
Source balance now outweighs mechanical join quality.
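
The first change in the list above can be illustrated with a regular expression. This is a sketch under stated assumptions: the names `CONSONANT_RUN`, `consonant_runs`, and `readability_penalty` are hypothetical, "y" is counted as a consonant, only basic Latin letters are considered, and the per-run penalty of 5 is invented for the example.

```python
import re

# Three or more basic Latin consonants in a row (assumption: "y" is a consonant).
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{3,}", re.IGNORECASE)


def consonant_runs(candidate: str) -> list[str]:
    """Return every run of three or more consecutive consonants."""
    return CONSONANT_RUN.findall(candidate)


def readability_penalty(candidate: str, per_run: int = 5) -> int:
    """Hypothetical deduction: a fixed amount for each long consonant run."""
    return per_run * len(consonant_runs(candidate))


print(consonant_runs("Chlason"))       # ['Chl']
print(consonant_runs("Masloe"))        # []
print(readability_penalty("Chlason"))  # 5
```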

The overall score is also calibrated to stay below 100; the highest score in this study is 97. A tool that cannot evaluate meaning should not imply certainty.
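
A weighted sum with a ceiling is one simple way to express both ideas: source balance counting for more than join quality, and a top score that never reaches 100. The weights (5, 2, 3 out of 10), the ceiling of 99, and the function name `overall_score` are all assumptions chosen for illustration. The engine's real weights and calibration method are not stated in this article.

```python
def overall_score(balance: int, join: int, readability: int, ceiling: int = 99) -> int:
    """Combine three 0-100 component scores into one capped overall score.

    Hypothetical weights: balance 5/10, join 2/10, readability 3/10.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    weighted = (5 * balance + 2 * join + 3 * readability) // 10
    return min(ceiling, weighted)


print(overall_score(100, 100, 100))  # 99, never a perfect 100
print(overall_score(80, 100, 90))    # 87
```

With these weights, a candidate with perfect balance and a poor join still outranks one with a perfect join and poor balance, for example `overall_score(100, 0, 50)` against `overall_score(0, 100, 50)`.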

What remains unsolved

The next benchmark revisions need human-rated pronunciation data, community-specific naming conventions, and a clearer distinction between "generate a blend" and "find the established pairing name." We will not infer those from anonymous usage without review.

The machine-readable benchmark is available from the benchmark page. Corrections can be sent to [email protected] with the pair, result, language or fandom context, and engine version.

Last reviewed: July 3, 2026
Change note: First publication. Documents failures found while building benchmark version 2026.07.03-3.