Cross-Language Name Blending: What This Engine Cannot Judge
A technical limits report on Unicode normalization, accented names, non-Latin scripts, pronunciation, and why character support is not language support.
A form can accept a name's Unicode characters and still be unqualified to judge that name in its own language. Ship Name Lab makes that distinction explicit.
The current engine preserves letters from many scripts, but its pronunciation heuristics are built around a small Latin vowel and consonant model. A non-Latin result is generated as a character combination and automatically marked for human review.
What normalization does
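Before generation, the engine applies NFKC normalization. Unicode Standard Annex #15 defines the Unicode normalization forms (NFC, NFD, NFKC, and NFKD) and explains how canonically equivalent and compatibility-equivalent sequences can be transformed into consistent representations. NFKC applies compatibility decomposition followed by canonical composition, so it folds compatibility variants as well as canonical ones.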
For this tool, normalization reduces accidental differences such as width variants before cuts are compared. It does not translate a name, determine its language, or prove that two visually similar characters have the same cultural meaning.
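To make the difference between canonical and compatibility normalization concrete, the following sketch uses Python's standard-library `unicodedata.normalize` to compare NFC with NFKC on a few inputs. The helper name `compare_forms` is illustrative only and is not part of Ship Name Lab.

```python
import unicodedata


def compare_forms(text: str) -> tuple[str, str]:
    """Return the NFC and NFKC forms of text side by side."""
    return unicodedata.normalize("NFC", text), unicodedata.normalize("NFKC", text)


# Fullwidth letters are compatibility variants: NFC keeps them, NFKC folds them.
print(compare_forms("Ｚｏë"))  # ('Ｚｏë', 'Zoë')
# Halfwidth katakana are folded to standard katakana by NFKC only.
print(compare_forms("ﾊﾙｶ"))  # ('ﾊﾙｶ', 'ハルカ')
# Cyrillic U+0430 looks like Latin "a", but no normalization form unifies them.
print(unicodedata.normalize("NFKC", "\u0430") == "a")  # False
```

The last line illustrates the limit stated above: normalization removes encoding differences, not visual or cultural ones.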
After normalization, spaces, digits, punctuation, and symbols are removed. Letters and combining marks remain. This makes candidate generation repeatable, but it can also remove intentional separators from compound names.
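The cleanup described above can be sketched as NFKC followed by a filter on Unicode general categories. This is a minimal sketch assuming that "letters and combining marks" means categories starting with `L` and `M`; the function name `prepare_name` is hypothetical, and the engine's actual filter may differ.

```python
import unicodedata


def prepare_name(raw: str) -> str:
    """Hypothetical pre-generation cleanup: NFKC, then keep only letters and marks.

    Characters whose general category starts with "L" (letters) or "M"
    (combining marks) are kept; spaces, digits, punctuation, and symbols
    are dropped.
    """
    normalized = unicodedata.normalize("NFKC", raw)
    return "".join(ch for ch in normalized if unicodedata.category(ch)[0] in "LM")


print(prepare_name("Chloe\u0301"))  # Chloé (e + combining acute is recomposed)
print(prepare_name("Jean-Luc 2"))  # JeanLuc (hyphen, space, and digit removed)
print(prepare_name("小明 ！"))  # 小明 (fullwidth "！" folds to "!" and is removed)
```

The `Jean-Luc` case shows the trade-off: the output is repeatable, but the hyphen that marked a compound name is gone.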
What the Latin heuristic sees
For basic Latin text, the engine classifies a, e, i, o, u, and y as vowels. It uses those classes to find possible cut points and score the join between two fragments.
That model is deliberately narrow:
- accented letters are retained but are not all classified as vowels;
- transliteration is not performed;
- stress and syllable structure are unknown;
- digraphs and language-specific sound rules are unknown;
- character meaning and wordplay are unknown.
The score can therefore reflect structural features such as source length while still missing pronunciation entirely.
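A small Python sketch shows how a six-letter vowel set behaves on accented and non-Latin input. The names `VOWELS`, `is_vowel`, and `cut_points` are hypothetical, and the rule that a cut may fall wherever the vowel/non-vowel class changes is an assumption made for illustration; the engine's real cut and join rules are not documented here. This sketch treats every accented letter as a non-vowel, which is stricter than the list above ("not all classified as vowels").

```python
VOWELS = frozenset("aeiouy")  # the basic Latin vowel set described above


def is_vowel(ch: str) -> bool:
    """True only for the six basic Latin vowel letters (case-insensitive)."""
    return ch.lower() in VOWELS


def cut_points(name: str) -> list[int]:
    """Hypothetical cut rule: indices where the vowel/non-vowel class changes."""
    return [
        i
        for i in range(1, len(name))
        if is_vowel(name[i - 1]) != is_vowel(name[i])
    ]


print(is_vowel("e"), is_vowel("é"), is_vowel("ë"))  # True False False
print(cut_points("Chlo\u00e9"))  # [3, 4]: the final "é" is treated as a non-vowel
print(cut_points("はるか"))  # []: no kana is ever classified as a vowel
```

Under this assumed rule, a kana or Han name offers no interior cut points at all, because none of its characters belong to either class the model understands.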
Three benchmark observations
Zoë + Chloé
The current leader is Zoloé, score 93, with one language-review warning. The acute accent from Chloé survives, but the English-centric boundary model cannot determine whether the blend sounds natural in French or in the language context of the people actually being named.
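The engine's generation and scoring rules are not published in this report, but one simple way to see where a result like Zoloé can come from is to enumerate every prefix of the first name joined to every suffix of the second. The sketch below does only that; `blend_candidates` is hypothetical and applies no scoring at all.

```python
def blend_candidates(first: str, second: str) -> set[str]:
    """Hypothetical: join each non-empty prefix of `first` to each non-empty suffix of `second`."""
    return {
        first[:i] + second[j:]
        for i in range(1, len(first) + 1)
        for j in range(len(second))
    }


candidates = blend_candidates("Zoë", "Chloé")
print("Zoloé" in candidates)  # True: "Zo" + "loé"
print("ZoëChloé" in candidates)  # True: the unblended full join is also generated
print(len(candidates))  # 14: "Zoé" arises both as "Z" + "oé" and as "Zo" + "é"
print("はるかれん" in blend_candidates("はるか", "れん"))  # True
```

Both the blend and the full join appear as candidates; choosing between them is the scoring step, which is exactly where language-specific knowledge is missing.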
はるか + れん
The current leader is the full join はるかれん, score 76, with one warning. The engine can join the character sequences, but it does not analyze morae, readings, naming conventions, or the meaning of the result.
小明 + 小红
The current structural leader is the full join 小明小红, score 76, with one warning. This is not evidence that the full join is a good Chinese pairing name. It is evidence that the present model lacks the language-specific information it would need to rank more aggressive cuts.
All three runs are visible in the Name Pair Benchmark.
A safer review protocol
When either input contains a script or pronunciation system the engine does not model:
- Preserve the original spelling and accents.
- Treat every score as structural only.
- Ask a speaker familiar with the names to read the result.
- Check whether name order carries meaning in the relevant community.
- Search the result in the original script.
- Prefer full-name notation when a blend loses identity or creates an unintended word.
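Deciding whether this protocol applies can be partly automated. The following is a minimal sketch assuming that any letter or combining mark outside basic a to z counts as unmodeled; `MODELED_LETTERS`, `unmodeled_letters`, and `needs_language_review` are hypothetical names, and the engine's actual warning trigger is not documented here.

```python
import unicodedata

MODELED_LETTERS = frozenset("abcdefghijklmnopqrstuvwxyz")


def unmodeled_letters(name: str) -> list[str]:
    """Hypothetical: letters and marks that fall outside the basic Latin model."""
    return [
        ch
        for ch in unicodedata.normalize("NFKC", name)
        if unicodedata.category(ch)[0] in "LM" and ch.lower() not in MODELED_LETTERS
    ]


def needs_language_review(first: str, second: str) -> bool:
    """True when either input contains a letter the heuristic does not model."""
    return bool(unmodeled_letters(first) or unmodeled_letters(second))


print(needs_language_review("Zoë", "Chloé"))  # True: ë and é are outside the model
print(needs_language_review("はるか", "れん"))  # True: kana are outside the model
print(needs_language_review("Anna", "Ben"))  # False
print(unmodeled_letters("小明"))  # ['小', '明']
```

This check only detects unmodeled characters. A Latin-script name from a language with different sound rules passes it, so it cannot replace review by a speaker.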
Do not transliterate solely to obtain a higher score. Transliteration creates a second naming decision and may remove information the represented people consider essential.
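Transliteration itself is not shown here. The sketch below uses a simpler lossy rewrite, accent folding (NFKD decomposition, then dropping combining marks), only to illustrate how rewriting a name to suit the model can erase distinctions. `fold_accents` is a hypothetical helper built on the standard-library `unicodedata` module.

```python
import unicodedata


def fold_accents(name: str) -> str:
    """Lossy folding: decompose with NFKD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Zoë"))  # Zoe
print(fold_accents("Renée") == fold_accents("Renee"))  # True: two spellings collapse
print(fold_accents("が"))  # か: the voicing mark is removed, changing the sound
```

Each output is easier for a basic Latin model to score, and each has lost information that the person named may consider part of the name.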
Product boundary
Ship Name Lab will not relabel Unicode acceptance as multilingual intelligence. Language-specific ranking should be added only with reviewed examples, native-speaker evaluation, and separate documented tests. Until then, the warning is part of the result, not an error to suppress.
Primary technical source: Unicode Standard Annex #15: Unicode Normalization Forms
Benchmark: Current cross-script cases
Last reviewed: July 3, 2026
Engine checked: 2026.07.03-3
Change note: First publication. Separates Unicode character handling from language-specific judgment.