Intl.Segmenter: isWordLike does not match Node.js and browsers #4370

@kytta

Which package?

@formatjs/intl-segmenter

Describe the bug

As described in #4184, the value of isWordLike in the output of Segmenter::segment() is almost always true (the only exception being newlines). This does not match the behaviour of Node.js and Chrome (both ICU4C) or Firefox Nightly (ICU4X), which report isWordLike as false for whitespace and punctuation segments.

To Reproduce

Codesandbox URL

https://codesandbox.io/p/sandbox/dank-smoke-cv9hl7

Reproducible Steps/Repo

Steps to reproduce the behavior:

  1. Make a string with spaces, dashes, etc.
  2. Segment it with the native Intl.Segmenter (word granularity) in Chrome, Node.js, or Firefox Nightly
  3. Segment the same string with the FormatJS polyfill
  4. Compare the isWordLike value of each segment
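
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the sandbox: `wordLikeFlags` and `diffFlags` are hypothetical helper names, and the sample string is made up. Only the native `Intl.Segmenter` is exercised here; passing the polyfill's constructor as the first argument (assuming the package exposes an `Intl.Segmenter`-compatible class) should give the second list for the comparison.

```javascript
// Hypothetical helper: collect [segment, isWordLike] pairs at word granularity.
// `SegmenterCtor` is any Intl.Segmenter-compatible constructor
// (the native one here; the polyfill's class is an assumption).
function wordLikeFlags(SegmenterCtor, text, locale = 'en') {
  const segmenter = new SegmenterCtor(locale, { granularity: 'word' });
  return Array.from(segmenter.segment(text), ({ segment, isWordLike }) => [
    segment,
    isWordLike,
  ]);
}

// Hypothetical helper: list positions where two results disagree on
// either the segment text or the isWordLike flag.
function diffFlags(expected, actual) {
  const length = Math.max(expected.length, actual.length);
  const mismatches = [];
  for (let i = 0; i < length; i++) {
    const [expSeg, expFlag] = expected[i] ?? [];
    const [actSeg, actFlag] = actual[i] ?? [];
    if (expSeg !== actSeg || expFlag !== actFlag) {
      mismatches.push({ index: i, expected: expected[i], actual: actual[i] });
    }
  }
  return mismatches;
}

// Made-up sample containing punctuation, spaces, and a dash.
const sample = 'Hello, world - test';
const nativeFlags = wordLikeFlags(Intl.Segmenter, sample);
console.log(nativeFlags);
```

With the native implementation, the comma, the spaces, and the dash come back with `isWordLike: false`. Calling `diffFlags(nativeFlags, polyfillFlags)` on the two results would then list every segment where the polyfill disagrees.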

Expected behavior

The FormatJS polyfill should match the native Intl.Segmenter, in that it marks punctuation as non-word-like (isWordLike: false).
