Intl.Segmenter【Internationalization: Text Segmentation】Object


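Notes

Overview

Splits text into segments at a chosen granularity (grapheme, word, or sentence)
- Grapheme: the smallest unit of text, such as a letter, digit, or symbol (a user-perceived character)
Handles surrogate pairs and emoji〔example〕
- Can be used to count characters
- See also surrogate pairs and emoji under String【String】
Not supported in Firefox 114.0.2 (2023-6-20)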

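Basic usage

〔Example〕

Create an Intl.Segmenter【Internationalization: Text Segmentation】with new
Get a Segments【segment collection】with the segment【Segment text】method
Retrieve the individual segments from the Segments【segment collection】(iterate in either of the following ways)
- Use [ @@iterator ]【Create iterator】to iterate with an iterator
- Use for-of【property value iteration】to iterate (for-of calls [ @@iterator ] internally)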
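The three steps above can be sketched as follows. This is a minimal illustration with an arbitrary English sample string; the variable names exist only for this example.

```javascript
// Step 1: create a segmenter (word granularity, English locale)
const wordSegmenter = new Intl.Segmenter('en', { granularity: 'word' });

// Step 2: segment() returns a Segments collection (not an array)
const segments = wordSegmenter.segment('Hello, world!');

// Step 3a: iterate by calling the iterator protocol directly
const viaIterator = [];
const iterator = segments[Symbol.iterator]();
for (let r = iterator.next(); !r.done; r = iterator.next()) {
  viaIterator.push(r.value.segment);
}

// Step 3b: for-of calls segments[Symbol.iterator]() internally
const viaForOf = [];
for (const { segment } of segments) {
  viaForOf.push(segment);
}

console.log(viaIterator); // ['Hello', ',', ' ', 'world', '!']
console.log(viaForOf);    // same contents as viaIterator
```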

External links

ECMA-402 (English)
Segmenter Objects
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)
Unicode® Emoji (English)
- Unicode® Emoji Charts


Constructor and methods

Syntax	Notes
new Intl.Segmenter( [ locales [ , options ] ] )	Constructor

Segmenter【Internationalization: Text Segmentation】methods		Notes
Intl.Segmenter.prototype.	resolvedOptions ( )	Get resolved locale and options
Intl.Segmenter.prototype.	segment ( string )	Segment text
Intl.Segmenter.	supportedLocalesOf ( locales [ , options ] )	Get supported locales

Segments【segment collection】methods (%SegmentsPrototype% in ECMA-402; Intl.Segments is not a global)		Notes
Intl.Segments.prototype	[ @@iterator ] ( )	Create iterator (implementation: `segments` [ Symbol.iterator ]( ))
Intl.Segments.prototype.	containing ( index )	Get the segment containing an index

Properties

Property		Notes
Intl.Segmenter.prototype	[ @@toStringTag ]	Tag (value: 'Intl.Segmenter'; implementation: `segmenter` [ Symbol.toStringTag ])
Intl.Segmenter.prototype.	constructor	Reference to the constructor
Intl.Segmenter.	prototype	Prototype
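As a small check of the properties listed above, the following sketch shows how `@@toStringTag`, `constructor`, and `prototype` relate to an instance. The locale `'en'` is an arbitrary choice.

```javascript
const segmenter = new Intl.Segmenter('en');

// [ @@toStringTag ] is the tag used by Object.prototype.toString
const tag = segmenter[Symbol.toStringTag];
const described = Object.prototype.toString.call(segmenter);

// constructor and prototype link the instance back to Intl.Segmenter
const sameConstructor = segmenter.constructor === Intl.Segmenter;
const sameProto = Object.getPrototypeOf(segmenter) === Intl.Segmenter.prototype;

console.log(tag);             // Intl.Segmenter
console.log(described);       // [object Intl.Segmenter]
console.log(sameConstructor); // true
console.log(sameProto);       // true
```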

new Intl.Segmenter【Constructor】

Notes

Overview

Creates an Intl.Segmenter【Internationalization: Text Segmentation】object
- Split text with segment【Segment text】

External links

ECMA-402 (English)
Intl.Segmenter ( [ locales [ , options ] ] )
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)
BCP 47 (Best Current Practice) (English)
- BCP 47
- The Registry
List of ISO 639-1 codes


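Syntax

new Intl.Segmenter( [locales[, options]] )

locales  Locale (a BCP 47 language tag, etc.)〔implementation-dependent〕
     Omitted: the default locale〔implementation-dependent〕
     String: a single locale
     Array of strings: multiple locales (the most suitable one is selected automatically)
options  An object combining the options listed below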

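locales (implementation-dependent)

BCP 47 language tags (examples)

Value	Notes
ja	Japanese
ja-JP	Japanese (Japan)
en-US	English (United States)
en-GB	English (United Kingdom)
de-DE	German (Germany)
fr-FR	French (France)

ISO 639-1 / 639-2 / 639-3 language codes (examples)

ISO 639-1	ISO 639-2	ISO 639-3	Notes
ja	jpn	jpn	Japanese
en	eng	eng	English
de	deu / ger	deu	German
fr	fra / fre	fra	French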

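options

Option	Values	Description
granularity	'grapheme' (default): grapheme (smallest unit); 'word': word; 'sentence': sentence	Segmentation granularity
localeMatcher	'lookup': Lookup algorithm; 'best fit' (default): best-fit algorithm (implementation-dependent)	Locale matching algorithm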
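A minimal sketch of how the options behave, assuming an engine that follows ECMA-402: omitting `granularity` resolves to `'grapheme'`, a value outside the table is rejected, and `localeMatcher` is used only during locale negotiation and is not reported by `resolvedOptions()`. The value `'character'` below is a deliberately invalid example.

```javascript
// granularity defaults to 'grapheme' when omitted
const defaults = new Intl.Segmenter('en').resolvedOptions();
console.log(defaults.granularity); // grapheme

// A granularity value outside 'grapheme' | 'word' | 'sentence' throws a RangeError
let errorName = null;
try {
  new Intl.Segmenter('en', { granularity: 'character' });
} catch (e) {
  errorName = e.name;
}
console.log(errorName); // RangeError

// localeMatcher only affects locale negotiation; it is not echoed back
const lookup = new Intl.Segmenter('en', { localeMatcher: 'lookup' }).resolvedOptions();
console.log('localeMatcher' in lookup); // false
```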

Example

const strJ = '隣の客はよく柿食う客だ。坊主が屏風に上手に坊主の絵を描いた。';
const segSentenceJ = new Intl.Segmenter('ja-JP', { granularity: 'sentence' });
for (const segment of segSentenceJ.segment(strJ)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 '隣の客はよく柿食う客だ。'
// 12 '坊主が屏風に上手に坊主の絵を描いた。'

const strE = 'She sells sea shells by the seashore. Peter Piper picked a peck of pickled peppers.';
const segSentenceE = new Intl.Segmenter('en-US', { granularity: 'sentence' });
for (const segment of segSentenceE.segment(strE)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 'She sells sea shells by the seashore. '
// 38 'Peter Piper picked a peck of pickled peppers.'

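Intl.Segmenter.prototype.resolvedOptions【Get resolved locale and options】

Notes

Overview

Returns the locale and options that were resolved when the segmenter was created

External links

ECMA-402 (English)
Intl.Segmenter.prototype.resolvedOptions ( )
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)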


Syntax

segmenter.resolvedOptions( )

Returns: an object with the following properties

Property	Notes
locale	Resolved locale
granularity	Segmentation granularity

Example

const segJ = new Intl.Segmenter('ja');
console.log(segJ.resolvedOptions());
// Sample output: {locale: 'ja', granularity: 'grapheme'}

const segE = new Intl.Segmenter('en', { granularity: 'sentence' });
console.log(segE.resolvedOptions());
// Sample output: {locale: 'en', granularity: 'sentence'}
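The resolved locale string can differ between engines, so the sketch below checks only which properties are reported and that the returned object is a fresh copy. It assumes an ECMA-402-conformant engine.

```javascript
const resolved = new Intl.Segmenter('en', { granularity: 'word' }).resolvedOptions();

// Only two properties are reported, in this order
console.log(Object.keys(resolved)); // ['locale', 'granularity']

// Each call returns a new object; mutating it does not change the segmenter
const segmenter = new Intl.Segmenter('en', { granularity: 'word' });
segmenter.resolvedOptions().granularity = 'sentence';
console.log(segmenter.resolvedOptions().granularity); // word
```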

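Intl.Segmenter.prototype.segment【Segment text】

Notes

Overview

Splits the text and returns a Segments【segment collection】
Iterate over the individual segments in either of the following ways
- Use [ @@iterator ]【Create iterator】to iterate with an iterator
- Use for-of【property value iteration】to iterate (for-of calls [ @@iterator ] internally)

External links

ECMA-402 (English)
Intl.Segmenter.prototype.segment ( string )
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)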


Syntax

segmenter.segment( string )

Returns: Segments【segment collection】
string  Text to segment
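Because each segment produced with `'word'` granularity carries `isWordLike`, a word count can be derived directly from `segment()`. The helper `countWords` below is hypothetical, and the count depends on the engine's word-break rules; the example also shows that a Segments object can be iterated more than once.

```javascript
// Hypothetical helper: count word-like segments in a string
function countWords(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    if (isWordLike) count++;
  }
  return count;
}

console.log(countWords('She sells sea shells by the seashore.')); // 7

// A Segments object is reusable: each spread creates a fresh iterator
const sentenceSegments = new Intl.Segmenter('en', { granularity: 'sentence' })
  .segment('One. Two.');
const firstPass = [...sentenceSegments].length;
const secondPass = [...sentenceSegments].length;
console.log(firstPass, secondPass); // 2 2
```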

Example

const strJ = '隣の客はよく柿食う客だ。坊主が屏風に上手に坊主の絵を描いた。';
const segSentenceJ = new Intl.Segmenter('ja-JP', { granularity: 'sentence' });
for (const segment of segSentenceJ.segment(strJ)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 '隣の客はよく柿食う客だ。'
// 12 '坊主が屏風に上手に坊主の絵を描いた。'

const strE = 'She sells sea shells by the seashore. Peter Piper picked a peck of pickled peppers.';
const segSentenceE = new Intl.Segmenter('en-US', { granularity: 'sentence' });
for (const segment of segSentenceE.segment(strE)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 'She sells sea shells by the seashore. '
// 38 'Peter Piper picked a peck of pickled peppers.'

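Intl.Segmenter.supportedLocalesOf【Get supported locales】

Notes

Overview

From the specified locales, returns those that are supported for segmentation (the default locale is not substituted)

External links

ECMA-402 (English)
Intl.Segmenter.supportedLocalesOf ( locales [ , options ] )
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)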


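Syntax

Intl.Segmenter.supportedLocalesOf( locales[, options] )

Returns: an array of the supported locales
locales  A BCP 47 language tag string, or an array of such strings
(see the locales details under new Intl.Segmenter【Constructor】)
options  Matching option (localeMatcher【locale matching algorithm】)
(see the options details under new Intl.Segmenter【Constructor】)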
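A typical use of `supportedLocalesOf` is choosing a locale before constructing a segmenter. `pickSegmenterLocale` below is a hypothetical helper, and the fallback `'en'` is an arbitrary assumption.

```javascript
// Hypothetical helper: pick the first candidate locale the engine supports
// for segmentation, otherwise return the given fallback
function pickSegmenterLocale(candidates, fallback = 'en') {
  const supported = Intl.Segmenter.supportedLocalesOf(candidates);
  return supported.length > 0 ? supported[0] : fallback;
}

// Result depends on the engine's locale data
console.log(pickSegmenterLocale(['en-US', 'ja-JP']));

// No candidates: the fallback is used
console.log(pickSegmenterLocale([])); // en

const segmenter = new Intl.Segmenter(pickSegmenterLocale(['ja-JP', 'en']), { granularity: 'word' });
```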

Example

// BCP 47 / ISO 639-1
console.log(Intl.Segmenter.supportedLocalesOf('ja'));
// Sample output: ['ja']

console.log(Intl.Segmenter.supportedLocalesOf('ng'));
// Sample output: []

let locales = ['ng', 'ja', 'ja-JP', 'en', 'en-US', 'en-GB'];
console.log(Intl.Segmenter.supportedLocalesOf(locales));
// Sample output: (5) ['ja', 'ja-JP', 'en', 'en-US', 'en-GB']

// ISO 639-2 / ISO 639-3 (returned in canonical ISO 639-1 form)
locales = ['ng', 'jpn', 'jpn-JP', 'eng', 'eng-US', 'eng-GB'];
console.log(Intl.Segmenter.supportedLocalesOf(locales));
// Sample output: (5) ['ja', 'ja-JP', 'en', 'en-US', 'en-GB']

console.log(Intl.Segmenter.supportedLocalesOf(locales, { localeMatcher: 'lookup' }));
// Sample output: (5) ['ja', 'ja-JP', 'en', 'en-US', 'en-GB']

Intl.Segments.prototype [ @@iterator ]【Create iterator】

Notes

Overview

Creates an iterator object
- Gives access to the elements of a Segments【segment collection】

External links

ECMA-402 (English)
%SegmentsPrototype% [ @@iterator ] ( )
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)


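Syntax

segments[ Symbol.iterator ]( )

Returns: an iterator object

for (const segment of segments) {
  // process segment (a segment object)
}

Segment object

Property	Type	Notes
segment	String	The segment text
index	Number	Index (in UTF-16 code units) where the segment starts in the input string
input	String	The input string that was segmented
isWordLike	Boolean	Whether the segment is word-like (implementation-dependent); true: word-like, false: otherwise (e.g. spaces, punctuation). Present only when granularity【segmentation granularity】is 'word'【word】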
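The sketch below illustrates the last row of the table: `isWordLike` exists only on segments produced with `granularity: 'word'`. The sample text `'Hi!'` is arbitrary; array destructuring works here because a Segments object is iterable.

```javascript
const text = 'Hi!';

// With granularity 'word', each segment object has isWordLike
const [firstWord] = new Intl.Segmenter('en', { granularity: 'word' }).segment(text);
console.log(firstWord.segment, firstWord.index, firstWord.isWordLike); // Hi 0 true

// With the default 'grapheme' granularity, isWordLike is absent
const [firstGrapheme] = new Intl.Segmenter('en').segment(text);
console.log(firstGrapheme.segment);             // H
console.log('isWordLike' in firstGrapheme);     // false
```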

Example

const strJ = '隣の客はよく柿食う客だ。';
const segWordJ = new Intl.Segmenter('ja-JP', { granularity: 'word' });
const segmentsJ = segWordJ.segment(strJ);

// [@@iterator]()
const iteratorJ = segmentsJ[Symbol.iterator]();
let result = iteratorJ.next();
while (! result.done) {
  console.log(result.value.index, result.value.segment, result.value.isWordLike);
  result = iteratorJ.next();
}
// Output:
// 0 '隣' true
// 1 'の' true
// 2 '客' true
// 3 'は' true
// 4 'よく' true
// 6 '柿' true
// 7 '食う' true
// 9 '客' true
// 10 'だ' true
// 11 '。' false

// for-of
for (const segment of segmentsJ) {
  console.log(segment.index, segment.segment, segment.isWordLike);
}
// Output:
// 0 '隣' true
// 1 'の' true
// 2 '客' true
// 3 'は' true
// 4 'よく' true
// 6 '柿' true
// 7 '食う' true
// 9 '客' true
// 10 'だ' true
// 11 '。' false

const strE = 'She sells sea shells by the seashore.';
const segWordE = new Intl.Segmenter('en-US', { granularity: 'word' });
const segmentsE = segWordE.segment(strE);

// [@@iterator]()
const iteratorE = segmentsE[Symbol.iterator]();
result = iteratorE.next();
while (! result.done) {
  console.log(result.value.index, result.value.segment, result.value.isWordLike);
  result = iteratorE.next();
}
// Output:
// 0 'She' true
// 3 ' ' false
// 4 'sells' true
// 9 ' ' false
// 10 'sea' true
// 13 ' ' false
// 14 'shells' true
// 20 ' ' false
// 21 'by' true
// 23 ' ' false
// 24 'the' true
// 27 ' ' false
// 28 'seashore' true
// 36 '.' false

// for-of
for (const segment of segmentsE) {
  console.log(segment.index, segment.segment, segment.isWordLike);
}
// Output:
// 0 'She' true
// 3 ' ' false
// 4 'sells' true
// 9 ' ' false
// 10 'sea' true
// 13 ' ' false
// 14 'shells' true
// 20 ' ' false
// 21 'by' true
// 23 ' ' false
// 24 'the' true
// 27 ' ' false
// 28 'seashore' true
// 36 '.' false

Intl.Segments.prototype.containing【Get the segment at an index】

Notes

Overview

Returns the segment that contains the code unit at the specified index

External links

ECMA-402 (English)
%SegmentsPrototype%.containing ( index )
ES2024 Intl (11) ES2023 Intl (10) ES2022 Intl (9)


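Syntax

segments.containing( index )

Returns: a segment object (see below)
undefined: index out of range
index  Index position (0 or greater)

(Segment object)

Property	Type	Notes
segment	String	The segment text
index	Number	Index (in UTF-16 code units) where the segment starts in the input string
input	String	The input string that was segmented
isWordLike	Boolean	Whether the segment is word-like (implementation-dependent); true: word-like, false: otherwise (e.g. spaces, punctuation). Present only when granularity【segmentation granularity】is 'word'【word】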
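`containing` is convenient for mapping a cursor or click position to a word. The helper `wordAt` below is hypothetical; it treats out-of-range indexes and non-word-like segments (spaces, punctuation) as "no word" and returns `null` for them.

```javascript
// Hypothetical helper: return the word at a position, or null when the
// position is out of range or falls on a space or punctuation
function wordAt(text, position, locale = 'en') {
  const segments = new Intl.Segmenter(locale, { granularity: 'word' }).segment(text);
  const hit = segments.containing(position); // undefined when out of range
  return hit && hit.isWordLike ? hit.segment : null;
}

const sample = 'She sells sea shells by the seashore.';
console.log(wordAt(sample, 11));            // sea
console.log(wordAt(sample, 13));            // null (space)
console.log(wordAt(sample, sample.length)); // null (out of range)
```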

Example

const strJ = '隣の客はよく柿食う客だ。';
const segWordJ = new Intl.Segmenter('ja-JP', { granularity: 'word' });
const segmentsJ = segWordJ.segment(strJ);
for (const segment of segmentsJ) {
  console.log(segment.index, segment.segment, segment.isWordLike);
}
// Output:
// 0 '隣' true
// 1 'の' true
// 2 '客' true
// 3 'は' true
// 4 'よく' true
// 6 '柿' true
// 7 '食う' true
// 9 '客' true
// 10 'だ' true
// 11 '。' false
console.log(segmentsJ.containing(-99));
// Output: undefined
console.log(segmentsJ.containing(4));
// Output: {segment: 'よく', index: 4, input: '隣の客はよく柿食う客だ。', isWordLike: true}
console.log(segmentsJ.containing(5));
// Output: {segment: 'よく', index: 4, input: '隣の客はよく柿食う客だ。', isWordLike: true}
console.log(segmentsJ.containing(99));
// Output: undefined

const strE = 'She sells sea shells by the seashore.';
const segWordE = new Intl.Segmenter('en-US', { granularity: 'word' });
const segmentsE = segWordE.segment(strE);
for (const segment of segmentsE) {
  console.log(segment.index, segment.segment, segment.isWordLike);
}
// Output:
// 0 'She' true
// 3 ' ' false
// 4 'sells' true
// 9 ' ' false
// 10 'sea' true
// 13 ' ' false
// 14 'shells' true
// 20 ' ' false
// 21 'by' true
// 23 ' ' false
// 24 'the' true
// 27 ' ' false
// 28 'seashore' true
// 36 '.' false
console.log(segmentsE.containing(-99));
// Output: undefined
console.log(segmentsE.containing(10));
// Output: {segment: 'sea', index: 10, input: 'She sells sea shells by the seashore.', isWordLike: true}
console.log(segmentsE.containing(11));
// Output: {segment: 'sea', index: 10, input: 'She sells sea shells by the seashore.', isWordLike: true}
console.log(segmentsE.containing(12));
// Output: {segment: 'sea', index: 10, input: 'She sells sea shells by the seashore.', isWordLike: true}
console.log(segmentsE.containing(99));
// Output: undefined

Examples

Basic processing

const strJ = '隣の客はよく柿食う客だ。坊主が屏風に上手に坊主の絵を描いた。';
const segGraphemeJ = new Intl.Segmenter('ja-JP', { granularity: 'grapheme' });
const segWordJ = new Intl.Segmenter('ja-JP', { granularity: 'word' });
const segSentenceJ = new Intl.Segmenter('ja-JP', { granularity: 'sentence' });

for (const segment of segGraphemeJ.segment(strJ)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 '隣'
// 1 'の'
// 2 '客'
// 3 'は'
// (omitted)
// 26 '描'
// 27 'い'
// 28 'た'
// 29 '。'

for (const segment of segWordJ.segment(strJ)) {
  console.log(segment.index, segment.segment, segment.isWordLike);
}
// Output:
// 0 '隣' true
// 1 'の' true
// 2 '客' true
// 3 'は' true
// 4 'よく' true
// 6 '柿' true
// 7 '食う' true
// 9 '客' true
// 10 'だ' true
// 11 '。' false
// 12 '坊主' true
// 14 'が' true
// 15 '屏風' true
// 17 'に' true
// 18 '上手' true
// 20 'に' true
// 21 '坊主' true
// 23 'の' true
// 24 '絵' true
// 25 'を' true
// 26 '描' true
// 27 'い' true
// 28 'た' true
// 29 '。' false

for (const segment of segSentenceJ.segment(strJ)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 '隣の客はよく柿食う客だ。'
// 12 '坊主が屏風に上手に坊主の絵を描いた。'

const strE = 'She sells sea shells by the seashore. Peter Piper picked a peck of pickled peppers.';
const segGraphemeE = new Intl.Segmenter('en-US', { granularity: 'grapheme' });
const segWordE = new Intl.Segmenter('en-US', { granularity: 'word' });
const segSentenceE = new Intl.Segmenter('en-US', { granularity: 'sentence' });

for (const segment of segGraphemeE.segment(strE)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 'S'
// 1 'h'
// 2 'e'
// 3 ' '
// (omitted)
// 79 'e'
// 80 'r'
// 81 's'
// 82 '.'

for (const segment of segWordE.segment(strE)) {
  console.log(segment.index, segment.segment, segment.isWordLike);
}
// Output:
// 0 'She' true
// 3 ' ' false
// 4 'sells' true
// 9 ' ' false
// 10 'sea' true
// 13 ' ' false
// 14 'shells' true
// 20 ' ' false
// 21 'by' true
// 23 ' ' false
// 24 'the' true
// 27 ' ' false
// 28 'seashore' true
// 36 '.' false
// 37 ' ' false
// 38 'Peter' true
// 43 ' ' false
// 44 'Piper' true
// 49 ' ' false
// 50 'picked' true
// 56 ' ' false
// 57 'a' true
// 58 ' ' false
// 59 'peck' true
// 63 ' ' false
// 64 'of' true
// 66 ' ' false
// 67 'pickled' true
// 74 ' ' false
// 75 'peppers' true
// 82 '.' false

for (const segment of segSentenceE.segment(strE)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 'She sells sea shells by the seashore. '
// 38 'Peter Piper picked a peck of pickled peppers.'
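To get plain string arrays instead of segment objects, the segment text can be mapped out with `Array.from`. `splitBy` is a hypothetical helper; the expected values assume the standard Unicode (UAX #29) break rules for English.

```javascript
// Hypothetical helper: split text into plain strings at the given granularity
function splitBy(text, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  // Array.from accepts any iterable plus a mapping function
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const text = 'Hi there. Bye.';
console.log(splitBy(text, 'grapheme').length); // 14
console.log(splitBy(text, 'word'));     // ['Hi', ' ', 'there', '.', ' ', 'Bye', '.']
console.log(splitBy(text, 'sentence')); // ['Hi there. ', 'Bye.']
```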

Surrogate pairs and emoji

const segGrapheme = new Intl.Segmenter('ja-JP'); // granularity defaults to 'grapheme'

// Surrogate pairs
const str =
  // U+53F1 (JIS level 1 kanji), U+308B
  '叱る'
  // U+20B9F (JIS level 3 kanji), U+308B
  + '𠮟る'
  + '。';
console.log(str);
// Output: 叱る𠮟る。
// String.length counts UTF-16 code units, not characters
console.log(str.length);
// Output: 6
console.log([...segGrapheme.segment(str)].length);
// Output: 5
for (const segment of segGrapheme.segment(str)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 '叱'
// 1 'る'
// 2 '𠮟'
// 4 'る'
// 5 '。'

// Emoji (the number in parentheses is each sequence's length in UTF-16 code units)
const strEmoji =
  // 👍 thumbs up (2)
  String.fromCodePoint(0x1F44D)
  // #️⃣ keycap: # (3)
  + String.fromCodePoint(0x0023, 0xFE0F, 0x20E3)
  // 👍🏽 thumbs up: medium skin tone (4)
  + String.fromCodePoint(0x1F44D, 0x1F3FD)
  // 🏴‍☠️ pirate flag (5)
  + String.fromCodePoint(0x1F3F4, 0x200D, 0x2620, 0xFE0F)
  // 🕵️‍♂️ man detective (6)
  + String.fromCodePoint(0x1F575, 0xFE0F, 0x200D, 0x2642, 0xFE0F)
  // 👁️‍🗨️ eye in speech bubble (7)
  + String.fromCodePoint(0x1F441, 0xFE0F, 0x200D, 0x1F5E8, 0xFE0F)
  // 🧑‍🤝‍🧑 people holding hands (8)
  + String.fromCodePoint(0x1F9D1, 0x200D, 0x1F91D, 0x200D, 0x1F9D1)
  // 👨‍👩‍👧‍👦 family: man, woman, girl, boy (11)
  + String.fromCodePoint(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467, 0x200D, 0x1F466)
  // 🏴󠁧󠁢󠁥󠁮󠁧󠁿 flag: England (14)
  + String.fromCodePoint(0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F)
  // (1)
  + '.';
console.log(strEmoji);
// Output: 👍#️⃣👍🏽🏴‍☠️🕵️‍♂️👁️‍🗨️🧑‍🤝‍🧑👨‍👩‍👧‍👦🏴󠁧󠁢󠁥󠁮󠁧󠁿.
// String.length counts UTF-16 code units, not characters
console.log(strEmoji.length);
// Output: 61
console.log([...segGrapheme.segment(strEmoji)].length);
// Output: 10
for (const segment of segGrapheme.segment(strEmoji)) {
  console.log(segment.index, segment.segment);
}
// Output:
// 0 '👍'
// 2 '#️⃣'
// 5 '👍🏽'
// 9 '🏴‍☠️'
// 14 '🕵️‍♂️'
// 20 '👁️‍🗨️'
// 27 '🧑‍🤝‍🧑'
// 35 '👨‍👩‍👧‍👦'
// 46 '🏴󠁧󠁢󠁥󠁮󠁧󠁿'
// 60 '.'
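Spread syntax or `Array.from` splits a string by code point, which handles surrogate pairs but still breaks emoji sequences such as 👍🏽 into parts; counting grapheme clusters with Intl.Segmenter avoids both problems. The sketch below is a minimal illustration; `truncateGraphemes` is a hypothetical helper name.

```javascript
// Grapheme segmenter shared by both helpers (granularity defaults to 'grapheme')
const graphemeSegmenter = new Intl.Segmenter('en');

// Count user-perceived characters (grapheme clusters)
function countGraphemes(str) {
  return [...graphemeSegmenter.segment(str)].length;
}

// Hypothetical helper: keep at most maxGraphemes grapheme clusters,
// never cutting a surrogate pair or emoji sequence in half
function truncateGraphemes(str, maxGraphemes) {
  let result = '';
  let count = 0;
  for (const { segment } of graphemeSegmenter.segment(str)) {
    if (count >= maxGraphemes) break;
    result += segment;
    count++;
  }
  return result;
}

const thumbs = String.fromCodePoint(0x1F44D, 0x1F3FD); // 👍🏽 thumbs up: medium skin tone
console.log(thumbs.length);             // 4 (UTF-16 code units)
console.log(Array.from(thumbs).length); // 2 (code points)
console.log(countGraphemes(thumbs));    // 1 (grapheme cluster)
console.log(truncateGraphemes(thumbs + 'ab', 2) === thumbs + 'a'); // true
```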

