A Study of Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis

Koriyama, Tomoki; Takamichi, Shinnosuke; Kobayashi, Takao

https://ipsj.ixsq.nii.ac.jp/records/194516

Name / File	License
IPSJ-SLP19126001.pdf (760.8 kB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2019-02-20

Title

GMMNに基づく音声合成におけるグラム行列のスパース近似の検討

Title (English)

A Study of Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis

Language

jpn

Resource type

technical report

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

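Author affiliation

Department of Information and Communications Engineering, School of Engineering, Tokyo Institute of Technology

Author affiliation

Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo

Author affiliation

Department of Information and Communications Engineering, School of Engineering, Tokyo Institute of Technology

Author name

Koriyama, Tomoki (郡山 知樹)
Takamichi, Shinnosuke (高道 慎之介)
Kobayashi, Takao (小林 隆夫)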

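Abstract

Aiming to realize speech synthesis with inter-utterance variation, as in human speech production, we have previously proposed a method for randomly generating speech parameters based on generative moment matching networks (GMMN). In a GMMN, a neural network is trained to minimize the conditional maximum mean discrepancy (CMMD), which represents the distance between distributions. When the training data is large, as in speech synthesis, computing the CMMD directly is impractical in terms of computational cost, so some approximation is required; however, approximation methods have not been sufficiently studied so far. In this study, as a way to compute the CMMD, we propose an approximation that applies random Fourier features (RFF) to the Gram matrices, which represent similarities between variables, and compare it with the conventional block-diagonal approximation. We also examine a minibatch selection method that, instead of the conventional random selection, uses K-means clustering to place similar input variables in the same minibatch. In a subjective evaluation experiment, the proposed method made inter-utterance variation easier to perceive than the conventional method.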
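
As an illustration of the random Fourier features idea mentioned in the abstract, the following minimal sketch approximates a Gaussian-kernel Gram matrix with an explicit feature map. It is not the report's implementation: the Gaussian kernel, the bandwidth `sigma`, the number of features, and the function names `rff_features` and `gaussian_gram` are all assumptions made for this example, and the CMMD objective itself is not computed here.

```python
import numpy as np

def rff_features(x, num_features, sigma, rng):
    """Map rows of x to random Fourier features for a Gaussian kernel.

    Assumed kernel: k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
    """
    d = x.shape[1]
    # Frequencies are drawn from the kernel's spectral density N(0, sigma^-2 I).
    w = rng.normal(0.0, 1.0 / sigma, size=(d, num_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)

def gaussian_gram(x, sigma):
    """Exact Gaussian-kernel Gram matrix, for comparison only."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))          # hypothetical input vectors
z = rff_features(x, num_features=4000, sigma=2.0, rng=rng)

k_exact = gaussian_gram(x, sigma=2.0)  # n x n, expensive for large n
k_approx = z @ z.T                     # formed here only to check the error
max_err = np.max(np.abs(k_exact - k_approx))
```

With the feature matrix `z` available, products involving the n × n Gram matrix can be replaced by products with the n × D feature matrix, which is where the cost saving comes from when n is large.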

Abstract (English)

To realize human-like synthetic speech, synthetic speech samples should differ each time, even when the same sentence is spoken. In this context, we have proposed a technique for random sampling of synthetic speech parameters based on a generative moment matching network (GMMN). A GMMN is a neural network whose parameters are trained using the conditional maximum mean discrepancy (CMMD), which represents the distance between two distributions. An issue with GMMN is that computing the CMMD is infeasible for a large amount of data, such as a speech synthesis database. In this report, we propose an approximation method based on random Fourier features and a minibatch selection technique using K-means clustering. In subjective evaluations, the proposed method outperformed the conventional one in the perception of inter-utterance variation.
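
The minibatch selection idea in the abstract, grouping similar input vectors into the same minibatch by K-means clustering, can be sketched as follows. This is a hedged illustration only: the abstract does not say how clusters are turned into minibatches, so the one-cluster-per-minibatch rule, the plain Lloyd iteration, and the names `kmeans` and `cluster_minibatches` are assumptions of this example.

```python
import numpy as np

def kmeans(x, k, num_iters, rng):
    """Plain Lloyd's algorithm; returns a cluster label for each row of x."""
    # Initialize centers from k distinct data points.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(num_iters):
        # Squared distance from every point to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def cluster_minibatches(x, k, rng, num_iters=20):
    """Assumed rule: each non-empty cluster becomes one minibatch of indices."""
    labels = kmeans(x, k, num_iters, rng)
    return [np.flatnonzero(labels == j) for j in range(k) if np.any(labels == j)]

rng = np.random.default_rng(0)
x = rng.normal(size=(300, 4))  # hypothetical input feature vectors
batches = cluster_minibatches(x, k=6, rng=rng)
```

Each element of `batches` is an index array, so `x[batches[0]]` selects the inputs of the first minibatch. Cluster sizes are not balanced in this sketch; a practical setup might split or merge clusters to control minibatch size.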

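Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Reports: Spoken Language Processing (SLP) (研究報告音声言語情報処理（SLP）)

Vol. 2019-SLP-126, No. 1, pp. 1-6, issued 2019-02-20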

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
