Add Japanese and trilingual text normalization for numbers and symbols #18
Merged
Abandon-ht merged 5 commits into m5stack:dev on May 16, 2025
Streamline and simplify code in the SOLA module for improved readability and maintenance
Refactor SOLA component code
Implement regex-based text normalization functionality to support trilingual (CJE) content processing
Add text normalization for Chinese, Japanese, and English
Changes
Implemented a Japanese text normalization module that handles numbers, symbols, and special characters
Added trilingual (Chinese/Japanese/English) text normalization support
Created regex patterns for converting numbers and symbols into pronounceable text
Integrated the new normalization modules into the existing text processing pipeline
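As a rough illustration of the regex-based approach described in the list above, the following minimal sketch rewrites digits and a few symbols into Japanese text that a TTS front end could pronounce. It is not the code from this PR: the names `SYMBOLS`, `int_to_kanji`, and `normalize_ja`, the symbol readings, and the 0 to 9999 range are all assumptions chosen for the example.

```python
import re

# Hypothetical symbol table; the PR's actual mappings are not shown here.
SYMBOLS = {"%": "パーセント", "+": "プラス", "=": "イコール"}

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def int_to_kanji(n: int) -> str:
    """Convert an integer in 0..9999 to kanji numerals (illustrative range)."""
    if n == 0:
        return DIGITS[0]
    digits = str(n)
    parts = []
    for pos, ch in enumerate(digits):
        d = int(ch)
        if d == 0:
            continue  # zeros inside a number are not read out
        unit = UNITS[len(digits) - pos - 1]
        # A leading "one" is dropped before 十, 百, 千 (100 -> 百).
        parts.append(unit if d == 1 and unit else DIGITS[d] + unit)
    return "".join(parts)


NUMBER_RE = re.compile(r"\d+")
SYMBOL_RE = re.compile("|".join(re.escape(s) for s in SYMBOLS))


def _replace_number(match: re.Match) -> str:
    s = match.group()
    if len(s) <= 4:
        return int_to_kanji(int(s))
    # Assumption: longer digit runs are read digit by digit.
    return "".join(DIGITS[int(c)] for c in s)


def normalize_ja(text: str) -> str:
    """Replace digit runs first, then symbols, with pronounceable text."""
    text = NUMBER_RE.sub(_replace_number, text)
    return SYMBOL_RE.sub(lambda m: SYMBOLS[m.group()], text)
```

For example, `normalize_ja("50%")` yields `五十パーセント`. A real module would also need counters, decimals, dates, and larger units such as 万, which this sketch leaves out.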
Why
This enhancement improves pronunciation accuracy when synthesizing Japanese content and multilingual text containing numbers and symbols, ensuring more natural-sounding speech output across all supported languages.
Testing
Verified correct normalization of various Japanese numerical expressions
Tested with mixed language text containing numbers and special symbols
Confirmed proper pronunciation of normalized text through synthesized audio output
Compared results against expected pronunciations in each language