ABSTRACT
Almost everyone uses a computer for one reason or another. For people with poor eyesight, reading text from a screen is a persistent problem, particularly when the font size is small. This has led to the design of a text-to-speech system capable of converting written text to speech. Speech synthesis systems are often called text-to-speech (TTS) systems in reference to their ability to convert text into speech. However, systems exist that instead render symbolic linguistic representations, such as phonetic transcriptions, into speech. A text-to-speech system is composed of two parts: a front-end and a back-end. Broadly, the front-end takes input in the form of text and outputs a symbolic linguistic representation. The back-end takes the symbolic linguistic representation as input and outputs the synthesized speech waveform. TTS software can "read" text from a document, Web page or e-book, generating synthesized speech through a computer's speakers. TTS can also convert text files into MP3 audio files that can then be transferred to a portable MP3 player or CD-ROM. This can save time by allowing the user to listen to reports or background materials while performing other tasks. TTS makes a critical difference to those with disabilities such as poor vision or visual dyslexia. People with speech loss can use specialized TTS programs to turn typed words into vocalization. TTS programs are also valuable to people learning new languages. This thesis was implemented using the Java programming language for front-end design and MySQL for data storage.
CHAPTER ONE
1.1 BACKGROUND OF THE STUDY
Language is the ability to express one’s thoughts by means of a set of signs (text), gestures, and sounds. It is a distinctive feature of human beings, who are the only creatures to use such a system. Speech is the oldest means of communication between people and it is also the most widely used. ‘Speech synthesis’, also called ‘text-to-speech synthesis’, is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer and can be implemented in software. A text-to-speech (TTS) system simply converts text to speech. Many computer operating systems have included speech synthesizers since the early 1990s. Recent progress in speech synthesis has produced synthesizers with very high intelligibility, but sound quality and naturalness still remain a major problem. However, the quality of present products has reached an adequate level for several applications, such as multimedia and telecommunications. This thesis presents a brief overview of the main text-to-speech synthesis problems and the initial work done in building a TTS system for English.
At first sight, this task does not look too hard to perform. After all, we all have a deep knowledge of the reading rules of our mother tongue. They were transmitted to us, in a simplified form, at primary school, and we improved them year after year. But in the context of TTS synthesis, it is impossible to record and store all the words of the language, so some other method has to be implemented for this purpose. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. A text-to-speech synthesizer allows people with visual impairments and reading disabilities to listen to written works on a home computer. The astrophysicist Stephen Hawking, who was almost completely paralyzed, gave his lectures using a TTS system.
Text-to-speech (TTS) synthesis is the automatic conversion of a text into speech that resembles, as closely as possible, a native speaker of the language reading that text. A Text-to-Speech/Audio system is the technology that lets a computer speak to the user. The TTS system takes text as its input; a computer algorithm called the TTS engine then analyses and pre-processes the text and synthesizes the speech using mathematical models. The TTS engine usually generates sound data in an audio format as its output. The TTS synthesis procedure consists of two main phases. The first is text analysis, where the input text is transcribed into a phonetic or some other linguistic representation. The second is the generation of speech waveforms, where the output is produced from this phonetic and prosodic information. These two phases are usually called high-level and low-level synthesis. The input text might be, for example, data from a word processor, standard ASCII from e-mail, a mobile text message, or scanned text from a newspaper. The character string is pre-processed and analyzed into a phonetic representation, which is usually a string of phonemes with some additional information for correct intonation, duration, and stress. Finally, the low-level synthesizer generates the speech sound from the information supplied by the high-level one. The artificial production of speech-like sounds has a long history, with documented mechanical attempts dating to the eighteenth century [O'Shaughnessy 2004].
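The two phases described above can be illustrated with a minimal Java sketch. It is not the implementation used in this thesis. The class name TwoPhaseTts, the two-word lexicon, the phoneme durations, the sample rate and the tone frequencies are all assumptions made for illustration, and the low-level stage only emits a placeholder tone for each phoneme rather than real speech. Intonation and stress are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy two-phase pipeline: text analysis (high level), then waveform generation (low level). */
final class TwoPhaseTts {
    static final int SAMPLE_RATE = 16_000; // samples per second (assumed value)

    /** One element of the phonetic representation: a phoneme symbol plus its duration. */
    static final class Phone {
        final String symbol;
        final int durationMs;

        Phone(String symbol, int durationMs) {
            this.symbol = symbol;
            this.durationMs = durationMs;
        }
    }

    // Hypothetical lexicon with ARPAbet-like symbols; a real system needs far more entries.
    private static final Map<String, List<String>> LEXICON = Map.of(
            "hello", List.of("HH", "AH", "L", "OW"),
            "world", List.of("W", "ER", "L", "D"));
    private static final Set<String> VOWELS = Set.of("AH", "OW", "ER");

    /** High-level synthesis: transcribe text into a string of phonemes with durations. */
    static List<Phone> analyze(String text) {
        List<Phone> phones = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            List<String> symbols = LEXICON.get(word);
            if (symbols == null) {
                continue; // words outside the toy lexicon are skipped in this sketch
            }
            for (String symbol : symbols) {
                // Assumed durations: vowels are held longer than consonants.
                phones.add(new Phone(symbol, VOWELS.contains(symbol) ? 120 : 80));
            }
        }
        return phones;
    }

    /** Low-level synthesis: turn the phoneme string into 16-bit samples (placeholder tones). */
    static short[] synthesize(List<Phone> phones) {
        int total = 0;
        for (Phone p : phones) {
            total += p.durationMs * SAMPLE_RATE / 1000;
        }
        short[] wave = new short[total];
        int pos = 0;
        for (Phone p : phones) {
            int n = p.durationMs * SAMPLE_RATE / 1000;
            double freq = VOWELS.contains(p.symbol) ? 220.0 : 110.0; // arbitrary tone per class
            for (int i = 0; i < n; i++) {
                wave[pos++] = (short) (8000 * Math.sin(2 * Math.PI * freq * i / SAMPLE_RATE));
            }
        }
        return wave;
    }

    public static void main(String[] args) {
        List<Phone> phones = analyze("Hello, world!");
        short[] wave = synthesize(phones);
        System.out.println(phones.size() + " phones, " + wave.length + " samples");
    }
}
```

The point of the sketch is the separation of concerns: analyze knows nothing about audio, and synthesize knows nothing about text, so either stage could be replaced without touching the other.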
Speech synthesis can be described as the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output.
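Concatenation of stored units can be sketched in Java as follows. This is a simplified illustration and not the synthesizer built in this work. The class name ConcatenativeSynthesizer and its methods are hypothetical, an in-memory map stands in for the speech database, and the units are joined end to end without the smoothing at unit boundaries that a practical system would need.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy concatenative synthesizer: looks up recorded units by name and joins their samples. */
final class ConcatenativeSynthesizer {
    // Stand-in for the speech database: unit name -> recorded 16-bit samples.
    private final Map<String, short[]> unitDatabase = new HashMap<>();

    /** Stores a recorded unit (a diphone, a word or a whole sentence) under a name. */
    void store(String unitName, short[] samples) {
        unitDatabase.put(unitName, samples.clone());
    }

    /** Joins the named units in order; fails if a unit was never recorded. */
    short[] synthesize(List<String> unitNames) {
        int total = 0;
        for (String name : unitNames) {
            short[] unit = unitDatabase.get(name);
            if (unit == null) {
                throw new IllegalArgumentException("No recording stored for unit: " + name);
            }
            total += unit.length;
        }
        short[] output = new short[total];
        int pos = 0;
        for (String name : unitNames) {
            short[] unit = unitDatabase.get(name);
            System.arraycopy(unit, 0, output, pos, unit.length);
            pos += unit.length;
        }
        return output;
    }
}
```

The same class serves both ends of the trade-off described above. Storing small units such as diphones lets many different utterances be assembled from few recordings, while storing whole words or sentences limits the output to what was recorded but avoids joins inside a word.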
1.2 STATEMENT OF THE PROBLEM
The importance of text cannot be overemphasized. Hardly anyone can pass on a message without including text in one form or another. This is a problem for the visually impaired, who find it hard to read text, especially when the font size is small. This has led to the development of a text-to-speech conversion system. People with learning disabilities or low literacy levels often get frustrated trying to browse the internet because so much of it is in text form.
In addition, the problem area in speech synthesis is very wide, even for speech synthesizers that have already been developed. Text pre-processing alone raises several problems, such as the handling of numerals, abbreviations, and acronyms. This system aims to address these problems by using a well-written synthesis algorithm for the conversion.
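To make the pre-processing problems concrete, the following Java sketch expands a few abbreviations, spells out acronyms letter by letter, and reads numerals from 0 to 999 as words. It is a toy under stated assumptions and not the algorithm used in this system. The class name TextNormalizer, the abbreviation table and the rule that any all-capital token is an acronym are hypothetical, and a token such as "42." with trailing punctuation is not recognized as a numeral.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy text pre-processor for numerals, abbreviations and acronyms. */
final class TextNormalizer {
    // Hypothetical abbreviation table; real text needs a larger, context-aware one.
    private static final Map<String, String> ABBREVIATIONS = Map.of(
            "Dr.", "doctor",
            "Mr.", "mister",
            "etc.", "et cetera");

    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

    /** Reads an integer from 0 to 999 as English words. */
    static String numberToWords(int n) {
        if (n < 0 || n > 999) {
            throw new IllegalArgumentException("Only 0..999 is supported in this sketch: " + n);
        }
        if (n < 20) {
            return ONES[n];
        }
        if (n < 100) {
            return TENS[n / 10] + (n % 10 == 0 ? "" : " " + ONES[n % 10]);
        }
        return ONES[n / 100] + " hundred" + (n % 100 == 0 ? "" : " " + numberToWords(n % 100));
    }

    /** Rewrites whitespace-separated tokens into speakable words. */
    static String normalize(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (ABBREVIATIONS.containsKey(token)) {
                out.add(ABBREVIATIONS.get(token));
            } else if (token.matches("\\d{1,3}")) {
                out.add(numberToWords(Integer.parseInt(token)));
            } else if (token.matches("[A-Z]{2,}")) {
                out.add(String.join(" ", token.split(""))); // spell the acronym letter by letter
            } else {
                // Ordinary word: lower-case it and drop punctuation.
                String word = token.toLowerCase(Locale.ROOT).replaceAll("[^a-z']", "");
                if (!word.isEmpty()) {
                    out.add(word);
                }
            }
        }
        return String.join(" ", out);
    }
}
```

For example, TextNormalizer.normalize("Dr. Smith has 42 TTS books.") returns "doctor smith has forty two T T S books". Even this small case shows why the problem is wide: "Dr." can also mean "drive", and a numeral may be a year, a quantity or a telephone digit sequence, none of which a fixed table can decide.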
Even for people who are able to see well enough to read, the process can often cause too much strain to be of any use or enjoyment. With text-to-speech, people with visual impairment can take in all manner of content in comfort rather than under strain.
1.3 OBJECTIVES OF STUDY
The main objective of this study is to design and implement a Text-to-Speech/Audio system. The system focuses on the following specific objectives:
- To design and implement a speech synthesizer that converts text to audio.
- To design and implement a system that can read out text at any frequency that the user specifies.
- To design and implement a speech synthesizer that can read out text in both female and male voices.
1.4 SIGNIFICANCE OF THE STUDY
The significance of this study is as follows:
1. The application will provide a platform to aid people with disabilities, especially with reading, and will help them get information easily and without stress.
2. The project could also help children learn how to pronounce words and how to read.
3. The study will serve as a foundation and guide for other research students interested in Text-to-Speech systems.