Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (NOPM)

Mamun, N. and Jassim, W.A. and Zilany, M.S.A. (2015) Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (NOPM). IEEE/ACM Transactions on Audio, Speech and Language Processing, 23 (4). pp. 760-773. ISSN 2329-9290, DOI https://doi.org/10.1109/taslp.2015.2401513.

Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumb...

Abstract

Sensorineural hearing loss (SNHL) is an increasingly prevalent condition that results from damage to the inner ear and reduces speech intelligibility. This paper proposes a new speech intelligibility prediction metric, the neurogram orthogonal polynomial measure (NOPM), which applies orthogonal moments to the auditory neurogram to predict speech intelligibility for listeners with and without hearing loss. Neurograms were created using a physiologically based computational model of the auditory periphery, which simulates the responses of auditory-nerve fibers to speech signals under quiet and noisy conditions. Krawtchouk moments, a well-known family of discrete orthogonal polynomial moments, were applied to extract features from the auditory neurogram. The predicted intelligibility scores were compared to subjective results, and NOPM fit the subjective scores well for both normal-hearing listeners and listeners with hearing loss. The proposed metric has a wider, more realistic dynamic range than comparable existing metrics such as the mean structural similarity index measure (MSSIM) and the neurogram similarity index measure (NSIM), and its predicted scores are also well separated as a function of hearing loss. The metric could also be extended to assessing hearing-aid and speech-enhancement algorithms.
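The Python sketch below illustrates the feature-extraction idea described in the abstract. It builds weighted (orthonormal) Krawtchouk polynomials with the standard three-term recurrence, computes 2-D Krawtchouk moments of a neurogram stored as a list of rows (characteristic-frequency channels by time bins), and compares a reference and a degraded neurogram. The function names, the parameter p = 0.5, the moment order, and especially the cosine-similarity comparison are hypothetical choices made for illustration; this is not the NOPM formula defined in the paper, and the toy neurograms are synthetic, not model output. The plain recurrence is only numerically adequate for modest matrix sizes.

```python
import math
import random


def weighted_krawtchouk(size, p=0.5):
    """Return K where K[n][x] is the weighted Krawtchouk polynomial of
    order n at point x (0 <= n, x < size). Rows are orthonormal."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    N = size - 1
    # Binomial weight w(x; p, N) = C(N, x) p^x (1 - p)^(N - x)
    w = [math.comb(N, x) * p**x * (1 - p) ** (N - x) for x in range(size)]
    # Squared norm rho(n; p, N) = ((1 - p) / p)^n / C(N, n)
    rho = [((1 - p) / p) ** n / math.comb(N, n) for n in range(size)]
    # Unweighted polynomials: K_0 = 1, K_1 = 1 - x / (N p), then
    # p(N-n) K_{n+1} = (p(N-n) + n(1-p) - x) K_n - n(1-p) K_{n-1}
    K = [[1.0] * size]
    if size > 1:
        K.append([1.0 - x / (N * p) for x in range(size)])
    for n in range(1, N):
        a = p * (N - n)
        K.append([((a + n * (1 - p) - x) * K[n][x] - n * (1 - p) * K[n - 1][x]) / a
                  for x in range(size)])
    # Scale by sqrt(w / rho) so that sum_x K[n][x] * K[m][x] = delta(n, m)
    return [[K[n][x] * math.sqrt(w[x] / rho[n]) for x in range(size)]
            for n in range(size)]


def krawtchouk_moments(image, order, p=0.5):
    """2-D moments Q[n][m] = sum_x sum_y Kr[n][x] Kc[m][y] image[x][y]
    for n, m < order (rows x = channels, columns y = time bins)."""
    rows, cols = len(image), len(image[0])
    Kr = weighted_krawtchouk(rows, p)
    Kc = weighted_krawtchouk(cols, p)
    return [[sum(Kr[n][x] * Kc[m][y] * image[x][y]
                 for x in range(rows) for y in range(cols))
             for m in range(min(order, cols))]
            for n in range(min(order, rows))]


def moment_similarity(ref, deg, order=8, p=0.5):
    """Hypothetical score: cosine similarity between the flattened
    low-order moment vectors of a reference and a degraded neurogram."""
    a = [v for row in krawtchouk_moments(ref, order, p) for v in row]
    b = [v for row in krawtchouk_moments(deg, order, p) for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins for neurograms (16 channels x 40 time bins)
    ref = [[1.0 + math.sin(0.3 * t + 0.5 * c) for t in range(40)] for c in range(16)]
    deg = [[v + random.gauss(0.0, 0.3) for v in row] for row in ref]
    print(moment_similarity(ref, ref), moment_similarity(ref, deg))
```

Because the weighted basis is orthonormal, keeping every moment would reconstruct the neurogram exactly; truncating to a low order keeps a compact, smoothed description of it, which is what makes moments attractive as comparison features.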

Item Type:	Article
Funders:	UNSPECIFIED
Additional Information:	ISI Document Delivery No.: CE4ZY. Times Cited: 0. Cited Reference Count: 60. Authors: Mamun, Nursadul; Jassim, Wissam A.; Zilany, Muhammad S. A. Manuscript received January 07, 2014; revised May 23, 2014; accepted January 25, 2015. Date of publication February 06, 2015; date of current version March 16, 2015. This work was supported by the University of Malaya under High Impact Research Grant UM.C/625/1/HIR/152 (MSAZ). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Wai-Yip Geoffrey Chan. Publisher: IEEE-Inst Electrical Electronics Engineers Inc, Piscataway.
Uncontrolled Keywords:	Auditory-nerve model, neurogram, orthogonal moment, sensorineural, hearing loss, speech intelligibility, AUDITORY-NERVE FIBERS, TEMPORAL FINE-STRUCTURE, IMAGE QUALITY, ASSESSMENT, HIGH SOUND LEVELS, PHENOMENOLOGICAL MODEL, FREQUENCY-MODULATION, RECEPTION THRESHOLD, FLUCTUATING NOISE, WORD, RECOGNITION, NORMAL-HEARING
Subjects:	T Technology > T Technology (General); T Technology > TA Engineering (General). Civil engineering (General)
Divisions:	Faculty of Engineering
Depositing User:	Mr Jenal S
Date Deposited:	22 Jul 2015 01:43
Last Modified:	08 Jul 2017 04:05
URI:	http://eprints.um.edu.my/id/eprint/13741
