US4415767A · Expired · Utility · PatentIndex 88
Method and apparatus for speech recognition and reproduction

Assignee: VOTAN
Priority: Oct 19, 1981
Filed: Oct 19, 1981
Granted: Nov 15, 1983
Est. expiry: Oct 19, 2001 (expired) · nominal 20-yr term from priority
Inventors: GILL STEPHEN P; WAGNER LAWRENCE F; FRYE GREGORY G; BANTOWSKY KLAUS-PETER A
G10L 15/02
PatentIndex Score
102
Abstract

Speech signal analysis for data reduction, as stored for synthesis or recognition, is improved by features including: digital spectral analysis; reduction of channel data and bit allocation by selective summation of groups of contiguous data; using the mean average of the log amplitude to find the deviation for each channel; also using the instantaneous shape of the mean value for each channel for pairs of adjacent frames, all combined to find a feature ensemble for each pair of adjacent frames.
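
The feature steps named in the abstract (summing contiguous spectral data into channels, taking log amplitudes, finding each channel's deviation from the frame mean, and combining adjacent frames into one ensemble) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the band edges, the small constant added before the logarithm, and the use of a plain frame-to-frame difference for the pairwise term are all assumptions made for illustration.

```python
import math


def channel_log_amplitudes(spectrum, bands):
    """Sum energy over each band of contiguous bins and return its log.

    `bands` is a list of (start, stop) bin ranges; the ranges are
    hypothetical and not taken from the patent.
    """
    logs = []
    for start, stop in bands:
        energy = sum(abs(c) ** 2 for c in spectrum[start:stop])
        # Small offset (an assumption) avoids log(0) on silent bands.
        logs.append(math.log(energy + 1e-12))
    return logs


def frame_features(log_channels):
    """Return the frame mean and each channel's deviation from it."""
    mean = sum(log_channels) / len(log_channels)
    return mean, [value - mean for value in log_channels]


def pair_ensemble(frame_a, frame_b):
    """Combine two adjacent frames of log channel amplitudes.

    The 'slope' here is simply the change in frame mean between the two
    frames, a simplifying assumption for this sketch.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of channels")
    mean_a, dev_a = frame_features(frame_a)
    mean_b, dev_b = frame_features(frame_b)
    return {
        "mean": (mean_a + mean_b) / 2,
        "slope": mean_b - mean_a,
        "deviations": [(a + b) / 2 for a, b in zip(dev_a, dev_b)],
    }
```

Calling `pair_ensemble` on each consecutive pair of frames yields one record per pair; a stored template under this sketch would be the list of those records.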
Claims

exact text as granted — not AI-modified
We claim: 
     
       1. A method for providing a spectral analysis of an analog signal waveform comprising the steps of: dividing the total incoming analog signal into time frames of equal duration;   converting the analog signal to a sequence of discrete signal amplitudes at equally spaced time intervals in each frame;   transforming the sequence of discrete signal amplitudes to a sequence of complex spectral amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as: ##EQU12##  wherein k=time sequence index   n=frequency sequence index   r,t=integer summation indexes   m=time function parameter defining the number of retained bits   φ=phase adjustment function    and the subscripts (p-r) and (r-t) for n and k refer to bit locations in their binary representation with bit locations ranging from 0 to the maximum value p and subscript values outside this range representing vanishing values.
     
     
       2. The method of claim 1 wherein the phase adjustment function φ is defined as: ##EQU13## 
     
     
       3. The method of claim 1 wherein the phase adjustment function φ is zero. 
     
     
       4. The method of claim 1 wherein the transformation from a sequence of discrete signal amplitudes to a sequence of complex spectral amplitudes is accomplished by establishing a processing array; transferring the signal amplitude data to the array in accordance with the expression   $A^{0}(k_p, k_{p-1}, \ldots, k_0) = Z(k_p, k_{p-1}, \ldots, k_0)$   wherein $A^{0}$ represents the starting values of the array and Z represents the signal data in the form of binary digits;   starting from the original sequence of signal data substituting one bit of the spectral sequence n for one bit of the time sequence k in accordance with the expression: ##EQU14##  wherein $A^{r}$ = results of the r-th step of processing, beginning at r=0 and ending at r=p+1   determining the sequence of complex spectral amplitudes from the final step of the processing array in accordance with the formula:   $S(n_p, n_{p-1}, \ldots, n_0) = A^{p+1}(n_0, n_1, \ldots, n_p)$   wherein   S=the desired sequence of complex spectral amplitudes.
     
     
       5. A method for producing an analog signal waveform comprising the steps of: providing a predetermined series of digital signals representing a sequence of complex spectral amplitudes;   transforming the sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as: ##EQU15##  wherein k=time sequence index   n=frequency sequence index   r,t=integer summation indexes   m=time function parameter defining the number of retained bits   φ=phase adjustment function   converting the transformed digital data into an analog output signal.   
     
     
       6. The method of claim 5 wherein the phase adjustment function φ is defined as ##EQU16## 
     
     
       7. The method of claim 5 wherein the phase adjustment function φ is zero. 
     
     
       8. The method of claim 5 wherein the transformation from a sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes is accomplished by establishing a processing array; transferring the complex conjugate of the spectral amplitude data to the array in accordance with the expression   $A^{0}(n_p, n_{p-1}, \ldots, n_0) = S^{*}(n_p, n_{p-1}, \ldots, n_0)$   wherein $A^{0}$ represents the starting values of the array and $S^{*}$ represents the complex conjugate of the spectral amplitude data in the form of binary digits;   starting from the original sequence of spectral amplitude data one bit of the time sequence k is substituted for one bit of the spectral sequence n in accordance with the formula: ##EQU17##  wherein $A^{r}$ = results of the r-th step of processing, beginning at r=0 and ending at r=p+1, determining the sequence of time waveform amplitudes from the final step of the processing array in accordance with the formula:   $Z(k_p, k_{p-1}, \ldots, k_0) = \operatorname{Re} A^{p+1}(k_0, \ldots, k_p)$   wherein   Z=the desired sequence of time waveform amplitudes   $\operatorname{Re} A^{p+1}$ = the real part of complex values representing the final stage of processing.
     
     
       9. A method for producing audio analog output comprising the steps of: providing a predetermined series of encoded digital signals representing the analog output to be produced;   decoding the encoded signals to provide a sequence of complex spectral amplitudes;   transforming the sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as: ##EQU18##  wherein k=time sequence index   n=frequency sequence index   r,t=integer summation indexes   m=time function parameter defining the number of retained bits   φ=phase adjustment function;   converting the transformed digital data into an analog output signal.   
     
     
       10. The method of claim 9 wherein the encoded digital signals representing the analog output are provided from an external memory bank. 
     
     
       11. The method of claim 9 wherein the encoded digital signals representing the analog output are provided by performing a spectral analysis of an analog signal input to produce a digital voiceprint. 
     
     
       12. The method of claim 11 wherein the spectral analysis includes the steps of: dividing the total signal into time frames of equal duration;   converting the analog signal to a sequence of discrete signal amplitudes at equally spaced time intervals in each frame;   transforming the discrete signal amplitudes of each frame to a preselected number of spectral amplitudes representing values of various frequency components of the said series of signal amplitudes;   reducing the number of spectral coefficients of each frame by comparing the magnitude of each coefficient to a predetermined threshold value, and eliminating coefficients which are below the threshold;   reducing the number of bits describing each remaining coefficient to a predetermined maximum.   
     
     
       13. A method for producing a voiceprint template for recognition of an analog waveform signal comprising the steps of: dividing the total signal into time frames of equal duration;   converting the analog signal to a sequence of discrete signal amplitudes at equally spaced time intervals in each said frame;   transforming the discrete signal amplitudes of each frame to a preselected number of spectral amplitudes representing values of various frequency components of the said series of signal amplitudes;   compacting and converting the spectral amplitudes of each frame to a lesser number of channels, each channel being comprised of an energy summation of amplitudes within a designated frequency range expressed in logarithmic amplitudes, and allocated on the basis of predetermined acoustic significance;   deriving a mean amplitude value for all of said channels of each frame;   measuring a deviation from said mean value for each separate channel amplitude in each frame;   determining a feature ensemble for a plurality of successive frames of said total waveform signal; and   storing a digital representation of said feature ensembles for said total waveform signal to form a digital coded template thereof.   
     
     
       14. The method of claim 13 wherein each said feature ensemble is comprised of a pair of adjacent successive frames of the total waveform signal. 
     
     
       15. The method of claim 14 wherein each said feature ensemble is comprised of the average mean amplitude value of each frame pair, the slope of the difference in mean values of the same channel in the adjacent pair of frames, and the average amplitude deviation from the mean values for each channel of each frame pair. 
     
     
       16. A word recognition method comprising the steps of: providing a digital data template representing preselected acoustic features of a spoken word which include time-rates-of-change of spectral amplitudes;   receiving a spoken word to be compared and performing a spectral analysis thereof to determine data representing its acoustic features including time-rates-of-changes of spectral amplitudes;   comparing the template with the received spoken word spectral analysis data to determine a degree of similarity between features given by the metric function: ##EQU19##  where: d=degree of similarity   j=channel index   a=a scaling factor to account for normal rates of speech   b=a parameter for improving recognition performance   $\bar{x}$=mean amplitude value of spoken word template   $\bar{y}$=mean amplitude value of stored word template   $\dot{x}$=time-rate-of-change of spoken word template   $\dot{y}$=time-rate-of-change of stored word template   $\Delta x_j$=deviation of channel amplitude from mean value in spoken word template   $\Delta y_j$=deviation of channel amplitude from mean value in stored word template; and   producing an output in response to a predetermined degree of similarity between said template and said spoken word data.
     
     
       17. The method of claim 16 wherein said digital data template is retrieved from an external memory storage. 
     
     
       18. The method of claim 16 wherein said digital data template is established by providing an initial training word; performing a spectral analysis of said training word to produce said template; and temporarily storing said training word template before comparing it with the subsequently said received spoken word. 
     
     
       19. The method of claim 16 wherein the step of producing an output includes the sub step of providing stored digital data representing predetermined analog signals; and synthesizing said stored data to produce the analog signals. 
     
     
       20. A voice recognition system for producing a voiceprint template of an analog waveform signal comprising: means for converting an incoming analog signal to a sequence of discrete digital signals;   voice processor means including a timing generator for producing repetitive series of timing cycles, counter means for dividing the total incoming signal into time frames of equal length, sequence control means connected to said timing generator including ROM means for providing operating instructions for the processor during said timing cycles, an arithmetic logic unit for performing a spectral analysis of the received digital signals in response to instructions from said ROM means, said ROM means including instructions for: transforming the discrete signal amplitudes to a preselected number of spectral amplitudes representing values of various frequency components of the said series of signal amplitudes, compacting and converting the spectral amplitudes of each frame to a lesser number of channels, each channel being comprised of a summation of amplitudes within a designated frequency range allocated on the basis of predetermined acoustic significance, deriving a mean amplitude value for all of said channels of each frame, measuring a deviation from said mean value for each separate channel amplitude in each frame, and determining a feature ensemble for each pair of successive frames of said total waveform signal; and   external memory means for storing a digital representation of said feature ensembles for said total waveform signal comprising a digital coded template thereof.   
     
     
       21. A voice recognition system for producing a voiceprint template of an analog waveform signal comprising: means for converting an incoming analog signal to a sequence of discrete digital signals;   voice processor means including a timing generator for producing repetitive series of timing cycles, counter means for dividing the total incoming analog signal into time frames of equal length, sequence control means connected to said timing generator including ROM means for providing operating instructions for the processor during said timing cycles, means including an arithmetic logic unit for performing a spectral analysis of the received analog signal in response to instructions from said ROM means, said ROM means including instructions for transforming the discrete signal amplitudes of each frame to a sequence of complex spectral amplitudes each representing the magnitude and phase of a function V (n, k) defined as: ##EQU20##  wherein: k=time sequence index   n=frequency sequence index   r,t=integer summation indexes   m=time function parameter defining the number of retained bits   φ=phase adjustment function   said ROM means also including instructions for: compacting and converting the spectral amplitudes of each frame to a lesser number of channels, each channel being comprised of a summation of signal amplitudes within a designated frequency range allocated on the basis of predetermined acoustic significance; deriving a mean amplitude value for all of said channels of each frame; measuring a deviation from said mean value for each separate channel amplitude in each frame, and determining a feature ensemble for each pair of successive frames of said total waveform signal; and   external memory means for storing a digital representation of said feature ensembles for said total waveform signal comprising a digital coded template thereof.   
     
     
       22. The voice recognition system as described in claim 20 wherein said ROM means includes means providing instructions for transforming a sequence of discrete signal amplitudes to a sequence of complex amplitudes by establishing a processing array and transforming signal amplitude data to the array in accordance with the expression:   $A^{0}(k_p, k_{p-1}, \ldots, k_0) = Z(k_p, k_{p-1}, \ldots, k_0)$   wherein $A^{0}$ represents the starting values of the array and Z represents the signal data in the form of binary digits;   said ROM means including further instructions for substituting one bit of the spectral sequence n for one bit of the time sequence k, starting from the original sequence of signal data, in accordance with the expression: ##EQU21##  wherein: $A^{r}$ = results of the r-th step of processing, beginning at r=0 and ending at r=p+1   said ROM means including further instructions for determining the sequence of complex spectral amplitudes from the processing array in accordance with the expression:   $S(n_p, n_{p-1}, \ldots, n_0) = A^{p+1}(n_0, n_1, \ldots, n_p)$   wherein:   S=the desired sequence of complex spectral amplitudes.
     
     
       23. The voice recognition system as described in claim 22 wherein said voice processor includes means for comparing the voice template developed by spectral analysis of the analog signal with a second template stored in said external memory means. 
     
     
       24. The voice recognition system as described in claim 23 wherein said means for comparing includes ROM instruction means for determining a degree of similarity between features of the developed voice template and said second template in accordance with the function: ##EQU22## 
     
     
       25. The voice recognition system as described in claim 21 wherein said voice processor is in the form of an integrated circuit semiconductor device. 
     
     
       26. The voice recognition system as described in claim 21 wherein said voice processor is in the form of an integrated circuit semiconductor device that also includes said means for converting the incoming analog signal to digital signals. 
     
     
       27. A voice synthesis device comprising: means providing a predetermined series of digital signals representing a sequence of preselected complex spectral amplitudes;   means for transforming said sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as: ##EQU23##  wherein: k=time sequence index   n=frequency sequence index   r,t=integer summation indexes   m=time function parameter defining the number of retained bits   φ=phase adjustment function   and means for converting the transformed digital data into an analog output signal.   
     
     
       28. The voice synthesis device of claim 27 wherein said means for transforming includes: means for establishing a processing array and thereafter transferring the complex conjugate of the spectral amplitude data to the array in accordance with the expression:   $A^{0}(n_p, n_{p-1}, \ldots, n_0) = S^{*}(n_p, n_{p-1}, \ldots, n_0)$   wherein $A^{0}$ represents the starting values of the array and $S^{*}$ represents the complex conjugate of the spectral amplitude data in the form of binary digits; and also including means for determining the sequence of time waveform amplitudes from the final processing array in accordance with the formula:   $Z(k_p, k_{p-1}, \ldots, k_0) = \operatorname{Re} A^{p+1}(k_0, \ldots, k_p)$   wherein:   Z=the desired sequence of time waveform amplitudes   $\operatorname{Re} A^{p+1}$ = the real part of complex values representing the final stage of processing,   means for substituting one bit of the time sequence k for one bit of the spectral sequence n, starting from the original sequence of spectral amplitudes data in accordance with the formula: ##EQU24##  wherein: A^{r} = results of the r-th step of processing, beginning at r=0 and ending at r=p+1.
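
Claims 4, 8, 22 and 28 each read the final result out of the processing array with the index bits in reverse order: the output at index bits (n_p, ..., n_0) is taken from array position (n_0, ..., n_p). The Python sketch below illustrates that reordering step only. It does not attempt the V(n,k) transform itself, whose defining equations are not reproduced in this text. The function names are hypothetical, and the array length is assumed to be a power of two (p+1 index bits).

```python
def bit_reverse(index, bits):
    """Return `index` with its lowest `bits` binary digits reversed."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def reorder_bit_reversed(values):
    """Read out a final-stage array so that output[n] = values[reverse(n)].

    Assumes len(values) is a power of two; raises ValueError otherwise.
    """
    size = len(values)
    bits = size.bit_length() - 1
    if size == 0 or (1 << bits) != size:
        raise ValueError("length must be a power of two")
    return [values[bit_reverse(n, bits)] for n in range(size)]
```

For the synthesis direction in claims 8 and 28, the same readout would be followed by taking the real part of each complex value.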