Perceptual coding of audio signals
Abstract
A method is disclosed for determining estimates of the perceived noise masking level of audio signals as a function of frequency. By developing a randomness metric related to the Euclidean distance between (i) the actual amplitude and phase of each frequency component in each block of sampled values of the signal and (ii) the values predicted for these components from values in prior blocks, it is possible to form a tonality index that provides more detailed information for forming the noise masking function. Application of these techniques is illustrated in a coding and decoding context for audio recording or transmission. The noise spectrum is shaped based on a noise threshold and a tonality measure for each critical frequency band (bark).
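The randomness metric described in the abstract can be sketched as follows. This is a minimal illustration, assuming complex spectral values per block and following the prediction recited in claims 3 and 4 (add the difference between the two preceding blocks to the most recent block, separately for magnitude and phase) and the normalization recited in claim 6. The function names and the handling of an all-zero bin are hypothetical choices, not taken from the patent text.

```python
import cmath
import math
from typing import List, Sequence


def predict_component(prev1: complex, prev2: complex) -> complex:
    """Extrapolate one spectral value S(w_i) from the two preceding blocks.

    prev1 is from the immediately preceding block, prev2 from the block before.
    Magnitude and phase are extrapolated separately: value + (value - older value).
    """
    r1, r2 = abs(prev1), abs(prev2)
    phi1, phi2 = cmath.phase(prev1), cmath.phase(prev2)
    r_hat = r1 + (r1 - r2)          # may become negative; kept as-is (assumption)
    phi_hat = phi1 + (phi1 - phi2)  # wrapped phases differ by multiples of 2*pi, so the point is unchanged
    return complex(r_hat * math.cos(phi_hat), r_hat * math.sin(phi_hat))


def randomness(actual: complex, predicted: complex) -> float:
    """Euclidean distance between actual and predicted values, normalized by
    |actual| + |predicted|. The triangle inequality keeps the result in [0, 1]:
    0 means fully predictable (tonal), 1 means maximally unpredictable."""
    denom = abs(actual) + abs(predicted)
    if denom == 0.0:
        return 1.0  # hypothetical convention for an all-zero bin
    return abs(actual - predicted) / denom


def randomness_per_bin(
    current: Sequence[complex], prev1: Sequence[complex], prev2: Sequence[complex]
) -> List[float]:
    """Randomness metric for every bin of the current block."""
    return [randomness(c, predict_component(p1, p2)) for c, p1, p2 in zip(current, prev1, prev2)]
```

A bin carrying a steady sinusoid (constant magnitude, constant phase advance per block) is predicted almost exactly and scores close to 0, while a bin whose value flips sign between blocks scores 1.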
Claims
We claim:
1. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples, each such block having a discrete short-time spectrum, S(ω_i), i = 1, 2, ..., N, for each of said blocks, comprising predicting, for each block .Iadd.of audio signals.Iaddend., an estimate of the values for each S(ω_i) based on the values for S(ω_i) for one or more prior blocks, determining for each frequency, ω_i, a randomness metric based on the predicted value for each S(ω_i) and the actual value for S(ω_i) for each block, based on said randomness metrics, and the distribution of power with frequency in the block, determining the value of a tonality function as a function of frequency, and based on said tonality function, estimating the noise masking threshold at each ω_i for the block.
2. The method of claim 1 further comprising quantizing said S(ω_i) based on said noise masking threshold at each respective ω_i.
3. The method of claim 1 wherein said step of predicting comprises, for each ω_i, forming the difference between the value of S(ω_i) for the corresponding ω_i from the two preceding blocks, and adding said difference to the value for S(ω_i) from the immediately preceding block.
4. The method of claim 3, wherein said S(ω_i) is represented in terms of .[.its.]. magnitude and phase, and wherein said difference and adding are effected separately for the magnitude and phase of S(ω_i).
5. The method of claim 1, wherein said determining of said randomness metric is accomplished by calculating the Euclidean distance between said estimate of S(ω_i) and said actual value for S(ω_i).
6. The method of claim 5, wherein said determining of said randomness metric further comprises normalizing said Euclidean distance with respect to the sum of the magnitude of said actual magnitude for S(ω_i) and the absolute value of said estimate of S(ω_i).
7. The method of claim 1, wherein said estimating of the noise masking threshold at each ω_i comprises calculating an unspread threshold function, and modifying said unspread threshold function in accordance with a spreading function to generate a spread threshold function.
8. The method of claim 7, wherein said estimating of the noise masking threshold function further comprises modifying said spread threshold function in response to an absolute noise masking threshold for each ω_i to form a limited spread threshold function.
9. The method of claim 8, further comprising modifying said limited threshold function to eliminate any existing pre-echoes, thereby generating an output threshold function value for each ω_i.
10. The method of any of claims 1, 7, 8 or 9, further comprising the steps of generating an estimate of the number of bits necessary to encode S(ω_i), quantizing said S(ω_i) to form quantized representations of said S(ω_i) using said estimate of the number of bits, and providing to a medium a coded representation of said quantized values and information about how said quantized values were derived.
11. A method for processing an ordered sequence of coded signals comprising first code signals representing values of the frequency components of a block of values of an audio signal and second code signals representing information about how said first .Iadd.code .Iaddend.signals were derived to reproduce said audio signal with reduced perceptual error, said method comprising using said second .Iadd.code .Iaddend.signals to determine quantizing levels for said audio signal which reflect a reduced level of perceptual distortion, reconstructing quantized values for said frequency .[.content.]. .Iadd.components .Iaddend.of said audio signal in accordance with said quantizing levels, and transforming said reconstructed quantized .[.spectrum.]. .Iadd.values .Iaddend.to recover an estimate of the audio signal.
12. The method of claim 11 wherein said reconstructing comprises using said second .Iadd.code .Iaddend.signals to effect scaling of said quantized values.
13. The method of claim 11 wherein said reconstructing comprises applying a global gain factor based on said second .Iadd.code .Iaddend.signals.
14. The method of claim 11 wherein said reconstructing comprises determining quantizer step size as a function of frequency component.
15. The method of claim 11 wherein said second .Iadd.code .Iaddend.signals include information about the degree of coarseness of quantization as a function of frequency component.
16. The method of claim 11 wherein said second .Iadd.code .Iaddend.signals include information about the number of values of said audio signal that occur in each block.
.Iadd.17. A method of processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients, the method comprising, for each said block, the steps of: (a) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; (b) determining for frequency coefficients in each of said frequency groups a randomness metric, said randomness metrics reflecting the predictability of said frequency coefficients; (c) based on said randomness metrics, determining the value of a tonality function signal as a function of frequency; and (d) based on said tonality function signal, estimating a noise masking threshold for frequency coefficients in each frequency group..Iaddend.
.Iadd.18. The method of claim 17 further comprising quantizing at least one frequency coefficient in said first set of frequency coefficients based on said noise masking threshold for each frequency coefficient being quantized..Iaddend.
.Iadd.19. The method of claim 18 wherein said step of quantizing comprises assigning quantizing levels for each of said frequency coefficients in each of said frequency groups such that noise contributed by said quantizing falls below said noise masking threshold for the respective frequency group..Iaddend.
.Iadd.20. A method of processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients, the method comprising, for each said block, the steps of (a) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; and (b) generating a set of tonality index signals, said set of tonality index signals comprising a tonality index signal for each of said frequency groups, said set of tonality index signals being based on at least one of said first set of frequency coefficients corresponding to at least one previous block..Iaddend.
.Iadd.21. The method of claim 20 further comprising generating, based on the set of tonality index signals, a set of respective noise masking thresholds..Iaddend.
.Iadd.22. The method of claim 21 further comprising quantizing at least one frequency coefficient in said first set of frequency coefficients based on said noise masking threshold for the band comprising the frequency coefficient being quantized..Iaddend.
.Iadd.23. The method of claim 22 wherein said step of quantizing comprises assigning quantizing levels for each of said frequency coefficients in each of said frequency groups such that noise contributed by said quantizing falls below said noise masking threshold for each respective frequency coefficient..Iaddend.
.Iadd.24. A storage medium adapted for use with a decoder, the storage medium manufactured in accordance with a process comprising the steps of (a) processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients; and (b) for each block: (1) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; (2) determining for each of said frequency coefficients in said frequency groups a randomness metric, said randomness metrics reflecting the predictability of said frequency coefficients; (3) based on said randomness metrics, determining the value of a tonality function as a function of frequency; (4) based on said tonality function, estimating a noise masking threshold for each frequency group; (5) quantizing each of said frequency coefficients such that noise contributed by said quantizing falls below said noise masking threshold for the frequency group comprising the frequency coefficient being quantized; (6) applying a recording signal to said storage medium, thereby causing said storage medium to store said recording signal, said recording signal comprising signals representing (i) said quantized frequency coefficients; and (ii) side information for controlling said decoder in reconstructing said audio signal from said recording signal upon retrieval of said recording signal from said storage medium, said side information comprising quantizing information relating to said quantizing of frequency coefficients..Iaddend.
.Iadd.25. The method of claim 24 wherein said storage medium is a compact disc..Iaddend.
.Iadd.26. The method of claim 24 wherein said storage medium is a magnetic storage means..Iaddend.
.Iadd.27. A method of transmitting audio signals, the method comprising: (a) processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients; and (b) for each block: (1) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; (2) determining for each of said frequency coefficients in said frequency groups a randomness metric, said randomness metrics reflecting the predictability of said frequency coefficients; (3) based on said randomness metrics, determining the value of a tonality function as a function of frequency; (4) based on said tonality function, estimating a noise masking threshold for each frequency group; (5) quantizing each of said frequency coefficients such that noise contributed by said quantizing falls below said noise masking threshold for the frequency group comprising the frequency coefficient being quantized; (6) applying a transmission signal to a transmission medium, said transmission signal comprising signals representing said quantized frequency coefficients..Iaddend.
.Iadd.28. The method of claim 27 wherein said transmission medium is a broadcast transmission medium..Iaddend.
.Iadd.29. The method of claim 27 wherein said transmission medium is an electrical conducting medium..Iaddend.
.Iadd.30. The method of claim 27 wherein said transmission medium is an optical transmission medium..Iaddend.
.Iadd.31. The method of any of claims 17, 20, or 27 wherein said processing further comprises generating discrete frequency spectrum signals..Iaddend.
.Iadd.32. The method of claim 31 wherein said generating of discrete frequency spectrum signals comprises generating discrete Fourier coefficient signals..Iaddend.
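The threshold estimation recited in claims 1, 7, 8 and 17 (per-band randomness, then a tonality value, then an unspread threshold, then spreading, then limiting by an absolute threshold) can be sketched as below. This is an illustrative sketch under stated assumptions only: the logarithmic randomness-to-tonality mapping, the SNR offsets for tonal and noise-like maskers, and the spreading slopes are all hypothetical parameter choices that the claims do not specify, and the pre-echo step of claim 9 is omitted.

```python
import math
from typing import List, Sequence


def tonality_index(c: float) -> float:
    """Map an energy-weighted randomness value c (0 = tonal, 1 = noise-like)
    to a tonality index in [0, 1] (1 = tonal). Mapping constants are assumed."""
    c = max(c, 1e-12)  # guard log(0) for perfectly predictable bands
    return min(1.0, max(0.0, -0.299 - 0.43 * math.log(c)))


def band_thresholds(
    energy: Sequence[float],
    randomness: Sequence[float],
    band_edges: Sequence[int],
    absolute_threshold: Sequence[float],
    tonal_snr_db: float = 24.0,   # hypothetical offset for tonal maskers
    noise_snr_db: float = 6.0,    # hypothetical offset for noise-like maskers
    slope_up_db: float = 10.0,    # hypothetical spreading toward higher bands (dB per band)
    slope_down_db: float = 25.0,  # hypothetical spreading toward lower bands (dB per band)
) -> List[float]:
    """Masking threshold (as a power) for each band [band_edges[b], band_edges[b+1])."""
    n_bands = len(band_edges) - 1
    # 1. Unspread threshold per band from band energy and tonality.
    unspread = []
    for b in range(n_bands):
        lo, hi = band_edges[b], band_edges[b + 1]
        e = sum(energy[lo:hi])
        if e > 0.0:
            c = sum(energy[k] * randomness[k] for k in range(lo, hi)) / e
        else:
            c = 1.0  # empty band: treat as noise-like (assumption)
        t = tonality_index(c)
        snr_db = t * tonal_snr_db + (1.0 - t) * noise_snr_db
        unspread.append(e * 10.0 ** (-snr_db / 10.0))
    # 2. Spread each band's threshold into its neighbours (summed as powers).
    spread = []
    for i in range(n_bands):
        total = 0.0
        for j in range(n_bands):
            atten_db = slope_up_db * (i - j) if i >= j else slope_down_db * (j - i)
            total += unspread[j] * 10.0 ** (-atten_db / 10.0)
        spread.append(total)
    # 3. Never report a threshold below the absolute threshold for the band.
    return [max(s, a) for s, a in zip(spread, absolute_threshold)]
```

With equal band energy, a predictable (tonal) band receives a lower threshold than a random (noise-like) one, because a tonal masker is assumed to require a larger signal-to-noise margin.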
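Claims 19 and 11 through 15 describe quantizing so that the added noise stays below the per-band masking threshold, and a decoder that rebuilds values from quantized levels plus side information about quantizer coarseness. The sketch below is one hypothetical way to do this with a uniform quantizer, assuming positive per-band thresholds: it chooses each band's step from a worst-case error bound, so the guarantee holds for any input. The function names are illustrative, global gain (claim 13) is not modeled, and the inverse transform back to audio is omitted.

```python
import math
from typing import List, Sequence


def step_sizes(thresholds: Sequence[float], band_edges: Sequence[int]) -> List[float]:
    """Largest uniform step per band whose worst-case error power stays at or
    below that band's threshold. Rounding error is at most step/2 per
    coefficient, so a band of n coefficients has error power <= n * (step/2)**2."""
    steps = []
    for b, thr in enumerate(thresholds):
        if thr <= 0.0:
            raise ValueError("thresholds must be positive")
        n = band_edges[b + 1] - band_edges[b]
        steps.append(2.0 * math.sqrt(thr / n))
    return steps


def quantize(coeffs: Sequence[float], band_edges: Sequence[int], steps: Sequence[float]) -> List[int]:
    """Encoder side: integer level for every coefficient, using its band's step."""
    levels = []
    for b, step in enumerate(steps):
        for k in range(band_edges[b], band_edges[b + 1]):
            levels.append(round(coeffs[k] / step))
    return levels


def reconstruct(levels: Sequence[int], band_edges: Sequence[int], steps: Sequence[float]) -> List[float]:
    """Decoder side: steps act as side information on quantizer coarseness per band."""
    values = []
    for b, step in enumerate(steps):
        for k in range(band_edges[b], band_edges[b + 1]):
            values.append(levels[k] * step)
    return values
```

The worst-case bound is deliberately conservative; a coder that models rounding error as uniform (average power step**2 / 12 per coefficient) could use larger steps and fewer bits, at the cost of only meeting the threshold on average.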