Method, medium, and system for music retrieval using modulation spectrum
Abstract
An audio information retrieval method, medium, and system that can rapidly retrieve audio information, even in noisy environments, by extracting a modulation spectrum that is robust against noise, converting features of the extracted modulation spectrum into hash bits, and using a hash table. The audio information retrieval method may include extracting a modulation spectrum from audio data of a compressed domain, converting the extracted modulation spectrum into fingerprint bits, arranging the fingerprint bits in a form of a hash table, converting a received query into an address by a hash function corresponding to the query, and retrieving the audio information by referring to the hash table.
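The retrieval flow summarized above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the per-frame values of one selected MDCT coefficient are already available, and it uses a hypothetical bit rule (1 where the modulation-spectrum magnitude rises from one bin to the next). The function names, the segment length, and the use of a (segment position, segment bits) pair as the hash key are illustrative assumptions.

```python
import cmath
from collections import defaultdict


def modulation_spectrum(envelope):
    """Magnitude DFT of one coefficient's values taken across frames."""
    n = len(envelope)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(envelope)))
        for k in range(n // 2 + 1)
    ]


def fingerprint_bits(spectrum):
    """Assumed bit rule: 1 where the magnitude rises to the next bin."""
    return [1 if b > a else 0 for a, b in zip(spectrum, spectrum[1:])]


def segments(bits, seg_len):
    """Split a fingerprint into equal-length segments used as hash keys."""
    return [(i // seg_len, tuple(bits[i:i + seg_len]))
            for i in range(0, len(bits) - seg_len + 1, seg_len)]


def build_table(database, seg_len):
    """Map every (position, segment) key to the clips that contain it."""
    table = defaultdict(set)
    for clip_id, bits in database.items():
        for key in segments(bits, seg_len):
            table[key].add(clip_id)
    return table


def bit_error_ratio(a, b):
    """Fraction of differing bits between two equal-length fingerprints."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def retrieve(query_bits, database, table, seg_len):
    """Collect candidates by segment lookup, then pick the lowest BER."""
    candidates = set()
    for key in segments(query_bits, seg_len):
        candidates |= table.get(key, set())
    if not candidates:
        return None
    return min(candidates,
               key=lambda c: bit_error_ratio(query_bits, database[c]))


# Hypothetical fingerprints; the query is clip "a" with one bit corrupted.
database = {"a": [0, 1, 1, 0, 1, 0, 0, 1], "b": [1, 1, 0, 0, 0, 1, 1, 0]}
table = build_table(database, seg_len=4)
query = [0, 1, 1, 0, 1, 0, 0, 0]
best = retrieve(query, database, table, seg_len=4)
```

Because each segment is looked up independently, a query whose errors fall in one segment can still reach the right candidate through another intact segment. The BER comparison then ranks the candidates.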
Claims
exact text as granted — not AI-modified
What is claimed is:
1. An audio information storage method, comprising:
generating a Modified Discrete Cosine Transformation-Modulation Spectrum (MDCT-MS) fingerprint database from audio data in corresponding compressed domains;
generating a hash table by dividing each MDCT-MS fingerprint in the MDCT-MS fingerprint database into segments;
extracting an MDCT-MS fingerprint from an audio clip;
dividing the extracted MDCT-MS fingerprint from the audio clip into segments and utilizing the audio clip segments as a hash value for referring to the MDCT-MS fingerprint database to retrieve a stored clip that matches the audio clip; and
acquiring unreliable bits with respect to MDCT-MS fingerprints by ranking deviation values of neighboring frames a corresponding MDCT-MS.
2. The method of claim 1 , further comprising calculating Bit Error Ratio (BER) values between the audio clip and indexed clips of the database, and comparing the calculated BER values to determine one of the indexed clips having a lowest BER value as a final result of the retrieving of the stored clip identical to the audio clip.
3. The method of claim 1 , wherein the generating of the hash table comprises:
dividing each MDCT-MS fingerprint into a plurality of segments, each segment having an identical length; and
generating the hash table by using the divided segments as the hash value.
4. The method of claim 1 , wherein the hash table corresponds to each segment of the MDCT-MS fingerprints.
5. The method of claim 1 , wherein the acquiring of the unreliable bits comprises acquiring the unreliable bits with respect to a corresponding MDCT-MS fingerprint by setting a predetermined threshold with respect to the deviation values of the neighboring frames of the corresponding MDCT-MS.
6. At least one non-transitory computer readable medium comprising computer readable code to control at least one processing element to implement the audio information storage method of claim 1 .
7. An audio information storage method, comprising:
generating a Modified Discrete Cosine Transformation-Modulation Spectrum (MDCT-MS) fingerprint database from audio data in corresponding compressed domains;
generating a hash table for the generated MDCT-MS fingerprint database based on corresponding unreliable-bits-toggled MDCT-MS fingerprints;
extracting an MDCT-MS fingerprint from an audio clip while calculating a hash value of the audio clip based on the unreliable-bits-toggled MDCT-MS fingerprints; and
referring to the MDCT-MS database to retrieve a clip that matches the audio clip based on the hash value of the audio clip.
8. The method of claim 7 , further comprising calculating Bit Error Ratio (BER) values between the audio clip and indexed clips and comparing the calculated BER values to determine one of the indexed clips having a lowest BER value as a final result of the retrieving of the clip matching the audio clip.
9. At least one non-transitory computer readable medium comprising computer readable code to control at least one processing element to implement the audio information storage method of claim 7 .
10. An audio information storage method, comprising:
generating a Modified Discrete Cosine Transformation-Modulation Spectrum (MDCT-MS) fingerprint database from audio data in corresponding compressed domains;
generating a hash table for the generated MDCT-MS fingerprint database by using corresponding peak points as a corresponding hash value;
calculating a hash value, based on peak points, of an audio clip and extracting an MDCT-MS fingerprint of the audio clip; and
referring to the MDCT-MS database to retrieve a clip that matches the audio clip, from clips that are maintained in the MDCT-MS fingerprint database, based on the calculated hash value of the audio clip.
11. The method of claim 10 , further comprising calculating Bit Error Ratio (BER) values between the audio clip and indexed clips and comparing the calculated BER values to determine at least one of the indexed clips having a lowest BER value as a final result of the retrieving of the clip matching the audio clip.
12. The method of claim 10 , wherein the corresponding hash value utilizes a corresponding first peak point and second peak point of the corresponding MDCT-MS.
13. The method of claim 12 , wherein corresponding hash value utilizes a distance between the corresponding first peak point and second peak point of the corresponding MDCT-MS.
14. The method of claim 10 , wherein the generating of the hash table further comprises generating the hash table by simultaneously utilizing information on a corresponding first peak point and second peak point of the corresponding MDCT-MS.
15. The method of claim 10 , wherein the retrieving of the audio clip further comprises retrieving the matching clip from the MDCT-MS fingerprint database based on peak point information of the audio clip.
16. The method of claim 10 , further comprising:
generating bits bias tolerance with respect to a corresponding first peak point and second peak point of the corresponding MDCT-MS.
17. At least one non-transitory computer readable medium comprising computer readable code to control at least one processing element to implement the audio information storage method of claim 10 .
18. An audio information storage system, comprising:
an audio fingerprint generation unit to extract a Modified Discrete Cosine Transformation-Modulation Spectrum (MDCT-MS) from audio data in a compressed domain and to generate an audio fingerprint of the audio data to be stored in a memory; and
an audio data retrieval unit to refer to a database to retrieve retrieval audio data corresponding to the generated audio fingerprint,
wherein the audio fingerprint generation unit comprises:
a Modified Discrete Cosine Transformation (MDCT) coefficient extraction unit to extract MDCT coefficients from the audio data in the compressed domain by partially decoding the audio data;
an MDCT coefficient selection unit to select an MDCT coefficient, existing in a frequency domain not affected by noise, from the extracted MDCT coefficients;
a modulation spectrum generation unit to perform a Discrete Fourier Transform (DFT) with respect to the selected MDCT coefficient and to generate an MDCT modulation spectrum (MDCT-MS) of the audio data; and
a bit unit to quantize features of the generated MDCT-MS according to a bit derivation method.
19. The system of claim 18 , wherein the bit unit ranks absolute values according to the bit derivation method, selects unreliable bits from quantized bits, and quantizes the selected unreliable bits to ‘0’ and ‘1’ from ‘1’ and ‘0’, respectively.
20. The system of claim 18 , further comprising:
a peak point extraction unit to extract peak points from the MDCT-MS features.
21. The system of claim 18 , wherein the audio data retrieval unit comprises:
a hash retrieval unit to generate a hash value from the generated audio fingerprint and to retrieve at least one candidate audio fingerprint from the database which matches the generated hash value by referring to a hash table;
a fingerprint retrieval unit to compare the at least one retrieved candidate audio fingerprint and the generated audio fingerprint and retrieving one of the at least one candidate audio fingerprint that has a bit error rate smaller than a predetermined reference value;
an information storage unit to store audio data information, each comprising corresponding candidate audio fingerprints; and
an information providing unit to provide a user with audio data information corresponding to the one of the at least one candidate audio fingerprint.
22. The system of claim 21 , wherein the hash retrieval unit comprises:
a hash value generation unit to extract an indexing bit from the generated audio fingerprint and to generate a hash value by a hash function;
a hash table storing hash values corresponding to addresses referring to each candidate audio fingerprint in the database and an address referring to each corresponding audio data information; and
a table retrieval unit to retrieve the one of the at least one candidate audio fingerprint which matches the generated hash value from the hash table.
23. The system of claim 21 , wherein the fingerprint retrieval unit comprises:
an audio fingerprint storage unit to convert the audio data into the generated audio fingerprint and to store the generated audio fingerprint;
a Bit Error Ratio (BER) calculation unit to calculating a BER value of the at least one candidate audio fingerprint and the generated audio fingerprint;
a comparison unit to compare a predetermined threshold and the calculated BER value;
an audio fingerprint detection unit to detect the one of the at least one candidate audio fingerprint as having a BER value smaller than the threshold; and
a threshold adjustment unit to adjust the threshold according to a result of the detection of the one of the at least one candidate audio fingerprint.
24. The system of claim 23 , wherein the threshold adjustment unit adjusts the threshold until only a single candidate audio fingerprint, of the at least one candidate audio fingerprints, is detected from the audio fingerprint detection unit.
25. An audio information storage system, to be referred to for retrieval of a stored audio data, corresponding to a query audio data input, using a hash function, comprising:
a Modified Discrete Cosine Transformation (MDCT) coefficient extraction unit to extract corresponding MDCT coefficients from audio data in corresponding compressed domains by partially decoding the audio data;
an MDCT coefficient selection unit to select a corresponding MDCT coefficient, existing in a frequency domain not affected by noise, from the extracted corresponding MDCT coefficients;
a modulation spectrum generation unit to perform a Discrete Fourier Transform (DFT) with respect to the selected corresponding MDCT coefficient and to generate a corresponding MDCT modulation spectrum (MDCT-MS) of the audio data to be stored in a memory;
a bit unit to quantize features of the generated corresponding MDCT-MS according to a bit derivation method; and
a storage to store a plurality of generated audio fingerprints in a database and/or to store a hash table corresponding to the plurality of generated audio fingerprints, based on results of the MDCT coefficient extraction unit, MDCT coefficient selection unit, modulation spectrum generation unit, and bit unit.
26. An audio retrieving method, comprising:
extracting, using at least one processor, a fingerprint from querying audio; comparing, using the at least one processor, the extracted querying audio fingerprint with one or more candidate audio fingerprints, respectively extracted from one or more candidate audios, by referring to a database; determining, using the at least one processor, which of the one or more candidate audio fingerprints matches the extracted querying audio fingerprint; and providing, using the at least one processor, audio information with respect to the one or more candidate audios corresponding to the determined one or more matching candidate audio fingerprints, wherein the comparing comprises:
generating a querying hash value using the extracted querying audio fingerprint;
adjusting the querying hash value by toggling determined unreliable bits according to the extracted querying audio fingerprint.
27. The method of claim 26, wherein the comparing further comprises:
comparing the querying hash value with one or more candidate hash values, corresponding to the one or more candidate audio fingerprints, by referring to a hash table; and determining which of the one or more candidate hash values matches the querying hash value.
28. The method of claim 27, wherein the generating includes dividing the extracted querying audio fingerprint into a plurality of segments, and utilizing each of the plurality of segments in the generating of the querying hash value.
29. The method of claim 27, wherein the generating of the querying hash value includes utilizing a peak point of a modulation spectrum of the querying audio according to the extracted querying audio fingerprint.
30. The method of claim 27, wherein the determining includes determining which of the candidate hash values matches the querying hash value based on determined error information, the determined error information being related to respective candidate audio fingerprints and the querying audio.
31. The method of claim 26,
wherein the extracting of the fingerprint of the querying audio includes converting a modulation spectrum of the querying audio, based on frequency transform, and wherein the modulation spectrum is generated by decoding the querying audio.
32. At least one non-transitory computer readable medium comprising computer readable code to control at least one processing device to implement the audio retrieving method of claim 26.
33. An audio retrieving apparatus, comprising:
a fingerprint retrieving unit which extracts a fingerprint to be stored in a memory from querying audio, compares the extracted querying audio fingerprint with one or more candidate audio fingerprints, respectively extracted from one or more candidate audios, by referring to a database, and determines which of the one or more candidate audio fingerprints matches the extracted querying audio fingerprint; and an information providing unit which provides audio information with respect to the one or more candidate audios corresponding to the determined one or more matching candidate audio fingerprints, wherein the fingerprint retrieving unit generates a querying hash value using the extracted querying audio fingerprint, and wherein the fingerprint retrieving unit is configured to adjust the querying hash value by toggling determined unreliable-bits according to the extracted querying audio fingerprint.
34. The apparatus of claim 33, wherein the fingerprint retrieving unit compares the querying hash value with one or more candidate hash values, corresponding to the one or more candidate audio fingerprints, by referring to a hash table, and determines which of the one or more candidate hash values matches the querying hash value.
35. The apparatus of claim 34, wherein the fingerprint retrieving unit divides the extracted querying audio fingerprint into a plurality of segments, and utilizes each of the plurality of segments in the generating of the querying hash value.
36. The apparatus of claim 34, wherein the fingerprint retrieving unit utilizes a peak point of a modulation spectrum of the querying audio according to the extracted querying audio fingerprint.
37. The apparatus of claim 34, wherein the fingerprint retrieving unit determines which of the candidate hash values matches the querying hash value based on determined error information, the determined error information being related to respective candidate audio fingerprints and the querying audio.
38. The apparatus of claim 33,
wherein the fingerprint retrieving unit converts a modulation spectrum of the querying audio, based on frequency transform, and wherein the fingerprint retrieving unit generates the modulation spectrum by decoding the querying audio.