Constrained training of artificial neural networks using labelled medical data of mixed quality
Abstract
The invention relates to a method (100) for supervised training of an artificial neural network for medical image analysis. The method comprises acquiring (S1) first and second sets of training samples, wherein the training samples comprise feature vectors and associated predetermined labels, the feature vectors being indicative of medical images and the labels pertaining to anatomy detection, to semantic segmentation of medical images, to classification of medical images, to computer-aided diagnosis, to detection and/or localization of biomarkers, or to quality assessment of medical images. The accuracy of the predetermined labels may be better for the second set of training samples than for the first set of training samples. The neural network is trained (S3) by reducing a cost function, which comprises a first part and a second part. The first part of the cost function depends on the first set of training samples, and the second part of the cost function depends on a first subset of training samples, the first subset being a subset of the second set of training samples. In addition, the second part of the cost function depends on an upper bound for the average prediction performance of the neural network for the first subset of training samples, and the second part of the cost function is configured to prevent the average prediction performance for the first subset of training samples from exceeding the upper bound.
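The two-part cost function described above might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: it assumes binary labels, assumes that "prediction performance" is measured as a mean error (lower is better), uses cross-entropy for the first part and squared error for the second part (two measures that are not affine functions of each other), and penalizes only the excess over the upper bound, raised to a power greater than one and scaled by a positive weight. All function and parameter names are hypothetical.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-sample binary cross-entropy (assumed error measure for the first part)."""
    eps = 1e-12  # avoids log(0)
    return -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))


def squared_error(probs, labels):
    """Per-sample squared error (assumed performance measure for the second part)."""
    return (probs - labels) ** 2


def two_part_cost(probs_first, labels_first, probs_subset, labels_subset,
                  upper_bound, weight=1.0, power=2.0):
    """Return (total, first_part, second_part) of the sketched cost function.

    probs_first / labels_first: predictions and labels for the first set.
    probs_subset / labels_subset: predictions and labels for the first subset
    of the second (more accurately labelled) set.
    """
    first_part = float(cross_entropy(probs_first, labels_first).mean())
    avg_performance = float(squared_error(probs_subset, labels_subset).mean())
    # Only the amount by which the average error exceeds the bound is penalized.
    excess = max(avg_performance - upper_bound, 0.0)
    second_part = weight * excess ** power
    return first_part + second_part, first_part, second_part
```

With this choice, the second part vanishes whenever the average error on the first subset stays at or below the upper bound, so the well-labelled samples act as a constraint rather than as an additional fitting target.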
Claims
exact text as granted — not AI-modified
The invention claimed is:
1. A computer-implemented method for supervised training of an artificial neural network for a medical image analysis, the method comprising:
acquiring first and second sets of training samples;
acquiring an upper bound for an average prediction performance of the neural network for a first subset of the second set of training samples; and
training the neural network by reducing a cost function,
wherein the training samples comprise feature vectors and associated predetermined labels, the feature vectors being indicative of medical images and the labels pertaining to at least one of anatomy detection, semantic segmentation of medical images, classification of medical images, computer-aided diagnosis, detection and/or localization of biomarkers, and quality assessment of medical images;
wherein the cost function comprises a first part and a second part;
wherein the first part of the cost function depends on the first set of training samples;
wherein the second part of the cost function depends on the first subset of training samples and the upper bound for the average prediction performance of the neural network for the first subset of training samples, and
wherein the second part of the cost function is configured to prevent the upper bound from exceeding the average prediction performance for the first subset of training samples.
2. The method according to claim 1 , wherein an average accuracy of the predetermined labels is better for the second set of training samples than for the first set of training samples.
3. The method according to claim 1 , wherein the second part of the cost function depends on a difference between the average prediction performance for the first subset of training samples and the upper bound for the average prediction performance for the first subset of training samples.
4. The method according to claim 3 , wherein the second part of the cost function depends on a power of the difference between the average prediction performance for the first subset of training samples and the upper bound for the average prediction performance for the first subset of training samples, the power being strictly greater than one.
5. The method according to claim 1 , wherein the second part of the cost function further depends on a positive weight.
6. The method according to claim 5 , further comprising: increasing the weight upon detecting that the average prediction performance for the first subset of training samples is larger than the upper bound for the average prediction performance for the first subset of training samples.
7. The method according to claim 1 , wherein the first part of the cost function is based on a first label prediction error measure;
wherein the second part of the cost function is based on a second label prediction error measure; and
wherein the first and second label prediction error measures are not affine functions of each other.
8. The method according to claim 1 , wherein the first part of the cost function further depends on a second subset of training samples from the second set of training samples.
9. The method according to claim 1 , wherein the cost function is reduced iteratively;
wherein an iteration comprises drawing a first mini-batch from the first set of training samples and computing an approximate gradient of the first part of the cost function based on the first mini-batch; and/or
wherein the iteration comprises drawing a second mini-batch from the first subset of training samples and computing an approximate gradient of the second part of the cost function based on the second mini-batch.
10. The method according to claim 9 , wherein the approximate gradient of the second part of the cost function is set to zero when an average prediction performance of the neural network for the second mini-batch is less than or equal to the upper bound for the average prediction performance for the first subset of training samples.
11. The method according to claim 9 ,
wherein the first part of the cost function further depends on a second subset of training samples from the second set of training samples;
wherein the iteration comprises drawing a third mini-batch from the second subset of training samples, and computing an approximate gradient of the first part of the cost function based on the third mini-batch; and
wherein a cardinality of the third mini-batch divided by a cardinality of the second subset of training samples is larger than the cardinality of the first mini-batch divided by the cardinality of the first set of training samples.
12. The method according to claim 1 , further comprising:
increasing a cardinality of the second set of training samples by selecting a training sample from the second set of training samples, transforming the selected training sample, and including the transformed training sample in the second set of training samples,
wherein transforming the selected training sample comprises acquiring an image of the selected training sample, transforming the acquired image, generating a feature vector indicative of the transformed image, and adapting a predetermined label of the selected training sample according to the transformation of the acquired image.
13. A system for supervised training of an artificial neural network for a medical image analysis, comprising:
a memory that stores a plurality of instructions; and
processor circuitry that couples to the memory and is configured to execute the plurality of instructions to:
acquire first and second sets of training samples;
acquire an upper bound for an average prediction performance of the neural network for a first subset of the second set of training samples; and
train the neural network by reducing a cost function,
wherein the training samples comprise feature vectors and associated predetermined labels, the feature vectors being indicative of medical images and the labels pertaining to at least one of anatomy detection, semantic segmentation of medical images, classification of medical images, computer-aided diagnosis, detection and/or localization of biomarkers, and quality assessment of medical images;
wherein the cost function comprises a first part and a second part;
wherein the first part of the cost function depends on the first set of training samples;
wherein the second part of the cost function depends on the first subset of training samples and the upper bound for the average prediction performance of the neural network for the first subset of training samples, and
wherein the second part of the cost function is configured to prevent the upper bound from exceeding the average prediction performance for the first subset of training samples.
14. A non-transitory computer-readable medium for storing executable instructions, which cause a method to be performed for supervised training of an artificial neural network for a medical image analysis, the method comprising:
acquiring first and second sets of training samples;
acquiring an upper bound for an average prediction performance of the neural network for a first subset of the second set of training samples; and
training the neural network by reducing a cost function,
wherein the training samples comprise feature vectors and associated predetermined labels, the feature vectors being indicative of medical images and the labels pertaining to at least one of anatomy detection, semantic segmentation of medical images, classification of medical images, computer-aided diagnosis, detection and/or localization of biomarkers, and quality assessment of medical images;
wherein the cost function comprises a first part and a second part;
wherein the first part of the cost function depends on the first set of training samples;
wherein the second part of the cost function depends on the first subset of training samples and the upper bound for the average prediction performance of the neural network for the first subset of training samples, and
wherein the second part of the cost function is configured to prevent the upper bound from exceeding the average prediction performance for the first subset of training samples.
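The iterative reduction of the cost function with mini-batches (claims 9 and 10) and the weight increase of claim 6 could be sketched roughly as below. This is a hedged illustration only: a logistic-regression model stands in for the neural network, "prediction performance" is assumed to be a mean squared error (lower is better), and the names `train`, `growth` and `max_weight`, as well as the cap on the weight, are assumptions introduced here and do not appear in the claims.

```python
import numpy as np


def sigmoid(z):
    # Clipping keeps np.exp from overflowing for extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def grad_first_part(w, x, y):
    """Approximate gradient of the first part: mean cross-entropy on a mini-batch."""
    p = sigmoid(x @ w)
    return x.T @ (p - y) / len(y)


def mean_sq_error(w, x, y):
    """Assumed performance measure: mean squared error of the predictions."""
    return float(np.mean((sigmoid(x @ w) - y) ** 2))


def grad_second_part(w, x, y, upper_bound, weight, power=2.0):
    """Approximate gradient of weight * max(0, mse - upper_bound) ** power."""
    p = sigmoid(x @ w)
    excess = float(np.mean((p - y) ** 2)) - upper_bound
    if excess <= 0.0:
        # Mini-batch performance is within the bound: gradient set to zero.
        return np.zeros_like(w)
    grad_perf = x.T @ (2.0 * (p - y) * p * (1.0 - p)) / len(y)
    return weight * power * excess ** (power - 1.0) * grad_perf


def train(x_first, y_first, x_sub, y_sub, upper_bound, steps=500, lr=0.5,
          batch=16, weight=1.0, growth=1.5, max_weight=100.0, seed=0):
    """Reduce the two-part cost iteratively; returns (parameters, final weight)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x_first.shape[1])
    for _ in range(steps):
        # First mini-batch from the first set, second from the first subset.
        i = rng.choice(len(y_first), size=min(batch, len(y_first)), replace=False)
        j = rng.choice(len(y_sub), size=min(batch, len(y_sub)), replace=False)
        g = grad_first_part(w, x_first[i], y_first[i])
        g = g + grad_second_part(w, x_sub[j], y_sub[j], upper_bound, weight)
        w = w - lr * g
        # Increase the weight while the bound is violated on the whole subset
        # (capped here purely for numerical stability, an assumption).
        if mean_sq_error(w, x_sub, y_sub) > upper_bound:
            weight = min(weight * growth, max_weight)
    return w, weight
```

In this sketch the weight is checked against the full first subset after every step; checking less often, or on a running average, would be an equally plausible design choice.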