Sampling and optimization in phrase-based machine translation using an enriched language model representation
Abstract
Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q (n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q (n+1) for use in a next iteration of the rejection sampling. The refined WFSA q (n+1) is selected to satisfy the criteria p(x)≦q (n+1) (x)≦q (n) (x) for all x∈X and q (n+1) (x*)<q (n) (x*) where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s.
Claims
exact text as granted — not AI-modifiedThe invention claimed is:
1. A non-transitory storage medium storing instructions executable by an electronic data processing device to perform rejection sampling to acquire at least one accepted target language translation for a source language string s in accordance with a phrase-based statistical translation model
p ( x )= p ( t,a|s )
where t is a candidate translation, a is a candidate alignment comprising a source language-target language biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a, the rejection sampling using a proposal distribution comprising a weighted finite state automaton (WFSA) q (n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q (n+1) for use in a next iteration of the rejection sampling wherein the refined WFSA q (n+1) is selected to satisfy the criteria:
p ( x )≦ q (n+1) ( x )≦ q (n) ( x ) for all x∈X ; and
q (n+1) ( x *)< q (n) ( x *);
where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s.
2. The non-transitory storage medium of claim 1 , wherein:
the sample x* is rejected if it does not satisfy a no overlap constraint requiring that the alignment a represented by the sample x* consumes each word of the source language string s exactly once, and
the refinement performed when the no overlap constraint is not satisfied comprises (i) identifying a sub-sequence a k . . . a k+l of the sample x* that violates the no overlap constraint and (ii) refining WFSA q (n) to WFSA q (n+1) such that WFSA q (n+1) does not accept any sequence x that contains the sub-sequence a k . . . a k+l .
3. The non-transitory storage medium of claim 2 , wherein:
the rejection sampling sets p(x*)=0 if the sample x* does not satisfy the no overlap constraint, and
the refinement comprising operations (i) and (ii) is performed responsive to the condition p(x*)=0.
4. The non-transitory storage medium of claim 1 , wherein the sequence x represents the alignment a as a sequence of triples each including: a field E denoting exactly one word of the translation t, a field B denoting the biphrase generating E, and a field N denoting the count of words of the source language string s consumed by the biphrases of the alignment a up to and including the biphrase generating E.
5. The non-transitory storage medium of claim 4 , wherein:
the sample x* is rejected if it does not satisfy a no overlap constraint requiring that the alignment a represented by the sample x* consumes each word of the source language string s exactly once as determined based in part on whether the value of N for the last triple of the sample x* equals the number of words in s.
6. The non-transitory storage medium of claim 1 , wherein:
the rejection sampling obtains the sample x* by random sampling of the space X;
the rejection sampling accepts or rejects x* based on comparison of a ratio p(x*)/q(x*) with a random draw; and
the refined proposal distribution q (n+1) is selected to satisfy the criteria:
p ( x )≦ q (n+1) ( x )≦ q (n) ( x ) for all x∈X;
q (n+1) ( x *)< q (n) ( x *); and
a norm ∥ q (n+1) ∥ α is minimized where α<∞.
7. The non-transitory storage medium of claim 6 , wherein the rejection sampling acquires a plurality of accepted samples and terminates when a cumulative acceptance rate of the rejection sampling exceeds a threshold.
8. The non-transitory storage medium of claim 6 , where α=1.
9. The non-transitory storage medium of claim 1 , wherein:
the rejection sampling obtains the sample x* such that q*=q (n) (x*) maximizes q (n) over the space X;
the rejection sampling accepts or rejects x* based on a difference between or ratio of q* and p(x*); and
the refined proposal distribution q (n+1) is selected to satisfy the criteria:
p ( x )≦ q (n+1) ( x )≦ q (n) ( x ) for all x∈X;
q (n+1) ( x *)< q (n) ( x *); and
a norm ∥ q (n+1) ∥ α =max{ q (n+1) ( x )} is minimized.
10. The non-transitory storage medium of claim 1 , wherein the non-transitory storage medium stores instructions executable by an electronic data processing device to perform sampling of the phrase-based statistical translation model p(t, a|s) by accumulating a set of accepted samples x* over a plurality of iterations of the rejection sampling.
11. The non-transitory storage medium of claim 10 , wherein each iteration obtains the sample x* by random sampling of the space X and accepts or rejects x* based on comparison of a ratio p(x*)/q(x*) with a random draw.
12. The non-transitory storage medium of claim 10 , wherein the rejection sampling terminates when a cumulative acceptance rate of the rejection sampling exceeds a threshold.
13. An apparatus comprising:
a non-transitory storage medium as set forth in claim 1 ; and
an electronic data processing device configured to execute instructions stored on the non-transitory storage medium.
14. A method of generating at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a source language-target language biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a, the method comprising:
performing iterative rejection sampling of p(x) by generating a sample x* and accepting or rejecting the sample x* based on a comparison of p(x*) and q(x*) where q is a proposal distribution comprising a weighted finite state automaton, the at least one target language translation corresponding to an accepted sample x*;
during the iterative rejection sampling, identifying a sample x* that includes a sub-sequence a k . . . a k+l violating a no overlap constraint requiring that the alignment a consumes each word of the source language string s exactly once; and
refining q so that it does not accept any sequence x that contains the sub-sequence a k . . . a k+l , the refined q being used in subsequent iterations of the iterative rejection sampling;
wherein the iterative rejection sampling including the identifying and refining operations is performed by an electronic data processing device.
15. The method of claim 14 , further comprising setting p(x*)=0 responsive to identifying a sample x* that includes a sub-sequence a k . . . a k+l violating the no overlap constraint, the refining being performed responsive to rejection of a sample x* for which p(x*)=0.
16. The method of claim 14 , wherein x represents the alignment a as a sequence of triples including: a field E denoting exactly one word of the translation t, a field B denoting the biphrase generating E, and a field N denoting the count of words of the source language string s consumed by the biphrases of the alignment a up to and including the biphrase generating E, and the identifying includes:
identifying a sample x* as including a sub-sequence a k . . . a k+l violating the no overlap constraint based on the value of N for the last triple in the sample x* being different from the number of words in the source language string s.
17. The method of claim 14 , wherein the performing of iterative rejection sampling comprises performing the OS* algorithm.
18. An apparatus comprising:
an electronic data processing device; and
a non-transitory storage medium storing instructions executable by the electronic data processing device to perform a method including:
performing rejection sampling of a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a source language-target language biphrase sequence generating the candidate translation t, and x represents the candidate alignment a as a sequence of triples each including: a field E denoting exactly one word of the translation t, a field B denoting the biphrase generating E, and a field N denoting the count of words of the source language string s consumed by the biphrases of the alignment a up to and including the biphrase generating E; and
during the rejection sampling, identifying a sample x* as including a sub-sequence a k . . . a k+l violating a no overlap constraint requiring that the alignment a consumes each word of the source language string s exactly once based on the value of N for the last triple in the sample x* being different from the number of words in the source language string s.
19. The apparatus of claim 18 , wherein the method further includes, responsive to identifying a sample x* that includes a sub-sequence a k . . . a k+l violating the no overlap constraint:
refining a proposal distribution used in the rejection sampling such that the proposal distribution no longer includes sequences x containing the sub-sequence a k . . . a k+l .
20. The apparatus of claim 18 , wherein the method further includes, responsive to identifying a sample x* that includes a sub-sequence a k . . . a k+l violating the no overlap constraint:
refining a proposal distribution comprising a weighted finite state automaton q used in the rejection sampling such that q no longer accepts any sequence x that contains the sub-sequence a k . . . a k+l .Cited by (0)
No later patents cite this yet.
References (0)
No backward citations on record.