Everything about Meteor totally explained
METEOR (Metric for Evaluation of Translation with Explicit ORdering) is a metric for the evaluation of machine translation output. The metric is based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision. It also has several features that are not found in other metrics, such as stemming and synonymy matching, along with the standard exact word matching. The metric was designed to fix some of the problems found in the more popular BLEU metric, and also to produce good correlation with human judgement at the sentence or segment level. This differs from the BLEU metric in that BLEU seeks correlation at the corpus level.

Results have been presented which give a correlation of up to 0.964 with human judgement at the corpus level, compared to BLEU's achievement of 0.817 on the same data set. At the sentence level, the maximum correlation with human judgement achieved was 0.403.
Algorithm
As with BLEU, the basic unit of evaluation is the sentence. The algorithm first creates an alignment (see illustrations) between two sentences: the candidate translation string and the reference translation string. The alignment is a set of mappings between unigrams. A mapping can be thought of as a line between a unigram in one string and a unigram in another string. The constraints are as follows: every unigram in the candidate translation must map to zero or one unigram in the reference translation, and vice versa. In any alignment, a unigram in one string cannot map to more than one unigram in another string.
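As a minimal sketch of this constraint, an alignment can be represented as a list of (candidate index, reference index) pairs. The function name `is_valid_alignment` and the index-pair representation are assumptions made for illustration; they are not taken from any METEOR implementation.

```python
def is_valid_alignment(mappings, cand_len, ref_len):
    """Check that no unigram on either side takes part in more than one mapping.

    `mappings` is a list of (candidate_index, reference_index) pairs.
    This representation is an illustrative assumption.
    """
    cand_used, ref_used = set(), set()
    for c, r in mappings:
        if not (0 <= c < cand_len and 0 <= r < ref_len):
            return False  # index falls outside one of the sentences
        if c in cand_used or r in ref_used:
            return False  # a unigram would be mapped to two partners
        cand_used.add(c)
        ref_used.add(r)
    return True


# Candidate "the cat" against reference "the cat sat": "sat" stays unmapped.
print(is_valid_alignment([(0, 0), (1, 1)], cand_len=2, ref_len=3))  # True
# Candidate word 0 is mapped twice, so this is not an alignment.
print(is_valid_alignment([(0, 0), (0, 1)], cand_len=2, ref_len=3))  # False
```

Unmapped unigrams are allowed on both sides, which is why the check only rejects repeated indices.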
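An alignment is created incrementally through a series of stages, which are controlled by modules. A module is simply a matching algorithm; for example, the "wn_synonymy" module maps synonyms using WordNet, while the "exact" module matches exact words. Examples are given as follows:

Examples of pairs of words which will be mapped by each module
| Module | Candidate | Reference | Match |
| Exact | good | good | Yes |
| Stemmer | goods | good | Yes |
| Synonymy | well | good | Yes |

Each stage is split up into two phases. In the first phase, all possible unigram mappings are collected for the module being used in this stage. In the second phase, the largest subset of these mappings is selected to produce an alignment as defined above. If there are two alignments with the same number of mappings, the one with the fewest crosses, that is, with the fewest intersections between two mappings, is chosen. From the two alignments shown, alignment (a) would be selected at this point. Stages are run consecutively, and each stage only adds to the alignment those unigrams which have not been matched in previous stages.

Once the final alignment is computed, the score is computed as follows. Unigram precision P is calculated as:

P = m / w_t

where m is the number of unigrams in the candidate translation that are mapped to unigrams in the reference translation, and w_t is the number of unigrams in the candidate translation. Unigram recall R is calculated as:

R = m / w_r

where w_r is the number of unigrams in the reference translation. Precision and recall are combined using a harmonic mean in which recall is weighted nine times more than precision:

Fmean = 10 * P * R / (R + 9 * P)

These measures only account for matches between single words. To take longer matching segments into account, a penalty p is computed for the alignment. The mapped unigrams are grouped into the fewest possible chunks, where a chunk is a set of unigrams that are adjacent in both the candidate and the reference. The penalty is calculated as:

p = 0.5 * (c / m)^3

where c is the number of chunks. The ratio c / m appears as Fragmentation in the examples below. The final score for a segment is:

Score = Fmean * (1 - p)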
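The staged, two-phase alignment procedure described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the reference implementation: the function names are hypothetical, `plural_stem` is a toy stand-in for a real stemmer, the synonymy module is omitted, crossings are counted over the whole alignment built so far, and the second phase uses an exhaustive search that is only practical for short sentences.

```python
def exact(a, b):
    return a == b


def plural_stem(a, b):
    # Toy stand-in for a real stemmer: only strips one trailing "s".
    strip = lambda w: w[:-1] if w.endswith("s") else w
    return strip(a) == strip(b)


def crossings(mappings):
    """Count pairs of mappings whose connecting lines intersect."""
    refs = [r for _, r in sorted(mappings)]
    return sum(1 for i in range(len(refs)) for j in range(i + 1, len(refs))
               if refs[i] > refs[j])


def run_stage(cand, ref, module, alignment):
    used_c = {c for c, _ in alignment}
    used_r = {r for _, r in alignment}
    # Phase 1: collect every mapping this module allows between unmatched unigrams.
    options = {
        c: [r for r, rw in enumerate(ref) if r not in used_r and module(cw, rw)]
        for c, cw in enumerate(cand) if c not in used_c
    }
    # Phase 2: exhaustive search for the largest one-to-one subset,
    # breaking ties by the fewest crossings.
    best_key, best_new = None, []

    def search(cands, chosen, taken):
        nonlocal best_key, best_new
        if not cands:
            key = (-len(chosen), crossings(alignment + chosen))
            if best_key is None or key < best_key:
                best_key, best_new = key, chosen
            return
        c, rest = cands[0], cands[1:]
        search(rest, chosen, taken)  # option: leave c unmapped
        for r in options[c]:
            if r not in taken:
                search(rest, chosen + [(c, r)], taken | {r})

    search(sorted(options), [], frozenset())
    return alignment + best_new


def align(cand, ref, modules=(exact, plural_stem)):
    alignment = []
    for module in modules:  # stages run consecutively
        alignment = run_stage(cand, ref, module, alignment)
    return sorted(alignment)


print(align("on the mat sat the cat".split(), "the cat sat on the mat".split()))
# [(0, 3), (1, 0), (2, 5), (3, 2), (4, 4), (5, 1)]
```

In this call the two occurrences of "the" can be mapped in two ways. The search keeps the variant with 8 crossings rather than the one with 11, even though the latter would produce longer adjacent runs.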
To calculate a score over a whole corpus, or collection of segments, the aggregate values for precision (P), recall (R) and the penalty (p) are taken and then combined using the same formula. The algorithm also works for comparing a candidate translation against more than one reference translation. In this case the algorithm compares the candidate against each of the references and selects the highest score.
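A rough sketch of both ideas follows. The text does not spell out how the aggregate values are formed, so summing the raw counts over all segments before applying the formula is an assumption here. The function names and the `(matches, chunks, cand_len, ref_len)` tuple layout are likewise hypothetical.

```python
def score_from_counts(matches, chunks, cand_len, ref_len):
    """Apply the segment-level formula to a set of counts."""
    if matches == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    fmean = 10 * precision * recall / (recall + 9 * precision)
    penalty = 0.5 * (chunks / matches) ** 3
    return fmean * (1 - penalty)


def best_reference_score(per_reference_counts):
    """Score one candidate against several references and keep the highest."""
    return max(score_from_counts(*counts) for counts in per_reference_counts)


def corpus_score(per_segment_counts):
    """Assumed aggregation: sum each count over all segments, then score once."""
    if not per_segment_counts:
        return 0.0
    totals = [sum(column) for column in zip(*per_segment_counts)]
    return score_from_counts(*totals)


# Counts are (matches, chunks, candidate length, reference length).
print(round(best_reference_score([(6, 6, 6, 6), (6, 1, 6, 6)]), 4))  # 0.9977
print(corpus_score([(6, 6, 6, 6), (6, 1, 6, 6), (6, 2, 7, 6)]))
```

Under this assumption the corpus score is not the average of the segment scores, because the penalty is computed once from the pooled chunk and match counts.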
Examples
| Reference | the | cat | sat | on | the | mat |
| Hypothesis | on | the | mat | sat | the | cat |
Score: 0.5000 = Fmean: 1.0000 * (1 - Penalty: 0.5000)
Fmean: 1.0000 = 10 * Precision: 1.0000 * Recall: 1.0000 / (Recall: 1.0000 + 9 * Precision: 1.0000)
Penalty: 0.5000 = 0.5 * (Fragmentation: 1.0000 ^3)
Fragmentation: 1.0000 = Chunks: 6.0000 / Matches: 6.0000
| Reference | the | cat | sat | on | the | mat |
| Hypothesis | the | cat | sat | on | the | mat |
Score: 0.9977 = Fmean: 1.0000 * (1 - Penalty: 0.0023)
Fmean: 1.0000 = 10 * Precision: 1.0000 * Recall: 1.0000 / (Recall: 1.0000 + 9 * Precision: 1.0000)
Penalty: 0.0023 = 0.5 * (Fragmentation: 0.1667 ^3)
Fragmentation: 0.1667 = Chunks: 1.0000 / Matches: 6.0000
| Reference | the | cat | | sat | on | the | mat |
| Hypothesis | the | cat | was | sat | on | the | mat |
Score: 0.9654 = Fmean: 0.9836 * (1 - Penalty: 0.0185)
Fmean: 0.9836 = 10 * Precision: 0.8571 * Recall: 1.0000 / (Recall: 1.0000 + 9 * Precision: 0.8571)
Penalty: 0.0185 = 0.5 * (Fragmentation: 0.3333 ^3)
Fragmentation: 0.3333 = Chunks: 2.0000 / Matches: 6.0000
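The three score breakdowns above can be reproduced with a short script. This is a sketch only: the function names are hypothetical, and the alignments are written out by hand as (hypothesis index, reference index) pairs rather than computed by a matcher.

```python
def count_chunks(alignment):
    """Count runs of mappings that are adjacent in both strings."""
    chunks = 0
    prev = None
    for c, r in sorted(alignment):
        # A new chunk starts whenever either side breaks adjacency.
        if prev is None or c != prev[0] + 1 or r != prev[1] + 1:
            chunks += 1
        prev = (c, r)
    return chunks


def meteor_score(alignment, cand_len, ref_len):
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    fmean = 10 * precision * recall / (recall + 9 * precision)
    penalty = 0.5 * (count_chunks(alignment) / matches) ** 3
    return fmean * (1 - penalty)


# Hand-written alignments for the three examples.
scrambled = [(0, 3), (1, 0), (2, 5), (3, 2), (4, 4), (5, 1)]
identical = [(i, i) for i in range(6)]
extra_word = [(0, 0), (1, 1), (3, 2), (4, 3), (5, 4), (6, 5)]  # "was" unmapped

print(round(meteor_score(scrambled, 6, 6), 4))   # 0.5
print(round(meteor_score(identical, 6, 6), 4))   # 0.9977
print(round(meteor_score(extra_word, 7, 6), 4))  # 0.9654
```

The first example shows the penalty at its maximum: every word is matched, so Fmean is 1, but no two mappings are adjacent in both strings, so the score is halved.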