

Good applications for crummy machine translation

— Kenneth W. Church & Eduard H. Hovy, 1993

  • There is a risk that eval can devolve into mindless metrics.
  • The success of the eval often depends very strongly on the selection of an appropriate application.
  • It is wise to identify the niche application first (one where the strengths of machines are valued). We will then be in a much better position to address evaluation questions and to steer them towards high-payoff niches of functionality.
  • We agree with ALPAC that this basic research could not be justified in terms of short-term return on investment. Compared with human capabilities, the MT systems of the time were not deemed a success, and might never be.

Has anything changed since ALPAC?

  → Increasing commercial value

The application venue of MT has shifted from government to industry, and so have the funding providers. One must choose an application that exploits the strengths of the machine and does not compete with the strengths of humans. This point is well put in the following:

The question now is not whether MT is feasible, but in what domains it is most likely to be effective... The object of an evaluation is to determine whether a system permits an adequate response to given needs and constraints. --- Lehrberger and Bourbeau, 1988

The blame is to be laid on the desire for generality

In spite of all the literature on MT, general evaluation measures often fail to pinpoint the strengths of systems. They seem to confound important and less important aspects. Unfortunately, this failure seems to be characteristic of many task-independent evaluation metrics. We propose that MT eval metrics should be sensitive to the intended use of the system. It therefore becomes crucial to the success of an MT effort to identify a high-payoff niche application, so that the MT system will stand up well to the eval even though it might produce crummy translations. By and large, this smacks a little of being grants-oriented, but the world does need this kind of effort.

Traditional Eval Metrics
  • System-based
    Tied to a particular system, can’t be used effectively f
