WikiDive
Diving encyclopedia
User:VonBeckwith3381
Machine Translation: How It Works, What Users Expect, and What They Get

Machine translation (MT) systems are ubiquitous. This ubiquity is due to a combination of the increased demand for translation in today's global marketplace and an exponential increase in the computing power that makes such systems viable. Under the right circumstances, MT systems are a powerful tool. They offer low-quality translations in situations where a low-quality translation is better than no translation at all, or where a rough translation of a large document delivered within minutes is more useful than a good translation delivered in three weeks' time.

Unfortunately, despite the widespread availability of MT, it is clear that the purpose and limitations of such systems are frequently misunderstood, and their capability widely overestimated. In this article, I want to give a brief introduction to how MT systems work and thus how they can be put to best use. Then, I'll present some data on how Internet-based MT is being used right now, and show that there is a chasm between the intended and actual use of such systems, suggesting that users still need educating on how to use MT systems effectively.

How machine translation works

You might have expected that a computer translation program would use grammatical rules of the languages involved, combining these with some kind of in-memory "dictionary" to produce the resulting translation. And indeed, that's essentially how some early systems worked. But many modern MT systems actually take a statistical approach that is quite "linguistically blind". Essentially, the system is trained on a corpus of example translations.
The result is a statistical model that incorporates information such as:
- "when the words (a, b, c) occur in succession in a sentence, there is an X% chance that the words (d, e, f) will occur in succession in the translation" (N.B. there doesn't have to be the same number of words in each pair);
- "given two successive words (a, b) in the target language, if word (a) ends in -X, there is a Z% chance that word (b) will end in -Y".

Given a huge body of such observations, the system can then translate a sentence by considering various candidate translations (produced by stringing words together almost at random; in reality, via some "naive selection" process) and choosing the statistically most probable option.

On hearing this high-level description of how MT works, many people are surprised that such a "linguistically blind" approach works at all. What's even more surprising is that it typically works better than rule-based systems. This is partly because relying on grammatical analysis itself introduces errors into the equation (automated analysis isn't completely accurate, and humans don't always agree on the best way to analyse a sentence). And training a system on "bare text" allows you to base it on far more data than would otherwise be possible: corpora of grammatically analysed texts are small and few; pages of "bare text" are available in their trillions. However, what this approach does mean is that the quality of translations depends heavily on how well elements of the source text are represented in the data originally used to train the system.
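The toy sketch below shows the candidate-and-score idea in miniature. It is not how any real MT system is implemented: the phrase table and probabilities are invented, the names `PHRASE_TABLE`, `BIGRAMS`, `lm_logprob` and `translate` are hypothetical, and in place of the suffix-based observation in the second bullet it uses a plain word-bigram model of the target language. It enumerates every combination of candidate phrases and keeps the one with the highest combined log-probability.

```python
"""Toy 'linguistically blind' statistical translation (illustrative only).

All probabilities are invented; a real system estimates vast numbers of
such values from a parallel corpus.
"""
from itertools import product
from math import log

# Hypothetical phrase table: source word -> {target phrase: P(target | source)}.
# Note that one source word may map to several target words.
PHRASE_TABLE = {
    "il": {"he": 0.7, "it": 0.3},
    "reviendra": {"will return": 0.6, "will come back": 0.4},
}

# Hypothetical target-language bigram model: P(next word | previous word)
BIGRAMS = {
    ("<s>", "he"): 0.2, ("<s>", "it"): 0.2,
    ("he", "will"): 0.3, ("it", "will"): 0.2,
    ("will", "return"): 0.05, ("will", "come"): 0.04,
    ("come", "back"): 0.3,
    ("return", "</s>"): 0.3, ("back", "</s>"): 0.3,
}
UNSEEN = 1e-6  # crude floor probability for word pairs never seen in training


def lm_logprob(words):
    """Log-probability of a target sentence under the bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(tokens, tokens[1:]))


def translate(source_words):
    """Try every combination of candidate phrases; keep the most probable."""
    options = [PHRASE_TABLE[w].items() for w in source_words]
    best, best_score = None, float("-inf")
    for combo in product(*options):
        words = " ".join(phrase for phrase, _ in combo).split()
        # Phrase-table score plus target-language fluency score
        score = sum(log(p) for _, p in combo) + lm_logprob(words)
        if score > best_score:
            best, best_score = " ".join(words), score
    return best


print(translate(["il", "reviendra"]))  # he will return
```

Because the pair "will returned" never occurs in the toy bigram table, it falls back to the tiny floor probability, so a sentence containing it scores far below "he will return". A real system faces the same problem, on a larger scale, with any input that its training data represents poorly.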
If you accidentally type he will returned or vous avez demander (instead of he will return or vous avez demandé), the system will be hampered because sequences such as will returned are unlikely to have occurred often in the training corpus (or worse, may have occurred with a completely different meaning, as in they needed his will returned to the solicitor). And since the system has little notion of grammar (to work out, for example, that returned is a form of return, and that "the infinitive is likely after he will"), it in effect has little to go on.

Similarly, you could ask the system to translate a sentence that is perfectly grammatical and common in everyday use, but which includes features that happen not to have been common in the training corpus. MT systems are usually trained on the kinds of text for which human translations are plentiful, such as technical or business documents, or transcripts of meetings of multilingual parliaments and conferences. This gives MT systems a natural bias towards certain kinds of formal or technical text. And even if everyday vocabulary is still covered by the training corpus, the grammar of everyday speech (such as using tú instead of usted in Spanish, or using the present tense instead of the future tense in various languages) may not be.

MT systems in practice

Researchers and developers of computer translation systems have always been aware that one of the main dangers is public misperception of their purpose and limitations. Somers (2003)[1], observing the use of MT on the web and in chat rooms, comments that: "This increased visibility of MT has had a number of side effects. [...] There is certainly a need to educate the general public about the low quality of raw MT, and, importantly, why the quality is so low." Observing MT in use in 2009, I see sadly little evidence that users' awareness of these issues has improved.
As an illustration, I'll present a small sample of data from a Spanish-English MT service that I make available on the Espanol-Ingles website. The service works by taking the user's input, applying some "cleanup" processes (such as correcting some common orthographical errors and decoding common instances of "SMS-speak"), and then looking for translations in (a) a bank of examples from the site's Spanish-English dictionary, and (b) an MT engine. Currently, Google Translate is used as the MT engine, although a custom engine may be used in the future. The figures I present here are from an analysis of 549 Spanish-English queries submitted to the system from machines in Mexico[2]; in other words, we assume that most users are translating from their native language.

First, what are people using the MT system for? For each query, I made a "best guess" at the user's purpose for translating it. In many cases, the purpose is quite obvious; in a few cases, there is clearly ambiguity. With that caveat, I judge that in about 88% of cases the intended use is fairly clear-cut, and categorise these uses as follows:
- Looking up a single word or term: 38%
- Translating a formal text: 23%
- Internet chat session: 18%
- Homework: 9%

Together, these four categories make up the 88% of clear-cut cases.

A surprising (if not alarming!) observation is that in such a large proportion of cases, users are using the translator to look up a single word or term. In fact, 30% of queries consisted of a single word. The finding is a little surprising given that the site in question also has a Spanish-English dictionary, and suggests that users confuse the purpose of dictionaries and translators. Although not represented in the raw figures, there were clearly some cases of consecutive searches where it appeared that a user was deliberately splitting up a sentence or phrase that would probably have been better translated if left together.
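The service's three stages (cleanup, example-bank lookup, MT engine fallback) can be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not describe the actual cleanup rules or how the engine is called, so `CLEANUP_RULES`, `EXAMPLE_BANK`, `cleanup`, `mt_engine` and `translate_query` are all hypothetical, and `mt_engine` is a placeholder rather than a real Google Translate call.

```python
"""Hypothetical sketch of a cleanup -> example bank -> MT engine pipeline."""
import re

# Hypothetical fixes for common orthographical errors and SMS-speak
CLEANUP_RULES = {
    r"\bq\b": "que",              # SMS-speak: "q" for "que"
    r"\bxq\b": "porque",          # SMS-speak: "xq" for "porque"
    r"\bhaber si\b": "a ver si",  # common orthographical confusion
}

# Hypothetical bank of examples drawn from a Spanish-English dictionary
EXAMPLE_BANK = {
    "a ver si": "let's see if",
    "cuarto para las tres": "quarter to three",
}


def cleanup(text: str) -> str:
    """Normalise case and whitespace, then apply each rewrite rule in order."""
    text = text.lower().strip()
    for pattern, replacement in CLEANUP_RULES.items():
        text = re.sub(pattern, replacement, text)
    return text


def mt_engine(text: str) -> str:
    """Placeholder standing in for a call to an external MT engine."""
    return f"[MT] {text}"


def translate_query(query: str) -> tuple:
    """Return (source, translation): the example bank first, then the MT engine."""
    cleaned = cleanup(query)
    if cleaned in EXAMPLE_BANK:
        return "examples", EXAMPLE_BANK[cleaned]
    return "mt", mt_engine(cleaned)


print(translate_query("Haber si"))      # ('examples', "let's see if")
print(translate_query("xq no vienes"))  # ('mt', '[MT] porque no vienes')
```

Running cleanup first means that both the exact-match lookup and the MT engine see normalised text, which matters for a statistical engine that has rarely, if ever, seen forms such as "xq" in its training data.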
Perhaps as a consequence of student over-drilling on dictionary usage, we see, for example, a query for cuarto para ("quarter to") followed immediately by a query for a number. There is clearly a need to educate students and users in general about the difference between the electronic dictionary and the machine translator[3]: in particular, that a dictionary will guide the user to picking the appropriate translation given the context, but requires single-word or single-phrase lookups, whereas a translator generally works best on whole sentences and, given a single word or term, will simply report the statistically most common translation.

I estimate that in less than a quarter of cases, users are employing the MT system for its "trained-for" purpose of translating or gisting a formal text (and are entering a whole sentence, or at least a partial sentence rather than an isolated noun phrase). Of course, there is no way to know whether any of these translations were then intended for publication without further proofing, which definitely isn't the purpose of the system.

The use for translating formal texts is now almost rivalled by the use for translating informal online chat sessions, a context for which MT systems are generally not trained. The online chat context poses particular problems for MT systems, since features such as non-standard spelling, lack of punctuation and colloquialisms not found in other written contexts are common. Translating chat sessions effectively would probably require a dedicated system trained on a more suitable (and possibly custom-built) corpus.
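To make the dictionary/translator distinction concrete, here is a small sketch of how each responds to the single word cuarto. The sense list and frequency counts are invented for illustration, and `dictionary_lookup` and `translator_single_word` are hypothetical names, not part of any real system.

```python
"""Dictionary lookup vs. statistical translation of a single word (toy data)."""
from collections import Counter

# Hypothetical counts of English renderings of "cuarto" in a training corpus
CUARTO_COUNTS = Counter({"room": 120, "quarter": 45, "fourth": 30})

# Hypothetical dictionary entry: each sense with a usage example
CUARTO_ENTRY = [
    ("room", "un cuarto de la casa"),
    ("quarter", "cuarto para las tres (quarter to three)"),
    ("fourth", "el cuarto lugar"),
]


def dictionary_lookup(entry):
    """Return every sense, leaving the user to choose one from context."""
    return [sense for sense, _example in entry]


def translator_single_word(counts):
    """Return only the statistically most frequent rendering."""
    return counts.most_common(1)[0][0]


print(dictionary_lookup(CUARTO_ENTRY))       # ['room', 'quarter', 'fourth']
print(translator_single_word(CUARTO_COUNTS))  # room
```

Given cuarto alone, the translator reports "room" and the "quarter to" sense is lost; entering the whole phrase, such as cuarto para las tres, is what would give a statistical translator the context it needs.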