How Google Translate deceives you
(www.americanthinker.com)
I'd have to tear that sentence down and feed it to Google piece by piece before I'd be convinced this is actually deliberate.
I've been using Google Translate on a manga, and I've found that it can get confused by run-on sentences, especially when it also has to change word order. There are things you can throw at it that it'll handle fine if you chop them up into smaller pieces, but if you feed it the whole thing at once it'll get confused and start truncating.
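The chopping-up workaround can be sketched in a few lines. This is a hypothetical helper, not Google's API: `translate` stands in for whatever translation callable you're using, and the sentence splitter is a naive regex on end punctuation.

```python
import re

def chunk_sentences(text):
    """Split text into sentence-sized pieces on ., !, ? boundaries."""
    pieces = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in pieces if p]

def translate_in_chunks(text, translate):
    """Translate sentence by sentence and rejoin, so one long run-on
    doesn't get silently truncated by the engine."""
    return ' '.join(translate(chunk) for chunk in chunk_sentences(text))
```

You lose cross-sentence context this way (pronouns, topic markers), so it's a trade-off: smaller pieces are less likely to be truncated, but each piece is translated with less surrounding information.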
It makes me think they're actually using some pretty heavy AI behind the scenes, extracting meaning from the input and then re-expressing it in the selected output language. Which is COOL, because it means that to an extent their system actually comprehends the meaning of "Jack went up the hill" and could say it in any language you want. But it also opens the system up to problems, and may mean you need to structure your writing for a lower level of reading comprehension (simple statements, bullet lists, etc.).
If you don't follow what I mean, go read some articles on Simple English Wikipedia.
It would be nice if they had a visual debugging mode, so you could see the sentence diagram.
I'd be astonished if it were deliberate. The AI capability needed to strip inconvenient truths out of foreign news media like this would mean Google's censorship tech is years, if not decades, ahead of where I thought it was.
I've seen google translate omit entire chunks of innocuous sentences before, and wondered what on earth it was smoking. I'd be shocked to learn that this was deliberate censorship rather than the system simply drawing a blank when trying to interpret something.
Languages have all sorts of weirdness. Word-for-word translation is usually not very accurate, since turns of phrase and idioms decoded literally might not make sense. Literal "Darmok and Jalad at Tanagra" level stuff.
So useful translation applications have models to keep that nonsense out of the output. In the old days these were heuristic models. These days I don't know how it works, but if it's GAN-based, the system's training would revolve around what must be kept out of the output.
The difference between oddball phrases that shouldn't be transmitted literally and inconvenient facts that must not be communicated is merely a few points in a data store.
HIS ARMS WIDE