Machine Translation System
From CrisisCommons Wiki
Project Twitter hashtag:
Last page update: 13APR2010
- Project Lead: Chris Taylor, christoper-paul-taylor (dashes to dots) over at gmail
- Customer: no official customer; strong desire to integrate with OpenStreetMap and Ushahidi
- Location of Lead: DC
- Volunteers: see googlecode page below
- Web Link: http://code.google.com/p/ccmts/, http://crisisterp.dyndns.org
Description:
Building a machine translation system for English and Haitian Creole using open source data and open source software tools. The project uses Moses, http://www.statmt.org/moses/?n=Development.GetStarted, to conduct the machine translation.
Programming languages used based on task:
- Corpus Formatting - anything!
- Translation System - C/C++, Python, and SWIG
- Web interface - Python, JavaScript
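As an illustration of the corpus formatting task, here is a minimal Python sketch of the kind of cleanup a parallel corpus usually needs before training a statistical MT system: normalizing whitespace and case, and dropping pairs that are empty, too long, or badly mismatched in length. The function name, the thresholds, and the choice to lowercase are assumptions made for this example and are not taken from the project's code.

```python
def clean_parallel_corpus(pairs, max_len=80, max_ratio=9.0):
    """Yield cleaned (source, target) sentence pairs.

    pairs     -- iterable of (source, target) strings, e.g. English/Haitian Creole
    max_len   -- hypothetical cap on tokens per sentence
    max_ratio -- hypothetical cap on the longer/shorter token-count ratio
    """
    for src, tgt in pairs:
        # Lowercase and split on whitespace; this also collapses repeated spaces.
        src_tokens = src.lower().split()
        tgt_tokens = tgt.lower().split()

        # Drop pairs where either side is empty.
        if not src_tokens or not tgt_tokens:
            continue

        # Drop pairs where either side is too long.
        if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
            continue

        # Drop pairs whose lengths are badly mismatched (likely misaligned).
        longer = max(len(src_tokens), len(tgt_tokens))
        shorter = min(len(src_tokens), len(tgt_tokens))
        if longer / shorter > max_ratio:
            continue

        yield " ".join(src_tokens), " ".join(tgt_tokens)
```

The function is a generator, so it can be run over a large corpus file without loading everything into memory.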
Tasks Completed:
- Parallel English/Haitian Corpus finalized
- Acquired web hosting via Slicehost.
- Built the webhost (it was a "from scratch" virtual machine instance).
- Built moses and dependencies on the webhost.
- Moses has generated its required data sets from the parallel corpus.
- Wrapped the Moses C/C++ backend in Python.
- Project logo completed.
- Project site is up and running; I'm building a user base!
TODO Tasks:
- Improve error handling on the web page
- Package source code for public consumption
- Release training text, collect more training text (English and Haitian)
- Identify other languages that need automatic translation tools
- Review and improve the web user interface
- Find projects that could integrate into our service
- Develop mobile application/variant
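For the error handling item above, one possible approach is to wrap every call to the translation backend so the web page always gets a predictable result instead of a raw exception. The sketch below is hypothetical: safe_translate, the translate callable, the max_chars limit, and the message wording are all assumptions for illustration and do not describe the project's actual web code.

```python
def safe_translate(text, translate, max_chars=1000):
    """Call translate(text) and return a dict that a web page can render.

    translate -- any callable taking a string and returning a string
                 (a stand-in for the real backend, which is not shown here)
    max_chars -- hypothetical input size limit
    """
    # Reject missing or blank input before touching the backend.
    if not isinstance(text, str) or not text.strip():
        return {"ok": False, "error": "Please enter some text to translate."}

    # Reject oversized input so one request cannot tie up the backend.
    if len(text) > max_chars:
        return {"ok": False,
                "error": "Text is too long (limit %d characters)." % max_chars}

    try:
        return {"ok": True, "translation": translate(text.strip())}
    except Exception:
        # Keep backend details out of the user-facing message.
        return {"ok": False,
                "error": "Translation service is unavailable. Please try again later."}
```

Returning a dict with an "ok" flag keeps the page logic simple: it either shows the translation or shows the error text.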
January 30th update (Toronto team)
- Contact Heather Leson heatherlesonAT gmail.com or Jeff Kolesnikowicz Jeff At codepoets dot ca if you have any questions.
- Our team worked with Chris Taylor on this project. Chris's project uses an MT system called "Moses". Our team started to test alternatives such as the Google API. Translators worked with Python developers.
February 6th update (Toronto team)
- Have a basic Web form working, but I'm having some I18N issues.
- Don't have permission to check in to the Google Code repository.
- Created a link using the Google Translate app directly to an IRC bot
- Test case set up
- Built a web front end - not ready yet
- Also backing onto the Google Translate service
- Our Kreyol translators advised that the translations need work, but it's a start
- NEXT STEPS: request a comparison of the Google translation back-end and the "Moses" MT back-end
The bot code was a proof of concept. The rest of our bot is deeply unready for public exposure, and without the rest of the bot, that piece won't be of much use to anyone. The folks from Toronto continue to work on this.
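The back-end comparison listed under NEXT STEPS could start from a simple automatic score, before or alongside review by the Kreyol translators. The sketch below computes word error rate (word-level edit distance divided by the number of reference words) for a set of system outputs against human reference translations. The function names are made up for this example, and word error rate is only one possible metric; it is an assumption here, not something the teams chose.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance between two sentences."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] holds the distance between the hypothesis words seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, 1):
        cur = [i]
        for j, ref_word in enumerate(ref, 1):
            cost = 0 if hyp_word == ref_word else 1
            cur.append(min(prev[j] + 1,          # extra word in hypothesis
                           cur[j - 1] + 1,       # missing reference word
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def word_error_rate(hypotheses, references):
    """Total word edits divided by total reference words (lower is better)."""
    if len(hypotheses) != len(references):
        raise ValueError("need one hypothesis per reference")
    total_words = sum(len(ref.split()) for ref in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    total_edits = sum(word_edit_distance(hyp, ref)
                      for hyp, ref in zip(hypotheses, references))
    return total_edits / total_words
```

Running word_error_rate once per back-end over the same test sentences and the same references gives two numbers that can be compared directly.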

