Basic Information

Bio
I previously worked at Apple, Johns Hopkins University (where I also completed my PhD), MIT Lincoln Laboratory, and Rincon Research Corporation on topics including text-to-speech, machine translation (MT), bitext curation and filtering, automatic MT evaluation, multilingual modeling, paraphrasing, cross-language information retrieval, domain adaptation, and digital signal processing.
I developed Vecalign for the ParaCrawl parallel data acquisition project. Vecalign is an accurate sentence alignment algorithm based on multilingual sentence embeddings, with time complexity linear in the number of sentences being aligned. In conjunction with LASER, Vecalign makes it easy to perform sentence alignment in about 100 languages (i.e., 100^2 language pairs) without the need for a machine translation system or lexicon. At the time of writing, Vecalign has the best reported performance on the test set released with Bleualign.
I also developed Prism, an automatic MT metric that uses a sequence-to-sequence paraphraser to score MT system outputs conditioned on their respective human references. Prism uses a multilingual neural MT model as a zero-shot paraphraser, which eliminates the need for synthetic paraphrase data and yields a single model that works in many languages (we release a model covering 39 languages). At the time of publication, Prism outperformed or statistically tied with all metrics submitted to the WMT 2019 metrics shared task in segment-level human correlation. I developed bitext filtering code to preprocess the data used to train Prism; the code is general enough to use for any MT training and has been publicly released.
Papers (30 in total)
Conference on Machine Translation, pp. 47-81 (2024)
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2 SHORT PAPERS, pp. 488-500 (2024)
Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John Ortega, Sara Papi, Peter Polák, Adam Pospíšil, Pavel Pecina, Elizabeth Salesky, Nivedita Sethiya, Balaram Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Marco Turchi, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, Rodolfo Zevallos
CoRR (2024)
Annual Meeting of the Association for Computational Linguistics, pp. 1763-1775 (2024)
17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, pp. 289-295 (2023)
TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, pp. 419-435 (2023)
INTERSPEECH 2023, pp. 37-41 (2023)
Conference on Machine Translation, pp. 95-102 (2023)
Conference on Machine Translation, pp. 578-628 (2023)
Author Statistics
#Papers: 30
#Citation: 604
H-Index: 10
G-Index: 21
Sociability: 5
Diversity: 1
Activity: 6