Tokyo Metropolitan University

Natural Language Processing Laboratory

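This is the website of the Natural Language Processing Laboratory (Komachi Lab), part of the Artificial Intelligence and Natural Language Processing area of the Department of Computer Science, Faculty of Systems Design, Tokyo Metropolitan University (Computer Science Program, Graduate School of Systems Design, Tokyo Metropolitan University). To support multilingual communication, the Komachi Lab studies methods that enable computers to understand and analyze human language. Our goal is to establish a hub for natural language processing research and development in western Tokyo.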

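In April 2023, Komachi moved to the newly established Faculty and Graduate School of Social Data Science at Hitotsubashi University. The lab at Tokyo Metropolitan University has stopped recruiting new students. Anyone who wishes to be supervised by Komachi in a graduate (master's) program at Hitotsubashi University should contact him. Note: doctoral students will be accepted starting with the April 2025 intake. The lab's Hitotsubashi University website is temporarily available here.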

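Keywords: natural language processing (machine translation, language learning support, fundamental technologies), machine learning (deep learning, distributed word representations)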

News

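2024/03/25 We gave the following presentations at the 30th Annual Meeting of the Association for Natural Language Processing.
- ○相田太一 (NINJAL/TMU), 近藤明日子 (University of Tokyo), 小木曽智信 (NINJAL). Releasing Lexical Statistics for the Showa-Heisei Corpus of Written Japanese. The 30th Annual Meeting of the Association for Natural Language Processing. March 12, 2024.
- ○◊Zizheng Zhang (TMU), Masato Mita (CyberAgent), Mamoru Komachi (Hitotsubashi University). A Task of Cloze Explanation Generation for ESL Learning. The 30th Annual Meeting of the Association for Natural Language Processing. March 12, 2024.
- ○凌志棟, 相田太一, 岡照晃 (TMU), 小町守 (Hitotsubashi University). Extending an Evaluation Set for Japanese Semantic Change Detection and Evaluating Detection Methods. The 30th Annual Meeting of the Association for Natural Language Processing. March 12, 2024.
- ○小林正宗 (TMU), 三田雅人 (CyberAgent), 小町守 (Hitotsubashi University). A Comprehensive Meta-Evaluation of Grammatical Error Correction: Limitations of Existing Automatic Evaluation and the Potential of Large Language Models. The 30th Annual Meeting of the Association for Natural Language Processing. March 12, 2024.
- ○◊Zhishen Yang (Tokyo Institute of Technology), Tosho Hirasawa (TMU), Edison Marrese-Taylor (AIST/University of Tokyo), Naoaki Okazaki (Tokyo Institute of Technology). Large Language Models as Manga Translators: A Case Study. The 30th Annual Meeting of the Association for Natural Language Processing. March 13, 2024.
- ○佐藤郁子, 平澤寅庄, 金輝燦, 岡照晃 (TMU), 小町守 (Hitotsubashi University). Building and Analyzing an Evaluation Set for English-Japanese Multimodal Machine Translation Focusing on Word Sense Disambiguation. The 30th Annual Meeting of the Association for Natural Language Processing. March 13, 2024.
- ○木山朔, 相田太一 (TMU), 小町守 (Hitotsubashi University), 小木曽智信 (NINJAL), 高村大也 (AIST), 松井秀俊 (Shiga University), 持橋大地 (ISM). Time-Series Pattern Analysis of Word Embeddings for Semantic Change Analysis. The 30th Annual Meeting of the Association for Natural Language Processing. March 13, 2024.
- ○上田直生也 (TMU), 三田雅人 (CyberAgent/TMU), 小町守 (Hitotsubashi University). Analyzing and Mitigating Token-Length Bias in Minimal-pair Paradigm Datasets. The 30th Annual Meeting of the Association for Natural Language Processing. March 14, 2024.
- ○大平颯人 (Tohoku University), 金輝燦 (TMU), 小町守 (Hitotsubashi University). An Analysis of the Inference Language in Multilingual Zero-Shot Learning. The 30th Annual Meeting of the Association for Natural Language Processing. March 14, 2024.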
2024/02/01 This year, we held public defenses (online or hybrid) for the following doctoral dissertations.
- 1/15 15:00-16:30 高橋啓吾 "Context Analysis with Large Language Models"
- 1/24 15:30-17:00 Zizheng Zhang "Empowering Language Assessment and Education with Natural Language Processing: A Focus on Cloze Tests"
- 1/29 15:00-16:30 平澤寅庄 "Multimodal Machine Translation as a Resource Scarcity Problem"

2023/10/09 The following papers were accepted to The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) and The Eighth Conference on Machine Translation (WMT 2023).
1. Taichi Aida and Danushka Bollegala (Liverpool University). Swap and Predict -- Predicting the Semantic Changes in Words across Corpora by Context Swapping. Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore. December, 2023.
2. Xiaohang Tang (Liverpool University), Yi Zhou (Liverpool University), Taichi Aida, Procheta Sen (Liverpool University), and Danushka Bollegala (Liverpool University). Can Word Sense Distribution Detect Semantic Changes of Words? Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore. December, 2023.
3. Zizheng Zhang, Masato Mita (CyberAgent/TMU), Mamoru Komachi (Hitotsubashi University). ClozEx: A Task toward Generation of English Cloze Explanation. Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore. December, 2023.
4. Tosho Hirasawa, Emanuele Bugliarello (University of Copenhagen), Desmond Elliott (University of Copenhagen) and Mamoru Komachi (Hitotsubashi University). Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation. The Eighth Conference on Machine Translation (WMT 2023). Singapore. December, 2023.
2023/09/10 The following papers were accepted to The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37).
1. Taisei Enomoto, Tosho Hirasawa, Hwichan Kim, Teruaki Oka (Hitotsubashi University) and Mamoru Komachi (Hitotsubashi University). Simultaneous Domain Adaptation of Tokenization and Machine Translation. The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37). Hong Kong. December, 2023. (poster, accepted)
2. Zhidong Ling, Taichi Aida, Teruaki Oka (Hitotsubashi University) and Mamoru Komachi (Hitotsubashi University). Construction of Evaluation Dataset for Japanese Lexical Semantic Change Detection. The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37). Hong Kong. December, 2023. (oral, accepted)
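2023/08/31 The following awards were received at the NLP Young Researchers' Symposium (YANS 2023).
1. Sponsor Award (LLM-X Award): 中島京太郎 (TMU). Transferring Soft Prompts to Large Language Models via In-Vocabulary Tokens.
2. Hackathon Award (Leaderboard Hackathon Excellence Award): 山本悠士 (Tokyo University of Science), 和田有輝也 (LINE), 郷原聖士 (NAIST), 木山朔 (TMU)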

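2023/08/14 We will give the following presentations at the NLP Young Researchers' Symposium (YANS 2023).
- 中島京太郎 (TMU), 金輝燦 (TMU), 平澤寅庄 (TMU), 岡照晃 (Hitotsubashi University), 小町守 (Hitotsubashi University). Transferring Soft Prompts to Large Language Models via In-Vocabulary Tokens.
- 上田直生也 (TMU), 三田雅人 (CyberAgent/TMU), 小町守 (Hitotsubashi University). Debiasing the Grammaticality Evaluation Benchmark BLiMP.
- 金輝燦 (TMU), 小町守 (Hitotsubashi University), 鈴木潤 (Tohoku University). Improving the Cross-Linguality of Multilingual Models through Adversarial Training with a Language Identifier.
- 佐藤郁子 (TMU), 平澤寅庄 (TMU), 金輝燦 (TMU), 岡照晃 (Hitotsubashi University), 小町守 (Hitotsubashi University). Building an English-Japanese Multimodal Machine Translation Dataset Focusing on Disambiguation with Visual Information.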

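2023/06/21 The award ceremony for the AAMT Nagao Award Student Encouragement Prize, presented by the Asia-Pacific Association for Machine Translation (AAMT), is scheduled for the following work.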
- Yuting Zhao. Multimodal Neural Machine Translation based on Image-Text Semantic Correspondence.
2023/06 The following international conference papers were accepted.
- Xiaomeng Pan, Zhousi Chen and Mamoru Komachi. Query Generation Using GPT-3 for CLIP-Based Word Sense Disambiguation for Image Retrieval. The 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023). (short)
- Hwichan Kim and Mamoru Komachi. Enhancing Few-shot Cross-lingual Transfer with Target Language Peculiar Examples. Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). (long)
- Taichi Aida and Danushka Bollegala (Liverpool University). Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings. Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). (long)
2023/05 The following journal papers are scheduled for publication.
- Siti Oryza Khairunnisa, Zhousi Chen, Mamoru Komachi. Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). April, 2023.
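- Keigo Takahashi, Teruaki Oka, Mamoru Komachi. Effectiveness of pre-trained language models for the Japanese Winograd Schema Challenge. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 27, No. 3. (accepted)
- 相田太一, 小町守, 小木曽智信 (NINJAL), 高村大也 (AIST), 持橋大地 (ISM). Joint Training of Distributed Word Representations to Capture Semantic Differences across Time Periods. Journal of Natural Language Processing, Vol. 30, No. 2. June 2023. (accepted)
- 小山碧海, 喜友名朝視顕, 小林賢治, 新井美桜, 三田雅人, 岡照晃, 小町守. Construction of an Error-Tagged Evaluation Corpus for Japanese Grammatical Error Correction. Journal of Natural Language Processing, Vol. 30, No. 2. June 2023. (accepted)
- 小林千真, 相田太一, 岡照晃, 小町守. Analysis of Semantic Change in Japanese Using BERT. Journal of Natural Language Processing, Vol. 30, No. 2. June 2023. (accepted)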