Tokyo Metropolitan University

Natural Language Processing Laboratory

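This is the website of the Natural Language Processing Laboratory (Komachi Lab) in the Artificial Intelligence and Natural Language Processing area of the Department of Computer Science, Faculty of Systems Design, Tokyo Metropolitan University (Department of Computer Science, Graduate School of Systems Design, Tokyo Metropolitan University). To support multilingual communication, the Komachi Lab studies methods for understanding and analyzing human language by computer. We aim to build a research and development hub for natural language processing in western Tokyo.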

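Komachi moved in April 2023 to the newly established Faculty and Graduate School of Social Data Science at Hitotsubashi University. Tokyo Metropolitan University is no longer accepting new students for this lab. If you would like to be supervised at Hitotsubashi University, please contact Komachi. The lab also maintains a separate website at Hitotsubashi University.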

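Keywords: natural language processing (machine translation, language learning support, fundamental technologies), machine learning (deep learning, distributed word representations)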

Announcements

2025/05/29 The following work was accepted to ACL 2025, a top conference in natural language processing!
1. Aomi Koyama, Masato Mita, Su-Youn Yoon (EduLab), Yasufumi Takama, Mamoru Komachi. Targeted Syntactic Evaluation for Grammatical Error Correction. ACL 2025 (main, long).
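2025/03/14 We gave the following presentations at the 31st Annual Meeting of the Association for Natural Language Processing (言語処理学会第31回年次大会).
1. 相田太一 (TMU), 小町守 (Hitotsubashi University), 小木曽智信 (NINJAL), 高村大也 (AIST), 持橋大地 (ISM/NINJAL). ガウス過程による埋め込み点集合の時間遷移のモデル化. 31st Annual Meeting of the Association for Natural Language Processing.
2. 中島京太郎, 金輝燦, 平澤寅庄, 榎本大晟 (TMU), 陳宙斯, 小町守 (Hitotsubashi University). 大規模言語モデルに対するチューニング手法の調査：内部のアクセス性に基づく分類と比較. 31st Annual Meeting of the Association for Natural Language Processing.
3. 佐藤祥太 (Kanazawa University), 木山朔 (TMU), 中島秀太, 小町守 (Hitotsubashi University), 唐堂由其 (Kanazawa University). 単語埋め込みの独立成分分析の軸が解釈できる粒度はどれくらいか？ 31st Annual Meeting of the Association for Natural Language Processing.
4. 佐藤郁子, 金輝燦 (TMU), 陳宙斯 (Hitotsubashi University), 三田雅人 (CyberAgent/TMU), 小町守 (Hitotsubashi University). アライメントが大規模言語モデルの数値バイアスに与える影響. 31st Annual Meeting of the Association for Natural Language Processing.
5. 坂部立 (Hitotsubashi University), 金輝燦 (TMU), 小町守 (Hitotsubashi University). 人間と LLM の "面白さ" の感性は一致するのか？ 31st Annual Meeting of the Association for Natural Language Processing. (received the Hakuhodo DY Holdings Award)
6. 榎本大晟, 金輝燦 (TMU), 陳宙斯, 小町守 (Hitotsubashi University). 多言語大規模言語モデルにおける英語指示文と対象言語指示文の公平な比較. 31st Annual Meeting of the Association for Natural Language Processing.
7. 大平颯人 (Hitotsubashi University), 佐藤郁子 (TMU), 真鍋章, 谷本恒野, 原慎大 (Fuji Electric), 小町守 (Hitotsubashi University). 論文を対象とした RAG システムにおける質問分類に基づく動的検索. 31st Annual Meeting of the Association for Natural Language Processing.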

News

2025/01/23 The following work was accepted to the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025)!
1. Taisei Enomoto, Hwichan Kim, Zhousi Chen (Hitotsubashi University), Mamoru Komachi. A Fair Comparison without Translationese: English vs. Target-language Instructions for Multilingual LLMs. Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025). April, 2025. (accepted)
2025/01/22 We gave the following presentations at the 31st International Conference on Computational Linguistics (COLING 2025).
1. Hajime Kiyama, Taichi Aida, Mamoru Komachi, Toshinobu Ogiso (NINJAL), Hiroya Takamura (AIST), Daichi Mochihashi (ISM). Analyzing Continuous Semantic Shifts with Diachronic Word Similarity Matrices. The 31st International Conference on Computational Linguistics (COLING 2025). January, 2025. (oral)
2. Taichi Aida and Danushka Bollegala (University of Liverpool). Investigating the Contextualised Word Embedding Dimensions Specified for Contextual and Temporal Semantic Changes. The 31st International Conference on Computational Linguistics (COLING 2025). January, 2025. (poster)
2025/01/21 We held the following two doctoral dissertation public hearings. Thank you to everyone who attended in person or online.
1. 1/20 15:00-16:30 Taichi Aida (相田太一) "Considering Temporal and Contextual Information for Lexical Semantic Change Detection"
2. 1/20 16:30-18:00 Hwichan Kim (金輝燦) "Enhancing Cross-Lingual Transfer: Strategies for Dialects and Distant Languages"
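2024/12/12 The following work received the Outstanding Research Award (優秀研究賞) at the 262nd IPSJ SIG Natural Language Processing meeting (情報処理学会第262回自然言語処理研究会)!
1. 木山朔, 相田太一, 小町守, 小木曽智信 (NINJAL), 高村大也 (AIST), 持橋大地 (ISM). 単語の通時的な類似度行列による意味変化パターンの分析. 262nd IPSJ SIG Natural Language Processing meeting. December 12, 2024.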
2024/12/13 We gave the following two presentations at the 38th Pacific Asia Conference on Language, Information and Computation (PACLIC 38).
1. Ayako Sato, Tosho Hirasawa, Hwichan Kim, Zhousi Chen (Hitotsubashi University), Teruaki Oka, Masato Mita (CyberAgent), Mamoru Komachi. DejaVu: Disambiguation evaluation dataset for English-JApanese machine translation on VisUal information. Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation (PACLIC 38). December, 2024. (oral, accepted)
2. Kyotaro Nakajima, Hwichan Kim, Tosho Hirasawa, Taisei Enomoto, Zhousi Chen (Hitotsubashi University), Mamoru Komachi. A Survey for LLM Tuning Methods: Classifying Approaches Based on Model Internal Accessibility. Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation (PACLIC 38). December, 2024. (poster, accepted)
2024/11/20 The following two papers were published in JACIII.
1. Hongfei Wang, Zhousi Chen, Zizheng Zhang, Zhidong Ling, Xiaomeng Pan, Wenjie Duan, Masato Mita (CyberAgent/TMU), Mamoru Komachi. Revisiting the Evaluation for Chinese Grammatical Error Correction. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.28, No.6, pp.1380-1390. November, 2024. (PDF)
2. Siti Oryza Khairunnisa, Zhousi Chen, Mamoru Komachi. Improving Domain-Specific NER in the Indonesian Language through Domain Transfer and Data Augmentation. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.28, No.6, pp.1299-1312. November, 2024. (PDF)
2024/11/16 We presented the following work at EMNLP 2024 and at WMT 2024, which was held back-to-back with EMNLP 2024.
1. Hwichan Kim, Jun Suzuki (Tohoku University), Tosho Hirasawa, Mamoru Komachi (HIT). Pruning Multilingual Large Language Models for Multilingual Inference. Findings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). November, 2024. (accepted)
2. Ayako Sato, Kyotaro Nakajima, Hwichan Kim, Zhousi Chen (HIT) and Mamoru Komachi (HIT). TMU-HIT's Submission for the WMT24 Quality Estimation Shared Task: Is GPT-4 a Good Evaluator for Machine Translation? Ninth Conference on Machine Translation (WMT24). November, 2024. (poster, accepted)
3. Masamune Kobayashi, Masato Mita (CyberAgent), Mamoru Komachi (HIT). Revisiting Meta-evaluation for Grammatical Error Correction. Transactions of the Association for Computational Linguistics (TACL), Vol.12, pp.837-855. July 1, 2024. (accepted)
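2024/09/07 We gave the following presentations at the 19th YANS Symposium (第19回 YANS シンポジウム).
1. 臼井久生 (Tokyo University of Agriculture and Technology), 木山朔 (TMU), 古宮嘉那子 (Tokyo University of Agriculture and Technology). 大規模視覚言語モデルの謎解き能力調査. YANS2024. September 5, 2024.
2. 榎本大晟 (TMU), 金輝燦 (TMU), 陳宙斯 (Hitotsubashi University), 小町守 (Hitotsubashi University). Multilingual LLM への指示文は本当に英語であるべきなのか？ YANS2024. September 5, 2024. (received the Encouragement Award)
3. 坂部立 (Hitotsubashi University), 金輝燦 (TMU), 小町守 (Hitotsubashi University). 人間とLLMが考える"面白い"は一致するのか？ YANS2024. September 6, 2024. (received the Encouragement Award)
4. 木山朔 (TMU), 相田太一 (TMU), 小町守 (Hitotsubashi University), 小木曽智信 (NINJAL), 高村大也 (AIST), 持橋大地 (ISM). 日本語の単語を対象とした複数時期の意味変化パターン分析. YANS2024. September 6, 2024. (received the Future Corporation Award)
5. 佐藤郁子 (TMU), 金輝燦 (TMU), 陳宙斯 (Hitotsubashi University), 三田雅人 (CyberAgent/TMU), 小町守 (Hitotsubashi University). テキスト評価におけるLLMアライメント手法の影響分析. YANS2024. September 6, 2024.
6. 中島京太郎 (TMU), 金輝燦 (TMU), 平澤寅庄 (TMU), 榎本大晟 (TMU), 小町守 (Hitotsubashi University). 言語モデルの透明性ごとに適応な可能なチューニング手法の調査. YANS2024. September 6, 2024.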
2024/08/27 We held the doctoral dissertation public hearing of Siti Oryza Khairunnisa, "Dataset Creation and Transfer Learning Approaches for Indonesian Named Entity Recognition". Thank you to everyone who attended.
2024/08/22 The following three papers were accepted for publication or published in journals.
1. 凌志棟, 相田太一, 岡照晃 (SB Intuitions), 小町守. 日本語意味変化検出のための評価データセットの構築と分析. Journal of Natural Language Processing (自然言語処理), Vol.31, No.4, December, 2024. (accepted)
2. Siti Oryza Khairunnisa, Zhousi Chen, Mamoru Komachi. Improving Domain-Specific NER in the Indonesian Language through Domain Transfer and Data Augmentation. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.28, No.6. November, 2024. (accepted)
3. Masamune Kobayashi, Masato Mita (CyberAgent), Mamoru Komachi. Revisiting Meta-evaluation for Grammatical Error Correction. Transactions of the Association for Computational Linguistics (TACL), Vol.12, pp.837-855. July 1, 2024. (PDF)
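2024/06/21 We presented the following papers at BEA 2024, a workshop co-located with NAACL, a major international conference in natural language processing. In MLSP (multilingual lexical simplification pipeline), a shared task on multilingual text simplification, we achieved first-place performance in almost all languages.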
1. Masamune Kobayashi, Masato Mita, Mamoru Komachi. Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024). June, 2024. (short)
2. Taisei Enomoto, Hwichan Kim, Tosho Hirasawa, Yoshinari Nagai, Ayako Sato, Kyotaro Nakajima and Mamoru Komachi. TMU-HIT at MLSP 2024: How Well Can GPT-4 Tackle Multilingual Lexical Simplification? Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024): Shared task 2. June, 2024. (poster)
2024/05/24 We presented the following papers at LREC-COLING, a major international conference in natural language processing.
1. Naoya Ueda, Masato Mita (CyberAgent), Teruaki Oka, Mamoru Komachi. Token-length Bias in Minimal-pair Paradigm Datasets. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp.16224–16236. May, 2024. (PDF)
2. Yoshinari Nagai, Teruaki Oka, Mamoru Komachi. A Document-Level Text Simplification Dataset for Japanese. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp.459–476. May, 2024. (PDF)
3. Hwichan Kim, Shota Sasaki (CyberAgent), Sho Hoshino (CyberAgent), and Ukyo Honda (CyberAgent). A Single Linear Layer Yields Task-Adapted Low-Rank Matrices. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp.1602–1608. May, 2024.
