Mingyu Derek Ma

PhD Candidate

derek.ma at ucla.edu
he/his/him

Hi!

I am a PhD candidate in Computer Science at UCLA working with Prof. Wei Wang. I earned my bachelor’s degree in Computing from The Hong Kong Polytechnic University with First Class Honours in 2018, advised by Prof. Qin Lu and Prof. Jiannong Cao. I studied as an exchange student at the University of Maryland in 2016. I’ve also spent time at Amazon Alexa AI (working with Dr. Jiun-Yu Kao and Dr. Tagyoung Chung), USC Information Sciences Institute (working with Prof. Nanyun (Violet) Peng and Prof. Muhao Chen), The Chinese University of Hong Kong (working with Prof. Helen Meng), UC Santa Cruz (working with Prof. Marilyn Walker) and MIT (working with Dr. Abel Sanchez and Prof. John R. Williams).

I’m interested in Natural Language Processing, Machine Learning, and AI4Science. My research focuses on generative language models, especially in the clinical, medical, and scientific domains:

Architecture and training of generative language models ACL'23, INTERSPEECH'23
Data generation and augmentation with LLMs AAAI'24
Language models for clinical outcome prediction and scientific IE ACL'23a, ACL'23b
Bias, fairness, and safety of language models NAACL'24a, NAACL'24b, NAACL'24c
Data-efficient information extraction NAACL'21, EMNLP-F'22, ACL-F'23
Knowledge graphs EMNLP-F'21, AKBC'22

Recent News

Mar 13, 2024 Conference

To be presented at NAACL 2024 🇲🇽

Three papers are accepted to NAACL 2024 main conference!

In Mitigating Bias for Question Answering Models by Tracking Bias Influence, we present an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model tends to become more biased when it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. We then use the detected bias level as an optimization objective, forming a multi-task learning setting alongside the original QA task.
In Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models, we demonstrate that an attacker can inject backdoors by issuing very few malicious instructions among thousands of gathered data instances and control model behavior through data poisoning. Through such instruction attacks, the attacker can achieve an attack success rate of over 90% across four commonly used NLP datasets and cause persistent backdoors that transfer easily, zero-shot, to 15 diverse datasets.
In Instructional Fingerprinting of Large Language Models, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popular LLMs show that this approach prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.

Feb 22, 2024 Conference, Talk, Poster, Demo

Presenting at AAAI 2024 🇨🇦

Time	Location	Activity
Demo Session 1, Feb 22 Thu, 19:00-21:00	Exhibit Hall AB1	Demo presentation: MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways. We present an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles, accompanied by comprehensive insights including user/community susceptibility levels, as well as events and popular opinions raised by the crowd as the information propagates.
Poster Session 2, Feb 23 Fri, 19:00-21:00	Exhibit Hall AB1	Poster presentation of the paper: STAR: Improving Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models. We present a structure-to-text data generation method for complicated structure prediction tasks that first generates complicated event structures (Y) and then generates input passages (X), all with Large Language Models. We show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks, even surpassing the effectiveness of human-curated data.

Jan 21, 2024 Preprint

New preprint on LLM ownership protection

In InstructionalFingerprint, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popular LLMs show that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.

Oct 1, 2023 Preprint

New preprint on bias mitigation

In BMBI, we propose to mitigate the bias exhibited by QA models by observing a query instance’s influence on another instance, enabling bias mitigation with extremely low resources. With our method, bias levels in multiple bias categories can be reduced without category-specific instance-level annotation.

Aug 1, 2023 Conference, Talk

Presenting at INTERSPEECH 2023 🇮🇪

Time	Location	Activity
Aug 24 Thu, 10:00-10:20 (IST)	Wicklow Hall 1	Oral presentation of the conference paper: Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning. In collaboration with Amazon Alexa AI, we introduce a dialogue state tracking model that tunes less than 1% of LM parameters and achieves better low-resource performance with prompt tuning techniques.


Publications

New! Memorize and Rank: Enabling Large Language Models for Medical Event Prediction

Mingyu Derek Ma, Yijia Xiao, Anthony Cuturrufo, Xiaoxuan Wang, Wei Wang

AAAI 2024 Spring Symposium on Clinical Foundation Models, 2024

We introduce Mera, a clinical event prediction model that bridges pretrained natural language knowledge with medical codes. We apply contrastive learning on a predicted ranking list for task-specialized optimization. With concept memorization through fine-tuning, we equip the LLM with an in-depth understanding that lets it recall the natural language definitions of medical codes during inference.

New! Mitigating Bias for Question Answering Models by Tracking Bias Influence

Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng

NAACL, 2024

We propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model tends to become more biased when it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. We then use the detected bias level as an optimization objective, forming a multi-task learning setting alongside the original QA task.

New! Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models

Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen

NAACL, 2024

Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions among thousands of gathered data instances and control model behavior through data poisoning. Through such instruction attacks, the attacker can achieve an attack success rate of over 90% across four commonly used NLP datasets and cause persistent backdoors that transfer easily, zero-shot, to 15 diverse datasets.

New! Instructional Fingerprinting of Large Language Models

Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, Muhao Chen

NAACL, 2024

We present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popular LLMs show that this approach prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.

New! Improving Event Definition Following For Zero-Shot Event Detection

Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

arXiv, 2024

We aim to improve zero-shot event detection by training models to better follow event definitions. We hypothesize that a diverse set of event types and definitions is the key for models to learn to follow event definitions, whereas existing event extraction datasets focus on annotating many high-quality examples for only a few event types. Our experiments verify this hypothesis.

New! STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models

Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, Wei Wang

AAAI, 2024

We propose STAR, a structure-to-text data generation method for complicated structure prediction tasks that first generates complicated event structures (Y) and then generates input passages (X), all with Large Language Models. We further reduce errors and improve data quality through self-reflection error identification and self-refinement with iterative revision. We show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks, even surpassing the effectiveness of human-curated data.

New! MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Lin, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

AAAI Demonstrations, 2024

We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles, accompanied by comprehensive insights including user/community susceptibility levels, as well as events and popular opinions raised by the crowd as the information propagates.

New! Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach

Yanchen Lin, Mingyu Derek Ma, Wenna Qin, Azure Zhou, Jiaao Chen, Weiyan Shi, Wei Wang, Diyi Yang

arXiv, 2023

We propose a computational model to infer users’ susceptibility levels from their activities. Since a user’s susceptibility is a key indicator of their reposting behavior, we use supervision from observable sharing behavior to infer the underlying susceptibility tendency. Building on this large-scale susceptibility labeling, we further conduct a comprehensive analysis of how different social factors relate to susceptibility.

DICE: Data-Efficient Clinical Event Extraction with Generative Models

Mingyu Derek Ma, Alexander K. Taylor, Wei Wang, Nanyun Peng

ACL, 2023

We introduce DICE, a robust and data-efficient generative model for clinical event extraction, which specializes in clinical mention identification, and MACCROBAT-EE, the first clinical event extraction dataset with event argument annotation.

Can NLI Provide Proper Indirect Supervision for Low-resource Biomedical Relation Extraction?

Jiashu Xu, Mingyu Derek Ma, Muhao Chen

ACL, 2023

We present NBR, which reformulates biomedical relation extraction as natural language inference to leverage indirect supervision.

Multi-hop Evidence Retrieval for Cross-document Relation Extraction

Keming Lu, I-Hung Hsu, Wenxuan Zhou, Mingyu Derek Ma, Muhao Chen

ACL Findings, 2023

We propose Mr.CoD, a multi-hop evidence retrieval method based on evidence path mining and ranking with adapted dense retrievers.

Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning

Mingyu Derek Ma, Jiun-Yu Kao, Shuyang Gao, Arpit Gupta, Di Jin, Tagyoung Chung, Nanyun Peng

INTERSPEECH, 2023 & ENLSP at NeurIPS, 2022

We use soft prompt tokens to learn task properties, incorporate segment information, and reiterate the task before predicting the value. Our method drastically reduces the number of tuned parameters to less than 0.5% of those in prior work while achieving better low-resource dialogue state tracking performance.

Summarization as Indirect Supervision for Relation Extraction

Keming Lu, I-Hung Hsu, Wenxuan Zhou, Mingyu Derek Ma, Muhao Chen

EMNLP Findings, 2022

We present SuRE, which reformulates relation extraction (RE) as summarization. SuRE achieves more precise and resource-efficient RE by leveraging indirect supervision from summarization tasks.

Bending the Future: Autoregressive Modeling of Temporal Knowledge Graphs in Curvature-Variable Hyperbolic Spaces

Jihoon Sohn, Mingyu Derek Ma, Muhao Chen

AKBC, 2022

We use more expressive hyperbolic spaces for temporal knowledge graph reasoning, with global representations that model chronological hierarchies between KGs and local representations that capture diverse hierarchical levels within KGs through the variable curvatures of hyperbolic embeddings.

HyperExpan: Taxonomy Expansion with Hyperbolic Representation Learning

Mingyu Derek Ma, Muhao Chen, Te-Lin Wu, Nanyun Peng

EMNLP Findings, 2021

A taxonomy expansion algorithm that seeks to preserve the structure of a taxonomy in a more expressive hyperbolic embedding space and learn to represent concepts and their relations with a Hyperbolic Graph Neural Network.

EventPlus: A Temporal Event Understanding Pipeline

Mingyu Derek Ma, Jiao Sun, Mu Yang, Kung-Hsiang Huang, Nuan Wen, Shikhar Singh, Rujun Han, Nanyun Peng

NAACL Demonstrations, 2021

A temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction.

Dual Memory Network Model for Sentiment Analysis of Review Text

Jiaxing Shen, Mingyu Derek Ma, Rong Xiang, Qin Lu, Elvira Perez Vallejos, Ge Xu, Chu-Ren Huang, Yunfei Long

Knowledge-Based Systems, 2020

A dual user and product memory network (DUPMN) model that learns user profiles and product information with separate memory networks for review classification.

Implicit Discourse Relation Identification for Open-domain Dialogues

Mingyu Derek Ma, Kevin K. Bowden, Jiaqi Wu, Wen Cui, Marilyn Walker

ACL, 2019

A novel dataset of implicit discourse relation argument pairs and labels for dialogic turns, and a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems.

Dual Memory Network Model for Biased Product Review Classification

Yunfei Long, Mingyu Derek Ma, Qin Lu, Rong Xiang, Chu-Ren Huang

EMNLP WASSA, 2018

Using separate memory networks for user profiles and product information improves sentiment analysis on the Yelp and IMDB datasets.

BlocHIE: a BLOCkchain-based platform for Healthcare Information Exchange

Shan Jiang, Jiannong Cao, Hanqing Wu, Yanni Yang, Mingyu Derek Ma, Jianfei He

SMARTCOMP, 2018

A blockchain-based platform for healthcare information exchange, consisting of two loosely coupled blockchains for different sources.

Curriculum Vitae

Experience

Amazon Alexa AI
Applied Scientist Intern with Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Kai-Wei Chang, Nanyun Peng and Tagyoung Chung
Jun - Sep 2022, Sunnyvale, CA
Amazon Alexa AI
Applied Scientist Intern with Jiun-Yu Kao, Shuyang Gao, Arpit Gupta, Di Jin, Nanyun Peng and Tagyoung Chung
Jun - Sep 2021, Remote
UCLA Computer Science Department
Graduate Student Researcher / Teaching Assistant
Since Sep 2020, Los Angeles, CA
USC Information Sciences Institute
Graduate Research Assistant with Dr. Nanyun Peng
Aug 2019 - Aug 2020, Marina del Rey, CA
Knowledge-directed Artificial Intelligence Reasoning Over Schemas
The Chinese University of Hong Kong Human-Computer Communications Lab
Research Assistant with Prof. Helen Meng
Jan - Jul 2019, Hong Kong
UC Santa Cruz Natural Language and Dialogue Systems Lab
Research Intern with Prof. Marilyn Walker
Jun - Oct 2018, Santa Cruz, CA
PolyU Department of Computing
Undergraduate Research Assistant with Prof. Qin Lu and Prof. Jiannong Cao
Jan 2017 - Jun 2018, Hong Kong
MIT Geospatial Data Center
Research Intern with Dr. Abel Sanchez and Prof. John R. Williams
Jul - Aug 2017, Cambridge, MA

Education

University of California, Los Angeles
PhD Student in Computer Science
Since 2020, Los Angeles, CA
University of Southern California
PhD Student in Computer Science
2019 - 2020, Los Angeles, CA
The Hong Kong Polytechnic University
Bachelor of Science in Computing (First Class Honours)
2014 - 2018, Hong Kong
Best Capstone Project Award (Top 1%), Graduate Representative for Valedictory Speech
University of Maryland, College Park
Exchange Student
2016, College Park, MD

Awards

Outstanding Project Award - Best Capstone Project Award Competition, PolyU Dept. of Computing (1/100), 2018

PolyU Computing News

HKSAR Government Scholarship Fund Talent Development Scholarship, 2018

Silver Award - Hong Kong ICT (Information and Communication Technologies) Awards: Student Innovation Award (Tertiary or Above), Hong Kong Government, 2018

PolyU News | PolyU Computing News | PolyU Computing News about InnoCarnival Exhibition | PolyU Tweet

Champion and Most Innovative Award (HKSAR) - Imagine Cup, Microsoft, 2017

Commercial Radio 50th Anniversary Scholarship, Hong Kong Commercial Broadcasting Company Limited & PolyU (1/400), 2017

Winner - Hong Kong Techathon, PolyU and City University of Hong Kong, 2018

PolyU Computing News | PolyU Tweet

CMA (The Chinese Manufacturers’ Association of Hong Kong) & Donors Scholarship (3/100), 2018

Champion - PolyU Smart Computing Competition, 2017

PolyU Computing News

Best Creative Service Project - Youth Volunteer Service Conference, 2017

News by Office of Service-Learning, PolyU | News by HKSAR Gov Agency for Volunteer Service

PolyU Undergraduate Summer Research Abroad Sponsorship, 2017

PolyU Chinese Mainland and Overseas Activities Fund, 2016