(* indicates equal contribution)
2024
-
Small Models are Valuable Plug-ins for Large Language Models
Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, and Julian McAuley
ACL 2024 (Findings)
[arXiv]
-
Automatic Pair Construction for Contrastive Post-training
Canwen Xu, Corby Rosset, Ethan C. Chau, Luciano Del Corro, Shweti Mahajan, Julian McAuley, Jennifer Neville, Ahmed Hassan Awadallah, and Nikhil Rao
NAACL 2024 (Findings)
[arXiv]
-
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
Tianyang Liu, Canwen Xu, and Julian McAuley
ICLR 2024
[URL]
2023
-
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
Canwen Xu*, Daya Guo*, Nan Duan, and Julian McAuley
EMNLP 2023
[arXiv]
-
Spoiler Detection as Semantic Text Matching
Ryan Tran*, Canwen Xu*, and Julian McAuley
EMNLP 2023
[URL]
-
LongCoder: A Long-Range Pre-trained Language Model for Code Completion
Daya Guo*, Canwen Xu*, Nan Duan, Jian Yin, and Julian McAuley
ICML 2023
[arXiv]
-
Mirror: A Natural Language Interface for Data Querying, Summarization, and Visualization
Canwen Xu, Julian McAuley, and Penghan Wang
WWW 2023 (Demo)
[arXiv]
[Code]
-
A Survey on Model Compression and Acceleration for Pretrained Language Models
Canwen Xu and Julian McAuley
AAAI 2023
[arXiv]
-
A Survey on Dynamic Neural Networks for Natural Language Processing
Canwen Xu and Julian McAuley
EACL 2023 (Findings)
[arXiv]
2022
-
InforMask: Unsupervised Informative Masking for Language Model Pretraining
Nafis Sadeq*, Canwen Xu*, and Julian McAuley
EMNLP 2022
[arXiv]
-
Efficiently Tuned Parameters are Task Embeddings
Wangchunshu Zhou*, Canwen Xu*, and Julian McAuley
EMNLP 2022
[arXiv]
-
Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification
Han Wang*, Canwen Xu*, and Julian McAuley
NAACL 2022
[arXiv]
-
BERT Learns to Teach: Knowledge Distillation with Meta Learning
Wangchunshu Zhou*, Canwen Xu*, and Julian McAuley
ACL 2022
[arXiv]
-
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
Canwen Xu*, Daya Guo*, Nan Duan, and Julian McAuley
ACL 2022 (Findings)
[arXiv]
-
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Hugging Face + Big Science
ACL 2022 (Demo)
[arXiv]
-
Multitask Prompted Training Enables Zero-Shot Task Generalization
Hugging Face + Big Science
ICLR 2022
[Spotlight]
[URL]
-
Leashing the Inner Demons: Self-Detoxification for Language Models
Canwen Xu, Zexue He, Zhankui He, and Julian McAuley
AAAI 2022
[arXiv]
2021
-
Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
Canwen Xu*, Wangchunshu Zhou*, Tao Ge, Ke Xu, Julian McAuley, and Furu Wei
EMNLP 2021
[PDF]
[arXiv]
[URL]
-
Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, and Furu Wei
EMNLP 2021
[PDF]
[arXiv]
[URL]
-
Datasets: A Community Library for Natural Language Processing
The Hugging Face Team
EMNLP 2021 (Demo)
[Best demo paper award]
[PDF]
[arXiv]
[URL]
[Code]
-
Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge
Canwen Xu*, Wangchunshu Zhou*, Tao Ge, Ke Xu, Julian McAuley, and Furu Wei
NAACL-HLT 2021
[PDF]
[arXiv]
[URL]
2020
-
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou*, Canwen Xu*, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei
NeurIPS 2020
[PDF]
[arXiv]
[URL]
[Code]
-
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu*, Wangchunshu Zhou*, Tao Ge, Furu Wei, and Ming Zhou
EMNLP 2020
[PDF]
[arXiv]
[URL]
[Code]
-
HuggingFace’s Transformers: State-of-the-art Natural Language Processing
The Hugging Face Team
EMNLP 2020 (Demo)
[Best demo paper award]
[PDF]
[arXiv]
[URL]
[Code]
-
MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization
Canwen Xu*, Jiaxin Pei*, Hongtao Wu, Yiyu Liu, and Chenliang Li
ACL 2020
[PDF]
[arXiv]
[URL]
[Video]
[Code]
-
Pre-train and Plug-in: Flexible Conditional Text Generation with Variational Auto-Encoders
Yu Duan*, Canwen Xu*, Jiaxin Pei*, Jialong Han, and Chenliang Li
ACL 2020
[PDF]
[arXiv]
[URL]
[Video]
[Code]
-
UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database
Canwen Xu, Tao Ge, Chenliang Li, and Furu Wei
AACL-IJCNLP 2020
[PDF]
[URL]
2019
-
DLocRL: A Deep Learning Pipeline for Fine-Grained Location Recognition and Linking in Tweets
Canwen Xu, Jing Li, Xiangyang Luo, Jiaxin Pei, Chenliang Li, and Donghong Ji
WWW 2019
[arXiv]
[URL]