
Released jointly with the Technical Committee on Natural Language Processing: the term "Few-shot Text Classification" | CCF Terminology Express (CCF术语快线)

2022-02-25

Term released in this issue: few-shot text classification


Opening remarks:

The term released in this issue is few-shot text classification. Few-shot text classification focuses on text classification tasks that contain only a small number of labeled examples, and it is a classic application area of few-shot learning.


Few-shot text classification (小样本文本分类)

Authors: Yaqing Wang (Baidu Research), Quanming Yao (Tsinghua University)


InfoBox:

Chinese name: 小样本文本分类 / 少样本文本分类

English name: Few-shot text classification

Fields: natural language processing, machine learning

In essence: learning how to generalize rapidly to text classification tasks that contain only a small number of labeled examples.


Overview:

Text classification is a fundamental problem in natural language processing. Its goal is to design models that assign a category label to each piece of text (e.g., a sentence, a paragraph, or a document). Most existing text classification methods [1] are deep models that require large amounts of labeled data for training. However, high-quality labeled data is a scarce resource [2]. Few-shot text classification emerged to address this gap: it focuses on text classification tasks that contain only a small number of labeled examples.


Research overview:

Few-shot text classification is a classic application area of few-shot learning. Few-shot learning [3] studies how to generalize rapidly to new tasks that contain only a handful of labeled examples. It reduces the cost of collecting, processing, and computing over large-scale supervised data, makes it possible to learn and mine scarce or emerging tasks, and is an important step toward understanding the gap between artificial intelligence and human learning.

The core issue of few-shot learning is that a small number of labeled examples cannot yield a reliable model via empirical risk minimization [3]. Therefore, in addition to the few labeled examples, few-shot learning must draw on prior knowledge, i.e., "any information the learner has before seeing the training data". Depending on which stage of learning the prior knowledge modifies, existing few-shot text classification methods can be grouped into the following three categories (Figure 1):
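To make the core difficulty concrete, the sketch below (the notation is ours, not drawn from a specific reference) writes out empirical risk minimization over N labeled pairs; when N is very small, the empirical average is a high-variance estimate of the expected risk, so the selected hypothesis is unreliable and prior knowledge is needed to compensate.

```latex
% Empirical risk minimization over N labeled pairs (x_i, y_i),
% hypothesis space \mathcal{H}, loss function \ell:
h_{\mathrm{ERM}} = \arg\min_{h \in \mathcal{H}} \frac{1}{N} \sum_{i=1}^{N} \ell\big(h(x_i), y_i\big)
% ...which is only a proxy for the hypothesis minimizing the (unknown) expected risk:
h^{*} = \arg\min_{h \in \mathcal{H}} \mathbb{E}_{(x,y)}\big[\ell\big(h(x), y\big)\big]
% When N is tiny, \frac{1}{N}\sum_i \ell(h(x_i), y_i) deviates substantially from the
% expectation, so h_{\mathrm{ERM}} can be far from h^{*}.
```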



Figure 1: A taxonomy of few-shot text classification methods

Data: prior knowledge is used for data augmentation, enlarging the labeled set until it is sufficient for standard models and algorithms. The main approaches include synthesizing new examples with generative models [4] and training a model to select similar examples from large amounts of unlabeled data and assign them labels [5].
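As a concrete illustration of the second route (selecting and labeling unlabeled examples), the following is a minimal self-training sketch: a generic pseudo-labeling loop over toy data, not the uncertainty-aware method of [5]; the TF-IDF features, the logistic-regression classifier, and the confidence threshold are placeholder choices.

```python
# Minimal self-training sketch: train on the few labeled texts, pseudo-label the most
# confident unlabeled texts, and retrain on the enlarged set. Toy data for illustration only.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["great movie", "terrible plot", "loved it", "boring and slow"]
labels = np.array([1, 0, 1, 0])
unlabeled_texts = ["what a fantastic film", "a waste of time", "truly enjoyable"]

vec = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
X_lab, X_unlab = vec.transform(labeled_texts), vec.transform(unlabeled_texts)

clf = LogisticRegression().fit(X_lab, labels)

# Keep only pseudo-labels the current classifier is confident about (threshold is arbitrary).
proba = clf.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.55
pseudo_labels = proba.argmax(axis=1)[confident]

# Retrain on the original labels plus the accepted pseudo-labels.
X_aug = sp.vstack([X_lab, X_unlab[confident]])
y_aug = np.concatenate([labels, pseudo_labels])
clf = LogisticRegression().fit(X_aug, y_aug)
```

In practice this loop is repeated several times, and methods such as [5] additionally take the model's uncertainty about each pseudo-label into account rather than using a fixed threshold.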

Model: prior knowledge is used to constrain the complexity of the hypothesis (function) space, so that a small number of labeled examples suffices to train a suitable model within it. Multi-task learning learns several related tasks jointly to capture what they share, while allowing each task its own parameters or structure to preserve task-specific information [6]. Metric learning projects examples into a low-dimensional subspace in which similar examples are close together and dissimilar ones are easy to separate, and classifies an example by measuring the similarity between embeddings with a similarity function [7]. Memory-augmented models learn a mapping function that distills the knowledge in the few examples into a memory module [8].
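The metric-learning branch can be sketched in a few lines: embed the labeled support examples, average them into one prototype per class, and classify a query by its similarity to each prototype. The sketch below uses random placeholder embeddings and cosine similarity; a real system would plug in a learned sentence encoder and train it end to end.

```python
# Minimal prototype-based metric-learning sketch with placeholder embeddings.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_support, dim = 3, 5, 64

# Support set: a few labeled embeddings per class (stand-ins for encoder outputs).
support = rng.normal(size=(n_classes, n_support, dim))
# One query embedding to classify.
query = rng.normal(size=(dim,))

# Class prototypes: the mean embedding of each class's support examples.
prototypes = support.mean(axis=1)                       # shape (n_classes, dim)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Classify the query by its most similar prototype.
scores = np.array([cosine(query, p) for p in prototypes])
predicted_class = int(scores.argmax())
print(predicted_class, scores)
```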

Algorithm: prior knowledge is used to guide the search for the parameters of the best hypothesis in the hypothesis space, for example where to start searching (learning the parameter initialization) and in which direction and at what speed to search (learning the optimizer). Models are usually optimized by (stochastic) gradient descent. With limited labeled examples, the number of optimization steps is restricted, so the parameters may fail to converge to suitable values, and the model also overfits easily. For few-shot text classification, these techniques fall into two subtypes: refining parameters from other tasks and refining parameters via meta-learning. The former mainly studies how to adapt language models pre-trained on large unlabeled corpora, such as BERT [9] and ERNIE [10], to the current few-shot text classification task, for example by designing effective fine-tuning techniques [11]. In particular, prompt-based learning [12] is a hot research direction: it reformulates the classification task as the model's pre-training task, so that the objectives of pre-training and fine-tuning are better aligned, and it has proven effective on a range of natural language processing tasks including few-shot text classification [13,14]. The latter uses meta-learning to capture generic information from a large number of related tasks; a meta-learner provides a parameter initialization for each task, which is then fine-tuned on the few labeled examples to incorporate the task-specific information of the new task [15].
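As a concrete illustration of prompt-based learning, the sketch below reformulates sentiment classification as BERT's masked-language-modeling task: the text is wrapped in a template containing a mask token, and a verbalizer maps each label to a single vocabulary word whose score at the mask position decides the prediction. The template, the verbalizer, and the choice of bert-base-uncased are illustrative assumptions, not a specific recipe from the cited works.

```python
# Minimal prompt-based classification sketch with a masked language model.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The food was cold and the service was slow."
# Reformulate classification as the pre-training task: predict the masked word.
prompt = f"{text} It was {tokenizer.mask_token}."

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                     # (1, seq_len, vocab_size)

# Locate the mask position in the input.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

# Hypothetical one-word verbalizer: each label is represented by one vocabulary token.
verbalizer = {"positive": "great", "negative": "terrible"}
label_ids = {lab: tokenizer.convert_tokens_to_ids(tok) for lab, tok in verbalizer.items()}
scores = {lab: logits[0, mask_pos, tid].item() for lab, tid in label_ids.items()}
print(max(scores, key=scores.get), scores)
```

With a handful of labeled examples, the same setup can be fine-tuned so that the masked-word scores of the verbalizer tokens match the training labels, which is the sense in which the pre-training and fine-tuning objectives become better aligned.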

Current research trends in few-shot text classification include:

(1) Existing models usually report results under different experimental settings and datasets; designing reasonable benchmark datasets would make it easier to compare the practical effectiveness of different methods.

(2) Because labeled data is limited, validation samples are often missing or insufficient; how to perform model selection, adjust neural architectures, and tune hyperparameters efficiently is an important problem.

(3) Studying theoretical guarantees for few-shot text classification methods, and improving the robustness and interpretability of models.

(4) Extending to more complex scenarios, such as applications to multimodal data, and connections with self-supervised learning, online and lifelong learning, and domain transfer learning.


References

[1] Qian Li, Hao Peng, Jianxin Li, Congying Xia, Renyu Yang, Lichao Sun, Philip S Yu, and Lifang He. 2020. A survey on text classification: From shallow to deep learning. arXiv preprint arXiv:2008.00364.
[2] Yaqing Wang, Song Wang, Quanming Yao, and Dejing Dou. 2021. Hierarchical heterogeneous graph representation learning for short text classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 3091–3101.
[3] Yaqing Wang, Quanming Yao, James T Kwok, and Lionel M Ni. 2020. Generalizing from a few examples: A survey on few-shot learning. Comput. Surveys 53, 3 (2020), 1–34.
[4] Thomas Dopierre, Christophe Gravier, and Wilfried Logerais. 2021. ProtAugment: Intent Detection Meta-Learning through Unsupervised Diverse Paraphrasing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 2454–2466.
[5] Subhabrata Mukherjee and Ahmed Awadallah. 2020. Uncertainty-aware self-training for few-shot text classification. Advances in Neural Information Processing Systems 33 (2020). 
[6] Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Few-shot charge prediction with discriminative legal attributes. In Proceedings of the 27th International Conference on Computational Linguistics, pages 487–498.
[7] Yujia Bao, Menghua Wu, Shiyu Chang, and Regina Barzilay. 2020. Few-shot Text Classification with Distributional Signatures. In International Conference on Learning Representations.
[8] Ruiying Geng, Binhua Li, Yongbin Li, Jian Sun, and Xiaodan Zhu. 2020. Dynamic memory induction networks for few-shot text classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1087–1094, Online. Association for Computational Linguistics.
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. 
[10] Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223.
[11] Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q Weinberger, and Yoav Artzi. 2021. Revisiting few sample BERT fine-tuning. In International Conference on Learning Representations.
[12] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586.
[13] Timo Schick and Hinrich Schütze. 2021. It’s not just size that matters: Small language models are also few-shot learners. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2339–2352.
[14] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901.
[15] Pengfei Sun, Yawen Ouyang, Wenming Zhang, and Xin-yu Dai. 2021. MEDA: Meta-learning with data augmentation for few-shot text classification. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, pages 3929–3935.