[LLM] Finetuned Language Models Are Zero-Shot Learners
·
Paper Review
Finetuned Language Models Are Zero-Shot Learners https://arxiv.org/abs/2109.01652 This paper shows that instruction tuning improves zero-shot ability. When existing language models are fine-tuned on input-output pairs with instructions fixed per dataset, zero-shot performance on unseen tasks (tasks whose instructions were never specified) drops sharply; FLAN improves zero-shot performance through instruction tuning…
[LM/LLM] Scaling Laws for Neural Language Models
·
Paper Review
Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude…
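The power laws in the excerpt can be written out explicitly. A sketch of the paper's headline fits for loss versus parameters N, dataset size D, and minimal compute C_min (the fitted constants below are approximate):

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\qquad
L(C_{\min}) = \left(\frac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}, \quad \alpha_C^{\min} \approx 0.050
```

Each law holds when the other two quantities are not the bottleneck; the small exponents are why loss improvements require order-of-magnitude increases in scale.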
[LLM] Self-Consistency Improves Chain of Thought Reasoning in Language Models
·
Paper Review
Self-Consistency Improves Chain of Thought Reasoning in Language Models https://arxiv.org/abs/2203.11171 Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive…
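Self-consistency in a nutshell: sample several reasoning paths from the model, extract each path's final answer, and return the most frequent one. A minimal sketch, assuming the sampled final answers have already been extracted from the model's chains of thought:

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    """Marginalize over reasoning paths by majority vote on the final answers."""
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# e.g. five sampled chains of thought ending in these answers:
print(self_consistent_answer(["18", "18", "26", "18", "9"]))  # -> 18
```

The vote replaces greedy decoding's single path: an answer reached by many different reasoning paths is more likely to be correct than the answer of any one path.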
[PEFT] LoRA: Low-Rank Adaptation of Large Language Models
·
Paper Review
LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685 An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible…
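LoRA's core idea: keep the pretrained weight W frozen and learn a low-rank update ΔW = BA, so the adapted layer computes h = Wx + BAx while training only r(d + k) parameters instead of dk. A minimal numpy sketch (dimensions illustrative):

```python
import numpy as np

d, k, r = 64, 64, 4                 # output dim, input dim, rank (r << min(d, k))
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))         # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection; zero-init so ΔW starts at 0

def lora_forward(x):
    # h = W x + B A x; only A and B receive gradients during training
    return W @ x + B @ (A @ x)

x = rng.normal(size=(k,))
# With B = 0 the adapted layer behaves exactly like the frozen base layer:
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing B guarantees training starts from the pretrained model's behavior; at inference time B @ A can be merged into W, so LoRA adds no extra latency.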
[LLM] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
·
Paper Review
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models https://arxiv.org/abs/2205.10625 Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which require solving problems harder than the exemplars…
[LLM] Large Language Models are Zero-Shot Reasoners
·
Paper Review
Large Language Models are Zero-Shot Reasoners https://arxiv.org/abs/2205.11916 Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique…
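The paper's contribution is a single zero-shot prompt addition: prepend no exemplars at all and instead append "Let's think step by step." before the model's answer. A minimal sketch of the stage-1 prompt construction (the model call itself is omitted):

```python
def zero_shot_cot_prompt(question):
    # Stage 1 of Zero-shot-CoT: elicit a reasoning chain without any exemplars.
    # A second prompt then extracts the final answer from the generated chain.
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot_prompt("If I have 3 apples and buy 2 more, how many do I have?")
print(prompt)
```

Compared with few-shot CoT, no hand-written reasoning exemplars are needed; the fixed trigger phrase alone elicits step-by-step reasoning.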
[LLM] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
·
Paper Review
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/abs/2201.11903 We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning…
[LLM] Prefix-Tuning: Optimizing Continuous Prompts for Generation
·
Paper Review
Fine-Tuning vs. Prefix-Tuning. The NLP paradigm is moving from pretraining followed by fine-tuning toward prompt-based learning. In particular, with the arrival of large language models in the GPT, T5, and BERT families, the resource cost of conventional fine-tuning keeps growing.
Fine-Tuning: trains all model parameters; strong performance, but computationally expensive; when the downstream task changes, the model must be retrained on the full data.
Prompt-Tuning: adds a discrete prompt (natural-language text) to the input; performance is limited, but the cost is low.
Adapter (PEFT): inserts small modules (adapters) into some layers of the LLM; cuts cost…
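The adapter idea in the list above can be sketched as a small bottleneck (down-project, nonlinearity, up-project) with a residual connection, inserted into a frozen layer; only the bottleneck is trained. A minimal numpy sketch under illustrative dimensions:

```python
import numpy as np

d, r = 64, 8                             # hidden size, adapter bottleneck (r << d)
rng = np.random.default_rng(0)

W_down = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
W_up = np.zeros((d, r))                  # trainable up-projection; zero-init -> identity at start

def adapter(h):
    # Residual bottleneck: h + W_up @ relu(W_down @ h).
    # The surrounding transformer layer stays frozen; only W_down/W_up train.
    z = np.maximum(W_down @ h, 0.0)
    return h + W_up @ z

h = rng.normal(size=(d,))
# Zero-initialized adapter is a no-op, so training starts from the frozen model:
assert np.allclose(adapter(h), h)
```

This trains 2rd parameters per adapter instead of the layer's full weight matrices, which is the cost reduction the PEFT entry refers to; prefix-tuning achieves a similar saving by learning continuous prefix vectors instead of inserted modules.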