[LLM] Finetuned Language Models Are Zero-Shot Learners
·
Paper Review
Finetuned Language Models Are Zero-Shot Learners https://arxiv.org/abs/2109.01652 This paper shows that instruction tuning improves zero-shot ability. When existing language models are fine-tuned on input-output pairs with instructions fixed per dataset, zero-shot performance on unseen tasks (tasks whose instructions were never specified) drops sharply; FLAN improves zero-shot performance through instruction tuning…
[LM/LLM] Scaling Laws for Neural Language Models
·
Paper Review
Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude…
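The power laws in the excerpt can be written out explicitly. A sketch of the paper's headline fits for loss versus parameters N, dataset size D, and minimal compute C_min (the fitted constants below are approximate):

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\qquad
L(C_{\min}) = \left(\frac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}, \quad \alpha_C^{\min} \approx 0.050
```

Each law holds when the other two quantities are not the bottleneck; the small exponents are why loss improvements require order-of-magnitude increases in scale.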
[LLM] Self-Consistency Improves Chain of Thought Reasoning in Language Models
·
Paper Review
Self-Consistency Improves Chain of Thought Reasoning in Language Models https://arxiv.org/abs/2203.11171 Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive…
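Self-consistency in a nutshell: sample several reasoning paths from the model, extract each path's final answer, and return the most frequent one. A minimal sketch, assuming the sampled final answers have already been extracted from the model's chains of thought:

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    """Marginalize over reasoning paths by majority vote on the final answers."""
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# e.g. five sampled chains of thought ending in these answers:
print(self_consistent_answer(["18", "18", "26", "18", "9"]))  # -> 18
```

The vote replaces greedy decoding's single path: an answer reached by many different reasoning paths is more likely to be correct than the answer of any one path.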
[PEFT] LoRA: Low-Rank Adaptation of Large Language Models
·
Paper Review
LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685 An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible…
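LoRA's core idea: keep the pretrained weight W frozen and learn a low-rank update ΔW = BA, so the adapted layer computes h = Wx + BAx while training only r(d + k) parameters instead of dk. A minimal numpy sketch (dimensions illustrative):

```python
import numpy as np

d, k, r = 64, 64, 4                 # output dim, input dim, rank (r << min(d, k))
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))         # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection; zero-init so ΔW starts at 0

def lora_forward(x):
    # h = W x + B A x; only A and B receive gradients during training
    return W @ x + B @ (A @ x)

x = rng.normal(size=(k,))
# With B = 0 the adapted layer behaves exactly like the frozen base layer:
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing B guarantees training starts from the pretrained model's behavior; at inference time B @ A can be merged into W, so LoRA adds no extra latency.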
[LLM] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
·
Paper Review
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models https://arxiv.org/abs/2205.10625 Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which require solving problems harder than the exemplars…
[LLM] Large Language Models are Zero-Shot Reasoners
·
Paper Review
Large Language Models are Zero-Shot Reasoners https://arxiv.org/abs/2205.11916 Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique…
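The paper's contribution is a single zero-shot prompt addition: prepend no exemplars at all and instead append "Let's think step by step." before the model's answer. A minimal sketch of the stage-1 prompt construction (the model call itself is omitted):

```python
def zero_shot_cot_prompt(question):
    # Stage 1 of Zero-shot-CoT: elicit a reasoning chain without any exemplars.
    # A second prompt then extracts the final answer from the generated chain.
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot_prompt("If I have 3 apples and buy 2 more, how many do I have?")
print(prompt)
```

Compared with few-shot CoT, no hand-written reasoning exemplars are needed; the fixed trigger phrase alone elicits step-by-step reasoning.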
[LLM] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
·
Paper Review
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/abs/2201.11903 We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning…
[LLM] Prefix-Tuning: Optimizing Continuous Prompts for Generation
·
Paper Review
Fine-Tuning vs. Prefix-Tuning. The NLP paradigm is moving from pretraining followed by fine-tuning toward prompt-based learning. In particular, with the arrival of large language models in the GPT, T5, and BERT families, the resource cost of conventional fine-tuning keeps growing.
Fine-Tuning: trains all model parameters; strong performance, but computationally expensive; when the downstream task changes, the model must be retrained on the full data.
Prompt-Tuning: adds a discrete prompt (natural-language text) to the input; performance is limited, but the cost is low.
Adapter (PEFT): inserts small modules (adapters) into some layers of the LLM; cuts cost…
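The adapter idea in the list above can be sketched as a small bottleneck (down-project, nonlinearity, up-project) with a residual connection, inserted into a frozen layer; only the bottleneck is trained. A minimal numpy sketch under illustrative dimensions:

```python
import numpy as np

d, r = 64, 8                             # hidden size, adapter bottleneck (r << d)
rng = np.random.default_rng(0)

W_down = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
W_up = np.zeros((d, r))                  # trainable up-projection; zero-init -> identity at start

def adapter(h):
    # Residual bottleneck: h + W_up @ relu(W_down @ h).
    # The surrounding transformer layer stays frozen; only W_down/W_up train.
    z = np.maximum(W_down @ h, 0.0)
    return h + W_up @ z

h = rng.normal(size=(d,))
# Zero-initialized adapter is a no-op, so training starts from the frozen model:
assert np.allclose(adapter(h), h)
```

This trains 2rd parameters per adapter instead of the layer's full weight matrices, which is the cost reduction the PEFT entry refers to; prefix-tuning achieves a similar saving by learning continuous prefix vectors instead of inserted modules.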