2024.03.28 2024.09.02

This article may contain product promotions.

How to Use Chat Templates for HuggingFace Large Language Models (LLMs)

Aru

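When you build a chat system with a large language model (LLM), you have to insert specific control tokens into the conversation so the model can tell the user's turns from the assistant's. These control tokens differ from model to model, so you have to look them up each time and insert the right ones. HuggingFace provides templates that make inserting them easy. This article explains how to use HuggingFace's apply_chat_template().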


Contents

What Is a Chat Model Template?
Llama2
Gemma
Unsupported Models
- CALM2
Summary

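What Is a Chat Model Template?

When you chat with a large language model, you don't simply input a piece of text (a string); you input a "conversation".

In a conversation, each utterance has to be marked as coming from either the "user" or the "assistant".

To do this, control tokens must be inserted between the utterances. Unfortunately, there is currently no standard specification for these control tokens.

So you have to look up the control tokens for each model and insert the ones that match it.

To solve this problem, some models on HuggingFace ship with a "template for chat models".

Specifically, the tokenizer provides a function called apply_chat_template, which builds a prompt with the control tokens appropriate for that chat model already inserted.

Here, I explain how to use apply_chat_template using Llama2 and Gemma as examples.

The official documentation also covers this, so please refer to it as well.

Official page: Templates for Chat Models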

Llama2

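This is the template for Meta's Llama 2.

Checking the template

You can check the template format with the following variable. (In recent transformers releases these class-level default templates have been removed; the template a model actually uses is available as tokenizer.chat_template.)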

tokenizer.default_chat_template

{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif false == true and not '<<SYS>>' in messages[0]['content'] %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don\'t know the answer to a question, please don\'t share false information.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'system' %}{{ '<<SYS>>\n' + content.strip() + '\n<</SYS>>\n\n' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}

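USER-Only Pattern

When there is only a user question, pass only "role":"user", as shown below. The string given in content is the user's input.

Passing chat to tokenizer.apply_chat_template creates the prompt. With tokenize=False the result is returned as a string rather than token IDs, and add_generation_prompt=True asks the template to append the marker that starts the assistant's reply (the Llama 2 template has no such marker, because the prompt already ends with [/INST]).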

from transformers import AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "user", "content": "What is tallest mountain in the world?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

For Llama2, the user's message is wrapped in [INST]...[/INST] and preceded by the BOS token <s>, so you can see that the conversion is correct.

<s>[INST] What is tallest mountain in the world? [/INST]


USER→ASSISTANT→USER Pattern

To continue a chat, the assistant's output must also be included in the prompt. Assistant messages use "role":"assistant".

The example below includes the assistant's output.

from transformers import AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "user", "content": "What is tallest mountain in the world?" },
    { "role": "assistant", "content": "Mount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level."},
    { "role": "user", "content": "Second one?"}
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

The prompt produced by apply_chat_template looks like this.

<s>[INST] What is tallest mountain in the world? [/INST] Mount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level. </s><s>[INST] Second one? [/INST]
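In an actual chat loop, the history list grows by one assistant reply and one new user question each round, and the whole list is passed to apply_chat_template again. Below is a minimal sketch of that bookkeeping using a hypothetical helper, continue_chat (not part of transformers); in practice the reply text would be decoded from the model's output, which is not shown here.

```python
def continue_chat(chat, assistant_reply, next_user_message):
    """Hypothetical helper: return a new history with the model's reply and
    the user's next question appended. The input list is not modified."""
    return chat + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


chat = [{"role": "user", "content": "What is tallest mountain in the world?"}]
# In a real loop this reply would come from the model's generated output.
chat = continue_chat(chat, "Mount Everest.", "Second one?")
print([m["role"] for m in chat])
```

The resulting list has the same user → assistant → user shape as the chat in the example above, so it can be passed to tokenizer.apply_chat_template unchanged.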

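Pattern with SYSTEM

Llama2 also accepts a message with the role system, which holds instructions for the assistant. As the output below shows, the template embeds it at the start of the first user turn, between <<SYS>> and <</SYS>>.

For example, enter the following text as the system message.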

You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don’t know the answer to a question, please don’t share false information.


from transformers import AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "system", "content": "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."},
    { "role": "user", "content": "What is tallest mountain in the world?" },
    { "role": "assistant", "content": "Mount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level."},
    { "role": "user", "content": "Second one?"}
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

The generated prompt is shown below.

<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\nWhat is tallest mountain in the world? [/INST] Mount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level. </s><s>[INST] Second one? [/INST]
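To make the template's logic easier to follow, here is a hand-written plain-Python sketch of the Llama 2 template printed earlier. render_llama2 is a hypothetical name, not a transformers function, and the BOS/EOS strings <s> and </s> are taken from the outputs above. It pulls a leading system message into the first user turn, checks that user turns alternate, and reproduces the prompts shown above; it covers only the branches exercised in this article (for example, it omits the default-system-prompt branch guarded by false == true).

```python
BOS_TOKEN = "<s>"   # BOS token as it appears in the outputs above
EOS_TOKEN = "</s>"  # EOS token as it appears in the outputs above


def render_llama2(messages):
    """Hand-written sketch of the Llama 2 chat template (illustration only)."""
    # A leading system message is removed from the loop and remembered.
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        loop_messages = messages[1:]
    else:
        system_message = None
        loop_messages = messages

    parts = []
    for i, message in enumerate(loop_messages):
        # User turns must sit at even positions (0, 2, 4, ...) and only there.
        if (message["role"] == "user") != (i % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        content = message["content"]
        # The system message is embedded at the start of the first turn.
        if i == 0 and system_message is not None:
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + content
        if message["role"] == "user":
            parts.append(BOS_TOKEN + "[INST] " + content.strip() + " [/INST]")
        elif message["role"] == "assistant":
            parts.append(" " + content.strip() + " " + EOS_TOKEN)
    return "".join(parts)


chat = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is tallest mountain in the world?"},
]
print(render_llama2(chat))
```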

Gemma

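Gemma is a large language model (LLM) developed by Google and announced on February 21, 2024.

Checking the template

Looking at tokenizer.default_chat_template, Gemma appears at first glance to have no template of its own.

However, the sample program on the google/gemma-7b-it model page does use apply_chat_template.

Accessing the variable below prints a warning together with a generic ChatML template. This is only the fallback that transformers uses for tokenizers without a template; apply_chat_template uses the template stored in tokenizer.chat_template first, and for Gemma that is a different format, as the <start_of_turn> markers in the outputs below show.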

tokenizer.default_chat_template

No chat template is defined for this tokenizer – using a default chat template that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
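The fallback template above is ChatML. A plain-Python equivalent (a hand-written sketch; render_chatml is a hypothetical name) makes it clear that this fallback emits <|im_start|>/<|im_end|> markers and no BOS/EOS tokens, which is not the <start_of_turn> format that the Gemma prompts below actually use.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Sketch of the ChatML fallback template shown above (illustration only)."""
    # Each message becomes <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [
        "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>" + "\n"
        for m in messages
    ]
    # add_generation_prompt opens an assistant turn for the model to fill in.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "What is tallest mountain in the world?"}]))
```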

USER-Only Pattern

As with Llama2, this prompt contains only the user's message.

from transformers import AutoTokenizer

model_name = "google/gemma-7b-it"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "user", "content": "What is tallest mountain in the world?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

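Because the template differs from Llama2's, the prompt differs too.

The prompt starts with <bos>, the user's message follows user, and the assistant's turn follows model. Because add_generation_prompt=True, the prompt ends with an open <start_of_turn>model turn so that the model continues as the assistant.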

<bos><start_of_turn>user\nWhat is tallest mountain in the world?<end_of_turn>\n<start_of_turn>model\n

USER→ASSISTANT→USER Pattern

This example adds the answer to the user's question, followed by the next question.

from transformers import AutoTokenizer

model_name = "google/gemma-7b-it"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "user", "content": "What is tallest mountain in the world?" },
    { "role": "assistant", "content": "Mount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level."},
    { "role": "user", "content": "Second one?"}
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

The turns alternate user→model→user (the assistant role is written as model), and each turn is enclosed between <start_of_turn> and <end_of_turn>.

<bos><start_of_turn>user\nWhat is tallest mountain in the world?<end_of_turn>\n<start_of_turn>model\nMount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level.<end_of_turn>\n<start_of_turn>user\nSecond one?<end_of_turn>\n<start_of_turn>model\n
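Judging from the two Gemma prompts above, the template prepends <bos>, renames the assistant role to model, and wraps each turn in <start_of_turn>...<end_of_turn>; it also rejects a system message, as the next section shows. The plain-Python sketch below is a hand-written approximation that reproduces those outputs. render_gemma is a hypothetical name; the alternation check and whitespace stripping are assumptions, and it raises ValueError where the real template raises a jinja2 TemplateError.

```python
def render_gemma(messages, add_generation_prompt=True):
    """Hand-written approximation of Gemma's chat template (illustration only)."""
    # Gemma's template rejects a system message outright.
    if messages and messages[0]["role"] == "system":
        raise ValueError("System role not supported")
    parts = ["<bos>"]
    for i, message in enumerate(messages):
        # Assumed check: user turns must sit at even positions and only there.
        if (message["role"] == "user") != (i % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        # Gemma calls the assistant "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        # Stripping is an assumption; the examples above have no extra whitespace.
        parts.append(
            "<start_of_turn>" + role + "\n" + message["content"].strip() + "<end_of_turn>\n"
        )
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma([{"role": "user", "content": "What is tallest mountain in the world?"}]))
```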

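Pattern with SYSTEM

Gemma's template does not support the system role, so passing one raises an error.

Differences like this between models still exist, so some care is needed when using chat templates.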

from transformers import AutoTokenizer

model_name = "google/gemma-7b-it"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "system", "content": "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."},
    { "role": "user", "content": "What is tallest mountain in the world?" },
    { "role": "assistant", "content": "Mount Everest, located in the Himalayas of Asia, is the tallest mountain in the world at 8,848.86 meters (29,032.4 feet) above sea level."},
    { "role": "user", "content": "Second one?"}
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

TemplateError: System role not supported
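If you want to give Gemma system-style instructions anyway, one common workaround (not an official feature of Gemma or transformers) is to prepend the system text to the first user message before calling apply_chat_template. Below is a minimal sketch with a hypothetical helper, merge_system_into_first_user; the blank-line separator is an arbitrary choice.

```python
def merge_system_into_first_user(messages, separator="\n\n"):
    """Hypothetical helper: fold a leading system message into the first
    user message so that templates without a system role accept it."""
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]
    system, rest = messages[0], messages[1:]
    if not rest or rest[0]["role"] != "user":
        raise ValueError("A system message must be followed by a user message")
    first_user = {
        "role": "user",
        "content": system["content"] + separator + rest[0]["content"],
    }
    # Copy the remaining messages so the caller's list is left untouched.
    return [first_user] + [dict(m) for m in rest[1:]]


chat = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is tallest mountain in the world?"},
]
print(merge_system_into_first_user(chat))
```

The merged list can then be passed as before, e.g. tokenizer.apply_chat_template(merge_system_into_first_user(chat), tokenize=False, add_generation_prompt=True).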

Unsupported Models

Some models do not currently support chat templates. CALM2, for example, does not.

CALM2

I also checked CyberAgent's CALM2; this model does not appear to define its own template for apply_chat_template.

from transformers import AutoTokenizer

model_name = "cyberagent/calm2-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name)
chat = [
    { "role": "user", "content": "What is tallest mountain in the world?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
prompt

A conversion is still performed, using the tokenizer class's default template.

No chat template is defined for this tokenizer – using the default template for the GPTNeoXTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

What is tallest mountain in the world?<|endoftext|>

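However, calm2 expects a prompt with the structure below. The converted prompt lacks the USER: and ASSISTANT: labels and does not match this structure, so the template-based conversion does not work for this model.

The published chat template: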

USER: {user_message1}
ASSISTANT: {assistant_message1}<|endoftext|>
USER: {user_message2}
ASSISTANT: {assistant_message2}<|endoftext|>
USER: {user_message3}
ASSISTANT: {assistant_message3}<|endoftext|>
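Since CALM2 has no suitable template, one option is to build the prompt by hand. The sketch below is a hypothetical helper, render_calm2, that follows the published format above; the exact whitespace (a newline after each turn, and a trailing "ASSISTANT: " as the generation prompt) is an assumption based on that format.

```python
EOS_TOKEN = "<|endoftext|>"  # EOS token shown in the CALM2 output above


def render_calm2(messages, add_generation_prompt=True):
    """Hypothetical helper: build a CALM2-style prompt by hand."""
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append("USER: " + message["content"] + "\n")
        elif message["role"] == "assistant":
            # Completed assistant turns end with the EOS token.
            parts.append("ASSISTANT: " + message["content"] + EOS_TOKEN + "\n")
        else:
            raise ValueError("Unsupported role: " + message["role"])
    if add_generation_prompt:
        # Leave an open ASSISTANT: line for the model to complete.
        parts.append("ASSISTANT: ")
    return "".join(parts)


print(render_calm2([{"role": "user", "content": "What is tallest mountain in the world?"}]))
```

Alternatively, as the warning message suggests, the same format could be written as a Jinja template and assigned to tokenizer.chat_template, after which apply_chat_template would produce it; that route is not shown here.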

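Summary

That covers how to build conversation prompts with apply_chat_template.

To be honest, until now I hadn't noticed this feature and had been writing prompts by hand, struggling with the format every time.

Ideally, model prompt formats would be standardized, but just being able to generate them with a function is a big help.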

#hugging face #LLM #Tokenizer #chat #deep learning #prompt
