Introduction to Fine-tuning Models

"Fine-tuning models are the ultimate secret of alchemy," the legendary alchemist Idar Alchemy wrote in the "Alchemy Handbook".

What is fine-tuning a model?

Let's consider a few real-life scenarios first.

  1. Suppose you like a certain celebrity and want the model to generate their photos. How do you tell the model that you want to generate photos of this particular celebrity?
  2. Suppose you want a chatbot to role-play with you as a catgirl. During the conversation you grow attached to this catgirl, but when you start a new conversation context and ask the chatbot to play her again, its speaking style and personality are noticeably different from before. How can you keep the chatbot's speaking style and personality consistent?
  3. Your friend X has passed away, and you want a chatbot to impersonate them and chat with you. To make the chatbot as much like X as possible, you give it an elaborate background setting. But because that setting is so long, it crowds the conversation context, and the chatbot soon forgets earlier parts of the conversation. How can you solve this problem?

The problems faced in these three scenarios cannot be solved simply by modifying the model's input (also known as the prompt). The problems include:

  1. It may not be possible to describe the desired generated content accurately through natural language.

    For example, when it comes to faces, even if we tell the model that the celebrity has big eyes, a high nose bridge, black hair, etc., the model still cannot accurately determine what kind of person we want to generate. We cannot use natural language to accurately describe a person's face.

  2. Simple natural language descriptions cannot guarantee that the content generated by the model is stable.

    For example, in a role-playing scenario we tell the chatbot to play a certain character. Even if we describe the character's speaking style and personality in text, the chatbot's answers will still drift from those settings across repeated sessions and conversations.

    Simple textual constraints on the model's output are usually imprecise, and short text cannot describe a person accurately.

  3. Textual constraints long enough to be accurate crowd the context window, making the actual conversation too short.

    Improving accuracy through natural language requires longer constraints, but the longer the constraint, the more of the dialogue context it occupies, leaving less room for the actual conversation.

When simple input adjustments cannot solve these problems, model fine-tuning techniques are needed.

Model fine-tuning refers to retraining an existing model on specific data to make the model more suitable for a particular scenario.
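
The definition can be illustrated with a toy sketch (an ordinary gradient-descent example, not a real neural network, and purely illustrative): a one-parameter linear model is first trained on plentiful "general" data, then briefly retrained on a small task-specific dataset, and its behavior shifts toward that data.

```python
# Toy illustration of fine-tuning: a one-parameter model y = w * x,
# trained with gradient descent first on "general" data, then briefly
# retrained ("fine-tuned") on a small task-specific dataset.

def train(w, data, lr=0.01, epochs=200):
    """Fit w by minimizing squared error over (x, y) pairs."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2
            w -= lr * grad
    return w

# Pre-training: plentiful general data following y = 2x.
general_data = [(x, 2 * x) for x in range(1, 6)]
w_pretrained = train(0.0, general_data)

# Fine-tuning: only a couple of samples from a task where y = 3x.
specific_data = [(1, 3), (2, 6)]
w_finetuned = train(w_pretrained, specific_data, epochs=50)

print(round(w_pretrained, 2), round(w_finetuned, 2))  # roughly 2.0 and 3.0
```

The same shape applies to real models: pre-training gives a general capability, and a short pass over a small specific dataset nudges the model toward the new data without starting from scratch.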

For example, the problems faced in the above scenarios can all be solved through model fine-tuning.

  1. It may not be possible to describe the desired generated content accurately through natural language.

    Since we cannot describe a celebrity's face through language, we can use their photos to tell the model what kind of person we want to generate.

    First, we collect 10-40 clear face photos of the celebrity online, and then we fine-tune the model on these photos. After fine-tuning, we can use the fine-tuned model to generate photos of this celebrity.

  2. Simple natural language descriptions cannot guarantee that the content generated by the model is stable.

    First, we need to collect the chat logs of the catgirl we liked and identify her speaking style. If there is not enough previous conversation data, we can have the chatbot generate several passages of text that mimic this style and pick out the suitable ones. We then fine-tune the model on this text, and the fine-tuned model will produce output that leans toward the style of the fine-tuning data.

  3. Textual constraints long enough to be accurate crowd the context window, making the actual conversation too short.

    Similarly, we first prepare some data about friend X, such as their chat logs and character background. We then fine-tune the model on this text, and the fine-tuned model can converse in character without a long background prompt.
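
The photo-collection step in scenario 1 can be sketched as a small validation script that checks a dataset folder before training. This is a minimal sketch under assumptions: the helper name `check_image_set` and the accepted file extensions are illustrative, and the 10-40 range follows the guideline above.

```python
# Sketch: verify that a folder holds a reasonable number of training
# images before fine-tuning. Helper name and extensions are illustrative.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def check_image_set(folder, min_images=10, max_images=40):
    """Return the image files in `folder`, or raise if the count
    falls outside the recommended range for fine-tuning."""
    images = [p for p in Path(folder).iterdir()
              if p.suffix.lower() in IMAGE_EXTS]
    if not min_images <= len(images) <= max_images:
        raise ValueError(
            f"found {len(images)} images; collect between "
            f"{min_images} and {max_images} clear face photos")
    return sorted(images)

# Demo with placeholder files (in real use, point at your photo folder).
import tempfile
with tempfile.TemporaryDirectory() as d:
    for i in range(12):
        (Path(d) / f"face_{i:02d}.jpg").touch()
    files = check_image_set(d)
    print(len(files))
```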
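Likewise, the dialogue collected in scenarios 2 and 3 has to be packaged into training examples. A minimal sketch, assuming a JSONL chat format with `messages` records (a common convention for chat fine-tuning; the filename and the sample lines are illustrative):

```python
# Sketch: package collected dialogue into JSONL training data.
# The "messages" structure mirrors a common chat fine-tuning format.
import json

collected_chats = [
    ("Good morning!", "Good morning, nya~ Did you sleep well?"),
    ("What should we do today?", "Let's play together all day, nya!"),
]

with open("catgirl_style.jsonl", "w", encoding="utf-8") as f:
    for user_text, reply in collected_chats:
        record = {"messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": reply},
        ]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Each line of the resulting file is one training example; fine-tuning frameworks that accept chat-style JSONL can consume it directly.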

In general, the significance of fine-tuning models is to make the content generated by the model more customized.

When do you not need to fine-tune a model?

For example, if you only want to specify that the characters in an image all wear white T-shirts, or want a chatbot to serve a purely functional role, then in such non-customized scenarios a general model usually works without fine-tuning.

What do you need to prepare for fine-tuning a model?

Equipment: a computer with an NVIDIA graphics card (if you don't have one, you can rent a GPU from an online service)

Data:

  • For image types, you need to prepare some images
  • For text types, you need to prepare text corpus
  • For audio types, you need to prepare some music or song files

How to use the fine-tuned model

Please refer to the content in the model usage overview.