Base LLM:
- Predicts the next word based on patterns in its text training data.
Instruction-tuned LLM:
- Tries to follow instructions.
- Fine-tuned on instructions paired with good attempts at following those instructions.
- Often further refined with RLHF (Reinforcement Learning from Human Feedback).
- Trained to be helpful, honest and harmless.
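The two training steps above can be sketched roughly as follows, under stated assumptions. `build_sft_example` turns an instruction/response pair into one training text and marks which part the model should learn to produce. The `TEMPLATE` string is hypothetical (each model uses its own format), and the per-character mask stands in for the per-token mask a real pipeline would use. `pairwise_reward_loss` is the pairwise (Bradley-Terry style) loss commonly used to train the reward model in RLHF: it is small when a human-preferred answer scores higher than a rejected one.

```python
import math

# Hypothetical prompt template; real instruction-tuned models each define their own.
TEMPLATE = "### Instruction:\n{instruction}\n### Response:\n"


def build_sft_example(instruction: str, response: str) -> tuple[str, list[int]]:
    """Return the full training text and a per-character loss mask.

    The mask is 0 over the prompt and 1 over the response, so in this sketch
    only the response contributes to the fine-tuning loss.
    """
    prompt = TEMPLATE.format(instruction=instruction)
    text = prompt + response
    mask = [0] * len(prompt) + [1] * len(response)
    return text, mask


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Compute -log(sigmoid(r_chosen - r_rejected)) in a numerically stable way."""
    diff = r_chosen - r_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


text, mask = build_sft_example("Translate 'bonjour' to English.", "Hello.")
print(sum(mask))  # 6
print(pairwise_reward_loss(2.0, 0.0) < pairwise_reward_loss(0.0, 2.0))  # True
```

Masking the prompt is one common design choice: the model is not asked to reproduce the instruction, only to produce a good answer to it.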