Catastrophic forgetting destroys a fine-tuned model's general capabilities — it becomes an idiot savant
You fine-tune GPT-3.5 on 10K legal contract reviews. The fine-tuned model is excellent at contract review, much better than base GPT-3.5. You then ask it to write a simple email. It writes the email in the format of a contract review, with 'WHEREAS' and 'HEREBY' language. You ask it a basic math question. It fails. You ask it to summarize a news article. It formats the summary as a contract clause. The model forgot how to do everything except contract review. This is catastrophic forgetting: fine-tuning on domain-specific data overwrites the general knowledge stored in the model's weights.

So what?
You now have a model that is great at one task and terrible at everything else. If your application needs the model to do contract review AND answer general questions AND draft emails, you must either: (a) use the base model (worse at contracts), (b) use the fine-tuned model (worse at everything else), or (c) maintain two models and route queries between them (complex and expensive). Most fine-tuning projects hit this trade-off: the more you specialize, the more you lose. The sweet spot between specialization and generalization depends on your data, training duration, and learning rate, all discovered through expensive experimentation.

Why does this persist?
Catastrophic forgetting is a fundamental property of neural network gradient updates: new gradients overwrite old knowledge. Techniques to mitigate it exist (elastic weight consolidation, replay buffers, low learning rates, short training) but none eliminate it. LoRA partially addresses this by keeping most weights frozen, but even LoRA can cause forgetting if rank is too high or training is too long.
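A minimal sketch of two of these mitigations, assuming a Hugging Face datasets/peft stack on an open-weights model (the file names, model id, 10% replay ratio, and rank of 8 are illustrative placeholders, not values from this post): mix a small "replay" slice of general-purpose instructions into the contract data, and keep the LoRA rank low so most base weights stay frozen.

```python
# Sketch only: replay-buffer data mixing + low-rank LoRA to limit forgetting.
# Dataset files, model id, mix ratio, and rank are illustrative assumptions.
from datasets import load_dataset, concatenate_datasets
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Replay buffer: dilute the 10K contract examples with general instructions
contracts = load_dataset("json", data_files="contract_reviews.jsonl")["train"]
general = load_dataset("json", data_files="general_instructions.jsonl")["train"]
replay = general.shuffle(seed=0).select(range(len(contracts) // 10))  # ~10% replay
train_data = concatenate_datasets([contracts, replay]).shuffle(seed=0)

# Low-rank LoRA: base weights stay frozen, so the adapter has limited
# capacity to overwrite general behaviour
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora = LoraConfig(
    r=8,                 # low rank; higher ranks give more room to forget
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
# train_data then goes into your usual Trainer / SFT loop with a low
# learning rate and few epochs
```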
Evidence
Catastrophic forgetting is documented in Kirkpatrick et al. (2017), 'Overcoming catastrophic forgetting in neural networks.' OpenAI fine-tuning docs warn about forgetting but provide no mitigation guidance. LoRA reduces but does not eliminate forgetting (see QLoRA ablation studies). No fine-tuning framework provides automated forgetting detection; users typically discover it during evaluation.
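Since no framework ships such a check, a minimal forgetting probe is easy to script yourself. The sketch below, assuming Hugging Face transformers and locally loadable checkpoints (the model ids, probe file, and 20% threshold are placeholders), compares perplexity of the base and fine-tuned models on held-out general-domain text; a large jump after fine-tuning is a cheap signal that general capability was overwritten.

```python
# Sketch only: compare base vs. fine-tuned perplexity on general-domain text.
# Model ids, probe file, and the 1.2x threshold are illustrative assumptions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name: str, texts: list[str]) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt", truncation=True, max_length=1024)
            out = model(**ids, labels=ids["input_ids"])  # causal-LM loss
            n = ids["input_ids"].numel()
            total_loss += out.loss.item() * n
            total_tokens += n
    return math.exp(total_loss / total_tokens)

# Held-out general-domain text the model should still handle after fine-tuning
general_probe = [line.strip() for line in open("general_probe.txt") if line.strip()]

base_ppl = perplexity("base-model", general_probe)                    # placeholder id
tuned_ppl = perplexity("my-org/contracts-finetune", general_probe)    # placeholder id
print(f"base: {base_ppl:.1f}  fine-tuned: {tuned_ppl:.1f}")
if tuned_ppl > 1.2 * base_ppl:  # arbitrary threshold; tune for your use case
    print("Warning: possible catastrophic forgetting on general-domain text")
```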