Researchers Use Harry Potter to Train AI to Forget Copyrighted Material

Can large language models like OpenAI’s ChatGPT and Meta’s Llama 2 be altered or edited after training without extensive retraining? Microsoft researchers Ronen Eldan and Mark Russinovich propose an answer in a new paper published on arXiv.org. They demonstrate a technique for erasing specific knowledge from a language model, in this case the Harry Potter books, without retraining it from scratch. With roughly one GPU hour of fine-tuning, they effectively remove the model’s ability to generate or recall Harry Potter-related content. If the approach holds up, it could pave the way for adaptable language models that can be refined as organizational needs change, supporting long-term, enterprise-safe deployments.

According to the Microsoft researchers, conventional machine learning models lack mechanisms to “forget” or “unlearn” knowledge. To work around this limitation, they developed a three-part technique that approximates unlearning specific information in a language model. First, they trained a model further on the target data (the Harry Potter books) and used it to identify the tokens most closely tied to that data. Second, they replaced unique Harry Potter expressions with generic counterparts and used the model’s own predictions on the altered text to generate alternative labels. Finally, they fine-tuned the original model on these alternative predictions, effectively erasing the original text from its memory. The researchers then tested the model’s ability to generate or discuss Harry Potter content and found that it had essentially “forgotten” the intricate narratives of the series, while its performance on standard benchmarks remained almost unaffected. They describe the technique as a foundational step towards more responsible and adaptable language models.
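The three steps above can be pictured with a small toy sketch. It is not the researchers’ code: the vocabulary, the hand-written scores, the replacement dictionary, the function names and the `alpha` weight are all hypothetical, and the rule of subtracting any score increase seen after extra training is an assumption chosen to mirror the idea described. A real pipeline would take these scores from actual language models and then fine-tune on the resulting labels.

```python
# Toy illustration only: hand-written scores stand in for real model outputs.

def replace_anchors(text, anchor_map):
    """Step 2a: swap unique expressions for generic counterparts."""
    # Replace longer phrases first so multi-word names are handled whole.
    for phrase in sorted(anchor_map, key=len, reverse=True):
        text = text.replace(phrase, anchor_map[phrase])
    return text


def generic_logits(baseline, reinforced, alpha=1.0):
    """Steps 1 and 2b: penalize tokens whose score rose after extra
    training on the target text (assumed rule, not a confirmed one)."""
    return [b - alpha * max(0.0, r - b) for b, r in zip(baseline, reinforced)]


def generic_label(vocab, baseline, reinforced, alpha=1.0):
    """Pick the token a model with no knowledge of the target might predict."""
    scores = generic_logits(baseline, reinforced, alpha)
    return vocab[scores.index(max(scores))]


# Hypothetical next-token scores for a prompt about where the hero studies.
VOCAB = ["Hogwarts", "school", "the", "magic"]
BASELINE = [2.0, 1.5, 0.5, 1.0]    # original model
REINFORCED = [5.0, 1.4, 0.5, 1.2]  # same model after extra training on the books

# Hypothetical dictionary of unique expressions and generic stand-ins.
ANCHORS = {"Harry Potter": "Jon", "Hogwarts": "Mystic Academy"}

if __name__ == "__main__":
    print(replace_anchors("Harry Potter went to Hogwarts", ANCHORS))
    print(generic_label(VOCAB, BASELINE, REINFORCED))
    # Step 3 (not shown): fine-tune the original model on labels like these.
```

In this toy setup the book-specific token is the one whose score jumps after extra training, so it receives the largest penalty and a generic token becomes the fine-tuning target instead.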

Expelliarmus-ing expectations

While further testing is needed, this proof of concept shows promise for building more responsible and legally compliant language models. The technique may work better for fictional texts than for non-fiction, because fictional worlds contain many unique references. The authors emphasize that further research is needed to refine the methodology and extend it to broader unlearning tasks. Ultimately, techniques for selective forgetting could help AI systems stay aligned with changing priorities and requirements.
