Open-source developers' GPL and MIT-licensed code was used to train GitHub Copilot without required attribution, and courts dismissed their copyright claims

In November 2022, open-source developers filed a class-action lawsuit against GitHub, Microsoft, and OpenAI, alleging that Copilot was trained on code from public repositories released under open-source licenses such as the GPL, MIT, and Apache licenses — licenses that explicitly require attribution and preservation of license notices, and, in the GPL's case, impose copyleft obligations. Copilot sometimes reproduces licensed code verbatim in its output, without any attribution or license notice. The developers argued this violates both copyright law and the specific terms of the licenses their code was released under.

The federal court dismissed the copyright infringement claims in 2023 for lack of specific examples of copied code, and later dismissed the DMCA claims as well. Only narrow breach-of-contract theories survived. The outcome effectively tells open-source developers: you can release code under a license that requires attribution, a multi-billion-dollar company can ingest that code to train a commercial product that generates code without attribution, and you have no practical legal remedy. For the millions of developers who contribute to open source under the social contract that their license terms will be respected, this is a betrayal of the foundational trust that open-source ecosystems depend on.

The structural reason this persists is that open-source licenses were designed for a world of human-to-human code sharing, not machine learning. The GPL requires derivative works to carry the same license — but courts have not determined whether AI training creates a 'derivative work.' The MIT license requires attribution in copies — but courts have not determined whether AI-generated code that statistically resembles training data constitutes a 'copy.' These licenses assumed that the entity using the code would redistribute it in recognizable form.

AI models break that assumption by ingesting code, learning patterns, and generating output that may be functionally identical without being a literal copy. The legal frameworks that sustained open-source collaboration for 30 years have no answer for this new mode of consumption.

Evidence

GitHub Copilot class-action lawsuit details (https://www.saverilawfirm.com/our-cases/github-copilot-intellectual-property-litigation)

Copyright claims dismissed for lack of specific copied examples (https://www.pearlcohen.com/copyright-claims-against-github-microsoft-and-openai-largely-dismissed/)

Legal analysis of open-source license gaps (https://www.techtarget.com/searchenterpriseai/tip/Examining-the-future-of-AI-and-open-source-software)

NYU analysis of licensing issues (https://nyujilp.org/wp-content/uploads/2025/03/Annotation_Germano_Macroed.pdf)
