Generative AI Companies Are Turning to Books to Develop Their Software


As demand for generative AI grows, publishers have begun negotiating with the platforms that provide this technology, seeking both to protect authors' rights and to sign contracts that generate income from their content.


The major American publishing house HarperCollins recently offered some of its authors a contract with an artificial intelligence company, whose identity remains confidential, that would allow the company to use their published works to train its generative AI models.


In a letter seen by Agence France-Presse, the artificial intelligence company offered $2,500 for each book it selects to train its large language model (LLM) over a three-year period.


In order for artificial intelligence programs to be able to produce different types of content based on a simple request in everyday language, they must be fed an increasing amount of data.


When contacted, the publishing house confirmed the arrangement. It said that "HarperCollins has concluded a contract with a technology company specializing in artificial intelligence to allow limited use of certain books (...) in order to train artificial intelligence models and improve their performance."


The publishing house also explains that the contract "clearly regulates what the models produce while respecting copyright".


The offer has drawn a mixed response in the publishing sector. Some writers rejected it, among them the American Daniel Kibblesmith, who said in a post on the social network Bluesky: "I would probably accept it for a billion dollars, an amount that would allow me to stop working, because that is the ultimate goal of this technology".


Although HarperCollins is one of the largest publishing houses to have concluded a contract of this type, it is not the first. The American academic publisher Wiley has made available to a major technology company “the content of published academic and professional books for specific use in training models, for $23 million,” it said in March when presenting its financial results.


This type of agreement highlights the problems surrounding the development of generative AI: models are trained on vast amounts of data collected from the internet, which can lead to copyright infringement.


Giada Pistilli, head of ethics at Hugging Face, a French-American platform specializing in artificial intelligence, sees this announcement as a step forward, because the content of books is generating revenue. But she regrets that authors have little room to negotiate.


“What we will see is a mechanism for bilateral agreements between technology companies and publishing houses or copyright holders, when negotiations should be broader and include stakeholders,” she says.


“We are starting from a long way back,” says Julien Chouraki, legal director of the French Publishing Federation (SNE). “This is progress: once there is an agreement, it means there has been a dialogue and there is a desire to achieve a balance in the use of data as a source, which is subject to rights and which will generate money.”


Newspaper publishers have also begun to organize on this issue. At the end of 2023, the American daily The New York Times filed a lawsuit against OpenAI, the creator of ChatGPT, and against Microsoft, its main investor, for copyright infringement. Other media outlets have concluded agreements with OpenAI.


Technology companies may soon have no choice but to pay for content, especially as fresh material for training the models begins to run out.


The American press has recently reported that the new models under development seem to have reached their limits, particularly those of Google, Anthropic and OpenAI.


Julien Chouraki says, "On the internet, both legal and illegal content can be collected, including large quantities of pirated content, which poses a legal problem. And that is without mentioning the issue of data quality."
