New York: An international team of scientists has launched a new research collaboration that will leverage the same technology behind ChatGPT to build an AI-powered tool for scientific discovery.

While ChatGPT deals in words and sentences, the new initiative, called Polymathic AI, will learn from numerical data and physics simulations from across scientific fields to aid scientists in modeling everything from supergiant stars to the Earth's climate.

"This will completely change how people use AI and machine learning in science," said Polymathic AI principal investigator Shirley Ho, a group leader at the Flatiron Institute's Center for Computational Astrophysics in New York City, US.

The idea behind Polymathic AI "is similar to how it's easier to learn a new language when you already know five languages," said Ho.

Starting with a large, pre-trained model, known as a foundation model, can be both faster and more accurate than building a scientific model from scratch.

That can be true even if the training data isn't obviously relevant to the problem at hand.

"Polymathic AI can show us commonalities and connections between different fields that might have been missed," said co-investigator Siavash Golkar, a guest researcher at the Flatiron Institute's Center for Computational Astrophysics.

The Polymathic AI team includes experts in physics, astrophysics, mathematics, artificial intelligence and neuroscience.

The Polymathic AI project will learn from data drawn from diverse sources across physics and astrophysics (and eventually fields such as chemistry and genomics, its creators say) and apply that multidisciplinary savvy to a wide range of scientific problems.

ChatGPT has well-known limitations when it comes to accuracy.

The Polymathic AI project will avoid many of those pitfalls, Ho said, by treating numbers as actual numbers, not just characters on the same level as letters and punctuation. Training will also draw on real scientific datasets that capture the physics underlying the cosmos.
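
The difference between a number read as characters and a number read as a value can be seen in a few lines of Python. This is only an illustrative sketch, not the project's actual method: the helper names `char_tokens` and `numeric_encoding` are hypothetical, and the character split is a simplified stand-in for how text models break up their input.

```python
def char_tokens(text):
    """Split a string into single characters (simplified stand-in for text tokenization)."""
    return list(text)


def numeric_encoding(text):
    """Hypothetical alternative: read the string as one numerical value."""
    return float(text)


a, b = "9.99", "10.01"

# Read as characters, the two strings do not match at any position.
shared = sum(x == y for x, y in zip(char_tokens(a), char_tokens(b)))

# Read as values, the two numbers are almost identical.
gap = abs(numeric_encoding(a) - numeric_encoding(b))

print(f"matching character positions: {shared}")
print(f"numerical difference: {gap:.2f}")
```

Read as text, 9.99 and 10.01 look unrelated, even though the two measurements differ by only about 0.02.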

Transparency and openness are a big part of the project, Ho said. "We want to make everything public. We want to democratise AI for science in such a way that, in a few years, we'll be able to serve a pre-trained model to the community that can help improve scientific analyses across a wide variety of problems and domains."