Expert Background
- AI researcher; started in deep learning for speech recognition at Baidu
- Joined Meta 6 years ago and worked on AI <> medical engineering and on speech. Started focusing on AI <> chemistry about 4 years ago, centered on climate change work, specifically discovering new materials for carbon capture
- Gen AI is already very large in text, video, etc. In the next few years it should become really big in the sciences. Models are emerging for proteins, but sees even more opportunity in materials over the next few years
Excitement about chemistry use cases
- These models work well when you have good datasets, the data has inherent structure, and you can evaluate outputs well. We are seeing the data ecosystem solidify, and we are also generating our own dataset for chemistry
- Chemistry is part of FAIR's scope because we're interested in advancing AI. When we started, we thought science was an underexplored area and that work there might help broader AI research in general. The other part of the motivation is that we're really focused on climate applications.
- Carbon capture is a significant area of focus. DAC (direct air capture) is a big part of it, as is finding catalysts for storage use cases. Catalysis was the first area we started in
- We’ve been told that DAC will be a big part of the carbon credits that we buy
- Challenges
- Data availability: datasets are too small, particularly for experimental data, and are often inconsistent and noisy
- There's a MOF dataset called CoRE MOF, but 40% of the data is problematic. There's a big effort underway to clean that type of data. You also need clean data across modalities
- Evaluation is also challenging. You can use DFT for some of it, but you really need a step that closes the gap between what works in theory and what works in practice, e.g., an automated lab
- A lot of this work is done in separate academic labs whose researchers don't really talk to each other. If you can bring people together, you can accomplish a lot
MOFs
- We're looking at MOFs, but we're much more focused on the AI side. We think existing datasets are not big or accurate enough, so we're running our own DFT calculations
- All of this was done in connection with an academic lab at Georgia Tech; the initial part of the project was a partnership with them. We've also formed partnerships with a few startups
Partnerships
- We're not going to build our own DAC systems, so the Nomad partnership makes sense there
- Cusp brings a complementary computational piece. Berend joined Cusp and has the Prisma platform, a lifecycle analysis platform. One reason we're doing joint research is to combine our accurate calculations with their lifecycle analysis
- We are open sourcing and publishing everything we’ve done so far
DAC