September 20, 2024

New data science tool greatly speeds up molecular analysis of our environment

UC Riverside-led team developed the tool through an international virtual research group

Author: Iqbal Pittalwala

A research team led by scientists at the University of California, Riverside, has developed a computational workflow for analyzing large data sets in the field of metabolomics, the study of small molecules found within cells, biofluids, tissues, and entire ecosystems.

Most recently, the team applied this new computational tool to analyze pollutants in seawater in Southern California. The team swiftly captured the chemical profiles of coastal environments and highlighted potential sources of pollution.

Daniel Petras. (UCR/Petras lab)

“We are interested in understanding how such pollutants get introduced in the ecosystem,” said Daniel Petras, an assistant professor of biochemistry at UC Riverside, who led the research team. “Figuring out which molecules in the ocean are important for environmental health is not straightforward because of the ocean’s sheer chemical diversity. The protocol we developed greatly speeds up this process. More efficient sorting of the data means we can understand problems related to ocean pollution faster.”

Petras and his colleagues report in the journal Nature Protocols that their protocol is designed not only for experienced researchers but also for educational purposes, making it an ideal resource for students and early-career scientists. The computational workflow is accompanied by a web application with a graphical user interface that makes metabolomics data analysis accessible to non-experts and enables them to gain statistical insights into their data within minutes.

“This tool is accessible to a broad range of researchers, from absolute beginners to experts, and is tailored for use in conjunction with the molecular networking software my group is developing,” said coauthor Mingxun Wang, an assistant professor of computer science and engineering at UCR. “For beginners, the guidelines and code we provide make it easier to understand common data processing and analysis steps. For experts, it accelerates reproducible data analysis, enabling them to share their statistical data analysis workflows and results.”

Petras explained that the research paper is unique in that it serves as a large educational resource organized through a virtual research group called Virtual Multiomics Lab, or VMOL. With more than 50 scientists participating from around the world, VMOL is a community-driven, open-access initiative. It aims to simplify and democratize the chemical analysis process, making it accessible to researchers worldwide, regardless of their background or resources.

Photo shows Abzer Pakkir Shah (left) and Paolo Stincone, coauthors on the research paper. (UCR/Petras lab)

“I’m incredibly proud to see how this project evolved into something impactful, involving experts and students from across the globe,” said Abzer Pakkir Shah, a doctoral student in Petras’ group and the first author of the paper. “By removing physical and economic barriers, VMOL provides training in computational mass spectrometry and data science and aims to launch virtual research projects as a new form of collaborative science.”

All software the team developed is free and publicly available. The software development was initiated during a summer school for non-targeted metabolomics in 2022 at the University of Tübingen, where the team also launched VMOL.

Petras expects the protocol will be especially useful to environmental researchers as well as scientists working in the biomedical field and researchers doing clinical studies in microbiome science.

“The versatility of our protocol extends to a wide range of fields and sample types, including combinatorial chemistry, doping analysis, and trace contamination of food, pharmaceuticals, and other industrial products,” he said.

Petras received his master’s degree in biotechnology from the University of Applied Science Darmstadt and his doctoral degree in biochemistry from the Technical University Berlin. He did postdoctoral research at UC San Diego, where he focused on the development of large-scale environmental metabolomics methods. In 2021, he launched the Functional Metabolomics Lab at the University of Tübingen. In January 2024, he joined UCR, where his lab focuses on the development and application of mass spectrometry-based methods to visualize and assess chemical exchange within microbial communities.

The title of the paper is “Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data.” 
