Data as a resource

Large datasets are increasingly widespread and valuable to researchers in the energy sector. Nature Energy has a dedicated article format — the Resource article — for their dissemination.

The generation of large datasets in scientific research has rapidly increased in recent years, driven by the opportunities offered by big data to revolutionize knowledge production, as well as advancements in high-throughput techniques. These datasets enable the investigation of scientific questions and help fill research gaps. They are also crucial for fully utilizing artificial intelligence-based methods that could accelerate scientific discoveries. As such, these large datasets constitute critical assets for future research, and it is important that they are disseminated similarly to conventional research outputs.

Over the last few years, Nature Energy has started publishing valuable datasets as Resource articles, and we have observed a notable increase in submissions that qualify for this article type. We would therefore like to revisit the format by discussing two Resource papers that we have published in our pages.

A Resource should present a large and newly generated dataset, a new data platform, or a library of broad utility, interest, and significance to the community. The format is structured like a research article, and it should explain the utility of the resource. While the key scientific value and novelty lie in the resource or the methods, the manuscript should ideally include demonstrations of novel insights that can be derived from these resources.

An important aspect of Resource articles is their reusability. As such, the data, algorithms and code underlying the methodological frameworks should be made available to the research community. Our preferred approach for sharing data, algorithms and code is via public repositories, as outlined in our policy (https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards). The articles should also describe in detail the methods used and how the data were acquired.

As noted above, it is increasingly important that resources are machine-readable, so that they can be used with artificial intelligence-based methods. For instance, it is essential to include full metadata in the datasets. Guidelines on how to make data discoverable and usable by machines, known as the FAIR (findable, accessible, interoperable and reusable) principles, have been outlined by researchers (M. Wilkinson et al. Sci. Data 3, 160018; 2016).
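To make the point about metadata concrete, the sketch below shows one way a dataset could ship with a machine-readable metadata record and a simple completeness check. It is an illustration only: the field names, the example values and the placeholder identifier are assumptions chosen for this sketch, not a schema required by the journal or prescribed by the FAIR principles.

```python
import json

# Hypothetical minimal set of metadata fields, chosen for illustration only;
# this is not a published standard or a journal requirement.
REQUIRED_FIELDS = ("identifier", "title", "license", "variables")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialize the record to JSON, refusing records with missing fields."""
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"metadata incomplete, missing: {missing}")
    return json.dumps(record, indent=2, sort_keys=True)


# Example record with placeholder values (the identifier is not a real DOI).
example_record = {
    "identifier": "doi:10.0000/example",
    "title": "Example dataset",
    "license": "CC-BY-4.0",
    "variables": [{"name": "efficiency", "unit": "percent"}],
}

if __name__ == "__main__":
    print(to_json(example_record))
```

Storing such a record alongside the data files in a public repository gives both people and software a consistent place to find what each variable means, in which units it is expressed, and under which terms the data can be reused.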

A study that qualifies as a Resource article can take different forms. Below, we use two examples to illustrate the variety in how data are generated, disseminated, and eventually used to bridge knowledge gaps in different energy fields.

One type of Resource presents a collection of data from the literature in a database equipped with analytical and visualization functions, as exemplified in the Resource by Jacobsson et al. (Nat. Energy 7, 107–115; 2022). The research team collected data related to perovskite solar cells from over 15,000 peer-reviewed publications and made them available in an open-access database with graphical tools for analysing, filtering, and visualizing the data. Such a large collection of data helps to establish, in a statistically significant way, how modifications to device and material design and processing affect the performance of solar cells. It could also help identify trends that are not obvious from the analysis of just a few research studies.

Another type of Resource describes approaches to generate new data with appropriate resolution from existing datasets to widen their usage, as shown by Buster et al. in a Resource included in this issue. The researchers use generative machine learning to produce a high-resolution meteorological dataset from the coarse outputs of climate models. The resulting dataset could then be fed into energy system models that require high-resolution input data, aiding in understanding how climate-induced changes affect system cost and reliability during extreme weather events.

These two examples do not cover another type of Resource, in which a large amount of data is generated from scratch for analysing energy systems. We also appreciate that some studies could fall between formats, leaving authors unsure about the best submission format. Regardless of the article type chosen at the initial submission stage, we, the editors, will work closely with authors to determine the format that most effectively highlights the key merit of their findings. We anticipate seeing more studies in this space that delve deeply into both existing and uncharted scientific territory, thereby advancing our understanding and application of energy systems.