| Large-scale extraction and annotation of quantitative information on energy technologies from scientific literature |
| Abstract | FZJ-2026-02437 |
2026
Abstract: Systematic literature reviews are fundamental to energy system analysis, yet are often time-consuming, incomplete, and inconsistent. While manually curated datasets provide valuable structured information for specific subdomains of energy research [1, 2], extending such efforts to the entire field remains challenging. At the same time, easily accessible and extensible quantitative evidence would substantially benefit the research community.

In this contribution, we present a large-scale, automatically compiled dataset of quantitative information extracted from 15 years of energy systems literature using Quinex, an LLM-based information extraction tool [3]. Quinex identifies quantitative statements and transforms them into structured data containing numerical values, units, quantified properties, entities, and contextual metadata such as spatial and temporal scope. The literature corpus was compiled using advanced searches in Scopus and Web of Science, covering a broad range of keywords. It comprises approximately 76,000 abstracts, of which around 31,000 include full texts.

Applying Quinex to this corpus yielded roughly three million quantitative datapoints. As the tool is domain-agnostic, the extracted information includes values unrelated to energy systems. To enable meaningful analysis, we implemented a filtering and normalization workflow based on regular expressions, resulting in a dataset tailored to energy system research.

A preliminary analysis demonstrates the dataset's potential applications. Photovoltaic and wind technologies constitute the largest share, with cost and efficiency being the most frequently reported properties. The distribution of technologies exhibits strong regional patterns, reflecting differences in research focus across countries. Normalized data and metadata further enable temporal analyses, revealing trends in key techno-economic parameters such as efficiency, lifetime, and capacity factor.

The processed data are made available through an interactive dashboard that allows users to filter, visualize, and download customized subsets. Future work will map extracted metadata to the Open Energy Ontology [4] and integrate the dataset into a collaborative infrastructure to support community-driven data sharing.
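The regular-expression filtering and normalization step mentioned in the abstract could look roughly like the sketch below. This is an illustration only, not the authors' workflow: the `Datapoint` fields, the pattern tables, and the `normalize` function are all hypothetical and do not reflect Quinex's actual output schema. The idea is to keep a datapoint only if its entity and property match energy-related patterns, then map its unit onto a canonical one.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datapoint:
    """Hypothetical record shape; not the actual Quinex output schema."""
    value: float
    unit: str
    prop: str    # quantified property, e.g. "conversion efficiency"
    entity: str  # quantified entity, e.g. "PV module"


# Illustrative (assumed) patterns for technologies and properties.
TECH_PATTERNS = {
    "photovoltaic": re.compile(r"\b(photovoltaics?|pv|solar (cells?|modules?|panels?))\b", re.I),
    "wind": re.compile(r"\bwind (turbines?|farms?|power|energy)\b", re.I),
}
PROP_PATTERNS = {
    "capacity factor": re.compile(r"\bcapacity factors?\b", re.I),
    "efficiency": re.compile(r"\befficienc(y|ies)\b", re.I),
    "lifetime": re.compile(r"\b(life ?time|service life)\b", re.I),
}

# (unit pattern, canonical unit, number of source units per canonical unit)
UNIT_RULES = [
    (re.compile(r"^(%|percent|per ?cent)$", re.I), "%", 1.0),
    (re.compile(r"^(yr|yrs|years?)$", re.I), "yr", 1.0),
    (re.compile(r"^months?$", re.I), "yr", 12.0),
]


def first_match(patterns: dict, text: str) -> Optional[str]:
    """Return the label of the first pattern found in text, or None."""
    for label, pattern in patterns.items():
        if pattern.search(text):
            return label
    return None


def normalize(dp: Datapoint) -> Optional[dict]:
    """Drop datapoints outside the assumed energy scope; normalize the rest."""
    tech = first_match(TECH_PATTERNS, dp.entity)
    prop = first_match(PROP_PATTERNS, dp.prop)
    if tech is None or prop is None:
        return None  # not recognizably energy-related
    for pattern, canonical_unit, per_canonical in UNIT_RULES:
        if pattern.match(dp.unit.strip()):
            return {
                "technology": tech,
                "property": prop,
                "value": dp.value / per_canonical,
                "unit": canonical_unit,
            }
    return None  # unit not covered by the rules above
```

For example, `normalize(Datapoint(24, "months", "lifetime", "wind turbine"))` yields a wind lifetime record expressed in years, while a datapoint such as a patient's body temperature is filtered out. A real workflow would need far broader pattern tables and proper unit handling.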