Abstract FZJ-2026-02437

Large-scale extraction and annotation of quantitative information on energy technologies from scientific literature


2026

HMC Konferenz 2026, Heidelberg, Germany, 28 Apr 2026 - 30 Apr 2026

Abstract: Systematic literature reviews are fundamental to energy system analysis, yet are often time-consuming, incomplete, and inconsistent. While manually curated datasets provide valuable structured information for specific subdomains of energy research [1, 2], extending such efforts to the entire field remains challenging. At the same time, easily accessible and extensible quantitative evidence would substantially benefit the research community.

In this contribution, we present a large-scale, automatically compiled dataset of quantitative information extracted from 15 years of energy systems literature using Quinex, an LLM-based information extraction tool [3]. Quinex identifies quantitative statements and transforms them into structured data containing numerical values, units, quantified properties, entities, and contextual metadata such as spatial and temporal scope. The literature corpus was compiled using advanced searches in Scopus and Web of Science, covering a broad range of keywords. It comprises approximately 76,000 abstracts, of which around 31,000 include full texts.

Applying Quinex to this corpus yielded roughly three million quantitative datapoints. As the tool is domain-agnostic, the extracted information includes values unrelated to energy systems. To enable meaningful analysis, we implemented a filtering and normalization workflow based on regular expressions, resulting in a dataset tailored to energy system research.

A preliminary analysis demonstrates the dataset's potential applications. Photovoltaic and wind technologies constitute the largest share, with cost and efficiency being the most frequently reported properties. The distribution of technologies exhibits strong regional patterns, reflecting differences in research focus across countries. Normalized data and metadata further enable temporal analyses, revealing trends in key techno-economic parameters such as efficiency, lifetime, and capacity factor.

The processed data are made available through an interactive dashboard that allows users to filter, visualize, and download customized subsets. Future work will map extracted metadata to the Open Energy Ontology [4] and integrate the dataset into a collaborative infrastructure to support community-driven data sharing.
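
The abstract mentions a regular-expression-based filtering and normalization workflow but gives no details. The following is a minimal illustrative sketch of what such a step could look like, not the authors' implementation. The `Datapoint` fields, the keyword patterns in `TECH_PATTERNS`, and the unit table `UNIT_FACTORS` are all assumptions made for this example and do not reflect the actual Quinex output schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datapoint:
    """Hypothetical record shape; the real Quinex schema may differ."""
    entity: str   # what is being quantified, e.g. "PV module"
    prop: str     # quantified property, e.g. "efficiency"
    value: float  # numerical value as extracted
    unit: str     # unit string as extracted


# Assumed keyword patterns for two technology classes (illustrative only).
TECH_PATTERNS = {
    "photovoltaics": re.compile(
        r"\b(pv|photovoltaic\w*|solar (cell|module|panel)s?)\b", re.IGNORECASE
    ),
    "wind": re.compile(
        r"\b(wind (turbine|farm|power)s?|(on|off)shore wind)\b", re.IGNORECASE
    ),
}

# Assumed unit table: raw unit (lowercase) -> (canonical unit, factor).
UNIT_FACTORS = {
    "%": ("fraction", 0.01),
    "percent": ("fraction", 0.01),
    "kw": ("W", 1e3),
    "mw": ("W", 1e6),
    "gw": ("W", 1e9),
}


def classify(dp: Datapoint) -> Optional[str]:
    """Return the first technology whose pattern matches the entity, else None."""
    for tech, pattern in TECH_PATTERNS.items():
        if pattern.search(dp.entity):
            return tech
    return None


def normalize(dp: Datapoint) -> Optional[Datapoint]:
    """Convert value and unit to the canonical unit; None if the unit is unknown."""
    key = dp.unit.strip().lower()
    if key not in UNIT_FACTORS:
        return None
    unit, factor = UNIT_FACTORS[key]
    return Datapoint(dp.entity, dp.prop, dp.value * factor, unit)


def filter_and_normalize(datapoints):
    """Keep datapoints that match a technology and carry a known unit."""
    result = []
    for dp in datapoints:
        tech = classify(dp)
        norm = normalize(dp)
        if tech is not None and norm is not None:
            result.append((tech, norm))
    return result
```

In this sketch, a datapoint such as `Datapoint("offshore wind turbine", "rated power", 2.5, "MW")` would be kept and converted to watts, while one whose entity matches no pattern (for example a medical cohort size) would be dropped. A production workflow would need far broader patterns and a proper unit library.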


Contributing Institute(s):
  1. Jülicher Systemanalyse (ICE-2)
Research Program(s):
  1. 1111 - Effective System Transformation Pathways (POF4-111)
  2. 1112 - Societally Feasible Transformation Pathways (POF4-111)

Appears in the scientific report 2026

The record appears in these collections:
Document types > Presentations > Abstracts
Institute Collections > ICE > ICE-2
Workflow collections > Public records
Publications database

 Record created 2026-05-07, last modified 2026-05-07


