MeXtract: Light-Weight Metadata Extraction from Scientific Papers

Abstract

Metadata plays a critical role in indexing, documenting, and analyzing scientific literature, yet extracting it accurately and efficiently remains a challenging task. Traditional approaches often rely on rule-based or task-specific models, which struggle to generalize across domains and schema variations. In this paper, we present MeXtract, a family of lightweight language models designed for metadata extraction from scientific papers. The models, ranging from 0.5B to 3B parameters, are built by fine-tuning Qwen 2.5 counterparts. In their size family, MeXtract achieves state-of-the-art performance on metadata extraction on the MOLE benchmark. To further support evaluation, we extend the MOLE benchmark to incorporate model-specific metadata, providing an out-of-domain challenging subset. Our experiments show that fine-tuning on a given schema not only yields high accuracy but also transfers effectively to unseen schemas, demonstrating the robustness and adaptability of our approach. We release all the code, datasets, and models openly for the research community.

🚀 Introduction

MeXtract is a set of lightweight models for metadata extraction from scientific papers. The models were created by fine-tuning the 0.5B, 1.5B, and 3B versions of Qwen2.5. The SFT stage was followed by a preference optimization stage. We evaluate the models on the MOLE+ benchmark and achieve state-of-the-art results with respect to similar-sized models in the literature. A rough sketch of the two-stage training pipeline is given below.
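The README does not spell out the exact training recipe, so the following is only a sketch under stated assumptions: it uses the TRL library's standard trainers, takes DPO as a stand-in for the unnamed preference-optimization stage, and uses placeholder dataset files and hyperparameters rather than the actual ones.

```python
# Two-stage sketch (SFT, then preference optimization) using TRL.
# Assumptions: DPO stands in for the unnamed preference stage; dataset files,
# column formats, and hyperparameters are placeholders, not the actual recipe.
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)

# Stage 1: supervised fine-tuning on (paper + schema -> metadata) examples.
sft_data = load_dataset("json", data_files="sft_pairs.jsonl", split="train")  # placeholder
sft_trainer = SFTTrainer(
    model=base,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="mextract-sft", num_train_epochs=1),
)
sft_trainer.train()

# Stage 2: preference optimization on (prompt, chosen, rejected) triples.
pref_data = load_dataset("json", data_files="pref_triples.jsonl", split="train")  # placeholder
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,          # continue from the SFT checkpoint
    args=DPOConfig(output_dir="mextract-dpo", beta=0.1),
    train_dataset=pref_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```

Exact trainer arguments vary between TRL versions; treat the snippet as an outline of the SFT-then-preference pipeline rather than a reproducible command.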

📄 Example

Here is an example of extracted metadata from a sample paper. Note that this example is simplified; the actual paper is more complex and contains more pages.
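The rendered example from the project page is not reproduced here, so the snippet below is purely illustrative of what a JSON-style extraction might look like; the field names are hypothetical and are not guaranteed to match the MOLE schema.

```python
# Illustrative only: hypothetical metadata fields, not the actual MOLE schema.
import json

extracted_metadata = {
    "Name": "Sample Dataset",
    "Link": "https://example.com/sample-dataset",
    "License": "CC BY 4.0",
    "Language": ["en"],
    "Tasks": ["question answering"],
    "Year": 2024,
}
print(json.dumps(extracted_metadata, indent=2))
```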

📚 Models

We fine-tune three models of different sizes (0.5B, 1.5B, and 3B parameters) for metadata extraction. You can find the models on Hugging Face 🤗.
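A minimal inference sketch with the transformers library, assuming the checkpoints follow the usual Qwen2.5 chat format; the repository id and prompt layout below are assumptions, not the documented interface.

```python
# Minimal inference sketch using Hugging Face transformers.
# NOTE: the model id and prompt format are assumptions; replace the id with
# the actual MeXtract repository published on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/MeXtract-3B"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

paper_text = "..."  # full text of the scientific paper
schema = '{"Name": "", "License": "", "Tasks": []}'  # hypothetical schema
prompt = (
    "Extract the following metadata schema from the paper.\n"
    f"Schema: {schema}\nPaper: {paper_text}"
)

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```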

📊 Results

Our models achieve state-of-the-art results on the MOLE+ benchmark compared to models of similar size.

| Model | ar | en | jp | fr | ru | multi | model | Average |
|---|---|---|---|---|---|---|---|---|
| Falcon3 3B Instruct | 20.46 | 16.30 | 20.29 | 17.81 | 17.23 | 16.13 | 15.96 | 17.74 |
| Llama3.2 3B Instruct | 28.77 | 25.17 | 33.14 | 27.73 | 22.21 | 22.58 | 33.37 | 27.57 |
| Gemma 3 4B It | 44.88 | 46.50 | 48.46 | 43.85 | 46.06 | 42.05 | 56.04 | 46.83 |
| Qwen2.5 3B Instruct | 49.99 | 56.72 | 61.13 | 57.08 | 64.10 | 52.07 | 59.05 | 57.16 |
| MOLE 3B | 23.03 | 50.88 | 50.83 | 50.05 | 57.72 | 43.34 | 17.17 | 41.86 |
| NuExtract 2.0 4B | 44.61 | 43.57 | 43.82 | 48.96 | 47.78 | 40.14 | 49.90 | 45.54 |
| NuExtract 2.0 8B | 51.93 | 58.93 | 62.11 | 58.41 | 63.21 | 38.21 | 53.70 | 55.21 |
| MeXtract 0.5B | 65.96 | 69.95 | 73.79 | 68.42 | 72.07 | 68.20 | 32.41 | 64.40 |
| MeXtract 1.5B | 67.06 | 73.71 | 75.08 | 71.57 | 76.28 | 71.87 | 52.05 | 69.66 |
| MeXtract 3B | 70.81 | 78.02 | 78.32 | 72.87 | 77.51 | 74.92 | 60.18 | 73.23 |

📝 Citation

If you find this work useful, please cite it as follows:

@misc{mextract,
        title={MeXtract: Light-Weight Metadata Extraction from Scientific Papers}, 
        author={Zaid Alyafeai and Maged S. Al-Shaibani and Bernard Ghanem},
        year={2025},
        eprint={2510.06889},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2510.06889}, 
}

📑 References

1. Alyafeai, Zaid, et al. "Masader: Metadata sourcing for arabic text and speech data resources." arXiv preprint arXiv:2110.06744 (2021).