Using AI to Extract Structured Insights from Seerah Texts

Published: 20 September 2024
Authors: Mohammad Galib Shams, Nabil Mosharraf Hossain
At Greentech Apps Foundation (GTAF), one of our research initiatives involves applying cutting-edge AI models to classical Islamic texts, including the Seerah (biography of the Prophet Muhammad ﷺ). Our recent project focused on automatically extracting structured insights—such as people, places, events, and dates—from raw Seerah paragraphs.
The Challenge
The Seerah is rich with historical events, characters, and locations. However, much of this information is buried in long, narrative paragraphs, making it difficult to analyze or search programmatically. Our goal was to convert unstructured text into structured data to enable better study, visualisation, and linking across related events.
Our Approach
We used Meta's LLaMA 3 70B model, served through Groq and accessed via the LlamaIndex framework, to generate high-confidence keywords from Seerah passages. Here’s a high-level overview of how it works:
- Prompt Engineering: We crafted detailed prompts that guide the model to extract keywords in five categories: Person, Location, Event, Group, and Date. For each keyword, the model also links back to the exact part of the paragraph it was derived from and assigns a confidence score.
- JSON Output: The model is instructed to return a structured JSON output, which allows us to save and query the extracted data easily.
- Keyword Mapping: We plan to use this structured data to map events chronologically and thematically—helping build a knowledge graph for the Seerah.
- Automation: The project includes code that runs inference on paragraphs, stores the results in a database, and runs a periodic check that sends weekly summaries to stakeholders.
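The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the prompt wording, the JSON field names (`category`, `keyword`, `source_span`, `confidence`), the function names, and the SQLite table layout are all hypothetical. The model call itself is left out; in practice the prompt would be sent to LLaMA 3 70B on Groq (for example through LlamaIndex's Groq LLM integration) and the returned text passed to `parse_keywords`.

```python
import json
import sqlite3
from dataclasses import dataclass

# Categories named in the post; the exact labels used in the prompts are assumed.
CATEGORIES = {"Person", "Location", "Event", "Group", "Date"}

# Hypothetical prompt template; the project's real prompts are not published.
PROMPT_TEMPLATE = """Extract keywords from the Seerah paragraph below.
Use only these categories: {categories}.
Return a JSON list of objects with the fields:
  "category", "keyword", "source_span" (exact text copied from the paragraph),
  "confidence" (a number between 0 and 1).
Return JSON only, with no extra text.

Paragraph:
{paragraph}"""


@dataclass
class Keyword:
    category: str
    keyword: str
    source_span: str
    confidence: float


def build_prompt(paragraph: str) -> str:
    """Fill the hypothetical template with one paragraph."""
    return PROMPT_TEMPLATE.format(
        categories=", ".join(sorted(CATEGORIES)), paragraph=paragraph
    )


def parse_keywords(raw: str) -> list[Keyword]:
    """Parse the model's JSON reply, skipping malformed or out-of-range items.

    json.JSONDecodeError is left to the caller if the reply is not JSON at all.
    """
    keywords = []
    for item in json.loads(raw):
        try:
            kw = Keyword(
                category=str(item["category"]),
                keyword=str(item["keyword"]),
                source_span=str(item["source_span"]),
                confidence=float(item["confidence"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # missing field, wrong shape, or non-numeric score
        if kw.category in CATEGORIES and 0.0 <= kw.confidence <= 1.0:
            keywords.append(kw)
    return keywords


def store_keywords(conn: sqlite3.Connection, paragraph_id: int,
                   keywords: list[Keyword]) -> None:
    """Save parsed keywords to a hypothetical SQLite table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS keywords (
               paragraph_id INTEGER, category TEXT, keyword TEXT,
               source_span TEXT, confidence REAL)"""
    )
    conn.executemany(
        "INSERT INTO keywords VALUES (?, ?, ?, ?, ?)",
        [(paragraph_id, k.category, k.keyword, k.source_span, k.confidence)
         for k in keywords],
    )
    conn.commit()
```

Validating the reply before storing it keeps one malformed item (an unknown category, a missing field, or a score outside 0 to 1) from breaking the whole paragraph; only well-formed items reach the database, where they can be queried with ordinary SQL.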
Example
From a passage describing the Quraysh’s reaction to the Prophet’s migration, the model successfully extracted:
- Persons: ‘Ali, Abu Bakr, Asma’
- Locations: Al-Ka’bah, Makkah
- Events: Beating of ‘Ali, interrogation of ‘Ali, assault on Asma’
Each item is tagged with a confidence score (e.g., 0.98) and anchored to the span of the paragraph it came from, so it can be checked against the source text.
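Anchoring makes each extraction checkable. A minimal sketch of such a check, assuming each item is a dict with a hypothetical `source_span` field holding text copied from the paragraph and a numeric `confidence`; the function names and the 0.9 default threshold are illustrative choices, not values from the project:

```python
from typing import Optional


def locate_span(paragraph: str, span: str) -> Optional[tuple[int, int]]:
    """Return (start, end) character offsets of span in paragraph, or None."""
    if not span:
        return None  # an empty span would trivially "match" at offset 0
    start = paragraph.find(span)
    if start == -1:
        return None
    return start, start + len(span)


def verify_items(paragraph: str, items: list[dict],
                 min_confidence: float = 0.9) -> list[dict]:
    """Keep items whose span occurs verbatim and whose confidence meets the threshold.

    Kept items are copied with added "start" and "end" offsets.
    """
    verified = []
    for item in items:
        offsets = locate_span(paragraph, item.get("source_span", ""))
        if offsets is None or item.get("confidence", 0.0) < min_confidence:
            continue
        verified.append({**item, "start": offsets[0], "end": offsets[1]})
    return verified
```

Exact substring matching is deliberately strict: if the model paraphrases instead of quoting, the item is rejected rather than silently accepted. A fuzzier match (for example with Python's `difflib`) would recover more items at the cost of a weaker guarantee.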
Potential Applications
- Enhanced Seerah Visualisation: Timeline builders, maps, and character networks.
- Cross-Linking with Hadith: Match extracted people or events with related hadith entries.
- Educational Use: Build interactive learning experiences with tagged Seerah content.
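As one illustration of how extracted Person keywords could feed a character network, two people can be linked whenever they appear in the same paragraph. The sketch below assumes Person keywords have already been grouped by a hypothetical paragraph ID; the function name and the co-occurrence weighting are illustrative assumptions, not part of the project as described.

```python
from collections import Counter
from itertools import combinations


def person_cooccurrence(paragraph_persons: dict[int, set[str]]) -> Counter:
    """Count how many paragraphs each unordered pair of persons shares."""
    edges: Counter = Counter()
    for persons in paragraph_persons.values():
        # sorted() gives each pair a canonical order, so (A, B) and (B, A) merge
        for a, b in combinations(sorted(persons), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting pair counts can serve as weighted edges in a graph library or visualisation tool; paragraph co-occurrence is a coarse signal, so linking through extracted Events would give a more precise network.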
What’s Next?
This project is still in its early stages. We plan to refine our prompts, evaluate accuracy more rigorously, and eventually publish a dataset and a research paper based on our findings. The long-term vision is to support a new way of engaging with Islamic history through intelligent, structured technologies.