July 11th, 2025

Extracting Knowledge from Hadith Using Named Entity Recognition and LLMs

Discover how we use LLMs (Large Language Models) to extract structured insights from hadith texts using Named Entity Recognition (NER). Our approach leverages advanced prompt engineering, sentence parsing, and JSON formatting to identify key information such as narrators, Prophet Muhammad’s sayings and actions, relevant people and tribes, emotional tone, and more. Built using Python, Llama-3 70B, and cosine similarity techniques, this system enables semantic tagging of hadiths and facilitates meaningful search and categorization across large Islamic collections. This pipeline is part of Greentech Apps Foundation’s ongoing R&D in Islamic AI and digital knowledge organization.

Published: 23 October 2024
Authors: Mohammad Galib Shams, Nabil Mosharraf Hossain, Riasat Islam

Introduction

At Greentech Apps Foundation (GTAF), we are always looking for new ways to enhance user engagement with Islamic content. In our latest research project, we focused on automating the extraction of structured information from hadith using Named Entity Recognition (NER) powered by Large Language Models (LLMs). The aim is to convert rich hadith texts into structured insights that can be searched, categorized, and explored in context.

Problem Statement

We wanted to extract key semantic elements from each hadith, such as:

Emotions conveyed by the hadith
Sayings of the Prophet Muhammad (peace be upon him)
Narrators in the chain
Tags and key concepts
Contextual topics
Entities like people, time, tribes, and locations
Prophet’s actions
Possible questions a user might ask that would lead to this hadith

This structured metadata would allow us to build smarter search, recommendation, and educational tools for hadith-based learning.

Our Approach

We developed a Python pipeline leveraging a powerful LLM (Meta’s Llama-3 70B Instruct) to extract this information from raw hadith text. Here’s how it worked:

Prompt Engineering

We crafted a custom prompt template that:

Categorized emotion (Instruction, Motivation, Comforting, Warning, Neutral)
Extracted Prophet Muhammad’s sayings
Identified all narrators in the chain
Isolated time, location, people (excluding the Prophet), and tribe entities
Listed the Prophet’s actions in the hadith
Suggested appropriate tags (based only on the hadith text)
Generated topics, concepts, and questions to understand and apply the hadith
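
To make the template concrete, here is a minimal sketch of how such a prompt could be assembled. The emotion labels come from the list above; the field names, the wording of the instructions, and the build_prompt helper are our assumptions for illustration and are not the exact template used in the project.

```python
import json

# Emotion labels as listed in the post.
EMOTIONS = ["Instruction", "Motivation", "Comforting", "Warning", "Neutral"]

# Hypothetical output schema: key name -> description of the expected value.
# The key names are illustrative, not the project's actual ones.
OUTPUT_FIELDS = {
    "emotion": "one of: " + ", ".join(EMOTIONS),
    "prophet_sayings": "list of direct sayings of the Prophet in the text",
    "narrators": "list of all narrators in the chain",
    "time": "list of time expressions",
    "location": "list of places",
    "people": "list of people mentioned, excluding the Prophet",
    "tribes": "list of tribes",
    "prophet_actions": "list of actions of the Prophet",
    "tags": "list of tags based only on the hadith text",
    "topics": "list of topics",
    "concepts": "list of key concepts",
    "questions": "list of questions a user might ask that lead to this hadith",
}


def build_prompt(hadith_text: str) -> str:
    """Return a prompt asking the model for one JSON object with fixed keys."""
    schema = json.dumps(OUTPUT_FIELDS, indent=2)
    return (
        "You are given the text of a hadith. Extract the fields described "
        "below using only the text provided.\n"
        "Respond with a single JSON object with exactly these keys:\n"
        f"{schema}\n\n"
        f"Hadith:\n{hadith_text.strip()}\n"
    )
```

Spelling out the keys and the allowed emotion labels in the prompt itself is one way to keep the model's output predictable enough to parse downstream.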

Processing Pipeline

Load & Clean Hadith Text: We preprocess hadith to remove inconsistent characters and format issues.
Send Request to LLM: We construct a JSON prompt using our template and send it to our Azure-hosted Llama-3 70B endpoint.
Parse Response: The LLM returns a structured JSON object containing all extracted elements.
Save & Log: Results are saved to a CSV file, and log entries are prepended (newest first) by a custom reverse log file handler for better traceability.
Error Handling: If the LLM fails to return valid JSON or the request hits a rate limit, the exception is logged and the request is retried up to 3 times.
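
The parse-and-retry steps can be sketched roughly as follows. This is not the project's code: call_llm is a hypothetical callable standing in for the request to the hosted endpoint, "up to 3 times" is read here as three attempts in total, and the linear back-off between attempts is our own assumption.

```python
import json
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("hadith_ner")


def extract_with_retries(
    prompt: str,
    call_llm: Callable[[str], str],  # hypothetical: sends the prompt, returns raw text
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[dict]:
    """Call the model and parse its reply as JSON, retrying on any failure.

    Returns the parsed object, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            raw = call_llm(prompt)
            result = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
            if not isinstance(result, dict):
                raise ValueError("expected a JSON object")
            return result
        except Exception as exc:
            # Rate-limit errors are endpoint-specific, so this sketch catches
            # broadly; a real pipeline would catch its client's own exceptions.
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                time.sleep(base_delay * attempt)  # assumed linear back-off
    return None
```

Passing the model call in as a function keeps the retry logic independent of the endpoint, which also makes it easy to test with a stub.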

Evaluation

We evaluated the system on a CSV of hadiths (hadith_collection_new_db.csv).

Output was saved in llama_70B_Al-hadith_collection_new_db.csv
Accuracy in trials was consistently above 90%

Key Innovations

Named Entity Extraction: This goes beyond simple keyword tagging by identifying specific relationships, people, tribes, and settings.
Prophet’s Sayings & Actions: Our model isolates the direct words and deeds of the Prophet (PBUH), a valuable tool for both educators and learners.
Emotion Classification: Adds a layer of affective tagging useful in UX and search design.
Dynamic Questions: Suggested questions improve how we present hadith to users, especially for AI chatbots or quiz tools.

Challenges

Parsing reference-only hadiths
Handling null/irrelevant translations
Maintaining output JSON structure from LLM responses
Entity disambiguation (e.g., differentiating a tribe vs. a location)
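
As an illustration of the JSON-structure challenge, the sketch below shows one possible way to recover an object from a reply that wraps the JSON in extra text and to normalise its shape. The key names, the choice to coerce single strings into lists, and the fallback to "Neutral" for unknown emotion labels are all assumptions made for this example, not a description of how the project handles these cases.

```python
import json

# Hypothetical key names; the project's real schema is not given in the post.
LIST_FIELDS = ("narrators", "people", "tribes", "location", "time", "tags")
ALLOWED_EMOTIONS = {"Instruction", "Motivation", "Comforting", "Warning", "Neutral"}


def parse_llm_json(raw: str) -> dict:
    """Extract the outermost {...} span from a reply and normalise its fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    data = json.loads(raw[start : end + 1])

    # Make every list field a list, even if the model returned null or a string.
    for key in LIST_FIELDS:
        value = data.get(key)
        if value is None:
            data[key] = []
        elif isinstance(value, str):
            data[key] = [value]

    # Fall back to a default when the label is missing or not in the allowed set.
    emotion = data.get("emotion")
    if not isinstance(emotion, str) or emotion not in ALLOWED_EMOTIONS:
        data["emotion"] = "Neutral"
    return data
```

Normalising in one place means later steps, such as writing CSV rows, can rely on every record having the same keys and value types.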

What’s Next?

Validate across larger hadith datasets
Train smaller, fine-tuned models for offline inference
Integrate this into our Hadith app’s semantic search and recommendations
Enable feedback loops from users to improve tagging accuracy

Conclusion

This project represents a step forward in structuring Islamic knowledge for modern interfaces. By combining LLMs, prompt engineering, and NER, we are unlocking new ways for Muslims around the world to learn from and engage with Hadith in a meaningful and personalised way.

Want to contribute to our Islamic AI efforts? Reach out at https://gtaf.org.
