logoHome

Extracting Knowledge from Hadith Using Named Entity Recognition and LLMs

Discover how we use LLMs (Large Language Models) to extract structured insights from hadith texts using Named Entity Recognition (NER). Our approach leverages advanced prompt engineering, sentence parsing, and JSON formatting to identify key information such as narrators, Prophet Muhammad’s sayings and actions, relevant people and tribes, emotional tone, and more. Built using Python, Llama-70B, and cosine similarity techniques, this system enables semantic tagging of hadiths and facilitates meaningful search and categorization across large Islamic collections. This pipeline is part of Greentech Apps Foundation’s ongoing R&D in Islamic AI and digital knowledge organization.
Extracting Knowledge from Hadith Using Named Entity Recognition and LLMs

Published: 23 October 2024
Author: Mohammad Galib Shams, Nabil Mosharraf Hossain, Riasat Islam

Introduction

At Greentech Apps Foundation (GTAF), we are always looking for new ways to enhance user engagement with Islamic content. In our latest research project, we focused on automating the extraction of structured information from hadith using Named Entity Recognition (NER) powered by Large Language Models (LLMs). The aim is to convert rich hadith texts into structured insights that can be searchable, categorized, and explored contextually.

Problem Statement

We wanted to extract key semantic elements from hadith—such as:

  • Emotions conveyed by the hadith
  • Sayings of the Prophet Muhammad (peace be upon him)
  • Narrators in the chain
  • Tags and key concepts
  • Contextual topics
  • Entities like people, time, tribes, and locations
  • Prophet’s actions
  • Possible questions a user might ask that would lead to this hadith

This structured metadata would allow us to build smarter search, recommendation, and educational tools for hadith-based learning.

Our Approach

We developed a Python pipeline leveraging a powerful LLM (Meta’s Llama-3 70B Instruct) to extract this information from raw hadith text. Here’s how it worked:

Prompt Engineering

We crafted a custom prompt template that:

  • Categorized emotion (Instruction, Motivation, Comforting, Warning, Neutral)
  • Extracted Prophet Muhammad’s sayings
  • Identified all narrators in the chain
  • Isolated time, location, people (excluding the Prophet), and tribe entities
  • Listed the Prophet’s actions in the hadith
  • Suggested appropriate tags (based only on the hadith text)
  • Generated topics, concepts, and questions to understand and apply the hadith

Processing Pipeline

  1. Load & Clean Hadith Text: We preprocess hadith to remove inconsistent characters and format issues.
  2. Send Request to LLM: We construct a JSON prompt using our template and send it to our Azure-hosted Llama-3 70B endpoint.
  3. Parse Response: The LLM returns a structured JSON with all extracted elements.
  4. Save & Log: Results are saved into CSV and logs are prepended using a custom reverse log file handler for better traceability.
  5. Error Handling: If the LLM fails to return valid JSON or if there is a rate limit, the exception is logged and retried up to 3 times.

Evaluation

We evaluated the system on a CSV of hadiths (hadith_collection_new_db.csv).

  • Output was saved in llama_70B_Al-hadith_collection_new_db.csv
  • Accuracy in trials was consistently above 90%

Key Innovations

  • Named Entity Extraction: This goes beyond simple keyword tagging by identifying specific relationships, people, tribes, and settings.
  • Prophet’s Sayings & Actions: Our model isolates the direct words and deeds of the Prophet (PBUH), a valuable tool for both educators and learners.
  • Emotion Classification: Adds a layer of affective tagging useful in UX and search design.
  • Dynamic Questions: Suggested questions improve how we present hadith to users, especially for AI chatbots or quiz tools.

Challenges

  • Parsing references-only hadiths
  • Handling null/irrelevant translations
  • Maintaining output JSON structure from LLM responses
  • Entity disambiguation (e.g., differentiating a tribe vs. a location)

What’s Next?

  • Validate across larger hadith datasets
  • Train smaller, fine-tuned models for offline inference
  • Integrate this into our Hadith app’s semantic search and recommendations
  • Enable feedback loops from users to improve tagging accuracy

Conclusion

This project represents a step forward in structuring Islamic knowledge for modern interfaces. By combining LLMs, prompt engineering, and NER, we are unlocking new ways for Muslims around the world to learn from and engage with Hadith in a meaningful and personalised way.

Want to contribute to our Islamic AI efforts? Reach out at https://gtaf.org.


Share the Knowledge

Logo
Logo
Logo
Logo
Logo
Explore More Inspiring Reads
Sadaqah Jariah

What is Sadaqah Jariyah? Virtues + 10 Powerful Examples 🌱

Others

|

July 14th, 2025


What is the Day of Ashura

What is the Day of Ashura: History, Significance, Do’s & Don’ts

Islamic Month

|

July 14th, 2025


History of Karbala

The History of Karbala and Martyrdom of Hussain Ibn Ali (R)

Islamic Month

|

July 14th, 2025


Stay Up To Date
Don't miss our latest updates & releases