Projects

Selected engineering and research projects.

Nepali RAG: Large-Scale Nepali News Retrieval System

In Progress
Qdrant, SentenceTransformer, Python, FastAPI, Retrieval-Augmented Generation

Nepali RAG is a large-scale data ingestion and retrieval system designed to collect and analyze Nepali-language news from multiple online sources.

The system crawls thousands of articles from major Nepali news outlets and processes them through a structured pipeline including metadata extraction, Bikram Sambat date conversion, deduplication, and unified JSON schema storage.
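The deduplication step of the pipeline can be sketched with a content-hash approach. This is a minimal illustration; the normalization rules and the article fields shown are assumptions, not the system's actual schema:

```python
import hashlib

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies hash alike."""
    return " ".join(text.lower().split())

def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each distinct body hash."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        digest = hashlib.sha256(normalize(article["body"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(article)
    return unique

articles = [
    {"url": "a", "body": "नेपालमा आज वर्षा भयो।"},
    {"url": "b", "body": "नेपालमा  आज वर्षा भयो।"},  # same text, extra whitespace
    {"url": "c", "body": "अर्को समाचार।"},
]
print(len(deduplicate(articles)))  # → 2
```

Hashing a normalized body catches exact and near-exact reposts cheaply; fuzzier duplicates would need similarity-based matching on top.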

Articles are chunked and embedded into a vector database to enable semantic search and retrieval over Nepali news content, supporting downstream applications such as question answering, narrative comparison across media outlets, and research on Nepali-language NLP.
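The chunking step before embedding might look like the following sketch. The 200-word window and 50-word overlap are illustrative parameters, not the system's actual settings:

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows for embedding."""
    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks

article = " ".join(f"w{i}" for i in range(500))
chunks = chunk_words(article)
print(len(chunks))  # → 3
```

Overlapping windows keep sentences that straddle a chunk boundary retrievable from at least one chunk; each chunk would then be passed to the SentenceTransformer encoder and upserted into Qdrant with the article's metadata as payload.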

LLM-Based Resume Screening and Candidate Ranking

Completed
LLM, SBERT, Python, FastAPI, Semantic Embeddings, Transformers, Hugging Face, PyTorch

This project implements an end-to-end pipeline for automated resume screening and candidate ranking using large language models and semantic embeddings.

The system parses PDF resumes into structured representations and uses LLM prompting to generate standardized candidate summaries. Job descriptions are encoded using SBERT embeddings and compared with candidate profiles to compute semantic similarity scores.
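The similarity comparison reduces to cosine similarity between the job-description vector and each candidate vector. A minimal pure-Python sketch follows; in the actual system SBERT produces the vectors, and the three-dimensional vectors here are toy stand-ins:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

job_vec = [0.1, 0.9, 0.0]        # toy stand-in for an SBERT job embedding
candidate_vec = [0.2, 0.8, 0.1]  # toy stand-in for a candidate embedding
print(cosine_similarity(job_vec, candidate_vec))
```

Cosine similarity ignores vector magnitude, which matters because embedding norms vary with text length while the direction carries the semantics.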

A composite ranking framework combines embedding similarity, keyword coverage, and experience signals to retrieve and rank the most relevant candidates for a given job description.
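The composite score can be sketched as a weighted blend of the three signals. The weights, the experience cap, and the candidate values below are illustrative assumptions, not the framework's tuned parameters:

```python
def composite_score(similarity: float, keyword_coverage: float,
                    experience_years: float,
                    w_sim: float = 0.6, w_kw: float = 0.25, w_exp: float = 0.15,
                    max_years: float = 10.0) -> float:
    """Weighted blend of embedding similarity, keyword coverage, and experience."""
    exp_signal = min(experience_years, max_years) / max_years  # cap and normalize to [0, 1]
    return w_sim * similarity + w_kw * keyword_coverage + w_exp * exp_signal

candidates = {
    "alice": composite_score(similarity=0.91, keyword_coverage=0.8, experience_years=6),
    "bob": composite_score(similarity=0.85, keyword_coverage=0.9, experience_years=12),
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
print(ranked)  # → ['bob', 'alice']
```

Capping experience before normalizing prevents very senior candidates from dominating the ranking on tenure alone; the linear blend keeps each signal's contribution easy to inspect and tune.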

Multimodal Hate Meme Detection

Completed
LLM, Python, Transformers, Vision Transformers, BERT, Hugging Face, PyTorch

This project explores multimodal hate speech detection by jointly modeling textual and visual signals present in internet memes.

The system combines Vision Transformers for image representation with BERT-based language models for text understanding to perform cross-modal reasoning.

LLM prompting was used to enhance alignment between visual and textual signals, enabling improved classification performance on multimodal hate meme datasets.
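At its simplest, combining the two encoders amounts to late fusion: concatenating the image and text embeddings and feeding them to a classification head. The toy sketch below uses tiny stand-in vectors and a hand-set linear head; in the actual system the ViT and BERT encoders produce the embeddings and the head is learned:

```python
import math

def fuse_and_classify(image_emb: list[float], text_emb: list[float],
                      weights: list[float], bias: float = 0.0) -> float:
    """Concatenate modality embeddings, then apply a linear layer + sigmoid."""
    fused = list(image_emb) + list(text_emb)
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # probability of the "hateful" class

image_emb = [0.4, -0.2]          # stand-in for a ViT [CLS] vector
text_emb = [0.7, 0.1]            # stand-in for a BERT [CLS] vector
weights = [1.5, -0.5, 2.0, 0.3]  # hypothetical learned classifier weights
prob = fuse_and_classify(image_emb, text_emb, weights)
print(prob > 0.5)
```

Late fusion is the baseline design; cross-modal attention between the two token streams is the usual next step when the text and image only make sense jointly, as is common with memes.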

Semantic Search Benchmarking with Transformer Embeddings

Completed
LDA, LSA, Python, Transformers, SBERT, Hugging Face, PyTorch

This project benchmarks traditional topic modeling approaches against modern transformer-based embeddings for semantic document retrieval.

Classical methods including Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) were compared against dense embeddings generated by SBERT.

Experimental results demonstrated that transformer-based embeddings significantly improve retrieval accuracy and semantic relevance compared to earlier approaches such as GloVe and bag-of-words models.
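A standard way to frame such a retrieval-accuracy comparison is precision@k over each system's ranked results. The sketch below shows only the metric; the document IDs, rankings, and relevance judgments are made up for illustration and are not the project's data:

```python
def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant) / k

# Hypothetical rankings from two systems for the same query
sbert_ranking = ["d3", "d1", "d7", "d2"]
lsa_ranking = ["d5", "d3", "d9", "d1"]
relevant = {"d1", "d3"}

print(precision_at_k(sbert_ranking, relevant, 2))  # → 1.0
print(precision_at_k(lsa_ranking, relevant, 2))    # → 0.5
```

Averaging the metric over a set of queries, and complementing it with a rank-sensitive measure such as nDCG, gives a fuller picture than a single cutoff.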