Abdul Rahman

AI/ML Engineer · PhD Candidate · IEEE VIS Best Paper Honorable Mention

I build production LLM systems and the data science pipelines behind them. My work spans RAG pipelines, LLM fine-tuning (LoRA/QLoRA), multi-agent workflows, predictive modeling, and interactive visual analytics. In production, this work has delivered 4× LLM throughput, 26% retrieval gains, and a 38% reduction in hallucinations. I have 10+ peer-reviewed papers, 100+ citations, and prior data-pipeline experience at Amazon.

About

Open to AI/ML & Data Science roles · Available now

Years in ML

10+ Publications

100+ Citations

Best Paper Honorable Mention · IEEE VIS

AI/ML engineer and data scientist completing a PhD in Computer Science at Northern Illinois University (GPA 3.9). My research spans large language models, multimodal learning, predictive modeling, and interactive visual analytics, with publications at IEEE VIS, TVCG, JCDL, and Scientometrics.

Before my PhD, I built data pipelines at Amazon that reduced query latency by up to 30% on 500K+ daily records. At NIU, I lead the Visual Analytics Lab. Across my research and industry roles, I build production AI and data science systems, including:

- RAG pipelines (82% precision@5, −38% hallucination rate)
- fine-tuned LLMs (LoRA/QLoRA, 4× throughput)
- predictive models (94.5 F1, 0.88 AUC)
- A/B-validated deployments

I am currently building LLM-driven systems to reshape multi-view data exploration.

Education

Ph.D. Computer Science

Northern Illinois University

2020 – Present

GPA 3.9

M.S. Computer Science

Northern Illinois University

2018 – 2020

GPA 3.9

B.E. Computer Science

Osmania University, Hyderabad

2013 – 2017

Core Stack

ML & Data Science

PyTorch HuggingFace Scikit-learn XGBoost LoRA/QLoRA CLIP SHAP/LIME OpenCV A/B Testing Feature Engineering Statistical Modeling Evidently

LLM / GenAI

LangChain LangGraph AutoGen CrewAI FAISS RAG Ragas/TruLens DeepEval RLHF/DPO

Data & Infrastructure

PostgreSQL MongoDB Snowflake BigQuery Docker Kubernetes FastAPI MLflow AWS Kafka

Languages & Visualization

Python SQL JavaScript R C/C++ Pandas NumPy D3.js Tableau Power BI Streamlit

Experience

Research and industry, with measurable outcomes.

2024 – Present Plano, TX (Remote)

AI/ML Engineer & Data Scientist

Visual Technologies

· Architected RAG pipelines (5K+ daily queries, AWS ECS, p95 < 900ms), improving retrieval from 61% MRR to 82% precision@5; cut hallucination rate 38% via Ragas/TruLens regression suites.
· Fine-tuned Llama-3-8B & Mixtral-8x7B with LoRA/QLoRA (−60% GPU memory, −45% training time); boosted inference throughput 4× via INT4/INT8 quantization and speculative decoding.
· Built A/B-validated predictive pipelines (0.88 R², +20% AUC) with Evidently drift monitoring and Tableau/Power BI dashboards for stakeholder self-service.

2020 – 2024 DeKalb, IL

AI/ML Researcher

Northern Illinois University · DATA Lab, VA Lab & WASTE Lab

· Built multimodal pipelines (1M+ records, 300K+ images) using CLIP, BERTopic, and FAISS; LLM-assisted exploration system cut analyst task time 35% and improved insight accuracy 45% (25+ participants).
· Developed predictive models on 800K+ scholarly records (94.5 F1); benchmarked 5 model families before deploying XGBoost at 0.88 AUC with 20ms p95 inference.
· Standardized reproducible ML workflows across 3 labs, supporting 10+ peer-reviewed publications and mentoring 5+ researchers.

2021 – 2026 DeKalb, IL

Lab Head, Visual Analytics Lab

Northern Illinois University

· Leading AI-driven visualization research; mentored 6 graduate students, all of whom completed their programs, and co-authored 4 publications with them.
· Served on the program committees of CIKM, WWW, and JCDL.

2018 – 2020 DeKalb, IL

Research Assistant

Northern Illinois University

· Built applied-ML models for research-impact prediction integrating YouTube (180K+ videos), Twitter, and scholarly metadata; contributed to publications in Scientometrics and IEEE TVCG.
· Designed end-to-end pipelines (collection, cleaning, feature engineering, cross-validation) that reduced experiment iteration time ~40% across 4 concurrent projects.

2016 – 2017 Hyderabad, India

Data Analyst

Amazon

· Reduced query latency 25–30% by optimizing SQL and MongoDB pipelines processing 500K+ daily records.
· Automated ETL validation with Python, Pandas, and NumPy, eliminating ~15 hours of weekly manual reconciliation.

2018 – 2023 DeKalb, IL

Teaching Assistant

Northern Illinois University · ML, Algorithms, Databases, C/C++, Java

· Led recitations, designed assignments, and graded coursework for 40 students across 8 semesters.
· Contributed to a 20% improvement in overall class performance.

Research

Peer-reviewed work in AI, LLMs, data visualization, and scientometrics.

10+ Publications

100+ Citations

In Review

Featured Published Scientometrics 2023

YouTube and Science: Models for Research Impact

Abdul Rahman Shaikh, Hamed Alhoori, Maoyuan Sun

Can YouTube videos predict a paper's real-world influence? This work introduces new datasets linking video content to scholarly articles and trains ML models that forecast citation counts and public engagement using altmetrics signals, measuring scientific impact beyond academia.

Published Graphics Interface 2025

iTrace: Interactive Tracing of Cross-View Data Relationships

Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, David Koop

When dashboards have many linked views, finding connections between distant data points is hard. iTrace introduces smooth focus transitions that guide attention across views, making cross-view relationship tracing faster and less error-prone.

Published IEEE VIS 2022

Toward systematic design considerations of organizing multiple views

Abdul Rahman Shaikh, David Koop, Hamed Alhoori, Maoyuan Sun

How should multiple visualization panels be arranged? This paper reviews dozens of multi-view systems and distills layout principles grounded in perception and content, providing a framework for designing dashboards that help users connect information across views.

Published TVCG 2021 Best Paper Honorable Mention

SightBi: Exploring Cross-View Data Relationships with Biclusters

Maoyuan Sun, Abdul Rahman Shaikh, Hamed Alhoori, Jian Zhao

Exploring linked data across views usually involves tedious trial-and-error. SightBi formalizes cross-view relationships as biclusters and creates dedicated relationship-views that surface hidden connections, turning guesswork into guided exploration. Awarded Best Paper Honorable Mention at IEEE VIS 2021.

Published JCDL 2025

Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

Mosab Rezaei, Mina Rajaei Moghadam, Abdul Rahman Shaikh, Hamed Alhoori, Reva Freedman

Can you teach an LLM a novelist's writing style with a single token? This work fine-tunes language models to generate 19th-century literary styles using minimal prompts, then evaluates the output with a transformer-based detector and explainable AI analyses.

Projects

Open-source tools, systems, and experiments.

@sabdulrahman

LLM RAG

LLMFlow: Scholarly Document Summarization & QA

This end-to-end pipeline chunks scholarly PDFs and generates hierarchical summaries using Llama models served with Ollama and using GPT. It supports multi-turn Q&A with source attribution. Retrieval quality was evaluated with MRR and NDCG on annotated query sets, and the pipeline reduced reading time for 30+ page papers by ~60%.

Python · LangChain · FastAPI · React

RAG Crawl

Rufus: Intelligent Web Data Extraction for LLMs

AI-powered web crawler that navigates sites, extracts relevant content, and synthesizes it into structured documents optimized for RAG ingestion. Handles JS-rendered pages via async Selenium, cutting knowledge-base creation time by 30%.

Python · Asyncio · Selenium · RAG

Health AI Multimodal

GenHealth: Multimodal Medical Report Analysis

Multimodal AI system fusing clinical text, medical imaging (BLIP-2/CLIP), and structured EHR signals via transformer encoders and domain-specific preprocessing. Improved diagnostic-information extraction F1 by 30% in experimental evaluation.

Python · PyTorch · Transformers · FastAPI

Sandbox Docker

Pexos: Safe Python Execution Sandbox

Secure code execution service for running untrusted Python with syscall restrictions, resource limits, and network isolation. Built to evaluate LLM-generated code safely.

Python · Flask · nsjail · Docker

Chatbot GPT

ChatFit: Personalized Fitness Chatbot

Conversational fitness assistant that collects user goals through dialogue and generates personalized workout and diet plans using GPT with structured output parsing.

Python · OpenAI GPT · Streamlit · Flask

Audio Auth

VoxCore: Voice Authentication System

Voice authentication using OpenAI Whisper for transcription and custom PyTorch models for speaker verification. Achieves 89% accuracy in multi-speaker environments with real-time processing.

Python · Whisper · PyTorch

Let's build with AI.

iamsabdulrahman@gmail.com

Abdul
Rahman

About

Education

Core Stack

ML & Data Science

LLM / GenAI

Data & Infrastructure

Languages & Visualization

Experience

AI/ML Engineer & Data Scientist

AI/ML Researcher

Lab Head, Visual Analytics Lab

Research Assistant

Data Analyst

Teaching Assistant

Research

YouTube and Science: Models for Research Impact

iTrace: Interactive Tracing of Cross-View Data Relationships

Toward systematic design considerations of organizing multiple views

SightBi: Exploring Cross-View Data Relationships with Biclusters

Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

Quantifying the online long-term interest in research

Examining the Representation of Youth in the US Policy Documents through the Lens of Research

Predicting patent citations to measure economic impact of scholarly research

Modeling the Broader Impact of Science and Health Using Social Media

Boundary Blending: Reconsidering the Design of Multi-View Visualizations

Projects

LLMFlow: Scholarly Document Summarization & QA

Rufus: Intelligent Web Data Extraction for LLMs

GenHealth: Multimodal Medical Report Analysis

Pexos: Safe Python Execution Sandbox

ChatFit: Personalized Fitness Chatbot

VoxCore: Voice Authentication System

Let's build with AI.
