Abdul Rahman
AI/ML Engineer · PhD Candidate · IEEE VIS Best Paper Honorable Mention
I build production LLM systems and the data science pipelines behind them. My work spans RAG pipelines, LLM fine-tuning (LoRA/QLoRA), multi-agent workflows, predictive modeling, and interactive visual analytics, delivering 4× LLM throughput, 26% retrieval gains, and a 38% reduction in hallucination rate in production. 10+ peer-reviewed papers, 100+ citations, and prior industry experience at Amazon.
AI/ML engineer and data scientist completing a PhD in Computer Science at Northern Illinois University (GPA 3.9). My research spans large language models, multimodal learning, predictive modeling, and interactive visual analytics, with publications at IEEE VIS, TVCG, JCDL, and Scientometrics.
Before my PhD, I built data pipelines at Amazon that reduced query latency by 25–30% across 500K+ daily records. At NIU, I lead the Visual Analytics Lab. Across research and industry roles, I build production AI and data science systems: RAG pipelines (82% precision@5, 38% lower hallucination rate), fine-tuned LLMs (LoRA/QLoRA, 4× throughput), predictive models (94.5 F1, 88 AUC), and A/B-validated deployments. I am currently building LLM-driven systems to reshape multi-view data exploration.
Education
Ph.D. Computer Science
Northern Illinois University
2020 – Present
GPA 3.9
M.S. Computer Science
Northern Illinois University
2018 – 2020
GPA 3.9
B.E. Computer Science
Osmania University, Hyderabad
2013 – 2017
Core Stack
ML & Data Science
LLM / GenAI
Data & Infrastructure
Languages & Visualization
Experience
Research and industry, with measurable outcomes.
AI/ML Engineer & Data Scientist
Visual Technologies
- Architected RAG pipelines (5K+ daily queries, AWS ECS, p95 < 900ms), improving retrieval from 61% MRR to 82% precision@5; cut hallucination rate 38% via Ragas/TruLens regression suites.
- Fine-tuned Llama-3-8B & Mixtral-8x7B with LoRA/QLoRA (−60% GPU memory, −45% training time); boosted inference throughput 4× via INT4/INT8 quantization and speculative decoding.
- Built A/B-validated predictive pipelines (0.88 R², +20% AUC) with Evidently drift monitoring and Tableau/Power BI dashboards for stakeholder self-service.
AI/ML Researcher
Northern Illinois University · DATA Lab, VA Lab & WASTE Lab
- Built multimodal pipelines (1M+ records, 300K+ images) using CLIP, BERTopic, and FAISS; in a user study with 25+ participants, the LLM-assisted exploration system cut analyst task time 35% and improved insight accuracy 45%.
- Developed predictive models on 800K+ scholarly records (94.5 F1); benchmarked 5 model families before deploying XGBoost at 88 AUC with 20ms p95 inference.
- Standardized reproducible ML workflows across 3 labs, supporting 10+ peer-reviewed publications and mentoring 5+ researchers.
Lab Head, Visual Analytics Lab
Northern Illinois University
- Leading AI-driven visualization research; mentored 6 graduate students (100% completion rate), resulting in 4 co-authored publications.
- Served as a program committee (PC) member at CIKM, WWW, and JCDL.
Research Assistant
Northern Illinois University
- Built applied-ML models for research-impact prediction integrating YouTube (180K+ videos), Twitter, and scholarly metadata; contributed to publications in Scientometrics and IEEE TVCG.
- Designed end-to-end pipelines (collection, cleaning, feature engineering, cross-validation) that reduced experiment iteration time by ~40% across 4 concurrent projects.
Data Analyst
Amazon
- Reduced query latency 25–30% by optimizing SQL and MongoDB pipelines processing 500K+ daily records.
- Automated ETL validation with Python, Pandas, and NumPy, eliminating ~15 hours of weekly manual reconciliation.
Teaching Assistant
Northern Illinois University · ML, Algorithms, Databases, C/C++, Java
- Led recitations, designed assignments, and graded coursework for 40 students across 8 semesters.
- Contributed to a 20% improvement in overall class performance.
Research
Peer-reviewed work in AI, LLMs, data visualization, and scientometrics.
YouTube and Science: Models for Research Impact
Abdul Rahman Shaikh, Hamed Alhoori, Maoyuan Sun
Can YouTube videos predict a paper's real-world influence? This work introduces new datasets linking video content to scholarly articles and trains ML models that forecast citation counts and public engagement from altmetric signals, measuring scientific impact beyond academia.
iTrace: Interactive Tracing of Cross-View Data Relationships
Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, David Koop
When dashboards have many linked views, finding connections between distant data points is hard. iTrace introduces smooth focus transitions that guide attention across views, making cross-view relationship tracing faster and less error-prone.
Toward Systematic Design Considerations of Organizing Multiple Views
Abdul Rahman Shaikh, David Koop, Hamed Alhoori, Maoyuan Sun
How should multiple visualization panels be arranged? This paper reviews dozens of multi-view systems and distills layout principles grounded in perception and content, providing a framework for designing dashboards that help users connect information across views.
SightBi: Exploring Cross-View Data Relationships with Biclusters
Maoyuan Sun, Abdul Rahman Shaikh, Hamed Alhoori, Jian Zhao
Exploring linked data across views usually involves tedious trial-and-error. SightBi formalizes cross-view relationships as biclusters and creates dedicated relationship-views that surface hidden connections, turning guesswork into guided exploration. Awarded Best Paper Honorable Mention at IEEE VIS 2021.
Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts
Mosab Rezaei, Mina Rajaei Moghadam, Abdul Rahman Shaikh, Hamed Alhoori, Reva Freedman
Can you teach an LLM a novelist's writing style with a single token? This work fine-tunes language models to generate 19th-century literary styles using minimal prompts, then evaluates the output with a transformer-based detector and explainable AI analyses.
Projects
Open-source tools, systems, and experiments.
LLMFlow: Scholarly Document Summarization & QA
End-to-end pipeline that chunks scholarly PDFs, generates hierarchical summaries via Llama/Ollama and GPT, and supports multi-turn Q&A with source attribution. Evaluated retrieval quality using MRR and NDCG on annotated query sets; reduced reading time for 30+ page papers by ~60%.
Rufus: Intelligent Web Data Extraction for LLMs
AI-powered web crawler that navigates sites, extracts relevant content, and synthesizes it into structured documents optimized for RAG ingestion. Handles JS-rendered pages via async Selenium, cutting knowledge-base creation time by 30%.
GenHealth: Multimodal Medical Report Analysis
Multimodal AI system fusing clinical text, medical imaging (BLIP-2/CLIP), and structured EHR signals via transformer encoders and domain-specific preprocessing. Improved diagnostic-information extraction F1 by 30% in experimental evaluation.
Pexos: Safe Python Execution Sandbox
Secure code execution service for running untrusted Python with syscall restrictions, resource limits, and network isolation. Built for safe LLM code generation evaluation.
ChatFit: Personalized Fitness Chatbot
Conversational fitness assistant that collects user goals through dialogue and generates personalized workout and diet plans using GPT with structured output parsing.
VoxCore: Voice Authentication System
Voice authentication using OpenAI Whisper for transcription and custom PyTorch models for speaker verification. Achieves 89% accuracy in multi-speaker environments with real-time processing.
Let's build with AI.
© Abdul Rahman Shaikh 2026 · Open to full-time AI/ML and data science roles and research collaborations