Abdul Rahman

AI/ML Engineer · PhD Candidate · IEEE VIS Best Paper Honorable Mention

I build production LLM systems and the data science pipelines behind them. My work spans RAG pipelines, LLM fine-tuning (LoRA/QLoRA), multi-agent workflows, predictive modeling, and interactive visual analytics, and has delivered 4× LLM inference throughput, a 26% gain in retrieval precision@5, and a 38% reduction in hallucination rate in production. I have 10+ peer-reviewed papers, 100+ citations, and prior ML experience at Amazon.

Abdul Rahman Shaikh, AI/ML Engineer and Researcher
8+ Years in ML
10+ Publications
100+ Citations
Best Paper · IEEE VIS

AI/ML engineer and data scientist completing a PhD in Computer Science at Northern Illinois University (GPA 3.9). My research spans large language models, multimodal learning, predictive modeling, and interactive visual analytics, with publications at IEEE VIS, TVCG, JCDL, and Scientometrics.

Before my PhD, I built data pipelines at Amazon that reduced query latency by 25–30% across 500K+ daily records. At NIU, I lead the Visual Analytics Lab and build production AI and data science systems, ranging from RAG pipelines (82% precision@5, −38% hallucination rate) and fine-tuned LLMs (LoRA/QLoRA, 4× throughput) to predictive models (94.5 F1, 88 AUC) and A/B-validated deployments. I am currently building LLM-driven systems to reshape multi-view data exploration.

Education

Ph.D. Computer Science

Northern Illinois University

2020 – Present

GPA 3.9

M.S. Computer Science

Northern Illinois University

2018 – 2020

GPA 3.9

B.E. Computer Science

Osmania University, Hyderabad

2013 – 2017

Core Stack

ML & Data Science

PyTorch · HuggingFace · Scikit-learn · XGBoost · LoRA/QLoRA · CLIP · SHAP/LIME · OpenCV · A/B Testing · Feature Engineering · Statistical Modeling · Evidently

LLM / GenAI

LangChain · LangGraph · AutoGen · CrewAI · FAISS · RAG · Ragas/TruLens · DeepEval · RLHF/DPO

Data & Infrastructure

PostgreSQL · MongoDB · Snowflake · BigQuery · Docker · Kubernetes · FastAPI · MLflow · AWS · Kafka

Languages & Visualization

Python · SQL · JavaScript · R · C/C++ · Pandas · NumPy · D3.js · Tableau · Power BI · Streamlit

Experience

Research and industry, with measurable outcomes.

2024 – Present · Plano, TX (Remote)

AI/ML Engineer & Data Scientist

Visual Technologies

  • Architected RAG pipelines (5K+ daily queries, AWS ECS, p95 < 900ms), improving retrieval from 61% MRR to 82% precision@5; cut hallucination rate 38% via Ragas/TruLens regression suites.
  • Fine-tuned Llama-3-8B & Mixtral-8x7B with LoRA/QLoRA (−60% GPU memory, −45% training time); boosted inference throughput 4× via INT4/INT8 quantization and speculative decoding.
  • Built A/B-validated predictive pipelines (0.88 R², +20% AUC) with Evidently drift monitoring and Tableau/Power BI dashboards for stakeholder self-service.
4× LLM throughput | −38% hallucinations | +26% precision@5 | 5K+ daily queries | 88 AUC · 0.88 R²

2020 – 2024 · DeKalb, IL

AI/ML Researcher

Northern Illinois University · DATA Lab, VA Lab & WASTE Lab

  • Built multimodal pipelines (1M+ records, 300K+ images) using CLIP, BERTopic, and FAISS; LLM-assisted exploration system cut analyst task time 35% and improved insight accuracy 45% (25+ participants).
  • Developed predictive models on 800K+ scholarly records (94.5 F1); benchmarked 5 model families before deploying XGBoost at 88 AUC with 20ms p95 inference.
  • Standardized reproducible ML workflows across 3 labs, supporting 10+ peer-reviewed publications and mentoring 5+ researchers.
1M+ records | 94.5 F1 · 88 AUC | −35% task time | 10+ publications

2021 – 2026 · DeKalb, IL

Lab Head, Visual Analytics Lab

Northern Illinois University

  • Leading AI-driven visualization research; mentored 6 grad students to 100% completion with 4 co-authored publications.
  • Served as PC member at CIKM, WWW, and JCDL.
6 mentored | 4 papers | LLMs · ML · User Studies

2018 – 2020 · DeKalb, IL

Research Assistant

Northern Illinois University

  • Built applied-ML models for research-impact prediction integrating YouTube (180K+ videos), Twitter, and scholarly metadata; contributed to publications in Scientometrics and IEEE TVCG.
  • Designed end-to-end pipelines (collection, cleaning, feature engineering, cross-validation) that reduced experiment iteration time ~40% across 4 concurrent projects.
180K+ videos | −40% iteration time | Python · Scikit-learn · APIs

2016 – 2017 · Hyderabad, India

Data Analyst

Amazon

  • Reduced query latency 25–30% by optimizing SQL and MongoDB pipelines processing 500K+ daily records.
  • Automated ETL validation with Python, Pandas, and NumPy, eliminating ~15 hours of weekly manual reconciliation.
−30% latency | 500K+ daily | SQL · Python · Pandas

2018 – 2023 · DeKalb, IL

Teaching Assistant

Northern Illinois University · ML, Algorithms, Databases, C/C++, Java

  • Led recitations, designed assignments, and graded for 40 students across 8 semesters in ML, Algorithms, Databases, C/C++, and Java.
  • Contributed to a 20% improvement in overall class performance.
40 students · 8 sems | +20% perf

Research

Peer-reviewed work in AI, LLMs, data visualization, and scientometrics.

Featured · Published · Scientometrics 2023

YouTube and Science: Models for Research Impact

Abdul Rahman Shaikh, Hamed Alhoori, Maoyuan Sun

Can YouTube videos predict a paper's real-world influence? This work introduces new datasets linking video content to scholarly articles and trains ML models that forecast citation counts and public engagement using altmetrics signals, measuring scientific impact beyond academia.

Published · Graphics Interface 2025

iTrace: Interactive Tracing of Cross-View Data Relationships

Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, David Koop

When dashboards have many linked views, finding connections between distant data points is hard. iTrace introduces smooth focus transitions that guide attention across views, making cross-view relationship tracing faster and less error-prone.

Published · IEEE VIS 2022

Toward Systematic Design Considerations of Organizing Multiple Views

Abdul Rahman Shaikh, David Koop, Hamed Alhoori, Maoyuan Sun

How should multiple visualization panels be arranged? This paper reviews dozens of multi-view systems and distills layout principles grounded in perception and content, providing a framework for designing dashboards that help users connect information across views.

Published · TVCG 2021 · Best Paper Honorable Mention

SightBi: Exploring Cross-View Data Relationships with Biclusters

Maoyuan Sun, Abdul Rahman Shaikh, Hamed Alhoori, Jian Zhao

Exploring linked data across views usually involves tedious trial-and-error. SightBi formalizes cross-view relationships as biclusters and creates dedicated relationship-views that surface hidden connections, turning guesswork into guided exploration. Awarded Best Paper Honorable Mention at IEEE VIS 2021.

Published · JCDL 2025

Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

Mosab Rezaei, Mina Rajaei Moghadam, Abdul Rahman Shaikh, Hamed Alhoori, Reva Freedman

Can you teach an LLM a novelist's writing style with a single token? This work fine-tunes language models to generate 19th-century literary styles using minimal prompts, then evaluates the output with a transformer-based detector and explainable AI analyses.

Projects

Open-source tools, systems, and experiments.

@sabdulrahman
LLM · RAG

LLMFlow: Scholarly Document Summarization & QA

End-to-end pipeline that chunks scholarly PDFs, generates hierarchical summaries via Llama/Ollama and GPT, and supports multi-turn Q&A with source attribution. Evaluated retrieval quality using MRR and NDCG on annotated query sets; reduced reading time for 30+ page papers by ~60%.

Python · LangChain · FastAPI · React
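
For readers unfamiliar with the retrieval metrics named above, here is a minimal sketch of how MRR, precision@k, and binary-relevance NDCG can be computed over an annotated query set. It is an illustration only, not the LLMFlow evaluation code: the function names, the document IDs, and the two sample queries are all hypothetical.

```python
import math
from typing import Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant hit; 0.0 when nothing relevant is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(doc_id in relevant for doc_id in ranked_ids[:k]) / k


def ndcg_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_reciprocal_rank(queries: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in queries) / len(queries)


# Hypothetical annotated query set: (retrieved chunk IDs in rank order, relevant IDs).
annotated = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}),
    (["d4", "d8", "d5"], {"d4"}),
]

mrr = mean_reciprocal_rank(annotated)
p_at_5 = precision_at_k(annotated[0][0], annotated[0][1], k=5)
ndcg_at_5 = ndcg_at_k(annotated[0][0], annotated[0][1], k=5)
```

MRR rewards placing the first relevant chunk early, while NDCG also credits every later relevant chunk with a logarithmic discount, which is why the two are often reported together.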

RAG · Crawl

Rufus: Intelligent Web Data Extraction for LLMs

AI-powered web crawler that navigates sites, extracts relevant content, and synthesizes it into structured documents optimized for RAG ingestion. Handles JS-rendered pages via async Selenium, cutting knowledge-base creation time by 30%.

Python · Asyncio · Selenium · RAG

Health AI · Multimodal

GenHealth: Multimodal Medical Report Analysis

Multimodal AI system fusing clinical text, medical imaging (BLIP-2/CLIP), and structured EHR signals via transformer encoders and domain-specific preprocessing. Improved diagnostic-information extraction F1 by 30% in experimental evaluation.

Python · PyTorch · Transformers · FastAPI

Sandbox · Docker

Pexos: Safe Python Execution Sandbox

Secure code execution service for running untrusted Python with syscall restrictions, resource limits, and network isolation. Built for safe LLM code generation evaluation.

Python · Flask · nsjail · Docker

Chatbot · GPT

ChatFit: Personalized Fitness Chatbot

Conversational fitness assistant that collects user goals through dialogue and generates personalized workout and diet plans using GPT with structured output parsing.

Python · OpenAI GPT · Streamlit · Flask

Audio · Auth

VoxCore: Voice Authentication System

Voice authentication using OpenAI Whisper for transcription and custom PyTorch models for speaker verification. Achieves 89% accuracy in multi-speaker environments with real-time processing.

Python · Whisper · PyTorch

Let's build with AI.

© Abdul Rahman Shaikh 2026 · Open to full-time AI/ML & Data Science roles & research collaboration