Abdul Rahman

AI/ML Engineer · PhD Candidate · IEEE VIS Best Paper Honorable Mention

I build production LLM systems and publish the research behind them. My work spans RAG pipelines, multi-agent workflows, fine-tuned language models, and interactive visual analytics, with 9 peer-reviewed papers, 90+ citations, and prior data pipeline experience at Amazon.

8+ Years in ML

9 Publications

90+ Citations

Best Paper Honorable Mention · IEEE VIS

AI/ML engineer and researcher completing a PhD in Computer Science at Northern Illinois University (GPA 3.9). My research spans large language models, multimodal learning, and interactive visual analytics, with publications at IEEE VIS, TVCG, JCDL, and Scientometrics.

Before my PhD, I built data pipelines at Amazon that cut query latency by 30% and data quality workflows that validated 500K+ daily records. At NIU, I lead the Visual Analytics Lab and build production systems, from RAG pipelines and multi-agent workflows to computer vision with ViT and SAM. I am currently exploring how LLMs can reshape multi-view data exploration.

Education

Ph.D. Computer Science

Northern Illinois University

2020 – Present

GPA 3.9

M.S. Computer Science

Northern Illinois University

2018 – 2020

GPA 3.9

B.E. Computer Science

Osmania University, Hyderabad

2013 – 2017

Core Stack

ML & Data Science

PyTorch · HuggingFace · Scikit-learn · XGBoost · LoRA/QLoRA · CLIP · SHAP/LIME · OpenCV · A/B Testing · Feature Engineering

LLM / GenAI

LangChain · LangGraph · AutoGen · CrewAI · FAISS · RAG

Data & Infrastructure

PostgreSQL · MongoDB · Snowflake · Docker · Kubernetes · FastAPI · MLflow · AWS · Kafka

Languages & Visualization

Python · SQL · JavaScript · R · C/C++ · Pandas · NumPy · D3.js · Tableau · Streamlit

Experience

Research and industry, with measurable outcomes.

2018 – Present · DeKalb, IL

Researcher

Northern Illinois University · DATA Lab, VA Lab & WASTE Lab

Built LLM-powered pipelines, multi-agent workflows, and computer vision systems across 3 labs. Fine-tuned LLMs with LoRA (−60% GPU memory, −45% training time). Published 9 papers at venues including IEEE VIS, TVCG, and JCDL.

4× inference · −60% GPU · −45% training · 300K+ images

2020 – Present · DeKalb, IL

Lab Head, Visual Analytics Lab

Northern Illinois University

Leading AI-driven visualization research. Mentored 5 grad students (100% completion, 4 co-authored papers). PC member: CIKM, WWW, JCDL.

5 mentored · 4 papers · LLMs · D3.js · User Studies

2016 – 2017 · Hyderabad, India

Data Analyst

Amazon

Built SQL/MongoDB pipelines that cut query latency by 30%, plus data quality workflows validating 500K+ daily records across e-commerce datasets.

−30% latency · 500K+ daily · SQL · Python · Pandas

2018 – 2023 · DeKalb, IL

Teaching Assistant

Northern Illinois University · Algorithms, Databases, C/C++, Java

Supported 70+ students across core CS courses, with a 20% improvement in class performance.

70+ students · +20% performance

Research

Peer-reviewed work in AI, LLMs, data visualization, and scientometrics.

Featured · Published · Scientometrics 2023

YouTube and Science: Models for Research Impact

Abdul Rahman Shaikh, Hamed Alhoori, Maoyuan Sun

Can YouTube videos predict a paper's real-world influence? This work introduces new datasets linking video content to scholarly articles and trains ML models that forecast citation counts and public engagement using altmetrics signals, measuring scientific impact beyond academia.

Published · Graphics Interface 2025

iTrace: Interactive Tracing of Cross-View Data Relationships

Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, David Koop

When dashboards have many linked views, finding connections between distant data points is hard. iTrace introduces smooth focus transitions that guide attention across views, making cross-view relationship tracing faster and less error-prone.

Published · IEEE VIS 2022

Toward Systematic Design Considerations of Organizing Multiple Views

Abdul Rahman Shaikh, David Koop, Hamed Alhoori, Maoyuan Sun

How should multiple visualization panels be arranged? This paper reviews dozens of multi-view systems and distills layout principles grounded in perception and content, providing a framework for designing dashboards that help users connect information across views.

Published · TVCG 2021 · Best Paper Honorable Mention

SightBi: Exploring Cross-View Data Relationships with Biclusters

Maoyuan Sun, Abdul Rahman Shaikh, Hamed Alhoori, Jian Zhao

Exploring linked data across views usually involves tedious trial-and-error. SightBi formalizes cross-view relationships as biclusters and creates dedicated relationship-views that surface hidden connections, turning guesswork into guided exploration. Awarded Best Paper Honorable Mention at IEEE VIS 2021.

Published · JCDL 2025

Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

Mosab Rezaei, Mina Rajaei Moghadam, Abdul Rahman Shaikh, Hamed Alhoori, Reva Freedman

Can you teach an LLM a novelist's writing style with a single token? This work fine-tunes language models to generate 19th-century literary styles using minimal prompts, then evaluates the output with a transformer-based detector and explainable AI analyses.

Projects

Open-source tools, systems, and experiments.

@sabdulrahman
LLM · RAG

LLMFlow: Scholarly Document Summarization & QA

End-to-end pipeline that chunks scholarly PDFs, generates hierarchical summaries via Llama/Ollama and GPT, and supports multi-turn Q&A with source attribution. Reduced reading time for 30+ page papers by ~60%.
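The chunk-then-summarize idea can be illustrated with a minimal sketch. This is not LLMFlow's actual code: the function names (`chunk_text`, `hierarchical_summary`, `first_words`), the word-based chunk sizes, and the grouping factor are all assumptions made for illustration, and the stand-in summarizer simply truncates text where a real pipeline would call a model.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks that overlap, so context carries across boundaries."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def hierarchical_summary(
    chunks: List[str],
    summarize: Callable[[str], str],
    group_size: int = 4,
) -> str:
    """Summarize each chunk, then merge groups of summaries until a single one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each pass shrinks the list")
    if not chunks:
        return ""
    level = [summarize(chunk) for chunk in chunks]
    while len(level) > 1:
        level = [
            summarize("\n".join(level[i:i + group_size]))
            for i in range(0, len(level), group_size)
        ]
    return level[0]


def first_words(text: str, limit: int = 30) -> str:
    """Hypothetical stand-in summarizer: keeps the first `limit` words of the input."""
    return " ".join(text.split()[:limit])
```

Passing the summarizer in as a callable keeps the tree-building logic independent of whichever model backs it, so a local or hosted model can be swapped in without touching the chunking code.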

Python · LangChain · FastAPI · React
RAG · Crawl

Rufus: Intelligent Web Data Extraction for LLMs

AI-powered web crawler that navigates sites, extracts relevant content, and synthesizes it into structured documents optimized for RAG ingestion. Handles JS-rendered pages via async Selenium.

Python · Asyncio · Selenium · RAG
Health AI · Multimodal

GenHealth: Multimodal Medical Report Analysis

Multimodal AI system that fuses clinical text, medical imaging, and structured signals to boost diagnostic extraction accuracy. Combines transformer encoders with domain-specific preprocessing.

Python · PyTorch · Transformers · FastAPI
Sandbox · Docker

Pexos: Safe Python Execution Sandbox

Secure code execution service for running untrusted Python with syscall restrictions, resource limits, and network isolation. Built for safe LLM code generation evaluation.
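Of the three protections listed, resource limits are the easiest to show in isolation. The sketch below is an assumption-laden illustration, not Pexos's implementation: it uses only the Python standard library on a Linux or other POSIX host, the helper name `run_limited` and its default limits are hypothetical, and it does nothing about syscall filtering or network isolation, which would still need a tool such as nsjail.

```python
import resource
import subprocess
import sys


def _cap(kind: int, wanted: int) -> None:
    """Lower one rlimit to `wanted`, never asking for more than the current hard limit."""
    _soft, hard = resource.getrlimit(kind)
    limit = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    resource.setrlimit(kind, (limit, limit))


def run_limited(
    code: str,
    cpu_seconds: int = 2,
    memory_bytes: int = 1024 ** 3,
    wall_timeout: float = 5.0,
) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter with CPU-time and address-space limits applied."""

    def apply_limits() -> None:
        # Runs in the child process after fork and before the interpreter starts.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
        capture_output=True,
        text=True,
        timeout=wall_timeout,  # wall-clock guard for code that sleeps instead of spinning
        preexec_fn=apply_limits,
    )
```

A call such as `run_limited("print(1 + 1)")` returns a `CompletedProcess` whose `stdout` and `returncode` can be inspected. A busy loop is stopped by the kernel once it exhausts its CPU budget, while a sleeping process is caught by the wall-clock timeout, which raises `subprocess.TimeoutExpired`.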

Python · Flask · nsjail · Docker
Chatbot · GPT

ChatFit: Personalized Fitness Chatbot

Conversational fitness assistant that collects user goals through dialogue and generates personalized workout and diet plans using GPT with structured output parsing.

Python · OpenAI GPT · Streamlit · Flask
Audio · Auth

VoxCore: Voice Authentication System

Voice authentication using OpenAI Whisper for transcription and custom PyTorch models for speaker verification. Achieves 89% accuracy in multi-speaker environments with real-time processing.

Python · Whisper · PyTorch

Let's build with AI.

© Abdul Rahman Shaikh 2025 · Open to full-time AI/ML roles & research collaboration