Abdul Rahman Shaikh

Ever since I typed my very first line of code, I've been captivated by the endless potential of technology to shape our world. Today, as a computer scientist and PhD candidate at Northern Illinois University, I focus on harnessing AI, LLMs, and other cutting-edge tools to tackle complex problems and drive meaningful change. By bridging theoretical research with practical applications, I'm working to make complex systems more efficient, scalable, and impactful.

Here, you’ll find my latest projects, insights, and collaborative endeavors aimed at pushing technology forward. Whether you’re seeking fresh inspiration or eager to dive into groundbreaking discussions, let’s connect and explore the possibilities!

Focus

I am proficient in AI, LLMs, and computer vision (CV), with hands-on experience building and deploying intelligent systems.

Development

Most of my work has been in Python, but I am also quite familiar with JavaScript/TypeScript, C++, and Java.

Tools

Git, Docker, Hugging Face, TensorFlow, npm... you name it! I have used a bajillion tools to keep my workflows efficient.

Research

YouTube and Science: Models for Research Impact

Abdul Rahman Shaikh, Hamed Alhoori, and M. Sun; Journal of Scientometrics, 2023.

Abstract: Video communication has been rapidly increasing over the past decade, with YouTube providing a medium where users can post, discover, share, and react to videos. There has also been an increase in the number of videos citing research articles, especially since it has become relatively commonplace for academic conferences to require video submissions. However, the relationship between research articles and YouTube videos is not clear, and the purpose of the present paper is to address this issue. We created new datasets using YouTube videos and mentions of research articles on various online platforms. We found that most of the articles cited in the videos are related to medicine and biochemistry. We analyzed these datasets through statistical techniques and visualization, and built machine learning models to predict (1) whether a research article is cited in videos, (2) whether a research article cited in a video achieves a level of popularity, and (3) whether a video citing a research article becomes popular. The best models achieved F1 scores between 80% and 94%. According to our results, research articles mentioned in more tweets and news coverage have a higher chance of receiving video citations. We also found that video views are important for predicting citations and increasing research articles’ popularity and public engagement with science.

Keywords: YouTube, Machine Learning, Altmetrics Tools: Python, Tableau, MATLAB

PDF

iTrace: Interactive Tracing of Cross-View Data Relationships

Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, and David Koop; Graphics Interface, 2025

Abstract: Exploring data relations across multiple views has been a common task in many domains such as bioinformatics, cybersecurity, and healthcare. To support this, various techniques (e.g., visual links and brushing & linking) are used to show related visual elements across views via lines and highlights. However, understanding the relations using these techniques, when many related elements are scattered, can be difficult due to spatial distance and complexity. To address this, we present iTrace, an interactive visualization technique to effectively trace cross-view data relationships. iTrace leverages the concept of interactive focus transitions, which allows users to see and directly manipulate their focus as they navigate between views. By directing the user’s attention through smooth transitions between related elements, iTrace makes it easier to follow data relationships. We demonstrate the effectiveness of iTrace with a user study, and we conclude with a discussion of how iTrace can be broadly used to enhance data exploration in various types of visualizations.

Keywords: Visual Analytics, Interaction Design, Multiple Views Tools: JavaScript, Node.js, Python

Toward systematic design considerations of organizing multiple views

Abdul Rahman Shaikh, David Koop, Hamed Alhoori, and Maoyuan Sun; IEEE VIS, 2022

Abstract: Multiple-view visualization (MV) has been used for visual analytics in various fields (e.g., bioinformatics, cybersecurity, and intelligence analysis). Because each view encodes data from a particular perspective, analysts often use a set of views laid out in 2D space to link and synthesize information. The difficulty of this process is impacted by the spatial organization of these views. For instance, connecting information from views far from each other can be more challenging than neighboring ones. However, most visual analysis tools currently either fix the positions of the views or completely delegate this organization of views to users (who must manually drag and move views). This either limits user involvement in managing the layout of MV or is overly flexible without much guidance. Then, a key design challenge in MV layout is determining the factors in a spatial organization that impact understanding. To address this, we review a set of MV-based systems and identify considerations for MV layout rooted in two key concerns: perception, which considers how users perceive view relationships, and content, which considers the relationships in the data. We show how these allow us to study and analyze the design of MV layout systematically.

Keywords: Visual Analytics, Layout Design, Multiple Views Tools: JavaScript, Node.js, Power BI

PDF

Figure: Performing an analysis task with SightBi

SightBi: Exploring Cross-View Data Relationships with Biclusters

Maoyuan Sun, Abdul Rahman Shaikh, Hamed Alhoori, and Jian Zhao; IEEE Transactions on Visualization and Computer Graphics, 2021 (Best Paper Honorable Mention)

Abstract: Multiple-view visualization (MV) has been heavily used in visual analysis tools for sensemaking of data in various domains (e.g., bioinformatics, cybersecurity and text analytics). One common task of visual analysis with multiple views is to relate data across different views. For example, to identify threats, an intelligence analyst needs to link people from a social network graph with locations on a crime-map, and then search for and read relevant documents. Currently, exploring cross-view data relationships heavily relies on view-coordination techniques (e.g., brushing and linking), which may require significant user effort on many trial-and-error attempts, such as repetitiously selecting elements in one view, and then observing and following elements highlighted in other views. To address this, we present SightBi, a visual analytics approach for supporting cross-view data relationship explorations. We discuss the design rationale of SightBi in detail, with identified user tasks regarding the use of cross-view data relationships. SightBi formalizes cross-view data relationships as biclusters, computes them from a dataset, and uses a bi-context design that highlights creating stand-alone relationship-views. This helps preserve existing views and offers an overview of cross-view data relationships to guide user exploration. Moreover, SightBi allows users to interactively manage the layout of multiple views by using newly created relationship-views. With a usage scenario, we demonstrate the usefulness of SightBi for sensemaking of cross-view data relationships.

Keywords: Visual Analysis, Data Visualization, Biclustering Tools: JavaScript, Node.js, D3, CSS

PDF

Quantifying the online long-term interest in research

Murtuza Shahzad, Hamed Alhoori, Reva Freedman, and Abdul Rahman Shaikh; Journal of Informetrics, 2022

Abstract: Research articles are being shared in increasing numbers on multiple online platforms. Although the scholarly impact of these articles has been widely studied, the online interest determined by how long the research articles are shared online remains unclear. Being cognizant of how long a research article is mentioned online could be valuable information to the researchers. In this paper, we analyzed multiple social media platforms on which users share and/or discuss scholarly articles. We built three clusters for papers, based on the number of yearly online mentions having publication dates ranging from the year 1920 to 2016. Using the online social media metrics for each of these three clusters, we built machine learning models to predict the long-term online interest in research articles. We addressed the prediction task with two different approaches: regression and classification. For the regression approach, the Multi-Layer Perceptron model performed best, and for the classification approach, the tree-based models performed better than other models. We found that old articles are most evident in the contexts of economics and industry (i.e., patents). In contrast, recently published articles are most evident in research platforms (i.e., Mendeley) followed by social media platforms (i.e., Twitter).

Keywords: Machine Learning, Scholarly Impact, Altmetrics, Regression analysis Tools: Python, R, Tableau

PDF

Figure: Youth in Policy Documents study architecture

Examining the Representation of Youth in the US Policy Documents through the Lens of Research

Miftahul Jannat Mokarrama, Abdul Rahman Shaikh, and Hamed Alhoori; IEEE BigData, 2024

Abstract: This study explores the representation of youth in US policy documents by analyzing how research on youth topics is cited within these policies. The research focuses on three key questions: identifying the frequently discussed topics in youth research that receive citations in policy documents, discerning patterns in youth research that contribute to higher citation rates in policy, and comparing the alignment between topics in youth research and those in citing policy documents. Through this analysis, the study aims to shed light on the relationship between academic research and policy formulation, highlighting areas where youth issues are effectively integrated into policy and contributing to the broader goal of enhancing youth engagement in societal decision-making processes.

Keywords: NLP, Topic Modeling, Policy Documents Tools: Python, R, Tableau

PDF

Figure: Results of classification models on patent-related data

Predicting patent citations to measure economic impact of scholarly research

Abdul Rahman Shaikh and Hamed Alhoori; ACM/IEEE Joint Conference on Digital Libraries, 2019

Abstract: A crucial goal of funding research and development has always been to advance economic development. On this basis, a considerable body of research has been published with the purpose of determining what exactly constitutes economic impact and how to accurately measure that impact. Numerous indicators have been used to measure economic impact, although no single indicator has been widely adopted. Based on patent data collected from Altmetrics, we predict patent citations through various social media features using several classification models. A patent citing a research paper implies that the paper has potential for direct application in its field. These predictions can be utilized by researchers in determining the practical applications for their work when applying for patents.

Keywords: Machine Learning, Economic Impact, Patent Citations Tools: Python, R, Tableau

PDF

Modeling the Broader Impact of Science and Health Using Social Media

Abdul Rahman Shaikh, Master's Thesis; Northern Illinois University, 2022

Abstract: Research and development have always initiated innovation and breakthroughs in technology. These technological advancements in recent years have provided a global medium for research to be disseminated through online platforms. These web-based platforms and the interactions that take place on them affect the dissemination, impact, and perception of online information. This thesis investigates the broader impact of science and health using social media posts, online patents, videos, and images by building machine learning and topic models. First, this study predicts patent citations to scientific research and identifies important factors essential to economic impact. We found that the citation of research in patents is a strong indicator of economic impact and strengthens the popularity of scholarly research. Second, we studied video communication of scholarly research and found that it has been increasing and there is a lack of studies in this area. Therefore, this study bridges the gap between scientific videos and research by building models to predict videos’ scholarly and societal impact. Finally, this study aims to understand the impact of health-related topics on the public. Instagram images with textual features express different views on topics from users’ perspectives worldwide. We built topic models on the posts related to health and COVID-19 to analyze users' perceptions across different locations. The thesis identifies factors essential in recognizing the broader influence of science and health. Based on the results, we will have a better understanding of the economic and societal impact of science and the public understanding of health.

Keywords: Machine Learning, Computer Vision, Societal Impact, Altmetric Tools: Python, TensorFlow

PDF

Projects

LLMFlow: Summarization of Scholarly Documents

LLMFlow integrates Llama 3.2 (via Ollama), DeepSeek, and GPT with LangChain to generate concise summaries of large-scale research documents, reducing reading time and effort for academics. The app also supports Q&A about the research paper, improving efficiency in reviewing scholarly literature.

Python LangChain React FastAPI

ChatFit: Personalized Fitness Chatbot

Built a conversational fitness chatbot powered by LLMs that provides personalized workout and diet recommendations based on user inputs. The chatbot adapts to user goals, fitness level, and dietary preferences, offering a dynamic, interactive experience.

Python OpenAI GPT Streamlit Flask

MixArt: Generative Artwork with Stable Diffusion

Developed a pipeline for generating abstract and photorealistic art using Stable Diffusion, leveraging LoRA fine-tuning to personalize outputs for user-requested styles. The system enables artists and designers to create unique, customized visual content efficiently.

Python Stable Diffusion LoRA

Pexos: Safe Python Execution Sandbox

Developed a secure Python execution service using nsjail and Docker to sandbox untrusted code. Enabled safe REST-based script execution with resource limits, system call restrictions, and error handling to prevent abuse or crashes.

Python Flask nsjail Docker

Rufus: Intelligent Web Data Extraction for LLMs

Developed an AI-powered web crawler that extracts and synthesizes relevant content into structured documents for Retrieval-Augmented Generation (RAG) systems. Rufus intelligently navigates complex websites based on user-defined prompts, supports asynchronous crawling, and outputs data in JSON, text, or CSV formats.

Python Asyncio Selenium RAG

VoxCore: Voice Authentication System

Implemented a voice authentication system using OpenAI's Whisper for transcription and custom PyTorch models for speaker verification, achieving 89% accuracy in multi-speaker environments. This system enhances security in voice-driven applications.

Python Whisper PyTorch

InstaHealth: Fine-Tuning Caption Generation for Instagram Posts

Fine-tuned a language model to generate context-aware captions for health-related Instagram posts, automating large-scale social media captioning. This tool optimizes social media engagement for health and wellness organizations.

Python CV

SightBi: Interactive Cross-View Data Relationships with Biclusters

A visual analytics approach for supporting cross-view data relationship explorations. SightBi formalizes cross-view data relationships as biclusters, computes them from a dataset, and uses a bi-context design that highlights creating stand-alone relationship-views.

JavaScript D3 Node.js HTML CSS

YouTube and Science: Models for Research Impact

The research explores the growing intersection of YouTube videos and academic research citations, highlighting increased video references to scholarly articles. To analyze this trend, new datasets were created, and machine-learning models were developed to predict citation patterns and the influence of video views on the popularity and public engagement of research articles.

Python SQL Tableau MATLAB

LineGuider: Exploring Cross-View Data Relationships

LineGuider is an interactive tool that enables users to seamlessly explore and trace visual relationships between multiple views or data visualizations. By linking data points across different visualizations, the tool enhances analytical accuracy, aids in pattern recognition, and simplifies complex decision-making processes.

JavaScript CSS3 HTML5 D3

iFoc: Interactive Tracing of Cross-View Data Relationships

iFoc is an advanced tool designed to empower users with an interactive approach to analyzing cross-view data relationships. By providing a clear and intuitive way to trace connections between data points across various views, iFoc increases the accuracy and confidence level of analytical tasks, making it invaluable for researchers and data analysts.

JavaScript CSS3 HTML5 D3

Get In Touch

Hello, World! I’m Abdul - AI Enthusiast & Code Whisperer

Email

GitHub

LinkedIn