Automating the Data Science Analysis Process with AI Using Fundamental Components of Data Science

February 2024 • Capstone

Abstract

This data science capstone explored a prototype agentic architecture built through structured multi-step prompt orchestration, at a time when autonomous AI agents were still largely experimental. Rather than replacing the data scientist, the objective was to examine whether foundational analytical processes could be abstracted, generalized, and executed autonomously across diverse datasets with minimal human intervention. The central premise was that the structural logic underlying data science workflows could be learned, operationalized, and dynamically sequenced toward a defined analytical goal.

The system was grounded in the observation that many data science tasks (particularly exploratory data analysis, feature inspection, and preliminary model selection) follow recurring structural patterns across domains. By formalizing these patterns, the architecture was designed to evaluate incoming datasets, infer relevant statistical properties, detect trends and anomalies, and conditionally determine appropriate analytical techniques without explicit task-by-task scripting. Emphasis was placed on exploratory data analysis as the entry point for automation, leveraging statistical reasoning and machine learning methods to surface insights that would traditionally require iterative manual exploration.
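
As a rough illustration of the pattern described above (profile a dataset, flag anomalies, then conditionally choose next analytical steps), the following is a minimal sketch using only the Python standard library. It is not the project's implementation: the function names `profile_column`, `suggest_steps`, and `plan_eda`, the 1.5 × IQR outlier rule, and the wording of the suggested steps are all assumptions chosen for illustration.

```python
import statistics
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Infer basic properties of one column (hypothetical helper)."""
    present = [v for v in values if v is not None]
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    profile: dict[str, Any] = {
        "missing": len(values) - len(present),
        "kind": "numeric" if is_numeric else "categorical",
        "distinct": len(set(present)),
        "outliers": [],
    }
    if is_numeric and len(present) >= 4:
        # Assumed anomaly rule: values beyond 1.5 * IQR from the quartiles.
        q1, _, q3 = statistics.quantiles(present, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        profile["outliers"] = [v for v in present if v < low or v > high]
    return profile


def suggest_steps(profile: dict[str, Any]) -> list[str]:
    """Conditionally pick follow-up analyses from a column profile."""
    steps: list[str] = []
    if profile["missing"]:
        steps.append("review missing values")
    if profile["kind"] == "numeric":
        steps.append("histogram and summary statistics")
        if profile["outliers"]:
            steps.append("inspect outliers")
    else:
        steps.append("frequency table")
    return steps


def plan_eda(dataset: dict[str, list[Any]]) -> dict[str, list[str]]:
    """Build a per-column exploratory plan without task-by-task scripting."""
    return {name: suggest_steps(profile_column(col)) for name, col in dataset.items()}


if __name__ == "__main__":
    example = {
        "age": [21, 22, 23, 24, 25, 26, 27, 28, 300],
        "city": ["A", "B", None, "A"],
    }
    for column, steps in plan_eda(example).items():
        print(column, "->", steps)
```

In a sketch like this, the rule-based `suggest_steps` stands in for the decision that an orchestrated prompt sequence would make; a human reviewer would still interpret the resulting plan and its findings.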

Ultimately, the project demonstrates how early agent-inspired systems could augment data science workflows by reducing repetitive early-stage tasks, while reinforcing the continued importance of human oversight, interpretation, and strategic judgment.
