Concept & Achievement

Scientific reproducibility is a major issue: published code often fails to run outside its original environment. AutoAgent is an experimental framework that uses Large Language Models (LLMs) to "read" a GitHub repository, identify its entry points, configure the environment (Docker/Conda), and execute the code automatically.

🏆 Awarded 2nd Place at the University Hackathon. The project was subsequently selected by the university for continued development.

⚠️ Status Update: The live instance is currently under development. The project previously relied on a University-provided API which has since been revoked. We are working on a new deployment.

Demo Run

Pipeline Architecture

The autonomous pipeline follows a strict 5-step process to ensure reliable execution:

  1. Link Extraction: Identifies potential GitHub repositories from the input (PDF/Text).
  2. Validation & Cloning: Verifies the repository's validity and clones it into a sandboxed environment.
  3. Deep Analysis: The "Scanner Agent" traverses the file tree to understand the codebase structure.
  4. Demo Generation: The "Architect Agent" synthesizes a demo script based on found entry points.
  5. Execution: The "Executor Agent" runs the demo and captures the output (plots, logs).
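The five steps above can be sketched as a thin orchestrator with pluggable agents. This is an illustrative outline only, not the project's actual code; the function and agent names are assumptions.

```python
import re

# Step 1 is deterministic, so a regex suffices; steps 2-5 are delegated
# to agent callables supplied by the caller (hypothetical interface).
GITHUB_URL = re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+")

def extract_links(text: str) -> list[str]:
    """Step 1: pull candidate GitHub repository URLs from PDF/text input."""
    return GITHUB_URL.findall(text)

def run_pipeline(text: str, agents: dict) -> dict:
    """Drive steps 2-5 for every repository found in the input."""
    results = {}
    for url in extract_links(text):
        repo = agents["clone"](url)             # Step 2: validate & clone into a sandbox
        analysis = agents["scan"](repo)         # Step 3: Scanner Agent reads the file tree
        demo = agents["architect"](analysis)    # Step 4: Architect Agent writes a demo script
        results[url] = agents["execute"](demo)  # Step 5: Executor Agent runs it, captures output
    return results
```

Keeping the agents behind a plain dict of callables makes each step independently swappable, which is convenient when benchmarking different LLM backends.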

Engineering Strategy

We rigorously benchmarked three models: Groq (too many hallucinations), Gemini 2.5 (good overall, but struggled with the validation step), and GPT-4o, which was selected for its superior adherence to complex instructions.

To maximize reliability (Average Score: 7/10), we adopted a Hybrid Strategy:

  • Heuristics: Used for deterministic tasks like language detection and file finding.
  • LLMs: Deployed only for complex reasoning tasks (e.g., generating config files or summarizing logic) when heuristics fail.
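A minimal sketch of this heuristics-first dispatch, assuming a majority-vote language detector; the names and extension table are illustrative, not the project's real implementation.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical extension table for the deterministic path.
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".cpp": "C++", ".java": "Java"}

def detect_language(files: list[str]) -> Optional[str]:
    """Heuristic: majority vote over file extensions; None signals failure."""
    counts: dict[str, int] = {}
    for f in files:
        lang = EXT_TO_LANG.get(Path(f).suffix)
        if lang:
            counts[lang] = counts.get(lang, 0) + 1
    return max(counts, key=counts.get) if counts else None

def resolve(task_input, heuristic: Callable, llm_fallback: Callable):
    """Try the cheap deterministic path first; escalate to the LLM only on failure."""
    result = heuristic(task_input)
    return result if result is not None else llm_fallback(task_input)
```

The escalation rule is the key design choice: the LLM is only paid for (and only able to hallucinate on) the cases the heuristics cannot decide.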

Evaluation System

We developed a custom scoring algorithm (0-10) to grade each run:

  • Binary Checks (+5 pts): syntax validity, exit codes, execution time (< 70 s average), and clean error streams.
  • LLM Judge (+5 pts): a separate agent evaluates the output for Clarity, Completeness, and Relevance.
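The two halves of the score can be sketched as follows. The equal 1.25-point weight per binary check is an assumption (the source only states the +5/+5 split), and the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    """Captured facts about one demo execution (hypothetical fields)."""
    syntax_ok: bool
    exit_code: int
    runtime_s: float
    stderr_clean: bool

def binary_score(r: RunResult) -> float:
    """Deterministic half: four pass/fail checks, assumed +1.25 each (max +5)."""
    checks = [r.syntax_ok, r.exit_code == 0, r.runtime_s < 70.0, r.stderr_clean]
    return 1.25 * sum(checks)

def total_score(r: RunResult, judge_score: float) -> float:
    """Combine binary checks with the LLM judge's 0-5 rating, clamped to range."""
    return binary_score(r) + min(max(judge_score, 0.0), 5.0)
```

Splitting the score this way keeps half of the grade fully reproducible, so judge drift can only move a run by at most five points.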