Concept & Achievement
Scientific reproducibility is a major issue: published research code often fails to run outside its authors' environment. AutoAgent is an experimental framework that uses Large Language Models (LLMs) to "read" a GitHub repository, identify its entry points, configure the environment (Docker/Conda), and execute the code automatically.
🏆 Awarded 2nd Place at the University Hackathon. The project was subsequently selected by the university for continued development.
⚠️ Status Update: The live instance is currently under development. The project previously relied on a University-provided API which has since been revoked. We are working on a new deployment.
Demo Run
Pipeline Architecture
The autonomous pipeline follows a strict 5-step process to ensure reliable execution:
- Link Extraction: Identifies potential GitHub repositories from the input (PDF/Text).
- Validation & Cloning: Verifies the repository's validity and clones it into a sandboxed environment.
- Deep Analysis: The "Scanner Agent" traverses the file tree to understand the codebase structure.
- Demo Generation: The "Architect Agent" synthesizes a demo script based on found entry points.
- Execution: The "Executor Agent" runs the demo and captures the output (plots, logs).
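The five stages above can be sketched as a simple orchestration loop. This is a minimal illustration with hypothetical function names; the real Scanner, Architect, and Executor agents are LLM-backed and run inside a sandbox, so they are elided here as comments:

```python
import re

def extract_links(text: str) -> list[str]:
    """Step 1: pull candidate GitHub repository URLs out of raw PDF/plain text."""
    return re.findall(r"https?://github\.com/[\w-]+/[\w.-]+", text)

def validate_and_clone(url: str, sandbox_dir: str) -> bool:
    """Step 2 (stub): a real implementation would also HEAD-check the URL
    and `git clone` it into an isolated directory under sandbox_dir."""
    return bool(re.fullmatch(r"https://github\.com/[\w-]+/[\w.-]+", url))

def run_pipeline(input_text: str, sandbox_dir: str = "/tmp/sandbox") -> list[str]:
    """Chain the stages; steps 3-5 require LLM calls and are sketched as comments."""
    accepted = []
    for url in extract_links(input_text):
        if validate_and_clone(url, sandbox_dir):
            accepted.append(url)
            # Step 3, Scanner Agent: traverse the file tree, summarize structure
            # Step 4, Architect Agent: synthesize a demo script from entry points
            # Step 5, Executor Agent: run the demo, capture plots and logs
    return accepted
```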
Engineering Strategy
We rigorously benchmarked three options: Groq-hosted models (frequent hallucinations), Gemini 2.5 (good overall, but struggled with the validation step), and GPT-4o. GPT-4o was selected for its superior adherence to complex instructions.
To maximize reliability (Average Score: 7/10), we adopted a Hybrid Strategy:
- Heuristics: Used for deterministic tasks like language detection and file finding.
- LLMs: Deployed only for complex reasoning tasks (e.g., generating config files or summarizing logic) when heuristics fail.
Evaluation System
We developed a custom scoring algorithm (0-10) to grade each run:
- Binary Checks (+5 pts): syntax validity, exit codes, execution time (< 70 s average), and empty error streams.
- LLM Judge (+5pts): A separate agent evaluates the output for Clarity, Completeness, and Relevance.
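A minimal sketch of such a scorer, assuming an even split of the 5 binary points across the four checks (the README specifies only the 5/5 division, so the per-check weights here are illustrative):

```python
def score_run(syntax_ok: bool, exit_code: int, runtime_s: float,
              stderr_empty: bool, llm_judge_score: float) -> float:
    """Hypothetical 0-10 score: binary checks contribute up to 5 points,
    and a separate LLM judge contributes the remaining 5."""
    score = 0.0
    score += 1.25 if syntax_ok else 0.0          # generated script parses
    score += 1.25 if exit_code == 0 else 0.0     # process exited cleanly
    score += 1.25 if runtime_s < 70.0 else 0.0   # finished within the time budget
    score += 1.25 if stderr_empty else 0.0       # no error-stream output
    score += max(0.0, min(llm_judge_score, 5.0)) # clarity/completeness/relevance
    return score
```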