Project Overview

Note on Project Description

The initial project description submitted to Iowa State University by our client (see the approved projects list) stated: "This project will break up an existing U-Net model into code segments that can then be pipelined. The result will be slightly higher latency but also higher throughput of the algorithm." This description is the client's original proposal, reproduced here without editing or interpretation by our team. Our actual research findings, detailed below, differ significantly from this initial description because of the realities of the target hardware and the restrictions imposed by the client.

Our project, Machine Learning: Semantic Segmentation Optimization, focuses on optimizing semantic segmentation algorithms for eye tracking in assistive technology applications. We aim to improve the performance of these algorithms for individuals with mobility disabilities, particularly those with conditions such as cerebral palsy.

Our original approach was to pipeline the U-Net neural network into four equal segments that run concurrently. However, through rigorous testing and output validation, we discovered that the Vitis AI model compiler incorrectly scaled the last two split segments by a factor of two, so their input tensors had to be rescaled to produce correct output. Our performance analysis showed that the single model is 9.20x faster than the four-segment split model, demonstrating that the DPU architecture does not benefit from model splitting in this use case: the main processing bottleneck is the single-model inference constraint imposed by the DPU synthesized on the FPGA fabric. The client also stipulated that neither the fabric nor the model could be changed.
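
To check the split segments' outputs against the original model, the segments can be chained with a compensating rescale on the inputs of the affected segments. The sketch below is a simplified illustration only, assuming the segments are exported as ONNX files and run host-side with ONNX Runtime; the file names, the 0.5 factor (rather than 2.0), and the single-input/single-output chaining (which ignores U-Net skip connections) are assumptions for clarity, not the team's production code.

    # Hypothetical host-side validation: chain the four U-Net segments and
    # apply a compensating scale to the inputs of the segments the compiler
    # scaled by 2x. File names and the 0.5 factor are illustrative only.
    import numpy as np
    import onnxruntime as ort

    SEGMENTS = ["unet_seg1.onnx", "unet_seg2.onnx", "unet_seg3.onnx", "unet_seg4.onnx"]
    INPUT_SCALE = {2: 0.5, 3: 0.5}   # compensate the last two segments (indices 2 and 3)

    sessions = [ort.InferenceSession(p) for p in SEGMENTS]

    def run_split_model(frame: np.ndarray) -> np.ndarray:
        """Feed one frame through the four segments sequentially."""
        x = frame.astype(np.float32)
        for i, sess in enumerate(sessions):
            x = x * INPUT_SCALE.get(i, 1.0)             # rescale where needed
            (x,) = sess.run(None, {sess.get_inputs()[0].name: x})
        return x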

Key contributions from our research include:

  • Comprehensive analysis of U-Net model splitting for DPU-based inference
  • Discovery of Vitis AI compiler scaling issues with split model segments
  • Performance benchmarking methodology for split vs. single model inference (a timing sketch appears below)
  • Optimization for the AMD Kria KV260 development board

Technologies and tools used:

  • ONNX model decomposition and analysis (a decomposition sketch appears below)
  • Proprietary machine learning models
  • AMD development tools: PetaLinux Tools, Vitis AI, Vitis AI Model Optimizer
  • Docker development environment
  • AMD Kria KV260 development board
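
The 9.20x figure above comes from timing both configurations over the same set of frames. The harness below is a minimal sketch of that methodology, not the team's actual benchmark code: the frame shape is illustrative, and run_single_model / run_split_model stand in for whichever inference entry points are used (for example, the chaining function sketched earlier).

    # Minimal latency harness: time an inference callable over many frames,
    # excluding a few warm-up runs, and report the mean per-frame latency.
    import time
    import statistics
    import numpy as np

    def benchmark(infer, frames, warmup=5):
        """Return mean per-frame latency in milliseconds."""
        for f in frames[:warmup]:                 # warm-up runs, not timed
            infer(f)
        samples = []
        for f in frames:
            start = time.perf_counter()
            infer(f)
            samples.append((time.perf_counter() - start) * 1000.0)
        return statistics.mean(samples)

    # Illustrative input shape; the real eye-tracking frames may differ.
    frames = [np.random.rand(1, 1, 256, 256).astype(np.float32) for _ in range(100)]

    # Example usage, with run_single_model / run_split_model as the two entry points:
    # single_ms = benchmark(run_single_model, frames)
    # split_ms  = benchmark(run_split_model, frames)
    # print(f"split/single latency ratio: {split_ms / single_ms:.2f}x")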
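
For the ONNX decomposition itself, the onnx package can cut an exported graph at named intermediate tensors. The sketch below assumes an exported file unet.onnx and placeholder cut-tensor names; a real U-Net split must also carry the skip-connection tensors as extra inputs and outputs of the later segments, which is omitted here for brevity.

    # Hypothetical decomposition of an exported U-Net into four segments by
    # cutting the graph at three intermediate tensors. Tensor names are
    # placeholders; inspect the real graph to choose actual cut points.
    import onnx
    from onnx.utils import extract_model

    model_path = "unet.onnx"
    model = onnx.load(model_path)
    print(onnx.helper.printable_graph(model.graph))    # list candidate cut tensors

    cuts = ["enc2_out", "bottleneck_out", "dec2_out"]  # placeholder tensor names
    segment_inputs = [["input"]] + [[c] for c in cuts]
    segment_outputs = [[c] for c in cuts] + [["output"]]
    for i, (ins, outs) in enumerate(zip(segment_inputs, segment_outputs), start=1):
        extract_model(model_path, f"unet_seg{i}.onnx", ins, outs)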

Project Information

Team Number

sddec25-01

Client

JR Spidell

Faculty Advisor

Dr. Namrata Vaswani

Team Members

Tyler Schaefer

ML Algorithm Analyst

Specializes in algorithm optimization and mathematical validation for ML models. Focuses on maintaining accuracy during the optimization and model splitting process.

Conner Ohnesorge

ML Integration HWE

Specializes in hardware optimization for ML models with experience in FPGA implementation. Also serves as the development environment manager.

Aidan Perry

Multithreaded Program Developer

Leads threading implementation and synchronization for parallel processing. Experienced in real-time systems and FPGA programming.

Joey Metzen

Kria Board Manager

Responsible for hardware management and memory optimization. Leads testing and benchmarking efforts for the system.

Project Timeline

Project Initialization

Jan 2025 - Feb 2025

  • Team formation
  • Problem definition
  • Initial research

Design Phase

Mar 2025 - Apr 2025

  • Architecture planning
  • Algorithm selection
  • Hardware requirements analysis

Implementation

May 2025 - Aug 2025

  • Mathematical division of U-Net
  • Thread management system
  • Memory allocation strategy

Testing & Validation

Sep 2025 - Oct 2025

  • Comprehensive testing
  • Performance benchmarking
  • Accuracy validation

Final Delivery

Nov 2025 - Dec 2025

  • System refinement
  • Documentation completion
  • Final presentation & handover

Project Documentation

Project documents are available as PDF downloads.