Building an ML App with SwiftUI (Part 1)

Published at 08:22 PM

Origin

Recently, due to a company project, I revisited Machine Learning, more specifically deep learning. I started learning from PyTorch, moved on to selecting and training a model, and then ported it to iOS to create a simple demo. Throughout this process, I went through many tutorials and noticed a few issues along the way, which is part of why I'm writing this series.

Background Introduction

This series will use YOLO, an object detection model, as an example. YOLO’s primary function is object detection—it identifies objects in an image (e.g., apples, bananas, people) and marks them with bounding boxes. The effect looks something like this:
YOLO Example

Apple’s official site provides YOLOv3, a compact and practical model. There are also YOLOv4, YOLOv5, YOLOv7, YOLOv8, and even YOLOv9. Amusingly, there is even a YOLO9000, the name the original author gave the YOLOv2 paper, outdoing version-number fanatics like Node.js, Chrome, and Firefox.

On a side note, YOLO underwent a major shift starting with v5. Earlier versions were built on the C-based Darknet framework, but from v5 onward the project moved to Python (PyTorch), adding many features and greatly improving usability. It was also developed in a separate repository by a different team, which sparked a lot of debate when people compared it to earlier versions; check out this post for an interesting read.

Anyway, this article will ultimately use a custom dataset to train an ML model that identifies furniture types. I’ll sprinkle ML-related details throughout without diving too deep, so don’t worry too much!

Data Preparation

Data Download

Roboflow offers public datasets for training or academic research, available here. For this project, we’ll use the pre-annotated furniture dataset. Click the button to download it, making sure to select the YOLOv8 format zip file.
Dataset Download

A quick note about Roboflow: you can upload unannotated images, manually label them on their platform, and export the dataset in a standard format for model training. It’s a great experience and free to use. Check out the Preparing a custom dataset for YOLOv8 section here for the workflow.
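If you'd rather script the download than click through the site, Roboflow also offers a Python SDK. Here's a minimal sketch based on its documented usage; the API key is a placeholder you'd get from your own Roboflow account, while the workspace, project, and version values come from the data.yaml shown below:

from roboflow import Roboflow

# "YOUR_API_KEY" is a placeholder; get a real key from your Roboflow account settings
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("roboflow-100").project("furniture-ngpea")
# Version 2 in the "yolov8" export format, matching the manual download above
dataset = project.version(2).download("yolov8")
print(dataset.location)  # local folder containing train/valid/test and data.yaml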

Data Explanation

After unzipping the file, let’s take a look at the folder contents:
Dataset Structure

Inside, you’ll find a data.yaml file that describes the dataset. Here’s its content:

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 3
names: ["Chair", "Sofa", "Table"]

roboflow:
  workspace: roboflow-100
  project: furniture-ngpea
  version: 2
  license: CC BY 4.0
  url: https://universe.roboflow.com/roboflow-100/furniture-ngpea/dataset/2

Here’s a brief explanation of the fields above:

- train / val / test: relative paths to the image folders for the training, validation, and test splits.
- nc: the number of classes in the dataset (3 here).
- names: the class names, in the order that maps to the numeric class indices used in the label files.
- roboflow: metadata about the dataset's origin on Roboflow, including its license and URL.

As for the other subfolders: test, val, and train each contain two folders, an images folder holding the corresponding pictures and a labels folder holding the annotations. Let's look at one. The image Chairs--1-_jpg.rf.7104107727daae1f8c000a66cf0dd7b1.jpg has a corresponding label file Chairs--1-_jpg.rf.7104107727daae1f8c000a66cf0dd7b1.txt with the following content:

0 0.49666666666666665 0.48333333333333334 0.46 0.9383333333333334

Each line describes one bounding box: the first number is the class index (0 = Chair, per the names list above), followed by the box's center x, center y, width, and height, all normalized to the 0–1 range relative to the image dimensions.
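To make the format concrete, here's a small sketch that converts one such normalized label line back to pixel coordinates. The 640×640 image size is an assumption for illustration; your actual images may have other dimensions:

# A minimal sketch: convert a normalized YOLO label line to pixel coordinates
def yolo_to_pixels(line, img_w, img_h):
    cls, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # YOLO stores the box center; convert to the top-left corner for drawing
    x0, y0 = cx - w / 2, cy - h / 2
    return int(cls), x0, y0, w, h

label = "0 0.49666666666666665 0.48333333333333334 0.46 0.9383333333333334"
print(yolo_to_pixels(label, img_w=640, img_h=640))  # class 0 = Chair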

Training

With the data preparation above, we’ve set up the data.yaml file and prepared the data. Now, let’s start training our own model.

Environment

Before training, let's set up the environment.

First, install the required Python libraries:

pip install -U ultralytics
pip install -U torch
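As a quick sanity check (my suggestion, not part of the original setup), you can verify both libraries import cleanly and see which compute devices PyTorch can reach:

import torch
import ultralytics

print(ultralytics.__version__)
print(torch.__version__)
print(torch.cuda.is_available())          # True on an NVIDIA GPU setup
print(torch.backends.mps.is_available())  # True on an Apple Silicon Mac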

How to Train

Here’s the code to train the model:

from ultralytics import YOLO
import torch

# If you’re using an NVIDIA GPU, this check picks CUDA when available
device = "cuda" if torch.cuda.is_available() else "cpu"

# If you’re using an M-series chip MacBook, use this code. However, on my own computer, I found that MPS didn’t seem to work—it was slower than CPU.
# device = "mps" if torch.backends.mps.is_available() else "cpu"

# Load the pre-trained model
model = YOLO('yolov8n.pt')
# Train the model with custom data
results = model.train(data='/Users/danieljia/datasets/data.yaml', epochs=50, batch=16, imgsz=640, device=device)

Let’s break down the arguments to model.train():

- data: the path to the data.yaml file describing the dataset.
- epochs: how many full passes over the training data to run (50 here).
- batch: how many images are processed per training step (16 here).
- imgsz: the size images are resized to for training (640 pixels).
- device: which hardware to train on ("cuda", "mps", or "cpu").
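Ultralytics also ships a command-line interface that wraps the same training call. An equivalent one-liner, assuming the same dataset path as the Python example above:

yolo detect train data=/Users/danieljia/datasets/data.yaml model=yolov8n.pt epochs=50 batch=16 imgsz=640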

Additionally, training generates validation metrics, which are saved along with the model file (by default under a runs/detect/train directory; the exact path is printed in the logs). I’ll write a separate post about these metrics once I’ve organized them.

Validation

With a GPU, training should take about 20 minutes. Once done, the best weights are saved as a .pt file, and the save path is shown in the training logs; let's assume it's saved to /Users/danieljia/yolo/best.pt.
Now, let’s validate it with an image.

from ultralytics import YOLO

# Load the trained model
model = YOLO('/Users/danieljia/yolo/best.pt')
# Run inference on a sample image, then display the annotated result
results = model(['im1.jpg'])
results[0].show()
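Beyond the pop-up visualization, you can also read the predictions programmatically through the boxes accessor on each result. A short sketch; the actual values will depend on your image:

# Each result exposes the detected boxes as tensors
boxes = results[0].boxes
print(boxes.cls)   # class indices, e.g. 0 = Chair per data.yaml
print(boxes.conf)  # confidence score for each detection
print(boxes.xyxy)  # bounding boxes as (x1, y1, x2, y2) pixel coordinates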

Summary

This article walked through training an object detection model using custom image data, explaining the steps and terminology. The next article will cover how to integrate it into an iOS SwiftUI project—stay tuned!

Also, if anything in the article isn’t clear or seems off, please leave feedback in the comments.

References

If you’re interested in PyTorch, I strongly recommend checking out Zero to Mastery’s 00 and 01 chapters. They’re easy to understand, include exercises (which I suggest doing), and explain ML concepts practically from an engineering perspective. They also clearly outline the workflow of machine learning with code examples. I recommend taking a look and, if you can, going through all the chapters and practicing hands-on!
