
How to Reduce the Size of Local AI Models on iOS

Published at 08:22 PM

Preface

More and more iOS apps now ship with on-device AI models to provide features to users, which is great for privacy. But some of these models are fairly large (100 MB or more), which slows down model initialization and adds extra data movement while processing a task, further hurting processing speed. I recently tried a few ways to optimize this, and in this post I'll try to explain it in plain, simple language.

Example

As an example, we'll use IsNet, a mask-extraction model commonly used in the background-removal scenario, and walk through how to optimize it step by step, with evaluation metrics to judge how good each model is. We need to download two things: the IsNet Core ML model, which can be downloaded here, and a test image dataset; here I use the test data from this project (the original images plus their mask data).

How to compress the model

All of the compression methods are provided by Apple's official coremltools library. The usual options are palettization, quantization, and pruning. The code below shows how to try each of them (assuming the original model is in the mlpackage format).

import coremltools as ct
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OpMagnitudePrunerConfig,
    OpLinearQuantizerConfig,
    OptimizationConfig,

    prune_weights,
    palettize_weights,
    linear_quantize_weights,
)

mlmodel = ct.models.MLModel("a.mlpackage")

# Pruning: zero out the smallest weights.
# You can change target_sparsity to 0.10, 0.25, 0.50, 0.75...
op_config = OpMagnitudePrunerConfig(target_sparsity=0.10)
config = OptimizationConfig(global_config=op_config)
compressed_mlmodel = prune_weights(mlmodel, config=config)

# Palettization: cluster weights into a lookup table.
# You can change nbits to 8, 6, 4, 2.
# op_config = OpPalettizerConfig(nbits=8)
# config = OptimizationConfig(global_config=op_config)
# compressed_mlmodel = palettize_weights(mlmodel, config=config)

# Linear quantization: map weights to 8-bit integers.
# op_config = OpLinearQuantizerConfig()
# config = OptimizationConfig(global_config=op_config)
# compressed_mlmodel = linear_quantize_weights(mlmodel, config=config)

compressed_mlmodel.save("compress.mlpackage")

However, because the exported IsNet model is currently in the mlmodel format rather than mlpackage, only the quantize_weights approach is supported; see here for details.

Note: the larger nbits is, the larger the model (each weight is stored with nbits bits instead of a 32-bit float, so 8-bit weights are roughly 1/4 of the original size and 4-bit roughly 1/8), so for now we'll only try 8 and 4.

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load the original full-precision model.
mlmodel = ct.models.MLModel("ISNet_1024_1024.mlmodel")

# Quantize the weights; smaller nbits (8, 6, 4, 2) means a smaller model.
compressed_mlmodel = quantization_utils.quantize_weights(mlmodel, nbits=8)
compressed_mlmodel.save("ISNet_1024_1024_8.mlmodel")

compressed_mlmodel = quantization_utils.quantize_weights(mlmodel, nbits=4)
compressed_mlmodel.save("ISNet_1024_1024_4.mlmodel")

Model analysis

After the compression above, the original model is 176 MB, ISNet_1024_1024_8 is 44 MB, and ISNet_1024_1024_4 is 22 MB, a big reduction from before. But what about accuracy? To answer that we need a loss function to evaluate how well each compressed model performs. The idea is as follows:
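Each model produces masks for the same set of test images; those masks are compared pixel by pixel against the ground-truth masks, and the differences are averaged. In my own notation (not from the original post), with K test images, predicted masks M_i, ground-truth masks G_i, and N_i pixels in image i:

\text{loss} = \frac{\sum_{i=1}^{K} \sum_{p} \left| M_i(p) - G_i(p) \right|}{\sum_{i=1}^{K} N_i}

The result is the mean absolute error per pixel on a 0-255 grayscale; lower is better.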

Below are the loss function and the code that produces the three loss values, one each for the original, 8-bit, and 4-bit models.

import coremltools as ct
import os
import numpy as np
import PIL



# process image and generate masks in output_folder
def processImage(imagePath, model, output_folder):
    input_width = 1024
    input_height = 1024
    img = PIL.Image.open(imagePath)
    ori_size = img.size

    img = img.resize((input_width, input_height), PIL.Image.Resampling.LANCZOS)
    out_dict = model.predict({'x_1': img})
    extension = imagePath.split("/")[-1].split(".")[-1]
    result_full_path = output_folder + "/" + imagePath.split("/")[-1].split(".")[0] + "." + extension
    file_format = "JPEG" if (extension == "jpg" or extension == "jpeg") else "PNG"
    out_dict["activation_out"].resize(ori_size).save(result_full_path, format=file_format)


# process batch images in folder_path
def processBatchImage(folder_path, model, output_folder):
    image_extensions = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff']
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if os.path.isfile(file_path) and os.path.splitext(filename)[1].lower() in image_extensions:
            print(f'Processing file: {file_path}')
            processImage(file_path, model, output_folder)


# load all mask images in a folder as numpy arrays
def getImageArray(folder_path):
    result = []
    image_extensions = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff']
    for filename in sorted(os.listdir(folder_path)):
        file_path = os.path.join(folder_path, filename)
        if os.path.isfile(file_path) and os.path.splitext(filename)[1].lower() in image_extensions:
            image = PIL.Image.open(file_path).convert('L')
            result.append(np.array(image))
    return result

# calculate the difference between the ground-truth masks and a model's mask results
def loss_function(folder1, folder2):
    ground_image_array = getImageArray(folder1)
    output_image_array = getImageArray(folder2)

    total_loss = 0
    total_pixels = 0

    for model_output, ground_truth in zip(output_image_array, ground_image_array):
        # cast to int before subtracting to avoid uint8 wrap-around
        diff = np.abs(model_output.astype(np.int32) - ground_truth.astype(np.int32))
        image_loss = np.sum(diff)
        pixelSize = ground_truth.size

        total_loss += image_loss
        total_pixels += pixelSize
    # normalize the total loss by the total number of pixels across all images
    normalized_loss = total_loss / total_pixels
    return normalized_loss


# load the models at different compression levels
isnet_model = ct.models.MLModel("ISNet_1024_1024.mlmodel")
isnet_model_4 = ct.models.MLModel("ISNet_1024_1024_4.mlmodel")
isnet_model_8 = ct.models.MLModel("ISNet_1024_1024_8.mlmodel")

# generate mask images by different models
processBatchImage("./datasets/original_test", isnet_model, "./datasets/isnet_mask")
processBatchImage("./datasets/original_test", isnet_model_8, "./datasets/isnet_mask_8")
processBatchImage("./datasets/original_test", isnet_model_4, "./datasets/isnet_mask_4")

# calculate the loss values
# original_model_loss: 18.37, size: 176M
# model_4_loss 14.86, size: 22M
# model_8_loss 18.29, size: 44M
original_model_loss = loss_function("./datasets/original_mask", "./datasets/isnet_mask")
model_4_loss = loss_function("./datasets/original_mask", "./datasets/isnet_mask_4")
model_8_loss = loss_function("./datasets/original_mask", "./datasets/isnet_mask_8")

Based on size and these loss values, we ultimately chose the compressed model isnet_model_4. I also looked through the result folder by hand, and the masks look acceptable.

Running the model

According to Apple, from iOS 16 and iOS 17 onward the system caches model loads. If the app loads the model once in the background shortly after launch, essentially a warm-up, then when the model actually needs to run, the system checks whether it has been loaded before and serves it from the cache, which makes processing much faster. In my tests, loading the ~170 MB model above took around 7-8 seconds. The cache can also be lost: when the phone overheats, the app restarts, or the phone reboots, the cache is cleared. So we need to design a mechanism that warms the model up ahead of time, as sketched below.
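A minimal warm-up sketch in Swift, assuming the app bundles the compiled 4-bit model as ISNet_1024_1024_4.mlmodelc (the file name is an assumption, not from the original post):

import CoreML

// Warm the model up once in the background shortly after launch.
func prewarmISNetModel() {
    DispatchQueue.global(qos: .utility).async {
        guard let url = Bundle.main.url(forResource: "ISNet_1024_1024_4",
                                        withExtension: "mlmodelc") else { return }
        let config = MLModelConfiguration()
        config.computeUnits = .all
        // The first load is the slow part; once it has completed, later loads of
        // the same model can be served from the on-device cache.
        _ = try? MLModel(contentsOf: url, configuration: config)
    }
}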

Downloading the model remotely

iOS also supports downloading the model file remotely, compiling it on-device, and loading it asynchronously. This keeps the app itself small, and combined with the model compression above, the downloaded model stays small as well. A rough sketch of the flow is shown below.
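A rough sketch of that flow in Swift, assuming the compressed model is hosted at a placeholder URL and the async URLSession API (iOS 15+) is available:

import CoreML

// Download the .mlmodel, compile it on-device, keep the compiled copy, and load it.
func downloadCompileAndLoadModel() async throws -> MLModel {
    let remoteURL = URL(string: "https://example.com/models/ISNet_1024_1024_4.mlmodel")!
    let fm = FileManager.default

    // 1. Download the raw .mlmodel file and give it its .mlmodel extension.
    let (downloadedURL, _) = try await URLSession.shared.download(from: remoteURL)
    let rawModelURL = fm.temporaryDirectory.appendingPathComponent("ISNet_1024_1024_4.mlmodel")
    try? fm.removeItem(at: rawModelURL)
    try fm.moveItem(at: downloadedURL, to: rawModelURL)

    // 2. Compile it on-device into an .mlmodelc bundle.
    let compiledURL = try MLModel.compileModel(at: rawModelURL)

    // 3. Move the compiled model to Application Support so it survives relaunches.
    let destination = try fm.url(for: .applicationSupportDirectory, in: .userDomainMask,
                                 appropriateFor: nil, create: true)
        .appendingPathComponent("ISNet_1024_1024_4.mlmodelc")
    try? fm.removeItem(at: destination)
    try fm.moveItem(at: compiledURL, to: destination)

    // 4. Load the compiled model (the step worth doing off the main thread).
    return try MLModel(contentsOf: destination)
}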

Related references: