Checking the Performance of M1 and M2 Mac GPUs with YOLOv8 | Building the Execution Environment and Benchmarking
Are you curious about the capabilities of using M1/M2/M3 Macs for deep learning?
In this article, we benchmark YOLOv8 on MacBook models equipped with the M1 Pro and M2 chips to explore their performance in the realm of deep learning.
Introduction
While evaluating YOLOv8, I noticed that it natively supports Apple’s M1/M2 chips. PyTorch has supported Apple Silicon for some time, and I have been running various benchmarks with it, but until now I hadn’t benchmarked an object detection model.
So, I ran a benchmark to measure training performance for object detection with YOLOv8.
Install
Setting Up the Environment with Miniconda
First, let’s set up a Python environment using Miniconda. If you are installing Miniconda on Apple Silicon via the command line, execute the following commands:
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh
I referred to the following page for this method.
After installation, restart the terminal or run “source ~/.zshrc” so that the conda command can be executed.
Once you confirm that the conda command can be executed, create a virtual environment for YOLOv8.
By creating a virtual environment, you can avoid cluttering your Python environment with the YOLOv8 installation. It also makes cleanup easy: when you no longer need it, simply remove the virtual environment.
Execute the following commands to create a virtual environment named ‘yolov8’, activate it, and install YOLOv8 in it:
conda create -n yolov8 python=3.11
conda activate yolov8
pip install ultralytics
Run the following command to start Python from the command line and confirm whether the GPU is available. If torch.backends.mps.is_available() returns True, the GPU is recognized.
> python
Python 3.11.4 (main, Jul 5 2023, 08:40:20) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.backends.mps.is_available()
True
>>>
Benchmarking with YOLOv8
Operating environment
The machine used for this experiment is a MacBook Air (M2, 2022) with 16GB of memory. It is a compact laptop that is easy to carry, so if it can handle machine learning tasks well, that would be a significant advantage.
MacBook Air M2 2022 specs
- 8-core CPU (4 performance cores, 4 efficiency cores)
- 8-core GPU
- 16-core Neural Engine
- 100GB/s memory bandwidth
Additionally, I tried running it on a MacBook Pro 16-inch 2021 with an M1 Pro chip and 16GB of memory.
MacBook Pro M1 Pro 16-inch 2021 specs
- 10-core CPU (8 performance cores, 2 efficiency cores)
- 16-core GPU
- 16-core Neural Engine
- 200GB/s memory bandwidth
The execution log for M1 Pro has been added at the end.
Prepare data
The data was prepared using the Car Object Detection dataset available on Kaggle. Since the dataset is not formatted for YOLOv8, it needs to be converted.
Please refer to this article for the conversion code.
Regarding the dataset, you can download the output of the following Kaggle notebook and copy the “datasets” folder to the root of your working directory.
https://www.kaggle.com/code/aruaru0/yolov8-car-object-detection/output
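The conversion code itself is in the article linked above, but here is a minimal sketch of the idea, assuming the Kaggle annotation CSV has columns image, xmin, ymin, xmax, ymax in pixels (the file and folder names below are assumptions; adjust them to your copy of the dataset). It only shows the bounding-box conversion; splitting and copying the images into images/train and images/valid is omitted.
import pandas as pd
from pathlib import Path
from PIL import Image

SRC_IMAGES = Path("training_images")              # assumed image folder from the Kaggle dataset
CSV_FILE = "train_solution_bounding_boxes.csv"    # assumed annotation file name
OUT_LABELS = Path("datasets/cars/labels/train")   # matches the dataset.yaml below
OUT_LABELS.mkdir(parents=True, exist_ok=True)

df = pd.read_csv(CSV_FILE)
for image_name, rows in df.groupby("image"):
    w, h = Image.open(SRC_IMAGES / image_name).size   # read the image size instead of hard-coding it
    lines = []
    for _, r in rows.iterrows():
        # convert pixel corner coordinates to normalized YOLO format: class cx cy w h
        cx = (r.xmin + r.xmax) / 2 / w
        cy = (r.ymin + r.ymax) / 2 / h
        bw = (r.xmax - r.xmin) / w
        bh = (r.ymax - r.ymin) / h
        lines.append(f"0 {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")   # class 0 = car
    (OUT_LABELS / f"{Path(image_name).stem}.txt").write_text("\n".join(lines))
If you use the output of the Kaggle notebook above, this conversion has already been done for you.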
dataset.yaml
Create the YOLOv8 data file named dataset.yaml. The file should contain the path to the data, the number of classes, and the class names. Once created, place it in the root of your working directory.
# Path
path: ./cars
train: images/train
val: images/valid
# Classes
nc: 1
names: ['car']
Train on CPU (M2)
First, run YOLOv8 on CPU. For now, set the number of epochs to 10 and the batch size to 8.
The device="cpu" argument specifies that training should run on the CPU.
from ultralytics import YOLO
import time

if __name__ == "__main__":
    model = YOLO('yolov8n.pt')
    start = time.time()
    model.train(data="dataset.yaml", epochs=10, batch=8, pretrained=True, device="cpu")
    end = time.time()
    print(f"{end-start}sec.")
Here is a partial excerpt from the execution log. The processing time is around 60 seconds per epoch, and the entire 10 epochs took approximately 630 seconds. With the smallest YOLOv8 model and this number of images, training on the CPU still seems feasible.
Transferred 319/355 items from pretrained weights
train: Scanning /Users/tadanori/tmp/m1test/datasets/cars/labels/train.cache... 284 images, 0 backgrounds, 0 corrupt: 100%|██████████| 284/284 [00:00<?, ?it/s]
val: Scanning /Users/tadanori/tmp/m1test/datasets/cars/labels/valid.cache... 71 images, 0 backgrounds, 0 corrupt: 100%|██████████| 71/71 [00:00<?, ?it/s]
Plotting labels to runs/detect/train53/labels.jpg...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/detect/train53
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 0G 1.415 2.612 1.143 5 640: 100%|██████████| 36/36 [00:47<00:00, 1.33s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:06<00:00, 1.35s/it]
all 71 100 0.0046 0.98 0.0113 0.00679
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 0G 1.348 1.832 1.171 6 640: 100%|██████████| 36/36 [00:47<00:00, 1.31s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:06<00:00, 1.35s/it]
all 71 100 1 0.125 0.927 0.581
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 0G 1.377 1.75 1.163 8 640: 100%|██████████| 36/36 [00:46<00:00, 1.30s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:06<00:00, 1.34s/it]
all 71 100 0.958 0.916 0.982 0.614
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 0G 1.299 1.453 1.155 8 640: 100%|██████████| 36/36 [00:47<00:00, 1.33s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:07<00:00, 1.52s/it]
all 71 100 0.99 0.945 0.987 0.639
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 0G 1.304 1.371 1.15 8 640: 100%|██████████| 36/36 [00:57<00:00, 1.58s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.67s/it]
all 71 100 0.97 0.95 0.987 0.6
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 0G 1.268 1.321 1.15 4 640: 100%|██████████| 36/36 [00:55<00:00, 1.55s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.61s/it]
all 71 100 0.988 0.96 0.972 0.624
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 0G 1.234 1.197 1.099 8 640: 100%|██████████| 36/36 [00:56<00:00, 1.56s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.66s/it]
all 71 100 0.99 0.96 0.986 0.604
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 0G 1.236 1.128 1.119 4 640: 100%|██████████| 36/36 [00:57<00:00, 1.60s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.67s/it]
all 71 100 1 0.962 0.992 0.638
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 0G 1.195 1.025 1.106 4 640: 100%|██████████| 36/36 [00:58<00:00, 1.63s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.74s/it]
all 71 100 1 0.958 0.986 0.645
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 0G 1.171 0.9713 1.087 8 640: 100%|██████████| 36/36 [01:00<00:00, 1.68s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.80s/it]
all 71 100 1 0.96 0.989 0.658
10 epochs completed in 0.172 hours.
Optimizer stripped from runs/detect/train53/weights/last.pt, 6.2MB
Optimizer stripped from runs/detect/train53/weights/best.pt, 6.2MB
Validating runs/detect/train53/weights/best.pt...
Ultralytics YOLOv8.0.138 🚀 Python-3.11.4 torch-2.0.1 CPU (Apple M2)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.62s/it]
all 71 100 1 0.96 0.989 0.658
Speed: 0.3ms preprocess, 103.2ms inference, 0.0ms loss, 1.4ms postprocess per image
Results saved to runs/detect/train53
630.5438086986542sec.
Train on GPU (M2)
To train on the GPU, simply change the device argument in the code to "mps".
from ultralytics import YOLO
import time

if __name__ == "__main__":
    model = YOLO('yolov8n.pt')
    start = time.time()
    model.train(data="dataset.yaml", epochs=10, batch=8, pretrained=True, device="mps")
    end = time.time()
    print(f"{end-start}sec.")
The execution on GPU took approximately 30 seconds per epoch and a total of 330 seconds. This is roughly half the processing time compared to CPU, confirming the effectiveness of GPU acceleration.
Transferred 319/355 items from pretrained weights
train: Scanning /Users/tadanori/tmp/m1test/datasets/cars/labels/train.cache... 284 images, 0 backgrounds, 0 corrupt: 100%|██████████| 284/284 [00:00<?, ?it/s]
val: Scanning /Users/tadanori/tmp/m1test/datasets/cars/labels/valid.cache... 71 images, 0 backgrounds, 0 corrupt: 100%|██████████| 71/71 [00:00<?, ?it/s]
Plotting labels to runs/detect/train54/labels.jpg...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train54
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 0G 1.375 2.962 1.052 6 640: 100%|██████████| 36/36 [00:28<00:00, 1.27it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:06<00:00, 1.36s/it]
all 71 100 0.0046 0.98 0.841 0.404
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 0G 1.262 2.398 1.034 7 640: 100%|██████████| 36/36 [00:25<00:00, 1.39it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.16it/s]
all 71 100 0.826 0.714 0.857 0.408
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 0G 1.317 2.273 1.059 3 640: 100%|██████████| 36/36 [00:24<00:00, 1.46it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 1.89it/s]
all 71 100 0.89 0.888 0.923 0.484
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 0G 1.261 2.128 1.022 4 640: 100%|██████████| 36/36 [00:28<00:00, 1.28it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.06it/s]
all 71 100 0.905 0.86 0.931 0.431
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 0G 1.278 2.011 1.022 7 640: 100%|██████████| 36/36 [00:28<00:00, 1.26it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.22it/s]
all 71 100 0.818 0.88 0.887 0.461
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 0G 1.24 1.933 1.009 3 640: 100%|██████████| 36/36 [00:27<00:00, 1.31it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.05it/s]
all 71 100 0.843 0.9 0.912 0.466
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 0G 1.268 1.835 1.009 7 640: 100%|██████████| 36/36 [00:29<00:00, 1.24it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.00it/s]
all 71 100 0.94 0.96 0.971 0.587
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 0G 1.207 1.751 0.9709 6 640: 100%|██████████| 36/36 [00:28<00:00, 1.26it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.07it/s]
all 71 100 0.952 0.94 0.978 0.558
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 0G 1.143 1.718 0.9495 8 640: 100%|██████████| 36/36 [00:26<00:00, 1.37it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 2.07it/s]
all 71 100 0.96 0.951 0.98 0.577
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 0G 1.149 1.607 0.9572 5 640: 100%|██████████| 36/36 [00:27<00:00, 1.32it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:04<00:00, 1.08it/s]
all 71 100 0.97 0.957 0.983 0.608
10 epochs completed in 0.089 hours.
Optimizer stripped from runs/detect/train54/weights/last.pt, 6.2MB
Optimizer stripped from runs/detect/train54/weights/best.pt, 6.2MB
Validating runs/detect/train54/weights/best.pt...
Ultralytics YOLOv8.0.138 🚀 Python-3.11.4 torch-2.0.1 MPS (Apple M2)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 1.70it/s]
all 71 100 0.97 0.957 0.983 0.605
Speed: 0.5ms preprocess, 19.7ms inference, 0.0ms loss, 1.9ms postprocess per image
Results saved to runs/detect/train54
334.40202498435974sec.
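Once training finishes, the saved weights can also be used for inference on the Apple GPU. Below is a minimal sketch; the runs/detect/train54 directory comes from the log above and will differ on your machine.
from ultralytics import YOLO

# load the best checkpoint from the training run above
model = YOLO("runs/detect/train54/weights/best.pt")
# run prediction on the validation images using the Apple GPU and save the annotated outputs
results = model.predict(source="datasets/cars/images/valid", device="mps", save=True)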
Summary of the results
I benchmarked training with YOLOv8, which supports Apple Silicon out of the box. It appears that utilizing the GPU on the MacBook Air yields roughly twice the speed. The M2 CPU is relatively fast even compared to Intel CPUs, so being able to double the training speed on a laptop is noteworthy.
Surprisingly, the GPU processing time on the M1 Pro was longer than on the M2. Given that the M1 Pro has twice as many GPU cores, one might expect it to be faster, but the M2 finished in less time.
On the other hand, for CPU execution the M1 Pro was faster than the M2, which can probably be attributed to the difference in the number of CPU cores.
It’s possible that training a larger YOLOv8 model would yield different results, and the GPU’s advantage on the M2 might become more pronounced.
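For reference, trying a larger variant only requires swapping the pretrained weights file, for example yolov8m.pt (a sketch, not benchmarked in this article):
from ultralytics import YOLO

# same training call as before, but with the medium-sized model (untested here)
model = YOLO('yolov8m.pt')
model.train(data="dataset.yaml", epochs=10, batch=8, pretrained=True, device="mps")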
I also measured the result of training on a P100 (an Nvidia GPU provided on Kaggle).
Even though it is a slightly older GPU, it is significantly faster: approximately 6 times faster than the M2 CPU and around 3 times faster than the M2 GPU.
The frank impression is that the Macs don’t quite match the training speed of an Nvidia GPU.
| Device | Setting | Processing time (sec.) |
| --- | --- | --- |
| M2 | device="cpu" | 630.54 |
| M2 | device="mps" | 334.40 |
| M1 Pro | device="cpu" | 582.68 |
| M1 Pro | device="mps" | 372.99 |
| Reference: P100 (Kaggle notebook) | GPU | 119 |
Using M2 Ultra with 76 GPU cores might catch up to the P100, but considering the cost, it might be more straightforward to invest in an Nvidia GPU.
Nevertheless, the convenience of easily conducting training on an entry-level laptop is significant. It can be quite useful for running small pieces of code, engaging in deep learning studies, and other similar tasks.
The newly released Mac with M3 seems to have significantly improved GPU performance. However, based on the benchmarks so far, it appears that it may not match up to Nvidia GPUs. If you’re serious about deep learning, purchasing a PC with an Nvidia GPU might be a better choice.
That said, for small-scale training, it has reached a level where it is quite feasible. If you already own a Mac and want to try some deep learning training, using it is a good option.
(Appendix) M1 Pro Execution Log
Here is the execution log for M1 Pro.
Train on CPU (M1 Pro)
Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
train: Scanning /Users/tadanori/tmp/yolov8/datasets/cars/labels/train.cache... 284 images, 0 backgrounds, 0 corrupt: 100%|██████████| 284/284 [00:00<?, ?it/s]
val: Scanning /Users/tadanori/tmp/yolov8/datasets/cars/labels/valid.cache... 71 images, 0 backgrounds, 0 corrupt: 100%|██████████| 71/71 [00:00<?, ?it/s]
Plotting labels to runs/detect/train4/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/detect/train4
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 0G 1.414 2.611 1.143 5 640: 100%|██████████| 36/36 [00:49<00:00, 1.36s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.66s/it]
all 71 100 0.0046 0.98 0.0113 0.00683
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 0G 1.347 1.816 1.173 6 640: 100%|██████████| 36/36 [00:48<00:00, 1.35s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.63s/it]
all 71 100 1 0.284 0.956 0.586
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 0G 1.375 1.765 1.164 8 640: 100%|██████████| 36/36 [00:49<00:00, 1.38s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.69s/it]
all 71 100 0.968 0.92 0.983 0.622
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 0G 1.317 1.602 1.147 8 640: 100%|██████████| 36/36 [00:48<00:00, 1.35s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.65s/it]
all 71 100 0.968 0.94 0.97 0.598
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 0G 1.321 1.419 1.145 8 640: 100%|██████████| 36/36 [00:48<00:00, 1.35s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.66s/it]
all 71 100 0.968 0.91 0.987 0.573
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 0G 1.27 1.321 1.143 4 640: 100%|██████████| 36/36 [00:48<00:00, 1.35s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.64s/it]
all 71 100 0.978 0.95 0.986 0.621
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 0G 1.276 1.261 1.088 8 640: 100%|██████████| 36/36 [00:48<00:00, 1.34s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.67s/it]
all 71 100 0.98 0.96 0.986 0.609
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 0G 1.244 1.134 1.109 4 640: 100%|██████████| 36/36 [00:48<00:00, 1.34s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.65s/it]
all 71 100 0.971 0.97 0.99 0.642
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 0G 1.193 1.052 1.097 4 640: 100%|██████████| 36/36 [00:47<00:00, 1.33s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.66s/it]
all 71 100 0.989 0.93 0.987 0.654
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 0G 1.157 0.9957 1.081 8 640: 100%|██████████| 36/36 [00:48<00:00, 1.34s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:08<00:00, 1.65s/it]
all 71 100 0.971 0.95 0.988 0.642
10 epochs completed in 0.158 hours.
Optimizer stripped from runs/detect/train4/weights/last.pt, 6.2MB
Optimizer stripped from runs/detect/train4/weights/best.pt, 6.2MB
Validating runs/detect/train4/weights/best.pt...
Ultralytics YOLOv8.0.208 🚀 Python-3.11.5 torch-2.1.0 CPU (Apple M1 Pro)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:07<00:00, 1.60s/it]
all 71 100 0.989 0.931 0.987 0.654
Speed: 0.8ms preprocess, 106.1ms inference, 0.0ms loss, 0.9ms postprocess per image
Results saved to runs/detect/train4
582.6759631633759sec.
Train on GPU (M1 Pro)
Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
train: Scanning /Users/tadanori/tmp/yolov8/datasets/cars/labels/train.cache... 284 images, 0 backgrounds, 0 corrupt: 100%|██████████| 284/284 [00:00<?, ?it/s]
val: Scanning /Users/tadanori/tmp/yolov8/datasets/cars/labels/valid.cache... 71 images, 0 backgrounds, 0 corrupt: 100%|██████████| 71/71 [00:00<?, ?it/s]
Plotting labels to runs/detect/train6/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/detect/train6
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 0G 1.337 3.056 1.098 5 640: 100%|██████████| 36/36 [00:27<00:00, 1.30it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:09<00:00, 1.96s/it]
all 71 100 0.00263 0.56 0.0741 0.0188
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 0G 1.345 2.391 1.098 6 640: 100%|██████████| 36/36 [00:23<00:00, 1.55it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.24it/s]
all 71 100 0.201 0.173 0.0829 0.0309
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 0G 1.287 2.299 1.064 8 640: 100%|██████████| 36/36 [00:28<00:00, 1.27it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.22it/s]
all 71 100 0.221 0.29 0.106 0.0463
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 0G 1.273 2.164 1.042 8 640: 100%|██████████| 36/36 [00:28<00:00, 1.26it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.32it/s]
all 71 100 0.229 0.27 0.114 0.0464
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 0G 1.294 2.094 1.049 8 640: 100%|██████████| 36/36 [00:30<00:00, 1.20it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 1.96it/s]
all 71 100 0.344 0.289 0.146 0.065
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 0G 1.201 2.03 1.042 4 640: 100%|██████████| 36/36 [00:31<00:00, 1.14it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.09it/s]
all 71 100 0.269 0.32 0.127 0.0509
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 0G 1.235 1.865 0.9928 8 640: 100%|██████████| 36/36 [00:29<00:00, 1.21it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.26it/s]
all 71 100 0.276 0.33 0.115 0.0499
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 0G 1.186 1.749 0.983 4 640: 100%|██████████| 36/36 [00:30<00:00, 1.19it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.05it/s]
all 71 100 0.294 0.31 0.122 0.0532
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 0G 1.146 1.661 0.9614 4 640: 100%|██████████| 36/36 [00:39<00:00, 1.09s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.04it/s]
all 71 100 0.33 0.33 0.13 0.0595
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 0G 1.138 1.64 0.9573 8 640: 100%|██████████| 36/36 [00:39<00:00, 1.09s/it]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 3.25it/s]
all 71 100 0.335 0.32 0.129 0.0623
10 epochs completed in 0.094 hours.
Optimizer stripped from runs/detect/train6/weights/last.pt, 6.2MB
Optimizer stripped from runs/detect/train6/weights/best.pt, 6.2MB
Validating runs/detect/train6/weights/best.pt...
Ultralytics YOLOv8.0.208 🚀 Python-3.11.5 torch-2.1.0 MPS (Apple M1 Pro)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:14<00:00, 2.86s/it]
all 71 100 0.343 0.287 0.146 0.0653
Speed: 2.0ms preprocess, 47.9ms inference, 0.0ms loss, 2.1ms postprocess per image
Results saved to runs/detect/train6
372.99035024642944sec.