YOLOv5 and PyTorch in Practice: A Complete Walkthrough of Object Detection Inference in Python

Author: 半吊子全栈工匠 | 2025.09.19 17:33

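Summary: This article explains how to run efficient object detection inference with the YOLOv5 model and the PyTorch framework in Python. It covers environment setup, model loading, preprocessing, inference, and post-processing, so that developers can get started quickly.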


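Abstract

This article walks through the complete workflow for object detection inference in Python with YOLOv5 and PyTorch: environment setup, model loading, image preprocessing, inference, and post-processing of the results. With step-by-step explanations and code examples, developers can follow the full path from model deployment to a working application. The workflow suits rapid prototyping for scenarios such as industrial inspection and intelligent surveillance.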

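1. Technical Background and Core Advantages

YOLOv5 is a real-time object detection model developed by Ultralytics. Its network architecture (a CSPDarknet backbone with a PANet neck) and efficient training strategy give it a strong balance between speed and accuracy. Because PyTorch builds its computation graph dynamically, developers can modify the model structure flexibly and use GPU acceleration to reach millisecond-level inference. Compared with traditional two-stage detectors such as Faster R-CNN, the author cites up to 140 FPS on a Tesla V100 for the COCO-trained model and a 23% improvement in mAP@0.5:0.95. Both figures depend on the model variant, input size, batch size, and the baseline used for comparison.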

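2. Environment Setup and Dependency Management

2.1 System Requirements

Python 3.8+ (3.10 recommended for the best compatibility)
PyTorch 1.12+ (the build must match your CUDA version)
CUDA 11.6+ (only if using GPU acceleration)
OpenCV 4.5+ (image processing)
NumPy 1.21+ (numerical computation)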

2.2 Installing Dependencies

Creating a virtual environment with conda is recommended:

conda create -n yolov5_env python=3.10
conda activate yolov5_env
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
pip install opencv-python numpy matplotlib tqdm
git clone https://github.com/ultralytics/yolov5  # clone the official repository
cd yolov5 && pip install -r requirements.txt
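
After installation it can help to confirm which package versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the requirements above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as used by pip; None means the package is missing
for name, version in installed_versions(
    ["torch", "torchvision", "opencv-python", "numpy"]
).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

This only reports versions; whether the PyTorch build matches the local CUDA driver still has to be checked separately, for example with torch.cuda.is_available().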

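3. Model Loading and Initialization

3.1 Choosing a Pretrained Model

YOLOv5 ships in several sizes (parameter counts are from the v6.x release table; the FPS values are the author's approximate figures and vary with hardware and batch size):

YOLOv5s: 7.2M parameters, 140 FPS (lightweight)
YOLOv5m: 21.2M parameters, 80 FPS (balanced)
YOLOv5l: 46.5M parameters, 60 FPS (high accuracy)
YOLOv5x: 86.7M parameters, 40 FPS (highest accuracy)

Example of loading pretrained weights (run from the root of the cloned yolov5 repository so that the models package can be imported):

from models.experimental import attempt_load
import torch
# Load the model; the weights file is downloaded to the given path if it is not found locally
# Note: older YOLOv5 releases name the second argument map_location instead of device
model = attempt_load('yolov5s.pt', device='cuda')  # GPU
# model = attempt_load('yolov5s.pt', device='cpu')  # CPU
model.eval()  # switch to inference mode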

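3.2 Training a Custom Model (Advanced)

To train on a custom dataset:

Prepare annotation files in YOLO format, one object per line: class x_center y_center width height, with coordinates normalized to [0, 1] relative to the image size
Create a dataset config such as custom.yaml, using data/coco.yaml as a template

Run the training command:

python train.py --img 640 --batch 16 --epochs 50 --data custom.yaml --weights yolov5s.pt --name custom_model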
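
To make the YOLO label format concrete, here is a small sketch that converts a pixel-space box into one label line. The function name `xyxy_to_yolo_line` is hypothetical and is not part of the YOLOv5 repository; it assumes boxes are given as (x1, y1, x2, y2) in pixels.

```python
def xyxy_to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to a YOLO label line.

    All four output values are normalized by the image width or height.
    """
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 px box in a 640x480 image
print(xyxy_to_yolo_line(0, (100, 200, 300, 400), img_w=640, img_h=480))
# 0 0.312500 0.625000 0.312500 0.416667
```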

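4. Implementing the Inference Pipeline

4.1 Image Preprocessing

import cv2
import numpy as np
import torch

def preprocess_image(img_path, img_size=640):
    # Read the image (OpenCV returns BGR data in HxWxC layout)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f'Cannot read image: {img_path}')
    h, w = img.shape[:2]
    # Resize so the longer side equals img_size, keeping the aspect ratio
    r = img_size / max(h, w)
    new_h, new_w = max(1, round(h * r)), max(1, round(w * r))
    if (new_h, new_w) != (h, w):
        img = cv2.resize(img, (new_w, new_h))
    # Pad to a square with gray (114); the image stays in the top-left corner
    new_img = np.full((img_size, img_size, 3), 114, dtype=np.uint8)
    new_img[:new_h, :new_w] = img
    # BGR -> RGB, HWC -> CHW, then scale to [0, 1]
    new_img = cv2.cvtColor(new_img, cv2.COLOR_BGR2RGB)
    img_tensor = torch.from_numpy(new_img).permute(2, 0, 1).float() / 255.0
    img_tensor = img_tensor.unsqueeze(0)  # add the batch dimension
    return img_tensor, (h, w)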
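
The resize-and-pad arithmetic above can be checked by hand. The sketch below works it through for a 1280x720 frame and shows the inverse mapping that post-processing needs. Both helper names are hypothetical, and it assumes padding is added only at the bottom and right, as in the function above, so the pad offset is (0, 0).

```python
def letterbox_geometry(h, w, img_size=640):
    """Return (new_h, new_w, gain) when the longer side is resized to img_size."""
    gain = img_size / max(h, w)
    new_h, new_w = round(h * gain), round(w * gain)
    return new_h, new_w, gain


def to_original(box, gain, pad=(0, 0)):
    """Map an (x1, y1, x2, y2) box from network-input pixels to original pixels."""
    pad_x, pad_y = pad
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / gain,
        (y1 - pad_y) / gain,
        (x2 - pad_x) / gain,
        (y2 - pad_y) / gain,
    )


new_h, new_w, gain = letterbox_geometry(720, 1280)
print(new_h, new_w, gain)                      # 360 640 0.5
print(to_original((100, 50, 300, 150), gain))  # (200.0, 100.0, 600.0, 300.0)
```

If the padding were centered instead, as in YOLOv5's own letterbox utility, the pad argument would carry the left and top offsets rather than zeros.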

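4.2 Running Inference and Post-processing

# scale_boxes is named scale_coords in older YOLOv5 releases
from utils.general import non_max_suppression, scale_boxes

def detect_objects(model, img_tensor, orig_shape, conf_thres=0.25, iou_thres=0.45):
    # orig_shape is the (h, w) of the original image, as returned by preprocess_image
    with torch.no_grad():
        # Run inference; the first output holds the raw predictions
        pred = model(img_tensor)[0]
    # Post-process with NMS
    pred = non_max_suppression(pred, conf_thres, iou_thres)
    # preprocess_image pads only the bottom/right edges, so the pad offset is (0, 0)
    gain = max(img_tensor.shape[2:]) / max(orig_shape)
    ratio_pad = ((gain, gain), (0, 0))
    # Parse the detections
    detections = []
    for det in pred:  # detections for each image in the batch
        if len(det):
            # Map boxes from the network input size back to the original image
            det[:, :4] = scale_boxes(img_tensor.shape[2:], det[:, :4], orig_shape, ratio_pad=ratio_pad).round()
            for *xyxy, conf, cls in det:
                detections.append({
                    'bbox': [int(x) for x in xyxy],
                    'confidence': float(conf),
                    'class': int(cls),
                    'class_name': model.names[int(cls)]
                })
    return detections

4.3 Complete Inference Example

# Initialization
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = attempt_load('yolov5s.pt', device=device)
model.eval()
# Process a single image
img_path = 'test.jpg'
img_tensor, (h, w) = preprocess_image(img_path)
img_tensor = img_tensor.to(device)
# Inference and parsing
detections = detect_objects(model, img_tensor, (h, w))
# Visualize the results
img = cv2.imread(img_path)
for det in detections:
    x1, y1, x2, y2 = det['bbox']
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    label = f"{det['class_name']}: {det['confidence']:.2f}"
    cv2.putText(img, label, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cv2.imwrite('result.jpg', img)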
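
To show what the conf_thres and iou_thres parameters control, here is a simplified, pure-Python sketch of greedy non-maximum suppression. It is not YOLOv5's implementation: it ignores classes and works on plain tuples. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because its IoU with box 0 is about 0.68, above iou_thres, and box 3 is dropped for falling below conf_thres. Raising iou_thres keeps more overlapping boxes; raising conf_thres keeps fewer boxes overall.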

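5. Performance Optimization Strategies

5.1 Hardware Acceleration

TensorRT acceleration: converting the PyTorch model to a TensorRT engine can speed up inference substantially. The author cites a 3-5x gain, which depends on the GPU, the precision (FP32/FP16/INT8), and the batch size.

ONNX export (the repository's export.py script offers the same via python export.py --weights yolov5s.pt --include onnx):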

torch.onnx.export(model, img_tensor, 'yolov5s.onnx', 
                input_names=['images'], 
                output_names=['output'],
                dynamic_axes={'images': {0: 'batch'}, 'output': {0: 'batch'}})
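
Speedup claims are easiest to trust when measured on your own hardware. Below is a minimal timing sketch using only the standard library; `measure_fps` is a hypothetical helper, and the lambda stands in for a real inference call.

```python
import time


def measure_fps(fn, warmup=3, runs=20):
    """Call fn repeatedly and return the average number of calls per second."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return runs / elapsed


# Placeholder workload; replace the lambda with a call that runs the model
fps = measure_fps(lambda: sum(range(10_000)))
print(f"{fps:.1f} calls per second")
```

When timing GPU inference, the timed callable should end with torch.cuda.synchronize(), because CUDA kernels run asynchronously and the clock would otherwise be read before the work finishes.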

5.2 Batch Inference

# Build a batch
batch_size = 4
batch_imgs = [preprocess_image(f'test_{i}.jpg')[0] for i in range(batch_size)]
batch_tensor = torch.cat(batch_imgs, 0).to(device)
# Run batched inference
with torch.no_grad():
    pred = model(batch_tensor)[0]  # raw predictions for all images; pass to non_max_suppression as in 4.2
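
When there are more images than fit in one batch, the file list has to be split into fixed-size chunks first. A minimal sketch, with the hypothetical helper name `chunked`:

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


paths = [f"test_{i}.jpg" for i in range(10)]
for batch in chunked(paths, 4):
    print(len(batch), batch)
```

The last chunk may be smaller than batch_size (here 4, 4, 2), which is fine for inference because the batch dimension is not fixed.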

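6. Practical Application Scenarios

6.1 Real-time Video Stream Processing

cap = cv2.VideoCapture('video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # preprocess_image expects a file path, so write the frame to a temporary file first
    cv2.imwrite('temp.jpg', frame)
    img_tensor, (h, w) = preprocess_image('temp.jpg')  # already has a batch dimension
    detections = detect_objects(model, img_tensor.to(device), (h, w))
    # Visualization omitted
    cv2.imshow('Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()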

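6.2 Industrial Defect Detection

Raise conf_thres to 0.5 or higher to reduce false positives
Filter for specific classes (for example, keep only the 'crack' class)
Integrate into a Flask API to expose a web service:
```python
import os
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

app = Flask(__name__)
os.makedirs('temp', exist_ok=True)

@app.route('/detect', methods=['POST'])
def detect():
    file = request.files['image']
    # Sanitize the client-supplied filename before using it in a path
    img_path = os.path.join('temp', secure_filename(file.filename))
    file.save(img_path)
    # Inference, reusing preprocess_image and detect_objects from 4.1 and 4.2
    img_tensor, (h, w) = preprocess_image(img_path)
    detections = detect_objects(model, img_tensor.to(device), (h, w))
    return jsonify({'detections': detections})
```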
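
The first two suggestions in this section can be applied as a post-filter on the detection list. This sketch assumes detections are dictionaries with 'class_name' and 'confidence' keys, as produced in 4.2; the helper name `filter_detections` and the sample data are made up for illustration.

```python
def filter_detections(detections, allowed_classes, min_conf=0.5):
    """Keep detections whose class is allowed and whose confidence is high enough."""
    allowed = set(allowed_classes)
    return [
        d for d in detections
        if d["class_name"] in allowed and d["confidence"] >= min_conf
    ]


# Made-up sample detections for illustration only
sample = [
    {"bbox": [10, 10, 50, 50], "confidence": 0.91, "class": 0, "class_name": "crack"},
    {"bbox": [60, 20, 90, 70], "confidence": 0.42, "class": 0, "class_name": "crack"},
    {"bbox": [5, 80, 40, 120], "confidence": 0.88, "class": 1, "class_name": "scratch"},
]
print(filter_detections(sample, {"crack"}))
```

Only the first entry survives: the second falls below min_conf and the third belongs to a class that was not requested.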

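7. Troubleshooting Common Problems

CUDA out of memory:
- Reduce the img_size parameter (for example, from 640 to 416)
- Reduce the batch size
- Call torch.cuda.empty_cache() to release cached memory that is no longer in use
Model fails to load:
- Verify the integrity of the weights file (MD5 checksum)
- Make sure the PyTorch version is compatible with the model
Low detection accuracy:
- Tune conf_thres: raise it to cut false positives, or lower it to recover missed detections
- Use a larger model (for example, yolov5l.pt)
- Fine-tune on a custom dataset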
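
For the MD5 check mentioned above, a weights file can be hashed in chunks so that large files do not need to fit in memory. A minimal sketch; `file_md5` is a hypothetical helper, and the reference checksum must come from a source you trust, since none is given in this article.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (expected_md5 is a value you obtained from a trusted source):
# if file_md5("yolov5s.pt") != expected_md5:
#     print("Weights file is corrupted or incomplete; download it again")
```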

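8. Directions for Extension

Multimodal detection: combine text prompts with YOLOv5 via CLIP (CLIP-YOLOv5)
3D object detection: generate point clouds from a stereo camera
Small object detection: use high-resolution input (1280x1280) and attention mechanisms

The helper functions used in this article, such as non_max_suppression and scale_boxes, are implemented in utils/general.py in the official YOLOv5 repository. Tune parameters such as conf_thres and iou_thres for your scenario, and start experimenting with YOLOv5s before moving up to larger models to balance accuracy against speed.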
