YOLOv8: A Tool for Converting a VOC-Format Dataset to YOLO Format

Background

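While learning YOLOv8 I found the IP102 dataset, only to discover that it is annotated in VOC format and has to be converted to YOLO format. With more than ten thousand images, converting them by hand one at a time is out of the question, so I wrote a small Python program to do it.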

Preparation

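Python
PyCharm
lxml
Install it yourself with pip install lxml (the complete code in this post only imports the standard-library xml.etree.ElementTree, so it also runs without lxml)
A dataset in VOC format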

Implementation

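Before doing anything, we first need to know that a VOC-format annotation is an XML file, and which of its fields are needed for the conversion to YOLO format. Let's open one of the XML files and take a look: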

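width - image width (in pixels)
height - image height (in pixels)
name - class of the object
bndbox/xmin - x coordinate of the top-left corner
bndbox/ymin - y coordinate of the top-left corner
bndbox/xmax - x coordinate of the bottom-right corner
bndbox/ymax - y coordinate of the bottom-right corner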
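
To make the field list concrete, here is a minimal sketch that reads those values with the standard-library xml.etree.ElementTree. The sample annotation and the helper name read_voc_fields are made up for illustration; real files in the dataset may contain additional tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written annotation used only for illustration
SAMPLE_XML = """<annotation>
    <size>
        <width>640</width>
        <height>480</height>
    </size>
    <object>
        <name>0</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>120</ymin>
            <xmax>300</xmax>
            <ymax>360</ymax>
        </bndbox>
    </object>
</annotation>"""


def read_voc_fields(xml_text):
    """Return (image_width, image_height, [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    image_width = int(size.find("width").text)
    image_height = int(size.find("height").text)
    boxes = []
    # One tuple per <object>; a file may describe several objects
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        boxes.append((
            obj.find("name").text,
            float(box.find("xmin").text),
            float(box.find("ymin").text),
            float(box.find("xmax").text),
            float(box.find("ymax").text),
        ))
    return image_width, image_height, boxes


print(read_voc_fields(SAMPLE_XML))
```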

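In YOLO format, however, an annotation is expressed differently. Each line of the label file holds a class index followed by four values:

x_center - x coordinate of the box center, normalized by the image width
y_center - y coordinate of the box center, normalized by the image height
width - box width as a fraction of the image width
height - box height as a fraction of the image height

They are converted as follows: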

x_center = (xmin + xmax) / (2.0 * image_width)
y_center = (ymin + ymax) / (2.0 * image_height)
width = (xmax - xmin) / image_width
height = (ymax - ymin) / image_height

Note that image_width and image_height above refer to width and height in the XML file.
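
As a quick check of the formulas, the sketch below wraps them in a function and applies them to one box. The function name voc_box_to_yolo and the sample numbers are illustrative assumptions, not part of the dataset.

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert pixel corner coordinates into normalized YOLO values."""
    x_center = (xmin + xmax) / (2.0 * image_width)
    y_center = (ymin + ymax) / (2.0 * image_height)
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return x_center, y_center, width, height


# A 200x240 pixel box inside a 640x480 image
print(voc_box_to_yolo(100, 120, 300, 360, 640, 480))
# (0.3125, 0.5, 0.3125, 0.5)
```

All four results lie between 0 and 1, which is a handy sanity check: a value outside that range usually means the corners or the image size were read incorrectly.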
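All that remains is to parse the XML, look up the tags, read their values, do the calculation, and write out the result.
Using the XML parser is not demonstrated separately here.
Writing the output files with os is not demonstrated separately either.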
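
For readers who would like to see the output step in isolation, here is a small sketch of writing a YOLO label file. The helper name write_yolo_labels, the temporary folder, and the choice of six decimal places are assumptions made for this example.

```python
import os
import tempfile


def write_yolo_labels(txt_file, rows):
    """Write one 'class x_center y_center width height' line per row."""
    lines = [
        "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(cls, xc, yc, w, h)
        for cls, xc, yc, w, h in rows
    ]
    with open(txt_file, "w") as fp:
        fp.write("\n".join(lines) + "\n")
    return lines


# Write into a throwaway folder so the example does not touch real data
out_dir = tempfile.mkdtemp()
label_path = os.path.join(out_dir, "example.txt")
written_lines = write_yolo_labels(label_path, [(0, 0.3125, 0.5, 0.3125, 0.5)])
print(written_lines)
```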
The complete code is as follows:

import xml.etree.ElementTree as Et
import os
import shutil


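def convert_into_yolo(xml_path, txt_path):
    # Resolve both paths before chdir so that relative paths keep working
    xml_path = os.path.abspath(xml_path)
    txt_path = os.path.abspath(txt_path)
    os.makedirs(txt_path, exist_ok=True)
    xml_files = os.listdir(xml_path)
    os.chdir(xml_path)
    for xml_name in xml_files:
        if xml_name.endswith(".txt"):
            # Already a label file: copy it over unchanged
            shutil.copy2(os.path.join(xml_path, xml_name), os.path.join(txt_path, xml_name))
            continue
        if not xml_name.endswith(".xml"):
            # Skip anything that is not an XML annotation (images, subfolders, ...)
            continue
        convert_single_into_yolo(xml_name, txt_path)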


def convert_single_into_yolo(xml_name, txt_path):
    # Output path; splitext keeps file names that contain extra dots intact
    out_txt_file_path = os.path.join(txt_path, os.path.splitext(xml_name)[0] + '.txt')
    tree = Et.parse(xml_name)
    root = tree.getroot()
    # Image size, used to normalize every bounding box
    image_width = float(root.find('size').find('width').text)
    image_height = float(root.find('size').find('height').text)
    lines = []
    # A VOC file may contain several <object> elements: write one line per object
    for obj in root.findall('object'):
        # YOLO expects an integer class index; if <name> holds a text label, map it first
        name = obj.find('name').text
        box = obj.find('bndbox')
        xmin = float(box.find('xmin').text)
        ymin = float(box.find('ymin').text)
        xmax = float(box.find('xmax').text)
        ymax = float(box.find('ymax').text)
        # Six decimals: rounding to two would shift boxes by several pixels on large images
        x_center = round((xmin + xmax) / (2.0 * image_width), 6)  # normalized center x
        y_center = round((ymin + ymax) / (2.0 * image_height), 6)  # normalized center y
        width = round((xmax - xmin) / image_width, 6)  # box width / image width
        height = round((ymax - ymin) / image_height, 6)  # box height / image height
        # Five placeholders: class, x_center, y_center, width, height
        lines.append("{} {} {} {} {}".format(name, x_center, y_center, width, height))
    # Write the results
    with open(out_txt_file_path, "w") as fp:
        fp.write("\n".join(lines))


if __name__ == '__main__':
    # Folder containing the XML files
    xml_path = r"path to the XML folder"
    # Folder where the TXT files are written
    txt_path = r"path to the TXT folder"
    convert_into_yolo(xml_path, txt_path)
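
One caveat: YOLO label files expect an integer class index in the first column, while the script writes the text of the name tag as it is. If the name tags in your dataset already hold numbers, nothing more is needed. If they hold text labels, a mapping along the following lines could be applied before writing. This is only a sketch; build_class_index and the label names are hypothetical.

```python
def build_class_index(names):
    """Assign a stable integer index to every distinct class name."""
    unique_names = sorted(set(names))
    return {name: index for index, name in enumerate(unique_names)}


# Hypothetical labels as they might be collected from the name tags
class_index = build_class_index(["moth", "aphid", "moth", "beetle"])
print(class_index)
# {'aphid': 0, 'beetle': 1, 'moth': 2}
```

Sorting the names keeps the indices identical between runs, which matters because the same order has to be listed in the dataset configuration used for training.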