1. Data processing
Arrange the dataset in the required layout.
First run:
import os
import tarfile

TRAIN_SRC_DIR = '/root/autodl-pub/ImageNet/ILSVRC2012/ILSVRC2012_img_train.tar'
TRAIN_DEST_DIR = '/root/autodl-tmp/imagenet/train'
VAL_SRC_DIR = '/root/autodl-pub/ImageNet/ILSVRC2012/ILSVRC2012_img_val.tar'
VAL_DEST_DIR = '/root/autodl-tmp/imagenet/val'


def extract_train():
    # The train archive holds one inner .tar per class; each goes into its own folder.
    with tarfile.open(TRAIN_SRC_DIR, mode='r:') as tar:
        for i, item in enumerate(tar):
            # Drop the ".tar" extension. str.strip(".tar") strips characters, not a suffix.
            cls_name = os.path.splitext(item.name)[0]
            e_path = os.path.join(TRAIN_DEST_DIR, cls_name)
            os.makedirs(e_path, exist_ok=True)
            print("#", i, "extract train dataset to >>>", e_path)
            with tarfile.open(fileobj=tar.extractfile(item), mode='r:') as inner:
                inner.extractall(e_path)


def extract_val():
    # The val archive is flat: all images are extracted directly into VAL_DEST_DIR.
    with tarfile.open(VAL_SRC_DIR, mode='r:') as tar:
        os.makedirs(VAL_DEST_DIR, exist_ok=True)
        print("extract val dataset to >>>", VAL_DEST_DIR)
        tar.extractall(VAL_DEST_DIR)


if __name__ == '__main__':
    extract_train()
    extract_val()
Then run the following to extract the test set:
import os
import tarfile

TEST_SRC_DIR = '/root/autodl-pub/ImageNet/ILSVRC2012/ILSVRC2012_img_test.tar'
TEST_DEST_DIR = '/root/autodl-tmp/imagenet/test'


def extract_test():
    # The test archive is flat: all images are extracted directly into TEST_DEST_DIR.
    with tarfile.open(TEST_SRC_DIR, mode='r:') as tar:
        os.makedirs(TEST_DEST_DIR, exist_ok=True)
        print("extract test dataset to >>>", TEST_DEST_DIR)
        tar.extractall(TEST_DEST_DIR)


if __name__ == '__main__':
    extract_test()
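Once extraction has finished, a quick count of what landed on disk can catch an interrupted run. The sketch below is only an illustration: the helper name summarize_split is made up here, and the root path is the destination used in the scripts above.

```python
import os


def summarize_split(split_dir):
    """Return (number of sub-directories, number of files) found under split_dir."""
    n_dirs = 0
    n_files = 0
    for _, dirnames, filenames in os.walk(split_dir):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_dirs, n_files


if __name__ == "__main__":
    root = "/root/autodl-tmp/imagenet"  # destination root used by the extraction scripts
    for split in ("train", "val", "test"):
        path = os.path.join(root, split)
        if os.path.isdir(path):
            print(split, summarize_split(path))
        else:
            print(split, "missing:", path)
```

For the standard ILSVRC2012 release, train should show 1000 class directories with 1,281,167 images, val 50,000 images, and test 100,000 images.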
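After extraction, the dataset is still missing a label file.
I have already prepared it for you; the process is tedious, so I will not describe it here.
2. Use the following to generate the extra folder used for training: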
from dinov2.data.datasets import ImageNet

for split in ImageNet.Split:
    dataset = ImageNet(split=split, root="/root/autodl-tmp/imagenet", extra="/root/autodl-tmp/extra")
    dataset.dump_extra()
This step fails with an error related to the labels.
At the line that raises the error,
class_id, class_name = row
change it to
class_id, class_name, *_ = row
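Why the starred target helps: strict two-name unpacking raises ValueError whenever a row has more than two fields, while `*_` collects and discards the rest. A minimal sketch, assuming the label file is comma-separated and that some rows carry extra fields; the sample rows and the load_labels helper are invented for illustration and are not the dinov2 code.

```python
import csv
import io

# Hypothetical label rows; the last one has a third comma-separated field.
SAMPLE = (
    "n01440764,tench\n"
    "n01443537,goldfish\n"
    "n01484850,great white shark,white shark\n"
)


def load_labels(text):
    """Return (class_id, class_name) pairs, ignoring any fields after the second."""
    labels = []
    for row in csv.reader(io.StringIO(text)):
        class_id, class_name, *_ = row  # extra fields are collected into _ and dropped
        labels.append((class_id, class_name))
    return labels


# Strict unpacking of the same three-field row fails.
try:
    class_id, class_name = ["n01484850", "great white shark", "white shark"]
except ValueError as exc:
    print("strict unpacking fails:", exc)

print(load_labels(SAMPLE))
```

Note that a row with fewer than two fields would still fail with the starred form.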
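3. OK, the environment is already set up.
If you need to set it up again, run
conda env create -f conda.yaml
conda activate dinov2
and that is all.
At run time a string-related error is raised; change the code at the error location to: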
def remove_suffix(s, suffix):
    # Return s without the trailing suffix; leave s unchanged if it does not end with it.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


args.arch = remove_suffix(args.arch, "_memeff")
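A likely cause of the string error (an assumption, since the post does not quote the message) is that the original code calls str.removesuffix, which only exists from Python 3.9 on. The sketch below shows that the helper behaves the same way; the architecture names are example values only.

```python
import sys


def remove_suffix(s, suffix):
    # Return s without the trailing suffix; leave s unchanged if it does not end with it.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


for arch in ("vit_large_memeff", "vit_large"):
    print(arch, "->", remove_suffix(arch, "_memeff"))

# On Python 3.9+ the built-in method gives the same result.
if sys.version_info >= (3, 9):
    assert "vit_large_memeff".removesuffix("_memeff") == remove_suffix("vit_large_memeff", "_memeff")
```

Unlike str.strip, which removes any of the given characters from both ends, this removes only one exact trailing occurrence of the suffix.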
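4. Running
The launch command given on GitHub is meant for a cluster, so we cannot use it.
Below is how to run on a single GPU.
I have already written the configuration.
First:
cd /root/dinov2
python setup.py install (already done, no need to repeat)
Then cd to /root/dinov2/dinov2/train and run:
source activate base
python main.py
Training then starts right away.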
5. Configuration
Change vitl16_short.yaml to:
train:
  dataset_path: ImageNet:split=TRAIN:root=/root/autodl-tmp/imagenet:extra=/root/autodl-tmp/extra
  batch_size_per_gpu: 8
student:
  block_chunks: 1
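The dataset_path value packs the dataset name and its arguments into one colon-separated string. The sketch below shows how such a string breaks down into a name and key/value pairs; parse_dataset_str is a hypothetical helper written for illustration, not the parser dinov2 itself uses.

```python
def parse_dataset_str(dataset_str):
    """Split 'Name:key=value:key=value' into (name, {key: value})."""
    name, *pairs = dataset_str.split(":")
    kwargs = {}
    for pair in pairs:
        key, value = pair.split("=", 1)  # split on the first '=' only
        kwargs[key] = value
    return name, kwargs


name, kwargs = parse_dataset_str(
    "ImageNet:split=TRAIN:root=/root/autodl-tmp/imagenet:extra=/root/autodl-tmp/extra"
)
print(name, kwargs)
```

With this format, a root or extra path that itself contains a colon would be split incorrectly, so keep the paths colon-free.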
In train.py, change the arguments to:
parser.add_argument("--config-file", default="/root/dinov2/dinov2/configs/train/vitl16_short.yaml", metavar="FILE", help="path to config file")
parser.add_argument(
    "--output-dir",
    "--output_dir",
    default="~/output",
    type=str,
    help="Output directory to save logs and checkpoints",
)
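One thing to keep in mind with default="~/output": argparse stores the string as written and does not expand "~". Whether the training code expands it later is not checked here, so the sketch below shows the expansion done explicitly; the standalone parser only mirrors the --output-dir argument above for illustration.

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument(
    "--output-dir",
    "--output_dir",
    default="~/output",
    type=str,
    help="Output directory to save logs and checkpoints",
)

args = parser.parse_args([])  # no command-line arguments, so the default is used
print("raw value:     ", args.output_dir)

# Expand "~" to the current user's home directory before using the path.
output_dir = os.path.expanduser(args.output_dir)
print("expanded value:", output_dir)
```

Without the expansion, a literal directory named "~" may be created under the current working directory.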