GitHub – yangxy/PASD: [ECCV2024] Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization


Location: China

Indexed: 2025-10-06


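Project Overview
PASD (Pixel-Aware Stable Diffusion) is a research project published at ECCV 2024 that adapts Stable Diffusion to realistic image super-resolution and personalized stylization. It was developed jointly by teams from ByteDance, The Hong Kong Polytechnic University, and Alibaba DAMO Academy. Its core goal is to address two weaknesses of conventional approaches, lost detail in super-resolution and unnatural-looking stylization, so that results hold up on real-world images.

The project provides the paper ([arXiv](https://arxiv.org/abs/2308.14469)), a HuggingFace model ([PASD-SDXL](https://huggingface.co/yangtao9009/PASDSDXL)), and a dataset ([PASD_dataset](https://huggingface.co/datasets/yangtao9009/PASD_dataset)) so that researchers can reproduce and extend the work.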

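Core Features
PASD supports four main tasks, covering both everyday and professional image processing:
1. Realistic image super-resolution: upscales low-resolution images (such as blurry phone photos) to high definition while preserving texture and detail (for example frog skin or the surface of a house wall) and avoiding over-smoothing.
2. Old photo restoration: recovers the sharpness and color of old photos (for example yellowed or damaged prints) so that historical images look realistic again.
3. Personalized stylization: applies a chosen style to an image (such as cartoon or Disney style); the style strength is adjustable through the `conditioning_scale` parameter.
4. Image colorization: automatically adds natural color to grayscale images (such as black-and-white animal or landscape photos) in a way that fits the scene.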

Installation and Usage
1. Installation
Install with pip (PyTorch must be installed first):
```bash
pip install git+https://github.com/yangxy/PASD.git
```
Or clone the repository and install it locally:
```bash
git clone https://github.com/yangxy/PASD.git
cd PASD
pip install -e .
```

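Download the model configuration:
The `checkpoints` folder, which contains the model configuration files, must be downloaded from the GitHub main branch to the local machine:
```bash
# Stream the main-branch tarball and extract only the checkpoints folder
wget -O - https://github.com/yangxy/PASD/archive/main.tar.gz | tar xz --strip-components=1 "PASD-main/checkpoints"
```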
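Where `wget` and `tar` are not available, the same "extract one folder from the tarball" step can be sketched with Python's standard library. This is only an illustration: `extract_subdir` and `make_demo_archive` are hypothetical helper names, and the demo runs against a small in-memory archive (with a made-up `demo.cfg` file) rather than the real repository download.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_subdir(archive, subdir, dest):
    """Extract regular files under `subdir`, dropping the archive's top-level folder."""
    prefix = subdir.rstrip("/") + "/"
    top_level = prefix.split("/", 1)[0] + "/"
    dest = Path(dest).resolve()
    extracted = []
    for member in archive.getmembers():
        if not (member.isfile() and member.name.startswith(prefix)):
            continue
        relative = member.name[len(top_level):]
        target = (dest / relative).resolve()
        # Skip any entry that would land outside the destination folder.
        if dest not in target.parents:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(archive.extractfile(member).read())
        extracted.append(relative)
    return sorted(extracted)


def make_demo_archive():
    """Build a tiny gzip tarball in memory (stand-in for main.tar.gz)."""
    buffer = io.BytesIO()
    files = (
        ("PASD-main/checkpoints/demo.cfg", b"demo"),  # hypothetical file name
        ("PASD-main/README.md", b"readme"),
    )
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(fileobj=make_demo_archive(), mode="r:gz") as archive:
            print(extract_subdir(archive, "PASD-main/checkpoints", tmp))
```

For the real archive, the downloaded `main.tar.gz` would be opened with `tarfile.open(path, "r:gz")` and passed to the same function.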

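2. Training and Testing
Training preparation:
1. Download the Stable Diffusion 1.5 model ([HuggingFace](https://huggingface.co/runwayml/stable-diffusion-v1-5)) and place it in `checkpoints/stable-diffusion-v1-5`.
2. Prepare a training dataset (DIV2K, FFHQ_5K, Flickr2K, and others are supported; the repository provides dataset download links).

Training command:
Run the script to start training (the `--use_pasd_light` option trains the light model):
```bash
bash ./train_pasd.sh
```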

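Testing preparation:
Download the pretrained models ([pasd](https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models/PASD/pasd.zip), [pasd_light](https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models/PASD/pasd_light.zip), and others) and place them in the `runs/` folder.

Testing command:
Install the test dependencies, then run the test script (stylization and super-resolution parameters can be adjusted):
```bash
pip install -r requirements-test.txt
# Optional flags: --use_pasd_light (light model), --use_personalized_model (personalized style)
python test_pasd.py
```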
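When the test script is driven from another program, it can help to assemble the command as an argument list and only then hand it to `subprocess`. The sketch below is a minimal illustration: `build_test_command` is a hypothetical helper, and spelling the two switches with a leading `--` is an assumption based on the usual argparse convention, since the listing gives only the bare names.

```python
import shlex


def build_test_command(use_pasd_light=False, use_personalized_model=False, extra_args=()):
    """Return the argument list for running test_pasd.py with optional switches."""
    command = ["python", "test_pasd.py"]
    if use_pasd_light:
        command.append("--use_pasd_light")  # assumed flag spelling
    if use_personalized_model:
        command.append("--use_personalized_model")  # assumed flag spelling
    command.extend(extra_args)
    return command


if __name__ == "__main__":
    command = build_test_command(use_pasd_light=True)
    # Print the shell-quoted form for inspection instead of executing it.
    print(shlex.join(command))
```

To actually run it from the repository root, pass the list to `subprocess.run(command, check=True)`.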

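3. Hands-on: Gradio Demo
Run the following command to start a web interface where super-resolution, stylization, and the other features can be tried without writing code:
```bash
python gradio_pasd.py
```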

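Project Structure
Core files and folders in the repository:
`checkpoints/`: model configuration files and the Stable Diffusion 1.5 base model.
`datasets/`: dataset loading code (supports local datasets and the WebDataset format).
`pasd/`: core algorithm code (model architecture and diffusion process implementation).
`runs/`: training output folder (stores trained models).
`samples/`: example results (super-resolved and stylized images and animations).
`train_pasd.sh` / `test_pasd.py`: training and testing scripts.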
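Before training or testing, a quick check that a local checkout matches the layout listed above can catch a missing `checkpoints/` or `runs/` folder early. The following is a minimal sketch, not part of the repository: `missing_entries` is a hypothetical helper, and the expected names are taken directly from the list above.

```python
import tempfile
from pathlib import Path

EXPECTED_DIRS = ("checkpoints", "datasets", "pasd", "runs", "samples")
EXPECTED_FILES = ("train_pasd.sh", "test_pasd.py")


def missing_entries(repo_root):
    """Return the expected folders and scripts that are absent under repo_root."""
    root = Path(repo_root)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    # Demonstrate on a throwaway folder that contains only part of the layout.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "checkpoints").mkdir()
        (Path(tmp) / "test_pasd.py").write_text("")
        print(missing_entries(tmp))
```

Calling `missing_entries(".")` from the repository root should return an empty list once the checkpoints and pretrained models are in place.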

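Key Updates
September 2024: released PASD-SDXL, which outperforms PASD-SD1.5 and supports higher-resolution processing.
July 2024: paper accepted to ECCV 2024; paper version updated.
October 2023: launched the Colab Demo ([link](https://colab.research.google.com/drive/1lZ_rSGcmreLCiRniVT973x6JLjFiCb?usp=sharing)) and the Gradio Demo.
September 2023: first upload of the source code, pretrained models, and training scripts.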

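Notes
GPU memory optimization: the `tiled vae` method (from the multidiffusion-upscaler project) reduces memory use and makes ultra-high-resolution images processable.
Personalized models: community models can be loaded (for example majicMIX realistic for super-resolution, ToonYou for cartoon style); place them in `checkpoints/personalized_models`.
Parameter tuning: if the default results are unsatisfactory, adjust parameters such as `seed` (random seed), `prompt` (text prompt), and `upscale` (upscaling factor).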
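The memory saving from tiling comes from processing a large image as overlapping patches instead of one big tensor. The sketch below shows only that general idea, computing overlapping tile boxes for an image. It is not the tiled VAE implementation used by PASD or multidiffusion-upscaler, and the function names and the tile and overlap values are illustrative assumptions.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length) with overlap."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the edge
    return starts


def tile_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) boxes that together cover the image."""
    boxes = []
    for top in tile_starts(height, tile, overlap):
        for left in tile_starts(width, tile, overlap):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(1024, 512, tile=512, overlap=64):
        print(box)
```

Each box can then be processed on its own and the overlapping regions blended when the results are stitched back together, which keeps peak memory tied to the tile size rather than the full image size.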

Contact and Citation
For questions, contact the author: [yangtao9009@gmail.com](mailto:yangtao9009@gmail.com).
Citation format:
```bibtex
@inproceedings{yang2023pasd,
  title={Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization},
  author={Tao Yang and Rongyuan Wu and Peiran Ren and Xuansong Xie and Lei Zhang},
  booktitle={The European Conference on Computer Vision (ECCV) 2024},
  year={2023}
}
```

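The project offers an efficient and flexible solution for image super-resolution and stylization, and suits researchers and developers exploring diffusion-based image generation tasks.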

