README_pmv2.md

PhotoMaker V2: Improved ID Fidelity and Better Controllability Compared to PhotoMaker V1

[🤗 Demo]

When training PhotoMaker V2, we focused on improving ID fidelity. Compared to PhotoMaker V1, we introduced 1️⃣ new training strategies, incorporated 2️⃣ more portrait datasets, and utilized 3️⃣ a more powerful ID extraction encoder. We will release a technical report soon. Thank you all for your attention.

🌠 Key improvements in PhotoMaker V2:

  1. ID fidelity has been further improved, especially for single image input and Asian facial inputs. Of course, feeding more facial images can still yield better results.
  2. By integrating ControlNet, T2I-Adapter, and IP-Adapter, the generation process becomes more controllable. We provide corresponding scripts for reference. Additionally, PhotoMaker V2 allows users to achieve better ID consistency by combining it with IP-Adapter-FaceID, InstantID, and character LoRA.
  3. PhotoMaker V2 inherits the promising features of PhotoMaker V1, such as high-quality and diverse generation and powerful text control. It also still supports earlier applications such as bringing characters from old photos or paintings back to reality, identity mixing, and changing age or gender.

Comparisons with PhotoMaker V1, IP-Adapter-FaceID and InstantID

For comparison, we selected the three most prevalent methods for ID-personalized generation: PhotoMaker V1, IP-Adapter-FaceID-Plus-V2 (the best of the IP-Adapter-FaceID variants), and InstantID.

To ensure a fair comparison, we used the same base model (RealVisXL-V4.0) and scheduler (Euler), and selected the best out of four randomly generated images from each method for visualization. The prompts and negative prompts were consistent:

Prompt: instagram photo, portrait photo of a woman img holding two cats, colorful, perfect face, natural skin, hard shadows, film grain

Negative Prompt: (asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), open mouth
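The comparison setup above can be written down as a small configuration object. This is only a minimal sketch, not our evaluation code: the class name, field names, and seed scheme are hypothetical. It also assumes that `img` in the prompt is the trigger word that marks the ID subject (as in "a woman img").

```python
from dataclasses import dataclass
from typing import List

# Assumption: "img" is the trigger word that follows the class noun in the prompt.
TRIGGER_WORD = "img"


@dataclass(frozen=True)
class ComparisonConfig:
    """Hypothetical container for the shared settings used across all methods."""
    base_model: str = "RealVisXL-V4.0"
    scheduler: str = "Euler"
    num_candidates: int = 4  # best of four generated images is shown
    prompt: str = (
        "instagram photo, portrait photo of a woman img holding two cats, "
        "colorful, perfect face, natural skin, hard shadows, film grain"
    )
    negative_prompt: str = (
        "(asymmetry, worst quality, low quality, illustration, 3d, 2d, "
        "painting, cartoons, sketch), open mouth"
    )


def has_trigger_word(prompt: str, trigger: str = TRIGGER_WORD) -> bool:
    """Return True if the trigger word appears as a standalone token."""
    return trigger in prompt.split()


def candidate_seeds(base_seed: int, n: int) -> List[int]:
    """Illustrative seed scheme: n consecutive seeds, one per candidate image."""
    return [base_seed + i for i in range(n)]


cfg = ComparisonConfig()
seeds = candidate_seeds(42, cfg.num_candidates)
```

Keeping the prompt, negative prompt, base model, and scheduler in one place makes it easier to confirm that every method is run under identical settings.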

We can see that our method has advantages both in maintaining ID fidelity and in the quality of the generated images.

comp_pm_v2_reba

comp_pm_v2_musk

comp_pm_v2_yanzu

comp_pm_v2_yifei

Cooperation with ControlNet / T2I-Adapter / IP-Adapter

PhotoMaker V2 can work with T2I-Adapter's doodle mode, allowing controlled image generation based on user drawings and prompts. You can try this feature in [🤗 our official demo]. The following video shows an example of the workflow:

photomaker_v2_demo_small.mp4

Additionally, PhotoMaker V2 can work with ControlNet and T2I-Adapter for layout control, such as edge, pose, depth, and more.

We provide two example scripts:

  1. inference_pmv2_contronet.py
  2. inference_pmv2_t2i_adapter.py

The image below is an example of controlled generation using pose through ControlNet:

pm_v2_controlnet

For IP-Adapter, refer to our sample script: inference_pmv2_ip_adapter.py

The image below is an example:

pm_v2_ipadapter

PhotoMaker V2, as a plugin, works well with other plugins such as IP-Adapter-FaceID or InstantID to further improve ID fidelity, and can be combined with LCM for acceleration. We look forward to your exploration of more features, and we welcome PRs and other contributions to the open-source community.

🥳 If you have built, or know of, repositories or applications around PhotoMaker V2, please leave us a message in the discussion. We will include them in our README.

LICENSE

Since PhotoMaker V2 relies on InsightFace, use of PhotoMaker V2 must also comply with the InsightFace license.