Wan2GP - Multi Talk - single character
Wan2GP MultiTalk: Single Character Lip Sync. This video shows my workflow in MultiTalk for a single character speaking. We are running Wan2GP version 8.99 by DeepBeepMeep on an NVIDIA GeForce RTX 3060 Ti graphics card with only 8GB of VRAM. In this video I show you the settings I use to make sure the character we want to speak is actually the one doing the talking. This can be hit or miss, so I’m sharing some tips on how to make it more accurate.
Follow the free series to build and install free, open-source tools and get started with AI at home.
Wan2GP - Single Person Speaking
Step 1. Open your Wan2GP installation. If you don’t use a batch file, go back to the episode called “Batch File for Starting Wan2GP”:
https://youtu.be/cZnCNoCAvoc?si=9v23zzw5b2l9u9nW
Step 2. I use presets to help speed up the workflow. If you want to learn how to use presets, go to the previous episode, “Wan2GP Presets”, at this link:
https://youtu.be/b3lX-5EQia4
Select a preset, or select Wan 2.1 from the drop-down menu on the top left. Then select MultiTalk 480p from the drop-down menu on the top right.
Note: A preset loads everything you had loaded when you saved it, with the exception of the start image and the audio file. That is why I use them.
Step 3. Load an image into the Start video with image window by clicking on Drop Media Here. Always use the highest-resolution image you can. My image has three characters.
Step 4. Select Video Mask Creator from the topmost selection menu. Select Image and load the same image into the Upload image box. Then click on Load Image, which opens a new window called Step2:Add Masks.
Click on the character you are trying to isolate. The character will turn blue. Once you have the character selected, hit the Add Mask button at the bottom of the Step2 window. Then hit the Image Matting button at the bottom of that same window. It will open two more windows at the bottom: Foreground output and Mask.
Step 5. In the video I explain what the Bounding Box numbers mean.
Mask BBox Info (Left:Top:Right:Bottom) 22:18:52:90
In my example the numbers mean that the left edge of the mask is 22% from the left side of the image, the top edge is 18% from the top, the right edge is 52% from the left side, and the bottom edge is 90% from the top. This is the bounding box around the character that the MultiTalk model will make speak the included audio file.
Note: In MultiTalk single person speaking there is no place to put this BBox number.
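To make the percentages easier to picture, here is a minimal sketch that converts a Left:Top:Right:Bottom string into pixel coordinates. It assumes the percentage meaning described in Step 5. The function name and the 1000 x 500 image size are made up for illustration and are not part of Wan2GP.

```python
def bbox_percent_to_pixels(bbox: str, width: int, height: int) -> tuple:
    """Convert a 'Left:Top:Right:Bottom' percentage string to pixel edges.

    Hypothetical helper for illustration only; not a Wan2GP function.
    Left/Right are measured from the left side of the image,
    Top/Bottom are measured from the top of the image.
    """
    left, top, right, bottom = (float(part) for part in bbox.split(":"))
    return (
        round(width * left / 100),    # left edge in pixels
        round(height * top / 100),    # top edge in pixels
        round(width * right / 100),   # right edge in pixels
        round(height * bottom / 100), # bottom edge in pixels
    )


# Example with the BBox from the video and an assumed 1000 x 500 image.
box = bbox_percent_to_pixels("22:18:52:90", 1000, 500)
print(box)
```

The box width is the right edge minus the left edge, so in this example the character takes up 30% of the image width.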
Step 6. Since single person speaking in MultiTalk has no BBox field, use the mask itself to gain better control over the generation. Click on the “Set to Control Image & Mask” button at the bottom right of the window.
Step 7. Clicking the “Set to Control Image & Mask” button automatically returns you to the Video Generator screen.
Step 8. Now I select my audio file to add to the generation.
Note: In the video I trim my example audio down to 2 seconds. 3 seconds is optimal for my VRAM because it keeps the length down to 81 frames. You can generate whatever length you have enough memory for.
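The link between audio length and frame count is simple arithmetic. Here is a rough sketch of it. The 25 frames per second value is my assumption and is not stated in the video; check the frame rate your own generation uses. The function names are made up for illustration.

```python
import math

# Assumption: 25 frames per second. This value is not from the video.
ASSUMED_FPS = 25


def frames_for_audio(seconds: float, fps: float) -> int:
    """Frames needed to cover an audio clip of the given length."""
    return math.ceil(seconds * fps)


def seconds_for_frames(frames: int, fps: float) -> float:
    """Length of video, in seconds, that a frame count covers."""
    return frames / fps


# How long is an 81 frame generation at the assumed frame rate?
print(seconds_for_frames(81, ASSUMED_FPS))

# How many frames does a 2 second clip need at the assumed frame rate?
print(frames_for_audio(2, ASSUMED_FPS))
```

Under that assumption, 81 frames is a little over 3 seconds, which lines up with the 3 second clip length mentioned in the note.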
Step 9. Check the rest of the settings:
(In the video I realize I never loaded the preset, which loads all settings except the image and the audio clip. After I fix that, I go back to the mask generation again to make sure the mask is sent back correctly with the preset for this scene loaded. Sorry for the confusion.) Again, if you don’t use presets yet, go back to the previous video and learn how. Presets load everything with the exception of the start image and the audio file.
Once you have the settings adjusted to your liking, hit the Generate button.
