MyBits drops first YouTube video!

In this first episode I show you how I use Pinokio's one-click installer to download Wan 2.1, and how I animate my talking avatar for this channel.

This video is for the budget-minded hobbyist who doesn't want to break the bank to start doing AI generations at home.

Follow the series to build and install free, open-source tools and get started with AI at home.

In this video we use a DIY budget PC with only 8GB of VRAM to generate lip-syncing video avatars for free using Pinokio and Wan 2.1.

Wan 2.1 installation with Pinokio

 

 

Step 1. Download Pinokio.

Pinokio download: https://pinokio.co/

Once you have downloaded the zip folder, extract it. Open the extracted folder, find the Pinokio Setup application, and double-click it to install Pinokio. Windows will give you a warning; click the "More info" link and then hit the "Run anyway" button.

 

Step 2. Once it installs, navigate to the Discover tab and open the Discover page to select the application you want. (In this video we are installing Wan 2.1.)

Click on Wan 2.1, and the one-click installer will download and install all of the dependencies it needs.

 

Step 3. Once installed, you should see the home screen with your application in the left sidebar.

Click the Wan 2.1 icon to open it in a panel, then click the Start button to load it and run it for the first time. Be patient; it may take a while for the dependencies to download and install.

 

Step 4. Setting up Hunyuan Video Avatar 720p to run a generation for the first time.

(I have found that, with this machine's limited VRAM, this application works best for lip-sync video generations.)

 

Step 5. In the Start Image box, select the image of the character you want to make talk.

I use an image that is fairly large compared to the output resolution. I believe the image I use in the video is a 1764 x 1015 PNG.
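If you want a quick sanity check that your source image roughly matches the shape of the output, you can compare aspect ratios. A small sketch using the numbers from this walkthrough (the 832 x 480 output resolution is the one used in Step 12):

```python
# Compare the source image's aspect ratio to the output resolution's.
# Sizes taken from this walkthrough; swap in your own image dimensions.
src_w, src_h = 1764, 1015   # source PNG
out_w, out_h = 832, 480     # generation resolution used in Step 12

print(round(src_w / src_h, 3))   # 1.738
print(round(out_w / out_h, 3))   # 1.733
```

The ratios here are nearly identical, so little of the image is lost to cropping or squashing.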

 

Step 7. Upload an audio file into the Voice to Follow box.

Hit the scissors tool, drag the selection box over just the waveform you want in your clip, then hit the Trim button. This is a great tool for controlling the length of your video: because we are using low VRAM, the video has to be shorter than it could be on a better video card. Here we could have trimmed the audio clip to 5 seconds.
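If you want to check a clip's length before trimming, Python's standard-library wave module can read it directly. A minimal sketch; the file written here is just two seconds of silence so the example is self-contained, and the filename is made up:

```python
import wave

# Write a 2-second silent mono WAV so the example runs standalone.
rate = 16000  # samples per second
with wave.open("clip.wav", "wb") as w:
    w.setnchannels(1)                       # mono
    w.setsampwidth(2)                       # 16-bit samples
    w.setframerate(rate)
    w.writeframes(b"\x00\x00" * rate * 2)   # 2 seconds of silence

# Duration is simply total frames divided by the frame rate.
with wave.open("clip.wav", "rb") as r:
    duration = r.getnframes() / r.getframerate()
print(duration)  # 2.0
```

Run this on your own WAV file (skip the writing step) to see how much you need to trim to hit your target length.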

 

Step 8. Write a prompt for the image you are making talk. What you put here makes a huge difference. I try to keep it simple and to the point: I tell the AI model what the character is doing, what the background is doing, and what the camera is doing. I have found you need to experiment with this to get it to do what you want, and not what it wants.

 

Step 9. Write the negative prompt. Again, this is not the same for all generations; I tend to reinforce the positive prompt and try to eliminate things the model tends to do. Keep in mind we are using very limited VRAM.

 

Step 10. Set the number of frames. Even 5 seconds of video is pushing it for our low-VRAM card, so at 25 fps, 120 frames will be difficult and will likely run out of system RAM.
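The frame arithmetic behind this step is simple: frames = frame rate x seconds. A quick check using the numbers above:

```python
fps = 25       # output frame rate used in this walkthrough
seconds = 5    # target clip length

# A full 5 seconds at 25 fps needs this many frames:
print(fps * seconds)   # 125

# So 120 frames falls just short of 5 seconds:
print(120 / fps)       # 4.8
```

In other words, if you trimmed the audio to exactly 5 seconds, 120 frames will cut off the last fraction of a second; either trim slightly shorter or bump the frame count.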

 

Step 11. Number of Inference Steps. With our low-VRAM (8 GB) card, 30 steps will take a long time (you can experiment with this setting as well). The accuracy of the lip sync, how closely the prompt is followed, and the resolution quality are all affected by this setting.

 

Step 12. A resolution of 832 x 480 is about the best it gets for our low VRAM, and the generation takes forever: roughly 45 minutes to an hour.

 

Step 13. You can try different settings for the Shift Scale and Guidance (CFG); as of this video I have only used the defaults.

 

Step 14. Generate the first video.

Hit the Generate button and be patient: the model has to load the first time you run it, so it may take longer than subsequent runs.