If you are looking for an introduction to deep learning, check this article out. If you are looking for an introduction specifically to deepfakes, read this and this.
When I first heard of deepfakes at the beginning of January 2018, it seemed certain we were yet again on the edge of an utterly groundbreaking event. As a semi-active Reddit user, I became aware of the concept a few days before it made its way into the mainstream news, first through YouTubers and then through news channels.
Imagine my disappointment when I learned that 99% of the content posted in /r/deepfakes was porn. Really? You get a user-friendly app any moron can use and, apart from a few Trump/Nicolas Cage pieces, all anyone is doing is using the technology to put the cast of Game of Thrones into porn sets. Among others, one could see Emma Watson, Scarlett Johansson, Gal Gadot and Chloë Grace Moretz. The popularity of the last name made me realize that perhaps the vast majority of /r/deepfakes content is generated by sexually frustrated teenage boys, but then again, perhaps that is now the primary demographic building the foundations of progress. I never expected to see a pornographic fake of Michelle Obama, and yet that image is now forever imprinted in the depths of my mind. Before finding out about all of the above, I had a vision similar to the one portrayed in the questionably good film “The Congress” (2013). After all, if most of the content was generated by a bunch of teenagers with a basic knowledge of how to use an image search engine, then surely the people behind the code were capable of much more sophisticated results.
Knowing that machine learning is already being used for real-time face capture and reenactment (Stanford University), lip sync generated from audio (University of Washington), voice manipulation (#VoCo, Adobe) and generalized text-to-speech animation (Disney), among many other ethically gray practices, I have grown increasingly suspicious as to where we might be headed, fueling my suspicions with episodes of Black Mirror and Westworld, as have many others like me. What I saw was the promise of something much greater that would soon become part of everyday reality. A few months went by and my curiosity could no longer be ignored, especially since the groundbreaking revolution I was expecting did not seem to occur. Deepfakes became synonymous with porn, with both Reddit and Pornhub banning all NSFW FakeApp-generated content (and if you get banned by those websites, you are left with 4chan and the dark web, so no wonder deepfakes became infamous). It is only a matter of time before someone applies the very principles that drove FakeApp 2.2 to another app that deserves mainstream appeal. It took me a while to decide who to merge.
Despite having thousands of possibilities, my mind had been infected by the explicit fakes I had already seen. Being the London-based hipster that I supposedly am, I decided to lean away from convention, so it was important to me that the merge had little to do with sex or Nicolas Cage. Merging Jordan Peterson and Kermit came to mind, but that was obviously a stupid idea. In the end, the choice fell on Prof. Jordan Peterson and Keegan-Michael Key, as he appears in the Key & Peele sketch “Substitute Teacher”. It seemed appropriate enough. Each data set was generated from YouTube videos. JBP has lots of interviews in extremely high quality (including 4K), which made the process fairly easy. Key’s face was not as easy to extract, since it was important that his look matched the one in the sketch. As one half of a comedy duo, he appears in a tremendous number of videos but, surprisingly, in fewer interviews than one would expect, and (even more surprisingly) not in great quality either. His data set ended up being generated from many 720p videos that I sliced up in Premiere.
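The slicing here was done by hand in Premiere. As a hedged alternative, not what was used for this project, the same "cut a usable segment and dump its frames" step can be scripted with ffmpeg. The sketch below assumes ffmpeg is installed and on PATH; the function names, file names, time ranges and frame rate are hypothetical. It uses ffmpeg's `-ss` (start), `-t` (duration) and the `fps` video filter to sample a fixed number of frames per second from one segment.

```python
import subprocess
from typing import List


def build_frame_extract_cmd(video: str, start_s: float, end_s: float,
                            fps: float, out_pattern: str) -> List[str]:
    """Build an ffmpeg command that samples `fps` frames per second from the
    [start_s, end_s) range of `video` and writes them as numbered images."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [
        "ffmpeg",
        "-i", video,                     # source clip, e.g. a downloaded interview
        "-ss", f"{start_s:.3f}",         # start of the usable segment, in seconds
        "-t", f"{end_s - start_s:.3f}",  # length of the segment, in seconds
        "-vf", f"fps={fps:g}",           # keep only `fps` frames per second
        out_pattern,                     # numbered output, e.g. "frames/key_%05d.png"
    ]


def extract_frames(video: str, start_s: float, end_s: float,
                   fps: float, out_pattern: str) -> None:
    """Run the command (hypothetical helper; assumes ffmpeg is on PATH and the
    output directory already exists, since ffmpeg will not create it)."""
    subprocess.run(build_frame_extract_cmd(video, start_s, end_s, fps, out_pattern),
                   check=True)
```

For example, `build_frame_extract_cmd("interview_720p.mp4", 12.5, 40.0, 5, "frames/key_%05d.png")` describes a 27.5-second cut sampled at 5 frames per second. Placing `-ss` after `-i` makes ffmpeg decode and discard everything before the start point, which is slower than seeking on the input but cuts exactly where requested. Sampling a few frames per second rather than every frame also avoids filling the data set with near-identical faces.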
After training the model for 24 hours, I generated the first video. The results were atrocious, despicable and slightly frightening. It occurred to me that perhaps the subject had to be separated from everything else. After whacking a mask around Key and slapping a few keyframes on the timeline, it was ready. I trained the model a little longer, and when it reached 72 hours of training in total, I generated the second part. A tiny bit more promising, but still far from Westworld. Inspired by “A Scanner Darkly” (2006), I decided it was time to say goodbye to this model and try again with different subject matter.
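The masking above was done by hand with keyframes in Premiere. Conceptually, separating the subject from everything else amounts to per-pixel alpha compositing: a mask value of 1 keeps the swapped face, 0 keeps the original frame, and a feathered (blurred) edge blends the two so the seam is less visible. The sketch below is a generic illustration on tiny grayscale images stored as lists of rows; it is not how Premiere or FakeApp implement masks, and the names `feather` and `composite` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def feather(mask: Image, radius: int) -> Image:
    """Soften a hard 0/1 mask with a box blur so its edge fades out gradually."""
    h, w = len(mask), len(mask[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            # average over the (2*radius+1)^2 window, clipped at the image border
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [mask[yy][xx] for yy in ys for xx in xs]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def composite(swapped: Image, original: Image, mask: Image) -> Image:
    """Alpha-blend per pixel: mask 1.0 keeps the swapped face, 0.0 keeps the original."""
    if not (len(swapped) == len(original) == len(mask)):
        raise ValueError("images must have the same height")
    out: Image = []
    for row_s, row_o, row_m in zip(swapped, original, mask):
        if not (len(row_s) == len(row_o) == len(row_m)):
            raise ValueError("images must have the same width")
        out.append([m * s + (1.0 - m) * o for s, o, m in zip(row_s, row_o, row_m)])
    return out
```

A hard mask produces a visible seam wherever the generated face and the original frame disagree in color or lighting; feathering trades that seam for a short gradient, which is why masks are usually softened before blending.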
How do I improve my training data set quality? Ideally, your training data sets have similar lighting and face angles between the two faces to be swapped. After aligning and extracting faces, you should also remove any obscured faces from the C:\fakes\data_A\aligned and C:\fakes\data_B\aligned folders. Remove faces that have hair, hands, glasses, or other objects partially obscuring the face. You should also remove extreme lighting conditions, such as black and white photos or distorted color filters. You also need to remove any failed alignments, which usually look like diagonal full size photos.
source: https://www.deepfakes.club/faq/#Why_do_my_results_look_bad
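Some of the checks the FAQ describes can be partly automated before the manual pass. The sketch below flags aligned faces that look black-and-white or have extreme lighting. It is a hypothetical helper, not part of FakeApp, and the thresholds (`dark`, `bright`, `gray_tol`) are assumptions to tune by eye. It cannot detect hair, hands, glasses or failed alignments, so those still need a human look through the `aligned` folders.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def mean_luma(pixels: Sequence[Pixel]) -> float:
    """Average Rec. 601 luma (0-255) of an RGB image."""
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / len(pixels)


def mean_channel_spread(pixels: Sequence[Pixel]) -> float:
    """Average of max(R,G,B) - min(R,G,B); close to zero for black-and-white images."""
    return sum(max(p) - min(p) for p in pixels) / len(pixels)


def review_reasons(pixels: Sequence[Pixel], dark: float = 40.0,
                   bright: float = 215.0, gray_tol: float = 8.0) -> List[str]:
    """Return reasons an aligned face might deserve removal (empty list = no flags)."""
    if len(pixels) == 0:
        raise ValueError("empty image")
    reasons: List[str] = []
    if mean_channel_spread(pixels) <= gray_tol:
        reasons.append("grayscale")
    luma = mean_luma(pixels)
    if luma < dark:
        reasons.append("too dark")
    elif luma > bright:
        reasons.append("too bright")
    return reasons
```

Pixels can be loaded with, for example, Pillow: `list(Image.open(path).convert("RGB").getdata())`. Flagged files are better moved to a separate folder for review than deleted outright, since a strongly lit but otherwise good face may still be worth keeping.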
In hindsight, it seems quite important to pay attention to such aspects of the source videos as blurriness, compression artifacts, contrast ratio and lighting conditions. Resolution is not as important as I thought. (video reference)
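Blurriness and contrast can be given rough numbers, which makes it easier to rank frames than to eyeball thousands of them. A common sharpness heuristic is the variance of the Laplacian (sharp edges give large, varied responses, blur flattens them); with OpenCV it is often written as `cv2.Laplacian(img, cv2.CV_64F).var()`. The pure-Python sketch below assumes a grayscale frame stored as rows of intensities. It uses the standard deviation of intensities (RMS contrast) as a stand-in for the "contrast ratio" mentioned above, which is a related but different measure. It does not measure compression artifacts, and there is no universal cutoff: the idea is to compare frames within one data set and drop the lowest scorers.

```python
from statistics import pstdev, pvariance
from typing import List

Gray = List[List[float]]  # grayscale frame as rows of 0-255 intensities


def laplacian_variance(gray: Gray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels; lower means blurrier."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def rms_contrast(gray: Gray) -> float:
    """Standard deviation of pixel intensities (RMS contrast on the 0-255 scale)."""
    return pstdev([v for row in gray for v in row])
```

A perfectly flat frame scores 0 on both measures, and a smooth horizontal gradient also scores 0 on the Laplacian measure, since it has no edges, while a sharp checkerboard scores high.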