This was mentioned a couple of times on the main general, and while it's pretty messy, some of the pieces do work on their own. There's just nothing that makes it a straight automated pipeline, or even something turnkey and hassle-free to run at the moment. I figured that since everyone here is more or less determined and committed to the craft, maybe we could get some of the best minds, plus a little push from ChatGPT, to get this working and help streamline the process of turning anime episodes into datasets.
https://github.com/cyber-meow/anime_screenshot_pipeline
Let me provide some of my notes and observations from what I have done so far:
With frame extraction, as stated in the GitHub readme, you are taking a 24 minute episode of about 34k frames and condensing it to an average of 4k/6k/9k non-frozen/dead frames, depending on the show, episode, studio, or era of said source. The work is being done by ffmpeg's `mpdecimate` filter, whose purpose is to "drop frames that do not differ greatly from the previous frame in order to reduce frame rate."
The frame extraction command with ffmpeg provided in the GitHub readme works fine. The issue is that the repo author's bulk file script, `extract_frames.py`, doesn't play nice: it only creates the output folders, and the ffmpeg call inside it fails to execute. Based on some previous errors I ran into, I did consider that the video file naming could be what trips up the script, but it isn't an issue when running ffmpeg directly, so I sidestepped the bulk script.
Since I already compiled the datasets I'm currently working on by manually running the command, I haven't needed to go back and retry the script with any modifications. ChatGPT did offer some suggestions, but it needed a copy of the error output to review, which I no longer had and didn't have time to reproduce.
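For anyone who wants a stopgap for the bulk step, here is a minimal sketch of a replacement loop. It is not the repo's `extract_frames.py` and it does not reproduce the exact command from the GitHub readme: the function names, the `.mkv`/`.mp4` extension list, the PNG output pattern, and the bare `mpdecimate,setpts=N/FRAME_RATE/TB` filter chain are all my own assumptions. It passes the command to ffmpeg as a list rather than a shell string, which should sidestep quoting problems with spaces and brackets in file names, and it prints ffmpeg's error output when a file fails.

```python
import subprocess
from pathlib import Path

# Assumed set of container formats; adjust to whatever your sources use.
VIDEO_EXTENSIONS = {".mkv", ".mp4"}


def build_extract_command(video: Path, out_dir: Path) -> list[str]:
    """Build an ffmpeg command that drops near-duplicate consecutive frames."""
    # mpdecimate drops frames that barely differ from the previous one;
    # setpts renumbers the survivors so ffmpeg does not pad the gaps
    # with duplicated frames when writing an image sequence.
    filters = "mpdecimate,setpts=N/FRAME_RATE/TB"
    pattern = out_dir / f"{video.stem}_%06d.png"
    return ["ffmpeg", "-i", str(video), "-vf", filters, str(pattern)]


def extract_all(src_dir: Path, dst_dir: Path) -> None:
    """Run the extraction for every video in src_dir, one folder per episode."""
    for video in sorted(src_dir.iterdir()):
        if video.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        out_dir = dst_dir / video.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        # Passing a list (no shell) keeps spaces and brackets in file names intact.
        result = subprocess.run(
            build_extract_command(video, out_dir),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # Show ffmpeg's own error output instead of failing silently.
            print(f"ffmpeg failed on {video.name}:\n{result.stderr}")
```

Usage would be something like `extract_all(Path("episodes"), Path("frames"))`, with both folder names being placeholders. One caveat: a `%` in a video file name would clash with ffmpeg's numbering pattern, so rename those files first.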
For similar image removal, the base application running the filter is called `FiftyOne`, an open-source toolkit for building and curating computer vision datasets, with one recent use being to build clean visual databases for vehicle autopilot AI. Running `remove_similar.ipynb` in Jupyter Notebook gives you a second round of filtering that removes duplicate or very similar frames above a certain threshold across the entire dataset, instead of just the sequential frames that mpdecimate compares. This covers cases where the animation is stretched out during talking scenes where only the mouth moves, standing shots where the camera isn't being panned, etc.
The script uses a default threshold of `0.985` for what counts as a duplicate. I've noticed that even at this value, some frames that shouldn't have been were considered duplicates and purged, but that's what manual review is for if you need higher accuracy in a dataset.
The main issue I ran into was that with my dataset (could be a personal issue), the duplicate image detection notebook was painfully slow, reading about 1 sample/s. That's one and a half hours to sort through a 24 minute episode's worth of already filtered frames.
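To make the threshold less of a black box, here is a rough sketch of how a cutoff like `0.985` can work. I'm assuming the notebook compares per-frame model embeddings by cosine similarity; I haven't confirmed that against the notebook code, and the function names below are made up for illustration. Each frame is compared to the frames already kept, and it is dropped if it is more similar than the threshold to any of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicates(embeddings, threshold=0.985):
    """Return indices of frames to drop, keeping the first of each similar group."""
    kept, dropped = [], []
    for i, emb in enumerate(embeddings):
        # A frame is a duplicate if it is too close to any frame already kept.
        if any(cosine_similarity(emb, embeddings[k]) > threshold for k in kept):
            dropped.append(i)
        else:
            kept.append(i)
    return dropped
```

Raising the threshold toward `1.0` keeps more frames and lowering it purges more, which is the knob to turn if too many good frames are getting removed. This pure Python version compares every frame against every kept frame, so it is only meant to show the logic; on thousands of real embeddings you would want a vectorized version.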
Through some trial and error and ChatGPT QA, I found that switching the model used in the script provided much faster results.
If you want to test your luck, switch out the following in Cell 2:
`model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")`