/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.

(14.05 KB 480x360 yE27iAozf74.jpg)

Version 315 hydrus_dev 07/18/2018 (Wed) 22:22:48 Id: d7b61a No. 9429
https://www.hooktube.com/watch?v=yE27iAozf74

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v315/Hydrus.Network.315.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v315/Hydrus.Network.315.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v315/Hydrus.Network.315.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v315/Hydrus.Network.315.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v315/Hydrus.Network.315.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v315.tar.gz

I had a mixed week. The gallery rewrite proved more complicated than I expected, so I put it off for a week.

all misc stuff this week

I updated to the new version of wxPython. It has a bunch of bug fixes that I think will help with some menu regeneration problems some users have had, and I hope it improves stability for the platforms/flavours where crashing has been a persistent issue. Please let me know if you notice your jank-situation improve or decline.

The old default tag import options ui under options->importing is now completely removed. Everything is now under network->downloaders->manage default tag import options, which also has some new copy/paste buttons to help mass-apply TIOs across multiple specific url classes in its list.

You can now set default 'checker options' for subscriptions under options->downloading. The various checker options buttons across the program also have tooltips for quick review.

I added some human-friendly number sorting this week and applied it to the paths that get entered in the file import paths dialog (when you drop a folder on the client). It makes files named like 'Favourites - 10.jpg' sort [10, 11, …, 99, 100] rather than the purely lexicographic [10, 100, 11, …, 99]. If you have had issues here, let me know how it works and where else it could apply!

The migrate database dialog got a pass. It presents some information better and its workflow is simpler. I'll update the help soon to match it.

The moebooru parser now fetches the original 'png' of files where available, and I added a tumblr post parser that also fetches the post tags. If you want this, hit network->downloader definitions->manage url class links and edit->set it for the 'tumblr file page api' entry.

full list

- got started on the big gallery update, but decided not to pull the trigger just yet. I hope to do it next week, switching the whole thing over to a two-object multi-watcher kind of deal
- updated to wxPython 4.0.3 for all platforms
- cleaned up some menubar replacement code, and the update to the new wxPython should also fix an "event for a menu without associated window" bug some gtk2 users were seeing on quick menubar changes
- manage default tag import options panel now has copy/paste buttons that work on the listctrl
- added some 'paste tag import options' safety code to make sure no one accidentally pastes a subscription or something in there, wew
- added default checker options for subscriptions to options->downloading
- unified how checker options are edited from their button, much like how file and tag import options work. it also has a summary tooltip on the button
- the checker options under options->downloading are now these slimmer buttons
- in the manual import dialog (which pops up when you drop a folder/files on the client), the files will now be added in 'human friendly' number sorting, so files of the sort 'Favourites - 10.jpg' will sort [10, 11, …, 99, 100] rather than the purely lexicographic [10, 100, 11, …, 99]
- gave the migrate database dialog a pass: a bunch of misc presentation changes and a general simplification of workflow, now based more on just increase/decrease location weight
- a bunch of texts on page management (left-hand) panels that share horizontal space with buttons should now ellipsize ("downlo…") when they get too long for the width instead of drawing in an ugly way over the buttonspace
- moved the manage import folders dialog to the new listctrl and added a 'paused' and better 'check period' column
- if a user tries to run a 'paused' import folder specifically from the menu, the import folder will now unpause (I will probably remove this old paused variable in the future, as it isn't of much use any more)
- tightened up some repository reset code that wasn't deleting all service tables and hence wasn't recovering from some service id malformation errors correctly
- wrote a 'clear orphan tables' db maintenance routine that kills some spare tables that users who have previously deleted/reset repositories may have floating around
- fixed an issue with parsing folders after hitting the cancel button on the import files pre-dialog
- if watchers encounter non-404 network errors during a check, they should now just delay checking for four hours (before, they were also pausing checking completely)
- if watchers are in 'delay' mode, they'll also not work on files
- file and gallery downloads that hit a 403 (Forbidden) will now present a simpler error status, like they do for 404
- the new post downloader will no longer fail if one of the parsed source urls is not a url. the borked string will also not be associated as a url
- regular gallery downloads now override bandwidth for the file download step, which is almost always the second half of a pair of post_url/file downloads, just to keep things in sync in edge cases
- cleaned up some timestamp generation and 'overriding in x seconds' strings to be more human friendly
- improved some serverside file parse error handling to propagate the actual error description up to the client a bit better
- fixed a typo causing an incorrect num_ignored count in the file import status button right-click menu
- parseexceptions will now present more data about which page and content parser caused the problem. I am not totally happy about how this solution works and may revisit it
- the lz4 import error catching is now broader, to catch some odd problem I discovered in the new Linux build environment
- the moebooru parser now fetches the original png of an image, if available
- added a new tumblr parser that also gets post tags. it _shouldn't_ be the default
- the new login pipeline now kicks in for the legacy logins (pixiv and hentai foundry) on a per-url basis, so adding pixiv/hf urls to the url downloader will trigger a login if needed (previously, this was tied to legacy gallery initialisation, which explains some pixiv 'missing' login stuff some users and I were having trouble with)
- if the legacy login system fails in the new pipeline, it now sets a flag and won't try again that client boot
- the old 'default tag import options' panel is now completely removed from options->importing. please check 'network->downloaders->manage default tag import options' for the new url-based settings
- misc fixes

next week

I'll see about this gallery stuff. I meant to sneakily swap out the basic gallery parsing with the new code this week, but it was way too awkward on its own, and I realised I had to do a bigger 'search' improvement along with it. I expect to convert regular download pages to something like the multi-watcher, something that lists and works on each individual search separately. I've got most of the data-side code written here, so I really just have to make some new ui and then test it a bunch.
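The 'human-friendly number sorting' in this release can be sketched as a natural-sort key: split each filename into text and digit runs and compare the digit runs as integers. This is a minimal Python illustration, not hydrus's actual code; the name `human_sort_key` is hypothetical, and lower-casing the text runs is an extra assumption.

```python
import re

# Capturing group so re.split keeps the digit runs, e.g.
# 'Favourites - 10.jpg' -> ['Favourites - ', '10', '.jpg']
_DIGITS = re.compile(r"(\d+)")

def human_sort_key(name: str) -> list:
    """Hypothetical natural-sort key: digit runs compare as numbers, text as text."""
    parts = _DIGITS.split(name)
    # With a capturing group, digit runs always land at odd indices,
    # so a str is never compared against an int at the same position.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]

names = ["Favourites - 100.jpg", "Favourites - 11.jpg", "Favourites - 10.jpg", "Favourites - 99.jpg"]
print(sorted(names))                      # purely lexicographic: 10, 100, 11, 99
print(sorted(names, key=human_sort_key))  # human-friendly: 10, 11, 99, 100
```

Drop the `.lower()` call if case should affect the order.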
>>9466 With FLIF, I thought of that too. I don't share images, so if my entire image archive were converted to FLIF I would have very few problems. However, the only ones I would want auto-converted are PNG/other lossless images into another lossless codec, so that converting them out of it for sharing is lossless too. For JPEGs it's image-specific, but for hydrus internals, as in the thumbnails, I'm OK with every single image being converted, since those are not for sharing at all, and I have around 50GB of an SSD filled with hydrus thumbs. Even if it's an incrementally smaller size, it adds up. As for me processing them, it's honestly just a rough pass, only rating things I want to keep track of as 'favorite' and things I need to keep track of as 'favorite' and 'tag later' till I can burn through duplicates. I don't really want to go through all my files and really get down to business with rating. That said, I'm rating in gated ways:
pony/not pony
funny/not funny
art/not art
safe/suggestive/explicit
real/drawn
tag later/nothing
favorite/nothing
For the most part this massively culls images into areas where I can whittle down what I want to deal with. Art is largely 'is this an artistic pose/something of value to learn from' rather than something like anime. Real/drawn segregates images fairly hard, but the first hard gate is safe/suggestive/explicit, denoting 'not porn', 'I don't consider it porn' and 'porn'. If you are dealt with in gate 1, you will be rated that no matter what, while most other things are modifiers. Real/drawn is also a gate status; however, honestly, getting that and 'is it porn' can be done in one go, creating 6 distinct categories for images. I'll likely gate off the drawn ones by eastern or western style, then inside each style, comic page or not, and from there stand-alone or multi-image, with this being an end state, as there is no way in fuck I'm tagging every god damn comic page. However, I would also set a 'tag further' status on the ones that are worth tagging.
I like working in ratings more than tags, as it's currently far less of a hassle to set them than it is to set a tag. What I want is something expedient, because if it took me a minute to tag an image, and in said minute 6 more images came into the archive, it would take a depressingly long time. However, if it takes me 1 second to see the image and determine whether it is drawn or real, another second or two to see if it's porn or not, and a once-over to see if it's funny, I could possibly rate at a pace faster than my import. Before I even consider doing that, though, I have to have a way to notate an image and delete it, then see the notation on why it was deleted; otherwise I am needlessly triggering my OCD by adding any amount of images to the archive, as then I have to check 'why was this deleted, was it a mistake?' With 2.3 million files, of which I may only want to keep 1 million, and around 280k of them being duplicates or potential duplicates… my image archive is a fucking nightmare. Thank christ for HDDs getting bigger faster than I use space.
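The two hard gates in the post above (safe/suggestive/explicit and real/drawn) multiply out to the six categories mentioned. A tiny Python illustration; the gate values come from the post, while the dict layout is purely hypothetical.

```python
from itertools import product

# Hypothetical layout: each hard gate is a rating with mutually exclusive values.
HARD_GATES = {
    "content": ["safe", "suggestive", "explicit"],
    "medium": ["real", "drawn"],
}

# Every image lands in exactly one combination of the hard-gate values.
categories = list(product(*HARD_GATES.values()))
for category in categories:
    print(" / ".join(category))
print(len(categories), "categories")  # 3 * 2 = 6
```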
Just found out about this program. I was totally expecting to see a dead board, but it's nice to see that this is getting updates, though I'm not sure why you would need weekly updates for this kind of thing.
>>9453
>Some additional error information is supposed to pop up if you try to open externally if you have help->debug->report modes->callto report mode turned on
I get this:
>2018/07/24 22:36:39: Attempted to launch /mnt/ext4/Hydrus/db/client_files/f51/51a1fb14b8227f0f27b1d6d1278deefa2f3cdbab0b7977cf6eab6a38c539dd27.webm using command ['mpv', '/mnt/ext4/Hydrus/db/client_files/f51/51a1fb14b8227f0f27b1d6d1278deefa2f3cdbab0b7977cf6eab6a38c539dd27.webm'].
>2018/07/24 22:36:39: No stdout or stderr came back.
>>9473 Well, for now I've replaced the liblzma file in the hydrus directory with the one I had in /usr/lib/ and it works. If fixing it is a huge mess, it's not a huge priority for me, as everything seems to work fine like this.
>>9471 Weekly updates… go back a few versions and see what changes week to week. Some changes are massive, some are incremental. Take a look at when the multi thread watcher got implemented and how each release after improved it. The most basic form of it was workable and a huge resource saver; in fact, I don't think I can make the program crash in the same ways I used to since it got added. I think I have somewhere around 500-700k images in thread watchers just waiting to be dealt with. A good number of these are already in the archive and are duplicates, so I'm holding off the mass processing, but having them all separated into themes that can be quickly parsed and feeling like I did something… that's really nice.
When importing files with 315, the order is always forced to be by name. This is quite disastrous for me, as I always sort my import folder by date created and then drag the files into hydrus to save that time (since hydrus only cares about imported time and not created-on-PC time). I mostly download images from boorus with the booru tag parser, and that gives them a hash filename, so being forced to have them uploaded in name order is messy enough to keep me from using 315 (the "solution" would be to use a renamer to have the name reflect the date created time, but it's a mess because of the paired txt files for tags). Anyway, is this an intentional change or a bug? Is there an option to get back the drag&drop order? I really can't use 315 because of this.
>>9466 >>9455 The only real impediment to me adding more image format support is library support. I use Pillow (PIL) and OpenCV to do all my image rendering, so if either of them adds FLIF or HEIF, I can usually add support in hydrus within a week. Or, if you can point me to a simple library on PyPI (pip) that can take a FLIF and give back simple metadata like resolution and go path->bmp no prob, please do and I will check it out. My general feeling on this is that these are all meme formats for now, and everyone is waiting to see which is actually going to be useful IRL for real problems. WebP is another. I am personally hoping for high colour depth (HDR, proper greens etc…) to take off in the next five years or so, so I am personally cheering for >8bits per channel. A quick look suggests FLIF does up to 16, so it gets my vote. iirc, WebP is stuck at 8. I haven't touched HEIF much, but I think it has a gif-tier mickey-mouse container format that supports some kind of animation; my rough guess is it is Apple-backed corporate bullshit that may only actually work well on iPhones. Is there a good lossy format in the pipeline? Something with high colour depth, decent compression and CPU/GPU usage, and no over-engineered 'it streams great from our legacy CDNs/proprietary hardware' complexity? I'd ideally like a format that could do both lossless and lossy, and then we could shove the default decision on how to encode image data to the computer and cut the normies completely out of the loop. But maybe the companies would make the default '80% quality' shit, ha.
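As a rough illustration of the 'give back simple metadata like resolution' part, and of checking whether a file actually stores more than 8 bits per channel, here is a stdlib-only Python sketch that reads a PNG's IHDR header. PNG is only a stand-in: this does not read FLIF or HEIF at all. The function names are hypothetical, and it parses the header only, without decoding pixels or verifying the chunk CRC.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_header_metadata(data: bytes) -> dict:
    """Hypothetical helper: read width, height and bit depth from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte big-endian length, 4-byte type, 13-byte body.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("PNG does not start with an IHDR chunk")
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", data[16:26])
    # For greyscale/truecolour images bit_depth is bits per channel (8 or 16 for RGB);
    # for palette images (colour type 3) it is bits per palette index instead.
    return {"width": width, "height": height, "bit_depth": bit_depth, "colour_type": colour_type}

def read_png_metadata(path: str) -> dict:
    # Signature (8) + chunk length/type (8) + IHDR body (13) = 29 bytes are enough.
    with open(path, "rb") as f:
        return png_header_metadata(f.read(29))
```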
>>9456 Thanks. Some of this 'deal with larger groups of potential dupes when some relationships are already known more efficiently' logic is in there, but the whole duplicate system is awaiting a rewrite from the current 'masses of pairs' logic to 'defined groups', which will make figuring out which pairs to present and in which order a lot easier. My very very old 'figure out rating' filter had side-by-side, and it was unfortunately a workflow and maintenance disaster, enough that I deleted the whole thing. Having the quick flick back-and-forth is much better for seeing differences, although again I'd like to improve how the viewer works at high zoom and so on and so on. Do you happen to still have that 404-erroring URL? If it happens again, please check the URL and send it on. And visit it in your browser, just in case there is some unusual or broken 404 that is being generated or something.
>>9466 >>9461 >>9469 I like to do something like:
system:inbox
system:limit=512
rating:has rating 'space babes, very important check these first'
You can clear them out in batches without making your session sluggish, and just hit f5 to throw up another random sample of whatever the rating category is. I've got a bunch of named ratings that I have mapped on shortcuts, so when a thread is dead, I can just ctrl+a and then mass-give them an appropriate quick search rating so I can find that pool of the inbox again.
A side thing on FLIF and high colour depth etc…: I assume that if/when we move to HDR displays, old jpegs and anything else 8bit are going to look a bit flat, in the same way you can tell a gif by the shit palette. Any mass conversion to a new format we do will be tinted by this legacy issue. It isn't as big as going from black and white to colour, but I think it may matter, and maybe we'll end up having neural networks doing 'colourising' for deep greens or whatever, since that tech is happening at the same time.
>>9466 For advanced ML, my guess is that the AI are quickly going to be so good, we'll just fall in love with the new stuff (and them by proxy) and soon forget about our broken old human relationships, and then a few years later, they'll just kill us all in our sleep out of pity. So it probably isn't a long-term issue to think about. :^) Thanks for your thoughts on the dupe stuff. I like this idea. The original dupe system works on collections of pairs, which is the biggest thing holding me back right now. I'll convert the db side to 'groups', which will be much easier to deal with, logically, for larger decisions like these. I've still got about a hundred quality-of-life jobs to do from the initial dupe work and I am buried in the downloader now, but maybe I can catch up on some of this stuff and fit in the db change (which will also affect siblings and parents, which have very similar issues) in the interim between the download job finishing and the next big job (chosen by user vote) starting.
>>9478 >>9471 I'm a leave-my-homework-until-the-night-before kind of person, so if I have a release period of a month, I won't do anything significant for the first two weeks at least. This weekly schedule has kept me ticking ok for a few years now, although I still have problems managing my overall todo and thinking in the more strategic 'where is this going' long-term. It is also helpful for quick iterating with feedback–I can try out a button for some odd, complicated task, and if it doesn't quite do the job for the users who care, I can change it again the next week. Another nice thing is when I fuck up, I can roll out a fix in a few days without too much hassle. If you get into hydrus, you don't have to update every week if that doesn't work for you. Plenty of users only dip in once every two or six months or whatever. Let me know if you run into any problems!
>>9473 >>9474 Thanks for this. Linux environment shit is all greek to me, but yeah, it looks like the process started by hydrus is inheriting the cwd or something pyinstaller set up and so pulling .so files in from hydrus dir instead of mpv. I will have another look at this, see if there is some flag I can set up to say 'new process instead of child process' or whatever works.
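For the mpv/liblzma problem above, one likely culprit, assuming the Linux build is made with PyInstaller, is not the cwd but `LD_LIBRARY_PATH`: PyInstaller's Linux bootloader points it at the bundle's own library folder and, per PyInstaller's notes on launching external programs, keeps the user's original value in `LD_LIBRARY_PATH_ORIG`. A minimal Python sketch of undoing that before starting an external program; the function names are hypothetical, not hydrus code.

```python
import os
import subprocess
import sys

def external_program_env(base_env=None) -> dict:
    """Return a copy of the environment with the frozen build's library path override undone."""
    env = dict(os.environ if base_env is None else base_env)
    original = env.get("LD_LIBRARY_PATH_ORIG")
    if original is not None:
        # The bootloader saved the user's own value here; put it back.
        env["LD_LIBRARY_PATH"] = original
    else:
        # With no saved value, assume the user never set one and drop the override.
        env.pop("LD_LIBRARY_PATH", None)
    return env

def launch_external(command: list) -> subprocess.Popen:
    """Start e.g. ['mpv', path] so it loads system .so files rather than the bundle's."""
    # sys.frozen is only set when running from a frozen build; otherwise inherit as-is.
    env = external_program_env() if getattr(sys, "frozen", False) else None
    return subprocess.Popen(command, env=env)
```

Usage would be `launch_external(['mpv', path])` in place of a bare `subprocess.Popen` call.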
>>9495 Thank you for this report. This was due to the new 'sorting by human numbers', which sorts 'blah - 99.jpg' before 'blah - 100.jpg'. I am likely going to be full up doing gallery stuff for the rest of this week, but I will put a checkbox for 'keep original sort' onto that dialog this or next week. Please hang on to your to-be-imported files for now and give it another go later (or roll back to v314, I am not sure how doable this would be).
>>9503 I think FLIF has a "lossy" mode through progressive decoding, or whatever it's called, where you can just use the first however many bytes as a lossy version.
Is it possible to make hydrus "forget" the duplicate relationship after the images get deleted, or at least have it as an option? This is my issue: I filter some duplicates better/worse and all is fine, but then, let's say months later, I come across the same picture and I upload it again since I forgot about it. Now, because months earlier I already filtered that image as a worse counterpart of the one I have, the next time I launch the duplicate filter that pair is not gonna show up, since for hydrus it was already resolved, leaving me with a pair of duplicates that'll be almost impossible to find. The only way to see it would be to reset ALL duplicate relationships, or to check the system better/worse search once in a while to see if you accidentally re-uploaded a duplicate. I'd like it better if hydrus presented me the pair again in the dupe filter; it'd be better than having seemingly invisible duplicates around. Also, in the better/worse decision, I'd prefer for hydrus to _not_ strip the worse one of all its tags and copy them onto the better one. If it ever got uploaded again (even if by mistake) it would be a tagless image, making it even harder to locate in case of duplicates. Also, sometimes you don't even want to transfer the tags over, but hydrus will do it anyway (unless I'm missing an option to disable this), yet it will not transfer the rating (it would be nice if it did).
>>9504 No, I don't have the url anymore; I parsed the images and killed that watcher so it would stop. The main problem with flicking back and forth in duplicates is that it refuses to remember zoom levels or positions. However, I run into a problem every now and then where the program will lock up for a few moments, most likely tied to how many images I have open. I no longer get the massive ones after sending you the log that said 80 seconds were spent saving the fucking thing every 5 minutes. That aside, every now and then the program would lock up; I would be waiting, getting more and more pissed off that it's locked up, alt-tab, wait for the program to work, and go back to it. Now, the main thing I want to bring that up for is this (see pic): the program can already open up 2 viewers, so if during duplicates it opened two but split them between the left and right half of the screen, mirroring inputs to both sides, that would be great. I have no idea how much work that would take; it just seems that functionally it could work without too much of a hassle. >>9505 One of the main reasons I got a G13 was macroing like that. Only problem is I have 580k images open, semi-sorted into a category, and a part of me wants to burn through them, and it's winning over the 'just close the shit and do it the way you said' part. I am parsing more images than I add… I think… On the topic of HDR, it's fucking nothing, honestly. I have a 4K monitor that's currently running at 8-bit because I can't input 10-bit and have 4:4:4 chroma; there's not enough bandwidth through HDMI to support that. Sure, 10-bit looks good, but it's honestly more in the banding than in the color.
Let me put it this way: the difference between 123 123 123 and 123 123 124 is something I'm kind of able to see, but only if I know I'm looking for it… god I fucking love the color representation on this monitor… But with 10-bit you effectively go from 256 levels per channel (0-255) to 1024 (0-1023), so 123 123 123 and 123 123 124 become 492 492 492 and 492 492 496, the exact same two colors, and they are already fucking hard as hell to tell apart; you are never going to notice the 3 levels in between. That shit is going to be great for gradients. Hell, this video https://www.youtube.com/watch?v=74SZXCQb44s shows off HDR and non-HDR VERY well; the problem is that it is a tech demo done in such a way that there is a stark contrast between HDR and non-HDR. Honestly, the new local dimming types being introduced will have a bigger overall impact on image quality than 8-bit vs 10-bit, as they add to contrast. What I mean is, an 8-bit photo will still look the same as a 10-bit photo; the only difference is that with fewer colors to use as in-betweens, you will see banding in 8-bit more so than in 10-bit. At the same time, we already have things on the GPU that deal with debanding, so it's a bit hard to really say if an impact will even be felt. I have watched HDR content and played HDR games (though whether it was HDR or displayed correctly is up in the air), and all I can say is I notice the contrast FAR more than the color gradient on non-tech-demo videos.
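The 8-bit vs 10-bit arithmetic in the post above, worked through in Python. Multiplying by 4 (a left shift by two bits) is the mapping the post's numbers use; scaling by 1023/255 is shown alongside as an alternative that reaches full white at 1023, and it gives slightly different values. Both function names are illustrative only.

```python
def to_10bit_shift(value_8bit: int) -> int:
    # Multiply by 4: 0..255 maps to 0..1020, so 123 -> 492 and 124 -> 496.
    return value_8bit << 2

def to_10bit_scaled(value_8bit: int) -> int:
    # Stretch 0..255 onto the full 0..1023 range, rounding to the nearest level.
    return round(value_8bit * 1023 / 255)

levels_8bit, levels_10bit = 2 ** 8, 2 ** 10  # 256 and 1024 levels per channel

a, b = to_10bit_shift(123), to_10bit_shift(124)
in_between = list(range(a + 1, b))  # 10-bit levels an upconverted 8-bit image never uses here
print(levels_8bit, levels_10bit)    # 256 1024
print(a, b, in_between)             # 492 496 [493, 494, 495]
print(to_10bit_scaled(123), to_10bit_scaled(124), to_10bit_scaled(255))  # 493 497 1023
```

Either way, an 8-bit source only ever lands on one in four 10-bit levels, which is why the gain shows up in gradients (less banding) rather than in any single flat colour.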
>>9515 You can change the way duplicates give tags and ratings; however, when I last did a test it was broken/I didn't get it to work. Also, is that the way hydrus works? Just downloaded the zip and tested… wow, that is how it works. I don't know about tags, because I am not fucking adding the repository to a test install, but yeah. So, h dev, personally I don't want the fact that they are duplicates to be forgotten; however, a button/option to go through known same pairs would be nice. Here's the case: I go through dups, I find duplicates, I delete the low-res one. Sometime down the road I re-import it for some reason; let's say it's a batch of files that I want all imported regardless of what they are. Now both files are back in, so a new filter to recheck would be good. Hell, a recheck for every option would be good, potentially tedious, but good. Let's say I have a bunch of alternate versions of an image; it would be nice to be able to go through the alternate pairs and the same quality pairs again, if only to check your work.
>>9517 Could you point to where I could change the way the duplicate filter gives tags and ratings? I'm afraid I can't find where that option is. Anyway, of course I wouldn't want the duplicate filter to forget the duplicate relationship by default, but having it as an option would be handy. It's really just for the cases where you upload images again that were already filtered and deleted in the past. It's bound to happen at some point, but hydrus is not gonna show the pair again in the filter because it still remembers the relationship. But I really need to count on hydrus to tell me I'm re-uploading worse duplicate images; since I'm bound to forget and make mistakes, I need it to tell me when it happens by showing me the pair again in the filter (of course only if it concerns previously filtered & deleted images; I don't want to go through every single group of duplicates again every time I re-upload a past dupe).
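The behaviour asked for in these posts, re-showing a pair only when a file that was previously judged the worse duplicate and then deleted comes back in, can be sketched as a small lookup. This is a hypothetical Python model for illustration only, not how hydrus stores duplicate relationships; the class and method names are made up.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class DuplicateLedger:
    """Hypothetical record of resolved better/worse decisions."""
    worse_to_better: dict = field(default_factory=dict)  # worse hash -> better hash
    deleted: set = field(default_factory=set)            # hashes deleted after filtering

    def mark_better_worse(self, better: str, worse: str, delete_worse: bool = True) -> None:
        self.worse_to_better[worse] = better
        if delete_worse:
            self.deleted.add(worse)

    def on_import(self, file_hash: str) -> Optional[Tuple[str, str]]:
        """Return (better, worse) to re-queue in the dupe filter, or None."""
        if file_hash not in self.deleted:
            return None  # new file, or one that was kept: no old decision to revisit
        self.deleted.discard(file_hash)  # it is back in the client now
        better = self.worse_to_better.get(file_hash)
        return (better, file_hash) if better is not None else None

ledger = DuplicateLedger()
ledger.mark_better_worse("better_hash", "worse_hash")
requeued = ledger.on_import("worse_hash")   # ('better_hash', 'worse_hash')
unrelated = ledger.on_import("other_hash")  # None
```

The old decision is kept rather than forgotten, which matches the wish further down not to lose the relationship, while still surfacing the pair once.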
Add vim shortcuts and I'll suck your dick.
>>9518 In the filter, scroll the mouse to the top, and at the center top there is a gear; that should open up a way to change what happens when you mark something as better. Personally I have been rating things more than tagging, so my interest came in with that, but I don't think I was able to get it to duplicate tags, and now that I know that if I test it and reintroduce the files to the archive, hydrus decides 'nah, I'll never show you them again', I don't really want to test further. I also have a problem where my database fucked itself because of power outages and an SSD. Recovering it worked, but as a side effect I have 2.3 million files in my archive and only 1 million files in my duplicate file processing tab. It's an issue hdev knows/knew about at the time, but I told him dup processing is a curiosity till various things get implemented, namely a way to notate why a file was deleted. This could help you, because if duplicate processing notates an image, that note survives the deletion, and if you accidentally re-add a dup to the archive and duplicate filtering had added a note that says "Duplicate - Worse", it would be possible for hydrus to search for it in the future. Not going to lie, it's a bit of jury-rigging, but for your purpose the note system and a searchable note would work. ALSO, on import you would have a log of the files, so if user notes show up you could see you accidentally imported a previously confirmed duplicate image. For me, a system to show notes and the ability to quickly notate things on deletion expands the functionality of hydrus so much, as it allows me to delete files, give a reason, and move on, because right now if I see something was deleted I have to see why.
Because of this, I think I have only fully deleted 300, maybe up to 1000, images, when I have a duplicate filter telling me I have 100k images to go through, and once notation is in and my dup processing is fixed, I can add at least 180k more images to it, along with going through my images and culling them.
(105.27 KB 417x209 client_2018-07-29_23-02-30.png)

>>9429 While we are on the topic of duplicates, I noticed in search that I can search for 5 or more unknown relations, so I'm thinking fuck it, let's see how many duplicates I have in the 5-or-more category, and I found this. Images like this are why I want a 'preferred alternate' option, because, at least of these two images, the one with the gas fumes is the higher quality image, but if I had to pick between the two I would rather have the lower quality version. I can think of quite a few scenarios where I would do this. I have an image CG set where the artist has 10-20 images that are all effectively the same, but I only really want to keep a few of them; in that case I would like to pick a few of the images and label them "preferred alternate". I could get nearly the same effect with some fucking around with what I call the images, but if possible I would prefer a real category.
>>9521 >those pictures.
>>9522 Yeah, that's what happens when you mass download images without culling through them.
(199.43 KB 467x456 1442743373405.png)

>>9521
>because, at least of these two images, the one with the gas fumes is the higher quality image
the shit I read on this site, I swear to god
>>9524 If we are talking about quality, not the subjective 'is this my taste' but how good the image is, the one with gas fumes has little to no compression artifacts; the one on the left, however, looks like it's been through a few passes of saving a JPEG as a JPEG and lowering the quality each time. Situations like this are why I would like a 'prefer alternate' button. If the 'prefer alternate' was in the note that I could see on a log page, I would know the exact reason it's gone. Right now we've got:
this is better
same quality
alternate
not duplicate
While I think we agree that the one without fumes is the better image, a strict 'this is better', at least in terms of duplicates, says they are both the same image, but 'alternate' doesn't do anything with them, which is why I think a 'preferred alternate' button is the only option that's missing. I think there are a lot of people who have an image set of 20 or so images that are largely all the same image, with slight variance in facial expressions. This can/will/does create a duplicate image chain where they may or may not all get caught, but realistically you have 1, maybe 2, images you want to keep in that gallery. I imagine the system would work like this. Let's say you get a file that has a chain of 20 or so images, for the sake of argument, and the two images here >>9521 are part of it. You flag the left one as the preferred alternate, and it goes on saving that image for later, as there are still at least 18 more images it could be pitted against. If everything goes smoothly, you finish and you are left with 2 images: 1 is the preferred alternate, the other is a higher res/quality version of the same image, but they never came up against each other in the general filter. Then you get another option, the preferred alternate filter. You then go through what's left there and see if it's a better/worse pair, in which case you label it better/worse, or the images are part of a set and you wish to keep both, so they are now alternates.
The main reason for this, at least in my opinion, would be to help with image sets. The images that are preferred would be taken out of normal filtering and set aside for special filtering, so you don't constantly question 'is this the image I wanted?', and instead the image you want gets pulled out and set aside. After you go through all the duplicates, you may have 2 or more preferred alternates, or 1 or more preferred and a contender image; it would then pit 2 of the images together to either get taken out in a 'this is better' filter or in an 'alternate' filter. I said it above too: a contender image is just a way to expedite filtering. Let's say you have an image that survived through a massive filtering, and a new image gets put in; in all likelihood it's not going to be better than what you have. So instead of going through a normal filter where 2 unknowns are pitted against each other, the contender filter pits the new image against one that went through a 'this is better' and came out the winner. This also makes it a bit easier to filter, as if you only filter contender images, you could just set a rule that any file with a smaller file size or a lower resolution is automatically filtered, and you are only presented with ones where it's honestly a question whether or not the new image is better. Granted, there would also need to be an option to disallow an image from being a contender, as there are some images I have that are the best quality I can find, but god knows someone increased their size or fucked with them just enough to be a wallpaper. I think these two filters/modes would make the duplicate filter much less of a pain in the ass to do.
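The 'contender' shortcut described above, automatically settling a new file that is smaller or lower resolution than the current winner and only asking about the rest, might look like this. A hypothetical Python sketch: the `FileInfo` fields, the function name, the exact rule (smaller file size OR fewer pixels) and the `allow_contender` opt-out are assumptions drawn from the post, not an existing hydrus feature.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileInfo:
    file_hash: str
    width: int
    height: int
    size_bytes: int
    allow_contender: bool = True  # opt-out: never auto-judge newcomers against this file

def judge_against_contender(contender: FileInfo, newcomer: FileInfo) -> str:
    """Return 'contender_wins' when the newcomer is clearly worse, else 'ask_user'."""
    if not contender.allow_contender:
        return "ask_user"
    smaller_file = newcomer.size_bytes < contender.size_bytes
    fewer_pixels = newcomer.width * newcomer.height < contender.width * contender.height
    # Rule from the post: a smaller file size OR a lower resolution is settled automatically.
    return "contender_wins" if smaller_file or fewer_pixels else "ask_user"
```

Using OR is aggressive: a sharper but better-compressed newcomer would be discarded without being shown. Requiring both conditions (AND) is the more cautious variant.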
>>9506 I didn't realize this thread was still alive. Glad to hear you like the ideas for the duplicates, my vote is also to get the downloader/login engines both done before worrying about duplicates. I spend most of my active time working through duplicates, but I spend ALL of my passive time downloading and sometimes I run out of things that need to be downloaded because I can't easily pull imagesets and tags from as many sites yet. >>9507 Can confirm, I almost always wait until a version with only minor changes drops to update because that way everything is basically guaranteed to have any significant bugs already fixed.
Why does the duplicate system find a bunch of dupes at search distance 4, then no more at distance 5, then a ton more at distance 6? It seems limited to multiples of two. Just wondering, it seems odd. Being able to step the distance up in smaller increments, so fewer new dupes show up at once, would be helpful; it wouldn't feel as overwhelming.
can we get a downloader option for derpibooru in a future update? the parsers and URL classes seem to be working (URL downloads bring in all the tags and the full resolution picture perfectly), it just needs a downloader to allow for mass imports through tags
>>9531 hydrus doesn't tell you the duplicate distances up front anymore, so I can't remember exactly, but it seems like the lower the number, the more exact a match is needed to call it a dupe, as in less wiggle room. The program already has a form of duplicate filter that catches exactly identical files, so some distances may be effectively useless: the only way to get them would be lossless images where you change 1 pixel, and the only reason to do that is to evade a chan's duplicate image filter, which is more easily done by just saving it as a jpeg. But the further you go from an image, the more you pull in images that are not duplicates at all.
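For anyone wondering what "search distance" usually means in perceptual-hash dupe finders: each file gets a fixed-length bit hash, and two files count as potential dupes when their hashes differ in at most that many bits (the Hamming distance). This sketch assumes the client compares 64-bit shape hashes that way, which fits the 8x8 matrix the dev describes in his reply; `hamming` and `potential_dupes` are hypothetical names, not hydrus's API.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes (their Hamming distance)."""
    return bin(a ^ b).count("1")


def potential_dupes(target: int, hashes: List[int], max_distance: int) -> List[int]:
    """Return every hash within max_distance differing bits of target."""
    return [h for h in hashes if hamming(target, h) <= max_distance]


if __name__ == "__main__":
    target = 0
    # Toy hashes that differ from target in 1, 2, 4 and 8 bits.
    hashes = [0b1, 0b11, 0b1111, 0xFF]
    for d in (0, 2, 4, 8):
        print(d, potential_dupes(target, hashes, d))
```

Distance 0 only accepts identical hashes, and each step up tolerates one more flipped bit: here distance 2 returns `[1, 3]` and distance 4 adds `15`. That is why lower numbers mean less wiggle room.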
>>9535 I am asking why a dupe search distance of 5 finds no more dupes than distance 4, but distance 6 finds tons more. It would be useful if there was more granularity (is that the right word?) in the system, so each increase in distance gave you a moderate number of new dupes instead.
>>9539 In my experience dupe search distances only work if it's a multiple of 2 (including zero).
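One possible explanation for the "4 finds some, 5 finds nothing new, 6 finds more" pattern, offered as an assumption rather than a description of hydrus's code: many DCT hashes set each of the 64 bits by comparing a coefficient to the median of all 64. When the 64 values are distinct, exactly 32 of them sit above the median, so every hash has exactly 32 set bits. For two such hashes the distance is 32 + 32 - 2 * (shared set bits), which is always even, so a search at an odd distance can never find anything the distance below it missed. The helper names here are hypothetical.

```python
import random
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def median_threshold_hash(values):
    """Hypothetical 64-bit hash: bit i is set when values[i] is above the median."""
    if len(values) != 64:
        raise ValueError("expected 64 values (an 8x8 block)")
    ordered = sorted(values)
    median = (ordered[31] + ordered[32]) / 2
    h = 0
    for i, v in enumerate(values):
        if v > median:
            h |= 1 << i
    return h


rng = random.Random(315)
# 40 fake 8x8 blocks; sample() guarantees 64 distinct values per block.
hashes = [median_threshold_hash(rng.sample(range(1_000_000), 64)) for _ in range(40)]

weights = {bin(h).count("1") for h in hashes}
distances = {hamming(a, b) for a, b in combinations(hashes, 2)}
print("set bits per hash:", weights)                       # {32}
print("any odd distance:", any(d % 2 for d in distances))  # False
```

If several coefficients tie with the median (very flat or simple images might do this), fewer than 32 bits get set and odd distances become possible, which could be one reason results differ between collections. That part is speculation, not something the dev has confirmed.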
>>9516 >>9515 >>9517 >>9518 >>9520 >>9521 >>9526 >>9535 Thanks for all this. I am not working on dupe stuff at the moment, but I have written these points down and will keep them in mind. I am overall happy with the first version of dupe stuff but agree that it could really do with a full pass.
>>9519 I don't know much about vim. What sort of shortcuts are you looking for?
>>9531 I was talking to someone else who had this issue a couple of weeks ago, and I couldn't reproduce it. When I do 4/5/6 on my test machine, I get more each time. The fundamental object being tested here is an 8x8 matrix representing some simplified vertical/horizontal waveforms in the image (it is a DCT), so I wonder if perhaps certain classes of image tend to be 'mirrored' in some way on the two axes represented in the matrix, but I am not sure. I'd be interested if you discover anything new about this. The max potential number here would be 63 or 64, but most of the action is in the 0-8 range, with anything above, say, 12, being completely useless. I may end up re-doing the shape-based dupe detection to use a 16x16 matrix (i.e. 256 bits) to give us more resolution here, although this would slow search significantly, so it might be better just to put my time into better workflows. The big bottleneck now seems to be human attention (it takes ages to go through pairs by hand), so I wonder if I should attack that first.
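To make the DCT description concrete, here is a minimal pure-Python sketch of a generic DCT perceptual hash: take a greyscale image already shrunk to 32x32, compute the 2-D DCT-II, keep the top-left 8x8 low-frequency block, and set one bit per coefficient that is above the block's median, giving 64 bits. The 32x32 input size, the median threshold and every name here are assumptions; this is the common pHash recipe, not hydrus's actual code. Setting `K = 16` gives the 256-bit variant the dev mentions, at roughly 4x the DCT work and 4x longer hashes to compare.

```python
import math

N = 32  # assumed side length of the shrunken greyscale image
K = 8   # keep the top-left KxK block: K = 8 -> 64 bits, K = 16 -> 256 bits

# COS[k][i] = cos((2i + 1) * k * pi / (2N)), the DCT-II basis values.
COS = [[math.cos((2 * i + 1) * k * math.pi / (2 * N)) for i in range(N)] for k in range(K)]


def alpha(k: int) -> float:
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def low_freq_dct(pixels):
    """Return the KxK lowest-frequency DCT-II coefficients of an NxN image, row-major."""
    coeffs = []
    for u in range(K):          # vertical frequency
        for v in range(K):      # horizontal frequency
            total = 0.0
            for y in range(N):
                cu = COS[u][y]
                row = pixels[y]
                for x in range(N):
                    total += row[x] * cu * COS[v][x]
            coeffs.append(alpha(u) * alpha(v) * total)
    return coeffs


def dct_hash(pixels) -> int:
    """One bit per coefficient: set when it is above the median of the block."""
    coeffs = low_freq_dct(pixels)
    ordered = sorted(coeffs)
    mid = len(ordered) // 2
    median = (ordered[mid - 1] + ordered[mid]) / 2
    h = 0
    for i, c in enumerate(coeffs):
        if c > median:
            h |= 1 << i
    return h


if __name__ == "__main__":
    # Toy 32x32 "image": a diagonal gradient with values 0..248.
    img = [[4 * x + 4 * y for x in range(N)] for y in range(N)]
    print(f"{dct_hash(img):016x}")
```

Resizing and greyscale conversion are left out; any image library can produce the 32x32 input. Because only low frequencies are kept and bits depend on comparisons rather than absolute values, uniformly scaling the pixel values leaves the hash unchanged, and small edits or recompression tend to flip only a few bits.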
>>9532 Yeah, this is highly requested, so I'll roll it in before this whole overhaul is done. There's some stuff here for it already: https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/tree/master/NEW%20Download%20System but I haven't tested it myself and it isn't easy mode yet. You might just want to wait for me to add it naturally.
>>9532 >>9564 Shit, sorry, I read that wrong. Yeah, I'll add a gallery parser and 'searcher' for it when I have that system ready, so it'll just pop into the gallery selector button soon enough!