/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.

Version 361 hydrus_dev 07/24/2019 (Wed) 22:13:33 Id: a692a4 No. 13293
https://www.youtube.com/watch?v=v6qckMHp7wU

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v361/Hydrus.Network.361.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v361/Hydrus.Network.361.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v361/Hydrus.Network.361.-.OS.X.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v361/Hydrus.Network.361.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v361.tar.gz

I had an ok week. Some final duplicates work is done, and there is some polish for the new tag autocomplete.

duplicates

The duplicates filter can now detect whether static images with the same resolution are pixel-for-pixel duplicates! If they are, it adds one of the standard 'comparison' statements to the right-hand hover window. Furthermore, if one file is a png and the other is not, this statement will colour green/red and bias heavily towards the non-png, since the png is likely a bloated 'clipboard' duplicate that you don't want.

Pixel summary data is not cached long-term, so this routine takes a bit of extra CPU. It only kicks in when both files are images with the same resolution, but nonetheless please let me know if it makes the duplicate filter too laggy for you. I expect the new 'pixel hash' data will be cached at the db level in future, to auto-resolve png/not-png dupes like this.

Also, the duplicate filter will now match the two files' zooms even if their resolution ratios differ! Zoom is now locked so the two files' widths are matched, along with the files' top-left corners. Two files with resolutions 1920x1080 and 1912x1080 will now line up pretty well even as you zoom and pan.

And I have written a new system predicate for whether a file is the best file of its duplicate group (also called the 'king'). It provides an easy way to find only the best files, or only those that are not the best. It is bundled with the old 'num duplicate relationships' system predicate under the new 'system:file relationships' predicate.

I also fixed an issue with the 'custom action' button that wasn't letting custom actions go through unless some file was deleted: the final deletion question dialog now has a 'delete neither' choice, which is the default. And if you want to feel some despair, Mr Bones now reports potential, duplicate, and alternate counts.

The duplicates storage overhaul is pretty much done. There is plenty more I could do, but I have now finished the main db focus of the work. Beyond some final UI stuff, there is only some decent new help to write. I would like to have that done next week, so I can draw a line under this job. It was more work than I expected, but I am overall really happy with it.

tag autocomplete

Last week's autocomplete changes seem to have gone pretty well overall. However, being able to search so fast has revealed some old 'limiters' I had in place to stop certain super laggy searches going ahead, and some old wildcard logic was flawed. Now that we have more power with this control, I have been able to clean it up a bit.

First off, entering something like 'character:*' should now work everywhere (although it will likely lag a whole bunch once the final results come in). Also, searches with an explicit namespace, like 'character:ara', will now match 'character:samus aran', just as the simpler 'ara' does.
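
To illustrate, here is a minimal sketch in Python (the language hydrus is written in) of this kind of subtag-offset matching, including the wildcard forms covered next. The matches() helper and the fnmatch approach are illustrative assumptions, not the client's actual code:

```python
# A rough sketch, not hydrus's real matching code.
import fnmatch

def matches(search: str, tag: str) -> bool:
    """True if typing 'search' should surface 'tag' in the dropdown."""
    s_ns, _, s_sub = search.rpartition(':')
    t_ns, _, t_sub = tag.rpartition(':')
    # namespace part: exact or wildcard match ('char*' matches 'character')
    if s_ns and not fnmatch.fnmatch(t_ns, s_ns):
        return False
    # subtag part: match at the start of any word, so 'ara' hits 'samus aran'
    pattern = s_sub if s_sub.endswith('*') else s_sub + '*'
    return any(fnmatch.fnmatch(word, pattern) for word in t_sub.split())

print(matches('character:ara', 'character:samus aran'))  # True
print(matches('char*:sam', 'character:samus aran'))      # True
print(matches('c*r:*mus', 'character:samus aran'))       # True
```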
Wildcard searches, like 'char*:sam' or 'c*r:*mus', should be a bit more sensible overall, finding more possible results and matching complicated queries more reliably. And 'media' autocomplete fetches, which happen on a search page when you start typing with thumbnails already loaded, should also be much faster now (they were lagging last week at high thumbnail counts). I improved the media search efficiency and added the same 'cancel' tech as I did to the db search last week, so it should now be fairly smooth and fast, even with up to 10,000 files in view.

the rest

Just a note: system:size is now system:filesize.

Also, if you are a Linux user, or I have otherwise previously suggested you turn on the options->media->BUGFIX: Load images with PIL option, please check that option again this week and try turning it off. There was a time when PIL was more reliable than OpenCV–the other image library I use–but now things seem to be the other way around, and OpenCV is significantly faster too.

A user recently reported that an external hard drive he had hydrus installed on died due to overheating. It looks like it was related to heavy hydrus import folder work. This is the first time I have heard of something like this, but it still concerns me a lot. If you are running from a drive that can get similarly very hot, I strongly recommend you ensure you are not running any very heavy, hour-plus import or repository processing jobs on it. In the meantime, I will write some pause/throttle options for all the big routines to help users reduce load according to their situations.

full list

- duplicates:
- the duplicate filter now compares the pixel content of static image pairs of the same resolution–if they have the exact same pixels, a comparison statement is added, and if one file is a png and the other not (i.e. the png is likely a useless clipboard copy), the statement notes this and a strong duplicate score is applied
- added 'system:is/is not best file of its group' to search for file kings
- renamed 'system:num duplicate relationships' to 'system:num file relationships'
- wrapped the two file relationship system predicates into one 'system:file relationships' stub predicate that opens to a dialog with two pred panels
- added an 'add potential pairs' command to the thumbnail right-click file relationships menu, which will force-queue files for the duplicates filter
- the duplicate filter now ensures the two medias' zoom is locked so they have the same width through a transition. furthermore, their current dragged top-left position is pinned in the same location. this ensures files that have slightly different resolution ratios (especially when they are just a couple of pixels off) still remain reasonably comparable when switching back and forth
- reworked and simplified how position/drag delta is handled in the media canvas to support the above
- fixed the 'custom action' button on the duplicate filter, which had no 'delete neither' choice and whose 'forget it' button cancelled the whole custom operation, making it impossible to custom action without deleting something. I have added a 'delete neither' green-text button to the front, as the default action
- mr bones now reports on your potential, duplicate, and alternate numbers
- .
- tag autocomplete:
- greatly sped up tag autocomplete search when fetching from a current media view (i.e. from thumbnails in the search page)–it had some CPU-inefficient testing/counting that mattered at high media/tag counts
- greatly improved the cancelability of tag autocomplete search when pulling from a current media view–this was causing high lag when typing fast with multi-thousand results
- fixed the gui-level tag matching test to match namespaced search inputs with offset subtags (e.g. 'character:aran' now matches 'character:samus aran'), both for wildcard and specific namespaces
- when typing an explicit wildcard tag search that does not end in a *, you will now be presented with two wildcard options–one with the implicit * suffix, one without
- fixed 'write' tag autocomplete inputs (like in manage tags) being able to search for chunky 'namespace:*' explicit wildcard searches
- .
- the rest:
- fixed the ipfs nocopy path translation control, which on non-Windows was forgetting rows on save for client file paths outside of the main install path
- renamed 'system:size' to 'system:filesize'
- sped up some system:inbox searches
- disabled a PIL 'load truncated images' backup mode, which on the current version can seemingly lead to infinite load hangs
- file report mode now prints info when it deletes/recycles a path, including stack traces
- fixed a long-running and silent 'port already running' bug related to setting services on the server that was stopping successful service-set-restart from the client in many situations
- 'port is already running' checks that conflict with other processes will now give an immediate error to the client without saving any changes (a rough sketch of this kind of pre-bind check follows this post)
- the server now prints to the log as it stops/starts/has started its services
- improved how the server can report certain 500 errors
- the 'critical service tag/file reference' repository processing error has been improved: rather than resetting the whole repository, it now pauses the repo and resets processing status for just the repo's 'definition' update files (without deleting any existing entries, so they should ultimately reprocess super fast), and also schedules a complete integrity and metadata check for all update files
- keyboard interrupts from the console should now trigger a clean exit request for the client
- polite and forced shutdown requests when logging off should now trigger a fast exit for the client (i.e. no yes/no dialog, no shutdown maintenance, but otherwise session saved and so on). this fast exit is noted in the log
- moved the tag and rating service listctrls in the duplicate merge options panel to the new listctrl object
- moved the manage regex favourites listctrl to the new object
- updated a bunch of yes/no dialogs to the new panel system
- deleted some old unused dialog code and related unit tests
- fixed up deletion-and-reimport file location handling for lingering media objects, which were not correctly forgetting the combined local file deletion record on reimport
- improved shutdown error handling during repo processing
- deleted the mishimmie default downloader

next week

The duplicates help is the top thing. I need to draw some diagrams, take some new screenshots, and brush up the existing text to better explain the new system. After that, I will catch up on small jobs. I'd love to have multiple system:hash search added (for searching a bunch of md5s, say), maybe some subscription thumbnail publishing cleanup, and perhaps some Client API work, where I'd like to have web browser cookie import for easy login. Once the duplicate work is done, I expect to do a little work on audio support, likely basic 'has audio' metadata for files, and then crack on with the client/tag-repository & PTR overhaul.
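
Regarding the 'port is already running' check in the list above: here is a rough sketch (assumed, not the server's actual code) of how a pre-flight bind test can error out immediately instead of saving a broken configuration. The port number is hypothetical:

```python
# Assumed sketch, not hydrus server code: test-bind before saving changes.
import socket

def port_is_free(port: int, host: str = '') -> bool:
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
            return True
    except OSError:
        return False  # another process already holds the port

port = 45870  # hypothetical service port, for illustration
if not port_is_free(port):
    raise RuntimeError('Port %d is already in use, changes not saved.' % port)
```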
>>13293 Reading the first section about duplicates, there is something that bugged me, something I always look for when choosing between very similar images: when you make an image, you are likely going to tie at least one of the x/y dimensions to a normal, not random, number. So I will see things with a 1280 or a 1284 for one of the resolutions, and if the images are very similar if not nearly the same, I default hard to the more normal number. Would there be a way in the program to highlight this? When we start talking about a 1280 versus a 1437 image, there is a bit more to look at than just the more normal resolution, like detail that is or isn't present in the higher res image; it's only with closer numbers that this comes into consideration. Yea, 'how boned am I' is a great name for mr bones
Thank you for all your work, Hydrus dev! If I could have one far-fetched wish, it would be a regex find/replace for tags, specifically so I can fix two common problems I have with pulling public tags. Example of what I mean…

Find: (character:.*)(\s\(.*\))
Replace: $1
Reason: Change tags like "character:mario (Super Mario)" to "character:mario", along with all the other screwed-up character: tags, because people don't seem to understand how to use series: tags.

Find: \s
Replace: _
Reason: Replace whitespace characters with underscores for all tags.

For some reason, people who use boorus love including the "series:" namespace in the character namespace. I do want the character: namespace when I import from a search, as it saves me from having to add that information myself, or looking up every character in an image with many, many (talking like 40+) characters displayed. If it was an uncommon problem I'd just fix it by hand by searching for character tags and fixing them one-by-one… but I have 12,000+ characters to fix.
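
For what it's worth, the requested transformation is easy to sketch outside the program. A minimal Python example of the two fixes described above, assuming tags are plain strings (and using group 1 rather than the whole match, since the goal is to keep only the first group):

```python
# A sketch of the requested clean-up, not a hydrus feature.
import re

def clean_tag(tag: str) -> str:
    # 'character:mario (super mario)' -> 'character:mario'
    tag = re.sub(r'(character:.*?)\s*\(.*\)$', r'\1', tag)
    # booru style: replace remaining whitespace with underscores
    return re.sub(r'\s', '_', tag)

print(clean_tag('character:mario (super mario)'))  # character:mario
print(clean_tag('character:samus aran'))           # character:samus_aran
```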
welp… a fairly large happening happened, all I can say is oof. thankfully this means my archive won't gain an absolute fuckload from finding new artists anymore, but still.
>>13304 Hey, I am just reading up on it now. I am not a user myself, so I can't talk too cleverly about the situation, but I know how much it sucks when a site you like goes down. To me, it is a good reminder not to trust any CPU except one you own, and to download everything you like to a machine under your control. Bookmarks are fundamentally unreliable. Comiket is coming up, right? I assume any competitor sites will not only try to cobble together something out of the torrents/megas going around, but also set up new translation pipelines for the new season's content, and the existing userbase will presumably migrate to whichever of them are successful, even if it takes a few years to build things back up. I've seen plenty of blackpills flying around, but I doubt we have seen the last of the internet's spicy content.
>>13304 >>13305 I WAS ABOUT TO CALL YOU OUT ON THIS VAGUE WORDING SECRET CLUB ELITIST BULLSHIT BUT YOU'RE TALKING ABOUT SADPANDA HOLY FUCKING BALLS
>>13306 In a similar vein: all the big dick posts on /a/ of people violently sourcing via an out-of-context exhentai link. no name, no artist, nothing. extremely big dick play on their part, that's super epic, man
>>13305 thankfully i'm a bit of a data hoarder, I have lived through enough happenings to never trust that shit I like will still be somewhere tomorrow. thankfully I downloaded everything that I could when I found it, and when that wasn't an option I downloaded from a different source. the only thing I couldn't get this way was image archives, and without new people shoved in my face, a good chunk of why I have fuck all for hdd space in the archive is gone.

then I sit here looking at everyone else freak out about the good shit being gone… 'anyone have X', someone posts a link, I look it up in my archive, there it is already. fucking 47556 archive files, and now that most of the shit that interests me is gone from that site, I acquire them FAR slower… the main thing I lost from all this is a great aggregator for porn. also, many groups gave up their blogs and released directly to ex, so even going backwards that way is a challenge…

as for a competitor… if anything comes of the talks, you are still looking at needing servers that can handle something along the lines of 200tb, and that's without the distribution. the current thought process is a torrent-like thing with a centralized base of gatekeepers who decide what content is allowed. this is a check so no cp gets uploaded that would kill the project and potentially taint it, and being distributed means that hopefully no attack like this one could affect it; the worst that would happen is the main authenticator goes down and no new things can get added. you would still need at least one database of all the files in case shit happens, but a system like this could potentially lay the foundation for a service that can never truly die. the only problem is that the current thinking would leave you open to attacks the way torrents do, as in a copyright holder fucks you. typically they don't go after downloaders, just people who share…

as for the short term… I don't know who everyone is going to go to. there are a few sites that would work, but they have nowhere near the infrastructure to handle what e-hentai did. my main hope is that we can get an aggregator up and running before 2020, something like baka manga but for doujin and porn. they do do that too, but it's far from ideal considering the vast amount that exhentai handled

>>13306 lol, I figured that anyone on a chan site would have heard of this shit going down. it's possible that this is a short-term thing, as the reason for it going down seems suspect; it all depends on whether the owner will fight for a bit or not.
>>13310 >lol, I figured that anyone on a chan site would have heard of this shit going down I actually stopped using any *chan besides bumming answers if I need them on whatever. I didn't fuck with how everything is just social posturing, at least on 4chan, never used any other. Like the out of context ex links. If you call it out, someone just tells you you're socially unacceptable in as emotional of a way for themselves as possible. There's nothing to respect there except the infrastructure of it being a *chan site… which the users completely forfeit to turn everything to as near an upvote/downvote system as possible. Then having the audacity to use "reddit" as a counterargument. Literally just the word "reddit" and nothing else in response to the socially unacceptable person.
>>13311 you know this is a chan site too right?
>>13314 >I actually stopped using any *chan besides bumming answers if I need them on whatever
>>13314 Oh I guess you might mean the "never used any other" quote. I'm not familiar with 8chan but I figured I could illiterate retard my way into asking a question here if I needed to. I've never just communicated with anyone on anything besides 4chan before.
>>13297 Thanks, that is a good idea. I don't think adding a 'this is better' green/red weight to that difference is guaranteed useful, but it would be neat to just have a 'this image is 720p, the other is not' statement for quick reference. What are useful resolutions to highlight? Just the regular 720p, 1080p, 4k? I don't want to go down a rabbit hole of highlighting old 1280x1024 wallpapers, but those three standard 16:9 resolutions seem a sensible place to start.
>>13303 Yeah, I'd definitely like something like this. The main problem with adding cleverer tag sibling and tag replace systems is that I first have to flesh out the underlying manager and database 'shape' before I can start adding UI to interact with it. Tag siblings are in a similar 'rat's nest' situation as duplicates recently were, and they sorely need a significant overhaul to improve logic and speed. After this dupes work, I'd like to hack in a way to do namespace siblings, and while I am in there I will push on this, but I think the bigger job of the complete overhaul will have to be a 'big job' in the next poll that comes up.
>>13293 So I updated hydrus and now I'm getting a lot of "connection failed, retrying in 60 seconds" errors when downloading from pixiv, and it's really annoying. Even if I'm running a single artist gallery downloader with the default bandwidth options, I keep getting this error every 3 or 4 images. I guess it's because of this, from version 356:

>- the network engine now waits significantly longer–60s–on connection errors before trying again, and with every failed attempt will wait n times longer again. when in this waiting state, a manual user cancel command cancels it out faster

Before updating I constantly got connection errors with pixiv, but hydrus used to retry immediately. Is it possible to change or disable this waiting time, or is it hardcoded in the program now?
>>13336 Thank you for this report. You can't change it currently, but I will definitely add an option for it and an override command from the network job control's cog icon button.
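
For reference, the behaviour quoted above amounts to an escalating, cancellable wait. A minimal sketch (assumed, not hydrus's actual network engine; BASE_DELAY and the cancel callback are illustrative):

```python
# Assumed sketch of the quoted retry-delay behaviour, not hydrus code.
import time

BASE_DELAY = 60  # seconds; currently hardcoded, an option is planned

def wait_before_retry(failed_attempts: int, cancelled) -> bool:
    """Wait BASE_DELAY * failed_attempts seconds; return False if cancelled."""
    deadline = time.monotonic() + BASE_DELAY * failed_attempts
    while time.monotonic() < deadline:
        if cancelled():  # a manual cancel command cuts the wait short
            return False
        time.sleep(1)
    return True
```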
>>13318 It's not necessarily resolutions specifically. Let's say I wanted to do something tall: typically when I start, I would go with 20000x10000, and when I'm done it may end up being 5000x2500 or 2500x1250. However, if someone fucks with the image and scales it down in a retarded way, or scales it just enough to push it into spam territory, that 5000x2500 may turn into 4998x2499. In a case like that, I would prefer the cleaner 5000x2500 number for checking things. If you don't want to hardcode resolutions, then going for ratios would probably be a better bet; this would help point out when something may have been altered from the source, like if one of them is a perfect 4:3 or 16:9 but the other one is a pixel or so off. It may also be worth pointing out some websites' resolutions. I know a few would cap the horizontal resolution at 1200, so it may be worth flagging that as a possibility; hell, it may be possible to look at the source and question whether that is the version you want to keep, if the site is known for compressing/altering images. While a hard 'this is better' may not be the best way to go about it, a highlighted 'you may want this one more' would be nice.

>>13325 Yea, working in smaller chunks would be nice. As for the exact first, my problem was a set of around 2000 files where many of the files were duplicates/alternatives with some things changed but nothing getting caught, especially with artist archives and such. Smaller searches like this, with speculative and a defined set of images, would likely get me every single image to ok, but due to the time it took and the locking of the program (for what would likely end up being a ~24 hour period) I abandoned that for the time being.
Lowercase everything = get the fuck off my computer. Seriously git gud @ software design. The fact that you even need a getting started guide is fucking pathetic. No one is gonna use it cause it sucks balls.
>>13349 That resolution stuff is an interesting thought. I think trying to solve the problem of what makes a resolution 'useful' is beyond the time I have left in this current big job, and I should leave it to human eyes for now. I can add some common resolutions now, though, and I can detect odd vs even numbers easily, and we can iterate on that later if we see other easy answers. I'll make a job for nice resolution ratios, but it may come later.
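
As a sketch of the heuristics being discussed (assumed, nothing shipped), the checks could look roughly like this:

```python
# Assumed sketch: flag standard resolutions, even dimensions, and
# resolutions that are exactly or almost a 'nice' ratio.
COMMON = {(1280, 720), (1920, 1080), (3840, 2160)}
NICE_RATIOS = ((16, 9), (4, 3))

def describe(width: int, height: int) -> list:
    notes = []
    if (width, height) in COMMON:
        notes.append('standard resolution')
    if width % 2 == 0 and height % 2 == 0:
        notes.append('even dimensions')
    for rw, rh in NICE_RATIOS:
        if width * rh == height * rw:
            notes.append('exactly %d:%d' % (rw, rh))
        elif abs(width * rh - height * rw) <= 2 * rh:  # a pixel or two off
            notes.append('roughly %d:%d, possibly resized' % (rw, rh))
    return notes

print(describe(1920, 1080))  # standard resolution, even dimensions, exactly 16:9
print(describe(1918, 1080))  # even dimensions, roughly 16:9, possibly resized
```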
is there a way to filter 'pixel for pixel' duplicates? would love to be able to get those hammered out and could do that very fast too.
>>13373 Not yet–this data is currently generated on the fly in the duplicate filter. But I think it is very worth caching it in the db at some point. I can search it incredibly quickly to find pixel dupes and get started on some auto-dupe-resolve rules like 'if two files have the same pixels and one is a png and the other is not, delete the png' so we can finally and automatically clear out Clipboard.png trash dupes. It would be fast enough that I could even add a hook for it on import! This would be optional and probably default on. More complicated stuff like 'if two jpegs are the same, keep the one with smaller/larger filesize' would be default off and up to the user.
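
For anyone curious, the idea sketches out simply (assumed, not current hydrus behaviour), using Pillow to hash the decoded pixels and a png-vs-not rule to resolve a pair:

```python
# Assumed sketch of a cached pixel hash plus one auto-resolve rule.
import hashlib
from PIL import Image  # pip install Pillow

def pixel_hash(path: str) -> bytes:
    with Image.open(path) as im:
        im = im.convert('RGBA')  # normalise mode so only the pixels matter
        return hashlib.sha256(im.tobytes()).digest()

def resolve_pair(path_a: str, path_b: str):
    """Return (keep, delete) if the pair auto-resolves, else None."""
    if pixel_hash(path_a) != pixel_hash(path_b):
        return None  # not pixel-for-pixel duplicates
    a_png = path_a.lower().endswith('.png')
    b_png = path_b.lower().endswith('.png')
    if a_png and not b_png:
        return (path_b, path_a)  # same pixels: drop the bloated png
    if b_png and not a_png:
        return (path_a, path_b)
    return None  # same format; needs another rule, e.g. smaller filesize
```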
>>13378 I would set both rules to delete the larger file. It's very rare, but I have come across some jpegs where saving as png gave a lower file size.

