/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


(28.65 KB 480x360 LQcKXieQXN8.jpg)

Version 360 hydrus_dev 07/17/2019 (Wed) 23:09:37 Id: ad39a7 No. 13223
https://www.youtube.com/watch?v=LQcKXieQXN8

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v360/Hydrus.Network.360.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v360/Hydrus.Network.360.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v360/Hydrus.Network.360.-.OS.X.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v360/Hydrus.Network.360.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v360.tar.gz

I had a great week. There are a bunch of little things and an important speed overhaul to tag autocomplete.

tag autocomplete

I have never really liked the tag autocomplete workflow. It once blocked the UI completely, and some unusual timing options were needed to make it even usable, but it still often lagged out or just responded with gigantic lists in judders. After chipping away at the problem, this week I am finally updating it to what I really wanted.

So, the main change is that autocomplete results are now fetched as soon as you type. It responds very quickly and overall, I think, feels great, particularly once you have put four or five characters in. You get what you want as soon as you type, and can hit enter right away. The db 'job' to fetch results is now completely divorced from the UI and is able to cancel much faster when you type a new character, so the artificial fetch-start latency requirements of the old system are no longer needed.

There are now just two options for tag autocomplete, under options->speed and memory: whether tag autocomplete should fetch as you type (default on), and a character threshold to switch from 'exact match' autocomplete searches to 'full' ones. This 'exact match' defaults to 2 characters, and means if you type 'sa', you will only get results for 'sa' rather than 'sa*', which is generally a pretty giant laggy list, but if you type 'sam', you'll really be searching 'sam*' and get all matching autocomplete tags. You can reduce or increase this 'exact match' threshold as you prefer, or turn it off completely and get full autocomplete for any input.

Also, I have added some quality of life features. If results do not appear within 200ms, you'll now get a 'loading results…' label in the dropdown for feedback, and any 'static' results, such as exactly what you typed for a manage tags dialog input, or the special 'namespace:*anything*' for a search input, will appear immediately so you can select them without having to wait. Also, entering 'character:sa' will now trigger the same smaller 'exact matches' test as for the unnamespaced 'sa', rather than searching for the whole giant list of tags beginning 'character:sa*' as happened previously.

Furthermore, a query like 'char' no longer matches 'character:samus aran'. This was a neat idea, but it proved too unwieldy IRL and is now better served by wildcard queries such as 'character:*' or 'char*:*'.

Overall, I am very pleased with the change. Please give it a go, and you'll see the difference immediately. However, tag autocomplete is a complicated system with different workflows, and I have made some big changes, so if you are, say, the sort of user who types fast and just hits enter, and you find the new results are a bit 'flickery' or something for you, let me know and I'll see if I can smooth it out.
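For readers curious how the 'exact match' threshold behaves, here is a minimal sketch of the rule described above. The function and constant names are illustrative only, not hydrus's internals.

# Minimal sketch of the 'exact match' character threshold described above.
# The names here are illustrative, not hydrus's real internals.

EXACT_MATCH_CHARACTER_THRESHOLD = 2  # as in options->speed and memory; None disables it

def build_autocomplete_search(input_text: str) -> str:
    """Turn what the user has typed into the tag search actually performed."""
    # split off any explicit namespace so the threshold applies to the subtag only
    subtag = input_text.rpartition(':')[2]

    threshold = EXACT_MATCH_CHARACTER_THRESHOLD

    if threshold is not None and len(subtag) <= threshold:
        # short input: search only the exact text
        return input_text

    # longer input: search the full wildcard
    return input_text + '*'

# 'sa'           -> 'sa'            (exact matches only)
# 'sam'          -> 'sam*'          (full autocomplete)
# 'character:sa' -> 'character:sa'  (the same short-input rule applies to the subtag)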
more file maintenance

I brushed up the new file maintenance UI under database->maintenance->review scheduled file maintenance to have a nicer list and have added three new jobs:

One fixes file permissions on Linux and OS X. For a while, due to an oversight, file imports have been getting 600 permissions on Linux and OS X. I have fixed this now to be 644 (so, for instance, a nocopy ipfs instance or a network file share running on another user can access the files) for all new files, and this new file maintenance job will retroactively try to fix the permissions of existing files.

The other two are for the duplicates system. One explicitly regenerates similar files metadata, and the other checks if a file is improperly in or out of the similar files searching system, and fixes it if it is in the incorrect place (scheduling metadata regen as appropriate). Both are mostly for debug purposes but will get real use when I eventually add videos to the duplicates system, which will be a giant CPU job we'll want to spread out with the nice new non-blocking file maintenance pipeline.

The old maintenance button on the duplicates page that kept eligible files 'up to date' is now gone as a result, and any outstanding jobs there (although most users shouldn't have any by now) should be migrated to the new file maintenance system on update.

full list

- tag autocomplete:
- after various tag autocomplete async work, fetch timings get a complete overhaul this week. the intention is for a/c jobs to appear as fast as possible, with good ui feedback, without interrupting ui while they work. feedback on how this works IRL would be appreciated
- there are now just two autocomplete options under options->speed and memory:
- - whether autocomplete results are ever fetched automatically, defaults to true
- - the max number of characters in the input that will cause just exact results vs. full autocomplete results, defaults to 2, can be None
- namespaces are no longer searched from an unnamespaced query ('char' no longer matches 'character:samus aran'). this proved too slow for real use, and remains better available with explicit namespace searches such as 'character:' or 'char*:*'
- the 'exact results' character limit now also applies to subtags of namespace searches! so, entering 'character:a' will deliver the same short exact match results as just 'a'–no more gigantic lists when you put in a simple namespace
- improved tag results caching to deal with the new non-namespace matching on subtag input
- tag autocomplete dropdowns will now display a non-selectable 'loading results…' label when results take more than 200ms to load
- tag autocomplete dropdowns will now also display 'static' tags, such as 'namespace:*anything*' for 'read' inputs and the exact entered text and possible siblings/parents for 'write' inputs, during loading. so, typing 'character:' just to get the special 'character:*anything*' predicate is now simple and does not need a whole load wait to enter!
- cleaned up some tag listbox code to handle parent selection and navigation better along with the new label type
- greatly improved autocomplete search logic in the critical text search portion, collapsing it into one cleverer and more easily cancellable query rather than two or three simpler ones with potentially gigantic lists thrown back and forth
- improved speed of autocomplete cancel for certain large lists with many siblings
- .
- file maintenance:
- the new file maintenance ui now shows scheduled jobs in a listctrl, and only shows jobs that have outstanding work. you can clear/do work on multiple selected jobs
- the file manager should now try to guarantee at least 644 permission on file imports (previously, it was only trying to add 600, which led to problems with nocopy ipfs running on another user etc…)
- added a file maintenance job to check and fix file permissions
- added a file maintenance job to regenerate similar files metadata
- added a file maintenance job to check if a file should be in the similar files system–if it should and isn't, it is queued to get its metadata regenerated, and if it is and shouldn't be, it is removed
- the previous bulky similar files metadata regen job from the duplicates page is now removed, and any outstanding scheduled regen will be transferred to the new file maintenance manager on update
- .
- client api:
- added POST /manage_pages/focus_page, which makes the given page the current page in the main gui
- added help and unit tests for this new call
- client api is now version 9
- .
- the rest:
- fixed an issue recording media viewtimes when no max viewtime is set
- fixed the new missingdirectory errors not printing the missing path
- fixed an issue with some human-started repository actions waiting silently on bandwidth when it was not intended (e.g. account refresh)
- export folders now raise proper errors and pause themselves if their path is not set, does not exist on the file system, or is not a directory (previously, they silently stopped work without error)
- cleaned up some misc import folder code, and put in additional protections to the delete/move code to ensure folders cannot be so actioned if they somehow end up in the path import queue
- when unpinning a file or directory from ipfs, the clientside service now first checks that the current daemon considers it pinned (previously, this 500 errored when the object was not pinned due to a reinitialised daemon etc…)
- fixed an issue with the new ipfs path translation control, which was forgetting values when the clientside path was outside of the default db structure
- media objects that transition from trashed to physically deleted but remain in view will now correctly be aware of their complete previously-deleted status (rather than being simply remote, as they were before until a client restart)
- improved some of the recent duplicates db update code to pre-optimise the new tables on update (some users were getting slow behaviour due to mis-scheduled analysis maintenance)
- extended the new panel system to deal with custom button panels and moved the duplicate filter 'commit and continue?' dialog to the new panel system
- moved the archive/delete and duplicate filter 'commit and finish' dialog to the new panel system
- wrote a new question panel for the typical yes/no dialog used across the program and started a cleanup job to migrate all 140-odd instances of this over
- fixed an issue where a program instance that quit due to a user deciding to leave an already running instance in place would clear the original instance's 'running' file in its shutdown, meaning subsequent runs would charge ahead and hit 'database is locked' problems on db init!
- wrote a new 'similar files metadata generation report mode' to provide debug info on this cpu/gpu intensive routine
- added 'why use sqlite?' entry to the help faq, with a link to prkc's excellent document about the subject, https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md
- also added prkc's excellent Linux package requirements information to the 'running from source' help page
- fixed some old py 2.7 references in running from source help and an old link in ipfs help
- moved the 'file viewing statistics' menu down on the database menu
- fixed some dialog Escape key event handling
- fixed some ui ancestry testing code
- improved some misc similar files system code

next week

Next week is a cleanup week. I'd like to just do some boring rewrites and ui code updates, and I'll see if I can hack away at the last duplicates overhaul jobs.
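Since the changelog mentions the new Client API call, here is a hedged sketch of hitting POST /manage_pages/focus_page from Python. The default port and the 'page_key' argument name are assumptions drawn from the rest of the Client API; check the client api help that ships with v360 for the authoritative details.

# A hedged sketch of calling the new /manage_pages/focus_page endpoint.
# The default port and the 'page_key' parameter name are assumptions based on
# the rest of the Client API, not confirmed by this post.
import requests

API_URL = 'http://127.0.0.1:45869'  # default Client API address (assumed)
ACCESS_KEY = 'your access key here'

def focus_page(page_key_hex: str) -> None:
    """Ask the client to make the given page the current page in the main gui."""
    response = requests.post(
        API_URL + '/manage_pages/focus_page',
        headers={'Hydrus-Client-API-Access-Key': ACCESS_KEY},
        json={'page_key': page_key_hex},  # hex page key; assumed parameter name
    )
    response.raise_for_status()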
Apparently the IPFS path redirect stuff is not completely fixed for client path locations outside the main db dir. I will give it another go this week.
>>13223 Ok, running the 'membership in the similar file search system' maintenance job now; hopefully that's the one I need. 4 million files, and it seems like I get 100 files done every 1-2 seconds, so about 3,000 files a minute, 180k an hour, so about 18 or so hours until everything is sorted… or less. I just stopped the job so I could target just the first ~1.8 million files after reading that number, but apparently in the time I had it going it did around 700k files… awesome. Here's hoping this works. Image is what's left of the top job, down from a little over 4,055,017, and the new task below it is regenerate metadata, which I assume will add the files back into the similar search, as just stopping it seems to have done nothing for that number.
>>13225 oof, on the second step of regenerating their metadata, this could take between 10 and 18 hours…
Wow wow wow tagging feels so much snappier now. Loving it! Thank you so much!
>>13225 last leg after 1.08 million
I stopped getting "pixiv work" tags over a week ago. It's been killing me, but I thought it wasn't just me. But is it really just me? I just updated Hydrus and still didn't fetch the "pixiv work" tag from a subscription. I don't know what I could've done to break it, except I once dragged those image things into Hydrus in the hopes of making ripping from pixiv work before it "just werked" eventually. I remember the window I dragged the images onto was an anime girl, which scared the shit out of me at first since I thought it was a screamer. It didn't do shit as far as I could tell, but I wasn't sure how to undo what I did, so I just sat on it until Hydrus started doing the pixiv ripping out of the box. But I figure that's what broke this now.
>>13231 final stretch
(334.23 KB 1366x768 360 cheif.png)

>>13190 I ran the integrity checks and found 10 or so errors in my client.caches.db, culminating in a 'database disk image is malformed' error. So I cloned it and now everything works again. I think my search is actually faster, but I don't know if that was the cloning or this latest update. This new tag autocomplete is working really well for me. Thanks for the help, always appreciated. Also, unrelated question: I have a bunch of fetched tags that are "artist:…" and "comic:…" and I'd like to sibling them all to "creator:…" and "title:…" respectively. Is there a way to do this all at once? Like making a sibling equivalent of changing "comic:*" to "title:*" or something?
Tagging is so nice now, good job dev!!
Found a bug: If you press custom action in the dupe filter, after you enter your custom action, Hydrus will ask you which files to delete.
Tagging feels definitely more fluid now, although it will take some time to get used to not searching via partial namespaces in some cases, but that is fine. I definitely like that they work again once I have already searched for a tag (as in, I can just write 'char' to see all characters that a certain creator has used). One thing I would like to see added is a way to look at the existing tags within a namespace inside the manage tags window - outside it you can write 'character:*' and see all existing character tags, but this does not work when you are inside the tags window. Would that be possible to add? Thanks for all the hard work!
>>13228 >>13225 >>13231 >>13233 Great, I am glad this is fixing your problem. Thanks for the updates. My intention is for the new file maintenance manager to be able to handle jobs competently in the background if the user doesn't want to burn CPU manually for 18+ hours straight. The current system has the default throttle of 100 jobs/day, which assumes all jobs have equal weight, but I think I'll have to reweight relatively quick jobs vs heavy ones, or maybe add a '100 jobs AND ten minutes' limit or something, since if you had let your ~5M jobs run at 100/day, it would have taken about 150 years of daily idle work. I am still trying to grapple with the human workflow of some of the numbers we have here. Hydrus is a powerful magic wand, able to do more than we are used to.
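As a rough illustration of the reweighting idea, here is a minimal sketch of a weighted daily throttle. The job names, weights, and the daily budget are all made-up illustration values, not hydrus's actual settings.

# Sketch of a weighted daily throttle: quick jobs cost little of the budget,
# heavy jobs cost a lot. All numbers here are illustrative assumptions.
JOB_WEIGHTS = {
    'check file permissions': 1,              # quick
    'check similar files membership': 1,      # quick
    'regenerate similar files metadata': 25,  # heavy: has to decode the file
}

DAILY_BUDGET = 100  # roughly 'a hundred light jobs per day'

def jobs_allowed_today(pending_job_types, budget=DAILY_BUDGET):
    """Consume the daily budget job by job, stopping when it runs out."""
    allowed = []
    for job_type in pending_job_types:
        cost = JOB_WEIGHTS.get(job_type, 1)
        if budget < cost:
            break
        budget -= cost
        allowed.append(job_type)
    return allowed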
>>13229 >>13239 Thanks lads. This took a while, but it was definitely a case of slow and steady wins the race. There's still some edge cases to sort out and some logic to improve, but the snappiness here is the ideal for where I want most of the ui to be. I can improve the lagginess of db access across the program in similar ways, it'll just take the work to clean up the code.
>>13232 I can't say confidently what is going on with your client, but a pixiv tag search test on my dev machine here seems ok. Pixiv update their access frequently, so if your client is now running on an old or accidentally broken parser due to the downloader import, this would explain what was going on. This has happened a couple of times with different sites now and there is no excellent 'just reset this site to default' button, which suggests I should write one. Until then, I think your best solution is to do something like:

Hit all the manage dialogs under network->downloader definitions and delete anything pixiv related. Those dialogs should have an 'add defaults' button, which will let you reload the default objects that come with the program. Add all the default pixiv stuff back in. You can probably skip the login stuff, if that still seems to work for you. Then hit manage url class links under the same submenu and hit the 'try to fill in gaps…' button to relink everything back together. Check your pixiv subs are pointing to the right 'GUG's (i.e. downloader entry, should be called 'pixiv tag search') and then give it another go.

That is a bit of a hassle, so you could first try just seeing if you can manually switch back to the default parsers under manage url class links. I don't know exactly what your client has, but it probably has the defaults still under it. If the problem is your pixiv is now pointed at custom parsers for the gallery/file url classes, just switch them back to the simply named defaults and you'll likely fix what is wrong. Failing that, do the complete wipe/reset as above.
>>13234 Great, I am glad you are back up. The speedup is likely due to the clone, yeah–it reads and writes the whole db, so it works like a great disk cache for the next x hours. The namespace situation is getting out of control! Once I am done with the current duplicates work, I'll be doing some PTR and tag-repo and tag siblings work to try and put some fires out. Namespace siblings, like 'call all "idol:" tags "person:" tags', are a top priority. At the moment, the only solution would be to do individual siblings en masse, which is not practical. Tag siblings are great overall, but the whole system is still running on the 1.0 attempt, and I feel it does not account for human preferences enough. Shit like 'bodypart:' is creeping into the PTR, which is a fundamentally correct namespace–we can all agree that 'breasts' are a 'bodypart'–but different users disagree on whether they want to actually see it as a namespace. Permitting client-local control of namespace display cuts the knot of this problem. Eventually, tag siblings will have a 'tag definitions' system, group-based like the new duplicates system, that will place semantically similar tags on equal footing along with extra info like 'this tag is in japanese language, it is slang' and let users choose their language preferences and whether they want to see 'vagina' vs 'pussy' based on proper vs slang, rather than deciding for each pair again.
>>13240 Thank you for this report. I was going to say 'this is actually intentional', which it is, but I just did a bit of testing here and I saw that choosing 'forget it'/'no' on the delete dialog actually cancels the whole action. The intention was to add an optional delete to custom actions, not mandatory. I am 99.7% confident I can fix this for next week.
>>13242 I just answered a question on tumblr about the namespace searching: https://hydrus.tumblr.com/post/186423570874/loving-the-better-autocomplete-in-v360-only It is complicated to explain nicely, but I will fix up the wildcard logic so you can put in 'char*:sa' and have it implicitly enter 'char*:sa*' and return 'character:samus aran'. I think moving to the workflow of just putting 'c*:' before your input may solve the problem nicely and still give you options. Thank you for the report about 'character:*' in the 'write' autocomplete. I am not sure why it is not fetching all the results on that call, although a bunch of the logic here is still from the old system, which protected certain inputs like '*' from being searched due to the lag. Now we have some more control and no ui blocking, I'll see if I can undo some of these limiters and give you more power.
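To make the implicit trailing-wildcard idea above concrete, here is a tiny sketch using Python's fnmatch purely for illustration; hydrus's real matching happens in the database, so this only shows the intended semantics, not its actual code.

# Sketch of the implicit trailing-wildcard behaviour: 'char*:sa' is treated as
# 'char*:sa*'. The tag list is a made-up example.
from fnmatch import fnmatch

def expand_query(query: str) -> str:
    """Add a trailing * if the user didn't give one: 'char*:sa' -> 'char*:sa*'."""
    return query if query.endswith('*') else query + '*'

tags = ['character:samus aran', 'character:saber', 'creator:sabudenego']

pattern = expand_query('char*:sa')  # becomes 'char*:sa*'
matches = [tag for tag in tags if fnmatch(tag, pattern)]
# -> ['character:samus aran', 'character:saber']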
>>13245 Thanks for the reply. I tried nuking all the pixiv-related stuff under the dropdown but after doing that and "try to fill in gaps" I still can't fetch "pixiv work" tags. I even went back to the "manage default tag import options" and selected "get tags" to fetch "all tags" but still it doesn't fetch "pixiv work" tags for me anymore. I didn't check my pixiv subscriptions- I just opened a new gallery tab in Hydrus and pasted someone's pixiv member ID or however, after selecting "pixiv artist lookup" from the dropdown. It counted up until completion as normal, but didn't fetch "pixiv work" tags. I didn't redo the log in for pixiv because I didn't think it would help if Hydrus can fetch the images and every other tag fine.
>>13243 the 4 million jobs ended up taking around 2.5-3.5 hours. I'm not really clear on exact times as it would go fast and slow, so I wasn't paying close attention, but that second task did end up taking 18 hours.

Here is my suggestion, and it's probably only going to get more apt going forward looking at cpus: you should have a big multicore mode that more or less ignores background use, because let's be honest, even during the time intensive stuff the program never even registered on my cpu meter, and the most the program ever eats cpu is when it does things with ffmpeg or whatever, because that will use an 8core/16thread cpu; hydrus itself will only use 1 thread.

It would probably also be a good idea to have a toggle separating things that will hard lock the use of the program from things that will only momentarily lock it when it has something to do. Similar metadata from the duplicate window will lock you out of the program while it's doing its thing, but the metadata regeneration I was doing, while time intensive, wasn't locking me out. I think that would be a better way to divide it, because personally, 8/16 cpu here, even when hydrus is doing everything it can to lock itself up, I don't feel it, but when something hard stops the program from being used, that I do feel.

I also think there is value in sorting things that can get done fast vs things that take time, but I think the program telling you something like 'dude, I gots some shit I need to get done, mind if I go at it for a while' kind of popup would be appreciated. Also, if there was a way to estimate time based off a data set of what has been done vs what needs to get done, that would also be nice.
(381.89 KB 2210x463 client_2019-07-20_22-34-55.png)

A thread I was watching a day ago 404ed at 21 files when I was sure it would have gone to 151. Turns out it got spammed and the thread abandoned, so I went in and removed the spam images. This is something I asked about a while back, but I'm not sure if I ever got an answer for it: would there be any way to have that number update after a thread has been culled? Possibly another window column for 'removed'. I know for most people this would be worthless, so an on or off in the options for this would probably be for the best.

While I'm on the topic of columns: I have 558 watchers on this page, with no real way to keep track of what I have done to what. What I mean is I could open 10-20 of these threads at a time and scroll down, giving them a quick once over for anything egregious, this would be stage 1; I could then give them a closer look at the content and see if anything is sticking out as unwanted, this would be stage 2; and fine tooth comb the thing, which would be stage 3 and at this point likely end in the removal of the watcher due to there being nothing more I can really do.

Now, while I can do the above one at a time, many of these threads are things that get posted time and time again, and some of them have more potential for getting spammed, and with relatively limited time/divided attention, it would be nice to be able to mark thread watchers in some way. What would be ideal is a ratings-like way to mark them, where I click a square, it fills in the color and I can quickly see what is at what stage, but even a number system would be good, where I could define a thread watcher as a 1, 2, 3 or whatever I wanted and it would be sortable by that. I'm sure there are people who could make great use of it as a user defined way to mark a thread.
>>13260 sauce on the middle image?
>>13261 there ya go, that will at least give you something to go off of to possibly find a higher resolution.

>>13223 So, going through some more of my downloaders, this time galleries, I came across one that has an issue - well, many have issues, but this one is a bit egregious. This artist decided they needed 3-14 alternates for every image. Now, while duplicate processing will hit them, I have a potential 875k duplicates, so it's not viable to cross my fingers and hope I will hit the images I want to hit. These images also have nothing identifying them all as a set; the most common tag they share is 1girl or something like that, and only 590 of the 1591 images even have that.

What I'm going to do is add a rating that will be something along the lines of 'process duplicates' and act as a temp rating for this purpose, but would it be possible to have a right click function on the watcher/gallery to use that image set for potential duplicates? Possibly even just a right click duplicate check on a random selection of images? I'm not sure how useful this would be across the board for everyone using duplicate filters, but I know for me at least, with a db that is at best described as a cluster fuck, being able to narrow down to image sets or displayed images for dup processing would be very nice.
>>13266 Thank you mate. But I tried reverse image searching with yandex, iqdb, tineye, saucenao; all returned nothing. But google showed one result from /aco/: https://boards.4chan.org/aco/thread/3331595/blacked-twinks-and-traps#p3331645 I recognize the artist name "onta", and I ripped their twitter "doxyonta" before, but dude posts way too many sketches, crops, plus typical twitter garbage to be able to reasonably find any specific thing there. The md5 didn't match anything in Hydrus for me, and the look for similar files function found nothing. Realistically it's just another twitter crop or something though. I last ripped his twitter 6 mo. 22 days ago.
>>13267 ya reminded me of one of the tumblrs I watched/downloaded where the person thought it was perfectly reasonable to have 32000 images that were nearly all fucking garbage along with his own art… god, it would probably get me back a bit of space to cull it if I still have it… I can't really deal with opening those cluster fucks at the moment, but will be good to go once I finish the current batch.

>>13223 Ok hdev, found a new thing that would help a lot in duplicates search and also in how duplicate search seems to be handled. First let's start off with 2 images and why I think this feature would be great.

1 - the gif from pokemon. This one I found by luck, but the gif was separated into every frame in both jpeg and png, the png being 3.1mb per image. At first it seemed like it was simply a few images that show what's happening, a fairly common thing; it wasn't till I was getting a shit ton of the in-between images, as in between closed asshole and largest pokeball, that I realised it was an animation and this may be a problem.

2 - the mlp image is 1 of 104 still in the inbox and 30 that are currently in the trash. Now, these two images and how often they were coming up made me want to just get a full page of the images so I could deal with them there, but in the dup filter there is no way to 'open duplicates in new tab'. For cases like this it would be really nice to be able to open said image with the 4 options - exact, very similar, similar, speculative - and let me sort things out so it's more manageable there, because there are so fucking many of these things, and tagging them with 'this one is better' and 'this is a related alternative' is a bit tedious when there are this fucking many of them.

And as I said above, 104 in inbox and 30 in trash… would it be possible to have the duplicate search take a search parameter for inbox/trash/everything? I get that it won't have many uses, but in cases like the mlp one where I say hol the fuck up and need to see it, or ones with the asshole animation, if I had been sending them to trash, I would have been missing ones I wanted to see rather than doing what I was doing and calling everything an alternate. In fact, the reason the mlp one even has files in the trash is entirely because there is a tentacle version and I pulled the duplicates for that one and went through them as I said I was going to >>13266. That was a bit tedious as the db kept locking for a long time due to needing to reparse duplicates and not being able to run every image fast, but it got done. Had I known that there was a 134 image set for these, I would have instead just done them manually rather than relying on the filter.
(904.88 KB 300x290 SPOOKY.gif)

yo hydrus dev, I had an idea for an optional feature: a function to move all tags from one of the public tag repositories to the local tags of the user, but only for a group of selected pictures, and only one way (PTR to LOCAL, not the other way around) to prevent people's personal tagging from shitting all over the public repositories. The way I imagine it is like copying the tags from PTR1 or PTR3 to the Local tag repository, but instead of doing it one by one, having an option that does it automatically for multiple pictures - because using the copy button on multiple pictures copies all of the tags in the 10 pictures and moves them all as a single block of tags. Do you think it's doable?
>>13270 <using the copy button in multiple pictures copies all of the tags and moves them all as a single block of tags.*
>>13266 artist?
>>13274 the best you have to go on is the signature in the lower left; never looked into it, but someone else said it's likely a crop.
>>13275 ah sorry, didn't notice it, thanks.
Thank you, based Dev.
Ok, so in getting duplicates sorted on a rather large image dump with many similar images, an exact match duplicate search is not picking up the similar ones, so I decided to try going a few steps down, due to how small the image set was. So… doing this on a speculative search gives me 1 extra file done every 30 or so seconds… so I noped the fuck out of there. Similar, while not as bad, seems to be 30 files every 10-20 seconds; noping back off again, I go to very similar. This is about 30 files every second - ok, I can wait for this to finish and hopefully get all the files I want… granted this will take longer than manually filtering similar, but it seems that exact/very similar/similar/speculative searches are all kept over time, so this will get done one way or the other.

Now, I thought of 2 things:

1) the program just hard locks if I keep the speculative search going; I have had to open a fresh session to be able to get things done and hopefully not lock the program up. Is it possible to get this to work in chunks of 1000-10000 and commit? I will admit that this is almost entirely a problem with existing large databases, and any new database would not have these issues; it may even just be my session fucking over the program… either way, a commit from the program every so often would be helpful.

2) now, the image set I want to work with is only about 2000 files big; would it be possible to set a specific file set up to be processed on its own? Especially with speculative being so slow to even do one file. I don't know if 2) would require a smaller subset of the full archive duplicate processing, or if it could be done along with the full dup processing, but it would be nice to be able to work with smaller filesets when needed.
>>13285 still hung forever even on a fresh session… it seems like 20k files or more at a time causes it to hang indefinitely. It's possible that after several hours it may show up done, but that's not something I'm willing to leave to chance. Going to keep filling this out a few thousand at a time, but I'm not expecting a whole lot.
>>13257 Here is my reply: >>13317
>>13259 Thank you for this feedback. That's definitely the direction I'd like to go. The file maintenance system has proved a good step forward, so now I'd like to do similar for db-wide actions like analyze and repo processing. Get everything working on the same pipeline and timers and checks, and then I can neaten up the user feedback of 'hey, I need to do x, y, z big job, is this a good time?' into one system rather than the current cobbled-together system that consults several different maintenance clocks all with their own settings. As a side note: standard python is restricted to using one core as a language. I can bump up CPU if I go into low-level C++ hard math stuff, or by calling other processes like ffmpeg, but up in my high-end, the language has a Global Interpreter Lock (GIL) that I can't get around. I have threads and everything, but none of them actually execute simultaneously on different cores. The biggest sources of lag and latency in hydrus are due to my shit code rather than hardware/language limitations though. I just need to put the work in to make it neatly async, like the recent autocomplete improvements.
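For anyone curious, here is a small, self-contained illustration of the GIL point above: two CPU-bound Python threads take roughly as long as doing the same work one after the other, because only one thread can execute Python bytecode at a time. The workload size is an arbitrary example number.

# Two CPU-bound threads under the GIL: total time is roughly 2x one thread's time,
# not half, because the threads never actually run Python code in parallel.
import threading
import time

def burn_cpu(n: int = 10_000_000) -> None:
    total = 0
    for i in range(n):
        total += i * i

start = time.perf_counter()
threads = [threading.Thread(target=burn_cpu) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f'two threads: {time.perf_counter() - start:.2f}s')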
>>13260 Unfortunately, the current downloader system doesn't support the idea of 'removed' results. It always treats a new gallery request (or thread check) as an 'append any new stuff' action and doesn't take into account urls that have dropped off (which in the gallery context could have just cycled out to the next page). I think any clever additions to the downloader system will have to wait for the next big iteration of work on it. I like the idea of marking threads in some way for later processing. I had always wanted to make the individual thread items drag-and-droppable so you could migrate them to other/new watcher pages, but I ran out of time last time. My best suggestion for now is to multi-select and right-click to say 'show all selecteds' presented files' or whatever, which will show all the files for several at once. If it all looks legit, and the threads are DEAD, I then recommend you drag and drop the big selection of files to a new processing page or give them a later-filtering-lookup rating/tag, and then clear out the DEAD threads. >>13266 The new 361 should let you add specific potential pairs from the right-click menu. I am vacillating on whether I should/can easily add something like a 'launch dupe filter from here' command from a thumb selection. I think adding a temp tag/rating you can find again in the dupe filter is a good idea, or just picking your favourite from the thumbs in front of you may also just be easier, at least for smaller alternates groups. A quick browse through ten files in a row and choosing 'yeah, the one where all the girls have a benis and runny makeup is the best' when they are all in front of you and just deleting the others may be faster than going through them by pairs later on.
>>13268 Hey, just fyi: the 'speculative' search and the other options are inclusive of the more narrow choices. Speculative is really just 'hamming distance <=8', so it includes the others, which are 4, 2, and 0. When I eventually add videos, I'll add gifs as well. It'll work on frames, so your pokemon gif should match enough of the still images to get linked up with the eventual dupe/alternate groups you create here, and will appear in 'speculative' searches and so on. For finding things in trash, if you change the search domain, either on the duplicate files page or a page initialised with a 'similar files' system pred, from 'my files' to 'all local files', that should include trash. You may need to be in help->advanced mode to see this option.
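A minimal sketch of those distance tiers, assuming the similar-files hashes can be compared as 64-bit integers; the tier names come from the post above, but the function and dict names are illustrative, not hydrus internals.

# The search tiers are nested: a pair within distance 0 is also within 2, 4, and 8.
SEARCH_DISTANCES = {
    'exact match': 0,
    'very similar': 2,
    'similar': 4,
    'speculative': 8,
}

def hamming_distance(phash_a: int, phash_b: int) -> int:
    """Count the bits on which the two hashes differ."""
    return bin(phash_a ^ phash_b).count('1')

def within(phash_a: int, phash_b: int, tier: str) -> bool:
    return hamming_distance(phash_a, phash_b) <= SEARCH_DISTANCES[tier]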
>>13271 >>13270 Hey, you should be able to do this from manage tags right now, although it is advanced and debug-tier UI. First, make sure help->advanced mode is on. Then select your thumbs and open manage tags. Hit the cog icon and then 'advanced operation'. It'll load the same mass-copy dialog you can do service-wide from review services, just limited to that selection. Double-check you have everything set how you want, and then click Go!. Make a backup before you get into this, as it is a powerful tool and if it goes wrong, there is no easy undo. This dialog will get a pass in the upcoming PTR/tag services work I am planning. I'd like to add the tag filter object to it so you can more finely determine which tags are moved around.
>>13285 >>13286 Yeah, the current duplicate files search is a big atomic job, and if it goes on too long with a manually started job, the UI gets caught up until it eventually finishes. I don't like it, and as I move it to this new unified maintenance pipeline, I'd love to have it work in smaller async chunks like the file maintenance stuff. Since you have a big db, I recommend you have it work in idle time (the cog icon on the duplicates page should allow this) and let it catch up in that system, which has improved cancellability. I generally recommend users start with the 'exact match' (i.e. 0 hamming distance) search and work that queue completely before going for larger distances. This is much faster than the other searches and gives you easy decisions first. Because of the internal logic of the duplicates system, easy/actual duplicate decisions are more powerful/useful than alternate decisions–they eliminate the queued pair count faster.
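Here is a hedged sketch of what the 'smaller async chunks' approach could look like in general; all the names are illustrative and this is not hydrus's actual maintenance code.

# Process a big job in batches, committing and checking a cancel flag between
# them, so no single atomic transaction ties up the UI for hours.
import threading

def process_in_chunks(items, do_work, commit, chunk_size=1000, cancel_event=None):
    """Process items in chunks, committing after each chunk; returns the count done."""
    done = 0
    for start in range(0, len(items), chunk_size):
        if cancel_event is not None and cancel_event.is_set():
            break  # a user action or shutdown asked us to stop between chunks
        chunk = items[start:start + chunk_size]
        for item in chunk:
            do_work(item)
        commit()  # the db releases its lock here, so the UI stays responsive
        done += len(chunk)
    return done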

