/hydrus/ - Version 424

Version 424 Anonymous 01/07/2021 (Thu) 01:59:59 Id: 0070e5 No. 15072

https://www.youtube.com/watch?v=YnU_j_ZA-tc

windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.Linux.-.Executable.tar.gz

I had a good week. There are some quality of life improvements and faster tag search across the board.

This week's update will take some time to rebuild a cache. If you do not sync to the PTR, it will be just a few seconds. If you sync to the PTR, expect about 5-15 minutes.

faster tag search

In the second half of 2020, I tried several times to tune the database for different sorts of wildcard tag search, which is used in all autocomplete lookups and many file searches. I was sometimes able to get small clients always running well, or complicated large systems running well, but I failed to get it good for all situations with code alone, because the structure of the database tag lookup cache made the tuning difficult.

So, I have updated how that cache works. Rather than always searching one big master table, the client can now 'zoom' in on the appropriate search context based on the type of search page or manage tags dialog or whatever. Pretty much anything related to autocomplete and tag-based file searches is faster.

Most importantly, the worst-case time for these searches is greatly improved. Complicated searches, like a 'namespace:*anything*' file search, should no longer have sudden gigantic lag spikes. These searches may still take ten seconds or more when searching millions of tags and files, but they won't accidentally lag out for two minutes on some tiny 'my tags' search with only 60 results. The only exception in my testing is 'number of tags' searches, which still have bad cancelability. It is better, but not great. I'll keep working here.

The cache replaces an existing one. It will take some time to build it on update. If you do not sync with the PTR, it should just be a few seconds. If you sync with the PTR on an SSD, it should be 5-15 minutes (on my heavy client with a nice SSD, it was 7 minutes). If you sync with the PTR on an HDD, it will take significantly longer, so please plan for it.

If you sync with the PTR, you will see some numbers count up as it builds the different parts of the cache. There will be some deletion work to start, then counting up to perhaps a million, and then up to 16 million or so, at about 30,000 a second.

I have more plans here, and more work to do to optimise the tag display system, but I will let this new cache breathe for a bit before going back in here with a machete.

full list

- new tag caches:
- as 2020 ended, I attempted but failed to tune fast search for all kinds of clients, big and small and simple and complex. unable to guarantee decent speeds with just code, I have redesigned the tag text search cache. rather than checking the gigantic master table for all namespace and subtag lookups, the client can now zoom in on a small fast cache limited to the current search context, so doing a clever lookup on 'my tags' will no longer be hampered by having PTR beside it, and doing a solid lookup on the PTR or 'all known tags' will no longer be accidentally hampered by an optimisation for another situation
- the 424 update will take some time to generate the new caches for your existing data. if you don't sync with the PTR, it should be a few seconds. if you do sync, it will be about ten minutes on an SSD (seems about 30,000 definitions a second), and somewhat longer on an HDD. it will count up the tags as it goes, and on the PTR there will be a bit of deletion work, then one or two counts up to perhaps a million, and then one big count up to about 16 million
- in my initial tests, this cache adds about 1-2% additional processing time to mass tag changes, but a wide variety of tag lookups and file searches are now significantly faster, have much nicer worst-case lag spikes, and should cancel quicker. these are best in any specific tag domain, although 'all known tags' should still be much better. a future expansion of the tag cache is planned to finally address clean and accurate 'all known tags' searches
- summary: all these should be faster and cancel faster:
  - autocomplete searches for 'subtag*' (most normal searches) are optimised
  - autocomplete searches for 'namespace:*' are optimised, including when the namespace itself is a wildcard
  - autocomplete searches for wildcards with an asterisk in the middle of the subtag are optimised
  - autocomplete searches for wildcards with an asterisk at the beginning of the subtag are optimised (but this is still generally the slowest query)
  - autocomplete searches for namespace and subtag wildcard combinations are optimised, with either or both as a wildcard of any type
  - autocomplete searches for '*' are optimised
  - tag file searches without a namespace (i.e. in file search, with any namespace) are optimised
  - namespace file searches are optimised, including when the namespace is a wildcard
  - wildcard file searches are optimised, for all the classes of wildcard above
  - 'tag as number' file searches are optimised
  - 'has ><= x namespace tags' file searches are optimised for speed, including when the namespace is a wildcard, but still have bad cancelability on large domains. I'll work on this more

- other tag cache info:
- the 'tag text search cache' regeneration routine under the database->regenerate menu is replaced with a service specific routine for the new cache
- on boot, if the client sees any of the new cache tables are missing, it notifies you and regenerates the affected subsection of the cache
- an old method of performing complex wildcard searches was using surplus data and has been eliminated. these searches are now also computationally cheaper beyond the other domain-based optimisations this week
- I have identified the next bottleneck in the tag search pipeline and have a plan to speed all the above up even further, which can all be done in code
- thanks to user feedback, I have also identified other wasteful overhead in tag processing. I'll keep working!
- while the planned 'all known tags' cache will be useful since most file searches are in this domain, it will be a bit of work, so I will first let this new lookup cache breathe for a bit. 'all known tags' will not be nearly as big as the 'all known files/combined file' caches that have hit us with so much CPU recently. I expect it to increase the client.caches.db size by about 5%
- unified all increments or decrements to autocomplete count caches, no matter the service domain, to one location
- unified how autocomplete counts are fetched across different service domains
- optimised specific and combined autocomplete count cache update overhead for new, existing, and deleted tags
- optimised display autocomplete count cache updates for tags with multiple siblings or parents
- optimised the 'local tags cache', which does fast tag text fetching for local files, when new tags or files are added/removed from the 'all local files' domain. this now occurs in the same unified autocomplete count update process. it now also caches pending tags that have no current count
- merged 'exact match' autocomplete tag searching code into generalised wildcard search
- misc autocomplete and other tag code cleanup and harmonisation
- ditched some old mass UNION queries that were not cancelling well

- the rest:
- when you paste queries into a sub, the summary 'these were/were not added' dialog now always appears, and if you paste empty whitespace, it now says so
- the manage siblings/parents dialogs now specify which services apply which siblings, whether they are fully synced, the current display tag sync maintenance settings, and ultimately whether you can expect changes to apply quickly after dialog ok
- when a text entry dialog comes with suggestion buttons, it now focuses the text box by default. sorry for the trouble here! (issue #765)
- updated a couple petition reason suggestions in manage tags and parents
- added a shortcut to 'main window' to refresh manage tags' related tags suggestions with 'thorough' duration. in future, these dialog-specific actions will be moved out of 'main window', these have just been a 'temporary' patch
- updated the 'running from source' and 'install' help with some new numbers and info about mpv, and updated the 'server' help with a document helpfully provided by a user explaining that the server does not do what many new users think
- sped up 'has tags' file searches in certain situations, mostly when there are few if any other search predicates
- the default e621 parser now pulls meta tags, thank you to a user for providing this
- the default nitter timeline url classes are updated, thank you to a user for providing this
- the new little hook that takes 'file:///' off of paths pasted into the filename tagging path text now also normalises the path, so if you are on Windows, the URI's slashes will be Windows-corrected to backslashes. it also now removes wrapping quotes
- the hydrus logger again correctly restores stdout and stderr after it is closed on program exit (this was disabled for some reason, but fingers crossed it seems fine now!)
- an issue where an automatically started duplicate potentials file search could not cancel when the shutdown 'stop work' button was clicked or when idle maintenance mode turned off should be fixed
- the shutdown maintenance work for the first client shutdown now has a little text saying it is just some quick initialisation work
- for hopefully the last and completely final time, I think I fixed the invalid tag repair function for certain sorts of tags applied to currently local files
- improved the way a job thread was pulling new jobs (issue #750)

next week

The poll is done! Here's the link again:

https://www.survey-maker.com/results3310902xA574481e-102#tab-2

Multiple local file services has won. It looks like better URL sharing and file alternates will be soon after, as well. Thank you for voting; seeing what isn't popular is as useful as seeing what is.

Unfortunately, I cannot start that immediately. I have a fire to put out next week related to the network objects lagging too much when saving their updates. I will spend the rest of Q1 doing the delayed network improvements. So, with luck, I will get going on local file services in Q2.

I also have a ton of messages to catch up on!
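
As a rough illustration of the 'zoom in on the search context' idea from the faster tag search section, here is a minimal sketch. It is not the actual hydrus schema or code: the table naming scheme (subtags_cache_N), the helper names, and the use of SQL LIKE for wildcards are all assumptions made for the example. It only shows why a small per-service table keeps a 'my tags' lookup from having to wade through PTR-sized data.

```python
import sqlite3


def wildcard_to_like(wildcard: str) -> str:
    """Escape SQL LIKE metacharacters, then map the user-facing '*' to '%'."""
    escaped = (
        wildcard.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return escaped.replace("*", "%")


def build_caches(db: sqlite3.Connection, tags_by_service: dict) -> None:
    """Create one small lookup table per tag service (hypothetical layout)."""
    for service_id, tags in tags_by_service.items():
        table = f"subtags_cache_{int(service_id)}"
        db.execute(f"CREATE TABLE {table} (namespace TEXT, subtag TEXT)")
        db.execute(f"CREATE INDEX {table}_subtag ON {table} (subtag)")
        db.executemany(f"INSERT INTO {table} VALUES (?, ?)", tags)


def search(db, service_id, namespace_wildcard, subtag_wildcard):
    """Run a wildcard lookup against only the chosen service's table."""
    table = f"subtags_cache_{int(service_id)}"
    rows = db.execute(
        f"SELECT namespace, subtag FROM {table} "
        "WHERE namespace LIKE ? ESCAPE '\\' AND subtag LIKE ? ESCAPE '\\' "
        "ORDER BY namespace, subtag",
        (wildcard_to_like(namespace_wildcard), wildcard_to_like(subtag_wildcard)),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    # Service 1 stands in for a small 'my tags', service 2 for a larger domain.
    build_caches(
        db,
        {
            1: [("creator", "alice"), ("", "blue sky")],
            2: [("creator", "alice"), ("creator", "alina"), ("series", "alpha")],
        },
    )
    print(search(db, 1, "*", "al*"))
```

In this sketch an unnamespaced tag is stored with an empty namespace, so a namespace wildcard of '*' matches it as well. A search with a leading asterisk cannot use the subtag index, which matches the note above that those queries are generally the slowest.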

Anonymous 01/07/2021 (Thu) 14:21:27 Id: 64f296 No. 15075 >>15087

What's the difference between multiple local file services and multiple databases?

Anonymous Board Owner 01/08/2021 (Fri) 20:26:45 Id: 52c45d No. 15086

While the new search cache is working great on normal search pages, in the 'all known files'/'PTR' domain, which you usually see in 'manage tags' for the PTR, performance is bad. It often takes 2-6 seconds to fetch results. I regret this, and I am sorry for the inconvenience. I have traced the slowdown to one link in the chain, and will work on it next week.

Anonymous Board Owner 01/08/2021 (Fri) 20:30:10 Id: 52c45d No. 15087 >>15090

>>15075 The main benefit is everything is shared in one client database. As well as basic simplicities like only having one client to back up or set options for, you will also be able to search in more than one local file service at once, store files in more than one service at once (and only need one copy of the file), and only have to sync to the PTR or do other large tag/db maintenance for one database. It may open up some interesting new processing workflows as well.
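
To make the 'more than one service, only one copy of the file' point concrete, here is a small sketch. It is only an illustration under assumed names (FileStore, the 'sfw' and 'favourites' services), not how hydrus will actually implement multiple local file services: each file is stored once under its hash, and each service is just a set of hashes.

```python
import hashlib
from collections import defaultdict


class FileStore:
    """Toy model: one physical copy per file, many services pointing at it."""

    def __init__(self):
        self._content = {}                  # sha256 hex -> file bytes
        self._services = defaultdict(set)   # service name -> set of hashes

    def add(self, service: str, data: bytes) -> str:
        """Add a file to a service, storing the bytes only if they are new."""
        file_hash = hashlib.sha256(data).hexdigest()
        self._content.setdefault(file_hash, data)
        self._services[service].add(file_hash)
        return file_hash

    def search(self, *services: str) -> set:
        """Return the hashes of files in any of the given services."""
        result = set()
        for service in services:
            result |= self._services.get(service, set())
        return result

    def stored_copies(self) -> int:
        """Number of physical copies held, regardless of service membership."""
        return len(self._content)


if __name__ == "__main__":
    store = FileStore()
    store.add("sfw", b"cat picture")
    store.add("favourites", b"cat picture")   # same file, second service
    store.add("favourites", b"dog picture")
    print(store.stored_copies(), len(store.search("sfw", "favourites")))
```

With separate databases, the same file imported twice would be stored and maintained twice; here the second add only records another membership.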

Anonymous 01/09/2021 (Sat) 14:42:48 Id: cab0c2 No. 15090 >>15092

>>15087 Then you should provide a way to import one database into another without losing/resetting any metadata like "time imported" for example. Otherwise this feature is completely useless to me, because I don't want to lose that from my multiple databases. I'm not going to create a new database and spend hours or days setting everything up, then import all the files from my old databases and lose all of their metadata in the process.

Anonymous Board Owner 01/09/2021 (Sat) 18:33:19 Id: 52c45d No. 15092 >>15098

>>15090 Yes, I agree. I will have some sort of client-to-client transfer as part of this, so users who have been using multiple clients will be able to merge neatly and preserve time and archive metadata. Around that time I'll likely have to introduce rating export/import too, and perhaps URLs as well. The good news is that now we have the Client API, this should be neatly possible as a direct client-to-client connection.

Anonymous 01/11/2021 (Mon) 14:22:16 Id: cab0c2 No. 15098

>>15092 Thank you.

Release Tomorrow! Anonymous Board Owner 01/13/2021 (Wed) 06:31:13 Id: a89ea8 No. 15108

I had a good week. I was not able to fit much interesting fun stuff in, but I fixed the slow tag autocomplete search in the manage tags dialog, sped up tag processing, fixed some 'ghost' tag bugs, and reduced some wasted CPU in the network engine. Overall, the client should be a bit neater and faster in 425. The release should be as normal tomorrow.
