>>11987
>>11988
>>11989
>>11990
>>11996
Yeah, most of these sorts of spammy outcomes are due to a script misfiring. Either something generating .txt tag import files, or a website that exposes the wrong tags in the wrong place. Thank you for fixing them. I do not approve what tags go up (there are like 250,000-1,000,000 a day, and I can't see what files they apply to unless I have them on my irl client), but I do approve what gets deleted.
For the PTR, I am really pleased with how many users have contributed and written clever systems to populate it, and I am stunned I have been able to expand it to deal with so many tags, but there is also a bunch I am not happy about. The main problems from my perspective are:
1) Messy standards. Conflicting booru styles and individual opinions on what is 'good' make for conflicting search terms, which tag siblings cannot always fix, especially when it is also uncertain what a good tag sibling would be in a particular case.
2) It is too fucking big. About 450 million mappings atm, which makes for a gigantic db.
3) Too many tags people don't care about. 'title' tags are not as great as I once thought they would be, and there are a ton of shit unnamespaced tags parsed from filenames and tumblr-style-linked-tags-where-many-separate-tags-become-one-mess.
4) Bad tags/files hang around. Even when a tag is deleted, the master records for the tag and its file hash remain. This bloats up db size and will have to be addressed at some point.
Current plans to address these are:
1) Improve tag filtering and tag siblings over the long term. I'd like to update the current 'tag censorship' system to use my new flexible tag filter object and add a db-level cache to compute and search this stuff quickly. Tag siblings could also do with vast improvement to define different types of sibling and allow personal preferences (it'd be nice if you could choose whether you want the 'clothing:' namespace to display, for instance, while still recognising and agreeing as a community that 'clothing:black socks' and 'black socks' are synonyms).
Unfortunately, I am just a dude with few social skills, so I will not ramp up and recruit some janitors to try and police and moderate what is on the PTR. It'd just kill me with stress trying to deal with all the conversations and dispute resolution. If others want to create tag repos that have stricter upload standards, please feel free–I just am not that guy.
2 and 3) I would like, this year, to split the PTR into multiple repos. One for series/creator/character/person 'big' namespaces that are extremely difficult to dispute, one for smaller namespaces like clothing and species, one for unnamespaced tags, and one for often-unique stuff like title and filename namespaces. The low-incidence, long-length tags like title tags bloat up client.master.db and lag out many autocomplete queries, and are not useful for searching. My belief now is that tags are for searching, not describing, so title information is not useful by default, only for enthusiasts. When I do this split, nothing will be lost from any client, but you'll be able to choose what tags you want to sync with. This will lighten the db size for everyone who just wants to search for 'character:samus aran'. I'll also look into integrating the new tag censorship's tag filter into the sync process to further lighten your basic db size and reduce processing time.
4) I don't have a firm plan for this, but I want to integrate some sort of recycling or resyncing process in a future network version whereby a repo can say 'all these hashes and tags are gone, all these update files from time x to y have been regenerated, please cull any orphans'. This would have both client and server interactions, perhaps even some kind of '2012-03 is nullified, please resubmit what you actually care about' and would aim to cull old and stupid shit.
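To make the repo split in 2 and 3 a bit more concrete, here is a rough sketch of how tags could be routed into the four repos by namespace. This is only an illustration, not how hydrus does or will do it: the bucket names ('big', 'small', 'unnamespaced', 'unique') and the function names are made up for the example, and the namespace lists are just the ones mentioned above.

```python
# Hypothetical sketch: bucket names and helpers are illustrative,
# not part of the real hydrus client or server code.
BIG_NAMESPACES = {"series", "creator", "character", "person"}
UNIQUE_NAMESPACES = {"title", "filename"}


def split_tag(tag: str) -> tuple[str, str]:
    """Split 'namespace:subtag' into (namespace, subtag).

    Unnamespaced tags come back with an empty namespace.
    """
    namespace, sep, subtag = tag.partition(":")
    if not sep:
        return "", tag
    return namespace, subtag


def repo_for_tag(tag: str) -> str:
    """Pick which of the four hypothetical repos a tag belongs to."""
    namespace, _ = split_tag(tag)
    if namespace == "":
        return "unnamespaced"
    if namespace in BIG_NAMESPACES:
        return "big"
    if namespace in UNIQUE_NAMESPACES:
        return "unique"
    # Everything else (clothing, species, ...) is a 'smaller' namespace.
    return "small"


def split_tags(tags):
    """Group an iterable of tags by destination repo."""
    buckets = {"big": [], "small": [], "unnamespaced": [], "unique": []}
    for tag in tags:
        buckets[repo_for_tag(tag)].append(tag)
    return buckets
```

A client that only wants to search for 'character:samus aran' would then sync just the 'big' bucket and skip the rest.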
These are all big jobs with significant changes to the client and server. I doubt I will be able to do them in normal weekly work. This stuff is on my mind, and if one of the problems becomes critical, I may have to use my executive privilege to override the next 'big thing to work on' poll and knuckle down and fix it.
Another idea I have long had is to do some tag metadata swapping between trusted clients. Now that we have the Client API running, this is more and more of a possibility. It may make sense to ultimately transition away from reliance on a bigass central server and instead foster more of a distributed network.
The final objective of the hydrus network is to birth auto-tagging systems (neural network or whatever) so all tagging happens on your own CPU cycles and birth an imageboard-cultured egregore waifu, and the only tag information shared between users is 'how to tag' metadata. The current tag sharing we are doing is excellent prep work to train our auto-tagging systems in future. True work on this will be many years from now and will be greatly shaped by how this tech shakes out IRL.