/hydrus/ - Next Big Job Poll Discussion

/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


Next Big Job Poll Discussion hydrus_dev 04/10/2019 (Wed) 22:54:53 Id: 187563 No. 12152

Ok lads, as I am now finishing up OR search, I am soon going to be free to work on a new 'big job'. I am pleased that I was able to make simple Client API and OR search in much faster iterations than previously. I hope to continue like this, keeping the next big job 8-12 weeks at the most before running a new poll.

The current list is:

- Just catch up on small work for a couple of months
- Reduce crashes and ui jitter and hanging by improving ui-db async code
- Clean up code and add unit tests
- Improve tag siblings/parents and tag 'censorship'
- Add ways to display files in ways other than thumbnails (like 'details' view in file explorers)
- Add text and html support
- Add Ugoira support (including optional mp4/webm conversion)
- Add CBZ/CBR support (including framework for multi-page format)
- Add import any file support (giving it 'unknown' mime but preserving file extension)
- Improve 'known urls' searching and management
- Explore a prototype for neural net auto-tagging
- Add support for playing audio for audio and video files
- Add ui for waifu2x and other file converters/processors
- Write some ui to allow selecting thumbnails with a dragged bounding box
- Add popular/favourite tag cloud controls for better 'browsing' search
- Improve the client's local booru (this likely now means a backend migration to the Client API)
- Improve duplicate db storage and filter workflow (need this first before alternate files support)
- Improve shortcut customisation, including mouse shortcuts
- Add ratings import/export, and add 'rating import options' to auto-rate imports
- Add more commands to the undo system
- Improve display of very large/zoomed files in the media viewer
- Set thumbnail border colours on user-editable rating and namespace conditions
- Improve hydrus network encryption with client cert management and associated ui
- Add tag metadata (private sort order, presentation options, tag description/wiki support)
- Improve file lookup scripts and add mass auto-lookup
- Add multiple local file services (which will enable true nsfw/sfw partition)
- Add an incremental number tagging dialog for thumbnails (for adding page:n etc… to a sequence of files)
- Permit custom ordering of thumbnails, through mouse-dragging or otherwise
- Allow user to have multiple open split tab columns or separate windows with one or more pages
- Improve rating workflow by providing score representatives to compare with
- Add file modified/creation timestamp searching and sorting
- Write an URL Repository so clients can share known url mappings
- Add animated thumbnails for videos (animating on mouseover)
- Allow multiple custom 'open externally'-style file launch commands for files
- Add version tracking to downloader system objects and explore remote fetching of updates
- Expand file notes system

I will put up this poll with the 349 release post and then select whatever seems to be on top by 351, with the proviso that I will try to discount any non-organic (e.g. botted) votes. You will be allowed to vote for multiple items. I am happy to work on any of it. Please feel free to suggest new items or ask for longer explanations of any of the above. I will edit the list as new items are agreed on.

Anonymous 04/16/2019 (Tue) 09:03:19 Id: e8c65d No. 12274

>>12272 https://github.com/SethMMorton/natsort there we have it, HyDev take note.

Anonymous 04/16/2019 (Tue) 09:23:40 Id: e8c65d No. 12275

>>12272 Also if you want a fast implementation https://github.com/sourcefrog/natsort

Anonymous 04/16/2019 (Tue) 22:42:27 Id: 82d237 No. 12277

>>12247 Yes, that's basically something like these: https://github.com/andrewekhalel/sewar or even https://github.com/lidq92/CNNIQA or https://github.com/lidq92/CNNIQAplusplus … and so on. Having these available in Hydrus' duplicate filter (or its revised version) should help a lot. That said, you should only expect imperfect reliability in fully automatic mode. You won't ALWAYS get very accurate scoring or even just be able to identify the "better" image automatically. These also generally can misidentify variant images as "the same but worse", and other mistakes like that. >>12152 Like the other anon implicitly did, I also propose the ability to run some more of these image quality / image similarity metrics as a possible feature. Would be nice to have more options to populate the filter and assisted automatic scoring from within the duplicate filter. At the same time, it would probably be a good idea to make the duplicate filter more modal. E.g. "this is above the certainty threshold you set up, so here's how the weighted scoring of the algorithms you picked would solve this and the corresponding scores - confirm?". Probably also needs some thought put into a colored multi-selection GUI thing that makes it quick to see the scoring & resolution to manually deviate and fix mistakes.
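
The "certainty threshold plus weighted scoring" idea could look roughly like the sketch below. This is only an illustration, not anything hydrus does: the metric names, the weights and the convention that each score is already normalised to [-1, 1] (positive favours image A, negative favours image B) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    winner: str        # "a", "b", or "undecided"
    confidence: float  # 0.0 .. 1.0, absolute value of the weighted mean
    per_metric: dict   # metric name -> score, kept so a GUI could show them


def weigh_pair(scores, weights, threshold):
    """Combine per-metric scores into one suggestion.

    scores:  hypothetical metric name -> value in [-1, 1]
             (positive favours image A, negative favours image B)
    weights: metric name -> non-negative weight picked by the user
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return Verdict("undecided", 0.0, dict(scores))
    combined = sum(scores[name] * weights[name] for name in scores) / total_weight
    confidence = abs(combined)
    if confidence < threshold:
        # Below the user's threshold: leave the pair for manual filtering.
        return Verdict("undecided", confidence, dict(scores))
    return Verdict("a" if combined > 0 else "b", confidence, dict(scores))


# Made-up scores for one pair of files.
example_scores = {"psnr": 0.8, "ssim": 0.6, "filesize": -0.2}
example_weights = {"psnr": 2.0, "ssim": 2.0, "filesize": 1.0}
suggestion = weigh_pair(example_scores, example_weights, threshold=0.5)
print(suggestion.winner, round(suggestion.confidence, 2))
```

Anything that comes back "undecided" would stay in the normal manual filter, and the per-metric scores are kept so the user can see why a suggestion was made before confirming it.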

Anonymous 04/16/2019 (Tue) 23:26:57 Id: 7cbe7b No. 12278

I don't know if this is a feature already or not, but a sibling thing that could have multiple dependencies, so that it could go like >if [artist] + [ocname] then swap to artist:[artist] + character:[ocname](artist), would be very convenient, especially for scraping tags from places like furaffinity where people don't use underscoring properly
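
As a rough illustration of the request, a multi-dependency rule can be written as "if all of these tags are present, swap them for those". The function and the example tags below are hypothetical and say nothing about how hydrus stores siblings.

```python
def apply_rule(tags, required, replacements):
    """If every tag in `required` is on the file, swap them for `replacements`.

    All three arguments are plain iterables of tag strings (an assumption
    made for this sketch).
    """
    tags = set(tags)
    required = set(required)
    if required <= tags:
        return (tags - required) | set(replacements)
    # Rule does not fire unless all of its dependencies are present.
    return tags


scraped = {"some_artist", "some_oc", "solo"}
fixed = apply_rule(
    scraped,
    required={"some_artist", "some_oc"},
    replacements={"artist:some_artist", "character:some_oc (some_artist)"},
)
print(sorted(fixed))
```

A file that only has one of the two required tags is left untouched, which is the difference from an ordinary one-to-one sibling.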

Anonymous 04/17/2019 (Wed) 09:07:34 Id: 56c741 No. 12281

On the subject of character names: would there be a way, far in the future, to link 2 tags together? Instead of: character:character_(series) use: [[character]]x[[series]] Where x is a dynamic taglink action. And while we're at it, instead of namespaces, multiple lists of tag types a tag can be presented as for that specific media. Tear me up, it's brainstorming more than a definitive solution. I don't even know if it is possible.

Anonymous 04/17/2019 (Wed) 10:28:49 Id: c228c1 No. 12282

>>12277 Never trust Neural Network systems verbatim. Always find an expert system that can work well first.

Anonymous 04/17/2019 (Wed) 12:29:43 Id: 8dee28 No. 12283

>>12282 >>12277 True, it wouldn't be perfect, but right now I have to parse between two images, and when the quality is close, what could take 1-5 seconds takes 10-20 seconds. What if we had dup tiers? The current one is a blunt and stupid 'shit looks close' check, and that is perfect as a first pass. Sadly I don't have any examples on hand as I deleted them, but there was a scalie thread on /trash/ where some jackass came in and made the shittiest 'corrupt' images that were 7-8 times the good images' file size. In a thumbnail they looked passable most of the time, but at full size they were unrecognizable. If a dup filter was more accurate, these duplicates would get overlooked. What I would like is the current dup detector as a baseline, and on top of it a more stringent dup detector to check the baseline's work. This should filter out alternative images from normal dups. From there, a far more stringent one would do something like jpeg to png comparisons. If a pair is close enough to trigger here, in nearly all cases the jpeg was at some point converted to png, so scrapping the png would save space without needing to go through with a fine-tooth comb by hand. With that final level, imagine that the two images are just shown: right click is keep png, left click is keep jpeg, and due to how close they are you almost never click png. At least this is how I imagine it: dups going through 2 dup filters and then a third png to jpeg filter would turn the 10-20+ second checks into 1-5 second confirmations.

Anonymous 04/17/2019 (Wed) 16:29:17 Id: 82d237 No. 12285

>>12282 Of course I'd *also* want the usual expert systems from sewar and so on because they usually are faster, mostly very easy to implement, and even more suitable for some use cases. But actually we probably need both. The problem is that we don't really have an expert system that really can do the technical model analysis of something like this: https://github.com/idealo/image-quality-assessment

Anonymous 04/17/2019 (Wed) 16:46:23 Id: b5f753 No. 12286

>>12285 I will add more information to the Optimization thread on the list of expert system vs NN repos

Anonymous 04/17/2019 (Wed) 16:59:21 Id: 82d237 No. 12287

>>12286 Personally I preferred to KISS and just suggested a few python frameworks that might be easy to hack into Hydrus - but sure.

Anonymous 04/17/2019 (Wed) 22:16:11 Id: 27ff2a No. 12288

- parser revisioning
- remote fetching
- version tracking

Anonymous 04/18/2019 (Thu) 15:43:37 Id: 4c3fb6 No. 12293

>>12152 I'm not sure how big this is but here is a suggestion: implement an icon and search command for files which has any notes attached to them.

Anonymous 04/18/2019 (Thu) 18:22:16 Id: 77a73b No. 12296

A probably small thing that would help me a lot would be a "delete both" button in the duplicate filter - I know that I can press del, but I usually don't have a hand at the keyboard while filtering, and then I also still need to press delete both anyway…

Anonymous 04/18/2019 (Thu) 21:02:56 Id: 11f70d No. 12298

>>12232 PeTR Public Emacs??? Tag Repository

Anonymous 04/19/2019 (Fri) 00:35:05 Id: 27e638 No. 12299

- Cookie management from API
- Tag statistics from API so if you search for a_totally_sfw_tag, and it produces a lot of creator:cname , you know you should sub to cname
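
To make the 'tag statistics' idea concrete, here is a small sketch of counting which creator: tags turn up alongside a searched tag. It works on a plain list of tag sets, which is an assumption for the example; it does not use or imply any Client API call.

```python
from collections import Counter


def creator_counts(files, search_tag, namespace="creator:"):
    """Count namespaced tags that co-occur with `search_tag`.

    files: list of tag sets, one per file (hypothetical input format).
    """
    counts = Counter()
    for tags in files:
        if search_tag in tags:
            counts.update(t for t in tags if t.startswith(namespace))
    return counts


library = [
    {"a_totally_sfw_tag", "creator:cname"},
    {"a_totally_sfw_tag", "creator:cname", "solo"},
    {"a_totally_sfw_tag", "creator:someone_else"},
    {"unrelated", "creator:cname"},
]
stats = creator_counts(library, "a_totally_sfw_tag")
print(stats.most_common(1))
```

The top of `most_common()` is the creator you would probably want to subscribe to.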

Anonymous 04/19/2019 (Fri) 08:32:46 Id: b5f753 No. 12300

>>12298
Public Integrative Tag Repo
Public Incorporative Tag Repo
Public Interdependent Tag Repo
Public Ingrained Tag Repo (PITR)

Anonymous 04/20/2019 (Sat) 18:43:18 Id: b5f753 No. 12313

>>12166 You might wanna get https://github.com/deanmalmgren/textract (this is some good stuff for text-like documents)

hydrus_dev Board Owner 04/20/2019 (Sat) 19:02:23 Id: 187563 No. 12314

>>12230 Thanks for clarifying. Unfortunately, this is not trivial to do for hydrus. Any paged system needs sorting, and hydrus supports many clever kinds of sort, so in order to implement this, I would need to load 'media' metadata for every file in a search result before I could fetch the first page of results. This would not save much time from the current system, where most of a search delay is in fetching that same media metadata. The proper solution, and I imagine how the boorus probably do it, is by having a sort cache (and they have page caches as well, and generally simpler searches to cache), to cross-reference search results against to figure out page slices. This is more complicated than I want to make hydrus search code at the moment. I am happy with being able to display and manage thousands of results at once in the main gui, and I also don't want to further complicate the viewer with paged management and load code. As you say, I encourage users to add 'system:limit=x' if they want less laggy searches. I would be interested in your further thoughts if you have certain scenarios where search is very slow. If there are particular instances where the client runs very slow for you, I'd love to help it run faster.
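
The point about paging can be shown with a toy example: without a sort cache, even the first page needs metadata for every result, because the sort has to see all of them before a slice can be taken. Everything below (function names, the fake metadata loader, the `limit` argument standing in for something like 'system:limit=x') is a made-up sketch, not hydrus search code.

```python
def fetch_page(file_ids, load_metadata, sort_key, page, page_size, limit=None):
    """Return one page of sorted results.

    Without a sort cache, metadata for *all* ids is loaded before slicing.
    `limit` caps the id list first, roughly the saving a search limit gives.
    """
    if limit is not None:
        file_ids = file_ids[:limit]
    loaded = [load_metadata(fid) for fid in file_ids]
    loaded.sort(key=sort_key)
    start = page * page_size
    return loaded[start:start + page_size]


load_calls = []


def fake_load(file_id):
    # Stand-in for the expensive per-file metadata fetch.
    load_calls.append(file_id)
    return {"id": file_id, "size": (file_id * 37) % 101}


first_page = fetch_page(list(range(1000)), fake_load, lambda m: m["size"],
                        page=0, page_size=10)
print(len(first_page), "rows shown,", len(load_calls), "metadata loads")
```

Ten rows are shown but a thousand loads happen, so paging alone saves little; capping the result set is what actually reduces the work.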

Anonymous 04/20/2019 (Sat) 19:27:28 Id: b5f753 No. 12315

>>12213 IPFS repos would be important, also an advanced API that can trade IPFS hashes and images would be sweet

hydrus_dev Board Owner 04/20/2019 (Sat) 20:05:01 Id: 187563 No. 12316

>>12234 >>12233 >>12235 >>12243 >>12245 >>12248 Yeah, I am mixed on cbz. I like the idea in the sense of waving a magic wand and having great support, but I can't do that and I know I fall to feature creep too easily. If this is voted on, I would try to make very simple support and see how that goes, and then iterate on it in future if it proves popular. I can't out-compete the programs already out there, but I can do some simple stuff, and ancillary code like navigating multi-page single-file media will have uses for things like file alternates. I really want all future big jobs to be small improvements and experiments, ideally 6-8 weeks and preferably no more than 12, so I don't get bogged down like the downloader engine overhaul. I am open to experiments that fail and don't want to get emotionally attached or fall into sunk cost fallacy.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:06:37 Id: 187563 No. 12317

>>12236 Thanks. Yeah, I would like easier sibling workflow, including from the right-click menus, as part of a tag sibling/parent improvement. I would push in this direction with "Improve tag siblings/parents and tag 'censorship'".

hydrus_dev Board Owner 04/20/2019 (Sat) 20:10:26 Id: 187563 No. 12318

>>12240 Yeah, your dialog mock-up is exactly the sort of thing I was thinking of. I'll have a new options panel somewhere that turns on the advanced mode of this dialog and lets you set up some favourite reasons and custom entry. Now that I have the 'set a reason' infrastructure in place, this will not be super difficult to add. I expect to have it in within the next few weeks.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:17:03 Id: 187563 No. 12319

>>12247 >>12277 Yeah, my first push here will be to set up a system that permits auto-decisions in a sensible and generic way, along with user ability to control what is permitted, and then in future hang new auto-decision systems on it. The 'this is a png copy of a jpg' seems like a nice simple way to start, and I know I can do very quick detection of that by just hashing image pixels. Then maybe explore some 'this jpg is definitely lower quality than this one of same resolution' stuff. The way to slice through dupe mountain will be through automatic systems to reduce the human drudgework, but I am similarly leery >>12282 >>12283 of anything too clever/vapourware to start with. Most of all I want to get the infrastructure and maintenance processing code in, and then we can test all kinds of different comparison systems for our exact purposes.
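
A minimal sketch of the 'hash the image pixels' check for spotting a png copy of a jpg. It assumes both files have already been decoded to raw pixel bytes by some image library (decoding is not shown), and that the copy decodes to exactly the same pixels; the function name and argument layout are made up for the example.

```python
import hashlib


def pixel_hash(width, height, mode, pixel_bytes):
    """Hash decoded pixel data rather than file bytes.

    Two files in different formats get the same hash only if they decode to
    identical pixels with the same dimensions and colour mode.
    """
    h = hashlib.sha256()
    h.update(f"{width}x{height}:{mode}:".encode("ascii"))
    h.update(pixel_bytes)
    return h.hexdigest()


# A made-up 2x2 RGB image, standing in for a decoded jpg and its png copy.
pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])
jpg_hash = pixel_hash(2, 2, "RGB", pixels)
png_hash = pixel_hash(2, 2, "RGB", pixels)

# One channel value off by one: no longer a pixel-for-pixel copy.
altered = pixels[:-1] + bytes([254])
altered_hash = pixel_hash(2, 2, "RGB", altered)

print(jpg_hash == png_hash, jpg_hash == altered_hash)
```

Because the comparison is an exact hash match, it is cheap and has no 'looks close' fuzziness, which is what makes it a safe first candidate for an automatic decision.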

hydrus_dev Board Owner 04/20/2019 (Sat) 20:22:14 Id: 187563 No. 12320

>>12272 >>12273 >>12274 >>12275 Thanks. I have 'human' number sort capability in hydrus already, although I am sure there are still places to apply it. All numbered tags should sort like this atm. I am pretty confident I can get directory listings and file access of rars and zips with the python libraries I already have. Any first version of a cbz viewer would be simple and just read through the internal pages one by one, no bookmarks or per-page metadata or anything. Just something that lets you penetrate the 'list of numbered jpg' zips already in your db in the media viewer (and rename to .cbz or whatever, so you can 'open externally' to your preferred comic reader).
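
For reference, reading the 'list of numbered jpg' out of a zip in human order needs nothing beyond the Python standard library. This is only a sketch of the idea, not hydrus code: `human_key`, `list_pages` and the extension list are names invented for the example, and rar files are not covered.

```python
import io
import re
import zipfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")  # assumed set of page types


def human_key(name):
    """Split into digit and non-digit runs so 'page10' sorts after 'page2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_pages(zip_bytes):
    """Return image member names of a zip/cbz in human reading order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(IMAGE_EXTS)]
    return sorted(names, key=human_key)


# Build a tiny in-memory archive to try it on.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for member in ["page10.jpg", "page2.jpg", "page1.jpg", "notes.txt"]:
        archive.writestr(member, b"")

pages = list_pages(buffer.getvalue())
print(pages)
```

A simple viewer would then step through `pages` one by one and read each member's bytes with `ZipFile.read`.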

hydrus_dev Board Owner 04/20/2019 (Sat) 20:25:29 Id: 187563 No. 12321

>>12277 For new image recognition techniques, yeah, I designed the search system to make this possible. Much like >>12319 , the main push of duplicate detection 1.0 was to build a search system that could handle many search systems and hang one simple 'looks like' system on it. I can fairly easily add new techniques to support rotation or colour similarity or whatever on it now. This would not be my urge at the moment, as even with the simple system we already have, the biggest problem is that there are way too many pairs to go through, so the processing workflow is now the weakest link, but once we have that more under control I can work on this.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:28:37 Id: 187563 No. 12322

>>12278 Yeah, that's a tricky one. I know exactly what you are talking about, and I would love to have a nice system for it, but the actual guts of how 'if … then' tag relations would work are way more complicated than I am confident I can currently support. For siblings and parents, my first priority is to improve the data store behind the whole system first. Once that isn't on fire behind the scenes, I'll consider carefully adding this sort of power. I am sure it could go very wrong if not thought about, so it'll be baby steps until we have some real world experience.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:38:31 Id: 187563 No. 12323

>>12281 Both of these thoughts are on my mind. Adding tag siblings revealed to me a big set of pain in the ass problems related to tag definitions I had not considered before. For both of these, I think the ultimate far future way to solve them is to have a tag definition structure "Add tag metadata (private sort order, presentation options, tag description/wiki support)", where clever metadata can be applied to tags. So you could say: character:shimakaze (kantai collection) And the tag definition, which would essentially be a cleverer iteration of the current siblings and parent system, would say "this has 'series:kantai collection' parent" and also perhaps "this can be displayed to the user as 'character:shimakaze'" without destroying the unique tag identifier through merging with some 'character:shimakaze (my oc series, donut steel)', basically being aware of the (kantai collection) after the main tag. I am very much on the side of letting users display tags how they want, as there are many different spergy desires here, and having a system that recognises info about tags lets us do mass management rather than the current per-tag mess and endless firefight. Same for namespaces. I am experimenting with 'clothing:' namespace on the PTR, but I know some users hate that. It would be ideally better if the tag 'bikini' had the 'property' of "clothing" rather than an explicit namespace to argue over, and then a user could say 'when a tag has "clothing" property, display it as namespace'. As it is, I expect my next step here will be more in line with little patches. Namespace sibling control (like saying 'display all creator: tags as artist: please' or 'display all clothing: as unnamespaced') seems an easy-ish next step. "Tags were a mistake." - t. hydrus_dev
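
A toy version of the tag definition idea, to show how the unique tag, its display form, its parents and its 'properties' could be kept apart. The class, field and function names are all hypothetical; this is a sketch of the concept, not a proposed hydrus schema.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TagDefinition:
    tag: str       # unique identifier, never merged with another tag
    display: str   # what the user sees by default
    parents: list = field(default_factory=list)
    properties: set = field(default_factory=set)


# Matches 'namespace:name (disambiguation)'.
_DISAMBIGUATED = re.compile(r"^(?P<ns>[^:]+):(?P<name>.+?) \((?P<series>[^()]+)\)$")


def define_character(tag):
    """Derive a parent and a shorter display form from 'character:x (series)'."""
    match = _DISAMBIGUATED.match(tag)
    if match is None or match.group("ns") != "character":
        return TagDefinition(tag=tag, display=tag)
    return TagDefinition(
        tag=tag,
        display=f"character:{match.group('name')}",
        parents=[f"series:{match.group('series')}"],
    )


def render(definition, namespace_properties=()):
    """Show a property as a namespace only if the user opted in to it."""
    for prop in sorted(definition.properties):
        if prop in namespace_properties and ":" not in definition.display:
            return f"{prop}:{definition.display}"
    return definition.display


kancolle = define_character("character:shimakaze (kantai collection)")
other = define_character("character:shimakaze (my oc series, donut steel)")
bikini = TagDefinition(tag="bikini", display="bikini", properties={"clothing"})

print(kancolle.display, kancolle.parents)
print(render(bikini, {"clothing"}), render(bikini))
```

Both shimakaze tags display the same way while staying separate identifiers, and the 'clothing' namespace becomes a per-user display choice rather than part of the tag itself.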

hydrus_dev Board Owner 04/20/2019 (Sat) 20:40:31 Id: 187563 No. 12324

>>12288 Thank you, I am adding 'Add version tracking to downloader system objects and explore remote fetching of updates' to the list.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:42:22 Id: 187563 No. 12325

>>12296 Thanks, this is actually a small thing. I assume you want no duplicate action applied, just a basic 'get rid of these two shits' and move on to the next decision? I'll see if I can add a button to quickly do this for 349 or 350.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:45:43 Id: 187563 No. 12326

>>12299 Thank you, I have api cookie management in my current to-do. I won't work on client api as a big job in this cycle just so it has some time to breathe. Can you explain the 'tag statistics' idea a bit more? Could this be something to apply to the program more generally, rather than just the API, like something to click that says "show me what artists I like and do not sub to?" Subs need a db-level data overhaul before I can do clever inspection about them btw. But this is something else to integrate into the client api as well–managing subs.

hydrus_dev Board Owner 04/20/2019 (Sat) 20:48:38 Id: 187563 No. 12327

>>12293 Thank you. There are multiple jobs for improving notes that I have not been able to get to. I am adding "Expand file notes system" to the list to cover a general push in this direction. This would include multiple notes and likely connecting the notes system to the downloader as well.

Anonymous 04/20/2019 (Sat) 22:20:25 Id: e607d6 No. 12333

>>12325 Yes, thank you very much!

Anonymous 04/21/2019 (Sun) 07:51:30 Id: dc31c6 No. 12334

>>12152 Hydrus is already pretty great, when it's stable. The main 'features' I'd like to see worked on next are getting pixiv scrape working again and more login manager support for big sites like Fur Affinity and Ink Bunny. Having hydrus scrape a largely nsfw site and miss all that is kind of pointless because of no login.

Anonymous 04/21/2019 (Sun) 10:31:53 Id: 8dee28 No. 12335

>>12316 Sadly, the only programs you would be up against are comic rack and acdsee in terms of full-featured programs. Nearly every feature comic rack has, you also have. With acdsee you either use crash-prone versions from 10+ years ago (version 8/first pro to version 9) or you use the recent versions, which are pure bloat for unicode support; there is very little middle ground with acdsee due to so many of the in-between versions fucking with features. >>12319 I honestly never want an automatic sort. As much as it would be good for going through my clusterfuck, I am still getting full-featured images going up against 23 byte black boxes, and running into this makes a pure auto system unacceptable to me. I would accept an auto system that looks at two images and then has me spot check. Let's say it finds a better/worse pair: it presents me the better and the worse, green border is better, red border is worse. I scroll through like normal, but left click confirms green, right click confirms red. Confirmed green acts like better/worse does currently; confirmed red takes it out of the auto figure-it-out area and into a manual pick.

Anonymous 04/22/2019 (Mon) 04:41:45 Id: b8262d No. 12348

Are there any plans at all to (at least have the option to) keep metadata of individual tag mappings, like a file's tag history or keeping track of which process added this tag to this file? Like was it typed manually, was it imported while scraping, etc. I assume this would bloat the DB by a big factor but at least for my personal use I think that might be worth it down the line

hydrus_dev Board Owner 04/24/2019 (Wed) 21:46:41 Id: 187563 No. 12359

Poll >>12358 !
