/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.


Big things to work on next hydrus_dev 10/31/2018 (Wed) 21:34:04 Id: a13630 No. 10429
With the download engine and login manager coming to a close, I will need something new to be anxious about and near-overwhelmed by. I will put up a poll in a few weeks for everyone to vote on a big list of possible new features that are too large to fit into my normal weekly work. The poll will allow you to vote on multiple items. I hope to work on the most voted-on item for two to three months before starting the cycle again.

This thread is for discussion of the list, which at current looks like this:

- Just catch up on small work for a couple of months
- Improve tag siblings/parents and tag censorship
- Reduce crashes and ui jitter and hanging by improving ui-db async code
- Speed up tagging workflow and autocomplete results
- Add ways to display files in ways other than thumbnails (like 'details' view in file explorers)
- Add text and html support
- Add Ugoira support (including optional mp4/webm conversion)
- Add CBZ/CBR support (including framework for multi-page format)
- Add import any file support (giving it 'unknown' mime)
- Improve 'known urls' searching and management
- Explore a prototype for neural net auto-tagging
- Add support for playing audio for audio and video files
- Add OR file search logic
- Add an interface for waifu2x and other file converters/processors
- Write some ui to allow selecting thumbnails with a dragged bounding box
- Add popular/favourite tag cloud controls for better 'browsing' search
- Improve the client's local booru
- Improve duplicate db storage and filter workflow (need this first before alternate files support)
- Improve shortcut customisation, including mouse shortcuts
- Import/export ratings, and add 'rating import options' to auto-rate imports
- Add more commands to the undo system
- Improve display of very large/zoomed files in the media viewer
- Set thumbnail border colours on user-editable rating and namespace conditions
- Improve hydrus network encryption with client cert management and associated ui
- Add tag metadata (private sort order, presentation options, tag description/wiki support)
- Write a repository-client refresh/resync routine to clear out junk data and save space
- Prototype a client api for external scripts/programs to access
- Support streaming file search results (rather than loading them all at once once the whole query is done)
- Increase thumbnail size limit (currently 200x200)
- Add an optional system to record why files are being deleted
- Improve file lookup scripts and add mass auto-lookup
- Cleanup code and improve practises
- Add multiple local file services
- Add an incremental number tagging dialog for thumbnails

I am happy to work on any of these items. If you have questions, please ask, and if you have suggestions for new items, go ahead.
(301.47 KB 300x300 ezgif-4-0302794f5ae3.gif)

GIVE ME MY GENERIC MIME REEEEEEEE
(85.27 KB 1080x1080 d4kwm8b7h9v11.jpg)

>Add an interface for waifu2x and other file converters/processors Please bless us with this. Having this as a base for future duplicate handling improvements or file managing would be great. Godspeed, devman.
better handling of image groups/collections, like manga or the multi image posts you get from pixiv. now you can mark them as alternates, but that is lacking several things:
- displaying groups as one image on pages (like the current collect feature)
- (the most important for me:) handling groups as one entry in all queries. so if i search my db, the groups are automatically displayed as one image. I either get the entire group as a result in a search or none of it. they should count as 1 in system:limit, etc.
- automatically marking downloaded images (eg. from pixiv) as groups in the downloader/parser
- easy way to switch between images of a group in the viewer
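The "groups as one entry in all queries" request above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how hydrus stores or searches files: `collapse_groups`, the `groups` mapping and the file ids are all hypothetical names, and the `limit` argument stands in for the way system:limit would count a whole group as 1.

```python
from typing import Dict, Iterable, List, Optional


def collapse_groups(
    matches: Iterable[str],
    groups: Dict[str, List[str]],
    limit: Optional[int] = None,
) -> List[List[str]]:
    """Return one entry per group or ungrouped file, in first-match order.

    `matches` is assumed to hold unique file ids that satisfied a search.
    `groups` maps a hypothetical group id to its ordered member file ids.
    """
    # Reverse index: file id -> id of the group that contains it.
    group_of = {member: gid for gid, members in groups.items() for member in members}
    seen_groups = set()
    entries: List[List[str]] = []
    for file_id in matches:
        # The limit counts entries, so a whole group only counts as 1.
        if limit is not None and len(entries) >= limit:
            break
        gid = group_of.get(file_id)
        if gid is None:
            entries.append([file_id])  # an ungrouped file is its own entry
        elif gid not in seen_groups:
            seen_groups.add(gid)
            # Emit the entire group, including pages that did not match.
            entries.append(list(groups[gid]))
    return entries
```

Here a search that matched only some pages of a group still returns the whole group as a single entry, which is the "entire group or none of it" behaviour the post asks for.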
>>10429 multi-file lookup scripts WHEN. But more seriously, I think audio is what hydrus is missing most right now.
>>10432 Unfortunately, the db is not ready for this. I will have to do the "Improve duplicate db storage and filter workflow" item first to prep the db for alternate support. I am now editing my OP to reflect this. "Improve duplicate db storage and filter workflow" is now "Improve duplicate db storage and filter workflow (need this first before alternate files support)"
(1.14 MB 237x237 ezgif-4-77df5a6461ec.gif)

ALSO: Use Alternates flag in the dupe filter to set up imageset collections, then allow grouping by Alternates.
>>10435 Fuck me you guys already got to this >>10434 >>10432
- Prototype a client api for external scripts/programs to access (this feature can include this from third party)
- Add an interface for waifu2x and other file converters/processors
- Add more commands to the undo system
- Add tag metadata (private sort order, presentation options, tag description/wiki support)
- Add Ugoira support (including optional mp4/webm conversion)
- Explore a prototype for neural net auto-tagging
- Improve hydrus network encryption with client cert management and associated ui
- Improve the client's local booru (partially this)
- Add CBZ/CBR support (including framework for multi-page format)
- Add text and html support
- Improve 'known urls' searching and management
- Improve duplicate db storage and filter workflow
Add option to delete files after export
also what is not here is a specific tab for general tag management
- view total of your tag
- statistic
- total image per tag
- tag overview
- replace tag/namespace
- delete/rename tag
- tag/namespace chart
>>10430 This means "Add import any file support (giving it 'unknown' mime)". I will add this to the list now.
i also wish to have some feature that can make hydrus possible to replace comicrack/comic manager software. feature that i can think of
- group tag
- better image group
>>10433 Yeah, lookup scripts got left behind during downloader. They were prototype for the new parser system, but so v0.1 ugly behind the scenes that it was a super hassle to fix them up. Would you like me to add a 'clean them up and add auto-lookup for multi files' to the list? I see multi-lookup as being a queue you add files to, and then the client works that queue according to a bandwidth control, like one every ten seconds. It'd be a maintenance time thing.
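The queue idea above (add files, then work the queue under a bandwidth control such as one lookup every ten seconds) could look something like this. It is a minimal sketch under assumptions: `LookupQueue`, `work_once` and the injected `lookup` callable are hypothetical names, not part of hydrus, and a real version would run during maintenance time and make actual network requests.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class LookupQueue:
    """Hypothetical queue that performs at most one lookup per `interval` seconds."""

    def __init__(
        self,
        lookup: Callable[[str], List[str]],
        interval: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup      # assumed: takes a file hash, returns tags
        self._interval = interval  # the "one every ten seconds" control
        self._clock = clock        # injectable so the timing can be tested
        self._pending: Deque[str] = deque()
        self._next_allowed = float("-inf")  # the first lookup may run at once
        self.results: Dict[str, List[str]] = {}

    def add(self, file_hash: str) -> None:
        """Queue a file for a later lookup."""
        self._pending.append(file_hash)

    def work_once(self) -> bool:
        """Run one lookup if the interval has elapsed; return True if one ran."""
        now = self._clock()
        if not self._pending or now < self._next_allowed:
            return False
        file_hash = self._pending.popleft()
        self.results[file_hash] = self._lookup(file_hash)
        self._next_allowed = now + self._interval
        return True
```

A maintenance loop would call `work_once()` whenever it has idle time; calls that arrive too early simply return False and leave the queue untouched.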
(10.37 KB 294x229 vMSg0vI.jpg)

>>10431 I second this
>>10439 This is a small thing. I will add this to my regular to-do. I might be able to get it done in the next few weeks.
>>10440 I hope some of that would be covered under the tag siblings/parents dejanking. I hope in that job to add namespace renaming (like 'all creator: tags now become artist: tags'). Can you think of a good way to say the rest of your items there, maybe "Write a tag statistics review panel"?
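The namespace rename mentioned above ('all creator: tags now become artist: tags') amounts to a prefix rewrite plus de-duplication. The sketch below assumes tags are plain "namespace:subtag" strings held in a list; `rename_namespace` is a hypothetical helper, not an existing hydrus function.

```python
from typing import Iterable, List


def rename_namespace(tags: Iterable[str], old: str, new: str) -> List[str]:
    """Rewrite 'old:subtag' to 'new:subtag', keeping order and dropping duplicates."""
    prefix = old + ":"
    renamed: List[str] = []
    for tag in tags:
        if tag.startswith(prefix):
            tag = new + ":" + tag[len(prefix):]
        # A renamed tag may collide with one that already exists, so skip repeats.
        if tag not in renamed:
            renamed.append(tag)
    return renamed
```

For example, `rename_namespace(["creator:foo", "artist:foo"], "creator", "artist")` collapses both tags into a single `artist:foo`.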
>>10442 I feel the best way forward here is adding CBZ/CBR support. As well as supporting import, this item would include ui updates so the media viewer is aware of multi-page media and has some better way of navigating it. It might also include bookmarking. Better handling for smaller groups, like recolours and other alternates, will first require the "Improve duplicate db storage and filter workflow" task so the db is ready for it.
Speeding up the tagging workflow sounds like one of those things that would pay greater and greater dividends the sooner it gets implemented, though I don't have any recommendations for how the workflow would be sped up. Making an api could have the same effect. Some users could make scripts for other things on this list like an auto-tagging nn while you work on things the api can't address. Then you could swing back around later, grab the user made scripts, clean them up, and implement them directly as features.
>>10429 I think the sooner we get tag metadata support the better, considering just how much of a mess just PTR is. Also, fuck people who put filename: or source: as tags in the PTR.
>>10429 call me an idiot but my list is yuge.
>>10364 Dat Protocol (or just improving IPFS functionality)
>>10323 Moar MIMEs and media support
>>10101 and >>10103 (scraping tweets/tumblr/dA/etc with media)
>>10102 (PTR clean up and standardization)
>>10203 (standard thumbnail sizes)
>>10168 and >>10199 and >>10200 and >>10201 (API support for mobile)
>>10047 (image converter support for Waifu2x and DeepCreamPie)
>>10062 (misc. UI changes and hinting)
>>9927 and >>10367 (HTML parsers for online texts, fanfics, webcomics, and others)
>>9599 and >>9915 and >>9924 (comments in scripts for readability)
>>9397 (WhatAnime support)
>>8971 (manga site support)
>>8961 (EBook support for online libraries)
>>8320 (dupe automation)
>>7596 (image square comments and scraping)
>>7405 (fixing IPFS "nocopy")
>>3722 (IQDB access for multiple sites)
>>3665 (audio and video metadata)
>>9391 (moving from Wx to Qt for better performance/aesthetics)
>>9281 and >>10361 (advanced de-dup algorithms and OpenCV3)
>>10290 (audio fingerprinting and dedup, and possibly audio downloader)
>>10232 and >>10272 (tag fuzzy search)
>>9881 and >>9882 (other P2P or blockchain solutions)
>>9660 and >>9670 (multiprocessing for parsing, multithreading for parallel downloads)
>>7525 (SMB/FTP or multiple client/"proxy" download architecture when you have multiple IPs)
>>10450 (proper documentation on how the code works)
Others:
- Video and audio downloader/scraper and/or pantsu.cat/nyaa.si-compatible import and tagging
- Youtube/"Alt-tubes", Soundcloud/Bandcamp subscription and downloader
- Illustration to Vector support for machine learning based image tagging
- Derpibooru-level sophistication on tag logic e.g. AND/OR/NOT and hotkeys (https://derpibooru.org/search/syntax)
- Using an API to create a Discord/Telegram/Matrix image search/randomization bot
- RSync/Duplicity proof image pack + database sharing for larger bulk "whole booru archive" (with data format)
- Using faster non-SHA256/SHA1/MD5 hashes to search for exact file dupes (BLAKE2, KangarooTwelve, SKEIN)
- Further support for yiff.party, fantia.jp and enty.jp (you know the kind of websites these three are)
- artist gallery/favorites from dA/Pixiv => tag/description pattern => recommended keywords
- artists favorites download from dA pixiv => favorited artists pattern => recommended other artists
- tag cloud and clustering with statistics on relatedness (possible tag parents/siblings)
- Tag translation and dictionary (*booru and dA English tags <=> Pixiv's japanese tags)
- Tag auto-converter for between-booru compatibility (e.g. episodes for derpibooru, extra meta for danbooru)
- A better way to separate the proxydownloader, tagger/user/community and distribution/IPFS servers
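As a rough illustration of the "non-SHA256 hashes for exact file dupes" item in the list above, Python's standard `hashlib` ships BLAKE2, so grouping byte-identical files by a BLAKE2b digest is straightforward. This is a sketch only: `file_digest` and `find_exact_duplicates` are hypothetical names, file contents are passed in as bytes for simplicity, and no claim is made here about speed relative to SHA-256, which depends on the platform and build.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def file_digest(data: bytes) -> str:
    """Return a 256-bit BLAKE2b digest of the file contents as hex."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def find_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose contents are byte-identical (same digest)."""
    by_digest = defaultdict(list)
    for name in sorted(files):  # sorted so the output order is deterministic
        by_digest[file_digest(files[name])].append(name)
    # Only digests shared by two or more files are duplicates.
    return sorted(names for names in by_digest.values() if len(names) > 1)
```

This only catches exact copies; resized or re-encoded images still need the perceptual duplicate search discussed elsewhere in the thread.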
(237.10 KB 640x696 look.webm)

Descending from most wanted:
1. Make the gui NOT completely broken on Linux/Gnome
2. Add support for playing audio for audio and video files
3. >>10431
4. Prototype a client api for external scripts/programs to access (could be combined with 3.)
5. Add popular/favourite tag cloud controls for better 'browsing' search
6. View file metadata on image viewer ( https://github.com/sbraz/pymediainfo )
7. >>10430
I'd say upgrading current hydrus mechanics should be the priority. Stuff like
- Decrease tag sibling/parent jank and improve tag censorship
- Reduce ui jitter and hanging by improving ui-db async code
- Speed up tagging workflow and autocomplete results
- Add Ugoira support (including optional mp4/webm conversion)
- Add import any file support (giving it 'unknown' mime)
- Add support for playing audio for audio and video files
- Add OR file search logic
- Improve duplicate db storage and filter workflow (need this first before alternate files support) please, I just want to group my alternates
- Add more commands to the undo system
Image processing and the like might be a pain in the ass, but it still can be done outside the program, plus visual stuff isn't worth much without the mechanics to back it up.
I am going to ask for what I ask for every time. I require a way to have images display why they were deleted. Off the top of my head, quick buttons for why:
1) Prefer alternate
2) Have higher resolutions
3) Garbage image
4) meme
possibly making them user definable for a quick note.
The autism in me can't leave well enough alone: when I see an image was deleted, I have to know why, and when I see it's a good image, I have to re-import it, making more work for myself. If I got this, I would actually be able to start culling and tagging my entire database, along with being able to actually use the dup detector for things other than dicking around.
As for other things I would like to see: with cbr/cbz, I suggest having more formats than just that to cover them all, along with other things that would make using this for a manga reader a viable option. As it stands, I could not use this because it will rename everything, and while I'm ok with that on single images, for vast archives of manga that is an absolute no go.
I also like
- Reduce ui jitter and hanging by improving ui-db async code
which would likely help me. Also
- Add support for playing audio for audio and video files
or at the very least telling us there is audio with a thumbnail overlay would be nice.
>>10473 >I require a way to have images display why they were deleted. THIS SO HARD, and sadly for the same autistic reasons
A big thing to work on that you might want to consider, is to change some of your bad habits; bad habits you are aware of (because I've seen you mention them before). Being more open with your code and actually using git and GitHub like they were intended and possibly accepting bug reports & pull requests. GitHub isn't just a glorified code-hosting site, and neither is git just a backup solution. You should realize that just because you accept contributions from others, Hydrus doesn't become yours any less. You even do that currently (just not in the form of code) and seem to have no issues with people providing tags, ideas, parsers, test & bug reports to you. I'm not sure where you're hung up with this whole thing, really.
Also consider this comment from the AUR:
> Indeed. To further explain the context for the benefit of others, the developer seems to have no experience or knowledge of OS best practices, general security concerns, or how to properly integrate software into any modern operating system. You can't package it as-is. He is receptive to suggestions and is respectful, but in general I think he is set in his ways. Another thing to point out is how he doesn't really use git- github is just a glorified webhost to him. No granular commits, no PR, no attention paid to issues. Just uploads a week's work in one commit. There are fewer commits than there are releases!
> Because of this scattered haphazard strange development, changes are necessary when trying to integrate hydrus into a package manager. These changes either have to come from upstream, or the package maintainer. Since the dev doesn't use github-issues or accept PRs, if you want upstream changes you have to take your plight to either the imageboard or the discord. Both are filled with younger teens who also don't understand how computers works, and in general your voice will be drowned out by their insistence that the dev shouldn't waste time on things like that. Which is amusing, I don't think they appreciate how rare it is for a side project to consistently update week-by-week. There's plenty of development time to go around.
> To be fair, some of the issues are inherit to targeting Windows as the primary platform.
Also, I realize that you've been doing this for a long time, but preceding any potential contributions from others, you should try to figure out if you want to follow PEP8 (or maybe something slightly less strict, like PEP8 - line length limit). The current way your code is formatted (e.g. very many superfluous newlines that make the code arguably less readable) and things within it are named, is very strange. I am aware that you might say that this is the way you've taught yourself Python and to code over the years, but I don't believe that you've reached the limit of what you want to or can learn. Trying to follow some of the best practices and allowing others to contribute to the beautiful software you've created, would be a huge step in that direction IMHO.
Just my two cents, I hope you realize that I mean absolutely no offense. I'm sure you've been asked this before/someone else requested it, but I really want you to reconsider once again; otherwise, when you eventually might stop working on Hydrus, the fans of your software will probably be stuck with an unmaintainable and confusing codebase. Thanks for Hydrus.
>>10475 To elaborate slightly on this. Something like PyCharm Community Edition (free!) could be used, even if you dislike it, to do much of the work for you when you are actually planning to "make everything nice", or doing huge refactoring. Some of your code tells me that you are not using any linting software or IDE at all to develop Hydrus. It also just really frustrates me that you communicate well with your userbase, listen to them and their ideas, but force everything through either this imageboard or Discord, instead of GitHub – which is specifically _made_ for this. Important insider knowledge about Hydrus, the development process and the more advanced scripting is kept completely transient and temporary on this imageboard, badly indexed and barely searchable, because you seem to refuse to use the appropriate tools. Much of the goodwill and contributions (and potential contributions in whatever form) are filtered through this unapproachable development process facing the userbase of Hydrus, and I just don't understand why.
(763.87 KB 1334x767 2018-11-02_04-02.png)

(73.43 KB 1041x682 Screenshot_20181102_040642.png)

>>10475 >>10476 And to make it clear again, because I really fear that I might be sounding like an absolutely ungrateful douche – I really am thankful for Hydrus and for your continued development and excellent discussion/involvement with the community, and I mean absolutely no offense. What you've done is extremely impressive, and I doubt I could do it as well as you; and even if you ignore everything I said I'll still happily use Hydrus, and so will others of course. In no way do I want this to sound like a discouragement. I even wrote my own little attempt at a Hydrus clone in Python 3/Qt5, and feel that I have a healthy (and more accurate) appreciation of how hard it would be to do on the full scale of Hydrus in actuality, so please understand that I'm not just mindlessly shitting on you.
>>10475 >>10476 >>10477 Here is a basic task for you: Write a standard documentation with pseudocode (and possibly UML) for Hydrus. See: >>10450 The dev said it himself that his personality isn't really that social, it is our job as part of the community to do what is best to help others. Can someone please spin up a lightweight wiki so that we can actually make this work, and finally get to the bottom of feature request documentations? https://tiddlywiki.com/ or https://www.dokuwiki.org/dokuwiki or https://www.bookstackapp.com/ or http://www.xwiki.org/xwiki/bin/view/Main/WebHome would work for this
Has anyone read >>3760 ?
In order of importance:
- Add tag metadata (private sort order, presentation options, tag description/wiki support)
- Speed up tagging workflow and autocomplete results
- Add import any file support (giving it 'unknown' mime)
- Improve duplicate db storage and filter workflow (need this first before alternate files support)
- Add support for playing audio for audio and video files
- Add Ugoira support (including optional mp4/webm conversion)
>>10476 >PyCharm Incredibly off-topic question, but is pycharm good? I see very mixed opinions at my uni and online, but since it's made by jetbrains I assume it's pretty good. What do you think?
>>10479 Me being able to write some code and voice some suggestions and criticisms regarding the development process, does not mean that I have the ability or time to go through a multi-thousand LOC project and create comprehensive UML diagrams and documentation (what do you mean by pseudocode?) for it, especially as someone who is mostly unfamiliar with the codebase. Even if I did have time, I mentioned that it's very hard for me (and probably others) currently to even understand the code, due to the non-standard naming and formatting.
>>10483 It's the best Python IDE around, and I say this as someone who has programmed Python for years. I could get access to the Professional version, but I really prefer the Community edition, because it's mostly limited to functionality that I actually care about.
>>10484 He who names it, shall lead others to do it.
> Multi-thousand
An average homework can easily reach that. Unless we are talking hundreds of thousands, it won't take long to document everything (2 months max, assuming we are doing it every weekend). I will volunteer as well if you can set up a self-hosted wiki: http://awa.shoutwiki.com/wiki/Anti-Wikia_Alliance and http://www.pixelprospector.com/the-big-list-of-free-wiki-farms/ (Wiki Farms). Example: http://www.wikidot.com/ community site or http://wikispot.org/
>>10483 >>10485 We definitely need PyCharm
>>10476 >>10475 He has said before that he is not into collaborative programming or "mob programming", he would like to develop "alone" as he is not really that sociable when dealing with such things. Another concerning thing is that 8chan is currently being DDoSed, and Discord is one of the best avenues for communicating without "code clashes" (Matrix and/or IRC would be more FOSS, but I digress)
>>10486 That's such a dumb argument. I don't want to get into a fight here, but what I suggested is something the developer needs to inherently change about the development process, the codebase and his attitude towards contributions; as it currently stands I'm not even sure if any contributions would even be cared for (existing open pull requests for example have never been merged or even commented on) – and that's exactly the thing that I want the dev to reconsider. Your white knight attitude is nice, but creating wikis and telling people to just do the work, without a guarantee that it's even useful work, is not a good approach, even if that sentiment sounds noble to you.
> An average homework can easily reach that
I've written quite a bit of code for homework (Comp. Sci.) and for years in my personal time, and the "average homework" or project certainly didn't reach multiple thousand lines of code – barely hundreds, if that. I'm studying in Germany, for the record. So either your code is very repetitive and not DRY or you have a completely different curriculum with focus on different subjects. However that's irrelevant, me having time for my university work does not equate to having even more time to do this (considerably larger) amount of work for Hydrus.
> it won't take long to document everything (2 months max, assuming we are doing it every weekend)
That may be, but currently I'm not sure if those contributions would just be ignored, so I'm definitely not wasting my time on the off-chance that the dev might decide to pay attention to any of them. Also see below:
> He has said before that he is not into collaborative programming or "mob programming", he would like to develop "alone" as he is not really that sociable when dealing with such things.
I know, that's why I suggested he would reconsider.
I feel like at this point all I really want is UI improvements and true async. And maybe audio for webms. Also as a quick chime-in on the development side of things - even if the project started going all in on git with PR/issues, I don't think the codebase would actually get many contributions in its current state. The current situation makes it so that you get autists trying to "help" by spamming links to a billion libraries and technologies that are barely relevant, which is a complete waste of time as well. If I was maintaining this I sure wouldn't like all that "we" bullshit - if you want to get something done might as well do it yourself. I think a better way to go would be through an internal clean-up of the codebase first. Maybe bundling that with a Qt port would be a nice way forward.
>>10473 >>10474 Could you not just meta tag these? On delete, wipe the tags and replace with your meta tag reason. It should show up again if you re-import or if you highlight the empty placeholder thumbnail.
>>10459 >1. Make the gui NOT completely broken on Linux/Gnome This and faster autocomplete would make my user experience much better. I had to disable autocomplete because it would sometimes lock up the GUI for several minutes.
>>10493 Open up the log area and see how there are notes on the right hand side. I want to have something user insertable there so I can see why it was deleted. A quick select for reasons on delete would just facilitate a faster general delete. Personally, on delete I would like to see some custom buttons and a text input, so that on delete it will write straight to notes what the hell is going on with the image. This would facilitate fast removal and, when I find the image again, give a quick reason why it was removed.
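One way to picture the "write the reason straight to notes on delete" idea is a small table keyed by file hash. The sketch below uses Python's built-in `sqlite3`; the table name `deletion_reasons`, the helper names and the `QUICK_REASONS` tuple (taken from the quick-button suggestions earlier in the thread) are all assumptions for illustration and say nothing about the actual hydrus schema.

```python
import sqlite3
from typing import Optional

# Quick-select reasons suggested earlier in the thread; meant to be user-editable.
QUICK_REASONS = ("prefer alternate", "have higher resolution", "garbage image", "meme")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and make sure the hypothetical reasons table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deletion_reasons ("
        " file_hash TEXT PRIMARY KEY,"
        " reason TEXT NOT NULL,"
        " deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_deletion(conn: sqlite3.Connection, file_hash: str, reason: str) -> None:
    """Store (or overwrite) the reason a file was deleted."""
    conn.execute(
        "INSERT OR REPLACE INTO deletion_reasons (file_hash, reason) VALUES (?, ?)",
        (file_hash, reason),
    )
    conn.commit()


def why_deleted(conn: sqlite3.Connection, file_hash: str) -> Optional[str]:
    """Return the recorded reason, or None if nothing was recorded."""
    row = conn.execute(
        "SELECT reason FROM deletion_reasons WHERE file_hash = ?", (file_hash,)
    ).fetchone()
    return row[0] if row else None
```

Because the reason is keyed by hash rather than by the file itself, it survives the deletion and can be shown when the same file turns up again on import.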
>>10496 But deleted files are excluded by default…couldn't you just assume you deleted it for a reason?
>>10489
> So either your code is very repetitive and not DRY or you have a completely different curriculum with focus on different subjects.
We do intensives in our country, our code could reach ~5k when it comes to larger projects but never break the 10k mark… and Hydrus is >100k
> That may be, but currently I'm not sure if those contributions would just be ignored
It definitely will as he has stated that he is like Shrek. I don't want to reiterate this.
> I know, that's why I suggested he would reconsider.
He has been steadfast for more than 2 years. How do I know? Because I voiced the same opinions about this two years ago. If you really want change, we must first show him our work is actually useful. Otherwise he will ignore it, and we end up making peripheral software like https://github.com/topics/hydrus
>>10444 Not him but I would like you to fix lookup scripts too. I use them a lot to tag images gotten from HentaiFoundry subscriptions. The big problem with them right now imo is that they don't work with a ton of boorus because they cannot generate a url that those boorus accept. For example danbooru md5 searches require a "md5:" in the url which the current system can't generate, it only supports "md5=". Ideally I would like to be able to select any number of images in Hydrus and run a lookup script on all of them. Even better, have a list of lookup scripts which are run one after another if the previous script is unsuccessful, and maybe a way to combine the results from all of the scripts as well. Please add to your work list.
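The two requests above (URLs that can carry "md5:" as well as "md5=", and a list of scripts tried one after another with optional merging) could be sketched like this. Everything here is hypothetical: the `booru.example` URLs are placeholders rather than real site endpoints, `fetch` stands in for whatever downloads and parses a page, and none of these names exist in hydrus.

```python
from typing import Callable, List, Sequence, Tuple

# Placeholder templates; real sites may expect different paths or parameters.
TEMPLATES = {
    "equals_style": "https://booru.example/search?md5={hash}",
    "colon_style": "https://booru.example/posts?tags=md5:{hash}",
}

# A lookup script here is just (url template, function that fetches tags for a url).
LookupScript = Tuple[str, Callable[[str], List[str]]]


def build_url(template: str, file_hash: str) -> str:
    """Fill the hash into a template, so 'md5:' and 'md5=' styles both work."""
    return template.format(hash=file_hash)


def run_lookup_chain(
    file_hash: str, scripts: Sequence[LookupScript], combine: bool = False
) -> List[str]:
    """Try scripts in order; stop at the first hit unless `combine` is set."""
    merged: List[str] = []
    for template, fetch in scripts:
        tags = fetch(build_url(template, file_hash))
        for tag in tags:
            if tag not in merged:  # keep order, drop duplicates across scripts
                merged.append(tag)
        if tags and not combine:
            break  # the previous script was successful, so skip the rest
    return merged
```

With `combine=False` this behaves as a fallback chain; with `combine=True` every script runs and the tag lists are merged.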
>>10500 And also sha1 if you want it
>>10497 >just assume That's not how my autism works.
>>10429 Is there a netcode free version yet?
>>10499 Then I'm out. I'm not wasting 2 months of weekends to write documentation that might, maybe, if the stars are aligned well, will be considered a contribution. Fuck that. And exactly this is the attitude I'm trying to prevent in others. You can't reasonably expect people to invest a shitload of time into something on the off-chance that their efforts might not be completely ignored. If you have enough time to waste for that, feel free to. Not me.
>>10508
> I'm not wasting 2 months of weekends to write documentation that might, maybe, if the stars are aligned well, will be considered a contribution
Then don't ask in the first place. Learn to recruit and actually do, not just say something and then not do it at all. I don't mind working alongside you, but we need to put effort into this.
Gentlemen if you don't like the way Hydrus Dev operates then fork it and accept community pull requests. There's no reason he should go back documenting his code and playing coordinator if he doesn't want to work with other people. Going back to the subject at hand a neural network autotagging feature sounds like a fun project. I've had great success in the past tying illustration2vec into Hydrus but the pre-trained models available were lacking accuracy for non-character tags. A reverse autotagging to petition wrong tags would be neat too.
>>10506 Yeah, I can understand that, but what other reason would you delete something than to have a good reason for doing so?
>>10474 >>10473 Thanks again. I am adding this to the list as "Add an optional system to record why files are being deleted".
>>10476 >>10475 >>10477 Thanks, I appreciate your posts. No worries. While I'd like to be better with my programming style and competence and openness, and I am in general trying to push in that direction, I am not totally sure how to accelerate my A to B on it. I've never been in github/open source community, and most of their way of doing things completely turns me off. I also know full well that I work awfully with others, and if I were to try diving back into that, there would be a drama bomb and hydrus would be abandoned. This is a pattern my life has repeated over and over. I don't know how to un-sperg myself, basically. I am thankful for my current weekly output and don't feel enthused about jumping back into some professional realm that I don't have much love for and I know has a decent chance of fucking it all up. I'd love to take six months off and clean up a lot of the older code to what I know works better now, but I'd rather do other work. If you would like, I can add an option to this list for something like "Cleanup code and improve practises", which would include improving my barebones unit tests and so on. I am open to ideas, if you have more. I am trying to be better, but have not found that much works for me.
>>10458 Thank you. There are many things here, and they are mostly quite technical. Could you select maybe three that you would like the most, and if you consider them to be already on the list or not?
>>10481 >>10480 I have just skimmed it again now. If this is your thread–or if you just like the ideas–could you do the same as I have said in >>10524 ? Can you pick three favourites and write them in a clear english line and say if you think they are on the list or not?
>>10500 >>10505 Thank you. I am adding this to the list now as "Improve file lookup scripts and add mass auto-lookup".
>>10524 >>10525 I will give my top seven

Ebook and Comic support
- pdf/epub/mobi/djvu/chm MIME, maybe Office-related MIME
- https://github.com/adolfosilva/libgen.py for a start, maybe https://github.com/evilhero/mylar and https://github.com/Xonshiz/comic-dl

Text and HTML support
- https://github.com/JimmXinu/FanFicFare (both ebook and html form) and maybe some more in >>9927
- standard wget or curl for direct web archiving, because why not?
- HTML and CSS combined/separate option (maybe useful for similar themes)
- Twitter Tweets, Tumblr post, *chan posts etc. support as text/HTML downloader scripts

Fuzzy searching for tags, images and music
- >>10232 and >>10272 (tag fuzzy search using phonetic hashing)
- >>10290 (audio fingerprinting and deduplication similar to audio player/managers)
- >>8320 (dupe automation, but with better algorithms to detect JPEG quality)
- >>9281 and >>10361 (advanced image de-dup algorithms and OpenCV3)
- Further IQDB coverage with other sites not included in the official site e.g. e621

Better tag management
- tag cloud and clustering with statistics on relatedness (possible tag parents/siblings)
- Tag translation and dictionary (*booru and dA English tags <=> Pixiv's japanese tags)
- Tag auto-converter for between-booru compatibility (e.g. episodes for derpibooru, extra meta for danbooru)
- Derpibooru-level sophistication on tag logic e.g. AND/OR/NOT and hotkeys (https://derpibooru.org/search/syntax)

Multi-processing, multithreading and multiple desktops
- Allowing parallel downloads from different sites or servers to waste less time
- Assuming someone has multiple desktops and IPs, create a standard protocol to delegate slave IPs to download certain websites to offset load or obfuscate traffic, and send the scraped URL, tags and files back to the master IP
- Possibly create a standard export format using USB for moving files from slave to master in case connection is down

Video and audio support (not much to ask using youtube-dl)
- >>3665 (audio and video metadata)
- Youtube/"Alt-tubes", Soundcloud/Bandcamp subscription and downloader (Bonus: Torrent2Hydrus)
- More MIMEs for different formats (since they all play in VLC/MPV)

API building
- >>10203 (standard thumbnail sizes, requested by some Mobile UI devs)
- >>10168 and >>10199 and >>10200 and >>10201 (API support for mobile)
- Better IPFS >>7405 (fixing IPFS "nocopy") or Dat Protocol with >>10364

Others (ML related)
- Illustration to Vector support for machine learning based image tagging
- >>10047 (image converter support for Waifu2x and DeepCreamPie)

Others (download related)
- MOAR manga sites >>8971
- Further support for yiff.party, fantia.jp and enty.jp (you know the kind of websites these three are)

Others (discovery related)
- artist gallery/favorites from dA/Pixiv => tag/description pattern => recommended keywords
- artists favorites download from dA pixiv => favorited artists pattern => recommended other artists

Others (pet peeves)
- >>10102 (PTR clean up and standardization)
- >>10062 (misc. UI changes and hinting)
- >>7596 (image square comments and scraping)
- >>10450 (proper documentation on how the code works)
>>10510 I think this is what I am most excited about myself. I hold a small candle for increasingly clever machines, even though they aren't amazing yet. The big tech companies, software libraries, and GPU drivers are all gearing up for this tech, so I figure it is about time. I am also enthusiastic about the large tag collection we have generated. We have one of the largest combined meme/elf_tiddy databases ever assembled, so I think we can do some interesting training.
>>10497 No, my mind works this way:
I deleted something? Why was it deleted?
Open link in new window.
Holy shit, why in the fuck did I ever delete that? Redownload.
Or, in a more realistic sense, given what has been deleted so far:
I see it's deleted, I check it.
Oh, it's a meme of low quality.
And I piss away the time it takes to check it.
Or I see a good 20 images are deleted. What could this have been?
Open links.
OH FUCK YOU, it's that god damn baby fur sonic thing that some asshole decided would be funny to post here.
Or currently: I went through 300 images in the dup detector and about 1000 in the program before I found out this was going to be an issue for me, so I have about 1300 images that are good enough to be in the archive, that are not in the archive, that I stumble on every now and then. Because of the 1000 I got rid of before I knew it was a problem for me, I have no idea if they were duplicates or if they were mistakes.
OK, I went away for a bit and came back, so my train of thought has left me behind. The point is, the current way hydrus works makes me second-guess why things were deleted. Also, if I bring a duplicate back in, it will never show that duplicate in a filter again, as it's considered a known pair. Notes like this will be helpful for more than one application, but mine is at the very least most relevant to my use case.
>- Reduce ui jitter and hanging by improving ui-db async code
Voting for this in the hope that it might make Fatal IO Error 11 go away, or at least become less common.
>- Add Ugoira support (including optional mp4/webm conversion)
>- Add import any file support (giving it 'unknown' mime)
Well, I was the guy pushing for native ugoira support, so of course this needs to be here. If ugoira gets low priority or takes a while to implement, it would be nice to get unknown mime support asap, so that we can at least hoard ugoiras already, in case an artist's autism kicks in and he deletes all his pixiv posts. Unknown mime would also be great for all the psd files I have.
>>10429 >- Increase thumbnail size limit (currently 200x200)
You could let the user specify thumbnail size, which would also eliminate the need for 2 thumbnails per file. Generally, UI improvements will benefit everyone, so I think the focus should be on that first. Anything that improves the usability of the program, like tag sorting, shortcut/interface customization, boundboxing, undos, etc., should be near the top of the list. Audio, CBZs, pdfs and whatnot already have very good programs, so these should be low priority. Same goes for general file support. Focus on making Hydrus better at what it was intended for: image management and viewing. The rest can come later.
>- Reduce ui jitter and hanging by improving ui-db async code
I have to keep the PTR sync paused, or else, when it starts, Hydrus stops responding, even for days. While it hangs it keeps reading and writing to disk, as shown in the task manager, but with the ui frozen I have no way to stop the sync safely and I'm forced to terminate the Hydrus process, thus losing whatever progress it might have made. Once the ui no longer freezes, looking at >>10102 (PTR clean up and standardization), as already mentioned a few times, would be nice.
>- Explore a prototype for neural net auto-tagging
>- Add an interface for waifu2x and other file converters/processors
I would also like to see an option that the user can check that lets Hydrus automatically convert and/or compress archived images (I'm only talking lossless here), like passing all png images through pngoptimizer (http://psydk.org/pngoptimizer).
OR searching, pretty please with a cherry on top. Building custom queries with OR would drastically enhance the way I can review my collection. Overall, what it would let me do is build queries based on more subjective criteria than tags alone can offer. I could make a search for artists who do a particularly cute moe style, or I could come up with a group of hot redhead anime characters, which I've always wanted to do. If you added OR searching, it would definitely become the number one thing I did with Hydrus. This is my dream.
>>10429 Almost forgot to ask for FLIF or similar MIME support. Who needs thumbnails when partially decoding the image gets you a thumbnail quality version?
It does about everything I want currently, after editing some of the scripts to support my autism (thanks for the help in email). For me the only major extra things I'd love are:
- Add Ugoira support (including optional mp4/webm conversion) [because I save way too much shitty art, and there's countless ugoiras not preconverted on a booru]
- Add OR file search logic [i.e. search mystery character with blonde hair, or perhaps it's tagged as light brown or even brown/orange]
>>10528 >>9142 here, I'm still willing to build a library / API for i2v if it would help. I can also help design a system that would allow for training custom models, which is a much more involved problem, but would probably produce better results. Again, no pressure if you're not ready.
I am renaming the "Reduce ui jitter and hanging by improving ui-db async code" to "Reduce crashes and ui jitter and hanging by improving ui-db async code" to represent the linux stability side of this job. I am renaming "Decrease tag sibling/parent jank and improve tag censorship" to "Improve tag siblings/parents and tag censorship" to represent general sibling/parent improvements beyond making it less shit. I am adding "Cleanup code and improve programming practises".
>>10532 As an aside, as zips are supported in a limited way, ugoiras that have straight zip URLs, like how I think the default danbooru downloader gets them, are just imported as blind zips right now. Part of adding ugoira will be writing some further file parsing to do 'does this zip look like an ugoira m8?' and then reparsing all zips in the db retroactively.
>>10539 For now, I recommend you go options->maintenance and processing and turn off idle time work completely. Only do big work on shutdown, and limit it to, say, 10 minutes. This will make repo processing much more manageable. Yeah, I'd love to have (optional) png optimisation. In the coming years, as the client slowly moves away from hashes being important, these local/personal improvement tools will make more and more sense.
>>10541 I can add FLIF in easy short work as soon as PIL or OpenCV add support, which I don't think they have done yet. Or, if someone can point me to a good, non-meme pypi FLIF library that can do some version of GetResolutionAndOtherMetashit( path ) and numpy_array = GetRGBPixels( image ). As long as someone else does the decoding work, it is only about twenty lines of work on my end.
>>10544 Thank you. We'll see how this vote shakes out, but either way, if you are still keen to do work on something like this, I'd love to outsource the expert pain-in-the-ass part so I can focus on building a workflow in the hydrus ui. I'm still a sperg about collaborating, but any sort of library that made parts of this easy-peasy would be very welcome. I guess we are probably talking two(?) components: 1) Given a model, what tags are suggested for this image? 2) Given tagged images and maybe some human interaction, how to make a model? Although I presume we are also talking about some shared interface layer and whatever else is needed. Since we already have i2v model, if you made a library that did the grunt work of 1, I could probably integrate that into a new column in the tag suggestions stuff in regular weekly work. 2 would need to be in 'big work' and more emails/posts back and forth to figure out what workflow and calls the library would need. I don't know much about this, so any thoughts you have on making this stuff real are welcome.
>>10548 >[…] just imported as blind zips right now. Can you please enable ugoira download for the pixiv downloader? That way we could already start hoarding properly. All the current ugoiras have the animation.json file, which starts with the key "ugokuIllustData", so that would be the 100% accurate way for recognition. Though some people might have older ugoiras that only have the 6-digit numbered jpgs or pngs in the zip file. I guess those might be confused with zip files containing comics, so it might be good to have a way to manually change the handling (animation/book) for those. Since old ugoiras don't have any frame duration information included in the zip, being able to set the frame rate manually would be good in that case.
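A rough Python sketch of the recognition idea above, for illustration only. It assumes "ugokuIllustData" is a top-level key in animation.json and that old-style frames are named with exactly six digits plus .jpg or .png, as the post describes; the function name and the three return labels are made up here and are not anything hydrus uses.

```python
import io
import json
import re
import zipfile

# Old-style frame names: six digits plus .jpg or .png (assumed from the post).
FRAME_NAME = re.compile(r"^\d{6}\.(jpg|png)$")


def classify_zip(zip_bytes):
    """Return 'ugoira', 'maybe_ugoira', or 'zip' for the given zip data."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        names = z.namelist()
        if "animation.json" in names:
            try:
                meta = json.loads(z.read("animation.json").decode("utf-8"))
            except ValueError:
                # Not valid UTF-8 or not valid JSON: fall through to the name check.
                meta = None
            if isinstance(meta, dict) and "ugokuIllustData" in meta:
                return "ugoira"
        # No usable metadata: numbered frames could be an old ugoira or a comic,
        # so this case is only flagged for manual review.
        if names and all(FRAME_NAME.match(n) for n in names):
            return "maybe_ugoira"
    return "zip"
```

The 'maybe_ugoira' label is where the manual animation/book override and manual frame rate mentioned above would come in.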
>>10552 This is a tough one. I have just looked at the problem again. The link we are currently using to get pixiv metadata is this: https://www.pixiv.net/touch/ajax/illust/details?illust_id=71528360 (for page https://www.pixiv.net/member_illust.php?mode=medium&illust_id=71528360, which is a recent post) It provides a JSON-less zip: https://i.pximg.net/img-zip-ugoira/img/2018/11/06/05/48/45/71528360_ugoira600x600.zip With the frame timings embedded in the API JSON. My new downloader isn't clever enough to synthesise new files from multiple sources of data, so grabbing the zip and inserting some frame timing json up would require a more significant add-on, which I would expect to write in adding ugoira support. It isn't something I can do quick. As it happens, I was looking at how danbooru do ugoiras, and the couple ugoira zips I downloaded from them didn't have frame timing JSON in the zip either. I wonder if they are just pulling the zip file and using some flat 25ms or something for their webm conversion? Am I talking rubbish here? Do some pixiv zip links have the animation.json in them, and I just missed them? Do pixiv ugoira pages link to different zips anywhere, and the API is just using different stuff?
>>10560 hey goy. is there support for choosing UI font and fontsize? if not then will you add? ty
>>10551 Alright, I'll try to infodump what each of those would require. i2v is a multilabel classifier – you can give it an image and it will give you the confidence for a bunch of tags (1539 of them, examples 'yuu-gi-ou zexal', 'tokyo ghoul', 'kin-iro mosaic', 'safe') The other kind of model is a binary classifier – it only gives you one tag at a time. Either way, you feed it an image and get back a number from 0 to 1 for each tag, and you get to decide what's the cutoff. The model itself is stored in a large-ish file for the weights. For example, the weight file for i2v is 180 MB and doesn't compress much. This isn't tiny, but it's on the small side compared to some more powerful models. Loading the model takes about 0.8s on my machine, classifying one image takes about 0.33s. The steps to build a model from scratch are: >Decide on the architecture This includes describing the various layers, and deciding how many tags you want to look for. >Gather training data The amount of data you need depends on how "simple" the tag you want to find is, and how similar are images with / without the tag. A few hundred images is probably enough to train some easy tags, a few thousand should be able to handle harder ones. >Run training This involves letting your computer run full blast for a bit while it does a bunch of linear algebra on the images. GPUs make this much faster. It depends on the amount of data we use, but I'd expect most models worth training to take an hour of GPU, or maybe 10 hours of CPU (very rough estimate). There are tricks you can do to let everyone help out training a single massive model, but that's a technical and logistical nightmare. There's a trick you can do called "transfer learning" which lets you piggyback off a model you already have. It might be possible to use this to add tags to i2v that aren't in the basic list. 
This would produce a small model (that still requires the larger one to work) and would take less time to train, but it's limited to things that are similar to what i2v was trained on originally.
>>10551 For case 1, I've got a pretty basic file that runs i2v. Loading code, tag list, weight file are at >https://github.com/antonpaquin/Hydrus-Autotagging/blob/master/illust2vec-flask/illust2vec.py >https://github.com/antonpaquin/Hydrus-Autotagging/blob/master/illust2vec-flask/tag_list.json >https://github.com/antonpaquin/Hydrus-Autotagging/releases/download/0.1/illust2vec.h5 This will take a PIL image in and give you a dict of {"tag": score} out. This is probably enough to power the first component, and you can probably reverse engineer enough to not have to use my code at all. One possible way to handle case 2: I could build a thing that takes N images with a tag, and N images without the tag, and builds a classifier for that. There's a lot of potential for change here, but I think that's the simplest form.
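For the first component, the "you get to decide the cutoff" step could look something like the Python sketch below. It assumes only what is said above, namely that the classifier hands back a dict of {"tag": score} with scores from 0 to 1. The function name, the default cutoff of 0.5 and the scores in the usage line are made up for illustration.

```python
def suggest_tags(scores, cutoff=0.5, limit=None):
    """Return (tag, score) pairs with score >= cutoff, highest score first.

    `scores` is a dict of {"tag": score} as described above.
    `cutoff` and `limit` are hypothetical knobs a user could tune.
    """
    hits = [(tag, score) for tag, score in scores.items() if score >= cutoff]
    # Sort by descending score, then by tag name so ties are stable.
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits if limit is None else hits[:limit]


# Made-up scores, only to show the call shape.
example = {"safe": 0.97, "tokyo ghoul": 0.12, "kin-iro mosaic": 0.64}
print(suggest_tags(example, cutoff=0.5))
```

A binary classifier fits the same shape: it would just return a dict with a single tag in it.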
>>10429 Maybe an overhaul of the tutorials in the help section on the hydrusnetwork site would be my only request. There's a lot to learn about the various features in hydrus that just isn't there at the moment. I don't know, maybe let other people contribute their own tutorials if you're too busy and all.
>>10560 That's a bummer. I used the Px Downloader add-on to download the ones that include the json: https://rndomhack.com/2016/01/15/px-downloader/ I wonder if it actually re-packs them? I made sure to disable ugoira conversion in its settings, which is why I was sure it wouldn't change the original file. I will dig around some more and see if I can find more info.
>- Add import any file support (giving it 'unknown' mime) Absence of this (and the ability to store original file name) is the main reason why I haven't considered moving to Hydrus just yet.
>>10569 you do know that you can add any arbitrary namespace right? so filename:<name> is possible. people are doing this.
Improve the client's local booru, at least tag search
Prototype a client api for external scripts/programs to access
>>10457 Is there any way to keep tags out of the PTR if you're using it? Or is there any way to make sure you aren't committing to it?
>>10572 you have to set up the ptr if you want it, hydrus does not come with it preinstalled, so just don't install it. Also if you do set it up you have to approve tag uploads, so you can be sure your tags stay your own.
>>10541 I keep all my thumbs on an nvme ssd along with the database. I'd much rather have it this way than have the hdd getting hammered looking for a few hundred images to generate thumbs for. That said, if flif compresses the thumbs better than jpeg, that would be greatly appreciated.
——————-
OK hdev, I remembered something that was brought up a while ago that needs to be improved. The duplicate detector has to have either a mode or a setting that lets you see already known pairs. One problem we discovered a while back was that if you import a duplicate, run the dup detector, and then it somehow gets re-imported, you will never be told the duplicate is back.
Then there is also something I asked about a while ago with duplicates, one being a 'contender mode' and one being 'prefer alternate'.
Contender mode is simple. You have an image, and you have determined it is the better of two images. This one and all its other potential dups get taken out of normal duplicate processing and pushed to a second one, because you now have a known better image. This way you could quick-filter all the contender images, getting rid of all the images that are lower resolution or file size, and only needing to go through the ones that are potentially better rather than all potential candidates. I'm not so much thinking this will be used in the first go-around with dup processing, but in every subsequent one it's a good bet that you will use it to weed out the junk.
Now, the reason a filter in and of itself is not good is simply that the resolution or file size of an unknown image is not a good way to determine if the file is good. Some faggot on 4chan hated a thread, so he bloated every image out and made them fuzzy and undesirable for weeks/months trying to kill the thread, and his work comes up time and time again in dups for images. Just looking at file size or resolution would save that shit and remove the good image.
Contender mode would get rid of that because there is already a known good version of the image, and you are only looking at higher resolution or higher file size versions.
And finally, 'prefer alternate'. It doesn't need to be a mapped choice, it just needs to be a choice. I have several artists I like who decided to make 20 10mb images that are all the same with just small changes, and of them I may want to keep 2 or 3 images, so a 'prefer alternate' option would allow me to mark one for deletion while knowing I don't have it, but I do have a similar one I liked more. This is kind of a useless thing for most people, but it would be helpful for me along with the "Add an optional system to record why files are being deleted" system.
>>10541 Didn't FLIF die or something? I think they stopped working on it.
>>10592 No, the faggots are at gitter talking about how to get it to the mainstream when BPG/HEIF beat them fair and square for compatibility.
That moment when you read the help about hydrus being able to control and manage files without importing, but still don't understand a word of it. How do you do this? I mean, how do you tell hydrus to manage those files WITHOUT importing? Also, how can you make hydrus set up a subscription or follow a tag on imageboards in general, to automatically import new images from the internet?
>>10596 >[…]BPG/HEIF beat them fair and square for compatibility. Maybe compatibility for hardware decoding, but not software compatibility, which is far more important. No big websites will want to support these formats, because they're containers for HEVC intra frames, which is a licensing/patent nightmare. Google developed their own codecs to avoid patent fees, so they will probably not want these formats supported in Chrome. Same for Mozilla, who are pushing for the AV1 codec, so they would probably add support for AVIF long before they cave in to support HEVC based image formats. FLIF meanwhile doesn't require licensing and doesn't appear to cause any patent conflicts so far. So I wouldn't call it dead yet.
>>10561 At the moment, it should pull whatever your OS defaults are, I think for both font and size. I don't set anything specifically atm, afaik. I am not a big fan of themes and making things pretty (as you can probably tell!), so I struggle to revisit ui to neaten it up once I get bare functionality going. I am not against the idea of adding font customisation, but I think I would have to do a bunch of code and ui cleanup first.
>>10601 > FLIF meanwhile doesn't require licensing and doesn't appear to cause any patent conflicts so far.
This time, not licensing will make it lose its competitive edge. Soon HEVC will be standard, and with Google forcing WebP and Firefox forcing APNG like they normally do… FLIF is toast.
Requesting coverage for most sites in https://theporndude.com/hentai-porn-sites (some sites require JS like Hitomi)
>>10601 >>10603 For reference remember #gitter_FLIF-hub=2FFLIF:matrix.org
>>10562 Thank you, this is great. I've copied it into my ML masterjob. >>10565 Yeah, if other people would like to write their own tutorials for anything in text or html, I am very happy to link to it or host it on the github.
>>10572 Not really. I'd like to add some tag filters to exclude bad tags at the server end ('banned artist' and 'url:' garbage) and allow for 'I only want 'creator:' tags' at the client end. And some repo recycling/cleaning to clear out some of the cluttered master records and reduce dbshit.
Could you make python3 a choice? just for the python features and the poor weird path people?
>>10575 Thank you, this is interesting. There's a lot I would like to do with the duplicate filter and system generally, especially ui to show/browse/review found duplicate relationships. First I will have to clean up the db side of things. I want to move from the current pair-hell to ordered groups that will allow for neater 'this is the best one' actions. This new structure will also work for siblings and parents, btw, which is not a dissimilar problem to deal with.
>>10598 You cannot manage files without importing. Sorry for my bad wording! My new downloader help is here: https://hydrusnetwork.github.io/hydrus/help/getting_started_downloading.html The subscription help is out of date, but I hope to improve it in the coming weeks. If you can't understand what I've written, let me know and I'll see if I can reword it. Feel free to email me or grab me on the discord if you want to work one on one.
>>10608 I hope to convert to python3 over this holiday. I will stop putting out releases starting on the 12th December and hope to have it done in four weeks. I will start working on the result of this thread's poll first thing in the new year.
I am adding "Add multiple local file services" to the list.
>>10566 >>10560 Okay, it seems very likely that PX Downloader re-packs the zip. It seems to be closed source, so I couldn't confirm, but I had a close look at what pixiv fetches when playing ugoira, and a file with an included animation.json doesn't exist there. The URL you get from that meta data doesn't get you the high quality version. I found that there is a second .json specifically for ugoira meta data.
Example ugoira: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=48731415
Ugoira meta: https://www.pixiv.net/ajax/illust/48731415/ugoira_meta
This one lists 2 URLs, "src" and "originalSrc", to get the bigger version.
Though of course my findings don't make archiving these easier. What would be the best way to preserve the originals? I asked for preserving the original files because I assumed that pixiv fixed their format to include meta data in the zip, but that's not the case after all. If we re-pack the zip file to include meta data, we get the problem of changed hashes that I wanted to prevent by archiving originals.
One idea I had was to mux jpgs into an mkv file as mjpg. That way the frame timings can be saved and the images are not re-encoded.
ffmpeg -framerate 30 -i %06d.jpg -codec copy mjpg.mkv
mkvmerge -o ugoira.mkv -d 0 -A --timestamps "0:timestamps.txt" mjpg.mkv
timestamps.txt would contain time stamps for each frame as the absolute time elapsed BEFORE each frame, while ugoira uses a relative pause AFTER each frame. E.g. these ugoira timings:
>{"file":"000001.jpg","delay":30},
>{"file":"000002.jpg","delay":30},
>{"file":"000003.jpg","delay":30}
would become these mkv timestamps:
># timestamp format v2
>0
>30
>60
I made a proof of concept python 3 script that converts all frames in a folder to an mkv file with correct variable frame rate. All frames need to be unpacked and the "ugoira_meta.json" needs to be saved to the same folder, because the script generates the timestamp file from that.
It won't let me attach the script, so I put it here: https://pastebin.com/kdaH6CqE It won't let me attach the sample mkv file either, so I uploaded it here: http://tstorage.info/1rqrc4o43gqu For identifying ugoira files, an idea I found was to use ffmpeg to generate frame hashes for the individual jpgs: ffmpeg -i %06d.jpg -f framemd5 - These hashes actually stay the same even if the container format changes. The jpgs in the original zip file and the muxed mkv mjpg frames will have identical hashes. What do you think about this solution? This way we could get proper video files without re-encoding anything, and get consistent hashes to identify files.
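The delay-to-timestamp conversion described above is just a running sum. The Python sketch below shows only that step. It is not the pastebin script, and it assumes the frame list has already been pulled out of the meta JSON in the {"file": ..., "delay": ...} shape quoted in the post; both function names are made up for illustration.

```python
def delays_to_timestamps(frames):
    """Turn per-frame 'delay' values (pause AFTER each frame) into
    absolute start times (time elapsed BEFORE each frame)."""
    timestamps = []
    elapsed = 0
    for frame in frames:
        timestamps.append(elapsed)
        elapsed += frame["delay"]
    return timestamps


def timestamp_file_text(frames):
    """Build the text of a timestamps.txt in the 'timestamp format v2' layout."""
    lines = ["# timestamp format v2"]
    lines.extend(str(t) for t in delays_to_timestamps(frames))
    return "\n".join(lines) + "\n"


frames = [
    {"file": "000001.jpg", "delay": 30},
    {"file": "000002.jpg", "delay": 30},
    {"file": "000003.jpg", "delay": 30},
]
print(timestamp_file_text(frames))
```

Note that the delay of the last frame never shows up in the timestamp list, since nothing starts after it.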
I used to want to be able to add any generic file the most, but after using hydrus daily, what I want the most now is to be able to force-check a picture that failed to import properly with all the tags it should have. Automatic tagging is promising, but ultimately unnecessary for me since I just scrape files most of the time. If anything, I'd use autotagging as a backup system for a failed tag import.
The ability to tag images with consecutive numbers outside of the import files dialog. It would make tagging comics/doujinshi downloaded using the downloaders/watchers much, much, easier.

>>10611 Nice. >>10429 >Add an interface for waifu2x and other file converters/processors Would it be possible to work with offline versions as well? I installed waifu2x on my machine so that I wouldn't have to rely on an internet connection. t. linox
I am likely to make the poll today, with the release. I may unsticky and lock this thread to move convo over there, but I am not sure.
>>10622 That mkv jpg solution looks great! Thank you for figuring out the variable frame rate stuff and putting a script together. I have copied this to my ugoira notes for when I get to this.
>>10623 Let me know if I misunderstand here, but you can probably do this now by running the problem file through a program like HxD to figure out its sha256 hash and then searching in hydrus in 'all known files'/'public tag repo' search domain using system:hash=abcd… . That said, if a file cannot import to hydrus, it likely doesn't have any tags in hydrus, or do you mean like 'what tags it has on the site I meant to get it from'? In either case, I'd be interested in examples of files that look fine but won't import. Please feel free to submit the files themselves or URLs to them!
>>10639 Thanks, I put this on my 'see if you can sneak this in' list a little while ago, and it just didn't happen. I am adding it to the list here as "Add an incremental number tagging dialog for thumbnails".
>>10648 I greatly prefer doing transformations like this with our own CPU/GPU cycles, so I would likely start such a system by talking to local executables and then extend it to work with http POST queries depending on demand.
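For the sha256 lookup suggested to >>10623 above, any tool that prints a sha256 will do in place of HxD. A minimal Python sketch, with a made-up function name, that hashes a file in chunks so large files do not need to fit in memory:

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the sha256 of the file at `path` as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is what would go after system:hash= in the search described above.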
>>10650 The poll is up! Please go >>10654 to vote!

