/hydrus/ - Hydrus software optimization thread

/hydrus/ - Hydrus Network

Archive for bug reports, feature requests, and other discussion for the hydrus network.

Hydrus software optimization thread Anonymous 06/07/2018 (Thu) 02:07:57 Id: 1e8781 No. 9068

ITT: create proposals for making Hydrus more optimized.
Proposal: Why can't Hydrus switch to MariaDB? If it is faster, then it should be better. The only trouble is having the need to rewrite the queries, which from an SQL standpoint should be a non-issue, right?
List of Databases with Open Source License and Open Source APIs:
SQLite - Currently used in Hydrus, has minimal features
MySQL - A more well-rounded SQL Database with user management
PostgreSQL - An SQL with complex features with less performance
MariaDB - SQL/NoSQL database with heavy optimizations
ElasticSearch - A literal search engine instead of a normal Database
Teradata - IDK
https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
https://www.infoworld.com/article/2611812/mysql/mysql-face-off--mysql-or-mariadb-.html

Anonymous 07/14/2018 (Sat) 04:12:21 Id: 02d3aa No. 9391

>>9348 That is the issue, the dev is trying to migrate from wxPython to PyQt after the downloader overhaul, along with other key functions like parallel downloads, workflow management and mobile integration.

Anonymous 07/23/2018 (Mon) 08:41:35 Id: be1efa No. 9464

Bumping

Anonymous 08/01/2018 (Wed) 12:27:06 Id: d8220f No. 9530

>>9464 Yes but why?

Anonymous 08/12/2018 (Sun) 05:43:03 Id: cfc291 No. 9658

As mentioned by >>9094 the bottleneck is mostly how I/O and CPU are handled by hydrus. Imports are done sequentially when they could be sped up a lot by using multiprocessing. I'm sure other actions are still done sequentially too. A transition to a graph database like ArangoDB could be better in the long run, but that's never going to happen. Looking at the client.master.db database, I'm not sure why he added an index to the md5, sha1 and sha512 columns but not to the subtag or namespace columns. Doesn't make sense to me (and is the sha512 index really necessary?). Also it boggles my mind that foreign keys aren't being used at all.
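For anyone who wants to check the index and foreign-key point on a toy database, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (namespaces, subtags, tags) are a simplified, hypothetical layout, not Hydrus's actual client.master.db schema, and the index name is made up for the example.

```python
import sqlite3

# Hypothetical, simplified tag schema; NOT Hydrus's real layout.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
con.executescript("""
    CREATE TABLE namespaces (namespace_id INTEGER PRIMARY KEY, namespace TEXT);
    CREATE TABLE subtags (subtag_id INTEGER PRIMARY KEY, subtag TEXT);
    CREATE TABLE tags (
        tag_id INTEGER PRIMARY KEY,
        namespace_id INTEGER NOT NULL REFERENCES namespaces(namespace_id),
        subtag_id INTEGER NOT NULL REFERENCES subtags(subtag_id)
    );
    CREATE INDEX subtags_subtag_idx ON subtags (subtag);
""")


def query_plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


def insert_orphan_tag():
    """Try to insert a tag pointing at ids that do not exist."""
    try:
        con.execute("INSERT INTO tags (namespace_id, subtag_id) VALUES (999, 999)")
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key constraint
    return True


plan = query_plan("SELECT subtag_id FROM subtags WHERE subtag = ?", ("some tag",))
print(plan)
print("orphan row accepted:", insert_orphan_tag())
```

With the index in place the plan names subtags_subtag_idx instead of scanning the whole table. Note that a column declared UNIQUE already gets an automatic index in SQLite, so an explicit CREATE INDEX only matters for columns without that constraint.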

Anonymous 08/12/2018 (Sun) 06:28:15 Id: bd599a No. 9659

>>9658 I am also expecting multi-threading could be a place where we can optimise the code (since most computers now run on 4/8 cores). Perhaps SQLite, MD5/SHA hashing and de-duplication are not made for multi-core and/or GPU computers.

Anonymous 08/12/2018 (Sun) 06:57:30 Id: cfc291 No. 9660

>>9659
>multi-threading
Python threads are all executed on the same core. That's why I said multiprocessing. It spreads out each subprocess across each core. Based on your post you don't know much about software, so think of a subprocess in python like a normal thread.
>are not made for multi-core and/or GPU computers
Everything you've mentioned can be easily sped up with multiple cores. Using a GPU would be even faster but there's no point in using that here. I'm actually pretty surprised he hasn't implemented multiprocessing functions in bottleneck situations like importing. It's very easy to split up the work once you've scanned all the files. You just divide them up by the number of cores and have each subprocess do that portion of the work. If you have 4 cores you have each core do 1/4 of the files you want to import.
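The "divide the files by the number of cores" idea can be sketched roughly like this with the standard multiprocessing module. This is only an illustration, not how Hydrus imports files: the function names are made up, and SHA-256 hashing stands in for whatever per-file work an import actually does. (Strictly speaking, CPython threads are limited by the GIL, which lets only one thread run Python bytecode at a time; separate processes avoid that.)

```python
import hashlib
from multiprocessing import Pool, cpu_count


def split_evenly(items, n):
    """Deal items into n roughly equal chunks, round-robin."""
    return [items[i::n] for i in range(n)]


def hash_chunk(paths):
    """Worker: return (path, SHA-256 hex digest) for every file in one chunk."""
    results = []
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB blocks so large files don't need to fit in memory.
            for block in iter(lambda: f.read(1 << 20), b""):
                digest.update(block)
        results.append((path, digest.hexdigest()))
    return results


def hash_files_parallel(paths, workers=None):
    """Hash files using one subprocess per non-empty chunk."""
    workers = workers or cpu_count()
    chunks = [chunk for chunk in split_evenly(list(paths), workers) if chunk]
    if not chunks:
        return []
    with Pool(processes=len(chunks)) as pool:
        per_chunk = pool.map(hash_chunk, chunks)
    return [pair for chunk in per_chunk for pair in chunk]


if __name__ == "__main__":
    import sys

    for path, digest in hash_files_parallel(sys.argv[1:]):
        print(digest, path)
```

Only the CPU and disk work is farmed out here. Writing the results to the database would still happen in the parent process, since SQLite allows only one writer at a time.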

Anonymous 08/12/2018 (Sun) 08:31:23 Id: bd599a No. 9661

>>9660
>Python threads are all executed on the same core. That's why I said multiprocessing
Well, since people describe 4-core Intel CPUs with "hyperthreads" as having 8 virtual cores, I would say it is easy to get those things mixed up. If I have to use a proper term, Parallel Programming (as in Concurrency) would be more fitting.
>Everything you've mentioned can be easily sped up with multiple cores
I meant that it has not been implemented yet by the dev (s/are not/has not been/)
>I'm actually pretty surprised he hasn't implemented multiprocessing functions

Anonymous 08/13/2018 (Mon) 06:56:41 Id: 813085 No. 9670

>>9660 Also https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b https://luckypants.weebly.com/subprocesses-and-multithreading.html

Hydrus decentralization and dapp Anonymous 09/04/2018 (Tue) 03:27:45 Id: 6bf834 No. 9881

Considering the recent happenings of Tumblr and booru.org purges, it is important to put focus on alternative decentralization libraries.
1. Free P2P software
a. BitTorrent - Most commonly used, but can't handle individual files
b. WebTorrent - WebRTC version of BitTorrent, but still has the same issue
c. eDonkey and GNUtella - both very obscure, not really useful or adaptive
d. IPFS - currently used in Hydrus, can handle singular files in a folder structure
2. Proxies and pseudo-VPNs
a. TOR - very common, maybe pozzed by CIA, has BitTorrent and IPFS compatibility (OpenBazaar)
b. I2P - less common, not pozzed, has BitTorrent compatibility, IPFS is in the works (go-i2p)
c. Freenet and Retroshare - both very uncommon, have file transferring and chats as a primitive
d. Zeronet - pretty dead, works with Javascript, too many unknowns
3. Blockchain data solutions (https://en.wikipedia.org/wiki/Cooperative_storage_cloud)
a. Filecoin - based in IPFS, slowly developing, could be used in conjunction with Hydrus
b. Sia - top data blockchain contender, has smart contracts with regular renewal for storage (https://sia.tech/)
c. MaidSafe - possible competition, includes secure communication and storage (https://maidsafe.net/)
d. Storj - noted, already has average pricing, made to be used alongside self-hosted cloud (https://storj.io/)
e. Ethereum Swarm - not really a good idea as the blockchain is congested by CryptoCats
f. Others include https://decent.ch/ https://www.creativechain.org/ https://contentbox.one/ https://noia.network/
Others: https://cryptoslate.com/category/cryptos/storage/

Anonymous 09/04/2018 (Tue) 03:57:31 Id: 6bf834 No. 9882

4. Social media blockchain
a. Steem - used in alt-media like bitchute, dtube and steemit (https://steem.io/)
b. Rocketchat - used by the furries to communicate (https://rocket.chat/)
c. SocialX - at a whitepaper stage, to replace facebook and twitter (https://socialx.network/)
d. Akasha - based in IPFS, meant to replace Tumblr (https://akasha.world/)
e. BAT Token - used by Brave Browser (https://basicattentiontoken.org/)
Others https://foresting.io/ and https://sola.foundation/ and https://www.synereo.com/
https://www.stateofthedapps.com/dapps/tagged/social/tab/most-relevant

Anonymous 09/04/2018 (Tue) 13:19:59 Id: 9f26dd No. 9884

>>9881 >booru.org purges What do you mean?

Anonymous 09/05/2018 (Wed) 03:59:52 Id: 6bf834 No. 9886

>>9884 Gelbooru and *.booru.org are hosted in the Netherlands, and they are using "anti-loli laws as an excuse" to force a purge on the admins.

Anonymous 09/25/2018 (Tue) 13:32:11 Id: 833c67 No. 10077

Do you know how I can convert the hydrus db to postgresql? The hydrus db consists of multiple sqlite files, so how can I connect all of them?

Search improvements with fuzzy search Anonymous 10/10/2018 (Wed) 18:53:39 Id: 7d7b19 No. 10232

Icon selection Anonymous 10/13/2018 (Sat) 12:05:56 Id: 9c2ceb No. 10247

1. Having icon representation of major functions in the menu bar and buttons
2. Possibly expanding on famfamfam
https://github.com/ionic-team/ionicons
https://github.com/yusukekamiyamane/fugue-icons
https://github.com/FortAwesome/Font-Awesome
https://github.com/Templarian/MaterialDesign
https://github.com/linea-io/Linea-Iconset
https://twitter.github.io/twemoji/
https://xtoolkit.github.io/Micon/
https://github.com/google/material-design-icons
https://github.com/legomushroom/iconmelon

Anonymous 10/17/2018 (Wed) 15:41:06 Id: 7d7b19 No. 10272

>>10232 https://searchcode.com/codesearch/raw/42426693/ and http://archive.fo/ilT6P has a JS version of the Metaphone3 algorithm

Audio fingerprints/hashes Anonymous 10/20/2018 (Sat) 10:44:51 Id: 6f19b2 No. 10290

Anonymous 10/25/2018 (Thu) 06:51:21 Id: ec1fb1 No. 10361

>>9281
https://vision.fe.uni-lj.si/cvww2016/proceedings/papers/04.pdf (Quantitative Comparison of Feature Matchers Implemented in OpenCV3)
https://sci-hub.tw/10.1109/m2vip.2016.7827292 (Comparison of OpenCV’s Feature Detectors and Feature Matchers)

Anonymous 10/25/2018 (Thu) 06:54:13 Id: ec1fb1 No. 10362

>>10361
SIFT https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html
SURF https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_surf_intro/py_surf_intro.html
FAST https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_fast/py_fast.html
BRIEF https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_brief/py_brief.html
ORB https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_orb/py_orb.html

Anonymous 11/10/2018 (Sat) 19:57:09 Id: 978c9b No. 10599

>>10361 Got some more comparative papers 4U https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8346440 (A Comparative Analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK)

Tag correlation Anonymous 11/21/2018 (Wed) 04:58:03 Id: 1e8781 No. 10742

https://en.wikipedia.org/wiki/Pointwise_mutual_information
Pointwise mutual information between tag X and tag Y is the logarithm of (num. of images with both tags) * (total image count) / ((num. of images with tag X) * (num. of images with tag Y))
PMI can be used to find possible tag siblings
https://en.wikipedia.org/wiki/Conditional_entropy
Conditional entropy of X given Y is ( (num. of images with both tags) / (total image count) ) * logarithm of ( (num. of images with tag X) / (num. of images with both tags) )
CE can be used to find possible tag parents and children
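The two expressions above translate almost directly into code. This is a sketch with made-up function names and invented counts. The second function computes the expression exactly as written in the post, which is a single term (the one for images carrying both tags) of the full conditional entropy sum given on the Wikipedia page.

```python
import math


def pmi(n_both, n_x, n_y, n_total):
    """log( n_both * n_total / (n_x * n_y) ), the PMI expression from the post."""
    if n_both == 0:
        return float("-inf")  # the tags never co-occur
    return math.log(n_both * n_total / (n_x * n_y))


def entropy_term(n_both, n_x, n_total):
    """(n_both / n_total) * log(n_x / n_both), the second expression from the post."""
    if n_both == 0:
        return 0.0  # treated as 0 by the usual convention 0 * log(...) = 0
    return (n_both / n_total) * math.log(n_x / n_both)


# Invented example counts: 1000 images, tag X on 100, tag Y on 100, both on 50.
print(pmi(50, 100, 100, 1000))       # positive: the tags co-occur more than chance
print(entropy_term(50, 100, 1000))   # small: knowing one tag says a lot about the other
print(entropy_term(100, 100, 1000))  # zero: every image with X also has the other tag
```

A high PMI between two tags hints that they might be siblings, while a near-zero second value in one direction only hints at a parent/child relation. Any threshold for "high" or "near-zero" would have to be picked by experiment.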

Language and package-specific optimizations Anonymous 11/28/2018 (Wed) 17:39:24 Id: b990dc No. 10805

Nim is low-level Python, Crystal is low-level Ruby, both would be easy for the rest of us (and hopefully the dev) to pick up. Doing so would mean that Hydrus would be at least twice as fast in certain departments when compared to non-NumPy Python. (Also D is a C replacement, Go and Kotlin are Java replacements, but those are very different from the syntax of Python) Are there applications where low-level languages DON'T apply? Math calculations, in that case use SciPy/NumPy for less work. Some benchmarks: https://github.com/kostya/benchmarks https://github.com/drujensen/fib https://github.com/frol/completely-unscientific-benchmarks https://github.com/logicchains/LPATHBench

Anonymous 11/29/2018 (Thu) 14:25:59 Id: aa7425 No. 10819

>>10805
https://github.com/yglukhov/nimpy is for connecting Nim to Python
https://nim-lang.org/docs/httpclient.html is the new URLLib
https://nim-lang.org/docs/htmlparser.html is the new BeautifulSoup
For more: https://nim-lang.org/docs/lib.html

Anonymous 12/15/2018 (Sat) 16:20:14 Id: b990dc No. 11022

>>10272 https://searchcode.com/codesearch/raw/2366000/ and http://archive.fo/4Phr9 has the Java version

Anonymous 12/15/2018 (Sat) 16:53:55 Id: b990dc No. 11023

>>10232 For Japanese fuzzy search you can use these to get the kana
https://github.com/atilika/kuromoji (Java)
https://github.com/takuyaa/kuromoji.js (JS)
https://github.com/taku910/mecab (C++)
https://github.com/ikawaha/kagome (Go)
https://github.com/mocobeta/janome (Python)
https://github.com/ku-nlp/jumanpp (C++)

Anonymous 12/19/2018 (Wed) 04:52:09 Id: a72330 No. 11053

>>10290
>https://github.com/acoustid/acoustid-index (C++)
You're looking for https://github.com/acoustid/chromaprint (C++)
To be honest though when Hydrus starts doing audio fingerprinting it should probably just use acoustid so it can grab tags from MusicBrainz ( https://musicbrainz.org/ )

Anonymous 12/19/2018 (Wed) 08:41:17 Id: 2f2eb0 No. 11058

>>11053 Or maybe others as well? What if we are getting music from torrents instead and don't want MusicBrainz to know that we got them? Bumping to spark conversation
>>10232 http://www.scitepress.org/Papers/2016/59263/59263.pdf (Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names) More benchmarks for major phonetic algorithms

Anonymous 12/28/2018 (Fri) 19:09:14 Id: 5cfb09 No. 11133

>>9068
>PostgreSQL - An SQL with complex features with less performance
1998 wants its retard memes back.

Anonymous 01/07/2019 (Mon) 11:17:27 Id: e73dfb No. 11204

>>11023 >implying

Anonymous 01/08/2019 (Tue) 02:07:54 Id: 1e8781 No. 11206

>>11204 How so? Too many onyomi and kunyomi? Even then if we are not using phonetic fuzzy search, string fuzzy search can still be used (see https://en.wikipedia.org/wiki/String_metric)
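As a rough idea of what plain string fuzzy search looks like, here is a sketch using difflib from the Python standard library. It ranks candidates by difflib's similarity ratio, which is just one of many string metrics and not the phonetic matching discussed earlier. The function name and the tag list are made up for the example.

```python
import difflib


def fuzzy_tags(query, known_tags, limit=3, cutoff=0.6):
    """Return up to `limit` known tags that resemble the query, best match first."""
    return difflib.get_close_matches(query.lower(), known_tags, n=limit, cutoff=cutoff)


tags = ["samus aran", "metroid", "zero suit", "ridley"]
print(fuzzy_tags("samus arran", tags))  # a typo still finds the intended tag
print(fuzzy_tags("zzzzqqq", tags))      # nothing is similar enough
```

Because it compares raw characters, this works on kana or kanji strings the same way as on Latin text, although it knows nothing about readings.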

Anonymous 01/19/2019 (Sat) 12:39:35 Id: ac7c72 No. 11380

>>11053 https://musicbrainz.org/doc/Other_Databases I find that https://www.discogs.com/ and http://www.freedb.org/ are still alive so what about those? https://github.com/discogs/discogs_client could be good for example.

Anonymous 02/11/2019 (Mon) 07:58:55 Id: d46cda No. 11586

>>9281 https://github.com/rachmadaniHaryono/transformationInvariantImageSearch Looks like our men are getting into this

are these articles any good? Anonymous 03/18/2019 (Mon) 11:33:02 Id: f06e36 No. 11927

https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b
https://medium.freecodecamp.org/if-you-have-slow-loops-in-python-you-can-fix-it-until-you-cant-3a39e03b6f35
https://metarabbit.wordpress.com/2018/02/05/pythons-weak-performance-matters/
https://blog.codinghorror.com/the-infinite-space-between-words/
https://www.prowesscorp.com/computer-latency-at-a-human-scale/

Anonymous 04/18/2019 (Thu) 17:31:27 Id: 0a99e5 No. 12295

Anonymous 04/19/2019 (Fri) 18:10:30 Id: 0b5902 No. 12302

>>12295 Why don't you actually develop something on your own instead of endlessly shitting out github links

Anonymous 04/20/2019 (Sat) 04:28:38 Id: 0a99e5 No. 12307

>>12302 Nah that is for >>12277
