Thanks for these, I used the 10% one in this project
I had some spare time this morning so I decided to try to learn some transformers.js. I made a simple yet effective semantic search tool for the Discord leaks. It all runs locally in the browser, which means it's easy to download your own copy and run it at home in case the CF overlords decide my transphobic project has too much wrongthink in it. Unfortunately, because of the size of the full embeddings file, this doesn't let you search everyone's messages, only messages Chris sent. I'm gonna see if I can do some quantization and compression so that we can search everything, not just Chris' messages. I'll update if I can get it working.
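For anyone wondering what the quantization would look like, here's a rough sketch of one common approach (scalar int8 quantization). I haven't settled on this and it's not what the tool does right now; `quantize` and `dequantize` are names I made up for the example. The idea is to store each embedding component as one signed byte plus a single per-vector scale, instead of a 4-byte float, so the file ends up roughly 4x smaller.

```javascript
// Sketch only: scalar int8 quantization of one embedding vector.
// Function names are hypothetical, not part of transformers.js.
function quantize(vec) {
  // Find the largest absolute component so it maps to +/-127.
  let maxAbs = 0;
  for (const v of vec) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;

  const q = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) {
    q[i] = Math.round(vec[i] / scale);
  }
  return { q, scale };
}

// Recover approximate float values from the quantized form.
function dequantize({ q, scale }) {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) out[i] = q[i] * scale;
  return out;
}
```

You lose a bit of precision on every component (at most about one `scale` step), which is usually fine for ranking by similarity, but I'd want to check the search results don't get noticeably worse before shipping it.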
It's semantic search, so if you search for "child grooming" it won't just bring up messages that mention that exact phrase, but also messages with similar content, e.g. "child gr00ming" or "rape" or "child abuse" etc. This makes it easier to find interesting stuff that might not match an exact keyword.
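If you're curious how the matching works under the hood, here's a minimal sketch of the usual way to do it: every message gets turned into a vector by the embedding model, the query gets turned into a vector the same way, and results are ranked by cosine similarity. This is a simplified illustration, not a copy of the tool's code; `cosine`, `topK` and the `{ text, vec }` message shape are assumptions for the example, and the vectors here are tiny made-up ones instead of real model output.

```javascript
// Sketch only: rank messages by cosine similarity to a query vector.
// In the real thing the vectors would come from the embedding model.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// messages: array of { text, vec } (hypothetical shape for this example).
function topK(queryVec, messages, k) {
  return messages
    .map((m) => ({ text: m.text, score: cosine(queryVec, m.vec) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```

Because the ranking is by vector closeness rather than string matching, a misspelled or reworded message can still score high as long as the model puts it near the query.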
You can then click on a search result to see the context. Also, I split the giant ass HTML files into chunks, so your browser won't shit itself when you try to load the context. The first search you do will always be slow because it has to load the text embedding model, but after that it should be faster.
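The chunking idea is roughly this: if every chunk holds a fixed number of messages, you can work out which file a search result lives in from its message index alone, and only load that one file. Sketch below; the chunk size of 500 and the `chunk_N.html` naming are made up for the example and aren't necessarily what I used.

```javascript
// Sketch only: map a message index to the chunk file that contains it.
// CHUNK_SIZE and the file naming scheme are hypothetical.
const CHUNK_SIZE = 500;

function locateMessage(index, chunkSize = CHUNK_SIZE) {
  const chunk = Math.floor(index / chunkSize);
  return {
    file: `chunk_${chunk}.html`, // the only file the browser has to fetch
    offset: index % chunkSize,   // position of the message inside that chunk
  };
}
```

So clicking result number 1234 would only pull in the third chunk instead of the whole archive.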
View attachment 6244731
Someone tag dear feeder, idk how to do username mentions on here but I bet he'd love this with all the features this thread has been getting lately.