Kiwifarms dataset or proposals for data-related projects - Some sort of restricted KF API or dataset to help create new forum features

I mean if anything, provide a solid outline. Everything within this thread just doesn't track right. What 777Flux said sounds more apt. Everything you're saying is just generic and almost Chat GPT'd.

The enthusiasm is cool, though.
Raw ChatGPT doesn't make it easier (to say the least) to build a KG. At best, the GPT models allow you to extract tuples. After extracting, you have to store them and query those tuples in a scalable way.
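To make the store-and-query part concrete, here is a minimal sketch. It assumes the extracted tuples are (subject, predicate, object) triples and that an in-memory index is enough for a prototype. `TripleStore`, `add`, and `match` are hypothetical names made up for illustration, and the sample facts are placeholders. Anything that actually needs to scale would sit on a real graph or relational database instead.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory store for (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples = set()
        # One index per position, so a lookup does not scan every triple.
        self._by_subject = defaultdict(set)
        self._by_predicate = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)
        self._by_predicate[p].add(triple)
        self._by_object[o].add(triple)

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> list[Triple]:
        """Return all triples matching the given fields; None is a wildcard."""
        candidates = []
        if s is not None:
            candidates.append(self._by_subject.get(s, set()))
        if p is not None:
            candidates.append(self._by_predicate.get(p, set()))
        if o is not None:
            candidates.append(self._by_object.get(o, set()))
        # Filter the smallest indexed set, or everything if no field is bound.
        pool = min(candidates, key=len) if candidates else self._triples
        return sorted(
            t
            for t in pool
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("Ada Lovelace", "born_in", "London")
store.add("Ada Lovelace", "worked_with", "Charles Babbage")
store.add("Charles Babbage", "born_in", "London")

born_in_london = store.match(p="born_in", o="London")
print(born_in_london)
```

The point is only that extraction is the easy half: once the tuples exist, every query pattern you care about needs an index behind it.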
 
I feel like I'm speaking in Braille, rn.
 
I hope to get some time this week to start this project (the last few weeks have been extremely intense at my job, really exhausted). The minimal prototype I want to work on is to extract all the relevant entities (using one or multiple NER models) for each post in community happenings and build an initial dataset with that. Got to write a small crawler and scraper for that. A small step but it's a solid one that has worked well in other places.

Things like event types and relations between entities and events will only come after that.
 
A small step but it's a solid one that has worked well in other places.
What on Earth, are you some kind of extremely bored Pytorch enjoyer that goes around and makes knowledge graphs of random forums?
Things like event types and relations between entities and events will only come after that.
So you're going to index all the lolcows from CH and then turn it into an automatic structured data collation tool of some sort. I've idly thought of something like this but it was beyond my ability or patience. I guess we'll have an automatic half-baked CWCki for most lolcows after you do whatever it is you're doing? How will this data be presented?
 
What on Earth, are you some kind of extremely bored Pytorch enjoyer that goes around and makes knowledge graphs of random forums?
Well ... JAX is also a thing. But whatever gets the job done. Open source NER models are available in many frameworks. One of them is spaCy. I'm not bored, Im just trying to offer help in topics I'm more versed on. It honestly bothered me to see what my industry has done to this guy. He just wanted to laugh at tards on the Internet. He is a simple man, let him be.
So you're going to index all the lolcows from CH and then turn it into an automatic structured data collation tool of some sort. I've idly thought of something like this but it was beyond my ability or patience. I guess we'll have an automatic half-baked CWCki for most lolcows after you do whatever it is you're doing? How will this data be presented?
After I'm done with this prototype, we will have the topics for the community happenings. Small steps.

Now, time to sleep.
 
Well ... JAX is also a thing. But whatever gets the job done. Open source NER models are available in many frameworks. One of them is spaCy. I'm not bored, Im just trying to offer help in topics I'm more versed on. It honestly bothered me to see what my industry has done to this guy. He just wanted to laugh at tards on the Internet. He is a simple man, let him be.
Maybe you should make a technical AI thread like the Programming thread but more oriented towards the funny linear algebra soup, without people posting pictures and text logs and other irrelevant things. I don't know enough to make one myself.

I might learn ML, I just need to write a library in C and BLAS first because any non-widget-toolkit software library larger than 64MiB is a crime against nature. I would have to learn the math, however, and that's going to suck.
 
Maybe you should make a technical AI thread like the Programming thread but more oriented towards the funny linear algebra soup, without people posting pictures and text logs and other irrelevant things. I don't know enough to make one myself.

I might learn ML, I just need to write a library in C and BLAS first because any non-widget-toolkit software library larger than 64MiB is a crime against nature. I would have to learn the math, however, and that's going to suck.
Are you telling me that your implementation would be better than the Eigen library? Also, there are very good reasons for some of these frameworks or libraries to have that size. One obvious one is that you have different implementations per hardware. And by hardware, I don't mean just running matrix operations on a specific device, I'm also including distributed processing across machines.

Can you give me one example of linear algebra libraries that bother you?

Also, even if the library is "that big", your main focus should be to sketch or implement a Minimum Viable Product of your ideas. I learned the hard way that paralysis due to performance and scale concerns can block viable ideas.
 
Have you tried to send a curse to Joshua Connor Moon into falling off a cliff, thus activating his deadman switch and releasing the complete Kiwifarms database on the dark web.

You could find a Death Note™ at your nearest Wal-Mart, or chant the deadly incantations with me like: "Eid Llun, eid".
 
Have you tried to send a curse to Joshua Connor Moon into falling off a cliff, thus activating his deadman switch and releasing the complete Kiwifarms database on the dark web.

You could find a Death Note™ at your nearest Wal-Mart, or chant the deadly incantations with me like: "Eid Llun, eid".
LOL Nah man. I really like the streams. When he streams, it's always a good time in my week. I might not align with all of his views on life and politics but he makes me laugh a lot and I sympathise a lot with the struggle he is going through. I was expecting to prove myself (both my skills and motivations) before I get access to anything from him.

Just stating my plans here for my first project (in case anyone has any suggestions):
* Crawl several pages from the community happenings thread (probably all of them).
* For each page, scrape the (1) post body and (2) date.
* For each scraped post, run multiple NER models and store the outputs.
* Create a golden dataset by manually verifying the model outputs on a percentage of those pages.
* Publish the Python scripts, Python environment definition, metrics, and relevant observations.
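
For the metrics step, a minimal sketch of how model outputs could be scored against the golden dataset. It assumes each entity is recorded as a `(post_id, text, label)` tuple and that only exact matches count. The function name `score_entities` and the sample data are made up for illustration, and partial-span matching is deliberately left out.

```python
def score_entities(predicted: set, gold: set) -> dict:
    """Exact-match precision, recall and F1 for two sets of entity tuples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hand-verified entities: (post_id, text, label).
gold = {
    (1, "Acme Corp", "ORG"),
    (1, "Berlin", "GPE"),
    (2, "Jane Doe", "PERSON"),
}

# Output of one hypothetical NER model on the same posts.
predicted = {
    (1, "Acme Corp", "ORG"),
    (1, "Berlin", "PERSON"),  # right span, wrong label: counted as a miss
    (2, "Jane Doe", "PERSON"),
    (2, "Monday", "DATE"),  # not in the golden set
}

metrics = score_entities(predicted, gold)
print(metrics)
```

Running the same function once per model gives directly comparable numbers, which is most of what the golden dataset is for.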

I expect that having named entities available per post can, at least, improve retrieval. If you search for a person, event or organization, you can search the named entities you previously inferred from your data instead of just searching a raw text index.
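
A minimal sketch of what that could look like, assuming the NER output per post has already been stored as `(text, label)` pairs. `build_entity_index` and `search` are hypothetical helper names and the post data is a placeholder.

```python
from collections import defaultdict


def build_entity_index(post_entities: dict) -> dict:
    """Map (normalized entity text, label) to the set of post ids mentioning it."""
    index = defaultdict(set)
    for post_id, entities in post_entities.items():
        for text, label in entities:
            index[(text.casefold(), label)].add(post_id)
    return dict(index)


def search(index: dict, text: str, label=None) -> list:
    """Return sorted post ids for an entity, optionally restricted to one label."""
    wanted = text.casefold()
    hits = set()
    for (entity_text, entity_label), post_ids in index.items():
        if entity_text == wanted and (label is None or entity_label == label):
            hits |= post_ids
    return sorted(hits)


# post_id -> entities a model found in that post (placeholder data).
post_entities = {
    1: [("Paris", "GPE"), ("Acme Corp", "ORG")],
    2: [("Paris", "PERSON")],
    3: [("acme corp", "ORG")],
}

index = build_entity_index(post_entities)
print(search(index, "paris"))
print(search(index, "Paris", label="GPE"))
print(search(index, "ACME Corp", label="ORG"))
```

The label filter is the part a raw text index cannot give you: the same string can be a place in one post and a person in another.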
 
Are you telling me that your implementation would be better than the eigen library? Also, there are very good reasons for some of these frameworks or libraries to have that size. One obvious one is that you have different implementations per hardware. And by hardware, I don't mean just running matrix operations in a specific device, I'm also including distributed processing across machines.

Can you give me one example of linear algebra libraries that bother you?
I'm talking about writing my own backpropagation system instead of the entire number-crunching stack. CBLAS (much like eigen but not based on C++ templates) fits my criteria. I've seen some C-based backprop libraries, but they seem to be more primitive than they need to be for general use.

I might start using PyTorch, but it just makes me feel weird from code samples I've read. My Cnile instincts are simply keeping me from being a neural network chad.
I expect that having named entities available per post can, at least, improve retrieval. If you search for a person, event or organization, you can search the named entities you previously inferred from your data instead of just searching a raw text index.
Usually the subject is clearly visible in the text of a Community Happenings post. It might work better for events, however.
 
One very obvious thing is automating the creation of a big knowledge graph for people and events using an LLM. Having access to a knowledge graph can help A LOT with implementing A LOT of features. A very obvious one that all of you know: when you search for a well-known topic on Google, either a person or an event, you usually get a panel with very relevant fields describing that person/event. Josh can use a KF Graph for that to summarize things. This is one small example of many new features that can be powered by such tech (I know about them because I work with those systems).

I can guarantee anything that you could propose would be of extremely limited benefit to KF.
 
I can guarantee anything that you could propose would be of extremely limited benefit to KF.
It looks like he's making an external tool that indexes all the happenings on the Farms and makes searching them slightly easier. Integrating that into the forum itself is definitely a pipe dream, I agree.
 
I can guarantee anything that you could propose would be of extremely limited benefit to KF.
Can you explain why you think that? Knowledge graphs have been useful in environments that archive a lot of data and summarize that data in multiple ways. What am I missing?
 
Can you explain why you think that? Knowledge graphs have been useful in environments that archive a lot of data and summarize that data in multiple ways. What am I missing?

That it's something no one asked for. I don't see the point in adding features to a site that no one asked for, and thus no one is likely interested in. Null is good about making the site accessible and easy to use, implementing features that are within reason when the members here are interested in such. Obviously the current search system is just fine. I haven't heard anyone complaining about it. The site is organized in a way that is simple to understand. I doubt few if any people have trouble trying to find information here easily. I would imagine most users here would have the opinion that if the current organization and search capabilities are not enough for someone, they either aren't trying hard enough, or they're not someone that would generally be welcomed to start with. This site is very much a "if it isn't broke, why the hell would anyone try to fix it? Retards" place.

I see what you're suggesting as being like putting a cup holder on a vacuum cleaner. Some people may think it's nice, but overall it really does nothing to improve the experience or primary function of the vacuum.

Also, Null has made it fairly clear what he thinks of the idea, and in your zeal you've already committed a faux pas by trying to get attention by violating the rules of the forum intentionally like a kid yanking on their mother's skirt. That is something most people here would immediately find distasteful and likely to shun.

Do as you wish, but don't expect many others here to have a dramatically different take on this situation.
 
He said that he doesn't want to give access to the API.

Without a knowledge graph or a well tested LLM, how do you answers like "who did troon out this year?" Or "what troons likely pushed this other troon to troon out?" Or "which people have beaten Ethan Gunt Ralph"? Or "what places did Andy Racewarski lived?".

Most likely, you run several manual queries and aggregate the results by hand. Almost every big information system that you have used to catalog and archive data uses a knowledge graph to either (1) retrieve relevant information faster or (2) anticipate what information is likely to be asked for in a follow-up search, be it in a search engine, wiki, social network, video streaming website, online store, etc. Good luck naming a single ordinary user who asked for a knowledge graph in any of those products. Yet they exist and are EXTENSIVELY used.

If you are just acting like that because the existence of such a system bothers you, just say so instead of dancing around the subject. Null did not tell me anything like "I don't care about anything that you have to offer, fuck off". He said "no access to API", most likely because I came out of nowhere as a new user at a time when he is being pressured on multiple fronts (and yes, I know my approach wasn't the best one, as I previously stated). If Null comes to this thread and says that he wants no knowledge graph, named entity recognition, new infra/features to support better discoverability, better search capabilities, etc., I will just fuck off and keep listening to his streams.
 
I have no doubt you are autistic to be so single minded.

I can almost guarantee Null considers what he told you to be quite clear regarding his entire position on this subject. This is why you never got a reply except as a warning to stop being retarded. If you ever hear more from him, it is essentially guaranteed a ban will be attached given how clear he was the first time.
 
I have no doubt you are autistic to be so single minded.

I can almost guarantee Null considers what he told you to be quite clear regarding his entire position on this subject. This is why you never got a reply except as a warning to stop being retarded. If you ever hear more from him, it is essentially guaranteed a ban will be attached given how clear he was the first time.

I wonder if we are talking about different subjects or if you are just trolling me.

To be EXTREMELY clear:
1. Derailing the other thread was not a good choice by me. And yeah, I would probably answer the same way he did or, at least, be extremely direct on what I think of such a request. Tldr: not the best approach from me.
2. His message was this (see attached). He said two things: (a) "stop bugging me in a random thread", (b) "no, I won't give you access to the API". You can extrapolate whatever you want, but in neither of those two points did he say it wasn't ok to experiment with the forum's data (especially if there is no threat to user privacy/security). His entire position in that message was: stop bothering me, and no API access for any random guy.
3. There were no more interactions with him, nor was I looking for any.
4. My objective is doing what I said I would do, leave it here and call it a day. If he wants to use it, great. If not, it's ok, it is still a funny holiday project.
5. Unlike many retards in here, I'm not looking for his approval. Im just trying to lend a hand with the shit I know about. No hurt feelings if no help is required from me.

It's ok if you don't like the idea, I already said it is ok to say so if that is the case. But I have nothing more to add to the topic. If you didn't understand my argument, you can either get someone to translate it to your native language or...just go do something else..?

Anyway, I won't answer any more messages here until I get this done. Not here to create drama. I'm here for a fun side project and, if it helps (and he requests it), to offer the code and advice to the owner.

Have a good night.
 

Attachments

  • Screenshot_20231108-194700.png
    122.2 KB · Views: 17