GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code for Codex/Copilot, regardless of license - another ill-conceived GitHub initiative


awoo

https://www.reddit.com/r/programmin...hub_support_just_straight_up_confirmed_in_an/

GitHub Copilot is a VS Code extension that suggests code for you, based on a model trained on publicly available code on GitHub. I'd love to know which manager got swept up in the "AI hype" and decided this was remotely a good idea.
Anyone with two brain cells knows that machines can barely produce semantically meaningful human text, and blindly copy-pasting generated code will lead to dumpster-fire code.
Now they may be in legal trouble, since they trained on all code on GitHub, regardless of license.
 
It is an interesting problem to solve, which is why their ML team came up with it, I'm sure. I haven't used it and won't, but I imagine it looks at the name of the function you create, the language of your file, and the dependencies you're using, and then what, finds some public code to pull from? I wonder how they weigh code quality. Are they pulling mostly from open source?
So when you start writing in a Node file, const fetchUserFromApi...
where will it look first? An open source project with a ton of stars/forks? The documentation for axios, because you have it as a dependency? How far down until it pulls from that personal project someone made that doesn't work? It tries to parse what you want from the name of the function, but will Copilot have suggestions for naming conventions? I am tempted to see how it works.
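Purely as a guess at the kind of thing it might emit for that prompt when axios is in your package.json; the endpoint, response shape, and everything else here are made up for illustration, not anything Copilot actually produces:

Code:
// Hypothetical completion of the sort Copilot might offer after
// `const fetchUserFromApi`. The URL and fields are invented here.
import axios from "axios";

const fetchUserFromApi = async (userId: string) => {
  const response = await axios.get(`https://api.example.com/users/${userId}`);
  return response.data;
};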
Also, I should have recognized ML/AI would buttfuck devs eventually. This will undoubtedly become efficient. ML is really the only path to take as a dev at this point, that and Cloud engineering.

i ended up reading through that reddit thread, and i am once again reminded how much that place disgusts me.
 

I'm highly skeptical, because even if you name your functions well, like const fetchUserFromApi..., this function could do one of a million things depending on which API you're using, what exactly a "user" is, etc. I don't see how this will be useful unless the machine knows your program requirements exactly, or the function is extremely straightforward, like read_csv_file.
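To make the ambiguity concrete, here are two completions that would both be plausible for that name; the endpoints, response shapes, and the second function's name are all hypothetical:

Code:
// Two reasonable bodies for the same idea; nothing in the name says which
// one the author wants. Endpoints and shapes are made up for illustration.
import axios from "axios";

// Interpretation 1: "user" means the currently authenticated account.
const fetchUserFromApi = async (): Promise<{ id: string; name: string }> => {
  const res = await axios.get("https://api.example.com/v1/me");
  return res.data;
};

// Interpretation 2: "user" means an arbitrary record looked up by id,
// with a tolerance for 404s that the first version never considers.
const fetchUserFromApiById = async (id: number) => {
  const res = await axios.get(`https://api.example.com/v2/users/${id}`, {
    validateStatus: (status) => status === 200 || status === 404,
  });
  return res.status === 404 ? null : res.data;
};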
 
They should have trained the model on Stack Overflow instead. I'd be doing the needful at lightspeed.
 
I'm highly skeptical, because even if you name your functions well, like const fetchUserFromApi..., this function could do one of a million things depending on which API you're using, what exactly a "user" is, etc. I don't see how this will be useful unless the machine knows your program requirements exactly, or the function is extremely straightforward, like read_csv_file.
It has to look at part of your code base before making any prediction, otherwise you're right, it would be useless. But how can it do all of this quickly?

I have to try this out now. Hopefully it doesn't get torpedoed before I can play with it.
 

I was waiting for Microsoft to make good on the GitHub acquisition, and there it is. Reading between the lines I'd guess they used all public repositories as training data, but until they say that private repositories are excluded, you can probably assume they're included too. Microsoft did make private repositories free when they acquired GitHub, so it's possible they did that as a CYA.

The feedback loop is definitely coming from VS Code phoning home. Autocomplete a function definition, it's no good, that data gets logged and used in future autocompletes. Good reason to stop using VS Code if you still needed one.
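If it does work that way, the useful signal is just whether a suggestion survived contact with the user. A purely hypothetical shape for such an event; none of these field names come from GitHub or VS Code:

Code:
// Hypothetical accept/reject signal a feedback loop like that would need.
// Every field here is an assumption made up for illustration.
interface CompletionFeedbackEvent {
  suggestionId: string;         // which generated snippet was shown
  accepted: boolean;            // did the user keep it or delete/rewrite it?
  editDistanceAfter30s: number; // how much the user changed it shortly after
  languageId: string;           // e.g. "typescript"
  timestamp: number;            // Unix epoch millis
}

// What one logged event might look like.
const event: CompletionFeedbackEvent = {
  suggestionId: "abc123",
  accepted: false,
  editDistanceAfter30s: 42,
  languageId: "typescript",
  timestamp: Date.now(),
};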
 

I use Neovim for all my personal projects, but for work I need something with the ability to live-write together with others, like VS Code's Live Share. The only other one I know of is IntelliJ, but it costs money. What else is there?
 
Sublime Text comes to mind. Not free, but you can use it as nagware forever.

I've heard good things about TextWrangler/BBEdit, but that's macOS only.


Edit: didn't read the "live write together" part. Not sure then. Google Docs or codeshare.io?
 
I use Neovim for all my personal projects, but for work I need something with the ability to live-write together with others, like VS Code's Live Share. The only other one I know of is IntelliJ, but it costs money. What else is there?

If you're still a student you can get a JetBrains license for free.

But do you really need live code writing? This is the kind of thing that should be handled with code reviews, I thought.
 
Would one of you be so kind as to translate this into plain English so that I might understand it?

Are you saying that they took all of GitHub's hard work and fed it into an AI? Which would make the acquisition of GitHub no more than a cynical grab at information? And that Microsoft didn't want to do the hard work themselves and really had no interest in the concept of GitHub in the first place, seeing as they even took licensed work?
 
The Copilot machine learning model was trained using code from every repository available on GitHub. It has been shown that Copilot is able to reproduce code verbatim from its training pool. That means Copilot can potentially be used to "launder" code, stripping the copyleft licenses (which keep the software free and open) from the original code so it can be used in closed proprietary software. This is a massive legal shitshow brewing.
 
It uses public code (it might use private repos, we're not sure), so it is akin to copy-pasting code you found on GitHub; however, since a tool does it for you, it might be legally implicated for not respecting licenses as well.
 
Also, I should have recognized ML/AI would buttfuck devs eventually. This will undoubtedly become efficient.
I've worked in ML stuff for a bit recently and it's all very underwhelming to me. The models just seem way too specialized and easy to fool, and I'm starting to see stakeholders get really burned by their ridiculous expectations for what the AI 'should' be able to do. And if you're using ML/AI for mathematical/statistical modeling then they're next to worthless in terms of actually understanding the thing you're trying to model (which is arguably the entire point of the modeling process at all).

I'm fully blackpilled on the 'AI revolution' at this point and to be honest I'm all-in on another AI winter coming soon. I think people are starting to wake up from the hype.

It has to look at part of your code base before making any prediction, otherwise you're right, it would be useless. But how can it do all of this quickly?
From what I read about it earlier all Copilot reads is the currently open file. So yes, in its current iteration it's useless for its stated purpose.
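If that's true, the "context" it works from is basically just the text before your cursor in the open buffer. You can sketch that in a few lines; the function below is my own illustration, not anything from the actual Copilot extension:

Code:
// A rough sketch of "context = the currently open file": take whatever
// precedes the cursor, trim it to a character budget, and send that as the
// prompt. Invented for illustration; not Copilot's actual code.
function buildPrompt(openFileText: string, cursorOffset: number, maxChars = 2048): string {
  const prefix = openFileText.slice(0, cursorOffset);
  // Keep only the tail of the file if it exceeds the budget, so the model
  // sees the code nearest the cursor and nothing from the rest of the repo.
  return prefix.length > maxChars ? prefix.slice(prefix.length - maxChars) : prefix;
}

// Everything in other files, or even earlier in a huge file, simply never
// reaches the model under this scheme.
const file = "import axios from 'axios';\n\nconst fetchUserFromApi = ";
const prompt = buildPrompt(file, file.length);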
 
I've worked in ML stuff for a bit recently and it's all very underwhelming to me. The models just seem way too specialized and easy to fool, and I'm starting to see stakeholders get really burned by their ridiculous expectations for what the AI 'should' be able to do. And if you're using ML/AI for mathematical/statistical modeling then they're next to worthless in terms of actually understanding the thing you're trying to model (which is arguably the entire point of the modeling process at all).

The current state of the art is still very specialized tasks. So machines are very good at things like looking at a picture and telling you what animal is shown, or even performing translations, but they are not at the point of generalizing. I listened to an interview with Geoff Hinton, one of the pioneers of neural networks, and he said that we may only be able to generalize when we can train machines on vast quantities of unlabeled data, where the machine has to guess its own labels rather than having them provided beforehand. He also suggests that some of our intelligence, such as the way our optic cells are arranged, already encodes useful information that isn't learned but has evolved over time, which is some justification for hand-designing network architectures rather than learning everything.
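The "guess your own labels" idea is easy to show in miniature: for plain text, the label for each token is simply the next token, so no human annotation is needed. A toy sketch of that, nothing like a real neural model:

Code:
// Toy illustration of self-supervised learning: the "labels" are just the
// next token in the raw text, so the data labels itself. A real system would
// be a neural network; this is only a bigram count table.
function trainBigrams(corpus: string): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  const tokens = corpus.toLowerCase().split(/\s+/).filter(Boolean);
  for (let i = 0; i < tokens.length - 1; i++) {
    const cur = tokens[i];
    const next = tokens[i + 1]; // the "label" comes from the data itself
    const row = counts.get(cur) ?? new Map<string, number>();
    row.set(next, (row.get(next) ?? 0) + 1);
    counts.set(cur, row);
  }
  return counts;
}

// Predict the most frequently observed next token.
function predictNext(counts: Map<string, Map<string, number>>, word: string): string | undefined {
  const row = counts.get(word.toLowerCase());
  if (!row) return undefined;
  return [...row.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

const model = trainBigrams("the cat sat on the mat and the cat slept");
console.log(predictNext(model, "the")); // "cat"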
 