I understand that the Flury-Riedwyl faces are wholly based on quantitative values, but without reducing them to a single distance value (I forget the fancy linear algebra term for this)
The general term's a "metric" but that's not very fancy, so... a Minkowski distance, and in practice usually the Euclidean distance, which is just Pythagoras' theorem extended to n dimensions.
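To make that concrete, here's a toy sketch in Python (my own illustration, nothing to do with whatever the site actually runs): reduce each text to a small vector of stats and measure the distance between the vectors.

```python
# Toy sketch: Minkowski distance between two feature vectors.
# p=2 is the ordinary Euclidean distance; p=1 is the "city block" distance.
def minkowski(a, b, p=2):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# e.g. two texts each reduced to [FK grade, fog index, mean sentence length]
print(minkowski([8.1, 10.2, 14.0], [12.7, 15.5, 22.3]))       # Euclidean
print(minkowski([8.1, 10.2, 14.0], [12.7, 15.5, 22.3], p=1))  # Manhattan
```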
My understanding is that GhostDetect would be a "presumptive" test: you observe something you already find suspect, then run the samples to confirm that there's something to your suspicion.
It can't be conclusive, but it can rule out vastly dissimilar samples (again, this is my understanding; I could be misinterpreting the intent/usage).
The site text doesn't like any talk of "proof" and describes it more as a "first step" to flagging up "sharp divergences" in an author's style that call for an explanation and further investigation:
Any tool like this cannot completely rule out false positives (mistakenly ascribing distinct authorship) or false negatives (mistakenly ascribing identical authorship). For instance, the same author may write in very different styles depending on genre and context. At the same time, distinct authors may write in a similar style. As such, similarities or differences in style exposed by this tool cannot prove identical or distinct authorship.
Applying GhostDetect® is, however, a crucial first step in uncovering ghostwriting. Sharp divergences in style, when detected by this tool and purported to come from the same author, call for, at a minimum, explanation.
Based on the information presented on the page, and the lack of any other explanation, here is my guess as to what's happening: it calculates the Flesch-Kincaid, Gunning fog, sentence length etc. statistics listed further down the page. Those statistics are what's being compared, and the faces are just a visualization tool to point out glaring differences in the paired numbers. Those stats are the quantitative data; nothing's being hidden.
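For anyone curious how little is involved, here's a rough Python sketch of the kind of stats I'm guessing it computes (not the site's actual code; the syllable counter is a crude vowel-group heuristic, good enough for illustration only):

```python
import re

def syllables(word):
    # crude heuristic: count groups of consecutive vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_stats(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syl = [syllables(w) for w in words]
    wps = len(words) / max(1, len(sentences))              # mean sentence length
    spw = sum(syl) / max(1, len(words))                    # mean syllables per word
    complex_frac = sum(s >= 3 for s in syl) / max(1, len(words))  # 3+ syllable words
    return {
        "sentence_length": wps,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_frac),
    }

print(text_stats("This is a short sample. It has two simple sentences."))
```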
The statistics in question aren't independent: they're all slightly different functions of things like word length, sentence length and syllable counts, so the feature space likely has a much lower effective dimensionality than the lists of numbers might suggest. And a tweet is far too small a sample for any of these to have much certainty.
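To show what I mean about the overlap: the FK grade and the fog index are both (roughly) linear in mean sentence length plus a word-complexity term, so over any plausible spread of texts they track each other almost perfectly. A quick simulation with made-up numbers (the complex-word fraction here is just a crude stand-in tied to syllable count):

```python
import numpy as np

rng = np.random.default_rng(0)
words_per_sentence = rng.uniform(8, 30, 1000)     # mean sentence length
syllables_per_word = rng.uniform(1.2, 2.0, 1000)  # word complexity
complex_word_frac = (syllables_per_word - 1.2) / 0.8 * 0.25  # crude stand-in

fk_grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
fog = 0.4 * (words_per_sentence + 100 * complex_word_frac)

print(np.corrcoef(fk_grade, fog)[0, 1])  # close to 1: the two indices are nearly redundant
```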
But would it work for its intended purpose? Probably, yes: it's on a site with a few other "educational" JS apps. If you feed in a high school student's new essay along with an older essay and find out that he's suddenly writing essays with an FK grade level 10 grades higher than before - well, that's a good clue that he's copied it from a book or an article, and it's time to look him in the eye and extract a confession.
It's a simple technique but - assuming that's what we have here - a useful one for longer texts in an educational setting. For comparing something the length of a tweet to determine authorship among adults of similar reading age, especially given that Twitter circles tend to imitate one another in style for likes? Potentially useless.
Legal disclaimer: the above is pure conjecture and I claim no inside knowledge of how GhostDetect® actually works. Don't sue me if it turns out your product really is an advanced sock detector that works on tweets.
It's just illustrating patterns in the data derived from the text. The issue with this schizo idea is that it's not exact enough to build an actual universal, identifiable face for an author. It might be able to if you had a novel's worth of text to sample, but that's highly unlikely (and even then you'd need a novel's worth of text from someone else to compare it to).
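Roughly, you can picture the face as something like this (a toy sketch; I have no idea what mapping the site actually uses): normalize each stat to a 0-1 range and let it drive one facial parameter.

```python
# Toy Chernoff/Flury-Riedwyl-style mapping (illustrative only, not the site's mapping)
def face_params(stats, lo, hi):
    norm = {k: (stats[k] - lo[k]) / (hi[k] - lo[k]) for k in stats}
    return {
        "face_width":  norm["sentence_length"],       # longer sentences -> wider face
        "mouth_curve": norm["flesch_kincaid_grade"],  # higher grade level -> bigger smile
        "eye_size":    norm["gunning_fog"],           # denser prose -> bigger eyes
    }

print(face_params(
    {"sentence_length": 18, "flesch_kincaid_grade": 9.5, "gunning_fog": 11.0},
    lo={"sentence_length": 5, "flesch_kincaid_grade": 2, "gunning_fog": 4},
    hi={"sentence_length": 35, "flesch_kincaid_grade": 18, "gunning_fog": 20},
))
```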
You have no idea how the site works because they don't give an explanation. If my guess at how it works is vaguely accurate - in short, it's simply comparing the statistics already listed further down the page - then the text samples we're dealing with are far too short. We're lacking the statistical power to get decent parameter estimates. This is just as invalidating for the comparisons you made in the OP, even if we choose to ignore the blatant problems with your subjective selection of which tweets to present.
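To put a number on the small-sample problem, here's a quick simulation (toy distributions I made up, not real tweet data): the spread of an FK-style estimate across samples of a couple of sentences versus a couple hundred.

```python
import numpy as np

rng = np.random.default_rng(1)

def fk_spread(n_sentences, trials=2000):
    # standard deviation of an FK-style estimate over many samples of n_sentences sentences
    est = []
    for _ in range(trials):
        wps = rng.poisson(15, n_sentences) + 1       # words per sentence
        spw = 1 + rng.gamma(2.0, 0.25, n_sentences)  # mean syllables per word, per sentence
        est.append(0.39 * wps.mean() + 11.8 * spw.mean() - 15.59)
    return np.std(est)

print(fk_spread(2))    # tweet-sized: the estimate jumps around by whole grade levels
print(fk_spread(200))  # essay-sized: far tighter
```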
I believe this guy is attempting to be clever and has prepared some kind of Sargon-esque gotcha, but I'm not biting.
You already did bite. A bullshitter would duck the test. Someone sincerely mistaken would at least take it and then try to explain away the results. Other than that, there's no gotcha; I honestly expected you to answer. I'll post the answers later.
You're on the right track though, and get what this guy clearly doesn't. We are comparing the similarities and lack of similarities between May's and Ralph's writing, and it's conclusive because, as demonstrated in the OP, the differences are massive. Ralph's writing is so distinctly different from May's that it's plain to see in the comparison faces (which just means there's next to no similarity in the raw data).
A quick tip: I might have dropped this earlier if it wasn't for the sus "oh you don't understand this, I do" followed by no explanation or vague handwaving BS. At first I assumed you probably did know what technique the site was using. It took a few "educate yourself, sweatie"s for the penny to drop: he's full of shit, trying to talk his way out of it, and this was just the first "writing analysis" site he found on Google.