Proving Ethan Writes Pantsu's Tweets and Texts Using Plagiarism Detection Algorithms - Using ghostdetect.com to perform gunt analysis on supposed 'May' posts

That's a good summary of how the faces are generated, but you're still misunderstanding the process fundamentally. What you're asking is getting really schizo.
I'm just asking for you to comment on the paired faces without knowing the answers. Nothing difficult.

Then if you want you can explain my crucial misunderstanding. But the answers are more important and won't take any time.
 
Just ignore the people who don’t bother to read the OP or try to and then get mad that you won’t slowly spoon feed it to them to understand via correcting all their inaccurate statements and assertions.
 
I did read the OP, and then I went to try it out for myself. The results were not as consistent as the OP's, so I was curious as to whether the test would hold up if you remove the result-peeking and cherrypicking that seem to be biasing OP's results. The simplest way of doing that is to blind the authorship at the time of examination, and also to remove the tweet since anyone here can guess the authorship from the content of the tweet. I also collected a bunch of tweets that looked like good comparison material, rather than searching through hundreds of tweets looking for "good matches", since that is guaranteed to bias the results.
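For anyone who wants to run the same kind of blinded test, here is a rough sketch of the bookkeeping in Python. It is my own illustration and has nothing to do with ghostdetect itself: the `blind` and `score` helpers and the image names are made up, and the true/false values are placeholders, not the real answers to the pairs in this thread.

```python
import random

def blind(items, seed=None):
    """Shuffle items and hide the truth behind neutral labels.

    items: list of (image_name, same_author) tuples.
    Returns (presented, key): presented is [(label, image_name)];
    key maps label -> same_author and stays hidden until guesses are in.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    presented, key = [], {}
    for n, (image, same_author) in enumerate(shuffled, start=1):
        label = f"pair {n}"
        presented.append((label, image))
        key[label] = same_author
    return presented, key

def score(guesses, key):
    """Fraction of labels where the guess matches the hidden answer."""
    return sum(guesses[label] == truth for label, truth in key.items()) / len(key)

# Invented placeholder data, NOT the real answers from this thread.
items = [("a.png", True), ("b.png", False), ("c.png", True),
         ("d.png", False), ("e.png", True), ("f.png", False)]
presented, key = blind(items, seed=0)

# A guesser who always answers "same author".
guesses = {label: True for label, _ in presented}
print(score(guesses, key))
```

With half the placeholder pairs being same-author, always answering "same" scores 0.5, so that is the baseline a real reader of the faces would have to beat.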

I was honestly expecting an answer.

"Oh, but you don't understand." I think I do, and the site doesn't provide a detailed explanation of the exact technique, so I gave a summary of my understanding expecting OP to point out where our understandings differed.

But he didn't, and he won't even try the test and then explain his results, so I'm leaning towards:

(1) OP suspects he'll get unimpressive results
(2) OP doesn't think he'll be able to explain away those results if the tweets turn out to be pretty similar
(3) OP doesn't understand or have full details of the (unpublished, as far as I can tell) process being used by ghostdetect himself, so is unable to point out exactly how what I'm proposing is invalid.

The test will take a minute or so. Really easy. Anyone can join in. Then I'll post the answers, and people who are unhappy with the tweets used can explain why that has affected their performance.

Otherwise, I'll just assume anyone who thinks the analysis is legit but who won't try the test without the answers is ducking like Ralph running from a fight.


I feel like I can confirm nothing by looking at these stupid faces. Can we actually see the numbers this algorithm is spitting out?
I can't find an explanation of the exact algorithm used (I'm guessing the statistics further down the page are involved), nor any attempt to quantify the differences between the samples, nor any indication the algorithm has been studied to determine its success rate. There are published methods for doing this kind of analysis, but the site doesn't say whether it uses one of those or something proprietary.

But you can try the 6 pairs of faces I posted earlier; that might give you an idea of how well it works.
 
I agree, this is an important point.

I'm not very familiar with textual analysis algorithms, but there are any number of open source/transparent multivariate statistical analysis libraries available that should be able to generate quantitative comparison values directly, like an actual higher-dimensional "distance" between arbitrary tweets.

I understand that the Flury-Riedwyl faces are wholly based on some quantitative values, but without being able to reduce them to a single distance value (I forget the fancy linear algebra term for this) it's going to leave open a lot of room for interpretation (and disagreement).

My understanding is that ghostdetect would be a "presumptive" test, you observe something that you find suspect already and then run the samples for a confirmation that there is something to your suspicion.

It can't be conclusive but it can rule out vastly dissimilar samples (again, this is my understanding, I could be misinterpreting the intent/usage).

Example of open-source natural language processing library:
 
It's just illustrating patterns in the data that gets derived from the text. The issue with this schizo idea is that it's not exact enough to build an actual universal identifiable face for an author. It might be able to if you had a novel's worth of text to sample, but that's highly unlikely (and even then you'd need a novel's worth of text to compare it to from someone else). I believe this guy is attempting to be clever and has prepared some kind of Sargon-esque gotcha but gaaaaaaaaaaaaaaaaaaaaaaay I'm not biting.

You're on the right track though and get what this guy clearly doesn't. We are comparing the similarities and lack of similarities between May and Ralph's writing, and it's conclusive because, as demonstrated in the OP, the differences are massive. Ralph's writing is so distinctly different from May's, it's plain to see in the comparison faces (which just means there's next to no similarities in the raw data).
 
I'm curious how well it works when you don't already know the author of the tweets? More of a blinded test. I spent five minutes grabbing tweets that weren't just links or that didn't crash the ghostdetect site (it chokes on certain combinations of punctuation); they haven't been specially cherrypicked.

So, the question for each picture is: same or different author? Which ones look like Ralph tweets?

1:
View attachment 3181955

2:
View attachment 3181957

3:
View attachment 3181958

4:
View attachment 3181961

5:
View attachment 3181964

6:
View attachment 3181965

For science, I will give it a go. Worst case scenario, I diagnose myself with autistic facial blindness.

1.
View attachment 3181955
Same author

2.
View attachment 3181957
Same author (Ralph)

3.
View attachment 3181958
Different authors

4.
View attachment 3181961
Different authors

5.
View attachment 3181964
Same author (Ralph)

6.
View attachment 3181965
Same author
 
Not bad, only 2 results that are fucky (probably due to not being great samples). Now start doing some May tweets for the cause.
 
I doubt Ralph has explicitly told her that he'll fuck up her public image if she ever left him.
She's already a self admitted and previously very vocal pedophile, and everyone knows she fucked the fucking Gunt, not sure how it's possible to fuck up her image even more.
 
I've done my own independent analysis and can confirm both tweets were definitely written by a rage pig.
rp.jpg
 
I understand that the Flury-Riedwyl faces are wholly based on some quantitative values but without being able to reduce to a single distance value (I forget the fancy linear algebra term for this)
The general term's a "metric" but that's not very fancy, so... a Minkowski distance, and in practice usually the Euclidean distance, which is just Pythagoras' theorem extended to n dimensions.
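To make the distance idea concrete, here is a minimal sketch in Python. The feature vectors at the bottom are invented numbers purely for illustration, and nothing here is taken from ghostdetect; it only shows how a list of per-text statistics could be collapsed into one number.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors.

    p=1 is the Manhattan distance, p=2 is the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    """Pythagoras' theorem extended to n dimensions."""
    return minkowski(a, b, p=2)

# Hypothetical feature vectors: (grade level, words per sentence, syllables per word).
text_a = [8.2, 14.0, 1.4]
text_b = [11.5, 21.0, 1.6]
print(euclidean(text_a, text_b))
```

One caveat if you try this on real statistics: features measured on different scales should be standardised first, otherwise whichever one has the biggest raw numbers dominates the distance.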

My understanding is that ghostdetect would be a "presumptive" test, you observe something that you find suspect already and then run the samples for a confirmation that there is something to your suspicion.

It can't be conclusive but it can rule out vastly dissimilar samples (again, this is my understanding, I could be misinterpreting the intent/usage).

The site text doesn't like any talk of "proof" and describes it more as a "first step" to flagging up "sharp divergences" in an author's style that call for an explanation and further investigation:
Any tool like this cannot completely rule out false positives (mistakenly ascribing distinct authorship) or false negatives (mistakenly ascribing identical authorship). For instance, the same author may write in very different styles depending on genre and context. At the same time, distinct authors may write in a similar style. As such, similarities or differences in style exposed by this tool cannot prove identical or distinct authorship.

Applying GhostDetect® is, however, a crucial first step in uncovering ghostwriting. Sharp divergences in style, when detected by this tool and purported to come from the same author, call for, at a minimum, explanation.

Based on the information presented on the page, and the lack of any other explanation, here is my guess as to what's happening: it calculates the Flesch-Kincaid, Gunning fog, sentence length etc. statistics listed further down the page. Those statistics are what's being compared and the faces are just a visualization tool, to point out glaring differences in the paired numbers. Those stats are the quantitative data, nothing's being hidden.
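As a sketch of what computing those statistics might look like (again, a guess at the general approach, not the site's actual code), here are the standard Flesch-Kincaid grade and Gunning fog formulas in Python. The syllable counter is a deliberately crude assumption of mine that counts vowel groups; real tools use dictionaries or better heuristics.

```python
import re

def count_syllables(word):
    """Crude estimate: number of consecutive-vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_stats(text):
    """Return sentence-length and readability statistics for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        raise ValueError("need at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    # Gunning fog counts "complex" words, here taken as 3+ syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = len(words) / len(sentences)   # words per sentence
    spw = syllables / len(words)        # syllables per word
    return {
        "words_per_sentence": wps,
        "fk_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_words / len(words)),
    }

print(text_stats("The cat sat. The dog ran."))
```

Note that both scores reuse the same words-per-sentence term, so two texts that differ on one will tend to differ on the other as well.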

The statistics in question aren't independent, they're all slightly different functions of things like word and sentence length and syllable counts, so the feature space likely has a much lower effective dimensionality than what the lists of numbers might suggest. And a tweet is far too small a sample for these to have much certainty.
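To illustrate the sample size problem, here is a small Python sketch that splits one invented passage (all "written" by the same person) into tweet-sized chunks and measures words per sentence on each. The passage and the two-sentence chunk size are assumptions for the demo, not data from this thread.

```python
import re

def words_per_sentence(text):
    """Average number of words per sentence in a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / max(len(sentences), 1)

def chunk_by_sentences(text, n):
    """Split text into consecutive chunks of n sentences each."""
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]+", text)]
    return [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]

# Invented sample: one author, mixed short and long sentences.
SAMPLE = (
    "I read it. Then I went and tried the whole thing out for myself "
    "on a handful of posts. The results were mixed. Some pairs looked "
    "alike even though I knew different people wrote them, which was odd. "
    "Others did not. So I stopped guessing."
)

whole = words_per_sentence(SAMPLE)
per_chunk = [words_per_sentence(c) for c in chunk_by_sentences(SAMPLE, 2)]
print("whole text:", whole)
print("per chunk: ", per_chunk)
```

The per-chunk values swing well away from the whole-text value even though the author never changes, which is the kind of noise a single-tweet comparison has to contend with.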

But would it work for its intended purpose? Probably, yes: it's on a site with a few other "educational" JS apps. If you feed in a high school student's new essay along with an older essay and find out that he's suddenly writing essays with a FK grade level 10 grades higher than before - well, that's a good clue that he's copied it from a book or an article, and it's time to look him in the eye and extract a confession.

It's a simple technique but - assuming that's what we have here - a useful one for longer texts in an educational setting. For comparing something the length of a tweet to determine authorship from among adults of similar reading age, and especially given that twitter circles tend to imitate one another in style for likes? Potentially useless.

Legal disclaimer: the above is pure conjecture and I claim no inside knowledge of how GhostDetect® actually works. Don't sue me if it turns out your product really is an advanced sock detector that works on tweets.

It's just illustrating patterns in the data that gets derived from the text. The issue with this schizo idea is that it's not exact enough to build an actual universal identifiable face for an author. It might be able to if you had a novel's worth of text to sample, but that's highly unlikely (and even then you'd need a novel's worth of text to compare it to from someone else).
You have no idea how the site works because they don't give an explanation. If my guess at how it works is vaguely accurate - in short, it's simply comparing the statistics already listed further down the page - then the text samples we're dealing with are far too short. We're lacking the statistical power to get decent parameter estimates. This is just as invalidating for the comparisons you made in the OP, even if we choose to ignore the blatant problems with your subjective selection of which tweets to present.

I believe this guy is attempting to be clever and has prepared some kind of Sargon-esque gotcha but gaaaaaaaaaaaaaaaaaaaaaaay Im not biting.
You already did bite. A bullshitter would duck the test. Someone sincerely mistaken would at least take it and then try to explain away the results. Other than that, there's no gotcha, I honestly expected you to answer. I'll post the answers later.

You're on the right track though and get what this guy clearly doesn't. We are comparing the similarities and lack of similarities between May and Ralph's writing, and it's conclusive because, as demonstrated in the OP, the differences are massive. Ralph's writing is so distinctly different from May's, it's plain to see in the comparison faces (which just means there's next to no similarities in the raw data).
A quick tip: I might have dropped this earlier if it wasn't for the sus "oh you don't understand this, I do" followed by no explanation or vague handwaving BS. At first I assumed you probably did know what technique the site was using. It took a few "educate yourself, sweatie"s for the penny to drop: he's full of shit, trying to talk his way out of it, and this was just the first "writing analysis" site he found on google.
 
I activated my autism to compare the contributions of "Amanda Ralph" to the Ralph Retort with Ghost Detect.

Trial 1: WASTE OF GAS by "Amanda Ralph" vs DREAM TURNS TO NIGHTMARE by Ethan Ralph
Test1.PNG

Trial 2: No Child Left Ungroomed by "Amanda Ralph" vs DREAM TURNS TO NIGHTMARE by Ethan Ralph
Test2.PNG

Trial 3, for completeness: Both of "Amanda Ralph's" articles
Test3.PNG

The third trial is redundant because this algorithm isn't pair-dependent (the face for WASTE OF GAS is the same for trial 1 and trial 3). Some statistical methods are pair-dependent in this way, so it's good to check.

What do my fellow autists think? I think the faces say Ethan Ralph is a tranny.
 
I don't see how comparing two small samples pairwise can form a conclusive statement.
If anything you should concatenate all known texts/tweets from Ethan and all from May and compare those results.
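A rough Python sketch of that pooling idea: join every known sample from one author into a single text, then compute the statistics once on the pooled text instead of on individual tweets. The `pool` and `profile` helpers are hypothetical names, and the two sample lists are invented placeholders, not real tweets.

```python
import re

def pool(samples):
    """Join samples into one text, closing each with a full stop if needed
    so that sentences do not run together across sample boundaries."""
    parts = []
    for s in samples:
        s = s.strip()
        if not s:
            continue
        if s[-1] not in ".!?":
            s += "."
        parts.append(s)
    return " ".join(parts)

def profile(text):
    """A few simple style statistics for a (pooled) text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("need at least one word and one sentence")
    return {
        "words": len(words),
        "words_per_sentence": len(words) / len(sentences),
        "mean_word_length": sum(len(w) for w in words) / len(words),
    }

# Invented placeholder samples for two hypothetical authors.
author_a = ["Short one", "Another short post here.", "Done!"]
author_b = ["This particular sample rambles on considerably longer than the others do."]

print(profile(pool(author_a)))
print(profile(pool(author_b)))
```

Pooling gives each author one larger sample, so the estimates are steadier, at the cost of no longer being able to say anything about any individual tweet.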
 
Read the OP, there are 3-way comparisons. Also, that's not what concatenate means.
 