I understand that the Flury-Riedwyl faces are wholly based on quantitative values, but without reducing them to a single distance value (I forget the fancy linear algebra term for this)
The general term's a "metric" but that's not very fancy, so... a Minkowski distance, and in practice usually the Euclidean distance, which is just Pythagoras' theorem extended to n dimensions.
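To make that concrete, here's a toy sketch in Python (my own illustration, nothing to do with whatever the site actually runs): reduce each text to a small vector of stats and measure the distance between the vectors.

```python
# Toy sketch: Minkowski distance between two feature vectors.
# p=2 is the ordinary Euclidean distance; p=1 is the "city block" distance.
def minkowski(a, b, p=2):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# e.g. two texts each reduced to [FK grade, fog index, mean sentence length]
print(minkowski([8.1, 10.2, 14.0], [12.7, 15.5, 22.3]))       # Euclidean
print(minkowski([8.1, 10.2, 14.0], [12.7, 15.5, 22.3], p=1))  # Manhattan
```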
My understanding is that GhostDetect would be a "presumptive" test: you observe something you already find suspect, then run the samples to confirm that there's something to your suspicion.
It can't be conclusive, but it can rule out vastly dissimilar samples (again, this is my understanding; I could be misinterpreting the intent/usage).
The site text doesn't like any talk of "proof" and describes it more as a "first step" to flagging up "sharp divergences" in an author's style that call for an explanation and further investigation:
Any tool like this cannot completely rule out false positives (mistakenly ascribing distinct authorship) or false negatives (mistakenly ascribing identical authorship). For instance, the same author may write in very different styles depending on genre and context. At the same time, distinct authors may write in a similar style. As such, similarities or differences in style exposed by this tool cannot prove identical or distinct authorship.
Applying GhostDetect® is, however, a crucial first step in uncovering ghostwriting. Sharp divergences in style, when detected by this tool and purported to come from the same author, call for, at a minimum, explanation.
Based on the information presented on the page, and the lack of any other explanation, here is my guess as to what's happening: it calculates the Flesch-Kincaid, Gunning fog, sentence length etc. statistics listed further down the page. Those statistics are what's being compared, and the faces are just a visualization tool to point out glaring differences in the paired numbers. Those stats are the quantitative data; nothing's being hidden.
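For anyone curious how little is involved, here's a rough Python sketch of the kind of stats I'm guessing it computes (not the site's actual code; the syllable counter is a crude vowel-group heuristic, good enough for illustration only):

```python
import re

def syllables(word):
    # crude heuristic: count groups of consecutive vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_stats(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syl = [syllables(w) for w in words]
    wps = len(words) / max(1, len(sentences))              # mean sentence length
    spw = sum(syl) / max(1, len(words))                    # mean syllables per word
    complex_frac = sum(s >= 3 for s in syl) / max(1, len(words))  # 3+ syllable words
    return {
        "sentence_length": wps,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_frac),
    }

print(text_stats("This is a short sample. It has two simple sentences."))
```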
The statistics in question aren't independent: they're all slightly different functions of things like word length, sentence length and syllable counts, so the feature space likely has a much lower effective dimensionality than the lists of numbers might suggest. And a tweet is far too small a sample for any of these to have much certainty.
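To show what I mean about the overlap: the FK grade and the fog index are both (roughly) linear in mean sentence length plus a word-complexity term, so over any plausible spread of texts they track each other almost perfectly. A quick simulation with made-up numbers (the complex-word fraction here is just a crude stand-in tied to syllable count):

```python
import numpy as np

rng = np.random.default_rng(0)
words_per_sentence = rng.uniform(8, 30, 1000)     # mean sentence length
syllables_per_word = rng.uniform(1.2, 2.0, 1000)  # word complexity
complex_word_frac = (syllables_per_word - 1.2) / 0.8 * 0.25  # crude stand-in

fk_grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
fog = 0.4 * (words_per_sentence + 100 * complex_word_frac)

print(np.corrcoef(fk_grade, fog)[0, 1])  # close to 1: the two indices are nearly redundant
```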
But would it work for its intended purpose? Probably, yes: it's on a site with a few other "educational" JS apps. If you feed in a high school student's new essay along with an older essay and find out that he's suddenly writing essays with an FK grade level 10 grades higher than before - well, that's a good clue that he's copied it from a book or an article, and it's time to look him in the eye and extract a confession.
It's a simple technique but - assuming that's what we have here - a useful one for longer texts in an educational setting. For comparing something the length of a tweet to determine authorship among adults of similar reading age, especially given that Twitter circles tend to imitate one another in style for likes? Potentially useless.
Legal disclaimer: the above is pure conjecture and I claim no inside knowledge of how GhostDetect® actually works. Don't sue me if it turns out your product really is an advanced sock detector that works on tweets.
It's just illustrating patterns in the data derived from the text. The issue with this schizo idea is that it's not exact enough to build an actual universal, identifiable face for an author. It might be able to if you had a novel's worth of text to sample, but that's highly unlikely (and even then you'd need a novel's worth of text from someone else to compare it to).
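Roughly, you can picture the face as something like this (a toy sketch; I have no idea what mapping the site actually uses): normalize each stat to a 0-1 range and let it drive one facial parameter.

```python
# Toy Chernoff/Flury-Riedwyl-style mapping (illustrative only, not the site's mapping)
def face_params(stats, lo, hi):
    norm = {k: (stats[k] - lo[k]) / (hi[k] - lo[k]) for k in stats}
    return {
        "face_width":  norm["sentence_length"],       # longer sentences -> wider face
        "mouth_curve": norm["flesch_kincaid_grade"],  # higher grade level -> bigger smile
        "eye_size":    norm["gunning_fog"],           # denser prose -> bigger eyes
    }

print(face_params(
    {"sentence_length": 18, "flesch_kincaid_grade": 9.5, "gunning_fog": 11.0},
    lo={"sentence_length": 5, "flesch_kincaid_grade": 2, "gunning_fog": 4},
    hi={"sentence_length": 35, "flesch_kincaid_grade": 18, "gunning_fog": 20},
))
```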
You have no idea how the site works because they don't give an explanation. If my guess at how it works is vaguely accurate - in short, it's simply comparing the statistics already listed further down the page - then the text samples we're dealing with are far too short. We're lacking the statistical power to get decent parameter estimates. This is just as invalidating for the comparisons you made in the OP, even if we choose to ignore the blatant problems with your subjective selection of which tweets to present.
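To put a number on the small-sample problem, here's a quick simulation (toy distributions I made up, not real tweet data): the spread of an FK-style estimate across samples of a couple of sentences versus a couple hundred.

```python
import numpy as np

rng = np.random.default_rng(1)

def fk_spread(n_sentences, trials=2000):
    # standard deviation of an FK-style estimate over many samples of n_sentences sentences
    est = []
    for _ in range(trials):
        wps = rng.poisson(15, n_sentences) + 1       # words per sentence
        spw = 1 + rng.gamma(2.0, 0.25, n_sentences)  # mean syllables per word, per sentence
        est.append(0.39 * wps.mean() + 11.8 * spw.mean() - 15.59)
    return np.std(est)

print(fk_spread(2))    # tweet-sized: the estimate jumps around by whole grade levels
print(fk_spread(200))  # essay-sized: far tighter
```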
I believe this guy is attempting to be clever and has prepared some kind of Sargon-esque gotcha, but I'm not biting.
You already did bite. A bullshitter would duck the test. Someone sincerely mistaken would at least take it and then try to explain away the results. Other than that, there's no gotcha; I honestly expected you to answer. I'll post the answers later.
You're on the right track though, and get what this guy clearly doesn't. We are comparing the similarities and lack of similarities between May's and Ralph's writing, and it's conclusive because, as demonstrated in the OP, the differences are massive. Ralph's writing is so distinctly different from May's that it's plain to see in the comparison faces (which just means there's next to no similarity in the raw data).
A quick tip: I might have dropped this earlier if it wasn't for the sus "oh you don't understand this, I do" followed by no explanation or vague handwaving BS. At first I assumed you probably did know what technique the site was using. It took a few "educate yourself, sweatie"s for the penny to drop: he's full of shit, trying to talk his way out of it, and this was just the first "writing analysis" site he found on Google.