Research in the Era of Bad Search Results - In search of the new "Google Fu"

Bongocat (kiwifarms.net), joined Jun 20, 2020
This is a branch off of this thread, but focused on solutions rather than the problem.

I think we can safely say that search (specifically Google) has gotten worse to an astonishing degree. I'm spending roughly ten times more effort scouring results than I used to for work-related searches. This thread is about asking the question of what we can do about it as users.

There used to be a thing called 'Google fu', which is a general term for using Google effectively, but in practice (in ye olde days of 2014) it meant taking your search term and changing it in subtle ways to get the results you want. That no longer seems to apply today, as everything gets filtered down to the same normie-centric results, even with drastic changes or complete replacements of words. For example, earlier today I was searching for how to disable scripts/macros in epub readers: "<program name> disabling macros", "<program name> disable javascript", "<program name> block scripts", "<program name> disable network access". I did this for several readers. EVERYTHING gave me the same irrelevant results about either DRM or generic pages that had literally nothing to do with anything.

Even the classic "quotes" method no longer works the way you'd expect it to. There are times I know for absolute certain that my search in quotes should return what I want, and Google knows about it, but 9/10 times I get "It looks like there aren't any great matches for your search". Quoting single words seems to be almost entirely ineffective now, as Google will often simply ignore the request and return nothing but normie shit about how to share photos with grandma on Facebook.

From what I can tell, Google has shifted the emphasis of their search algorithm off of literal words and onto synonyms of trending topics. In other words, it just assumes that you're a child who has no idea what you're talking about, and that you're probably trying to search for what everyone else is searching for.

As a side note: If you search for "has Google search gotten worse" on Google (I'm a genius, I know), you do get legitimate results but also a lot of shilling. They say "It's gotten more accurate for most people". You know what? I actually believe that. It's because most people do not search for anything except the most basic possible pop culture and commercial results. For professionals that need precision results, however, Google's usefulness has tanked to the point that it's likely causing vast economic damage by obfuscating information that professionals need for reference.

I've started using DuckDuckGo as a daily driver, and I can say that it and Google are about equal in usefulness now. They both shit the bed in different ways, but DDG's failures of shooting you off in a random direction are a lot less frustrating than Google's failures of sucking you into a black hole containing only WAP, Nike shoes and WSJ articles.
TL;DR of above, "Google Fu" does not appear to be a thing anymore, at least not in the way it used to work.

Stuff I've discovered that sometimes works:
  • Specifying Site: if you do not specify "site:something.com", 90+% of the time Google will redirect you to useless results from sites that are probably just paying to be in the top results or doing SEO abuse. The downside, of course, is that you have to already know which sites contain your answers, which often is not the case (especially for tech-related searches like the ones I do).

  • Asking things as questions: This is a complete inversion of how Google used to work and I cannot explain it, but often putting your search in the form of a question will yield better results than the classic keyword search. For instance, while I was writing this, I got the thought to try another variant of what I said in [complaining], "how do I turn off javascript on <program name>". That gave me far, far more relevant results, even though nobody actually phrased the question that way in the search results.
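For anyone who wants to script the keyword-style tricks above (quotes, minus-words, site:) instead of retyping them, here's a minimal sketch that just assembles the query string and the search URLs. The reader name and domain are placeholders, and the `tbs=li:1` parameter is my assumption about how Google's "Verbatim" tool shows up in the URL, not something confirmed in this thread.

```python
from urllib.parse import urlencode


def build_query(terms, phrases=(), exclude=(), site=None):
    """Assemble a classic keyword query from the operators discussed above."""
    parts = list(terms)
    parts += [f'"{p}"' for p in phrases]  # quoted exact phrases
    parts += [f"-{w}" for w in exclude]  # minus operator to drop a word
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    return " ".join(parts)


def search_urls(query, verbatim=False):
    """Return Google and DuckDuckGo search URLs for the same query."""
    google_params = {"q": query}
    if verbatim:
        # Assumption: "tbs=li:1" is the URL form of Google's "Verbatim" tool.
        google_params["tbs"] = "li:1"
    return {
        "google": "https://www.google.com/search?" + urlencode(google_params),
        "duckduckgo": "https://duckduckgo.com/?" + urlencode({"q": query}),
    }


# "examplereader" and "example.com" are hypothetical placeholders.
q = build_query(["examplereader"], phrases=["disable javascript"],
                exclude=["drm"], site="example.com")
print(q)  # examplereader "disable javascript" -drm site:example.com
for engine, url in search_urls(q, verbatim=True).items():
    print(engine, url)
```

Whether the engine actually honours any of these operators is exactly what's in question here; the script only saves typing.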
While this post is mostly related to Google, I'd be happy with methods applying to any site. Do you have any methods you use to torture data out of search engines?
 
Even the classic "quotes" method no longer works the way you'd expect it to. There are times I know for absolute certain that my search in quotes should return what I want, and Google knows about it, but 9/10 times I get "It looks like there aren't any great matches for your search". Quoting single words seems to be almost entirely ineffective now, as Google will often simply ignore the request and return nothing but normie shit about how to share photos with grandma on Facebook.
Thanks for the confirmation at least. I've been suspecting for a while that boolean search is either nerfed or dead since it seems like no matter how many variations of -, +, NOT, AND, or quotes I try to use, searches still often return some results that completely ignore those parameters, even though Google itself claims that these methods still work. And that stuff was always my go-to for getting decent results.
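If the engine won't enforce quotes and minus-words for you, one workaround is to enforce them yourself on whatever comes back. A rough sketch, assuming you already have results as dicts with `title` and `snippet` keys (a hypothetical shape, e.g. from a scraper or a Searx JSON dump). It uses plain substring matching, so an excluded word also matches inside longer words.

```python
def enforce_operators(results, required=(), excluded=()):
    """Client-side second pass: keep only results whose title/snippet contain
    every required phrase and none of the excluded words."""
    kept = []
    for r in results:
        text = f"{r.get('title', '')} {r.get('snippet', '')}".lower()
        has_required = all(p.lower() in text for p in required)
        has_excluded = any(w.lower() in text for w in excluded)
        if has_required and not has_excluded:
            kept.append(r)
    return kept


results = [
    {"title": "Disable JavaScript in ExampleReader", "snippet": "Settings > Advanced"},
    {"title": "ExampleReader DRM removal", "snippet": "you can't disable javascript here"},
    {"title": "How to share photos", "snippet": "with grandma on facebook"},
]
# Only the first result survives: the second mentions DRM, the third lacks the phrase.
print(enforce_operators(results, required=["disable javascript"], excluded=["drm"]))
```

It can only throw out junk, not find what the engine never returned, but it at least stops ignored operators from wasting your time.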

Interestingly, I've found that sometimes messing with different SafeSearch options will net you different/better results even if you're not looking for something NSFW, because what search engines consider sexually explicit seems to be as ambiguous as everything else they do these days.

Asking things as questions: This is a complete inversion of how Google used to work and I cannot explain it, but often putting your search in the form of a question will yield better results than the classic keyword search. For instance, while I was writing this, I got the thought to try another variant of what I said in [complaining], "how do I turn off javascript on <program name>". That gave me far, far more relevant results, even though nobody actually phrased the question that way in the search results.
My god, it's 2021 and Google has become AskJeeves.
 
Duckduckgo >>>>> Google
DDG is my preferred engine now over Google, but let's not pretend that DDG doesn't have serious problems. The two biggest weaknesses are that its image search is embarrassingly bad and it has trouble distinguishing emphasis on words vs phrases. This is why it's harder to find memes/funny images with DDG.
 
tbh, if you're on clearnet, you should not have ANY expectation of privacy without additional measures. DDG has a massive privacy advantage simply by working with VPNs and Tor. Google actively makes it difficult to use their services on VPNs, and nearly impossible to use on Tor.
Fair enough, but I'd honestly just use Google on the clearnet. DDG is about as good as Bing in 2012 and sells nearly as much data, so I don't see the point.
 
The question thing is because most people put in questions. Back when this was less effective, getting someone to just use keywords was pretty much impossible. 100% impossible if the person used speech to text.
 
Fair enough, but I'd honestly just use Google on the clearnet. DDG is about as good as Bing in 2012 and sells nearly as much data, so I don't see the point.
I disagree. I've been using Google and DDG side by side for months now. DDG has made great strides in the past few years, likely due to its increased traffic training its backend. For the tech-related searches that I do, DDG has actually become superior because it's more permissive about allowing 3rd-party forums outside of Stack Exchange, which contain a wealth of information.

As I mentioned in my post, DDG still shits the bed sometimes, but just in different ways than Google does now. So it's optimal for me to use DDG as a first option and Google as a fallback if that fails. That brings a higher success/failure ratio than using Google first and DDG second.
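The "DDG first, Google as a fallback" workflow is basically a fallback chain. A minimal sketch, where each engine is any function that takes a query and returns a list of results; the engine functions and the "good enough" check are hypothetical stand-ins you'd supply yourself, not real APIs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Result = dict
SearchFn = Callable[[str], List[Result]]


def search_with_fallback(
    query: str,
    engines: Sequence[Tuple[str, SearchFn]],
    good_enough: Callable[[List[Result]], bool],
) -> Tuple[Optional[str], List[Result]]:
    """Try each engine in order; return the first one whose results pass the check."""
    for name, search in engines:
        results = search(query)
        if good_enough(results):
            return name, results
    return None, []  # every engine shat the bed


# Hypothetical stand-ins for real search calls.
def fake_ddg(query):
    return []  # pretend DDG found nothing useful


def fake_google(query):
    return [{"url": "https://example.com/answer"}]


print(search_with_fallback("epub disable javascript",
                           [("duckduckgo", fake_ddg), ("google", fake_google)],
                           good_enough=lambda rs: len(rs) > 0))
```

Putting the engine with the better hit rate first is what gives the better success/failure ratio; the order of the list is the whole strategy.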

EDIT: actually... I know a lot of techies moved over to DDG in the past few years. I wonder if that's trained it with a bias towards quality tech results, so that's why it's so much better for me.
 
That dude's website gets quoted so much around these parts, and when you browse all of what he puts online he just sounds like some tinfoil-hat-wearing schizo IMO.

In the vein of asking questions, I noticed throwing more related words at the search often helps, but Google really has gotten astonishingly bad. DuckDuckGo and Yandex (also sometimes Bing) have more predictable results. Yandex does absolutely no screening of its results whatsoever, though, so better not follow the links with JavaScript enabled; it's also very Russia-centric. Yandex is also incredibly good at reverse image search. You can even send it a still from some YouTube video and it'll find you the video. Same with pictures of IRL areas, games, etc. I don't know what happened to Google's reverse image search, but a 97-year-old with cataracts would do a better job.

Not quite search related but also a good non-google replacement = deepl.com. It does incredibly good translations.
 
That dude's website gets quoted so much around these parts, and when you browse all of what he puts online he just sounds like some tinfoil-hat-wearing schizo IMO.
In the world of tech privacy, the schizos are right more often than not. I thought the "Windows keylogger" meme was hyperbole until I saw this. With default settings, literally every keypress is being sent off to a server somewhere. It's pretty unbelievable, but there it is. That said, I don't think the 'potential spyware' of DDG limits its usefulness or safety if you take general clearnet precautions, depending on what you're doing.

In the vein of asking questions, I noticed throwing more related words at the search often helps, but Google really has gotten astonishingly bad. DuckDuckGo and Yandex (also sometimes Bing) have more predictable results. Yandex does absolutely no screening of its results whatsoever, though, so better not follow the links with JavaScript enabled; it's also very Russia-centric. Yandex is also incredibly good at reverse image search. You can even send it a still from some YouTube video and it'll find you the video. Same with pictures of IRL areas, games, etc. I don't know what happened to Google's reverse image search, but a 97-year-old with cataracts would do a better job.

I tried Yandex as a daily driver for a few days and absolutely could not tolerate it. The random Russian results cluttering the page were bad enough, but the malicious links were a dealbreaker for me.

I've also tried out Searx, but I can never get any results from Google no matter what server I'm on. I assume Google is simply bot-blocking the results, but then Searx just becomes a slower version of DDG with 10% downtime.
 
I don't know if there's a way to do this, but I doubt it. Writing it down while I'm thinking about it...

The problem with Google doesn't seem to be so much with content indexing as it is with site ranking. Sites with valuable content are buried under normie-favored sites that have nothing to do with what you're searching for. Just daydreaming about pie-in-the-sky solutions, I wonder if you could manually target lists of sites in 'levels'. What I mean by that is, let's say a default Google search contains all tech sites that are returned for tech topics. The top results are predictably dominated by Stack Overflow.

Let's say Stack Overflow doesn't have what you need. You 'go down a level', which filters out the most popular sites. Maybe that puts forums at the top of the list.

Those don't contain what you need? Go down one more step and you're at more obscure forums and mailing lists.

Those don't have it? Go down one more step and you're at man page archives. etc.

Even something like filtering by Alexa ranking might work as a rough approximation of this. Of course the real solution would just be going back to the old f**king algorithm, but I think we're past that.
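The 'levels' idea can be approximated locally if you have some domain popularity list to rank against (a traffic ranking like the Alexa one above, or a list you compile yourself). A sketch under those assumptions: the ranking dict, the URLs, and the step size of 1000 are all made up for illustration. Each level hides another `step` of the most popular domains, and unranked domains count as obscure and always stay.

```python
from urllib.parse import urlparse


def results_at_level(results, popularity_rank, level, step=1000):
    """Drop results from popular domains; each level hides `step` more of them.

    popularity_rank maps domain -> rank (1 = most popular). Domains missing
    from the map are treated as obscure and are always kept.
    """
    cutoff = level * step
    kept = []
    for r in results:
        domain = urlparse(r["url"]).hostname or ""
        if domain.startswith("www."):
            domain = domain[4:]  # treat www.foo.com and foo.com the same
        rank = popularity_rank.get(domain)
        if rank is None or rank > cutoff:
            kept.append(r)
    return kept


# Hypothetical ranks and URLs.
rank = {"stackoverflow.com": 50, "forum.example.org": 5000}
results = [
    {"url": "https://stackoverflow.com/q/1"},
    {"url": "https://forum.example.org/t/2"},
    {"url": "https://lists.example.net/archive/3"},
]
for level in (0, 1, 5):
    print(level, [r["url"] for r in results_at_level(results, rank, level)])
```

Level 0 is the normal result list, level 1 drops Stack Overflow, and by level 5 only the unranked mailing-list archive is left, which is roughly the "forums, then obscure forums and mailing lists, then man pages" progression described above.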
 
Sites with valuable content are buried under normie-favored sites that have nothing to do with what you're searching for.
Pinterest linking to the image you're looking for, but requiring you to register an account just to get the imgur URL. I think that's a good example of how they prioritize normie results.

Adding dates like years and months can be useful on Google. Instead of being served up old info from 2016, it might give you relevant results if you add "august 2020". The month and year don't have to be specific because Google is already pretty fuzzy with their searches; you just want it to stop giving you the 2016 results.
This works best when searching for news. The inner workings of their news search is an absolute mystery to me.
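If you do this a lot, it's trivial to automate. A tiny sketch that appends a lowercase month and year to the query, as in the "august 2020" example above. The month names are hard-coded so the output doesn't depend on the system locale; the function name is just illustrative.

```python
from datetime import date

MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


def with_date_hint(query: str, when: date) -> str:
    """Append a fuzzy date hint, e.g. 'foo' -> 'foo august 2020'."""
    return f"{query} {MONTHS[when.month - 1]} {when.year}"


print(with_date_hint("ebook reader update", date(2020, 8, 15)))
# ebook reader update august 2020
```

As the next post points out, how much this still helps seems to vary, so treat it as a nudge rather than a real date filter.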
 
I noticed my results got extremely poor after 2016 (wonder why?). It felt like Google was giving me the results they 'want' me to see rather than what I would get from straight data. A lot of 'breaking news' (propaganda) at or near the top if the search was even marginally related to something current-day political. Obscure, technical things have now become near impossible to find if you don't know exactly where they're located. You either get normie tier results or even worse, some of those awful bindi written 'tech help' shill sites.

DDG isn't a perfect solution, as many others have stated, but I don't know of a better solution at the moment. My biggest gripes outside the straight search: I wish there were better alternatives to Google Maps/Street View (Bing Maps has something similar, but it's MS) and especially Google Books. Archive.org has a garbage search engine that gives you a bunch of irrelevant nonsense. Google Books started to suck a few years ago, but there is still not a better alternative.

With regard to adding dates, this seems to work less than it used to. I was searching for some old news items and the results I was given were the current day sidebar nonsense, the very thing I was trying to NOT get.
 
Searx is basically a condom for web search, allowing you to search the big engines like Google, Bing, etc and get back direct links rather than the tracking links they serve up. You can even self host it, so throw it on a VPS or a home server connected to your VPN and you've effectively got anonymized search.
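If you self-host it, you can also hit your instance from scripts instead of the web UI. A sketch, assuming the instance listens on the default `http://localhost:8888` and has JSON output enabled (some setups turn it off in settings). The `q`, `format`, `engines`, `pageno` and `time_range` parameter names are my understanding of the Searx search API and should be checked against your version.

```python
import json
import urllib.request
from urllib.parse import urlencode


def searx_search_url(base_url, query, engines=("google", "bing", "duckduckgo"),
                     time_range=None, page=1):
    """Build a JSON search URL for a (self-hosted) Searx instance."""
    params = {"q": query, "format": "json",
              "engines": ",".join(engines), "pageno": page}
    if time_range:
        params["time_range"] = time_range  # e.g. "day", "month" or "year"
    return base_url.rstrip("/") + "/search?" + urlencode(params)


def fetch_results(url, timeout=10):
    """Fetch and decode the JSON response (needs a running instance)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


url = searx_search_url("http://localhost:8888/", "epub disable javascript")
print(url)
# data = fetch_results(url)  # uncomment once your instance is up
```

Routing that through your VPN is what gets you the anonymized part; the script itself doesn't do anything about privacy.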

I'd imagine something like this could be modified to filter out known spam domains, sites that invade your privacy with shit like tracking pixels, inconvenience-as-a-service type aggregators like Pinterest, etc. This could even possibly be done using a subscription list model a la AdBlock - precompiled community maintained lists you can add/remove/modify entries from for a tailored search experience.
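The subscription-list idea can be sketched in a few lines. This is not AdBlock filter syntax, just a made-up minimal format (one domain per line, `#` for comments); several lists can be merged with a set union. A domain on the list also blocks its subdomains, so `pinterest.com` catches `www.pinterest.com`.

```python
from urllib.parse import urlparse


def parse_blocklist(text):
    """One domain per line; '#' starts a comment. Hypothetical format."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def is_blocked(url, blocked):
    """True if the URL's host or any parent domain is on the blocklist."""
    labels = (urlparse(url).hostname or "").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def filter_results(results, blocked):
    return [r for r in results if not is_blocked(r["url"], blocked)]


community_list = """
# inconvenience-as-a-service
pinterest.com
spam.example   # hypothetical SEO farm
"""
blocked = parse_blocklist(community_list)
results = [{"url": "https://www.pinterest.com/pin/1"},
           {"url": "https://forum.example.org/t/2"}]
print(filter_results(results, blocked))  # only the forum link survives
```

Matching on whole domain labels (rather than plain substrings) is what keeps a block on `pinterest.com` from also nuking some unrelated `notpinterest.com`.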

I also came across YaCy recently, but I haven't vetted it enough to endorse it yet. A peer to peer search engine seems like a great place to start, if participation can be drummed up enough. I would say I'm pro-federation for something like this, but we've seen how that goes.
 
Searx is basically a condom for web search, allowing you to search the big engines like Google, Bing, etc and get back direct links rather than the tracking links they serve up. You can even self host it, so throw it on a VPS or a home server connected to your VPN and you've effectively got anonymized search.
Why does Google never appear in Searx results? I've used it many times and all of the results are from DDG (literally all of them).
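One way to check what's going on is to look at which engines actually contributed to a JSON response from the instance. A sketch, assuming each result carries an `engines` list (or a single `engine` name) and that failed engines are reported under `unresponsive_engines`; those field names are my assumption about the Searx JSON output, and the sample response below is invented for illustration.

```python
from collections import Counter


def engine_breakdown(response):
    """Count how many results each engine contributed."""
    counts = Counter()
    for result in response.get("results", []):
        for name in result.get("engines") or [result.get("engine")]:
            if name:
                counts[name] += 1
    return counts


def unresponsive(response):
    """Names of engines the instance reported as failing, if any."""
    names = []
    for entry in response.get("unresponsive_engines", []):
        # Entries may be [name, error] pairs or bare names, depending on version.
        names.append(entry[0] if isinstance(entry, (list, tuple)) else entry)
    return names


sample = {  # invented example response
    "results": [
        {"engine": "duckduckgo", "engines": ["duckduckgo"]},
        {"engine": "duckduckgo", "engines": ["duckduckgo", "bing"]},
    ],
    "unresponsive_engines": [["google", "CAPTCHA"]],
}
print(engine_breakdown(sample))  # Counter({'duckduckgo': 2, 'bing': 1})
print(unresponsive(sample))      # ['google']
```

If Google keeps showing up as unresponsive rather than just contributing zero results, that would fit the bot-blocking theory earlier in the thread.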
 