

There Has Been a Massive Leak of Google's Search Algorithm Documentation

By Dustin Rowles | Social Media | May 29, 2024 |



Folks who are not necessarily in the business of running a website or optimizing other websites for Google’s search engine may find this too boring and inside baseball, but it’s a big deal in certain circles of the Internet. An anonymous source has leaked thousands of search documents that provide information on Google’s secretive search algorithm.

It’s a big deal because there is an entire industry of professionals (many of whom are useful and some of whom are online grifters) devoted to optimizing websites so that they receive more traffic from Google. There are thousands of factors that go into the algorithm (apparently 14,000, in fact), but not only has Google been secretive about them, its carefully worded public statements have often contradicted information in the leaked documentation.

It’s important to note here that, though Google has not always been straightforward about its algorithm factors, they have always stressed the importance of writing good, original content and letting the rest take care of itself. For the most part, that’s exactly what we try to do while mostly ignoring SEO minutiae. Over on Uproxx, where I worked for a decade, there was a lot more focus on search engine optimization (so much so that it would often take 20 or 30 percent longer to complete an article), and now half of Uproxx seems to be written not for an actual audience but for search engines (they recently laid off most of their feature writers and literally brought in and his AI software to reposition the site. Seriously).

Most of what was revealed was honestly common sense, but some of the revelations are interesting, nonetheless. For instance, despite public assertions to the contrary, the specific author of a piece does factor into search ranking. This explains a lot. For about a decade, I probably wrote more articles about The Walking Dead than anyone else on the Internet, so when I would write about the series — either here or over on Uproxx — my posts would generally rank well on Google. In contrast, other writers on the same sites would not necessarily rank as well on the same subject. That makes sense, though: Someone with authority on particular subjects should be able to maintain that authority with search engines across other sites.

In fact, that’s one of the main takeaways from the leak: So-called E.E.A.T. (Expertise, Experience, Authoritativeness, and Trustworthiness) matters. Google also has specific systems in place to assess and score news content that falls under sensitive topics pertaining to a person’s health, financial stability, safety, or well-being (in other words, there’s a reason why WebMD and the Mayo Clinic rise to the top of most health-related topics. They’re trustworthy. is not.).

Other revelations: Despite their protestations, Google does assign a domain authority to every site; click data does factor in (in other words, if a certain site is frequently clicked for a specific search, the site will rise in the rankings); and there apparently is a so-called “sandbox” that limits visibility for new sites. Importantly, the documentation also says that certain sites that provide election information are whitelisted (or demoted) during election periods.

It’s also important to note here that, based on the information from the leak (and what site publishers have known for many, many years), it’s not easy to game Google, and tailoring articles and web pages specifically to Google can even backfire. SEO professionals can help, but they can only help so much.

What I can’t find in the documentation, however, is any explanation for why 70 percent of search queries now bring up random Reddit pages. I suspect it has something to do with the fact that Google paid Reddit $60 million to use its content to train Google’s AI.

Sources: iPullRank, SparkToro, and The Verge