What is Google’s Penguin Link Algorithm? [RESEARCH]
Google representatives have said little about how the Penguin algorithm works, which means Penguin remains more or less a mystery to the search marketing community. But there is enough evidence available to explain what Penguin is and how it works.
This article’s goal is to evaluate the available data and begin to understand the Penguin algorithm. Additionally, a Google patent published in late 2015 was briefly discussed in the SEO community and then forgotten. That patent might be the key to understanding Penguin (I’ll provide more information on it later).
Some people might question the necessity of this exercise. But it’s our job as SEOs to at least understand how search engines work; that is what our industry has been doing since day one. No part of Google has gone unexamined, so why stop at Penguin? There is no point working in the dark. Let’s shed some light on this bird!
What Penguin Is… Not
Is Penguin a Trust Algorithm?
To know what something is, it helps to know what it is not. Claims have surfaced that Penguin is a “trust” algorithm. Is that accurate?
The truth about trust algorithms is that they are biased toward larger websites. For this reason, the original TrustRank research paper was superseded by another research paper, Topical TrustRank. Topical TrustRank was 19% to 43.1% better at spam detection than simple TrustRank. But the authors of that study acknowledged that the algorithm had flaws and that further research was needed.
Statements from Google employees in early 2007 made it clear that Google did not use TrustRank. Additionally, Google emphasized in 2011 that trust itself was not a ranking factor, but rather that “trust” was a catch-all term the company used for a variety of signals. Taken together, these statements indicate that Google does not use the TrustRank algorithm.
There is no patent application, no Google blog post, no tweet or Facebook post suggesting that Penguin is some sort of trust-ranking algorithm. I know of no evidence that Penguin is a trust-based algorithm. So it is fair to conclude that Penguin is not a trust-ranking algorithm.
Does Penguin Use Machine Learning?
Gary Illyes confirmed in October 2016 that Penguin does not use machine learning. This is an important detail.
To put it simply, machine learning is the process of training a computer to recognize an object by feeding it signals that indicate what that object looks like. Once trained, the computer can identify the object on its own, even in hypothetical scenarios it has never seen before. Whether an example is obviously black-and-white or a borderline case, the machine uses those signals to decide whether something fits the pattern or not.
In machine learning, these indicator signals are called classifiers. The SEO industry calls them quality signals, and typical SEO work tries to generate “quality signals.” A machine learning algorithm uses classifiers to understand whether a web page meets the definition of a normal web page or the definition of spam.
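To make the distinction concrete, here is a minimal sketch of what a machine-learned spam classifier could look like, in plain Python with invented data and an invented signal. Per Google’s statements, Penguin does not work this way; this only illustrates the concept Penguin avoids.

```python
# Minimal sketch of a machine-learned spam "classifier" (hypothetical data).
# A model is "trained" on labeled examples of a single quality signal, then
# used to classify pages it has never seen. Per Google, Penguin does NOT
# work this way; this only illustrates the concept.

# Each example: (fraction of inbound links with exact-match anchors, label)
training_data = [
    (0.05, "normal"), (0.10, "normal"), (0.08, "normal"),
    (0.70, "spam"),   (0.85, "spam"),   (0.60, "spam"),
]

def train_threshold(examples):
    """Learn the midpoint between the average normal and average spam value."""
    normal = [x for x, label in examples if label == "normal"]
    spam = [x for x, label in examples if label == "spam"]
    return (sum(normal) / len(normal) + sum(spam) / len(spam)) / 2

def classify(value, threshold):
    return "spam" if value > threshold else "normal"

threshold = train_threshold(training_data)
print(classify(0.75, threshold))  # spam
print(classify(0.07, threshold))  # normal
```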
Does Penguin Use Statistical Analysis?
The possibility that statistical analysis plays a role in Penguin is often raised. Statistical analysis identifies the variables that are common on normal websites versus spam websites. The variables range from anchor text ratios to the percentage of links pointing at the home page versus the rest of the site. When the entire web is scanned this way, unusual (spam) sites stand out from the norm. These sites are called outliers.
The use of statistical analysis as a spam-fighting technique was confirmed at PubCon New Orleans 2005, when Google engineers discussed it during a keynote presentation. So we know that statistical analysis has been part of Google’s anti-spam efforts since at least 2005.
One of the most famous research papers on the topic was released by Microsoft in 2004. Its title is Spam, Damn Spam, and Statistics.
Statistical analysis has shown that spam websites follow certain patterns when building links, and those patterns betray their activities. But Penguin may well do much more than detect these statistical footprints.
The Significance of Penguin Not Being a Machine Learning System
The importance of this knowledge is that we can now rule something out: Penguin does not identify spam links by learning quality signals, also called classifiers. Thus, we can be reasonably sure that Penguin is not trained on statistical spam signals.
Examples of link-based spam features (a sketch of how such features might be flagged follows the list):
- The percentage of inbound links containing anchor text
- The ratio of links pointing at the home page versus internal pages
- The ratio of inbound to outbound links
- Edge reciprocity (spam sites with higher PageRank show fewer reciprocal link patterns)
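To illustrate how statistical outlier detection on features like these could work, here is a minimal sketch in Python. The feature, the site names, and the z-score cutoff are all assumptions for illustration, not Google’s actual signals or thresholds.

```python
# Minimal sketch: flagging statistical outliers on a link-based feature.
from statistics import mean, stdev

# Hypothetical per-site measurements of one feature:
# fraction of inbound links that use exact-match anchor text.
sites = {
    "site-a.example": 0.04, "site-b.example": 0.06, "site-c.example": 0.05,
    "site-d.example": 0.07, "spammy.example": 0.81,
}

values = list(sites.values())
mu, sigma = mean(values), stdev(values)

for site, value in sites.items():
    z = (value - mu) / sigma  # how many standard deviations from the norm
    if abs(z) > 1.5:          # arbitrary cutoff for this sketch
        print(f"{site}: z={z:.2f} -> outlier (possible spam footprint)")
```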
Knowing that Penguin is not machine learning, which often builds on exactly this kind of statistical analysis, sharpens our understanding of what Penguin must be instead.
What the Penguin Algorithm May Be…
Research in information retrieval has taken many directions, but among the work on link analysis, one type of algorithm stands out as a new direction in link spam detection. This new type of algorithm is called a link ranking algorithm, or a link distance ranking algorithm. It is more convenient to call it a link ranking algorithm, and I’ll go into further depth about it below.
Instead of ranking websites, this new algorithm ranks links, which makes it different from all previous link algorithms. A Google patent application filed in 2006 and published in 2015 describes the algorithm as follows:
“…a system that ranks pages on the web based on distances between the pages, wherein the pages are interconnected with links to form a link graph. More specifically, a set of high-quality seed pages are chosen as references for ranking the pages in the link graph, and shortest distances from the set of seed pages to each given page in the link graph are computed…”
In plain English, this means that Google chooses high-quality websites as the starting point for creating a map of the web (called a link graph). The algorithm measures the distance between those seed pages and every other website, and ranks websites accordingly: the shorter the distance between a given website and the seed pages, the more authoritative the website.
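Here is a minimal sketch of that idea: a breadth-first search from an assumed seed set assigns every reachable page its shortest link distance, and shorter distance means more authority. The graph and site names are invented; this is a conceptual illustration, not Google’s implementation.

```python
# Minimal sketch of the "distance from seeds" idea (assumed graph).
from collections import deque

# Hypothetical directed link graph: page -> pages it links to.
links = {
    "seed.example":  ["news.example", "blog.example"],
    "news.example":  ["blog.example", "shop.example"],
    "blog.example":  ["shop.example"],
    "shop.example":  ["spam.example"],
    "spam.example":  ["spam2.example"],
    "spam2.example": [],
}

def distances_from_seeds(graph, seeds):
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in dist:           # first visit = shortest distance
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

print(distances_from_seeds(links, ["seed.example"]))
# {'seed.example': 0, 'news.example': 1, 'blog.example': 1,
#  'shop.example': 2, 'spam.example': 3, 'spam2.example': 4}
```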
This is Not a Trust Algorithm
The patent makes no claim of being a trust algorithm. The word “trusted” appears six times, but only in the context of describing the quality of the seed pages, not in describing the algorithm itself. The words “distance” and “distances” appear 69 times in the patent. This is important, because “distance” is the word that actually describes this algorithm.
If this patent is the definition of Penguin, then calling Penguin a trust algorithm is incorrect. Penguin is better described as a link ranking algorithm: links with shorter distances from the seed set rank higher than links with longer distances. This distance calculation matters because closeness to the seed set, not trust, is what makes a link a high-quality link. There is no trust score, only distance. That is why it can be called a link distance algorithm, or a link ranking algorithm.
How Are Link Distances Calculated?
The patent addresses the problem that calculating distance ranking scores for an entire link graph is computationally intractable. Google’s patent states:
“In general, it is desirable to use a large number of seed pages to accommodate the different languages and a wide range of fields which are contained in the fast growing web contents. Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases, thereby limiting the number of seed pages that can be used in practice.”
In plain terms, the patent acknowledges the problem of computing link-distance scores across an entire link graph, and it suggests keeping the seed set small but spreading it across carefully chosen topics. This keeps the ranking computations manageable (and also addresses the problem of relying only on large, important websites). Here’s what Google says:
“…as the number of seed pages increases, the computational complexity increases, thereby limiting the number of seed pages that can be used in practice. What is needed is a way to rank pages using a large, diverse seed set without the practical problems described above.”
What does Google mean by a diverse set of seed pages? This diversity has previously been described as linking out to a wide range of websites, with the Google Directory (DMOZ) and The New York Times cited as examples. The patent supports this need:
“…it is important to have a large, diverse seed set, including as many different types of seeds as possible.”
There are other link ranking and click-distance ranking algorithms that address diversity in the context of specific topics. This is a common strategy for improving accuracy.
Distance Ranking Explained
The goal of this algorithm is to create a smaller link graph that filters out websites manipulating links. Here’s how the patent says it is achieved:
“The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances.”
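The patent does not disclose how link lengths are actually derived from link and page properties, so here is a hedged sketch of those three steps with an assumed length formula: a link from a page with many outbound links is treated as longer, and so carries less weight. Everything here is invented data.

```python
# Minimal sketch of the patent's three steps, with an ASSUMED length formula.
import heapq

# Hypothetical link graph: page -> list of pages it links to.
links = {
    "seed.example": ["a.example", "b.example"],
    "a.example":    ["b.example"],
    "b.example":    ["c.example"] + [f"paid{i}.example" for i in range(20)],
    "c.example":    [],
}
links.update({f"paid{i}.example": [] for i in range(20)})

def link_length(source, graph):
    # Step 1 (assumed): length grows with the source page's outbound count.
    return 1.0 + len(graph[source]) / 10.0

def shortest_distances(graph, seeds):
    # Step 2: Dijkstra from the seed set over weighted link lengths.
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue
        for target in graph.get(page, []):
            nd = d + link_length(page, graph)
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

dist = shortest_distances(links, ["seed.example"])
# Step 3 (assumed form): smaller distance = higher ranking score.
scores = {page: 1.0 / (1.0 + d) for page, d in dist.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1])[:4])
```

Note how the page selling links (b.example, with 21 outbound links) passes on much longer distances, so everything it links to scores poorly.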
Penguin in Plain English
The system generates a score based on the shortest distance between the seed set pages and each non-seed page, and that score is used to rank the pages. So it works as an overlay on the PageRank score, one that allows manipulated links to be devalued, based on the idea that the connections between manipulated links and the trusted seed set are remote.
Good websites do not link to bad websites, but bad websites do link to good websites. The seed distance algorithm leverages the linking tendencies of good websites, while the linking properties of bad websites separate them out and confine them to their own (spam) region of the graph.
Link Direction and Spam Detection
A remarkable statement from a 2007 study (A Large-Scale Study of Link Spam Detection by Graph Algorithms) found that link direction is a good indicator of spam:
“…when detecting link spam, the direction of the links is important, since spam sites often link to legitimate sites while good sites rarely link to spam sites…”
The point of this remark is that the direction of links matters, and it shows how Penguin can work: the algorithm can simply exclude spam-to-good links from the reduced link graph, so the net effect is that they don’t harm a good website. This observation is consistent with Google’s claim that low-quality links do not harm a non-spam website; such links simply have no effect on a normal website.
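A small sketch of that net effect, with invented data: if inbound links from pages outside the reduced link graph are simply not counted, spam links neither help nor hurt.

```python
# Minimal sketch (assumed data): links from pages outside the reduced link
# graph are ignored when counting a page's inbound links.
reduced_graph = {"seed.example", "news.example", "blog.example"}

inbound_links = {
    "blog.example": ["seed.example", "news.example",
                     "spam1.example", "spam2.example"],
}

def counted_links(page):
    # Only links from pages inside the reduced graph contribute.
    return [src for src in inbound_links.get(page, []) if src in reduced_graph]

print(counted_links("blog.example"))  # ['seed.example', 'news.example']
```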
So What’s the Key Takeaway?
This changes the value of the disavow file, the list that site owners upload to Google to flag low-quality links. Google staff have noted that disavows are no longer necessary because of Penguin, since low-quality links are no longer the kind of problem Penguin punishes sites for.
Disavowals, Penguin, and You
Here is a view on disavow reports courtesy of Jeff Coyle, co-founder and CRO of MarketMuse, Inc. Jeff has had a long and distinguished career in search marketing, particularly in B2B lead generation. His conclusions are as follows:
Low-quality link flow has little impact on a large website. But on a website that struggles to gain any form of off-page authority or power, it can be a major blow when a series of unfortunate links comes into play.
Then I looked to the UK and to Jason Duke, who has decades of experience in competitive search marketing. Here are his thoughts on disavow reports as they relate to the Penguin algorithm:
It is normal for links to be of poor quality. The web is unregulated: sites can do whatever they want and link to you, and some of those links will be bad ones you would never ask for.

It is important to disavow historical actions by you, your predecessors, or any other party that has owned your website. But looking at the Internet as a whole, it isn’t necessary: low-quality links are ignored because they are the norm.
Both opinions (and my own) agree with what Gary Illyes has said about disavows: they are not required in the Penguin context, but they can be useful for making it clear which links are out of context or of poor quality, and which links you are responsible for.
Are You In or Are You Out?
Under this algorithm, a page has no chance of ranking for important keyword phrases unless it is connected to the seed set rather than to the spam clusters. The patent describes the algorithm’s resilience against link spam techniques:
“One possible variation of PageRank that would reduce the effect of these techniques is to select a few ‘trusted’ pages (also referred to as the seed pages) and discover other pages which are likely to be good by following the links from the trusted pages.”
Note that this is different from the old Yahoo TrustRank algorithm, which was biased toward larger sites because its seed set was not diverse. Subsequent research showed that diverse seed sets organized by topic were more accurate.
Side Note
Not all trust algorithms are equal. Majestic’s Topical Trust Flow metric is an example of a valid trust metric; the reason for its accuracy is that it uses diverse seeds. Topical Trust Flow is a useful tool for assessing the caliber of a web page or website when researching links.
Reduced Link Graph
As I understand it, this Google patent calculates the distance between a set of trusted seed pages and the rest of the web, and produces a trust/distance score that is then used as an overlay on PageRank. The distance scores act almost like a filter, removing less reputable websites. The result is called a reduced link graph, and it is very important. Let’s take a closer look at what the reduced link graph means for your search engine marketing strategy.
TAKEAWAY 1: Websites whose inbound and outbound link relationships tie them to pages outside of the reduced link graph are never included in it, and thus never appear in the top ten ranking positions. Spam links offer no benefit.
TAKEAWAY 2: Since this algorithm prevents spam links from having any impact (positive or negative), high-quality websites are not affected by spam links. With this algorithm, either a link contributes to a website’s ranking or it does not.
TAKEAWAY 3: The dual effect of identifying and blocking spam sites is embodied in the concept of the reduced link graph.
Penguin’s goal is not to put spam labels on spam sites and trust labels on normal sites. Its goal is the reduced link graph itself: a less connected map of the web that filters out websites trying to influence the algorithm.
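Putting it together, here is a minimal sketch of what a reduced link graph could look like: every page whose seed distance is unknown or above an arbitrary cutoff is dropped, and ranking is computed only over what survives. The data and cutoff are assumptions for illustration.

```python
# Minimal sketch (assumed data): a "reduced link graph" is what remains
# after dropping every page whose seed distance is unknown or too large.
full_graph = {
    "seed.example": ["news.example"],
    "news.example": ["blog.example"],
    "blog.example": [],
    "spamhub.example": ["seed.example", "spam1.example"],  # spam links OUT to good sites
    "spam1.example": ["spamhub.example"],
}

# Distances as computed earlier by BFS/Dijkstra from the seed set; spam
# pages are unreachable because good pages never link to them.
seed_distance = {"seed.example": 0, "news.example": 1, "blog.example": 2}

MAX_DISTANCE = 3  # arbitrary cutoff for this sketch

reduced_graph = {
    page: [t for t in targets if seed_distance.get(t, float("inf")) <= MAX_DISTANCE]
    for page, targets in full_graph.items()
    if seed_distance.get(page, float("inf")) <= MAX_DISTANCE
}
print(reduced_graph)
# {'seed.example': ['news.example'], 'news.example': ['blog.example'],
#  'blog.example': []}
```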