Yandex scrapes Google and other SEO learnings from the source ... - Search Engine Land

Last updated Monday, January 30, 2023 16:05 ET, Source: NewsService

Yandex isn't Google, but there is a lot SEOs can learn about how a modern search engine is built from reviewing this codebase.

“Fragments” of Yandex’s codebase leaked online last week. Much like Google, Yandex is a platform with many aspects such as email, maps, a taxi service, etc. The code leak featured chunks of all of it.

According to the documentation therein, Yandex’s codebase was folded into one large repository called Arcadia in 2013. The leaked codebase is a subset of all projects in Arcadia, and several components related to the search engine appear in the “Kernel,” “Library,” “Robot,” “Search,” and “ExtSearch” archives.

The move is wholly unprecedented. Not since the AOL search query data of 2006 has something so material related to a web search engine entered the public domain.

Although we are missing the data and many files that are referenced, this is the first tangible look at how a modern search engine works at the code level.

Personally, I can’t get over how fantastic the timing is: I get to see the actual code just as I finish my book, “The Science of SEO,” in which I discuss information retrieval, how modern search engines actually work, and how to build a simple one yourself.

In any event, I’ve been parsing through the code since last Thursday, and any engineer will tell you that is not enough time to understand how everything works. So I suspect there will be several more posts as I keep tinkering.

Before we jump in, I want to...

Read Full Story: https://news.google.com/__i/rss/rd/articles/CBMiOWh0dHBzOi8vc2VhcmNoZW5naW5lbGFuZC5jb20veWFuZGV4LWxlYWstbGVhcm5pbmdzLTM5MjM5M9IBAA?oc=5
