SEO Doctor

How Tokopedia monitors its SEO

Aatif Bandey
Tokopedia Engineering
5 min read · Jan 14, 2021


SEO is the backbone of any website. It makes a web page more visible to search engines, which means more traffic and more opportunities to convert prospects into customers.
At Tokopedia, our frontend engineering team has built multiple in-house tools to cater to our custom requirements.

In this blog, I will share how we created one such tool that is helping Tokopedia boost SEO performance. We call it SEO Doctor: a tool that monitors the SEO health of our pages.

Elements of SEO Doctor

1. Meta Report

Tokopedia has multiple page types, such as home, category, and find. With more than 3,000 categories, every page has different metadata, and it is tough to ensure that every page carries the required and correct meta information.

Meta tags

The meta report is one of the features of SEO Doctor. We generate it via a cron job that runs every day at a specified time and checks whether the required meta tags, such as title, description, and keywords, are present on the web page.
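The per-page check can be sketched as a function that scans the crawled HTML for the required tags. This is a minimal illustration, not Tokopedia's actual implementation; the tag list and regex-based matching are assumptions.

```javascript
// Required meta tags to look for (illustrative list).
const REQUIRED_META = ['description', 'keywords'];

// Given a page's HTML, report which required tags are missing.
function checkMetaTags(html) {
  const missing = [];
  if (!/<title>[^<]+<\/title>/i.test(html)) missing.push('title');
  for (const name of REQUIRED_META) {
    // Match a <meta name="..."> tag with a non-empty content attribute.
    const re = new RegExp(`<meta[^>]*name=["']${name}["'][^>]*content=["'][^"']`, 'i');
    if (!re.test(html)) missing.push(name);
  }
  return { healthy: missing.length === 0, missing };
}
```

A page with a title and description but no keywords tag would come back as `{ healthy: false, missing: ['keywords'] }`, which is what flips the report card from green to red.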

Meta report

A green report card shows that all tags are available on the mobile and desktop pages. If any tag is missing, the card turns red.

2. SSR report

To get better results on search engines, the content of the page should be readable by web crawlers. Crawlers like Googlebot scan your HTML and index your page in their search results accordingly.

At Tokopedia, we have Server-Side Rendered (SSR) pages built with React, but SSR can break due to a bad deployment. When that happens, bots can't read the content of your React app, and you may lose traffic to those pages.

SEO Doctor keeps a check on SSR for the desktop and mobile pages. The tool publishes the SSR report every 30 minutes by crawling the Tokopedia mobile and desktop pages.
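One crude way to detect broken SSR is to fetch the raw HTML without executing any JavaScript and check whether the React root element already contains content. This is a simplified sketch under assumptions (the root element id and the emptiness heuristic are illustrative; the real checks are more involved):

```javascript
// Returns true when the raw HTML (fetched without running JS) already
// contains rendered content inside the root element. An empty root
// suggests SSR is broken and the page only renders client-side.
function looksServerRendered(rawHtml, rootId = 'root') {
  const re = new RegExp(`<div[^>]*id=["']${rootId}["'][^>]*>([\\s\\S]*?)</div>`, 'i');
  const match = rawHtml.match(re);
  return Boolean(match && match[1].trim().length > 0);
}
```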

SSR report

A cron job generates the report and saves it in the database, and a Slack alert is sent for any issues found while crawling the page.

3. Rank Tracker

Keywords define what your content is all about, and how well your page performs in search depends directly on the keywords on your site.

If you have an online shop selling clothes, your keywords would be phrases like buy jeans, buy shirts, and buy pants.

Tokopedia has multiple pages, and every page has a different set of keywords defining its purpose. Rank Tracker checks where a page ranks for each keyword. For example, if you google the keyword buy iPhone, the search results will list websites selling iPhones.

The rank tracker collects this data daily and stores it in our database.
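Once the search results for a keyword have been collected, computing the rank reduces to finding our domain's position in the ordered result list. A minimal sketch (the function name and inputs are illustrative):

```javascript
// Given the ordered list of result URLs for a keyword, return our
// page's 1-based rank, or null if it does not appear in the results.
function rankFor(resultUrls, domain) {
  const idx = resultUrls.findIndex((u) => new URL(u).hostname.endsWith(domain));
  return idx === -1 ? null : idx + 1;
}
```

For example, `rankFor(['https://a.com/x', 'https://www.tokopedia.com/p'], 'tokopedia.com')` returns `2`.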

I have added an example report for a few keywords corresponding to Tokopedia in Google search results.

That is the basic idea of how we monitor SEO health for our pages.
We will be adding more features, such as an HTML validator and an AMP validator, to further strengthen our SEO.

Now let’s dig a little deeper and understand the technology we have used to build this system.

The Engineering behind SEO Doctor

This tool is one of the services in our frontend platform. We have used multiple libraries to build it: on the backend, koajs, node-cron, ioredis, node-redis-warlock, MongoDB, and puppeteer; on the frontend, React.

We have used koa to write our APIs in Node.js. These are straightforward REST APIs (GET, POST, and DELETE) performing CRUD operations. They are consumed by the frontend dashboard to generate reports and are also triggered by multiple cron jobs, executed via node-cron (a scheduler written in JavaScript for Node.js).
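The wiring between the koa API and the node-cron scheduler might look like the sketch below. The route path, port, cron schedules, router package, and job functions are all assumptions for illustration, not Tokopedia's actual values.

```javascript
// Serve report data to the dashboard through a simple REST endpoint.
function startServer(reportStore) {
  const Koa = require('koa');
  const Router = require('@koa/router');
  const app = new Koa();
  const router = new Router();

  router.get('/api/report/meta', async (ctx) => {
    ctx.body = await reportStore.latestMetaReport(); // latest report from the DB
  });

  app.use(router.routes());
  return app.listen(3000);
}

// Schedule the recurring report jobs with node-cron.
function startCrons(jobs) {
  const cron = require('node-cron');
  cron.schedule('0 2 * * *', jobs.generateMetaReport);   // daily at 02:00
  cron.schedule('*/30 * * * *', jobs.generateSsrReport); // every 30 minutes
}
```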

We use puppeteer to crawl our web pages: a headless browser is launched via puppeteer, which hits our web pages and extracts the HTML from them. I have shared sample code showing how to use puppeteer to crawl any web page.
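The original snippet is not reproduced here; a minimal crawl with puppeteer could look like this (a sketch, not the production crawler):

```javascript
// Launch a headless browser, visit the URL, and return the rendered HTML.
async function crawlPage(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-side assets have loaded.
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content(); // full HTML of the page
  } finally {
    await browser.close(); // always release the browser process
  }
}
```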

The HTML is then filtered at different layers: various tests check for SEO tags, server-side rendering, and so on. These tests generate a JSON output, and the result is stored in our database.

In the snippet, we have a POST API, api/save/data, which is used to save the JSON results. We use MongoDB as our database.
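Since the snippet itself is not shown here, a koa route along those lines might look like this. It assumes a body-parser middleware (such as koa-bodyparser) populates `ctx.request.body`, that `db` is a connected MongoDB database handle, and the collection name is illustrative.

```javascript
// Register the endpoint that cron jobs post their JSON results to.
function registerSaveRoute(router, db) {
  router.post('/api/save/data', async (ctx) => {
    const report = ctx.request.body; // JSON results produced by a cron job
    await db.collection('seo_reports').insertOne(report);
    ctx.status = 201;
    ctx.body = { saved: true };
  });
}
```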

The schema for the Mongo collection
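The schema snippet is not reproduced here; a mongoose-style schema for the report collection might look like the following. The field names and the use of mongoose are assumptions, not Tokopedia's actual schema.

```javascript
// Hypothetical shape of one report document in the collection.
function buildReportModel(mongoose) {
  const reportSchema = new mongoose.Schema({
    pageUrl:   { type: String, required: true },
    platform:  { type: String, enum: ['mobile', 'desktop'] },
    checks:    Object,  // e.g. { title: true, description: true, keywords: false }
    healthy:   Boolean,
    createdAt: { type: Date, default: Date.now },
  });
  return mongoose.model('SeoReport', reportSchema);
}
```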

The above covers the basic requirements to set up a cron job, crawl pages, and store data in a Node.js app. But a few more things are needed to keep the system running smoothly.

Redis and redis-warlock implementation

Our application is deployed on multiple servers, and we don't want a cron job to run on every server if it has already been triggered on one of them. To prevent the cron from triggering on multiple servers, we create sessions and manage them with ioredis and node-redis-warlock.

We use ioredis to create a Redis client that communicates with the Redis database, and node-redis-warlock to implement a distributed lock on top of Redis so that different processes can safely share resources.
With warlock, only one server triggers a given cron job at the scheduled time. Warlock creates a lock, and the lock key is saved in Redis. When a server triggers a cron job, it acquires the lock; if any other server tries to start the same job while the lock exists, warlock rejects it. The lock can also be released by calling the unlock() function.
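The lock-then-run pattern can be sketched as follows. The key name and TTL are illustrative; node-redis-warlock's `lock()` callback receives an unlock function only when the lock was actually acquired.

```javascript
// Run a scheduled job on exactly one server using a Redis-backed lock.
function runExclusively(jobName, ttlMs, job) {
  const Redis = require('ioredis');
  const Warlock = require('node-redis-warlock');
  const warlock = Warlock(new Redis());

  warlock.lock(`cron:${jobName}`, ttlMs, (err, unlock) => {
    if (err) return console.error('lock error:', err);
    if (typeof unlock !== 'function') return; // another server holds the lock
    Promise.resolve(job()).finally(unlock);   // release once the job finishes
  });
}
```

The TTL acts as a safety net: if the holding process crashes before calling `unlock()`, the lock expires on its own and the next scheduled run can proceed.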

Closure

That is the complete architecture of SEO Doctor and how we built the system with multiple technologies. It also reflects the tech culture we promote at Tokopedia and the innovation it encourages.

I hope you enjoyed reading this article.
Stay tuned for more interesting stuff coming up.

Thank you!

P.S. 👋 Hi, I’m Aatif! If you liked this, be sure to follow me here and on Twitter!
