
Sitemap Generator

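A Python-based utility that automatically generates an XML sitemap for the blog website. The sitemap lists the blog's pages and posts, with last-modification dates for posts, so that search engines can discover new and updated content without having to find it by following links.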

Features

  • Automatically generates a sitemap.xml file following the Sitemap Protocol
  • Includes both static pages and dynamic blog post entries
  • Regenerates the sitemap on a fixed schedule using cron
  • Containerized for easy deployment

Requirements

  • Python 3.x
  • Docker (for containerized deployment)
  • Dependencies:
    • pydantic
    • requests

Configuration

The sitemap generator uses the following environment variables:

Variable        Description                                            Default
API_BASE_URL    Base URL of the blog's API                             (required)
FRONTEND_URL    Base URL of the frontend website                       (required)
STORAGE_PATH    Path where the generated sitemap.xml will be stored    ./static
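As a minimal sketch of how these variables could be read and validated, the snippet below uses only the standard library. The `load_config()` helper and `Config` dataclass are hypothetical names for illustration and are not taken from gen_sitemap.py; the required variables and the ./static default follow the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical container for the three documented settings.
    api_base_url: str
    frontend_url: str
    storage_path: str


def load_config(env=None) -> Config:
    """Read settings from environment variables (hypothetical helper).

    API_BASE_URL and FRONTEND_URL are required; STORAGE_PATH falls back
    to ./static when unset.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("API_BASE_URL", "FRONTEND_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Config(
        # Strip trailing slashes so paths can later be appended with a leading "/".
        api_base_url=env["API_BASE_URL"].rstrip("/"),
        frontend_url=env["FRONTEND_URL"].rstrip("/"),
        storage_path=env.get("STORAGE_PATH", "./static"),
    )
```

Failing early with a clear message when a required URL is missing avoids writing a sitemap full of broken links.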

Usage

Local Execution

  1. Install the required dependencies:

    pip install -r requirements.txt
    
  2. Set the environment variables:

    export API_BASE_URL=http://api.example.com
    export FRONTEND_URL=http://www.example.com
    
  3. Run the generator:

    python gen_sitemap.py
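To check the result of a local run, the generated file can be parsed with the standard library. The helper below is hypothetical and not part of the project; it assumes the output uses the standard Sitemap Protocol namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(path: str) -> list[str]:
    """Return every <loc> value in a sitemap file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed XML.
    """
    root = ET.parse(path).getroot()
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
```

For example, with the default STORAGE_PATH, `list_sitemap_urls("./static/sitemap.xml")` returns the URLs the generator wrote.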
    

Docker Deployment

  1. Build the Docker image:

    docker build -t blog-sitemap-generator .
    
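  2. Create a directory on the host to store the generated sitemap, along with an empty sitemap.xml file. The file must exist before the container starts: if the bind-mount source path does not exist, Docker creates it as a directory instead of a file.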

    mkdir -p /path/to/host/dir
    touch /path/to/host/dir/sitemap.xml
    
  3. Run the container:

    docker run -d \
      -e API_BASE_URL=http://api.example.com \
      -e FRONTEND_URL=http://www.example.com \
      -v /path/to/host/dir/sitemap.xml:/app/static/sitemap.xml \
      blog-sitemap-generator
    

Scheduled Execution

The sitemap is automatically generated according to the schedule defined in the crontab file:

  • Every day at 00:00, 08:00, and 16:00 (UTC)
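In cron syntax, this schedule corresponds to the expression `0 0,8,16 * * *` (minute 0 of hours 0, 8, and 16, every day); the actual line in the crontab file, including the command it runs, is not reproduced here. As an illustration only, the hypothetical helper below computes when the next regeneration is due, assuming the container clock runs in UTC.

```python
from datetime import datetime, timedelta, timezone

# Run hours from the documented schedule (UTC); the minute is always 0.
RUN_HOURS_UTC = (0, 8, 16)


def next_run(now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Checking today and tomorrow is enough: tomorrow's 00:00 run always qualifies.
    for day_offset in (0, 1):
        for hour in RUN_HOURS_UTC:
            candidate = midnight + timedelta(days=day_offset, hours=hour)
            if candidate > now:
                return candidate
    raise AssertionError("unreachable: a run always occurs within 24 hours")
```

In the worst case, a newly published post appears in the sitemap up to eight hours later.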

File Structure

  • gen_sitemap.py: Main script that generates the sitemap
  • requirements.txt: Python dependencies
  • crontab: Cron job schedule configuration
  • Dockerfile: Container configuration for deployment
  • README.md: Documentation (this file)

Output

The generator writes a standard XML sitemap (sitemap.xml) to the directory given by STORAGE_PATH, containing:

  • The homepage URL
  • The posts listing page URL
  • Individual post URLs with their last modification dates
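A minimal sketch of building such a sitemap with the standard library's xml.etree.ElementTree is shown below. The `Post` fields (`slug`, `updated`) and the URL paths `/posts` and `/posts/<slug>` are assumptions for illustration; the actual script, which presumably fetches posts from API_BASE_URL using requests, may use different fields and routes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Post:
    slug: str      # hypothetical field used to build the post URL
    updated: date  # hypothetical last-modification date from the API


def build_sitemap(frontend_url: str, posts: list[Post]) -> bytes:
    """Return sitemap XML bytes for the homepage, the posts listing, and each post."""
    base = frontend_url.rstrip("/")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)

    def add_url(loc: str, lastmod: Optional[date] = None) -> None:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # The protocol accepts W3C Datetime; YYYY-MM-DD is its simplest form.
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()

    add_url(f"{base}/")       # homepage
    add_url(f"{base}/posts")  # posts listing page (assumed path)
    for post in posts:
        add_url(f"{base}/posts/{post.slug}", post.updated)  # assumed URL pattern
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

The returned bytes could then be written to sitemap.xml inside the STORAGE_PATH directory.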