Sitemap Generator
A Python-based utility that automatically generates an XML sitemap for the blog website. The sitemap helps search engines discover and index the blog's content more efficiently.
Features
- Automatically generates a sitemap.xml file following the Sitemap Protocol
- Includes both static pages and dynamic blog post entries
- Regenerates the sitemap on a fixed schedule using cron jobs
- Containerized for easy deployment
Requirements
- Python 3.x
- Docker (for containerized deployment)
- Dependencies:
  - pydantic
  - requests
Configuration
The sitemap generator uses the following environment variables:
| Variable | Description | Default |
|---|---|---|
| API_BASE_URL | Base URL of the blog's API | (required) |
| FRONTEND_URL | Base URL of the frontend website | (required) |
| STORAGE_PATH | Path where the generated sitemap.xml will be stored | ./static |
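The sketch below illustrates one way the three variables in the table could be read, with the required ones validated up front and STORAGE_PATH falling back to its default. It uses only the standard library; the `Settings` class and `load_settings` function are hypothetical names for illustration and are not taken from gen_sitemap.py, which may use pydantic instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the generator's configuration."""
    api_base_url: str
    frontend_url: str
    storage_path: str = "./static"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if a required one is missing."""
    missing = [name for name in ("API_BASE_URL", "FRONTEND_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        # Trailing slashes are stripped so paths can be appended consistently.
        api_base_url=env["API_BASE_URL"].rstrip("/"),
        frontend_url=env["FRONTEND_URL"].rstrip("/"),
        storage_path=env.get("STORAGE_PATH", "./static"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.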
Usage
Local Execution
- Install the required dependencies:
    pip install -r requirements.txt
- Set the environment variables:
    export API_BASE_URL=http://api.example.com
    export FRONTEND_URL=http://www.example.com
- Run the generator:
    python gen_sitemap.py
Docker Deployment
- Build the Docker image:
    docker build -t blog-sitemap-generator .
- Create a directory on the host to store the generated sitemap, along with an empty sitemap.xml. The file must exist before the container starts; otherwise Docker creates a directory at the bind-mount path instead of a file:
    mkdir -p /path/to/host/dir
    touch /path/to/host/dir/sitemap.xml
- Run the container:
    docker run -d \
      -e API_BASE_URL=http://api.example.com \
      -e FRONTEND_URL=http://www.example.com \
      -v /path/to/host/dir/sitemap.xml:/app/static/sitemap.xml \
      blog-sitemap-generator
Scheduled Execution
The sitemap is regenerated automatically according to the schedule defined in the crontab file:
- Every day at 00:00, 08:00, and 16:00 (UTC)
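As a rough illustration of this schedule, the sketch below computes the next run time for a given moment. The crontab file itself is not reproduced here, so the cron line in the comment is an assumption about what such an entry would typically look like, and `next_run` is a hypothetical helper rather than part of the project.

```python
from datetime import datetime, timedelta, timezone

# Assumed crontab entry (hypothetical; the project's actual crontab is not shown):
#   0 0,8,16 * * * python /app/gen_sitemap.py
RUN_HOURS_UTC = (0, 8, 16)


def next_run(now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC before comparison.
    """
    now = now.astimezone(timezone.utc)
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in RUN_HOURS_UTC:
        candidate = today + timedelta(hours=hour)
        if candidate > now:
            return candidate
    # Past 16:00 UTC: the next run is midnight of the following day.
    return today + timedelta(days=1)
```

With three runs per day, a newly published post should appear in the sitemap within eight hours at most.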
File Structure
- gen_sitemap.py: Main script that generates the sitemap
- requirements.txt: Python dependencies
- crontab: Cron job schedule configuration
- Dockerfile: Container configuration for deployment
- README.md: Documentation (this file)
Output
The generator produces a standard XML sitemap at the specified STORAGE_PATH containing:
- The homepage URL
- The posts listing page URL
- Individual post URLs with their last modification dates
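To make the output structure concrete, here is a minimal sketch of how such a sitemap could be assembled with the standard library. It is not the code from gen_sitemap.py: the `build_sitemap` function, the `/posts` and `/posts/<slug>` URL layout, and the `slug` and `updated_at` field names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional, Sequence

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(frontend_url: str, posts: Sequence[Mapping[str, str]]) -> bytes:
    """Return sitemap XML covering the homepage, the posts listing, and each post."""
    base = frontend_url.rstrip("/")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)

    def add_url(loc: str, lastmod: Optional[str] = None) -> None:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # The protocol expects W3C Datetime format, e.g. YYYY-MM-DD.
            ET.SubElement(url, "lastmod").text = lastmod

    add_url(f"{base}/")       # homepage
    add_url(f"{base}/posts")  # posts listing page (assumed path)
    for post in posts:
        # "slug" and "updated_at" are hypothetical field names.
        add_url(f"{base}/posts/{post['slug']}", post.get("updated_at"))

    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

A call such as `build_sitemap("http://www.example.com", [{"slug": "hello", "updated_at": "2024-01-01"}])` would yield three `<url>` entries, with only the post entry carrying a `<lastmod>` element.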