jobs

A job scraper for collecting and searching job listings.

Features

  • Scrapes job listings from the web (currently Craigslist, by region)
  • Saves job listings to a database
  • Lets users search saved listings by keyword and region
  • Selects job listings based on user preferences

Requirements

  • Database (MySQL/MariaDB)
  • Python 3.x
    • Required Python packages (see requirements.txt)

Installation

  1. Clone the repository
  2. Create a virtual environment
  3. Install dependencies
  4. Set up environment variables
  5. Run the application (see the example commands below)
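
The following is one plausible command sequence. The repository URL, environment variable names, and application entry point are placeholders rather than values taken from this project; adjust them to your setup.

```bash
# 1. Clone the repository (URL is a placeholder)
git clone <repository-url>
cd jobs

# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set up environment variables (names are illustrative; use the
#    credentials your MySQL/MariaDB instance expects)
export DB_HOST=localhost
export DB_USER=jobs
export DB_PASSWORD=changeme

# 5. Run the application (entry point is a placeholder)
python <entry-point>.py
```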

Scheduler Configuration

The application includes an automated scheduler that runs the job scraping process every hour. The scheduler is implemented in web/craigslist.py and includes the following (a simplified sketch follows the list):

  • Automatic Scheduling: Scraping runs every hour automatically
  • Failure Handling: Retry logic with exponential backoff (up to 3 attempts)
  • Background Operation: Runs in a separate daemon thread
  • Graceful Error Recovery: Continues running even if individual scraping attempts fail
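
Below is a minimal sketch of that pattern using only the Python standard library. It is illustrative rather than the actual code: the names scrape_jobs, run_scrape_with_retries, and SCRAPE_INTERVAL_SECONDS, as well as the start_scheduler() signature and the backoff timings, are assumptions and may differ from the real implementation in web/craigslist.py.

```python
import logging
import threading
import time

logger = logging.getLogger(__name__)

SCRAPE_INTERVAL_SECONDS = 60 * 60  # run the scraper once per hour
MAX_ATTEMPTS = 3                   # retry a failed run up to 3 times


def scrape_jobs():
    """Stand-in for the real scraping routine in web/craigslist.py."""
    logger.info("Scraping job listings...")


def run_scrape_with_retries():
    """Run one scrape, retrying with exponential backoff on failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            scrape_jobs()
            return
        except Exception:
            logger.exception("Scrape attempt %d of %d failed", attempt, MAX_ATTEMPTS)
            if attempt < MAX_ATTEMPTS:
                time.sleep(2 ** attempt)  # back off: 2s, 4s, ...
    logger.error("All %d attempts failed; will try again next cycle", MAX_ATTEMPTS)


def start_scheduler(interval=SCRAPE_INTERVAL_SECONDS):
    """Start the hourly scrape loop in a background daemon thread."""
    def loop():
        while True:
            run_scrape_with_retries()
            time.sleep(interval)

    thread = threading.Thread(target=loop, daemon=True, name="scraper-scheduler")
    thread.start()
    return thread
```

Because the loop runs in a daemon thread and swallows exceptions per attempt, a failed hour never stops the application; the next cycle simply tries again.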

Scheduler Features

  • Retry Mechanism: Automatically retries failed scraping attempts
  • Logging: Detailed logging of scheduler operations and failures
  • Testing: Comprehensive test suite in tests/test_scheduler.py
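
Assuming the suite is driven by pytest, the scheduler tests can be run in isolation with `pytest tests/test_scheduler.py`.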

To modify the scheduling interval, edit the start_scheduler() function in web/craigslist.py.
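
In the sketch above, this corresponds to the interval argument accepted by the illustrative start_scheduler(); the real function may define the hourly value differently.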