implement automated job scraping scheduler with retry logic and logging
README.md
@@ -22,3 +22,20 @@ job scraper
3. Install dependencies
4. Set up environment variables
5. Run the application

## Scheduler Configuration

The application includes an automated scheduler that runs the job scraping process every hour. The scheduler is implemented in `web/craigslist.py` and provides the following (a sketch of the pattern appears after this list):

- **Automatic Scheduling**: Scraping runs every hour automatically
- **Failure Handling**: Retry logic with exponential backoff (up to 3 attempts)
- **Background Operation**: Runs in a separate daemon thread
- **Graceful Error Recovery**: Continues running even if individual scraping attempts fail
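
The implementation itself isn't shown in this diff, so below is a minimal sketch of the pattern the list above describes, not the actual contents of `web/craigslist.py`. The helper name `run_with_retries`, the `scrape_fn` parameter, and the exact backoff delays are illustrative assumptions; only `start_scheduler()` is named in this README, and its real signature may differ.

```python
import logging
import threading
import time

logger = logging.getLogger(__name__)

MAX_ATTEMPTS = 3            # "up to 3 attempts", per the list above
INTERVAL_SECONDS = 60 * 60  # hourly cadence, per the list above


def run_with_retries(scrape_fn):
    """Call scrape_fn, retrying with exponential backoff on failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            scrape_fn()
            logger.info("Scrape succeeded on attempt %d", attempt)
            return True
        except Exception:
            logger.exception("Scrape attempt %d/%d failed", attempt, MAX_ATTEMPTS)
            if attempt < MAX_ATTEMPTS:
                time.sleep(2 ** attempt)  # illustrative backoff: 2s, then 4s
    logger.error("All %d attempts failed; waiting for the next cycle", MAX_ATTEMPTS)
    return False


def start_scheduler(scrape_fn):
    """Run scrape_fn every hour in a background daemon thread."""
    def loop():
        while True:
            run_with_retries(scrape_fn)  # failures are logged, never raised
            time.sleep(INTERVAL_SECONDS)

    thread = threading.Thread(target=loop, daemon=True, name="scraper-scheduler")
    thread.start()
    return thread
```

Starting the worker with `daemon=True` keeps the scheduler from blocking interpreter shutdown, matching the "Background Operation" bullet, and the `try`/`except` inside the loop is what lets one bad cycle fail without killing the thread.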

### Scheduler Features

- **Retry Mechanism**: Automatically retries failed scraping attempts
- **Logging**: Detailed logging of scheduler operations and failures
- **Testing**: Comprehensive test suite in `tests/test_scheduler.py` (see the example after this list)
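
The contents of `tests/test_scheduler.py` aren't shown in this diff either. As a hedged illustration, tests for the retry behavior might look like the following, written against the hypothetical `run_with_retries` helper from the sketch above rather than the repository's actual suite:

```python
from unittest import mock

from web.craigslist import run_with_retries  # hypothetical helper from the sketch above


def test_retries_until_success():
    # Fail twice, then succeed: all three attempts should be consumed.
    scrape = mock.Mock(side_effect=[RuntimeError, RuntimeError, None])
    with mock.patch("time.sleep"):  # skip the backoff delays in tests
        assert run_with_retries(scrape) is True
    assert scrape.call_count == 3


def test_gives_up_after_max_attempts():
    # Fail every time: the helper should stop after three attempts.
    scrape = mock.Mock(side_effect=RuntimeError)
    with mock.patch("time.sleep"):
        assert run_with_retries(scrape) is False
    assert scrape.call_count == 3
```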

To modify the scheduling interval, edit the `start_scheduler()` function in `web/craigslist.py`.
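
Assuming an interval constant like the one in the sketch above (the real `start_scheduler()` may express the delay differently), the change is a one-line edit:

```python
# Hypothetical constant from the sketch above: scrape every 30 minutes
# instead of hourly.
INTERVAL_SECONDS = 30 * 60
```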