implement automated job scraping scheduler with retry logic and logging
README.md
@@ -22,3 +22,20 @@ job scraper
3. Install dependencies
4. Set up environment variables
5. Run the application

## Scheduler Configuration

The application includes an automated scheduler that runs the job scraping process every hour. The scheduler is implemented in `web/craigslist.py` and provides the following (a sketch of the pattern appears after this list):

- **Automatic Scheduling**: Scraping runs every hour automatically
- **Failure Handling**: Retry logic with exponential backoff (up to 3 attempts)
- **Background Operation**: Runs in a separate daemon thread
- **Graceful Error Recovery**: Continues running even if individual scraping attempts fail
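
The implementation itself isn't shown in this diff, so below is a minimal sketch of the pattern the list above describes, not the actual contents of `web/craigslist.py`. The helper name `run_with_retries`, the `scrape_fn` parameter, and the exact backoff delays are illustrative assumptions; only `start_scheduler()` is named in this README, and its real signature may differ.

```python
import logging
import threading
import time

logger = logging.getLogger(__name__)

MAX_ATTEMPTS = 3            # "up to 3 attempts", per the list above
INTERVAL_SECONDS = 60 * 60  # hourly cadence, per the list above


def run_with_retries(scrape_fn):
    """Call scrape_fn, retrying with exponential backoff on failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            scrape_fn()
            logger.info("Scrape succeeded on attempt %d", attempt)
            return True
        except Exception:
            logger.exception("Scrape attempt %d/%d failed", attempt, MAX_ATTEMPTS)
            if attempt < MAX_ATTEMPTS:
                time.sleep(2 ** attempt)  # illustrative backoff: 2s, then 4s
    logger.error("All %d attempts failed; waiting for the next cycle", MAX_ATTEMPTS)
    return False


def start_scheduler(scrape_fn):
    """Run scrape_fn every hour in a background daemon thread."""
    def loop():
        while True:
            run_with_retries(scrape_fn)  # failures are logged, never raised
            time.sleep(INTERVAL_SECONDS)

    thread = threading.Thread(target=loop, daemon=True, name="scraper-scheduler")
    thread.start()
    return thread
```

Starting the worker with `daemon=True` keeps the scheduler from blocking interpreter shutdown, matching the "Background Operation" bullet, and the `try`/`except` inside the loop is what lets one bad cycle fail without killing the thread.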

### Scheduler Features

- **Retry Mechanism**: Automatically retries failed scraping attempts
- **Logging**: Detailed logging of scheduler operations and failures
- **Testing**: Comprehensive test suite in `tests/test_scheduler.py` (see the example after this list)
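
The contents of `tests/test_scheduler.py` aren't shown in this diff either. As a hedged illustration, tests for the retry behavior might look like the following, written against the hypothetical `run_with_retries` helper from the sketch above rather than the repository's actual suite:

```python
from unittest import mock

from web.craigslist import run_with_retries  # hypothetical helper from the sketch above


def test_retries_until_success():
    # Fail twice, then succeed: all three attempts should be consumed.
    scrape = mock.Mock(side_effect=[RuntimeError, RuntimeError, None])
    with mock.patch("time.sleep"):  # skip the backoff delays in tests
        assert run_with_retries(scrape) is True
    assert scrape.call_count == 3


def test_gives_up_after_max_attempts():
    # Fail every time: the helper should stop after three attempts.
    scrape = mock.Mock(side_effect=RuntimeError)
    with mock.patch("time.sleep"):
        assert run_with_retries(scrape) is False
    assert scrape.call_count == 3
```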

To modify the scheduling interval, edit the `start_scheduler()` function in `web/craigslist.py`.
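
Assuming an interval constant like the one in the sketch above (the real `start_scheduler()` may express the delay differently), the change is a one-line edit:

```python
# Hypothetical constant from the sketch above: scrape every 30 minutes
# instead of hourly.
INTERVAL_SECONDS = 30 * 60
```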