# jobs

A job scraper that collects listings (currently from Craigslist), stores them in a database, and lets users search by keyword and region.

## Features

- Scrapes job listings from the web (currently Craigslist, by region)
- Saves job listings to a database
- Lets users search for job listings by keyword and region
- Selects job listings based on user preferences

## Requirements

- Database (MySQL/MariaDB)
- Python 3.x
- Required Python packages (see requirements.txt)

## Installation

1. Clone the repository
2. Create a virtual environment
3. Install dependencies
4. Set up environment variables
5. Run the application
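
The steps above might look like the following. The repository URL, environment-variable names, and entry-point script are assumptions, not taken from this README; check the code for the actual values it expects:

```shell
# 1. Clone the repository (URL assumed; substitute the real one)
git clone https://github.com/<user>/jobs.git
cd jobs

# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set up environment variables (names are illustrative;
#    see the code for the variables it actually reads)
export DB_HOST=localhost
export DB_USER=jobs
export DB_PASSWORD=secret

# 5. Run the application (entry-point name assumed)
python app.py
```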
## Scheduler Configuration

The application includes an automated scheduler that runs the job scraping process every hour. The scheduler is implemented in `web/craigslist.py` and includes:

- **Automatic Scheduling**: Scraping runs every hour automatically
- **Failure Handling**: Retry logic with exponential backoff (up to 3 attempts)
- **Background Operation**: Runs in a separate daemon thread
- **Graceful Error Recovery**: Continues running even if individual scraping attempts fail
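
A minimal sketch of how such a scheduler could be structured; the actual implementation lives in `web/craigslist.py`, and the function and parameter names here are illustrative:

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scheduler")


def scrape_with_retry(scrape, max_attempts=3, base_delay=1.0):
    """Run `scrape`, retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return scrape()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                # Back off: 1s, 2s, 4s, ... between attempts
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Swallow the failure so the hourly loop keeps running
    log.error("all %d attempts failed; will try again next cycle", max_attempts)
    return None


def start_scheduler(scrape, interval=3600):
    """Run `scrape` every `interval` seconds in a background daemon thread."""
    def loop():
        while True:
            scrape_with_retry(scrape)
            time.sleep(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

In a design like this, the interval is just an argument to `start_scheduler()`, which matches the advice below about editing that function to change the schedule.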
### Scheduler Features

- **Retry Mechanism**: Automatically retries failed scraping attempts
- **Logging**: Logs scheduler operations and failures
- **Testing**: Test suite in `tests/test_scheduler.py`

To modify the scheduling interval, edit the `start_scheduler()` function in `web/craigslist.py`.