
jobs

job scraper

Features

  • Scrapes job listings from websites (currently craigslist, by region)
  • Saves job listings to a database
  • Users can search for job listings by keywords and region
  • Selection of job listings based on user preferences
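The keyword-and-region search above boils down to a filtered query against the listings table. A minimal sketch using an in-memory SQLite table (the real application uses MySQL/MariaDB with its own schema, so the table and column names here are assumptions):

```python
import sqlite3

# In-memory stand-in for the job listings database; schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (title TEXT, region TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("Python developer", "berlin", "https://example.org/1"),
        ("Forklift operator", "berlin", "https://example.org/2"),
        ("Python data engineer", "munich", "https://example.org/3"),
    ],
)

def search(keyword, region):
    # Parameterized query: keyword matches anywhere in the title,
    # region must match exactly.
    rows = conn.execute(
        "SELECT title, url FROM listings WHERE title LIKE ? AND region = ?",
        (f"%{keyword}%", region),
    )
    return rows.fetchall()
```

Using parameterized queries rather than string formatting keeps the search safe against SQL injection regardless of which database backend sits behind it.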

Requirements

  • Database (MySQL/MariaDB)
  • Python 3.x
    • Required Python packages (see requirements.txt)

Installation

  1. Clone the repository
  2. Create a virtual environment
  3. Install dependencies
  4. Set up environment variables
  5. Run the application

Scheduler Configuration

The application includes an automated scheduler that runs the job scraping process every hour. The scheduler is implemented in web/craigslist.py and includes:

  • Automatic Scheduling: Scraping runs every hour automatically
  • Failure Handling: Retry logic with exponential backoff (up to 3 attempts)
  • Background Operation: Runs in a separate daemon thread
  • Graceful Error Recovery: Continues running even if individual scraping attempts fail

Scheduler Features

  • Retry Mechanism: Automatically retries failed scraping attempts
  • Logging: Comprehensive logging of scheduler operations and failures
  • Testing: Comprehensive test suite in tests/test_scheduler.py

To modify the scheduling interval, edit the start_scheduler() function in web/craigslist.py.
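The scheduling and retry behavior described above (hourly runs, up to 3 attempts with exponential backoff, a daemon thread that survives failures) can be sketched as follows. This is a simplified illustration, not the actual code in web/craigslist.py; the function names and parameters are assumptions:

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scheduler")

def run_with_retries(task, attempts=3, base_delay=60.0):
    """Run task, retrying on failure with exponential backoff
    (base_delay, then 2x, then 4x ...)."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            log.exception("scrape attempt %d/%d failed", attempt, attempts)
            if attempt < attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    return None  # all attempts failed; the scheduler loop keeps going

def start_scheduler(task, interval=3600, attempts=3, base_delay=60.0):
    """Run task every `interval` seconds in a background daemon thread.
    Individual failures are logged but never stop the loop."""
    def loop():
        while True:
            run_with_retries(task, attempts, base_delay)
            time.sleep(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

In this sketch the interval is a parameter rather than a hard-coded constant, which is one way to make the "edit start_scheduler() to change the interval" step a single-value change.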

Docker Deployment

Please see README-Docker.md for instructions on deploying the application using Docker.
