Commit Graph

  • 5901e9c1a1 Refactor scraper.py to improve code readability by removing unnecessary blank lines and ensuring consistent formatting. Ashby_agent Ofure Ikheloa 2025-12-15 14:18:52 +01:00
  • 5939f2bd04 Refactor RedisManager methods to enhance job caching and error handling; implement sent and error cache management for improved job processing flow. Ofure Ikheloa 2025-12-15 14:14:16 +01:00
  • e2e1bc442e Refactor RedisManager methods for improved error handling and logging; streamline job validation process by ensuring all compulsory fields are checked before processing. Ofure Ikheloa 2025-12-15 10:34:41 +01:00
  • 87c67265f8 Refactor environment variable handling in scraper; remove default values for RabbitMQ and Redis configurations. Enhance job validation by checking for all compulsory fields before processing. Ofure Ikheloa 2025-12-15 09:37:52 +01:00
  • 2c5b42b7bd Refactor job tracking to use job ID instead of job URL in RedisManager methods Ofure Ikheloa 2025-12-15 09:08:27 +01:00
  • b13d14d26d Enhance job handling in scraper and sender modules: - Update fetch timeout in StealthyFetcher for improved reliability. - Refactor LLMJobRefiner to create and manage Quelah Jobs table in PostgreSQL. - Modify RedisManager to track sent job counts for jobs.csv and adjust deduplication logic. - Implement job URL-based deduplication across scraper and sender. Ofure Ikheloa 2025-12-12 21:14:37 +01:00
  • c370de83d5 Refactor scraper and sender modules for improved Redis management and SSL connection handling Ofure Ikheloa 2025-12-12 13:48:26 +01:00
  • 0c447d0f77 Merge branch 'Ashby_agent' of https://gitea.thejobhub.xyz/Ofure/Web_scraping_project into Ashby_agent Ofure Ikheloa 2025-12-10 13:26:54 +01:00
  • 94d87943de Refactor environment variable handling in AshbyJobScraper and Sender classes; remove fallback values for RabbitMQ and Redis configurations. Ofure Ikheloa 2025-12-10 13:26:47 +01:00
  • c0c7925be3 Delete amazon_main.py Ofure 2025-12-10 11:07:39 +00:00
  • 20408dd5a6 Delete amazon_job_scraper.py Ofure 2025-12-10 11:07:15 +00:00
  • 762846cb4a Add AshbyJobScraper and Sender classes for job scraping and message sending; implement Redis caching and RabbitMQ integration. Ofure Ikheloa 2025-12-10 12:02:43 +01:00
  • 2d22fbdb92 Enhance AmazonJobScraper to support flexible location matching and extract posted dates; refine LLMJobRefiner prompts for better data extraction. amazon_agent Ofure Ikheloa 2025-12-09 12:00:57 +01:00
  • e216db35f9 Increase max pages to scrape and extend wait time between job title scrapes; add posted date to job data extraction Ofure Ikheloa 2025-12-09 09:30:44 +01:00
  • cbcffa8cd4 modify to queue failed jobs and also extract date of job posting Ofure Ikheloa 2025-12-09 09:12:35 +01:00
  • 2b1387b3e6 modify to include scraping date posted, queuing failed jobs to be sent to redis for later scraping with back-up scraper. llm_agent Ofure Ikheloa 2025-12-09 08:07:39 +01:00
  • 4782f174e2 Delete browser_sessions/job_scraping_12_session.json Ofure 2025-12-05 17:49:56 +00:00
  • 10fa1ac633 Delete browser_sessions/job_scraping_123_session.json Ofure 2025-12-05 17:49:46 +00:00
  • ba783112f5 Delete spoof_config.json Ofure 2025-12-05 17:49:30 +00:00
  • 8fa59ba69b modify llm agent to compulsorily identify and scrape all provided fields Ofure Ikheloa 2025-12-05 18:36:36 +01:00
  • 91047cfc5c Delete job_scraper.py Ofure 2025-12-05 17:00:25 +00:00
  • 9ed5641540 Delete tr.py Ofure 2025-12-05 16:50:52 +00:00
  • 370fce0514 Merge branch 'amazon_agent' of https://gitea.thejobhub.xyz/Ofure/Web_scraping_project into amazon_agent Ofure Ikheloa 2025-12-05 17:50:10 +01:00
  • efa47d50ae amazon specific built engine Ofure Ikheloa 2025-12-05 17:49:31 +01:00
  • e49860faae Delete linkedin_main.py Ofure 2025-12-05 16:45:12 +00:00
  • 0942339426 Delete job_scraper2.py Ofure 2025-12-05 16:44:52 +00:00
  • 7e80801f89 Delete job_scraper.py Ofure 2025-12-05 16:44:23 +00:00
  • 06f9820c38 Delete feedback_job_scraping_123.json Ofure 2025-12-05 16:44:08 +00:00
  • fbde4d03e1 Delete feedback_job_scraping_12.json Ofure 2025-12-05 16:43:42 +00:00
  • d0aabc5970 Delete .env Ofure 2025-12-05 16:43:25 +00:00
  • 672c6a0333 scraper for amazon Ofure Ikheloa 2025-12-05 17:25:54 +01:00
  • 224b9c3122 llm_agent now responsible for extraction. Ofure Ikheloa 2025-12-05 17:23:31 +01:00
  • 160efadbfb modifications to work with PostgreSQL and use an LLM to extract and refine data Ofure Ikheloa 2025-12-05 17:00:43 +01:00
  • 4f78a845ae refactor(llm_agent): switch from XAI to DeepSeek API and simplify job refinement Ofure Ikheloa 2025-12-01 10:25:37 +01:00
  • d7d92ba8bb fix(job_scraper): increase timeout values for page navigation Ofure Ikheloa 2025-11-27 12:28:21 +01:00
  • d025828036 feat: update LLM model and increase content size limit Ofure Ikheloa 2025-11-24 13:47:47 +01:00
  • fd4e8c9c05 feat(scraper): add LLM-powered job data refinement and new scraping logic Ofure Ikheloa 2025-11-24 12:25:50 +01:00
  • 7dca4c9159 refactor(job_scraper): improve page loading and typing in linkedin scraper Ofure Ikheloa 2025-11-23 09:27:05 +01:00
  • 458e914d71 feat(scraping): enhance job scraping with session persistence and feedback system main Headless Ofure Ikheloa 2025-11-21 16:51:26 +01:00
  • 68495a0a54 Update README.md Ofure 2025-11-21 08:53:05 +00:00
  • 01d4ca8001 Add linkedin_main.py Ofure 2025-11-20 19:00:43 +00:00
  • f52868edfa Add job_scraper.py Ofure 2025-11-20 18:59:46 +00:00
  • 1a216a1aa8 Add scraping_engine.py Ofure 2025-11-20 18:58:26 +00:00
  • 28d7197378 Initial commit Ofure 2025-11-20 18:56:21 +00:00