Web_scraping_project

Author	SHA1	Message	Date
Ofure Ikheloa	2b1387b3e6	modify to include scraping date posted, queuing failed jobs to be sent to redis for later scraping with back-up scraper.	2025-12-09 08:07:39 +01:00
Ofure Ikheloa	8fa59ba69b	modify llm agent to compulsorily identify and scrape all provided fields	2025-12-05 18:36:36 +01:00
Ofure Ikheloa	224b9c3122	llm_agent now responsible for extraction.	2025-12-05 17:23:31 +01:00
Ofure Ikheloa	160efadbfb	modifications to work with PostgreSQL and use LLM to extract and refine data	2025-12-05 17:00:43 +01:00
Ofure Ikheloa	4f78a845ae	refactor(llm_agent): switch from XAI to DeepSeek API and simplify job refinement. Replace XAI/Grok integration with DeepSeek's OpenAI-compatible API; remove schema generation and caching logic; simplify prompt structure and response parsing; standardize database schema and markdown output format; update config to use DEEPSEEK_API_KEY instead of XAI_API_KEY; change default search keyword in linkedin_main.py.	2025-12-01 10:25:37 +01:00
Ofure Ikheloa	d7d92ba8bb	fix(job_scraper): increase timeout values for page navigation. The previous timeout values were too short for slower network conditions, causing premature timeouts during job scraping. Increased wait_for_function timeout from 30s to 80s and load_state timeout from 30s to 60s to accommodate slower page loads.	2025-11-27 12:28:21 +01:00
Ofure Ikheloa	d025828036	feat: update LLM model and increase content size limit; refactor: update timeout values in job scraper classes; feat: add spoof config for renderers and vendors; build: update pycache files for config and modules	2025-11-24 13:47:47 +01:00
Ofure Ikheloa	fd4e8c9c05	feat(scraper): add LLM-powered job data refinement and new scraping logic. Implement LLMJobRefiner class for processing job data with Gemini API; add new job_scraper2.py with enhanced scraping capabilities; remove search_keywords parameter from scraping engine; add environment variable loading in config.py; update main script to use new scraper and target field.	2025-11-24 12:25:50 +01:00

8 Commits