Web_scraping_project

Author	SHA1	Message	Date
Ofure Ikheloa	c370de83d5	Refactor scraper and sender modules for improved Redis management and SSL connection handling - Introduced RedisManager class in scraper.py for centralized Redis operations including job tracking and caching. - Enhanced job scraping logic in MultiPlatformJobScraper to support multiple platforms (Ashby, Lever, Greenhouse). - Updated browser initialization and context management to ensure better resource handling. - Improved error handling and logging throughout the scraping process. - Added SSL connection parameters management in a new ssl_connection.py module for RabbitMQ connections. - Refactored sender.py to utilize RedisManager for job deduplication and improved logging mechanisms. - Enhanced CSV processing logic in sender.py with better validation and error handling. - Updated connection parameters for RabbitMQ to support SSL configurations based on environment variables.	2025-12-12 13:48:26 +01:00
Ofure Ikheloa	2d22fbdb92	Enhance AmazonJobScraper to support flexible location matching and extract posted dates; refine LLMJobRefiner prompts for better data extraction.	2025-12-09 12:00:57 +01:00
Ofure Ikheloa	e216db35f9	Increase max pages to scrape and extend wait time between job title scrapes; add posted date to job data extraction	2025-12-09 09:30:44 +01:00
Ofure Ikheloa	224b9c3122	llm_agent is now responsible for extraction.	2025-12-05 17:23:31 +01:00
Ofure Ikheloa	160efadbfb	Modifications to work with PostgreSQL and use an LLM to extract and refine data	2025-12-05 17:00:43 +01:00
Ofure Ikheloa	4f78a845ae	refactor(llm_agent): switch from XAI to DeepSeek API and simplify job refinement - Replace XAI/Grok integration with DeepSeek's OpenAI-compatible API - Remove schema generation and caching logic - Simplify prompt structure and response parsing - Standardize database schema and markdown output format - Update config to use DEEPSEEK_API_KEY instead of XAI_API_KEY - Change default search keyword in linkedin_main.py	2025-12-01 10:25:37 +01:00
Ofure Ikheloa	d7d92ba8bb	fix(job_scraper): increase timeout values for page navigation. The previous timeout values were too short for slower network conditions, causing premature timeouts during job scraping. Increased wait_for_function timeout from 30s to 80s and load_state timeout from 30s to 60s to accommodate slower page loads.	2025-11-27 12:28:21 +01:00
Ofure Ikheloa	d025828036	feat: update LLM model and increase content size limit; refactor: update timeout values in job scraper classes; feat: add spoof config for renderers and vendors; build: update pycache files for config and modules	2025-11-24 13:47:47 +01:00
Ofure Ikheloa	fd4e8c9c05	feat(scraper): add LLM-powered job data refinement and new scraping logic - Implement LLMJobRefiner class for processing job data with Gemini API - Add new job_scraper2.py with enhanced scraping capabilities - Remove search_keywords parameter from scraping engine - Add environment variable loading in config.py - Update main script to use new scraper and target field	2025-11-24 12:25:50 +01:00

9 Commits