- Replace XAI/Grok integration with DeepSeek's OpenAI-compatible API
- Remove schema generation and caching logic
- Simplify prompt structure and response parsing
- Standardize database schema and markdown output format
- Update config to use DEEPSEEK_API_KEY instead of XAI_API_KEY
- Change default search keyword in linkedin_main.py
Web_scraping_project
This project aims to scrape job data from websites without being flagged as a bot. At the heart of the project is scraping_engine.py, which holds the configuration needed to make the scraping as stealthy as possible. Currently, job_scraper.py and linkedin_main.py are specific to scraping LinkedIn, and the scraping engine is plugged into linkedin_main.py to make it stealthy. This means that, for the scraping to be effective, instead of building one general scraping engine we build site-specific scrapers and plug scraping_engine.py into each of them. The major advantage that ZenRows would have over us is its AI component, which can self-learn and recommend improvements.

job_scraper.py handles the full workflow: signing in and filling in the login details, searching for jobs, scraping the results, and saving the data to markdown and to the database. It does so with extensive human simulation to avoid being flagged as a bot. This is necessary because scraping_engine.py handles the canvas and WebGL configuration (so that when the site probes the machine being used, it sees a real machine rather than an automated one), which is only one part of bypassing anti-bot measures. The interactions themselves also have to go undetected, and job_scraper.py was built to handle this second aspect while doing its job.

linkedin_main.py is the entry point of the engine. This is where you set what to search for and how many pages to scrape, and where you make sure the username and password identifiers match the ones in your .env file. It is also where you set the seed name that lets the scraping engine assign a random profile to your name.
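To illustrate the seed name idea, here is a minimal sketch of how a seed name could be turned into a repeatable "random" profile. It is not the code in scraping_engine.py: the function name `profile_from_seed`, the option pools, and the profile fields are all hypothetical and chosen only for illustration.

```python
import hashlib
import random

# Hypothetical option pools; the real scraping_engine.py may use other values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1366, 768), (1536, 864), (1920, 1080)]
WEBGL_VENDORS = ["Intel Inc.", "NVIDIA Corporation", "AMD"]


def profile_from_seed(seed_name: str) -> dict:
    """Derive a repeatable browser profile from a seed name (illustrative)."""
    # Hash the name so the same seed gives the same profile in every run;
    # Python's built-in hash() is salted per process, so it is not used here.
    digest = hashlib.sha256(seed_name.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
        "webgl_vendor": rng.choice(WEBGL_VENDORS),
        # A per-profile seed that canvas noise could be derived from.
        "canvas_noise_seed": rng.randrange(2**32),
    }


if __name__ == "__main__":
    print(profile_from_seed("my_seed_name"))
```

Under this sketch, the same seed name always maps to the same profile, so a given account keeps presenting a consistent machine across sessions, while different seed names will usually get different profiles.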