Putting it all together
Introduction

In this blog post, I will combine my previous blog posts and lay out a draft plan for my research project. As I have said before, the basic idea is to use web scraping techniques to create a tool that filters recipes for the 14 allergen groups. As far as this project is concerned, there are two main relevant web scraping techniques: HTML parsing and DOM parsing. In addition, because of Google’s recipe cards, most recipe sites include metadata in their HTML content that lists the ingredients in a standard format. That means it will be easier to generalise the recipe searcher across different websites. Most web scrapers are written in either Python or R, and these languages have the most open-source frameworks available specifically for web scraping. A further consideration is that web scraping often has to work around anti-bot technology, as website owners usually try to stop the content on their websites from being ...
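To make the point about standardised recipe metadata more concrete, here is a minimal Python sketch of how a scraper might read the ingredients from a page and flag allergen groups. It rests on several assumptions that are mine, not part of the plan above: that the metadata is schema.org `Recipe` markup embedded as JSON-LD in a `<script type="application/ld+json">` tag, that ingredients sit in its `recipeIngredient` field, and that allergens can be spotted by simple keyword matching. The names `ALLERGEN_KEYWORDS`, `JsonLdExtractor`, `extract_ingredients`, `find_allergens` and `SAMPLE_HTML` are hypothetical, and the keyword table covers only three of the 14 groups for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical keyword lists for three of the 14 allergen groups (illustrative only).
ALLERGEN_KEYWORDS = {
    "milk": ["milk", "butter", "cheese", "cream"],
    "eggs": ["egg"],
    "peanuts": ["peanut"],
}

# Made-up page used only to exercise the sketch; real pages will be messier.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes",
 "recipeIngredient": ["200 g plain flour", "2 eggs", "100 ml milk"]}
</script>
</head><body><h1>Pancakes</h1></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def _is_recipe(node):
    """True if a JSON-LD node declares the Recipe type (string or list form)."""
    node_type = node.get("@type")
    if isinstance(node_type, list):
        return "Recipe" in node_type
    return node_type == "Recipe"


def extract_ingredients(html):
    """Return the ingredient strings of the first Recipe found, or []."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed metadata rather than fail the whole page
        # Assumption: a block is a single object, a list of objects,
        # or an object wrapping its nodes in "@graph".
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        else:
            candidates = data
        if not isinstance(candidates, list):
            candidates = [candidates]
        for node in candidates:
            if isinstance(node, dict) and _is_recipe(node):
                ingredients = node.get("recipeIngredient", [])
                if isinstance(ingredients, str):
                    ingredients = [ingredients]
                return list(ingredients)
    return []


def find_allergens(ingredients):
    """Return the set of allergen groups whose keywords appear in any ingredient."""
    found = set()
    for line in ingredients:
        lowered = line.lower()
        for group, keywords in ALLERGEN_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                found.add(group)
    return found


if __name__ == "__main__":
    ingredients = extract_ingredients(SAMPLE_HTML)
    print(ingredients)
    print(sorted(find_allergens(ingredients)))
```

Plain substring matching is deliberately naive: it would flag "peanut butter" as milk and "eggplant" as eggs, so a real filter would need a more careful ingredient vocabulary. The sketch also parses a string that is already in memory and leaves out fetching pages, which is where the anti-bot concerns come in.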