Posts

Putting it all together

Introduction In this blog post, I will combine my previous blog posts and lay out a draft plan for my research project. As I have said before, the basic idea is to use web scraping techniques to create a tool that filters recipes for the 14 allergen groups. As far as this project is concerned, there are two main relevant web scraping techniques: HTML parsing and DOM parsing. In addition, because of Google’s recipe cards, most recipe sites include metadata in their HTML content that lists the ingredients in a standard format. That makes it easier to generalise the recipe searcher to different websites. Most web scrapers are written in either Python or R, and these languages have the most available open-source frameworks specifically for web scraping. Web scraping also often has to work around anti-bot technology, as website owners usually try to stop the content on their websites from being ...
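As a rough illustration of the metadata idea in the excerpt above, here is a minimal Python sketch. It assumes the standard format is schema.org `Recipe` markup embedded as JSON-LD in a `<script type="application/ld+json">` tag, which the excerpt does not state explicitly. The names `JsonLdExtractor`, `extract_ingredients`, `find_allergens`, the sample page, and the tiny keyword table are all hypothetical and only cover three allergen groups; naive substring matching will also produce false positives (for example, "coconut milk").

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_ingredients(html):
    """Return the recipeIngredient strings from any top-level Recipe object.

    Simplification (assumption): pages that nest the recipe under "@graph"
    or use a list for "@type" are not handled here.
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    ingredients = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed metadata instead of failing
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                ingredients.extend(item.get("recipeIngredient", []))
    return ingredients


# Hypothetical, deliberately incomplete keyword table for illustration only.
ALLERGEN_KEYWORDS = {
    "milk": ["milk", "cheese"],
    "eggs": ["egg"],
    "peanuts": ["peanut"],
}


def find_allergens(ingredients):
    """Return the allergen groups whose keywords appear in any ingredient."""
    found = set()
    for ingredient in ingredients:
        lowered = ingredient.lower()
        for group, keywords in ALLERGEN_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                found.add(group)
    return found


# Made-up sample page used only to exercise the sketch.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes",
 "recipeIngredient": ["200g flour", "2 eggs", "300ml milk"]}
</script>
</head><body></body></html>"""

sample_ingredients = extract_ingredients(SAMPLE_HTML)
sample_allergens = find_allergens(sample_ingredients)
```

A recipe could then be filtered out whenever `find_allergens` returns a group the user wants to avoid. Only the standard library is used here; a real scraper would more likely fetch pages with a dedicated framework and would need a far more careful allergen lookup.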

Mixed Methods for research

Introduction One of the things I’ve learnt about recently is research methodologies and their different strengths and weaknesses. Understanding these can be very helpful, as they can focus the research and allow you to develop a plan. However, before diving deeper into this, let’s define two terms we will use throughout this post: quantitative and qualitative. Quantitative – The Oxford dictionary defines quantitative as relating to, measuring, or measured by the quantity of something rather than its quality. In short, quantitative is understood as numerical rather than based on ‘qualities’. Qualitative – Relates to ‘qualities’, such as colours, shapes, and sounds. In the research context, it will usually mean interviews with or accounts from research participants. Mixed Methods Research Some subjects lend themselves very heavily towards either qualitative or quantitative research methods. For example, papers in physics will usually use quantitative research methods. Alter...

Visual Aids and Productivity tools

Introduction As discussed in the previous blog post, my project is to create a web scraper to search for recipes and filter them for allergen groups. As part of the research, I want to compare the performance of different web scraping techniques and of different languages. That is already quite a lot of deliverables. Not only that, but I also want to write an extensive write-up of my development experience and my results. Furthermore, I know that the project might change. I don’t know, for example, how successful my bots will be until I build them. One reason for this is anti-bot validation. If it is very prominent, certain methods may be more effective, which may be a whole avenue of research in itself. As a result, project management, especially time management, will be essential for finishing this project with any degree of success. In this blog post, I will compare a few tools and charts I have found and how effective they may be. My usual approach ...

My Research Project

Introduction As part of my master’s course at Sunderland University, I have been allowed to undertake a research project. The topic I have decided to research is web scraping and its application to searching and filtering recipes. In this blog post, I will explain what web scraping is, what research there is about it, and what research is still to be done. What is web scraping? Web scraping is the name for various techniques used to extract information from web pages. Even someone manually copying and pasting content from websites has been described as web scraping. Usually, though, web scraping refers to building software that does this process automatically. Indeed, web scraping could apply to any situation where a person or organisation wants to extract information that is difficult or tedious to collect, or so large in volume that collection needs to be automated. This includes research, business, finance, media, etc. The specific types of web scraping depend on t...