Data Science

Biscuit Dunking Data Analysis

How do you dunk your biscuit? Everyone seems to have an opinion, sometimes a strong one. It turns out that science does too: the Washburn equation for capillary action has been shown to apply well to biscuits. This analysis describes biscuit dunking data and uses physical measurements to distinguish and predict the brand of biscuit, as well as the soaking time.

Biscuit dunking is an essential aspect of British culture. Whether you’re dunking chocolate-coated biscuits, oat biscuits, or digestives, everyone is guilty of the occasional dunk. This can even lead to rivalry between the one-second dunkers and the die-hard "wait as long as you can without the biscuit crumbling" camp. An investigation into McVitie’s biscuit dunking data was conducted to clear the air on the differences between three types of biscuit: Digestives, Hobnobs, and Rich Tea.

Project Objectives

This investigation aimed to explore the physical differences between biscuits through empirical data collected from dunking experiments. It analyzed the applicability of capillary flow to biscuits, examining the relationship between the pore radius and the capillary flow rate during dunking. This translated into insights on how tea travels up the different types of biscuit. Machine learning methods were investigated for the classification of biscuit type, as well as for modelling the pore radius of the biscuits.

Background

Capillary action:

The primary focus of this study was the relationship between the pore radius of the biscuits and their ability to absorb liquid, which is governed by capillary action. This is the ability of a liquid to flow in narrow spaces, even against gravity, and it is heavily influenced by the physical properties of the material, in this case biscuits. The Washburn equation, a fundamental model of capillary flow, was found to give a good estimate of the pore radius from the available data, so it was used as the starting point for a pore radius predictor.
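
As a rough illustration of the Washburn relation described here, the sketch below uses its standard form, L^2 = gamma * r * t * cos(phi) / (2 * eta), and inverts it to recover a pore radius. The function names and all numerical values are assumptions chosen for illustration; they are not taken from the dunking dataset or from the project's notebooks.

```python
import math


def washburn_length(gamma, r, t, phi, eta):
    """Distance L (m) travelled by the liquid after time t, per the Washburn equation."""
    return math.sqrt(gamma * r * t * math.cos(phi) / (2.0 * eta))


def washburn_pore_radius(L, gamma, t, phi, eta):
    """Invert the Washburn equation to estimate the pore radius r (m)."""
    return 2.0 * eta * L**2 / (gamma * t * math.cos(phi))


# Illustrative values only (assumed, not measured).
gamma = 0.07     # surface tension, N/m
eta = 1.0e-3     # dynamic viscosity, Pa*s
phi = 1.4        # contact angle, rad (must stay below pi/2)
t = 10.0         # dunking time, s
r_true = 5.0e-7  # pore radius, m

L = washburn_length(gamma, r_true, t, phi, eta)
r_est = washburn_pore_radius(L, gamma, t, phi, eta)
print(f"L = {L:.4e} m, recovered r = {r_est:.4e} m")
```

With noise-free inputs the inversion simply recovers the radius that was put in; on real measurements the estimate inherits the error in L, which enters squared.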

Biscuit Classification:

Biscuit classification was attempted with and without the pore radius, and classification improved greatly when the pore radius was included. Both supervised and unsupervised methods were used. K-means clustering was attempted, but it confused Rich Tea with Hobnobs and vice versa. A tuned Random Forest Classifier was the best supervised predictor of biscuit type, with per-type accuracy ranging from a minimum of 93.5% for Hobnobs to 100% for Digestives.
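
The with/without comparison can be sketched with scikit-learn as follows. This is a minimal example on synthetic data, not the project's code: the mean pore radii per biscuit type, the noise levels, and the feature layout are all hypothetical, and the hyperparameters are untuned defaults. Per-class accuracy is computed as per-class recall, meaning the fraction of each biscuit type that is correctly identified.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300  # synthetic samples per biscuit type
names = ["Digestive", "Hobnob", "Rich Tea"]
# HYPOTHETICAL mean pore radii in nm, chosen only to make the classes separable.
mean_radius_nm = [800.0, 500.0, 300.0]

y = np.repeat(np.arange(3), n)
r_nm = rng.normal(np.repeat(mean_radius_nm, n), 30.0)
gamma = rng.uniform(0.06, 0.08, 3 * n)    # surface tension, N/m
eta = rng.uniform(0.9e-3, 1.1e-3, 3 * n)  # viscosity, Pa*s
phi = rng.uniform(1.2, 1.5, 3 * n)        # contact angle, rad
t = rng.uniform(10.0, 30.0, 3 * n)        # dunking time, s
# Washburn length in metres, with 5% multiplicative measurement noise.
L = np.sqrt(gamma * (r_nm * 1e-9) * t * np.cos(phi) / (2 * eta))
L *= 1 + rng.normal(0.0, 0.05, 3 * n)

# Lengths in mm and radii in nm keep feature values of order one or larger.
X = np.column_stack([gamma, phi, eta, L * 1e3, t, r_nm])


def per_class_accuracy(X, y):
    """Train a random forest and return the recall for each class."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    return recall_score(y_te, clf.predict(X_te), average=None)


acc_without = per_class_accuracy(X[:, :5], y)  # drop the radius column
acc_with = per_class_accuracy(X, y)
for name, a0, a1 in zip(names, acc_without, acc_with):
    print(f"{name:10s} without radius: {a0:.3f}   with radius: {a1:.3f}")
```

The numbers printed here reflect only the synthetic data and say nothing about the accuracies reported for the real biscuits.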

Regression:

A regression analysis was performed to find the biscuit absorption rates, resolved for both short and long timescales. The classification algorithm was applied to the absorption data to classify the absorption curves according to biscuit properties. The resulting regression model was used in combination with the classifier to approximate biscuit type and absorption properties from physical biscuit properties. Most importantly, this combined approach allows biscuits to be classified without measuring the pore radius, our most significant classifying variable, which saves considerable laboratory cost.
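
One way such a combined approach could be wired together is sketched below: a regressor predicts the pore radius from the other measurements, and the classifier then uses the predicted radius in place of a measured one. This is an assumption about the general shape of the pipeline, not the project's actual implementation. The data are synthetic, the mean radii are hypothetical, and feeding the Washburn estimate to the regressor as an extra feature is a choice made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 300  # synthetic samples per biscuit type
# HYPOTHETICAL mean pore radii in nm for classes 0, 1, 2 (not from the dataset).
mean_radius_nm = [800.0, 500.0, 300.0]

y = np.repeat(np.arange(3), n)
r_nm = rng.normal(np.repeat(mean_radius_nm, n), 30.0)
gamma = rng.uniform(0.06, 0.08, 3 * n)    # surface tension, N/m
eta = rng.uniform(0.9e-3, 1.1e-3, 3 * n)  # viscosity, Pa*s
phi = rng.uniform(1.2, 1.5, 3 * n)        # contact angle, rad
t = rng.uniform(10.0, 30.0, 3 * n)        # dunking time, s
# Washburn length in metres, with 1% multiplicative measurement noise.
L = np.sqrt(gamma * (r_nm * 1e-9) * t * np.cos(phi) / (2 * eta))
L *= 1 + rng.normal(0.0, 0.01, 3 * n)

# Washburn estimate of the radius (nm), used as the regressor's starting point.
r_washburn_nm = 2 * eta * L**2 / (gamma * t * np.cos(phi)) * 1e9
features = np.column_stack([gamma, phi, eta, L * 1e3, t])  # L in mm

F_tr, F_te, w_tr, w_te, r_tr, r_te, y_tr, y_te = train_test_split(
    features, r_washburn_nm, r_nm, y,
    test_size=0.25, random_state=0, stratify=y)

# Stage 1: regress the measured radius on the measurements plus the Washburn estimate.
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(np.column_stack([F_tr, w_tr]), r_tr)

# Stage 2: train the classifier with the measured radius...
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(np.column_stack([F_tr, r_tr]), y_tr)

# ...and substitute the predicted radius at prediction time,
# so no new pore radius measurement is required.
r_hat = reg.predict(np.column_stack([F_te, w_te]))
y_hat = clf.predict(np.column_stack([F_te, r_hat]))
accuracy = accuracy_score(y_te, y_hat)
print(f"accuracy using predicted pore radius: {accuracy:.3f}")
```

The pore radius is still needed once, to train both stages; the saving comes from not having to measure it for every new biscuit.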

Results

Performed exploratory data analysis (EDA) to clean and interpret experimental biscuit dunking data.
Used regression models, namely the Washburn model, Kernel Ridge Regression, and Random Forest Regression, to predict the pore size of biscuits.
Trained a Random Forest Classifier with an average accuracy of 96% across biscuit types.

Konstantin N 2024-08-04 PROJECTS
Python Jupyter Notebooks Scikit-learn Machine Learning