Masala to Metrics: Predicting Calories
Name: Anjika Jain & Riya Bhivare
Email: anjika@umich & rbhiv@umich.edu
Website Link: https://anjikajain1.github.io/masala-to-metrics/
Introduction
Background
The dataset we chose was Recipes and Ratings. These two datasets were scraped from food.com, a popular online platform for sharing and discovering new recipes. The data we worked with is divided into two main components.
RAW_recipes.csv: Contains all details about recipes, including preparation time, number of steps in recipe, and nutritional information.
RAW_interactions.csv: Includes user reviews and ratings for recipes found in RAW_recipes.csv
Question: What are the different aspects that could impact the calorie content of recipes?
Keeping this question as our focus, we analyzed several properties related to calories, such as the distribution of calories across the recipes and the distributions of the other nutritional values.
This question was interesting to us because, coming from two South Asian households, we have seen how difficult it can be to track calories in recipes that are everyday meals in our culture. Understanding calorie content can help users cook recipes that meet their dietary goals. It could also help food.com’s recommendation system filter recipes by calorie considerations.
DataSet
Rows in RAW_recipes: 83,782
Rows in RAW_interactions: 731,927
Columns in RAW_interactions:
- `rating`: rating given for recipe
- `review`: review given for recipe
Columns in RAW_recipes:
name
: Recipe name
minutes
: Amount of time to prepare the recipe
tags
: Categories and words assigned to recipes, such as “vegetarian”
nutrition
: Nutrition information in the form [calories (#), total fat (PDV), sugar (PDV), sodium (PDV), protein (PDV), saturated fat (PDV), carbohydrates (PDV)]; PDV stands for “percentage of daily value”
n_steps
: Number of steps in recipe
description
: User description of recipe
Data Cleaning and Exploratory Data Analysis
Data Cleaning
Our team performed various steps in the data cleaning process to ensure our dataset was ready to be analyzed:
First, we performed a left merge between Recipes and Interactions on the recipe ID in order to bridge the two datasets.
Then, we noted that some user ratings were 0, so we replaced them with NaN so that the average rating calculations are not biased downward.
Next, we computed the average rating for each recipe as a Series and added it as the avg_rating column to get a better picture of the merged dataset.
Finally, one of the major data cleaning steps involved the nutrition column. We split the list in each row into separate columns for each aspect of nutrition: calories, total_fat, sugar, sodium, protein, saturated_fat, and carbohydrates.
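The four cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than our exact notebook code: the function name `clean_recipes` and the toy rows are hypothetical, and it assumes the nutrition column arrives as a string such as `"[138.4, 10.0, ...]"`.

```python
import ast

import numpy as np
import pandas as pd

NUTRITION_COLS = ["calories", "total_fat", "sugar", "sodium",
                  "protein", "saturated_fat", "carbohydrates"]


def clean_recipes(recipes: pd.DataFrame, interactions: pd.DataFrame) -> pd.DataFrame:
    """Merge, mask zero ratings, add avg_rating, and split nutrition."""
    # Left merge keeps every recipe, even those with no reviews.
    merged = recipes.merge(interactions, left_on="id",
                           right_on="recipe_id", how="left")
    # A rating of 0 is treated as missing rather than as a real score.
    merged["rating"] = merged["rating"].replace(0, np.nan)
    # Per-recipe mean rating (NaN is skipped), repeated on every row.
    merged["avg_rating"] = merged.groupby("id")["rating"].transform("mean")
    # Parse the list-like nutrition entry and spread it over seven columns.
    parsed = merged["nutrition"].apply(
        lambda v: ast.literal_eval(v) if isinstance(v, str) else v)
    nutrition = pd.DataFrame(parsed.tolist(), columns=NUTRITION_COLS,
                             index=merged.index)
    return pd.concat([merged.drop(columns="nutrition"), nutrition], axis=1)


# Toy stand-in data (not the real food.com files).
recipes = pd.DataFrame({
    "name": ["toy brownies", "toy tacos"],
    "id": [1, 2],
    "nutrition": ["[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",
                  "[249.4, 26.0, 4.0, 6.0, 39.0, 39.0, 0.0]"],
})
interactions = pd.DataFrame({"recipe_id": [1, 1, 2], "rating": [4, 0, 5]})
cleaned = clean_recipes(recipes, interactions)
print(cleaned[["name", "rating", "avg_rating", "calories"]])
```

Using `transform("mean")` keeps one row per review while still attaching the recipe-level average, which matches the shape of the merged table.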
Below is the `head` of the cleaned DataFrame.
name | id | minutes | contributor_id | submitted | tags | n_steps | steps | description | ingredients | n_ingredients | user_id | recipe_id | date | rating | review | avg_rating | calories | total_fat | sugar | sodium | protein | saturated_fat | carbohydrates |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 brownies in the world best ever | 333281 | 40 | 985201 | 2008-10-27 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'for-large-groups', 'desserts', 'lunch', 'snacks', 'cookies-and-brownies', 'chocolate', 'bar-cookies', 'brownies', 'number-of-servings'] | 10 | ['heat the oven to 350f and arrange the rack in the middle', 'line an 8-by-8-inch glass baking dish with aluminum foil', 'combine chocolate and butter in a medium saucepan and cook over medium-low heat , stirring frequently , until evenly melted', 'remove from heat and let cool to room temperature', 'combine eggs , sugar , cocoa powder , vanilla extract , espresso , and salt in a large bowl and briefly stir until just evenly incorporated', 'add cooled chocolate and mix until uniform in color', 'add flour and stir until just incorporated', 'transfer batter to the prepared baking dish', 'bake until a tester inserted in the center of the brownies comes out clean , about 25 to 30 minutes', 'remove from the oven and cool completely before cutting'] | these are the most; chocolatey, moist, rich, dense, fudgy, delicious brownies that you'll ever make.....sereiously! there's no doubt that these will be your fav brownies ever for you can add things to them or make them plain.....either way they're pure heaven! | ['bittersweet chocolate', 'unsalted butter', 'eggs', 'granulated sugar', 'unsweetened cocoa powder', 'vanilla extract', 'brewed espresso', 'kosher salt', 'all-purpose flour'] | 9 | 386585.0 | 333281.0 | 2008-11-19 | 4.0 | These were pretty good, but took forever to bake. I would send it ended up being almost an hour! Even then, the brownies stuck to the foil, and were on the overly moist side and not easy to cut. They did taste quite rich, though! Made for My 3 Chefs. | 4.0 | 138.4 | 10.0 | 50.0 | 3.0 | 3.0 | 19.0 | 6.0 |
1 in canada chocolate chip cookies | 453467 | 45 | 1848091 | 2011-04-11 | ['60-minutes-or-less', 'time-to-make', 'cuisine', 'preparation', 'north-american', 'for-large-groups', 'canadian', 'british-columbian', 'number-of-servings'] | 12 | ['pre-heat oven the 350 degrees f', 'in a mixing bowl , sift together the flours and baking powder', 'set aside', 'in another mixing bowl , blend together the sugars , margarine , and salt until light and fluffy', 'add the eggs , water , and vanilla to the margarine / sugar mixture and mix together until well combined', 'add in the flour mixture to the wet ingredients and blend until combined', 'scrape down the sides of the bowl and add the chocolate chips', 'mix until combined', 'scrape down the sides to the bowl again', 'using an ice cream scoop , scoop evenly rounded balls of dough and place of cookie sheet about 1 - 2 inches apart to allow for spreading during baking', 'bake for 10 - 15 minutes or until golden brown on the outside and soft & chewy in the center', 'serve hot and enjoy !'] | this is the recipe that we use at my school cafeteria for chocolate chip cookies. they must be the best chocolate chip cookies i have ever had! if you don't have margarine or don't like it, then just use butter (softened) instead. | ['white sugar', 'brown sugar', 'salt', 'margarine', 'eggs', 'vanilla', 'water', 'all-purpose flour', 'whole wheat flour', 'baking soda', 'chocolate chips'] | 11 | 424680.0 | 453467.0 | 2012-01-26 | 5.0 | Originally I was gonna cut the recipe in half (just the 2 of us here), but then we had a park-wide yard sale, & I made the whole batch & used them as enticements for potential buyers ~ what the hey, a free cookie as delicious as these are, definitely works its magic! Will be making these again, for sure! Thanks for posting the recipe! | 5.0 | 595.1 | 46.0 | 211.0 | 22.0 | 13.0 | 51.0 | 26.0 |
412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'side-dishes', 'vegetables', 'easy', 'beginner-cook', 'broccoli'] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | ['frozen broccoli cuts', 'cream of chicken soup', 'sharp cheddar cheese', 'garlic powder', 'ground black pepper', 'salt', 'milk', 'soy sauce', 'french-fried onions'] | 9 | 29782.0 | 306168.0 | 2008-12-31 | 5.0 | This was one of the best broccoli casseroles that I have ever made. I made my own chicken soup for this recipe. I was a bit worried about the tsp of soy sauce but it gave the casserole the best flavor. YUM! \nThe photos you took (shapeweaver) inspired me to make this recipe and it actually does look just like them when it comes out of the oven. \nThanks so much for sharing your recipe shapeweaver. It was wonderful! Going into my family's favorite Zaar cookbook :) | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |
412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'side-dishes', 'vegetables', 'easy', 'beginner-cook', 'broccoli'] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | ['frozen broccoli cuts', 'cream of chicken soup', 'sharp cheddar cheese', 'garlic powder', 'ground black pepper', 'salt', 'milk', 'soy sauce', 'french-fried onions'] | 9 | 1196280.0 | 306168.0 | 2009-04-13 | 5.0 | I made this for my son's first birthday party this weekend. Our guests INHALED it! Everyone kept saying how delicious it was. I was I could have gotten to try it. | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |
412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'side-dishes', 'vegetables', 'easy', 'beginner-cook', 'broccoli'] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | ['frozen broccoli cuts', 'cream of chicken soup', 'sharp cheddar cheese', 'garlic powder', 'ground black pepper', 'salt', 'milk', 'soy sauce', 'french-fried onions'] | 9 | 768828.0 | 306168.0 | 2013-08-02 | 5.0 | Loved this. Be sure to completely thaw the broccoli. I didn't and it didn't get done in time specified. Just cooked it a little longer though and it was perfect. Thanks Chef. | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |
412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | ['60-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'side-dishes', 'vegetables', 'easy', 'beginner-cook', 'broccoli'] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | ['frozen broccoli cuts', 'cream of chicken soup', 'sharp cheddar cheese', 'garlic powder', 'ground black pepper', 'salt', 'milk', 'soy sauce', 'french-fried onions'] | 9 | 520830.0 | 306168.0 | 2017-10-17 | 5.0 | 5 stars from my husband and son, my toughest critics. I used a 10-oz bag of chopped broccoli and a 10-oz bag of flowerettes which gave it more texture. Very good flavor and the smell while cooking was great. The sauce held it together without overwhelming the broccoli. | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |
millionaire pound cake | 286009 | 120 | 461724 | 2008-02-12 | ['time-to-make', 'course', 'cuisine', 'preparation', 'occasion', 'north-american', 'desserts', 'american', 'southern-united-states', 'dinner-party', 'holiday-event', 'cakes', 'dietary', 'christmas', 'thanksgiving', 'low-sodium', 'low-in-something', 'taste-mood', 'sweet', '4-hours-or-less'] | 7 | ['freheat the oven to 300 degrees', 'grease a 10-inch tube pan with butter , dust the bottom and sides with flour , and set aside', 'in a large mixing bowl , cream the butter and sugar with an electric mixer and add the eggs one at a time , beating after each addition', 'alternately add the flour and milk , stirring till the batter is smooth', 'add the two extracts and stir till well blended', 'scrape the batter into the prepared pan and bake till a cake tester or knife blade inserted in the center comes out clean , about 1 1 / 2 hours', 'cool the cake in the pan on a rack for 5 minutes , then turn it out on the rack to cool completely'] | why a millionaire pound cake? because it's super rich! this scrumptious cake is the pride of an elderly belle from jackson, mississippi. the recipe comes from "the glory of southern cooking" by james villas. | ['butter', 'sugar', 'eggs', 'all-purpose flour', 'whole milk', 'pure vanilla extract', 'almond extract'] | 7 | 813055.0 | 286009.0 | 2008-04-09 | 5.0 | don't let the calories and fat grams scare you off. This is a wonderful recipe and is perfect for the summer cook-out topped with fresh berries! It will make you proud. This is meant to be shared! | 5.0 | 878.3 | 63.0 | 326.0 | 13.0 | 20.0 | 123.0 | 39.0 |
2000 meatloaf | 475785 | 90 | 2202916 | 2012-03-06 | ['time-to-make', 'course', 'main-ingredient', 'preparation', 'main-dish', 'potatoes', 'vegetables', '4-hours-or-less', 'meatloaf', 'simply-potatoes2'] | 17 | ['pan fry bacon , and set aside on a paper towel to absorb excess grease', 'mince yellow onion , red bell pepper , and add to your mixing bowl', 'chop garlic and set aside', 'put 1tbsp olive oil into a saut pan , along with chopped garlic , teaspoons white pepper and a pinch of kosher salt', 'bring to a medium heat to sweat your garlic', 'preheat oven to 350f', 'coarsely chop your baby spinach add to your heated pan , stir frequently for approximately 5 min to wilt', 'add your spinach to the mixing bowl', 'chop your now cooled bacon , and add it to the mixing bowl', 'add your meatloaf mix to the bowl , with one egg and mix till thoroughly combined', 'add your goat cheese , one egg , 1 / 8 tsp white pepper and 1 / 8 tsp of kosher salt and mix till thoroughly combined', 'transfer to a 9x5 meatloaf pan , and cook for 60 min or until the internal temperature is at least 160f', 'let stand for 5min', 'melt 1tbsp unsalted butter into a frying pan , and cook up to three eggs at a time', 'crack each egg into a separate dish , in order to prevent egg shells from reaching the pan , then add salt and pepper to taste', 'wait until the egg whites are firm looking , but slightly runny on top before flipping your eggs', 'after flipping , wait 10~20 seconds before removing each egg and placing it over your slices of meatloaf'] | ready, set, cook! special edition contest entry: a mediterranean flavor inspired meatloaf dish. featuring: simply potatoes - shredded hash browns, egg, bacon, spinach, red bell pepper, and goat cheese. | ['meatloaf mixture', 'unsmoked bacon', 'goat cheese', 'unsalted butter', 'eggs', 'baby spinach', 'yellow onion', 'red bell pepper', 'simply potatoes shredded hash browns', 'fresh garlic', 'kosher salt', 'white pepper', 'olive oil'] | 13 | 2204364.0 | 475785.0 | 2012-03-07 | 5.0 | Delicious!!!!! -- the goat cheese made the difference. My new favorite meatloaf. | 5.0 | 267.0 | 30.0 | 12.0 | 12.0 | 29.0 | 48.0 | 2.0 |
2000 meatloaf | 475785 | 90 | 2202916 | 2012-03-06 | ['time-to-make', 'course', 'main-ingredient', 'preparation', 'main-dish', 'potatoes', 'vegetables', '4-hours-or-less', 'meatloaf', 'simply-potatoes2'] | 17 | ['pan fry bacon , and set aside on a paper towel to absorb excess grease', 'mince yellow onion , red bell pepper , and add to your mixing bowl', 'chop garlic and set aside', 'put 1tbsp olive oil into a saut pan , along with chopped garlic , teaspoons white pepper and a pinch of kosher salt', 'bring to a medium heat to sweat your garlic', 'preheat oven to 350f', 'coarsely chop your baby spinach add to your heated pan , stir frequently for approximately 5 min to wilt', 'add your spinach to the mixing bowl', 'chop your now cooled bacon , and add it to the mixing bowl', 'add your meatloaf mix to the bowl , with one egg and mix till thoroughly combined', 'add your goat cheese , one egg , 1 / 8 tsp white pepper and 1 / 8 tsp of kosher salt and mix till thoroughly combined', 'transfer to a 9x5 meatloaf pan , and cook for 60 min or until the internal temperature is at least 160f', 'let stand for 5min', 'melt 1tbsp unsalted butter into a frying pan , and cook up to three eggs at a time', 'crack each egg into a separate dish , in order to prevent egg shells from reaching the pan , then add salt and pepper to taste', 'wait until the egg whites are firm looking , but slightly runny on top before flipping your eggs', 'after flipping , wait 10~20 seconds before removing each egg and placing it over your slices of meatloaf'] | ready, set, cook! special edition contest entry: a mediterranean flavor inspired meatloaf dish. featuring: simply potatoes - shredded hash browns, egg, bacon, spinach, red bell pepper, and goat cheese. | ['meatloaf mixture', 'unsmoked bacon', 'goat cheese', 'unsalted butter', 'eggs', 'baby spinach', 'yellow onion', 'red bell pepper', 'simply potatoes shredded hash browns', 'fresh garlic', 'kosher salt', 'white pepper', 'olive oil'] | 13 | 2216720.0 | 475785.0 | 2012-03-21 | 5.0 | What a fabulous recipe. I have a lot of friends who either love to cook, are cookbook authors, are on TV with a cooking show, or who have been featured on cooking shows, so I know a thing or two about cooking. I know, for instance that cooking offers up a continual stream of adventures that do not require a passport or long airline layovers. Cooking is creative, expressive and comforting. A form of open eyed meditation lifting one beyond the commonplace. I'm a vegetarian, but I love to visit other recipes for inspiration so that I can use them by adapting the meat ingredients and therefore adopting them into my favorite recipe file. All thumbs up for this recipe by an obviously gifted, dedicated and creative cook! | 5.0 | 267.0 | 30.0 | 12.0 | 12.0 | 29.0 | 48.0 | 2.0 |
5 tacos | 500166 | 20 | 2549237 | 2013-05-13 | ['weeknight', '30-minutes-or-less', 'time-to-make', 'course', 'main-ingredient', 'preparation', 'occasion', 'main-dish', 'beef', 'vegetables', 'easy', 'diabetic', 'dinner-party', 'kid-friendly', 'stove-top', 'dietary', 'comfort-food', 'inexpensive', 'ground-beef', 'meat', 'greens', 'lettuces', 'tomatoes', 'taste-mood', 'equipment', '3-steps-or-less'] | 5 | ['cook meat', 'add taco seasoning', 'place meat into taco shells / tortillas', 'top with tomatoes , onions , lettuce , salsa and cheese', 'boil corn cobs 5-7 minutes'] | costs about $5.00 to make. | ['ground beef', 'taco seasoning', 'taco shells', 'lettuce', 'tomatoes', 'onion', 'salsa', 'cheddar cheese', 'corn cobs'] | 9 | 369715.0 | 500166.0 | 2013-06-13 | 4.0 | I doubled the recipe for my family but used two pounds of meat instead of 1.5 pounds. I followed the recipe except we did not use the onions and topped them with sour cream. I also didn't make the corn. We all enjoyed these. | 4.0 | 249.4 | 26.0 | 4.0 | 6.0 | 39.0 | 39.0 | 0.0 |
Univariate Analysis
Explanation: From this histogram we see that the distribution of calories is right-skewed, with the majority of recipes having calorie counts under 1000. This suggests that most recipes are relatively moderate in calories. High-calorie outliers are comparatively rare, which may indicate that individuals tend not to share recipes that are very high in calories.
Bivariate Analysis
Explanation: From this scatter plot we see that most recipes stay in roughly the same calorie range (around 10,000 calories or less) even as the number of ingredients increases, which suggests that the number of ingredients may not be strongly related to the number of calories in a recipe. However, we also noticed that in a few cases the calorie count decreases as the number of ingredients increases, which was not intuitive to our team.
Interesting Aggregates
prep_time_group | calories | total_fat | sugar | sodium | protein | saturated_fat | carbohydrates |
---|---|---|---|---|---|---|---|
0–15 min | 301.65 | 23.45 | 66.56 | 27.42 | 16.57 | 27.07 | 10.09 |
16–30 min | 369.7 | 27.96 | 45.72 | 23.77 | 31.31 | 33.47 | 11.42 |
31–60 min | 432.15 | 33.04 | 60.68 | 27.09 | 34.56 | 42.44 | 13.67 |
61–120 min | 564.28 | 43.7 | 95.62 | 32.61 | 40.69 | 56.51 | 18.51 |
120+ min | 547.51 | 39.85 | 68.84 | 47.69 | 57.29 | 48.27 | 15.66 |
Significance: We built this pivot table as part of our exploratory data analysis, to uncover interesting patterns or insights in our dataset. It lets us see whether calories are affected by how long a recipe takes to make. A lot of South Asian dishes require extended prep times, so we wanted to see whether prep time across many different recipes had any impact on calories. We didn’t see a clear-cut pattern, although average calories in the table do rise from 301.65 for the 0–15 min group to 564.28 for the 61–120 min group, and sodium content was fairly high, particularly in the 120+ min group (47.69 PDV).
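A table like the one above can be produced by binning prep time and averaging each nutrition column per bin. The sketch below is one way to do it under assumptions: the bin edges are inferred from the group labels, the helper name `nutrition_by_prep_time` is hypothetical, and the toy data stands in for the real merged DataFrame.

```python
import numpy as np
import pandas as pd

NUTRITION_COLS = ["calories", "total_fat", "sugar", "sodium",
                  "protein", "saturated_fat", "carbohydrates"]


def nutrition_by_prep_time(df: pd.DataFrame, value_cols=NUTRITION_COLS) -> pd.DataFrame:
    """Average the given columns within each prep-time bucket."""
    # Bin edges are an assumption inferred from the labels in the table.
    bins = [0, 15, 30, 60, 120, np.inf]
    labels = ["0–15 min", "16–30 min", "31–60 min", "61–120 min", "120+ min"]
    groups = pd.cut(df["minutes"], bins=bins, labels=labels, include_lowest=True)
    return (df.assign(prep_time_group=groups)
              .groupby("prep_time_group", observed=False)[list(value_cols)]
              .mean()
              .round(2))


# Toy data: four recipes with only a calories column.
toy = pd.DataFrame({"minutes": [10, 20, 25, 200],
                    "calories": [100.0, 200.0, 300.0, 400.0]})
table = nutrition_by_prep_time(toy, value_cols=["calories"])
print(table)
```

Buckets with no recipes show up as NaN because `observed=False` keeps every category in the output.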
Imputation
Prior to imputation there were missing values in name, description, user_id, recipe_id, date, rating, review, and avg_rating. This gave our team an opportunity to impute these values. We decided that constant imputation was only necessary for name, description, user_id, recipe_id, date, and review. Because these are textual (categorical) variables, filling them in with text indicating “No ___ were provided.” does not change the data in any fundamental way, and they weren’t very relevant to our analysis.
We decided not to impute rating and avg_rating, as these values are numerical and integral to our analysis. Imputing them with the mean would not accurately represent what reviewers actually entered, and it would likely introduce bias and change the overall analysis in an unpredictable way.
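As a rough illustration of this choice, constant imputation for the listed columns could look like the sketch below. The helper name `impute_constants` and the exact placeholder wording are assumptions, and the two-row DataFrame is made up for the example.

```python
import numpy as np
import pandas as pd

# Columns that receive a constant placeholder; rating and avg_rating are left alone.
CONSTANT_COLS = ["name", "description", "user_id", "recipe_id", "date", "review"]


def impute_constants(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values in the text-like columns with a placeholder string."""
    out = df.copy()
    for col in CONSTANT_COLS:
        if col in out.columns:
            # Cast to object first so a string can sit in a numeric column.
            # The placeholder text is illustrative, not the original wording.
            out[col] = out[col].astype(object).fillna(f"No {col} provided")
    return out


toy = pd.DataFrame({
    "name": ["toy soup", None],
    "user_id": [386585.0, np.nan],
    "rating": [5.0, np.nan],
})
filled = impute_constants(toy)
print(filled)
```

The missing rating stays NaN, so later averages only reflect ratings that reviewers actually gave.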
Framing a Prediction Problem
Prediction Problem
We aim to predict the calorie content of a recipe based on its nutritional components (for example: total fat, sugar, carbohydrates, etc.). We chose nutritional components as our features because this information is available to us in the dataset itself.
Problem Type
This is a regression problem because the target, calories, is a continuous numeric value.
Response Variable
The response variable we chose is calories. We chose calories because it is an important indicator for individuals who are tracking their diets for health, fitness, or medical reasons. Understanding how different nutritional components affect calories can give a more holistic view of eating healthier. We also chose it because South Asian food often has a lot of hidden calories, and we hoped these prediction models could encourage healthier recipes as people submit recipes along with their nutritional components.
Evaluation Metric
We will use Mean Absolute Error (MAE) to evaluate our model. It tells us, on average, how many calories our model’s predictions are off by. MAE weights all errors equally.
We will also use Mean Squared Error (MSE), as it provides an overall sense of prediction error, keeping in mind that it weights large errors more heavily.
We will not look at the R^2 score, as it isn’t intuitive for users and is a unitless value.
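To make the difference between the two metrics concrete, here is a small worked example with made-up numbers (not from our dataset). Both sets of predictions have the same MAE, but the one with a single large miss has a much higher MSE.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean of the absolute errors, in the same units as the target."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def mse(y_true, y_pred):
    """Mean of the squared errors, which grows quickly with large misses."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


actual = [200.0, 400.0, 600.0, 800.0]
even_errors = [250.0, 450.0, 650.0, 850.0]     # every prediction off by 50
one_big_miss = [200.0, 400.0, 600.0, 1000.0]   # one prediction off by 200

print(mae(actual, even_errors), mse(actual, even_errors))
print(mae(actual, one_big_miss), mse(actual, one_big_miss))
```

Both cases are off by 50 calories on average, yet the MSE is 2500 in the first case and 10000 in the second, which is why we read the two metrics together.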
Baseline Model
Model Description and Features
We built a baseline linear regression model to predict the number of calories in a recipe based on two nutritional components (carbohydrates and total fat).
Features
Quantitative Features
carbohydrates
: Carbohydrate content in Percent Daily Value (PDV%)
total_fat
: Total Fat in Percent Daily Value (PDV%)
Nominal Features
None
Ordinal Features
None
Response Variable
calories
: Calorie content, measured as a continuous value
Preprocessing Steps
We defined a sub-pipeline for the numerical features that uses SimpleImputer to replace missing values with the mean of each column.
We evaluated the model on a test set of 20% of the data.
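A minimal sketch of such a baseline is shown below, assuming scikit-learn. The helper names (`build_baseline`, `evaluate`), the random seed, and the synthetic data are all assumptions made for illustration, so the printed scores will not match the values we report.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

FEATURES = ["carbohydrates", "total_fat"]


def build_baseline() -> Pipeline:
    """Mean-impute the numeric features, then fit a linear regression."""
    numeric = Pipeline([("impute", SimpleImputer(strategy="mean"))])
    preprocess = ColumnTransformer([("num", numeric, FEATURES)])
    return Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])


def evaluate(model, df, target="calories", seed=42):
    """Hold out 20% of the rows and report MAE and MSE on them."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[target], test_size=0.2, random_state=seed)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    return {"mae": mean_absolute_error(y_test, preds),
            "mse": mean_squared_error(y_test, preds)}


# Synthetic stand-in data; the coefficients below are arbitrary.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"carbohydrates": rng.uniform(0, 50, 200),
                    "total_fat": rng.uniform(0, 60, 200)})
toy["calories"] = (12 * toy["carbohydrates"] + 6 * toy["total_fat"]
                   + rng.normal(0, 5, 200))
toy.loc[::25, "total_fat"] = np.nan  # a few gaps for the imputer to fill

scores = evaluate(build_baseline(), toy)
print(scores)
```

Keeping the imputer inside the pipeline means the column means are learned from the training split only, which avoids leaking information from the test set.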
Model Performance
MSE Value: 9719.29 (RMSE: 98.59) Interpretation: This measures the average squared difference between predicted and actual calorie values. Because the errors are squared, larger errors are penalized more heavily.
MAE Value: 55.75 Interpretation: Our baseline model’s calorie predictions are off by an average of 55.75 calories.
Is This a Good Model?
Based on an MAE of 55.75, if our recipes have hundreds of calories on average, being off by about 56 calories could be acceptable for users who just want a general understanding of how calorie-dense their meals are. However, for users who are looking for accurate predictions, an average error of 56 calories could be a real issue. An MSE of 9719 (an RMSE of about 98.6 calories) is high relative to the MAE, suggesting that some of the errors are large. Since we use only two features (total_fat and carbohydrates), the model probably does not capture how the other nutritional components (protein, sugar, etc.) affect calories. This is an okay baseline model, but not a great one, as it is not very precise.
Final Model
We introduce two engineered features to improve predictions:
sugar_to_protein_ratio
: The ratio between sugar and protein in a recipe. High sugar-to-protein ratios may indicate calorie-dense desserts, while lower ratios indicate recipes that are more protein-rich and possibly healthier. This ratio might carry more signal than sugar and protein taken separately.
total_macro_sum
: The sum of six of the nutritional components (total_fat, sugar, sodium, protein, saturated_fat, carbohydrates). Calories come from macronutrients such as fat, protein, and carbohydrates, so this feature acts as an overall proxy for nutrient density.
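The two engineered features could be computed along these lines. This is a sketch, not our exact code: the helper name is hypothetical, and adding 1 to the denominator is an assumption made here to avoid dividing by zero for recipes with no protein. The two example rows reuse nutrition values from the brownies and tacos rows shown earlier.

```python
import pandas as pd

MACRO_COLS = ["total_fat", "sugar", "sodium",
              "protein", "saturated_fat", "carbohydrates"]


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add sugar_to_protein_ratio and total_macro_sum columns."""
    out = df.copy()
    # The +1 is an assumed safeguard against division by zero.
    out["sugar_to_protein_ratio"] = out["sugar"] / (out["protein"] + 1)
    # Row-wise sum of the six PDV columns.
    out["total_macro_sum"] = out[MACRO_COLS].sum(axis=1)
    return out


toy = pd.DataFrame({
    "total_fat": [10.0, 26.0],
    "sugar": [50.0, 4.0],
    "sodium": [3.0, 6.0],
    "protein": [3.0, 39.0],
    "saturated_fat": [19.0, 39.0],
    "carbohydrates": [6.0, 0.0],
}, index=["brownies", "tacos"])
features = add_engineered_features(toy)
print(features[["sugar_to_protein_ratio", "total_macro_sum"]])
```

With this definition the brownies row gets a ratio of 12.5 and the tacos row a ratio of 0.1, which lines up with the dessert versus protein-rich intuition described above.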
Quantitative Features
carbohydrates
: Carbohydrate content in Percent Daily Value (PDV%)
total_fat
: Total fat in Percent Daily Value (PDV%)
Engineered Features
sugar_to_protein_ratio
: The ratio of sugar to protein in a recipe
total_macro_sum
: The sum of six key nutrient components
Nominal Features
None
Ordinal Features
None
Response Variable
calories
: Calorie content, measured as a continuous value
Modeling Algorithm
We used a Random Forest Regressor, chosen because it is robust to outliers and irrelevant features and can capture non-linear relationships between nutrients and calories.
Hyperparameter Tuning
When tuning our Random Forest Regressor, we wanted to choose hyperparameters that are high-impact and worth tuning. We chose the following hyperparameters to tune:
- n_estimators - Number of trees (more trees generally give more stable predictions, with diminishing returns and a higher training cost)
- max_depth - Maximum depth of each tree (too deep can mean overfitting, and too shallow can mean underfitting)
Best Hyperparameters
- n_estimators: 100
- max_depth: None
Our model performs best with 100 trees, which lets it capture complex patterns in the data. Placing no limit on tree depth allows each tree to learn complex relationships.
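A grid search over these two hyperparameters can be sketched as follows, assuming scikit-learn's `GridSearchCV`. The candidate values, the number of folds, the scoring choice, and the synthetic data are all assumptions for illustration and are kept small so the example runs quickly. They are not the grid we actually searched.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_forest(X, y, param_grid, cv=3, seed=42) -> GridSearchCV:
    """Cross-validated search over random forest hyperparameters."""
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid=param_grid,
        scoring="neg_mean_absolute_error",  # higher (closer to 0) is better
        cv=cv,
    )
    search.fit(X, y)
    return search


# Synthetic stand-in data with arbitrary coefficients.
rng = np.random.default_rng(0)
X = pd.DataFrame({"carbohydrates": rng.uniform(0, 50, 150),
                  "total_fat": rng.uniform(0, 60, 150)})
y = 12 * X["carbohydrates"] + 6 * X["total_fat"] + rng.normal(0, 5, 150)

# Hypothetical, deliberately tiny grid.
grid = {"n_estimators": [10, 30], "max_depth": [3, None]}
search = tune_forest(X, y, grid)
print(search.best_params_)
print("cross-validated MAE:", -search.best_score_)
```

Scoring with negative MAE keeps the search aligned with the metric we report, since `GridSearchCV` always maximizes its score.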
Model's Performance and Evaluation
MSE: 5336.18
MAE: 21.37
MAE dropped by ~62% relative to the baseline, and our model is now off by only about 21 calories on average. The decrease in MSE (~45% lower) also suggests that the model is handling large errors and outliers better. Overall, the final model captures the general patterns in the data.
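The relative improvements can be checked directly from the reported baseline and final scores. The helper name below is just for illustration.

```python
def percent_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, as a percentage."""
    return 100 * (before - after) / before


# Baseline vs. final scores reported in this write-up.
mae_drop = percent_drop(55.75, 21.37)
mse_drop = percent_drop(9719.29, 5336.18)

print(f"MAE drop: {mae_drop:.1f}%")
print(f"MSE drop: {mse_drop:.1f}%")
```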