When it comes to healthy food, many of us think of boiled, steamed, or baked dishes, without salt, without fat, and, alas, without taste! That is why the very thought of healthy eating can be upsetting. However, we have good news for those who struggle: data science can also help us make healthy food tasty.
Food pairing is a method that associates ingredients using information about their flavors (Ahn 2011). The flavor of a food is generated by combinations of molecules that our smell and taste receptors can sense: the flavor molecules. The set of flavor molecules and their amounts are unique to each food, which results in a unique flavor profile.
On the other hand, empirical knowledge about good food associations can be found in the huge body of recipes that exists today: a compilation of what men and women have built through the ages, through traditions, and through the innovations of some adventurous cooks.
Using flavor profiles of different foods, and the empirical knowledge of recipes, we can predict whether a given food combination is likely to be tasty. This is where the power of Machine Learning comes into use. The idea behind this project is to build a food pairing recommender system using conventional ML. This recommender system will give us the list of the ingredients which are likely to pair well with something we want to cook.
Throughout the story of the recommender system development, we will try to find good matches for broccoli. After all, everyone knows that there is nothing worse than plain boiled broccoli!
Overview of the solution
The recommender system (RECS) we build for food pairing relies on two algorithms:
Pointwise Mutual Information (PMI), which we use to find good associations of ingredients in existing cooking recipes.
K-Nearest Neighbors (KNN), which we use to evaluate the similarity of ingredients according to their flavor profiles.
Then, generating the recommendations for a given food (let's call it the query food, e.g., broccoli) can be done by combining two approaches (Fig. 1). With the “forward” pass, we first look for good pairs for broccoli, and then find foods similar to those pairs. With the “reverse” pass, we first look for foods similar to broccoli, and then find pairs for these foods.
Finally, we can combine the results of both approaches to obtain our recommender system.
Fig. 1 Scheme of the food pairing recommender system: forward pass algorithm in blue arrows, reverse pass algorithm in red arrows.
In some cases, we may lack flavor information for a given ingredient, or we may not find recipes that use this ingredient, but combining the two approaches allows us to formulate recommendations in any case.
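To make the two passes more concrete, here is a minimal sketch. The lookup tables `best_pairs` and `similar` are hypothetical stand-ins for the PMI and KNN modules described in the following sections, and their contents are invented for illustration. The way the two result lists are merged (a simple order-preserving union) is also an assumption, not necessarily how the actual system combines them.

```python
# Hypothetical lookup tables, invented for illustration only.
# best_pairs[x]: foods that pair well with x (would come from PMI on recipes).
# similar[x]: foods with a flavor profile close to x (would come from KNN).
best_pairs = {
    "broccoli": ["cheddar_cheese", "carrot"],
    "cauliflower": ["cheddar_cheese", "ham"],
}
similar = {
    "broccoli": ["cauliflower"],
    "cheddar_cheese": ["swiss_cheese", "feta_cheese"],
}


def forward_pass(query):
    """Find pairs of the query, then add foods similar to each pair."""
    candidates = []
    for pair in best_pairs.get(query, []):
        candidates.append(pair)
        candidates.extend(similar.get(pair, []))
    return candidates


def reverse_pass(query):
    """Find foods similar to the query, then collect pairs of each of them."""
    candidates = []
    for neighbor in similar.get(query, []):
        candidates.extend(best_pairs.get(neighbor, []))
    return candidates


def recommend(query):
    """Assumed merge: order-preserving union of both passes, minus the query."""
    merged = forward_pass(query) + reverse_pass(query)
    return [food for food in dict.fromkeys(merged) if food != query]


print(recommend("broccoli"))
```

Because both passes fall back to an empty list when a food is missing from one of the tables, a query that appears only in the recipes or only in the flavor data still gets candidates from the other pass.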
PMI: from recipes to pairings
Our first objective is to extract knowledge on food pairing from cooking recipes. From these data, we can find the best pairs for a given food using the pointwise mutual information value.
We use two recipe datasets, both related to the Flavor Network study (Ahn 2011). Together, these datasets originally contained 71 908 recipes of dishes from different regions, mostly from North America. To ease our evaluation of the recommendations, we chose to drop Asian and African recipes, which may contain rare ingredients, and also to drop alcohol-containing recipes. In the end, we get a set of 58 416 recipes.
Pointwise Mutual Information is a well-known NLP measure of how strongly two words are associated when they occur together. It compares the joint probability of two words with the product of their marginal probabilities. If the occurrences of the two words are independent events, the product of their marginal probabilities equals their joint probability, and their PMI equals zero. A positive PMI means the two words occur together more often than independence would predict.
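Written as a formula, with P(A) and P(B) the marginal probabilities and P(A, B) the joint probability, the standard definition is shown below. The natural logarithm is assumed here, since the snippet in this section uses np.log.

```latex
\mathrm{PMI}(A, B) = \log \frac{P(A, B)}{P(A)\,P(B)}
```

Under independence, P(A, B) = P(A) P(B), so the ratio is 1 and the logarithm is 0.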
We calculate P(A) and P(B), the probabilities of A and B food names occurrence, and the probability of their co-occurrence P(A, B). From these values, we compute PMI(A, B) for each pair of ingredients in the recipes.
import numpy as np

# P(A) = count(A) / total number of ingredient occurrences in all recipes
# P(A, B) = cooc(A, B) / total number of ingredient-pair co-occurrences in all recipes
total_ingredients = sum(ing_count.values())
total_pairs = sum(cooc_counts.values())
p_a = pmi_df.a_count / total_ingredients
p_b = pmi_df.b_count / total_ingredients
p_a_b = pmi_df.ab_count / total_pairs
# PMI is the natural log of the ratio between joint and independent probabilities
pmi_df['pmi'] = np.log(p_a_b / (p_a * p_b))
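For readers who want to see where the counts come from, here is a self-contained sketch on a toy recipe list. The recipes are invented for illustration, and the counting code is an assumption about how `ing_count` and `cooc_counts` could be built, using only the standard library instead of pandas.

```python
from collections import Counter
from itertools import combinations
import math

# Toy recipes, invented for illustration (not taken from the real datasets).
recipes = [
    {"broccoli", "cheddar_cheese", "macaroni"},
    {"broccoli", "cheddar_cheese"},
    {"macaroni", "tomato"},
    {"tomato", "basil"},
]

# Count single-ingredient occurrences and unordered pair co-occurrences.
ing_count = Counter(ing for recipe in recipes for ing in recipe)
cooc_counts = Counter(
    pair for recipe in recipes for pair in combinations(sorted(recipe), 2)
)

total_ingredients = sum(ing_count.values())
total_pairs = sum(cooc_counts.values())


def pmi(a, b):
    """PMI of two ingredients; None if they never appear in the same recipe."""
    ab_count = cooc_counts.get(tuple(sorted((a, b))), 0)
    if ab_count == 0:
        return None  # log(0) is undefined, so the pair is left unscored
    p_a = ing_count[a] / total_ingredients
    p_b = ing_count[b] / total_ingredients
    p_a_b = ab_count / total_pairs
    return math.log(p_a_b / (p_a * p_b))


for other in ("cheddar_cheese", "macaroni", "tomato"):
    print(other, pmi("broccoli", other))
```

In this toy data, broccoli co-occurs with cheddar cheese more often than with macaroni, so the first pair gets the higher PMI, and the broccoli and tomato pair is left unscored.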
KNN: flavor profiles similarity
In the second module of our recommender engine, we look for ingredients similar to the query ingredient. We do this with a KNN model that finds neighbors based on the similarity of their flavor profiles.
First of all, we need to prepare our data. We want our ingredients to be represented by vectors of flavor compounds, so that we can compare these vectors and measure their similarity. We thus create a DataFrame with the list of flavor compounds for each food, and then encode these lists as booleans. As a result, each food is represented by a boolean vector of length 1107, one position per flavor compound.
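import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# One boolean column per flavor compound id (0 .. n_compounds - 1)
binarizer = MultiLabelBinarizer(classes=range(n_compounds))
encoded_ingredients = pd.DataFrame(
    binarizer.fit_transform(flavors_df.compounds),
    index=flavors_df.index)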
Next, we use the flavor composition of each food, represented by boolean vectors, to build the KNN model. This model measures distances between foods using the Jaccard similarity between their binary flavor profiles. The Jaccard similarity is calculated with the following formula:
$$J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$$
M11 represents the number of flavor molecules common to the two foods.
M01 and M10 represent the numbers of flavor molecules present in only one of the two foods.
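The neighbor search can be illustrated with a brute-force sketch that uses only the standard library. It is a stand-in for the KNN model, not the model itself: the flavor profiles are invented sets of compound ids, and the function names are hypothetical.

```python
# Hypothetical flavor profiles: sets of compound ids, invented for illustration.
flavor_profiles = {
    "cheddar_cheese": {1, 2, 3, 4, 5},
    "swiss_cheese": {1, 2, 3, 4, 6},
    "feta_cheese": {1, 2, 3, 7},
    "carrot": {8, 9, 10},
}


def jaccard_distance(a, b):
    """Return 1 - M11 / (M01 + M10 + M11) for two sets of compound ids."""
    m11 = len(a & b)  # compounds present in both foods
    m10 = len(a - b)  # compounds present only in the first food
    m01 = len(b - a)  # compounds present only in the second food
    total = m11 + m10 + m01
    return 1.0 - m11 / total if total else 0.0


def nearest_neighbors(query, k):
    """Brute-force k nearest foods to `query`, excluding the query itself."""
    distances = [
        (food, jaccard_distance(flavor_profiles[query], profile))
        for food, profile in flavor_profiles.items()
        if food != query
    ]
    return sorted(distances, key=lambda item: item[1])[:k]


print(nearest_neighbors("cheddar_cheese", 2))
```

A distance of 0 means identical flavor profiles, and a distance of 1 means the two foods share no flavor compound at all, as is the case for cheddar cheese and carrot in this toy data.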
Combining KNN similarity with PMI
We calculate the final rating of each food in the list returned by KNN by combining two values. We take the PMI with the query food, normalize it with min-max scaling, and multiply it by the Jaccard similarity (which is 1 - distance). For instance, to rate feta cheese as a pair for broccoli, we use the PMI between cheddar and broccoli and the Jaccard similarity between the two cheeses:
$$\mathrm{rating}(\text{feta}, \text{broccoli}) = \mathrm{PMI}_{\text{norm}}(\text{cheddar}, \text{broccoli}) \times \bigl(1 - d_{J}(\text{feta}, \text{cheddar})\bigr)$$
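The same computation can be sketched in a few lines of Python. The PMI and distance values are the ones quoted in the tables later in this post. Which set of PMI values the min-max normalization runs over is not stated, so scaling over the top-10 range for broccoli (represented here by its highest and lowest entries) is an assumption, and the function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale a dict of scores to the [0, 1] range."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {key: 1.0 for key in values}
    return {key: (val - low) / (high - low) for key, val in values.items()}


def combined_rating(pmi_norm_value, jaccard_distance):
    """Normalized PMI multiplied by Jaccard similarity (1 - distance)."""
    return pmi_norm_value * (1.0 - jaccard_distance)


# PMI(food, broccoli): highest, cheddar, and lowest entries of the top-10 table.
# Assumption: normalization is done over this top-10 range.
pmi_with_broccoli = {
    "cauliflower": 3.565046,
    "cheddar_cheese": 1.850735,
    "brown_rice": 1.676568,
}
# Jaccard distances to cheddar cheese, from the neighbors table.
distance_to_cheddar = {"cheddar_cheese": 0.0, "feta_cheese": 0.151316}

pmi_norm = min_max_normalize(pmi_with_broccoli)
ratings = {
    food: combined_rating(pmi_norm["cheddar_cheese"], dist)
    for food, dist in distance_to_cheddar.items()
}
print(ratings)
```

With this scheme, a neighbor can never score higher than the paired food it was derived from, because the similarity factor is at most 1.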
So, what should we cook with broccoli?
Broccoli happens to be present in both our recipe and flavor datasets, so our recommendation will combine results from both passes.
1. Forward pass
In the first step of the forward recommender algorithm, we take the 10 foods with the highest PMI with broccoli. In this list, we can see macaroni, cheeses, mushroom, ham, and cashew.
Food PMI
cauliflower 3.565046
carrot 1.897348
macaroni 1.860217
cheddar_cheese 1.850735
swiss_cheese 1.806323
mushroom 1.794811
cashew 1.779868
sesame_oil 1.749150
ham 1.690183
brown_rice 1.676568
This sounds good, but we want more ingredients to combine with. We will search for ingredients that are similar to these using KNN to expand the recommendations list. With the KNN model, we found that other cheeses, like cottage, cream, swiss, and feta, are neighbors of cheddar cheese, so they may be good to pair with broccoli as well.
Similar food Jaccard distance
0 cheddar_cheese 0.000000
1 cottage_cheese 0.132450
2 cream_cheese 0.139073
3 swiss_cheese 0.150000
4 feta_cheese 0.151316
5 tilsit_cheese 0.152318
For each item in the list of foods with the top PMI(food, broccoli) values, we get 4 nearest neighbors, and the resulting recommendation list has 50 items. Here are the 30 items with the highest resulting ratings (Fig. 2).
Fig. 2 Food pairs for broccoli generated with the forward RECS
2. Reverse pass
With the reverse recommender algorithm, we first found nine foods similar to broccoli. Several of them are Brassica relatives such as cauliflower, radish, and brussels sprout, but the distance between broccoli and each of its neighbors is quite large (0.6 or more). These large distances affect the accuracy of the recommendations generated with the reverse approach and lower the resulting rating values.
Similar food Jaccard distance
0 broccoli 0.000000
1 cauliflower 0.600000
2 radish 0.809524
3 raw_radish 0.818182
4 sweet_potato 0.823529
5 parsley 0.827586
6 mustard 0.827586
7 dried_parsley 0.833333
8 brussels_sprout 0.842105
9 kohlrabi 0.857143
Next, we found ten pairs for each broccoli neighbor using PMI values, and combined similarities and PMI into a single rating. Here we present the top 30 foods recommended for broccoli by the reverse approach, ranked by resulting rating (Fig. 3).
Fig. 3 Food pairs for broccoli generated with reverse RECS
3. Final (combined) recommendation
Finally, we combine the results of both forward and reverse recommender systems to generate the best pairs for broccoli with the benefits of each algorithm (Fig. 4).
Fig. 4 Food pairs for broccoli generated with combined RECS
Conclusions
To sum up, with this recommender system we are able to propose pairs for any ingredient, leveraging precious information contained in thousands of recipes and in chemical descriptions of food, even though this information is incomplete.
As for us, we think we will try some broccoli with cheddar cheese topped with sesame oil, as this looks promising... bon appétit!
References
Flavor network and the principles of food pairing. Y.-Y. Ahn, S. Ahnert, J. P. Bagrow, and A.-L. Barabási. Scientific Reports 1, 196 (2011)
http://www.yongyeol.com/2011/12/15/paper-flavor-network.html