Writing an AI To Predict Ramen Quality:
You know what's great? Ramen! But you know what's less great? Soykaf ramen! So... what if we made an AI that could predict the quality of ramen before we buy it? That'd be pretty schway, and I have nothing else to do this afternoon, so it's happening.
Finding The Data:
Alright, like all AI projects, we need to start by finding a dataset. Luckily we've got one from here. This data set's got all sorts of information about each ramen dish, with its 5 star rating in the final column.
So what we're left with is the Brand, Style, Country of Origin, and of course the actual ratings themselves.
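Here's a rough sketch of what trimming the data down to those columns could look like. The header names (Brand, Style, Country, Stars) and the sample rows are my own stand-ins, not lines from the real file, so swap in whatever the actual CSV uses.

```python
import csv
import io

# Stand-in for the downloaded file. Header names and rows are made up
# for illustration and are NOT taken from the real dataset.
raw_csv = io.StringIO(
    "Review #,Brand,Variety,Style,Country,Stars\n"
    "1,BrandA,Spicy Chicken,Cup,Japan,3.75\n"
    "2,BrandB,Beef Flavor,Pack,USA,2.5\n"
    "3,BrandC,Shrimp Flavor,Bowl,South Korea,Unrated\n"
)

# The only feature columns we keep (assumed header names).
KEPT_COLUMNS = ["Brand", "Style", "Country"]

features = []
ratings = []
for row in csv.DictReader(raw_csv):
    try:
        stars = float(row["Stars"])
    except ValueError:
        continue  # skip rows whose rating is not a number
    features.append([row[column] for column in KEPT_COLUMNS])
    ratings.append(stars)

print(features)
print(ratings)
```

Skipping rows without a numeric rating is just one way to handle them; dropping them keeps the regression target clean without guessing a value.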
Pre-Processing The Data:
So our data is very stringy, as you can see. Except for the actual ratings, all of it's given to us as a bunch of names and tags. There are two common ways to go about vectorizing data like this: Ordinal Encoding and One Hot Encoding.
In Ordinal Encoding we assign each of the recurring strings its own number, and then replace all instances of that string with that number. So for example, a Style column holding Cup, Pack, Cup would turn into 0, 1, 0.
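To make the ordinal idea concrete, here's a tiny hand-rolled version in plain Python. The sample values are made up, and numbering the strings in order of first appearance is just my choice for the sketch (sklearn ships an OrdinalEncoder class that does this job properly).

```python
def ordinal_encode(values):
    """Map each distinct string to an integer, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # hand out the next unused integer
        encoded.append(codes[value])
    return encoded, codes


# Made-up sample values, not rows from the real dataset.
styles = ["Cup", "Pack", "Cup", "Bowl"]
encoded_styles, style_codes = ordinal_encode(styles)

print(encoded_styles)  # [0, 1, 0, 2]
print(style_codes)     # {'Cup': 0, 'Pack': 1, 'Bowl': 2}
```

One thing to keep in mind: the numbers imply an order (Bowl > Pack > Cup here) that doesn't really mean anything for this kind of data.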
In One Hot Encoding each distinct value is given its own slot in the input vector, holding a 1 where that value appears and a 0 everywhere else, so instead of one number per string we'd get a longer vector of zeros and ones for each row.
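And here's the same kind of toy sketch for one hot encoding, again in plain Python with made-up values. Sorting the categories alphabetically is an arbitrary choice on my part; sklearn's OneHotEncoder class is the real tool for this.

```python
def one_hot_encode(values):
    """Turn each string into a 0/1 row with a single 1 in its category's slot."""
    categories = sorted(set(values))
    slot = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[slot[value]] = 1  # mark the slot for this value
        rows.append(row)
    return rows, categories


# Made-up sample values, not rows from the real dataset.
styles = ["Cup", "Pack", "Cup", "Bowl"]
one_hot_rows, categories = one_hot_encode(styles)

print(categories)    # ['Bowl', 'Cup', 'Pack']
print(one_hot_rows)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

The vectors get wide fast when a column has lots of distinct values (like brand names), but no fake ordering sneaks in.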
Training The Network Itself:
So we're just going to use the sklearn module for Support Vector Regression. We're going to use a 75% training data, 25% testing data split, and we're just gonna shove it all in with the default settings.
One correction to my first guess here: SVR doesn't really train in epochs the way a neural network does. With the sklearn defaults the solver just runs until it converges.
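For the curious, the training step could look roughly like the sketch below. Big caveats: the rows are invented, and picking OneHotEncoder (rather than an ordinal one) plus a fixed random_state are my assumptions for the example, not necessarily what the downloadable project does.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVR

# Invented (Brand, Style, Country) rows and star ratings, for illustration only.
X = [
    ["BrandA", "Cup", "Japan"],
    ["BrandA", "Pack", "Japan"],
    ["BrandB", "Pack", "USA"],
    ["BrandB", "Cup", "USA"],
    ["BrandC", "Bowl", "South Korea"],
    ["BrandC", "Pack", "South Korea"],
    ["BrandA", "Bowl", "Japan"],
    ["BrandB", "Bowl", "USA"],
]
y = [4.0, 3.5, 2.5, 2.0, 4.5, 4.0, 3.75, 2.25]

# 75% training data, 25% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the encoder on the training rows only; unseen strings in the test
# rows become all-zero slots instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

# SVR with default settings: the solver runs until it converges.
model = SVR()
model.fit(encoder.transform(X_train), y_train)

predictions = model.predict(encoder.transform(X_test))
print(predictions)
```

Keeping the encoder and the model as two separate objects means each one can be pickled on its own.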
Obviously you can use any number of regression models for this problem, and you can put way more effort into them than I am, but I wanted to do this in a day and I didn't want to wait for a massive neural network to train.
Alright I know y'all are too lazy to download and run things yourself, so here's a demo of the program:
[Embedded demo]
Well, I trained it up and ran it over the test data. The average difference (just regular subtraction, not mean-squared or anything funky) between reality and the prediction was ~0.58 stars, which is pretty good considering the limited data set and the possible lack of a correlation. I might come back to this and do a more elaborate job some time later (because I'm certain I could crank out better predictions), but for now you can download the content below.
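If you want to compute that kind of score yourself, here's a minimal sketch. I'm assuming the "regular subtraction" gets an absolute value slapped on it before averaging (otherwise over- and under-estimates would cancel out), and the numbers below are made up, not the real test results.

```python
def mean_absolute_difference(actual, predicted):
    """Average of |actual - predicted| over all pairs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / len(actual)


# Made-up star ratings, for illustration only.
actual_stars = [3.5, 4.0, 2.0, 5.0]
predicted_stars = [3.0, 4.5, 3.0, 4.5]

score = mean_absolute_difference(actual_stars, predicted_stars)
print(score)  # 0.625
```

A score of 0.625 here would mean the predictions are off by a bit over half a star on average.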
Links and Downloads:
>> Project Folder <<
>> Just The Pickle File (Model) <<
>> Just The Pickle File (Encoder) <<