
Wineinformatics: Using Qualitative Measures to Answer Quantitative Questions – A Guest Post by David Morrison

The following is a guest post by David Morrison. Morrison grew up in Australia. He eventually acquired a PhD degree in plant biology, during which he became fascinated with wine. He started visiting wineries throughout Australia, just at the time when high-quality wineries were beginning to boom. Somewhere along the line he moved to Sweden, and in his semi-retirement he runs a blog called The Wine Gourd. In the interests of doing something different to every other wine blogger, this blog delves into the world of wine data instead. He is particularly interested in the relationship of wine quality to price, and in quantifying value-for-money wines.

There is a constant stream of discussion about wine reviewers and their activities. The wine reviews themselves, however, tend to contain one of only two things: word descriptions and/or numerical scores. I have recently listed a number of other approaches to reviewing wine (If not scores or words to describe wine quality, then what?), but none of these have really caught on.

The main issue with wine-quality scores is that they give the illusion of being mathematical without having any useful mathematical properties. This is unfortunate, because the apparent precision of the numbers gives an illusion of accuracy. Potentially useful mathematical properties include objectivity and repeatability. However, objectivity does not apply even for supposedly objective and repeatable scoring schemes, like the one developed at the University of California Davis, because the scores are an amalgam of disconnected sub-scores. The bigger problem, though, is lack of repeatability, especially between tasters. This causes confusion, because people naturally interpret numbers by comparing them to other numbers. After all, 5 apples is 5 apples no matter who counts them; but this simple mathematical property does not apply to wine-quality scores.

The main issue with word descriptions of wine, on the other hand, is the ambiguity of language, compared to using a mathematical language (see my post Are words better than numbers for describing wine quality?).

Ambiguity creates a number of limitations, including: imprecision, so that we can’t evaluate the wines; non-uniformity, so that we can’t compare descriptions; and impracticality, due to flowery or pompous wording (e.g. see The noble art of wine pretension).

One response to this situation has been the development of formal tasting sheets, where we simply tick the boxes or choose from among a pre-defined list of words; but professional wine commentators eschew such artifices.

This has led some people to try to quantitatively investigate the actual degree of uniformity and precision among wine descriptions by professional commentators. The one I will discuss here is:

Bernard Chen, Valentin Velchev, James Palmer and Travis Atkison (2018) Wineinformatics: a quantitative analysis of wine reviewers. Fermentation 4: 82.

This paper is interesting because it represents an attempt to deal with a rather difficult problem: turning words into numbers.

The authors have chosen to give their general approach the new name Wineinformatics, because they have taken on the challenging task of trying to quantitatively evaluate word descriptions of wine. This sort of language processing is relatively new; earlier work has clearly focused on mathematical analysis of numbers. This is why we developed the mathematical language in the first place, because quantitative analysis is then relatively easy. Trying to extract quantitative information from a set of words in any meaningful way is orders of magnitude harder.

The authors have been tackling this problem for the past few years, applying various methods from the field of Data Mining to wine sensory reviews. They have now produced a “brand-new dataset that contains more than 100,000 wine reviews”, derived from online reviews produced for the Wine Spectator magazine. In this new paper, they have used this dataset “to quantitatively evaluate the consistency of the Wine Spectator and all of its major reviewers”.

That is, do the reviewers use the same words to describe wines that get the same quality scores?


The approach to turning words into numbers is based on what the authors call the Computational Wine Wheel. In essence, this tries to find words held in common between the written reviews, while taking into account that many words are almost synonyms.

The idea, then, is to group possible synonyms together, representing a single concept. For example, the expressions “high tannins”, “full tannins”, “lush tannins” and “rich tannins” might all be grouped as Tannins_High. All reviews that contain any of these individual expressions would then be scored as possessing this single general concept.

This approach reduces the individually worded reviews from their original form into a standardized form. The end result is a binary matrix showing the presence or absence of each of the pre-defined concepts for each of the reviews in the dataset; some of the concepts are shown in the following word cloud. The 107,107 reviews in the dataset cover the years 2006–2015, including all wines with a quality score of 80+. There were 10 reviewers in the dataset, although one of them (Gillian Sciaretta) had only a small number of reviews (428) and was thus excluded from the analysis.
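
To make the idea of grouping synonyms and building a presence/absence matrix concrete, here is a minimal Python sketch. It is not the authors’ Computational Wine Wheel: only the Tannins_High group and its four phrases come from the text above, while the other dictionary entries, the example reviews and the simple substring matching are assumptions made purely for illustration.

```python
# Hypothetical mini-dictionary mapping phrases to concepts.
# Only "Tannins_High" and its four phrases are taken from the post;
# "Cherry" and "Finish_Long" are invented for this sketch.
CONCEPTS = {
    "high tannins": "Tannins_High",
    "full tannins": "Tannins_High",
    "lush tannins": "Tannins_High",
    "rich tannins": "Tannins_High",
    "cherry": "Cherry",
    "long finish": "Finish_Long",
}


def encode(review, concepts=CONCEPTS):
    """Return the set of concepts whose phrases occur in the review text."""
    text = review.lower()
    return {concept for phrase, concept in concepts.items() if phrase in text}


def binary_matrix(reviews, concepts=CONCEPTS):
    """Build one 0/1 row per review, with one column per concept."""
    columns = sorted(set(concepts.values()))
    rows = []
    for review in reviews:
        present = encode(review, concepts)
        rows.append([1 if c in present else 0 for c in columns])
    return columns, rows


# Invented example reviews (not from the Wine Spectator dataset).
reviews = [
    "Lush tannins frame the cherry flavors.",
    "Firm, with rich tannins and a long finish.",
    "Light and simple.",
]
columns, matrix = binary_matrix(reviews)
print(columns)
for row in matrix:
    print(row)
```

Two differently worded reviews end up sharing the Tannins_High column, which is the whole point of the standardization step.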

The dataset was divided into two classes: wines with a score of 90+ and those wines with 89- (wines that score 90+ are considered as either “outstanding” or “classic” by Wine Spectator). The idea of the data analysis was to find out whether the 90+ wines have any consistency in their descriptions (i.e. the same words are used) in contrast to the 89- wines. Amusingly, for the only wine presented as a worked example, the wine score unfortunately changes from 95 (before processing) to 90 (after processing)!

Two different classification approaches were compared: naïve Bayes and support vector machine. The distinction need not concern us here, but the general approach in both cases was to choose a subset of the data (say 20%) and “train” the classification algorithm to distinguish 90+ wines from 89- wines, based on the word concepts. Then, this newly trained algorithm is asked to “predict” the quality score (either 89- or 90+) of the remaining subset of the data (the other 80%). The resulting predictions are then compared to the known scores of that second subset. If the predictions match the known scores, then the training was successful; and we might then conclude that the reviewer was consistent in their descriptions.
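
The train-then-predict procedure can be sketched in a few lines of Python. What follows is a bare-bones Bernoulli naïve Bayes classifier written from scratch, offered as a stand-in rather than as the authors’ implementation. The three concept columns, the tiny training and test sets, and the split sizes are all invented for illustration; label 1 stands for a 90+ wine and 0 for an 89- wine.

```python
import math


def train_bernoulli_nb(X, y):
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing."""
    n_features = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        # P(concept present | class), smoothed so it is never exactly 0 or 1
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for one review row."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented data. Hypothetical columns: [Beauty, Seamless, Simple]
train_X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]
train_y = [1, 1, 0, 0, 0]
test_X = [[1, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 1]]
test_y = [1, 0, 0, 0]

model = train_bernoulli_nb(train_X, train_y)
predictions = [predict(model, x) for x in test_X]

# Share of held-out reviews whose predicted class matches the known class
agreement = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, agreement)
```

The closer the predictions on the held-out reviews come to the known classes, the more consistently the reviewer must have been using the underlying concepts.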

This processing is all very standard in data mining, and it produces four measures of reviewer consistency:

  • accuracy: proportion of all predictions that are correct
  • precision: proportion of the 90+ predictions that are correct
  • sensitivity: proportion of the 90+ wines that are predicted correctly
  • specificity: proportion of the 89- wines that are predicted correctly.
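
The four measures listed above can be computed directly from the known and predicted classes. This is a small sketch using the standard definitions; the function name and the example labels are assumptions for illustration, with 1 standing for a 90+ wine and 0 for an 89- wine.

```python
def consistency_measures(actual, predicted, positive=1):
    """Compute accuracy, precision, sensitivity and specificity."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # 90+ called 90+
    tn = sum(a != positive and p != positive for a, p in pairs)  # 89- called 89-
    fp = sum(a != positive and p == positive for a, p in pairs)  # 89- called 90+
    fn = sum(a == positive and p != positive for a, p in pairs)  # 90+ called 89-

    def ratio(num, den):
        # Undefined (NaN) when a class or prediction type never occurs
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Invented example: 3 wines scored 90+, 7 wines scored 89-
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
measures = consistency_measures(actual, predicted)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```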


The results include a couple of lengthy (and boring) tables showing the four consistency measures for each reviewer for each classification method, some of which is then repeated as
The two analysis methods turned out to differ somewhat, but were nevertheless in general agreement, which means that the results did not depend on the details of the data analysis. This is a good thing.

The accuracy and specificity were usually in the range 0.80–0.95, with the precision and sensitivity in the range 0.65–0.75. The latter proportions are rather mediocre; and this results from the fact that the majority of wines had scores below 90, which is the “false prediction” class for the data analysis. This makes it hard for the algorithms: they have to predict many more 89- wines than 90+ wines, but their success is based mainly on the 90+ wines; this is like running 20 miles but being judged only on your speed during the last mile.
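
A small worked example shows how an unbalanced dataset can produce this pattern. The counts below are entirely hypothetical (1,000 wines, of which 200 score 90+) and were chosen only so that the four measures land inside the ranges quoted above; they are not taken from the paper.

```python
# Hypothetical confusion counts for 1,000 wines, 200 of them scored 90+.
tp, fn = 140, 60   # 90+ wines predicted as 90+ / predicted as 89-
tn, fp = 740, 60   # 89- wines predicted as 89- / predicted as 90+

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# Labelling every wine 89- is still correct for all 800 of the 89- wines.
baseline_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.3f}  specificity={specificity:.3f}")
print(f"precision={precision:.3f}  sensitivity={sensitivity:.3f}")
print(f"always-89- baseline accuracy={baseline_accuracy:.3f}")
```

With these made-up numbers, the large 89- group props up accuracy and specificity, while precision and sensitivity depend only on the much smaller 90+ group.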

This bias in the data is clearly a serious limitation of the experimental design, but not one that is easy to handle, since there will always be more wines of 89- quality than 90+. Any attempt to subsample from the wines, to balance the two groups, will reduce the power of the data analysis, and it also risks creating unintended additional biases.

So, the authors focused on the accuracy measure, which is summarized in the table above for each reviewer and method. The authors’ general conclusion is that the reviewers did pretty well (>85%); in other words, they were much more consistent than not.

However, the reviewers did differ notably between each other. For example, Tim Fish and MaryAnn Worobiec produced higher accuracy than did Bruce Sanderson, Harvey Steiman or James Laube. The latter three didn’t do badly, by any means, but they were less consistent in their use of words to describe wines with a 90+ score.

So, what are some of these words that were used consistently to describe highly rated wines? The table below shows some of the words that were used most commonly by each reviewer and could be used to distinguish between 90+ versus 89- wines. The numbers shown for each word are [the number of times used for a 90+ wine] / [the total number of times used]. I don’t specifically know why reviewers Tim Fish and MaryAnn Worobiec do not appear in this list.
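
The per-word figures described above can be tallied as in the following sketch. The reviews and scores are invented, the vocabulary is a handful of words mentioned in this post, and counting each word at most once per review is an assumption (the paper may count usage differently).

```python
import re
from collections import Counter


def word_ratios(reviews, vocabulary, threshold=90):
    """Map each word to (reviews of 90+ wines using it, all reviews using it)."""
    high, total = Counter(), Counter()
    for text, score in reviews:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for word in vocabulary & words:
            total[word] += 1
            if score >= threshold:
                high[word] += 1
    return {word: (high[word], total[word]) for word in sorted(total)}


# Invented (review text, score) pairs for one hypothetical reviewer
reviews = [
    ("A beauty, seamless and long.", 94),
    ("Gorgeous fruit; a real beauty.", 92),
    ("Simple, with some power.", 87),
    ("A beauty in a lighter style.", 89),
]
vocabulary = {"beauty", "gorgeous", "power", "seamless"}

ratios = word_ratios(reviews, vocabulary)
for word, (n_high, n_total) in ratios.items():
    print(f"{word}: {n_high} / {n_total}")
```

A word whose two numbers are nearly equal is used almost exclusively for 90+ wines, which is what makes it useful for telling the two classes apart.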

You can see that these words are often rather generic. Moreover, they are mostly unique to a given person: only “beauty” appears for five out of the 7 reviewers, with “gorgeous”, “remarkable”, “seamless” and “seductive” appearing three times, and “power”, “terrific” and “wonderful” appearing twice.

What does this all mean?

We might conclude that the consistency of the Wine Spectator reviewers is at least 85%, on average. We can therefore use their word reviews as a guide to high wine quality, because each reviewer is consistent in their use of particular words; these words are not used in an arbitrary manner. In one sense this is not surprising, but in another sense it’s good to have it
However, there appears to be little consistency between the reviewers. That is, we have only local consistency, not general consistency across the magazine. Perhaps we are meant to use the numerical quality score for this purpose, but I discussed above the fact that this is definitely something we can’t do. It might therefore be interesting to try combining the data for all the reviewers, to try to quantify between-person consistency versus within-person consistency.

The authors don’t discuss the psychology behind their results. Are some people less consistent because they are willing to use a bigger vocabulary? Or are they less consistent because of more limited olfactory abilities? Either way, the authors are clearly interested in trying to relate the wine descriptions to physicochemical laboratory data from the wines, in future work.