Introduction

In this set of exercises we will work with the Wine Reviews dataset.

Run the following cell to load your data and some utility functions (including code to check your answers).

python
import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.indexing_selecting_and_assigning import *
print("Setup complete.")
Setup complete.

Look at an overview of your data by running the following line.

python
reviews.head()
| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |

Exercises

1.

Select the description column from reviews and assign the result to the variable desc.

python
# Your code here
desc = reviews.description

# Check your answer
q1.check()

Correct

Follow-up question: what type of object is desc? If you're not sure, you can check by calling Python's type function: type(desc).

python
q1.hint()
q1.solution()

Hint: As an example, say we would like to select the column column from a DataFrame table. Then we have two options: we can call either table.column or table["column"].

Solution:

python
desc = reviews.description

or

python
desc = reviews["description"]

desc is a pandas Series object, with an index matching the reviews DataFrame. In general, when we select a single column from a DataFrame, we'll get a Series.
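
The point about column selection returning a Series can be checked with a minimal sketch. The DataFrame `toy` below is made-up stand-in data, not part of the exercise; it also shows that passing a list of names keeps the DataFrame type instead.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
})

by_attribute = toy.country     # attribute-style access, returns a Series
by_bracket = toy["country"]    # bracket-style access, returns the same Series
as_frame = toy[["country"]]    # a list of names returns a one-column DataFrame

print(type(by_attribute))
print(type(as_frame))
print(by_attribute.index.equals(toy.index))  # the Series shares the DataFrame's index
```

Bracket access is the safer habit when a column name contains spaces or clashes with an existing DataFrame attribute, since attribute access cannot reach such columns.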

2.

Select the first value from the description column of reviews, assigning it to variable first_description.

python
first_description = reviews.description.iloc[0]

# Check your answer
q2.check()
first_description

Correct:

python
first_description = reviews.description.iloc[0]

Note that while this is the preferred way to obtain the entry in the DataFrame, many other options will return a valid result, such as reviews.description.loc[0], reviews.description[0], and more!

markdown
"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive,
offering unripened apple, citrus and dried sage alongside brisk acidity."
python
q2.hint()
q2.solution()

Hint: To obtain a specific entry (corresponding to column column and row i) in a DataFrame table, we can call table.column.iloc[i]. Remember that Python indexing starts at 0!

Solution:

python
first_description = reviews.description.iloc[0]

Note that while this is the preferred way to obtain the entry in the DataFrame, many other options will return a valid result, such as reviews.description.loc[0], reviews.description[0], and more!
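
One possible reason to prefer iloc here is that the alternatives only agree when the index labels happen to be 0, 1, 2, and so on. The sketch below uses a made-up Series `s` (an assumption, not exercise data) whose labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical Series whose index labels are NOT 0, 1, 2, ...
s = pd.Series(["first", "second", "third"], index=[10, 0, 5])

by_position = s.iloc[0]   # position 0, i.e. the first stored value
by_label = s.loc[0]       # the value whose index label is 0

print(by_position)
print(by_label)
```

In reviews the two agree because the index is the default integer range, but after sorting or filtering they can diverge.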

3.

Select the first row of data (the first record) from reviews, assigning it to the variable first_row.

python
first_row = reviews.iloc[0,:]

# Check your answer
q3.check()
first_row

Correct

country                                                    Italy
description    Aromas include tropical fruit, broom, brimston...
                                     ...                        
variety                                              White Blend
winery                                                   Nicosia
Name: 0, Length: 13, dtype: object
python
q3.hint()
q3.solution()

Hint: To obtain a specific row of a DataFrame, we can use the iloc operator. For more information, see the section on Index-based selection in the reference component.

Solution:

python
first_row = reviews.iloc[0]
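
As a small illustration of row selection with iloc, the sketch below (using a made-up DataFrame `toy`, which is an assumption) compares `iloc[0]` with `iloc[0, :]` and shows that the selected row is a Series indexed by the column names.

```python
import pandas as pd

# Hypothetical toy data, not the wine reviews.
toy = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 90]})

row_short = toy.iloc[0]      # first row
row_long = toy.iloc[0, :]    # same row, with an explicit "all columns" slice

print(row_short.equals(row_long))
print(list(row_short.index))  # the column names become the Series index
print(row_short.name)         # the row's index label becomes the Series name
```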

4.

Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions.

Hint: format your output as a pandas Series.

python
first_descriptions = reviews.description.iloc[:10]

# Check your answer
q4.check()
first_descriptions

Correct:

python
first_descriptions = reviews.description.iloc[:10]

Note that many other options will return a valid result, such as desc.head(10) and reviews.loc[:9, "description"].

0    Aromas include tropical fruit, broom, brimston...
1    This is ripe and fruity, a wine that is smooth...
                           ...                        
8    Savory dried thyme notes accent sunnier flavor...
9    This has great depth of flavor with its fresh ...
Name: description, Length: 10, dtype: object
python
q4.hint()
q4.solution()

Hint: We can use either the loc or iloc operator to solve this problem. For more information, see the sections on Index-based selection and Label-based selection in the reference component.

Solution:

python
first_descriptions = reviews.description.iloc[:10]

Note that many other options will return a valid result, such as desc.head(10) and reviews.loc[:9, "description"].
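
The equivalence of these options can be verified on a small made-up frame. The sketch assumes a default integer index, which is what makes the label slice `:9` line up with the first ten positions.

```python
import pandas as pd

# Hypothetical toy data with a default 0..24 index.
toy = pd.DataFrame({"description": [f"wine {i}" for i in range(25)]})

via_iloc = toy.description.iloc[:10]    # positions 0..9 (end excluded)
via_head = toy.description.head(10)     # first ten rows
via_loc = toy.loc[:9, "description"]    # labels 0..9 (end label included)

print(len(via_iloc), len(via_head), len(via_loc))
print(via_iloc.equals(via_head) and via_iloc.equals(via_loc))
```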

5.

Select the records with index labels 1, 2, 3, 5, and 8, assigning the result to the variable sample_reviews.

In other words, generate the following DataFrame:

python
rows = [1,2,3,5,8]
sample_reviews = reviews.loc[rows]

# Check your answer
q5.check()
sample_reviews

Correct

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 5 | Spain | Blackberry and raspberry aromas show a typical... | Ars In Vitro | 87 | 15.0 | Northern Spain | Navarra | NaN | Michael Schachner | @wineschach | Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... | Tempranillo-Merlot | Tandem |
| 8 | Germany | Savory dried thyme notes accent sunnier flavor... | Shine | 87 | 12.0 | Rheinhessen | NaN | NaN | Anna Lee C. Iijima | NaN | Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe... | Gewürztraminer | Heinz Eifel |
python
q5.hint()
q5.solution()

Hint: Use either the loc or iloc operator to select rows of a DataFrame.

Solution:

python
indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]
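
Two properties of selecting with a list of labels are worth seeing in isolation: the rows come back in the order requested, and a label that does not exist raises a KeyError. The sketch below uses a made-up DataFrame `toy` (an assumption, not exercise data).

```python
import pandas as pd

# Hypothetical toy data with index labels 0..4.
toy = pd.DataFrame({"points": [87, 88, 89, 90, 91]}, index=[0, 1, 2, 3, 4])

picked = toy.loc[[3, 1]]     # rows are returned in the requested order
print(list(picked.index))

missing_error = None
try:
    toy.loc[[1, 99]]         # label 99 is not in the index
except KeyError as err:
    missing_error = err

print(type(missing_error).__name__)
```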

6.

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100. In other words, generate the following DataFrame:

python
rows = [0,1,10,100]
columns_index = ["country","province","region_1","region_2"]
df = reviews.loc[rows,columns_index]

# Check your answer
q6.check()
df

Correct

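| | country | province | region_1 | region_2 |
|---|---|---|---|---|
| 0 | Italy | Sicily & Sardinia | Etna | NaN |
| 1 | Portugal | Douro | NaN | NaN |
| 10 | US | California | Napa Valley | Napa |
| 100 | US | New York | Finger Lakes | Finger Lakes |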
python
q6.hint()
q6.solution()

Hint: Use the loc operator. (Note that it is also possible to solve this problem using the iloc operator, but this would require extra effort to convert each column name to a corresponding integer-valued index.)

Solution:

python
cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]
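
The "extra effort" that iloc would need, as the hint mentions, could look like the following sketch. It is one possible approach, not the exercise's official solution: `Index.get_indexer` converts labels to integer positions, and the DataFrame `toy` is made-up illustration data.

```python
import pandas as pd

# Hypothetical toy data with non-consecutive index labels.
toy = pd.DataFrame(
    {
        "country": ["Italy", "US"],
        "points": [87, 90],
        "province": ["Sicily & Sardinia", "California"],
    },
    index=[0, 10],
)

wanted_cols = ["country", "province"]
col_positions = toy.columns.get_indexer(wanted_cols)  # positions of the named columns
row_positions = toy.index.get_indexer([0, 10])        # positions of the row labels

via_iloc = toy.iloc[row_positions, col_positions]
via_loc = toy.loc[[0, 10], wanted_cols]

print(list(col_positions))
print(via_iloc.equals(via_loc))
```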

7.

Create a variable df containing the country and variety columns of the first 100 records.

Hint: you may use loc or iloc. When working on the answer to this question and several of the ones that follow, keep in mind the following "gotcha" described in the tutorial:

iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. loc, meanwhile, indexes inclusively.

This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].
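
The difference in slice endpoints can be checked directly. This is a minimal sketch on a made-up DataFrame `toy` with a default integer index (an assumption for illustration).

```python
import pandas as pd

# Hypothetical toy data with index labels 0..1999.
toy = pd.DataFrame({"points": range(2000)})

n_iloc = len(toy.iloc[0:1000])      # positions 0..999, end excluded
n_loc = len(toy.loc[0:1000])        # labels 0..1000, end included
n_loc_fixed = len(toy.loc[0:999])   # labels 0..999, end included

print(n_iloc, n_loc, n_loc_fixed)   # prints: 1000 1001 1000
```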

python
df = reviews.loc[0:99,["country","variety"]]

# Check your answer
q7.check()
df

Correct:

python
cols = ['country', 'variety']
df = reviews.loc[:99, cols]

or

python
cols_idx = [0, 11]
df = reviews.iloc[:100, cols_idx]

| | country | variety |
|---|---|---|
| 0 | Italy | White Blend |
| 1 | Portugal | Portuguese Red |
| ... | ... | ... |
| 98 | Italy | Sangiovese |
| 99 | US | Bordeaux-style Red Blend |

100 rows × 2 columns

python
q7.hint()
q7.solution()

Hint: It is most straightforward to solve this problem with the loc operator. (However, if you decide to use iloc, remember to first convert each column into a corresponding integer-valued index.)

Solution:

python
cols = ['country', 'variety']
df = reviews.loc[:99, cols]

or

python
cols_idx = [0, 11]
df = reviews.iloc[:100, cols_idx]

8.

Create a DataFrame italian_wines containing reviews of wines made in Italy. Hint: reviews.country equals what?

python
italian_wines = reviews[reviews.country == "Italy"]

# Check your answer
q8.check()

Correct

python
q8.hint()
q8.solution()

Hint: For more information, see the section on Conditional selection in the reference component.

Solution:

python
italian_wines = reviews[reviews.country == 'Italy']
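
To see what the comparison inside the brackets produces, the sketch below splits the selection into two steps on a made-up DataFrame `toy` (an assumption, not exercise data). The comparison yields a boolean Series, and indexing with it keeps the rows marked True.

```python
import pandas as pd

# Hypothetical toy data, not the wine reviews.
toy = pd.DataFrame({"country": ["Italy", "US", "Italy"], "points": [87, 90, 92]})

mask = toy.country == "Italy"   # boolean Series aligned with toy's index
italian = toy[mask]             # keeps only the rows where mask is True

print(mask.tolist())
print(list(italian.index))
```

Writing `toy.loc[mask]` selects the same rows.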

9.

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.

python

top_oceania_wines = reviews.loc[reviews.country.isin(["Australia","New Zealand"])
                                & (reviews.points >= 95)]

# Check your answer
q9.check()
top_oceania_wines

Correct

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 345 | Australia | This wine contains some material over 100 year... | Rare | 100 | 350.0 | Victoria | Rutherglen | NaN | Joe Czerwinski | @JoeCz | Chambers Rosewood Vineyards NV Rare Muscat (Ru... | Muscat | Chambers Rosewood Vineyards |
| 346 | Australia | This deep brown wine smells like a damp, mossy... | Rare | 98 | 350.0 | Victoria | Rutherglen | NaN | Joe Czerwinski | @JoeCz | Chambers Rosewood Vineyards NV Rare Muscadelle... | Muscadelle | Chambers Rosewood Vineyards |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 122507 | New Zealand | This blend of Cabernet Sauvignon (62.5%), Merl... | SQM Gimblett Gravels Cabernets/Merlot | 95 | 79.0 | Hawke's Bay | NaN | NaN | Joe Czerwinski | @JoeCz | Squawking Magpie 2014 SQM Gimblett Gravels Cab... | Bordeaux-style Red Blend | Squawking Magpie |
| 122939 | Australia | Full-bodied and plush yet vibrant and imbued w... | The Factor | 98 | 125.0 | South Australia | Barossa Valley | NaN | Joe Czerwinski | @JoeCz | Torbreck 2013 The Factor Shiraz (Barossa Valley) | Shiraz | Torbreck |

49 rows × 13 columns

python
q9.hint()
q9.solution()

Hint: For more information, see the section on Conditional selection in the reference component.

Solution:

python
top_oceania_wines = reviews.loc[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]
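
As a sketch of how the combined condition works, the example below compares `isin` with an explicit `|` of two equality tests on a made-up DataFrame `toy` (an assumption for illustration). The parentheses around each comparison matter because `&` and `|` bind more tightly than `==` and `>=` in Python.

```python
import pandas as pd

# Hypothetical toy data, not the wine reviews.
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "Australia"],
    "points": [96, 94, 97, 95],
})

# Membership test combined with a points threshold.
with_isin = toy.loc[
    toy.country.isin(["Australia", "New Zealand"]) & (toy.points >= 95)
]

# The same filter spelled out with | between two equality tests.
with_or = toy.loc[
    ((toy.country == "Australia") | (toy.country == "New Zealand"))
    & (toy.points >= 95)
]

print(list(with_isin.index))
print(with_isin.equals(with_or))
```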