Introduction

The first step in most data analytics projects is reading the data file. In this exercise, you'll create Series and DataFrame objects, both by hand and by reading data files.

Run the code cell below to load libraries you will need (including code to check your answers).

python

import pandas as pd
pd.set_option('display.max_rows', 5)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.creating_reading_and_writing import *
print("Setup complete.")

Setup complete.

Exercises

1.

In the cell below, create a DataFrame fruits that looks like this:

python

# Your code goes here. Create a dataframe matching the above diagram and assign it to the variable fruits.
# The column values are scalars, so an explicit index supplies the single row label (0).
fruits = pd.DataFrame({"Apples": 30, "Bananas": 21}, index=[0])

# Check your answer
q1.check()
fruits

Correct

	Apples	Bananas
0	30	21

python

#q1.hint()
#q1.solution()
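
The answer to exercise 1 passes plain numbers as column values, which is why it also passes index=[0]. The following is a minimal sketch of that behaviour, assuming a reasonably recent pandas release; the names from_scalars, from_lists and scalar_error are illustrative and not part of the exercise.

```python
import pandas as pd

# Scalar column values carry no length, so pandas needs an explicit index.
from_scalars = pd.DataFrame({"Apples": 30, "Bananas": 21}, index=[0])

# Wrapping each value in a one-element list lets pandas infer a single row.
from_lists = pd.DataFrame({"Apples": [30], "Bananas": [21]})

# With all-scalar values and no index, pandas raises a ValueError.
try:
    pd.DataFrame({"Apples": 30, "Bananas": 21})
    scalar_error = None
except ValueError as err:
    scalar_error = str(err)

print(from_scalars)
print(from_lists)
print(scalar_error)
```

Either of the first two forms should satisfy a check that only looks at the resulting values and labels; the list form is often easier to extend when more rows are added later.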

2.

Create a dataframe fruit_sales that matches the diagram below:

python

# Your code goes here. Create a dataframe matching the above diagram and assign it to the variable fruit_sales.
# Each dict key is a column; the index argument supplies the row labels.
data = {'Apples': [35, 41], 'Bananas': [21, 34]}
fruit_sales = pd.DataFrame(data, index=['2017 Sales', '2018 Sales'])

# Check your answer
q2.check()
fruit_sales

Correct

	Apples	Bananas
2017 Sales	35	21
2018 Sales	41	34

python

q2.hint()
q2.solution()

Hint: Set the row labels in the DataFrame by using the index parameter in pd.DataFrame.

Solution:

python

fruit_sales = pd.DataFrame([[35, 21], [41, 34]], columns=['Apples', 'Bananas'],
                index=['2017 Sales', '2018 Sales'])
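
The learner's answer and the printed solution for exercise 2 build the same table in two different orientations. As a rough sketch (the variable names by_columns, by_rows and row_labels are hypothetical), the two constructions can be compared directly:

```python
import pandas as pd

row_labels = ["2017 Sales", "2018 Sales"]

# Column-oriented: each dict key is a column, each list holds that column's values.
by_columns = pd.DataFrame({"Apples": [35, 41], "Bananas": [21, 34]}, index=row_labels)

# Row-oriented: each inner list is one row; column names are passed separately.
by_rows = pd.DataFrame([[35, 21], [41, 34]], columns=["Apples", "Bananas"], index=row_labels)

# equals() compares values, labels and dtypes of the two frames.
same_frame = by_columns.equals(by_rows)
print(same_frame)
```

Which orientation reads better usually depends on how the source data is laid out: per-column lists suit data gathered field by field, while per-row lists suit data gathered record by record.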

3.

Create a variable ingredients with a Series that looks like:

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object

python


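# Values must match the target exactly, so no stray leading or trailing spaces.
data = ['4 cups', '1 cup', '2 large', '1 can']
items = ["Flour", "Milk", "Eggs", "Spam"]
# Assign to ingredients, the name that the exercise asks for and the cell displays.
ingredients = pd.Series(data, index=items, name='Dinner')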

# Check your answer
q3.check()
ingredients

Correct

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object

python

q3.hint()
q3.solution()

Hint: Note that the Series must be named "Dinner". Use the name keyword-arg when creating your series.

Solution:

python

quantities = ['4 cups', '1 cup', '2 large', '1 can']
items = ['Flour', 'Milk', 'Eggs', 'Spam']
recipe = pd.Series(quantities, index=items, name='Dinner')
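
A Series can also be built from a dict, in which case the keys become the index labels. This is only a sketch of an alternative to the list-plus-index form used in exercise 3; the name ingredients_from_dict is made up for illustration.

```python
import pandas as pd

# Dict keys become the index labels; the dict values become the data.
ingredients_from_dict = pd.Series(
    {"Flour": "4 cups", "Milk": "1 cup", "Eggs": "2 large", "Spam": "1 can"},
    name="Dinner",  # the name keyword sets the label shown as "Name: Dinner"
)

print(ingredients_from_dict.name)
print(list(ingredients_from_dict.index))
print(ingredients_from_dict["Eggs"])
```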

4.

Read the following csv dataset of wine reviews into a DataFrame called reviews:

The filepath to the csv file is ../input/wine-reviews/winemag-data_first150k.csv. The first few lines look like:

,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,"This tremendous 100% varietal wine[...]",Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and[...]",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez

python

# index_col=0 uses the file's unnamed first column as the row index.
reviews = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv', index_col=0)

# Check your answer
q4.check()
reviews

Correct

	country	description	designation	points	price	province	region_1	region_2	variety	winery
0	US	This tremendous 100% varietal wine hails from ...	Martha's Vineyard	96	235.0	California	Napa Valley	Napa	Cabernet Sauvignon	Heitz
1	Spain	Ripe aromas of fig, blackberry and cassis are ...	Carodorum Selección Especial Reserva	96	110.0	Northern Spain	Toro	NaN	Tinta de Toro	Bodega Carmen Rodríguez
...	...	...	...	...	...	...	...	...	...	...
150928	France	A perfect salmon shade, with scents of peaches...	Grand Brut Rosé	90	52.0	Champagne	Champagne	NaN	Champagne Blend	Gosset
150929	Italy	More Pinot Grigios should taste like this. A r...	NaN	90	15.0	Northeastern Italy	Alto Adige	NaN	Pinot Grigio	Alois Lageder

150930 rows × 10 columns

python

q4.hint()
q4.solution()

Hint: Note that the csv file begins with an unnamed column of increasing integers. We want this to be used as the index. Check out the description of the index_col keyword argument in the docs for read_csv.

Solution:

python

reviews = pd.read_csv('../input/wine-reviews/winemag-data_first150k.csv', index_col=0)
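
To see what index_col changes without loading the full wine file, the sketch below reads a tiny in-memory CSV with the same shape of header (a leading comma, so the first column has no name). The sample text and the names default_read and indexed_read are assumptions made for this illustration, not data from the exercise.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the header starts with a comma, so column 0 is unnamed.
csv_text = ",country,points\n0,US,96\n1,Spain,96\n"

# Without index_col, the unnamed column is kept as an ordinary data column.
default_read = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, that column becomes the row index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```

In the first frame pandas labels the nameless column "Unnamed: 0"; in the second it no longer appears among the columns because it is serving as the index.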

5.

Run the cell below to create and display a DataFrame called animals:

python

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals

	Cows	Goats
Year 1	12	22
Year 2	20	19

In the cell below, write code to save this DataFrame to disk as a csv file with the name cows_and_goats.csv.

python

# Your code goes here
animals.to_csv('cows_and_goats.csv')
# Check your answer
q5.check()

Correct

python

#q5.hint()
#q5.solution()
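
One way to check what to_csv writes is to round-trip the data. The sketch below avoids touching the disk by calling to_csv without a path, which returns the CSV text as a string, and then reads that text back; csv_text and restored are illustrative names, and the exercise itself still expects the file cows_and_goats.csv to be written.

```python
import io

import pandas as pd

animals = pd.DataFrame({"Cows": [12, 20], "Goats": [22, 19]}, index=["Year 1", "Year 2"])

# Called without a path, to_csv returns the CSV text instead of writing a file.
csv_text = animals.to_csv()
print(csv_text)

# The row labels were written as the first column, so index_col=0 restores them.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```

The header line of the text begins with a comma because the index has no name, which is the same pattern seen in the wine reviews file from exercise 4.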
