iii. Boilerplate code review¶
Below are just a few examples of basic Python programming to accomplish data saving and importing tasks.
Variable assignment¶
In Python, data are saved in variables.
Variable names should be simple and descriptive.
Assign a variable by typing its name to the left of the equals sign. Whatever is written to the right of the equals sign will be saved in the variable.
You could read this as “x is defined as one”, “two is assigned to y”, or “z is three”.
# define one variable
x = 1
# assign multiple variables
x = 1
y = 2
z = 3
Functions, arguments, and methods¶
Functions, arguments, and methods form the core user framework for Python programming.
Functions: Perform actions on a thing
Arguments: The “things” (text, values, expressions, datasets, etc.) that a function acts on
Note: “parameters” are the placeholder variables named in a function definition. Arguments are the values we pass into these placeholders when calling the function.
Methods: Type-specific functions (i.e., they belong to one type of data and cannot be used on other types). Use “dot” notation to call a method on a variable or other object.
For example, you could type
gap = pd.read_csv('data/gapminder-FiveYearData.csv')
to call read_csv(), reached with dot notation through the pandas library (imported under the alias pd), to load the Gapminder data.
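As a rough illustration of the terms above, the sketch below defines a small function and then calls a string method. The function name describe and its parameters label and value are invented for this example and are not part of the workshop materials.

```python
# 'label' and 'value' are parameters: placeholders named in the definition
# (describe is a hypothetical function written only for this sketch)
def describe(label, value):
    """Return a short sentence describing a value."""
    return label + " is " + str(value)

# "x" and 4 are arguments: the actual values passed in when calling
sentence = describe("x", 4)
print(sentence)

# upper() is a string method, reached with dot notation on a str variable
shouted = sentence.upper()
print(shouted)
```

The method upper() only exists for strings, so calling it on an integer such as 4 would raise an error.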
Data types¶
Everything in Python has a type, which determines how we can manipulate that piece of data. Be careful: it is easy to get confused when trying to complete multiple tasks that use lots of different variables!
# float (decimals)
# use a decimal to create floats
pi = 3.14
print(type(pi))
<class 'float'>
# integer (whole numbers)
# do not use a decimal for integers
amount = 4
print(type(amount))
<class 'int'>
# string (text)
# wrap text data in quotations
welcome = "Welcome to Stanford Libraries"
print(type(welcome))
<class 'str'>
# boolean (logical)
# True or False (stored as 1 and 0)
print(type(True))
print(False - True)
<class 'bool'>
-1
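The result of False - True above follows from booleans behaving like the integers 0 and 1 in arithmetic. A minimal sketch of checking types with isinstance(), reusing the amount variable from above (the flags list is made up for illustration):

```python
# bool is a subclass of int, which is why False - True evaluates to -1
print(isinstance(True, int))
print(True + True)

# isinstance() checks whether a value has a given type
amount = 4
print(isinstance(amount, int))
print(isinstance(amount, float))

# sum() over booleans counts how many are True
flags = [True, False, True]  # hypothetical example values
n_true = sum(flags)
print(n_true)
```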
Addition examples with strings versus numbers¶
# character strings
'1' + '1'
'11'
# integers
1 + 1
2
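When a number arrives as text, it can be converted before doing arithmetic. The sketch below uses the built-in int(), str(), and float() functions; the variable names are chosen only for this example.

```python
# a string that looks like a number is still text
text_number = '1'

# int() converts the string to an integer so arithmetic works
total = int(text_number) + 1
print(total)

# str() converts a number to text so the + operator joins strings
joined = text_number + str(1)
print(joined)

# float() converts the string to a decimal number
as_float = float(text_number)
print(as_float)
```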
Data structures¶
Data can be stored in a variety of ways.
Indexing¶
Python is a zero-indexed programming language, which means that you start counting from zero. Thus, the first element in a collection is referenced by 0 instead of 1.
List¶
Lists are ordered groups of data that are both created and indexed (positionally referenced) with square brackets [].
animals = ['shark', 'dolphin']
animals[0]
'shark'
animals = ['shark', 'dolphin', ['dog', 'cat'], ['tree', 'cactus']]
print(animals[3][0])
print(animals[2][1])
tree
cat
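Beyond single positions, lists also support negative indexes and slices. A short sketch using the same animals list (the variable names n_items, last_item, and first_two are made up here):

```python
animals = ['shark', 'dolphin', ['dog', 'cat'], ['tree', 'cactus']]

# len() counts the top-level elements; valid indexes run from 0 to len - 1
n_items = len(animals)
print(n_items)

# negative indexes count backwards from the end of the list
last_item = animals[-1]
print(last_item)

# a slice [start:stop] returns elements from start up to, but not including, stop
first_two = animals[0:2]
print(first_two)
```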
Dictionary¶
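Dictionaries are groups of “key:value” pairs. Since Python 3.7 they keep the order in which pairs were inserted, but values are looked up by key rather than by position. Use the key to access the value.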
apple = {'name': 'apple', 'color': ['red', 'green'], 'recipes': ['pie', 'salad', 'sauce']}
orange = {'name': 'orange', 'color': 'orange', 'recipes': ['juice', 'marmalade', 'gratin']}
fruits = {'fruits': [apple, orange]}
fruits
{'fruits': [{'name': 'apple',
'color': ['red', 'green'],
'recipes': ['pie', 'salad', 'sauce']},
{'name': 'orange',
'color': 'orange',
'recipes': ['juice', 'marmalade', 'gratin']}]}
fruits['fruits'][1]['recipes'][0]
'juice'
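A few common dictionary operations, sketched with the apple dictionary from above. The 'origin' and 'price' keys and the 0.5 value are invented for illustration only.

```python
apple = {'name': 'apple', 'color': ['red', 'green'], 'recipes': ['pie', 'salad', 'sauce']}

# keys() lists the keys available in the dictionary
original_keys = list(apple.keys())
print(original_keys)

# get() returns a default instead of raising KeyError when a key is missing
origin = apple.get('origin', 'unknown')
print(origin)

# assigning to a new key adds a key:value pair (made-up value)
apple['price'] = 0.5
print('price' in apple)
```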
Import text data as a character string¶
Use the open().read() Python convention to import text as a single string.
frank = open('data/frankenstein.txt').read()
# print only the first 1000 characters
print(frank[:1000])
The Project Gutenberg eBook of Frankenstein, by Mary Wollstonecraft (Godwin) Shelley
This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United States, you
will have to check the laws of the country where you are located before
using this eBook.
Title: Frankenstein
or, The Modern Prometheus
Author: Mary Wollstonecraft (Godwin) Shelley
Release Date: 31, 1993 [eBook #84]
[Most recently updated: November 13, 2020]
Language: English
Character set encoding: UTF-8
Produced by: Judith Boss, Christy Phillips, Lynn Hanninen, and David Meltzer. HTML version by Al Haines.
Further corrections by Menno de Leeuw.
*** START OF THE PROJECT GUTENBERG EBOOK FRANKENSTEIN ***
Frankenstein;
or, the Modern Prom
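The open().read() one-liner leaves closing the file to Python. A commonly used alternative is the with statement, which closes the file as soon as the block ends. The sketch below writes a small throwaway file first so that it runs anywhere; the file name sample.txt and its contents are assumptions made for this example.

```python
import os
import tempfile

# create a small throwaway text file (name and contents are invented)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('First line of text.\nSecond line of text.')

# the with statement closes the file automatically after reading
with open(path, encoding='utf-8') as f:
    text = f.read()

# the whole file is now one string, so it can be sliced like frank above
print(type(text))
print(text[:10])
```

Stating the encoding explicitly avoids surprises on systems where the default is not UTF-8.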
Import data frames with the pandas library¶
Data frames are programming speak for tabular spreadsheets organized into rows and columns and often stored in .csv format.
# Step 1. link the pandas library to our current notebook
import pandas as pd
# Step 2. enter the file path in pandas's read_csv() function
gap = pd.read_csv("data/gapminder-FiveYearData.csv")
# Step 3. view the data
print(gap)
country year pop continent lifeExp gdpPercap
0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314
1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030
2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710
3 Afghanistan 1967 11537966.0 Asia 34.020 836.197138
4 Afghanistan 1972 13079460.0 Asia 36.088 739.981106
... ... ... ... ... ... ...
1699 Zimbabwe 1987 9216418.0 Africa 62.351 706.157306
1700 Zimbabwe 1992 10704340.0 Africa 60.377 693.420786
1701 Zimbabwe 1997 11404948.0 Africa 46.809 792.449960
1702 Zimbabwe 2002 11926563.0 Africa 39.989 672.038623
1703 Zimbabwe 2007 12311143.0 Africa 43.487 469.709298
[1704 rows x 6 columns]
gap
 | country | year | pop | continent | lifeExp | gdpPercap |
---|---|---|---|---|---|---|
0 | Afghanistan | 1952 | 8425333.0 | Asia | 28.801 | 779.445314 |
1 | Afghanistan | 1957 | 9240934.0 | Asia | 30.332 | 820.853030 |
2 | Afghanistan | 1962 | 10267083.0 | Asia | 31.997 | 853.100710 |
3 | Afghanistan | 1967 | 11537966.0 | Asia | 34.020 | 836.197138 |
4 | Afghanistan | 1972 | 13079460.0 | Asia | 36.088 | 739.981106 |
... | ... | ... | ... | ... | ... | ... |
1699 | Zimbabwe | 1987 | 9216418.0 | Africa | 62.351 | 706.157306 |
1700 | Zimbabwe | 1992 | 10704340.0 | Africa | 60.377 | 693.420786 |
1701 | Zimbabwe | 1997 | 11404948.0 | Africa | 46.809 | 792.449960 |
1702 | Zimbabwe | 2002 | 11926563.0 | Africa | 39.989 | 672.038623 |
1703 | Zimbabwe | 2007 | 12311143.0 | Africa | 43.487 | 469.709298 |
1704 rows × 6 columns
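Once a data frame is loaded, a few attributes give a quick overview of it. The sketch below assumes pandas is installed and reads a tiny made-up CSV from memory instead of the Gapminder file, so the values shown are not real data.

```python
import io
import pandas as pd

# a tiny invented CSV held in memory; read_csv accepts file-like objects as well as paths
csv_text = """country,year,pop
A,2000,100
A,2005,110
B,2000,200
"""
df = pd.read_csv(io.StringIO(csv_text))

# shape is (number of rows, number of columns)
print(df.shape)

# the column names
print(list(df.columns))

# the first two rows
print(df.head(2))

# select one column by name with square brackets
pop_values = df['pop'].tolist()
print(pop_values)
```

Square brackets are used for the pop column because df.pop would refer to a data frame method of the same name.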
Challenge¶
Open JupyterLab. Try to import a:
different .txt file
different .csv file
If you encounter error messages, which ones?
Error messages¶
Learning Python can feel creative and deeply frustrating at the same time. Just remember that everyone encounters errors, and lots of them. When you do, start debugging by investigating the type of error message you receive.
Scroll to the end of the error message and read the last line to find the type of error.
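The error type on that last line can also be captured in code. The following is a minimal sketch using try/except; the names checks, error_names, and undefined_name are made up for this example.

```python
# each lambda wraps one failing expression so one error does not stop the others
checks = [
    lambda: undefined_name,   # a name that was never assigned
    lambda: "5" + 5,          # text plus a number
    lambda: ['a', 'b'][5],    # position 5 in a two-element list
]

error_names = []
for check in checks:
    try:
        check()
    except Exception as err:
        # type(err).__name__ is the error type shown on the last line of a traceback
        error_names.append(type(err).__name__)
        print(type(err).__name__ + ':', err)

print(error_names)
```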
Challenge¶
In JupyterLab, uncomment (remove the leading #) the line of code for each error message below
Run each one
Inspect the error messages
Syntax errors¶
Invalid syntax
You have entered something Python does not understand.
# x 89 5
Indentation
Your indentation does not conform to the rules
### indentation
# def example():
# test = "this is an example function"
# print(test)
# return example
Runtime errors¶
Name
You try to use a variable you have not yet assigned
# x
Or, you try to call a function from a library that you have not yet imported
# example()
Type
You write code with incompatible types
# "5" + 5
Index
You try to reference something that is out of range
my_list = ['green', True, 0.5, 4, ['cat', 'dog', 'pig']]
# my_list[5]
File errors¶
File not found
You try to import something that does not exist
# document = open('fakedocument.txt').read()
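One possible way to guard against a missing file is to check for it first, or to catch the specific error. This sketch uses pathlib from the standard library and assumes that no file named fakedocument.txt exists in the working directory.

```python
from pathlib import Path

path = Path('fakedocument.txt')

# option 1: check that the file exists before opening it
file_exists = path.exists()
print(file_exists)

# option 2: attempt the import and catch the specific error
try:
    document = path.read_text(encoding='utf-8')
except FileNotFoundError:
    document = None
    print('Could not find the file:', path)
```

Catching FileNotFoundError rather than every exception keeps other problems, such as a permissions error, visible.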