{ "cells": [ { "cell_type": "markdown", "id": "f540f43d", "metadata": {}, "source": [ "# iv. Numeric data wrangling\n", "\n", "> For text preprocessing, see the January 19 notebook." ] }, { "cell_type": "markdown", "id": "f7edf111", "metadata": {}, "source": [ "![wrangle](img/wrangle.png)" ] }, { "cell_type": "markdown", "id": "92f03b45", "metadata": {}, "source": [ "## The pandas library\n", "\n", "Import the pandas library with the alias pd. \n", "\n", "Use \"dot notation\" to apply its methods to the dataset, which is stored in a tabular .csv file. \n", "\n", "To import the .csv file, use the `.read_csv()` pandas method. The only argument is the file path to a .csv file. \n", "\n", "Import the data below and investigate it by applying pandas methods to the data frame `gap`. \n", "\n", "[Learn more about the Gapminder data](https://www.gapminder.org/data/)" ] }, { "cell_type": "code", "execution_count": 2, "id": "86631ea0", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "id": "245aa2f0", "metadata": {}, "outputs": [], "source": [ "gap = pd.read_csv(\"data/gapminder-FiveYearData.csv\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "da27c125", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "print(type(gap))" ] }, { "cell_type": "markdown", "id": "4ee06727", "metadata": {}, "source": [ "## Pandas methods\n", "\n", "Just a small handful of pandas methods will help you accomplish several key data wrangling tasks. 
See the example code below.\n", "\n", "* First, look at the data\n", "* Summary statistics\n", "* Subset rows: \n", " * first row\n", " * first three rows\n", " * rows 10 thru 14\n", "* Subset columns: \n", " * one column\n", " * multiple columns\n", "* Row and column subset\n", "* Subset by logical condition(s)" ] }, { "cell_type": "markdown", "id": "6e682e67", "metadata": {}, "source": [ "## First, look at the data\n", "\n", "Use pandas methods to learn about the data!" ] }, { "cell_type": "markdown", "id": "69bffc0a", "metadata": {}, "source": [ "### `.head()`" ] }, { "cell_type": "code", "execution_count": 5, "id": "bf0813f7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
0Afghanistan19528425333.0Asia28.801779.445314
1Afghanistan19579240934.0Asia30.332820.853030
2Afghanistan196210267083.0Asia31.997853.100710
3Afghanistan196711537966.0Asia34.020836.197138
4Afghanistan197213079460.0Asia36.088739.981106
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314\n", "1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030\n", "2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710\n", "3 Afghanistan 1967 11537966.0 Asia 34.020 836.197138\n", "4 Afghanistan 1972 13079460.0 Asia 36.088 739.981106" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# look at first five rows by default\n", "gap.head()" ] }, { "cell_type": "markdown", "id": "44f40ba7", "metadata": {}, "source": [ "### `.columns`" ] }, { "cell_type": "code", "execution_count": 6, "id": "eb8ff873", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View column names\n", "# this is an attribute instead of a method - note the lack of parentheses ()\n", "gap.columns" ] }, { "cell_type": "markdown", "id": "f7530562", "metadata": {}, "source": [ "### `.shape`" ] }, { "cell_type": "code", "execution_count": 7, "id": "cb4967d3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1704, 6)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show number of rows by columns\n", "# this is an attribute instead of a method - note the lack of parentheses ()\n", "gap.shape" ] }, { "cell_type": "markdown", "id": "3b902796", "metadata": {}, "source": [ "## Summary statistics\n", "\n", "Produce summary statistics, including: \n", "* Count, mean, sd, quartiles, min/max\n", "* Tabulate frequencies" ] }, { "cell_type": "markdown", "id": "53918b33", "metadata": {}, "source": [ "### `.describe()`" ] }, { "cell_type": "code", "execution_count": 8, "id": "7d45c9cd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearpoplifeExpgdpPercap
count1704.000001.704000e+031704.0000001704.000000
mean1979.500002.960121e+0759.4744397215.327081
std17.265331.061579e+0812.9171079857.454543
min1952.000006.001100e+0423.599000241.165876
25%1965.750002.793664e+0648.1980001202.060309
50%1979.500007.023596e+0660.7125003531.846988
75%1993.250001.958522e+0770.8455009325.462346
max2007.000001.318683e+0982.603000113523.132900
\n", "
" ], "text/plain": [ " year pop lifeExp gdpPercap\n", "count 1704.00000 1.704000e+03 1704.000000 1704.000000\n", "mean 1979.50000 2.960121e+07 59.474439 7215.327081\n", "std 17.26533 1.061579e+08 12.917107 9857.454543\n", "min 1952.00000 6.001100e+04 23.599000 241.165876\n", "25% 1965.75000 2.793664e+06 48.198000 1202.060309\n", "50% 1979.50000 7.023596e+06 60.712500 3531.846988\n", "75% 1993.25000 1.958522e+07 70.845500 9325.462346\n", "max 2007.00000 1.318683e+09 82.603000 113523.132900" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# produce summary statistics for numeric data\n", "gap.describe()" ] }, { "cell_type": "markdown", "id": "dcc9e580", "metadata": {}, "source": [ "### `.mean()` and `.std()`" ] }, { "cell_type": "code", "execution_count": 53, "id": "7e706097", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "59.47443936619714\n", "12.917107415241187\n" ] } ], "source": [ "# calculate mean and standard deviation of lifeExp\n", "lifeExp_mean = gap[\"lifeExp\"].mean()\n", "lifeExp_sd = gap[\"lifeExp\"].std()\n", "print(lifeExp_mean)\n", "print(lifeExp_sd)" ] }, { "cell_type": "markdown", "id": "be4f8fef", "metadata": {}, "source": [ "### `.groupby()` and `.count()`" ] }, { "cell_type": "code", "execution_count": 9, "id": "c06d3ce5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "continent\n", "Africa 624\n", "Americas 300\n", "Asia 396\n", "Europe 360\n", "Oceania 24\n", "Name: country, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of countries grouped by each continent\n", "gap.groupby(\"continent\").count()[\"country\"]" ] }, { "cell_type": "markdown", "id": "0e09b80a", "metadata": {}, "source": [ "## Subset rows or columns\n", "\n", "Sampling data is necessary for many reasons, including quick sanity checks. 
\n", "\n", "Slice a data frame by using bracket notation to specify start and end points `[start : end]`\n", "\n", "The `[start ` index is _included_ and the ` end]` index is **excluded**. \n", "\n", "> Remember that Python is a zero-indexed language, so starts counting from zero, not one.\n", "\n", "Leave the start or end values blank to start from the beginning, or go to the end of a collection. " ] }, { "cell_type": "markdown", "id": "656fb7c8", "metadata": {}, "source": [ "### Row subset: slice just the first row" ] }, { "cell_type": "code", "execution_count": 85, "id": "b97402ad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
0Afghanistan19528425333.0Asia28.801779.445314
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gap[:1]" ] }, { "cell_type": "markdown", "id": "fee70150", "metadata": {}, "source": [ "### Row subset: slice first three rows" ] }, { "cell_type": "code", "execution_count": 86, "id": "17578ed4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
0Afghanistan19528425333.0Asia28.801779.445314
1Afghanistan19579240934.0Asia30.332820.853030
2Afghanistan196210267083.0Asia31.997853.100710
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314\n", "1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030\n", "2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gap[:3]" ] }, { "cell_type": "markdown", "id": "f07299a7", "metadata": {}, "source": [ "### Row subset: slice rows 10 thru 14" ] }, { "cell_type": "code", "execution_count": 87, "id": "8c00e332", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
10Afghanistan200225268405.0Asia42.129726.734055
11Afghanistan200731889923.0Asia43.828974.580338
12Albania19521282697.0Europe55.2301601.056136
13Albania19571476505.0Europe59.2801942.284244
14Albania19621728137.0Europe64.8202312.888958
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "10 Afghanistan 2002 25268405.0 Asia 42.129 726.734055\n", "11 Afghanistan 2007 31889923.0 Asia 43.828 974.580338\n", "12 Albania 1952 1282697.0 Europe 55.230 1601.056136\n", "13 Albania 1957 1476505.0 Europe 59.280 1942.284244\n", "14 Albania 1962 1728137.0 Europe 64.820 2312.888958" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subset1 = gap[10:15]\n", "subset1" ] }, { "cell_type": "markdown", "id": "a506d352", "metadata": {}, "source": [ "### Column subset: one column" ] }, { "cell_type": "code", "execution_count": 88, "id": "a5e85c66", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 28.801\n", "1 30.332\n", "2 31.997\n", "3 34.020\n", "4 36.088\n", " ... \n", "1699 62.351\n", "1700 60.377\n", "1701 46.809\n", "1702 39.989\n", "1703 43.487\n", "Name: lifeExp, Length: 1704, dtype: float64" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# type the column name as a string in square brackets\n", "gap['lifeExp']" ] }, { "cell_type": "markdown", "id": "c4e64819", "metadata": {}, "source": [ "### Column subset: multiple columns" ] }, { "cell_type": "code", "execution_count": 89, "id": "57f1d01a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
continentlifeExpgdpPercap
0Asia28.801779.445314
1Asia30.332820.853030
2Asia31.997853.100710
3Asia34.020836.197138
4Asia36.088739.981106
............
1699Africa62.351706.157306
1700Africa60.377693.420786
1701Africa46.809792.449960
1702Africa39.989672.038623
1703Africa43.487469.709298
\n", "

1704 rows × 3 columns

\n", "
" ], "text/plain": [ " continent lifeExp gdpPercap\n", "0 Asia 28.801 779.445314\n", "1 Asia 30.332 820.853030\n", "2 Asia 31.997 853.100710\n", "3 Asia 34.020 836.197138\n", "4 Asia 36.088 739.981106\n", "... ... ... ...\n", "1699 Africa 62.351 706.157306\n", "1700 Africa 60.377 693.420786\n", "1701 Africa 46.809 792.449960\n", "1702 Africa 39.989 672.038623\n", "1703 Africa 43.487 469.709298\n", "\n", "[1704 rows x 3 columns]" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note the double sets of brackets\n", "subset2 = gap[['continent', 'lifeExp', 'gdpPercap']]\n", "subset2" ] }, { "cell_type": "markdown", "id": "7dc672ec", "metadata": {}, "source": [ "### Row and column subset" ] }, { "cell_type": "code", "execution_count": 95, "id": "7fc903a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
continentlifeExpgdpPercap
855Asia64.62480894.88326
856Asia67.712109347.86700
857Asia69.34359265.47714
858Asia71.30931354.03573
\n", "
" ], "text/plain": [ " continent lifeExp gdpPercap\n", "855 Asia 64.624 80894.88326\n", "856 Asia 67.712 109347.86700\n", "857 Asia 69.343 59265.47714\n", "858 Asia 71.309 31354.03573" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# subset more than one column and rows 855 thru 858\n", "subset3 = gap[['continent', 'lifeExp', 'gdpPercap']][855:859]\n", "subset3" ] }, { "cell_type": "code", "execution_count": 19, "id": "f3275af8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(gap[\"lifeExp\"])" ] }, { "cell_type": "markdown", "id": "9a2e1768", "metadata": {}, "source": [ "### Subset by logical condition(s)" ] }, { "cell_type": "code", "execution_count": 104, "id": "dc58192f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
71Australia200720434176.0Oceania81.23534435.36744
670Hong Kong China20026762476.0Asia81.49530209.01516
671Hong Kong China20076980412.0Asia82.20839724.97867
695Iceland2007301931.0Europe81.75736180.78919
802Japan2002127065841.0Asia82.00028604.59190
803Japan2007127467972.0Asia82.60331656.06806
1487Switzerland20077554661.0Europe81.70137506.41907
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "71 Australia 2007 20434176.0 Oceania 81.235 34435.36744\n", "670 Hong Kong China 2002 6762476.0 Asia 81.495 30209.01516\n", "671 Hong Kong China 2007 6980412.0 Asia 82.208 39724.97867\n", "695 Iceland 2007 301931.0 Europe 81.757 36180.78919\n", "802 Japan 2002 127065841.0 Asia 82.000 28604.59190\n", "803 Japan 2007 127467972.0 Asia 82.603 31656.06806\n", "1487 Switzerland 2007 7554661.0 Europe 81.701 37506.41907" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# lifeExp is greater than 80\n", "le2 = gap[gap['lifeExp'] > 81]\n", "le2" ] }, { "cell_type": "code", "execution_count": 109, "id": "9d6aba51", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
695Iceland2007301931.0Europe81.75736180.78919
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "695 Iceland 2007 301931.0 Europe 81.757 36180.78919" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# logical AND (all conditions must be satisfied to be included)\n", "\n", "# create subset that includes life expectancy greater than 81 AND pop < 500,000.\n", "\n", "year2002 = gap[(gap[\"lifeExp\"] > 81) & (gap[\"pop\"] < 500000)]\n", "year2002" ] }, { "cell_type": "code", "execution_count": 129, "id": "041ebce4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryyearpopcontinentlifeExpgdpPercap
671Hong Kong China20076980412.0Asia82.20839724.978670
744Ireland19522952156.0Europe66.9105210.280328
745Ireland19572878220.0Europe68.9005599.077872
746Ireland19622830000.0Europe70.2906631.597314
747Ireland19672900100.0Europe71.0807655.568963
748Ireland19723024400.0Europe71.2809530.772896
749Ireland19773271900.0Europe72.03011150.981130
750Ireland19823480000.0Europe73.10012618.321410
751Ireland19873539900.0Europe74.36013872.866520
752Ireland19923557761.0Europe75.46717558.815550
753Ireland19973667233.0Europe76.12224521.947130
754Ireland20023879155.0Europe77.78334077.049390
755Ireland20074109086.0Europe78.88540675.996350
803Japan2007127467972.0Asia82.60331656.068060
\n", "
" ], "text/plain": [ " country year pop continent lifeExp gdpPercap\n", "671 Hong Kong China 2007 6980412.0 Asia 82.208 39724.978670\n", "744 Ireland 1952 2952156.0 Europe 66.910 5210.280328\n", "745 Ireland 1957 2878220.0 Europe 68.900 5599.077872\n", "746 Ireland 1962 2830000.0 Europe 70.290 6631.597314\n", "747 Ireland 1967 2900100.0 Europe 71.080 7655.568963\n", "748 Ireland 1972 3024400.0 Europe 71.280 9530.772896\n", "749 Ireland 1977 3271900.0 Europe 72.030 11150.981130\n", "750 Ireland 1982 3480000.0 Europe 73.100 12618.321410\n", "751 Ireland 1987 3539900.0 Europe 74.360 13872.866520\n", "752 Ireland 1992 3557761.0 Europe 75.467 17558.815550\n", "753 Ireland 1997 3667233.0 Europe 76.122 24521.947130\n", "754 Ireland 2002 3879155.0 Europe 77.783 34077.049390\n", "755 Ireland 2007 4109086.0 Europe 78.885 40675.996350\n", "803 Japan 2007 127467972.0 Asia 82.603 31656.068060" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# logical OR (one of multiple conditions must be satisfied to be included)\n", "\n", "# create subset that includes country equals Ireland OR life expectancy greater than 82. \n", "\n", "ireland82 = gap[(gap[\"country\"] == \"Ireland\") | (gap[\"lifeExp\"] > 82)]\n", "ireland82" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }