# Pandas Lesson 1: Series

This tutorial introduces the fundamental building block of `pandas`: `Series`. By the end of this section, you will know how to create different types of Series, subset them, modify them, and summarize them.

## 1. What is a Series?

In the simplest terms, a `Series` is an ordered collection of values, generally all of the same type. For example, you can have a Series that contains the ages of everyone in your class (a numeric Series), or a Series of all the names of people in your family (a string Series).

This may sound familiar: isn’t that how we described `numpy` vectors (i.e. one-dimensional `numpy` arrays)? Yes! In fact, Series are basically one-dimensional `numpy` arrays with lots of extra features added on top of them. As we’ll see, almost everything you could do with a `numpy` array you can do with a Series; Series can just do *more*.

Series are central to `pandas` because `pandas` was designed for statistics, and Series are a perfect way to collect lots of different observations of a variable.

There are lots of ways to create Series, but the easiest is to just pass a list or an array to the `pd.Series` constructor.

To illustrate, let me tell you about a week at the zoo I wish I owned. Here’s what attendance looked like at my zoo last week:

| Day of Week | Attendees |
|---|---|
| Monday | 132 people |
| Tuesday | 94 people |
| Wednesday | 112 people |
| Thursday | 84 people |
| Friday | 254 people |
| Saturday | 322 people |
| Sunday | 472 people |

Let’s make a Series for this attendance pattern:


```
import pandas as pd # We have to import pandas to use Series!
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472])
attendance
```


```
0 132
1 94
2 112
3 84
4 254
5 322
6 472
dtype: int64
```
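As mentioned above, the constructor accepts a `numpy` array just as happily as a list. A minimal sketch (the names `counts` and `attendance_from_array` are just for illustration):

```python
import numpy as np
import pandas as pd

# The same week of zoo attendance, stored first as a numpy array
counts = np.array([132, 94, 112, 84, 254, 322, 472])

# pd.Series wraps the array and adds the default 0, 1, 2, ... index
attendance_from_array = pd.Series(counts)
print(attendance_from_array)
```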

## Indices

One of the fundamental differences between `numpy` arrays and Series is that every Series is associated with an `index`. An index is a set of labels, one for each observation in a Series. If you don’t specify an `index` when you create a Series, `pandas` will create a default index that labels each row with its initial row number, but you can specify an index if you want.

In this case, for example, we know that these entries are associated with different days of the week, so let’s specify an index for our `attendance` Series:


```
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])
attendance
```


```
Monday 132
Tuesday 94
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: int64
```

Now, as we can see, the rows are labeled with days of the week on the left side rather than with their initial row numbers.

Note that you can always access a Series’ index with the `.index` property:


```
attendance.index
```


```
Index(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
'Sunday'],
dtype='object')
```

An important property of index labels is that they stay with each row, even if you sort your data. So if I sort my Series by attendance, not only will rows re-order, but so will the index labels:


```
attendance = attendance.sort_values()
attendance
```


```
Thursday 84
Tuesday 94
Wednesday 112
Monday 132
Friday 254
Saturday 322
Sunday 472
dtype: int64
```
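`sort_values()` sorts from smallest to largest by default. Passing `ascending=False` reverses that, and, just as above, each label travels with its value. A short sketch (`busiest_first` is an illustrative name):

```python
import pandas as pd

attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])

# Busiest day first; each day's label moves with its attendance number
busiest_first = attendance.sort_values(ascending=False)
print(busiest_first)
```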

**Note:** This seems intuitive with days of the week as our index labels, but it can be confusing when your index starts out as row numbers. If we had not given our Series days of the week as its index, the default index labels would look just like row numbers. But if we then *sort* the Series, those labels will shuffle along with their values, and they will no longer correspond to row numbers:


```
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472])
attendance
```


```
0 132
1 94
2 112
3 84
4 254
5 322
6 472
dtype: int64
```


```
attendance = attendance.sort_values()
attendance
```


```
3 84
1 94
2 112
0 132
4 254
5 322
6 472
dtype: int64
```
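To make the mismatch concrete, here is a sketch that looks up the sorted Series both by label and by position, using the `.loc` (label) and `.iloc` (position) selectors covered in the next section, and then restores the original order with `sort_index()`:

```python
import pandas as pd

attendance = pd.Series([132, 94, 112, 84, 254, 322, 472]).sort_values()

# Label 0 still belongs to the first value we entered (Monday's 132)...
print(attendance.loc[0])
# ...but position 0 now holds the smallest value (84, which has label 3)
print(attendance.iloc[0])

# Sorting by the index puts the labels back in 0, 1, 2, ... order
restored = attendance.sort_index()
print(restored)
```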

## 2. Subsetting Series

Extracting a subset of elements from a Series is an extremely important task, not least because it generalizes nicely to working with bigger datasets (which are at the heart of data science). This process — whether applied to a Series or a dataset — is often referred to as “taking a subset”, “subsetting”, or “filtering”. If there is one skill you need to master as quickly as possible, it’s this.

In `pandas`, there are three ways to filter a Series: using a separate logical Series, using row-number indexing, and using index labels. I tend to use the first method most, but all three are useful. The first and second of these you will recognize from `numpy` arrays, while the last one (since it uses index labels, which only exist in `pandas`) is unique to `pandas`.

### Subsetting using row-number indexing

One way to subset a Series is to specify the row numbers you want to keep using the `iloc` indexer (`iloc` stands for “integer location”, since row numbers are always integers). This will give you the behavior you’re more familiar with from `R` or `numpy`. Just remember that, as in all of Python, the first row is numbered `0`!


```
fruits = pd.Series(["apple", "banana"])
fruits.iloc[0]
```


```
'apple'
```

You can also subset with lists of rows, or ranges, just like in `numpy`:


```
fruits.iloc[[0, 1]]
```


```
0 apple
1 banana
dtype: object
```


```
fruits.iloc[0:2]
```


```
0 apple
1 banana
dtype: object
```
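Because `iloc` follows `numpy`’s positional rules, negative positions count back from the end, and slices can take a step. A small sketch, with a hypothetical third fruit added so the step has something to skip:

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "cherry"])

last_fruit = fruits.iloc[-1]     # negative positions count from the end
every_other = fruits.iloc[::2]   # a step of 2 keeps positions 0 and 2
print(last_fruit)
print(every_other)
```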

### Subsetting using index values

We can also subset our rows by the index values associated with each row, using the `loc` indexer.


```
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])
```


```
attendance.loc["Monday"]
```


```
132
```

You can also ask for ranges of index labels. Note that unlike integer ranges (like the `0:2` we used above to get rows 0 and 1), index label ranges *include* the last item in the range. So, for example, if I ask for `.loc["Monday":"Friday"]`, Friday will be included, even though `.iloc[0:2]` doesn’t include row 2.


```
attendance.loc["Monday":"Friday"]
```


```
Monday 132
Tuesday 94
Wednesday 112
Thursday 84
Friday 254
dtype: int64
```
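The sketch below puts the two slicing rules side by side on the zoo data, and also shows that `.loc` accepts a list of labels in the same way `.iloc` accepts a list of positions (`by_label`, `by_position`, and `first_and_last` are illustrative names):

```python
import pandas as pd

attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])

# Label slice: the end label (Wednesday) is included
by_label = attendance.loc["Monday":"Wednesday"]
# Position slice: stops just before position 2
by_position = attendance.iloc[0:2]
print(len(by_label), len(by_position))

# A list of labels picks out exactly those rows, in that order
first_and_last = attendance.loc[["Monday", "Sunday"]]
print(first_and_last)
```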

### Subsetting with logicals

Let’s jump right into an example, using our zoo attendance Series:


```
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])
attendance
```


```
Monday 132
Tuesday 94
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: int64
```

Suppose we want to get only the days with more than 100 people attending. We can subset our Series by using a simple test to build a Series of booleans (`True` and `False` values), then asking `pandas` for the rows of our Series for which the entry in our test Series is `True`:


```
was_busy = attendance > 100
was_busy
```


```
Monday True
Tuesday False
Wednesday True
Thursday False
Friday True
Saturday True
Sunday True
dtype: bool
```


```
busy_days = attendance.loc[was_busy]
busy_days
```


```
Monday 132
Wednesday 112
Friday 254
Saturday 322
Sunday 472
dtype: int64
```
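Tests can also be combined before subsetting: `&` means “and”, `|` means “or”. A sketch, assuming we want the days with more than 100 but fewer than 300 visitors. Each comparison needs its own parentheses, because in Python `&` and `|` are evaluated before `>` and `<`:

```python
import pandas as pd

attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])

# Both tests produce boolean Series with the same index, so & combines them row by row
medium_days = attendance.loc[(attendance > 100) & (attendance < 300)]
print(medium_days)
```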

There is one really important distinction between how subsetting works in `pandas` and in most other languages, though, and it has to do with indices. Suppose we want to subset a Series of fruits to get only the entry “apple”. We could do the following:


```
fruits = pd.Series(["apple", "banana"])
apple_selector = pd.Series([True, False])
fruits.loc[apple_selector]
```


```
0 apple
dtype: object
```

This looks familiar from `numpy`, *but*:

A very important difference between `pandas` and other languages and libraries (like `R` and `numpy`) is that when a logical Series is passed into `loc`, evaluation is done *not* on the basis of the order of entries, but on the basis of index values. In the case above, because we did not specify indices for either `fruits` or `apple_selector`, they both got the usual default index values of their initial row numbers. But let’s see what happens if we give `apple_selector` an index that doesn’t match its row order:


```
fruits # We can leave fruits as they are
```


```
0 apple
1 banana
dtype: object
```


```
apple_selector = pd.Series([True, False], index=[1, 0])
apple_selector
```


```
1 True
0 False
dtype: bool
```

Note that we’ve *flipped* the index order for `apple_selector`: the first row has index value 1, and the second row has index value 0. Now watch what happens when we pass `apple_selector` to `.loc`:


```
fruits.loc[apple_selector]
```


```
1 banana
dtype: object
```

We get `banana`! That’s because in `apple_selector`, the index value associated with the `True` entry was 1, and the row of `fruits` that had index value 1 was `banana`, even though the two entries are in different rows. This is called *index alignment*, and it is absolutely crucial to keep in mind while using `pandas`.

But note that this only happens *if* your boolean array is a Series (and thus has an index). If you pass a `numpy` boolean array or a list of booleans (neither of which has a concept of an index), then despite using `loc`, alignment will be based on row numbers, not index values (because there *are* no index values to align).


```
fruits.loc[[True, False]]
```


```
0 apple
dtype: object
```

**UGH**, I know. If I wrote `pandas`, this would throw an exception instead. But that’s how it is.

While this distinction between row numbers and index values is important, it comes up less often than you’d think. That’s because we usually don’t subset by feeding in a Series of booleans we made by hand; instead, we build a new Series by executing a test on the Series we’re using. When we do that, the new Series of booleans has the same index as the old Series, so they align naturally. Look back at our example of `was_busy`: you’ll see that it automatically got the same index as our original Series, `attendance`. The first row of our boolean Series has the same index value as the first row of our original Series, the second row has the same index value as the second row, and so on, so there’s no difference between matching on row order and matching on index value. But the distinction does *occasionally* come up (for example, if you re-sorted one of these Series), so keep it in mind!
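Here is a side-by-side sketch of the two behaviors, reusing the `fruits` example: the boolean Series with the flipped index aligns on labels, while the same booleans stripped of their index (via `.to_numpy()`, which returns a plain `numpy` array) are matched by position:

```python
import pandas as pd

fruits = pd.Series(["apple", "banana"])
apple_selector = pd.Series([True, False], index=[1, 0])

# Boolean Series: aligned on index labels, so label 1 (banana) is kept
by_label = fruits.loc[apple_selector]
# Plain numpy array of the same booleans: matched by position, so apple is kept
by_position = fruits.loc[apple_selector.to_numpy()]
print(by_label)
print(by_position)
```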

### Single Square Brackets (`[]`)

As discussed above, because Series have both an order of rows and labels for each row, you should always think carefully about which method of subsetting you are invoking. My advice: **Always use the `.loc` (for index labels) and `.iloc` (for row numbers) selectors. If you use these, the only surprising behavior to watch out for is that `.loc` will align on row numbers if you pass a list or array of booleans with no index.** But since you *can’t* align on an index in that case, there’s no alternative behavior you would be expecting in that situation.

However, there is another way to subset Series that is a little… stranger. In an effort to be easier for users, `pandas` allows subsetting using *just* square brackets (without a `.loc` or `.iloc`). With just square brackets, `pandas` will do different things depending on what you put in the square brackets. *In theory* this should always “do what you want it to do”, but in my experience it’s a recipe for errors. With that in mind, I don’t suggest using it, but I will detail how it works here so you know. If this makes your head swim, just remember: **you can always use `.loc` and `.iloc`. Single square brackets do not offer any functionality you can’t get with `.loc` or `.iloc`.**

So, if you pass index values into square brackets, `pandas` will subset based on index values (as though you were using `.loc`).


```
attendance
```


```
Monday 132
Tuesday 94
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: int64
```


```
attendance['Sunday']
```


```
472
```

Similarly, if you pass booleans to square brackets, then `pandas` will behave like you are using `.loc` as well:


```
attendance[attendance > 100]
```


```
Monday 132
Wednesday 112
Friday 254
Saturday 322
Sunday 472
dtype: int64
```

(If it’s not clear to you why the booleans in `attendance[attendance > 100]` have an index: Python first evaluates `attendance > 100`, which generates a new Series of booleans with the same index as `attendance`. Only then does Python evaluate the `attendance[]` part of the expression.)

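**BUT**: *if* your Series index is not integer based, *and* you pass integers into the square brackets, it will act like you’re using `iloc`. (In `pandas` 2.1 and later, this positional fallback also emits a `FutureWarning`, because future versions plan to always treat integer keys in `[]` as labels.) For example: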


```
attendance[0]
```


```
132
```

*Most* of the time, this works out. But you can get confused if you are working with a Series that has a numeric index. If you pass an integer into `[]` *and* you have an index of integers, then `[0]` will be treated as though you typed `.loc[0]`, not `.iloc[0]`:


```
series_w_numeric_index = pd.Series(["dog", "cat", "fish"], index=[2, 1, 0])
series_w_numeric_index
```


```
2 dog
1 cat
0 fish
dtype: object
```


```
series_w_numeric_index[0]
```


```
'fish'
```

So personally, I try to always use `loc` or `iloc` to avoid this kind of confusion. But if you do use `[]` on its own, just be very careful that you don’t inadvertently select rows based on index values when you think you’re selecting on row numbers.
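With the explicit selectors there is nothing to guess. A sketch using the same `series_w_numeric_index` from above:

```python
import pandas as pd

series_w_numeric_index = pd.Series(["dog", "cat", "fish"], index=[2, 1, 0])

by_label = series_w_numeric_index.loc[0]      # the row whose label is 0
by_position = series_w_numeric_index.iloc[0]  # the first row, whatever its label
print(by_label)
print(by_position)
```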

## Types of Series

Before we dive too far into Series manipulations, it’s important to talk about datatypes. Every Series, as we will see, has a “dtype” (short for datatype). The `dtype` of a Series is important to understand because a Series’ `dtype` determines what manipulations you can apply to that Series.

There are, broadly, two types of Series:

- Numeric: these hold numbers that `pandas` understands are numbers. Specific numeric datatypes include things like `int64` and `int32` (integers), or `float64` and `float32` (floating point numbers).
- Object: these are Series that can hold any Python object, like strings, numbers, sets, you name it. They have dtype `O` for “objects”. They are flexible, but also very slow and actually harder to work with.

Numeric Series are by far the easiest to work with, and are generally either *integers* (`int64`, `int32`, etc.) or *floating point numbers* (`float64`, `float32`). We’ll talk more about the differences between these data types later, but for the moment it’s enough to know that *integer* Series (datatypes that start with `int`) can *only* hold… well, integers (whole numbers), while *floating point* Series (datatypes that start with `float`) can hold integers, numbers with decimal points, and even missing values.

The numbers at the end of these types (`64`, `32`, etc.) have to do with how many actual bits of data are allocated to each number, something we’ll discuss later in the course. For the moment, the differences between them don’t matter, and in general you’ll likely always see (and should use) the `64` suffix.
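One place the bit width does show up right away is memory use: 64 bits is 8 bytes per number, and 32 bits is 4. A sketch using the `.nbytes` attribute, which reports how many bytes the Series’ values occupy (not counting the index):

```python
import pandas as pd

as_int64 = pd.Series([1, 2, 3], dtype="int64")
as_int32 = pd.Series([1, 2, 3], dtype="int32")

# 3 numbers x 8 bytes vs. 3 numbers x 4 bytes
print(as_int64.nbytes)
print(as_int32.nbytes)
```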

You can check the `dtype` of a Series by typing `.dtype`. For example, here are some different kinds of Series:


```
s = pd.Series([1, 2, 3])
s.dtype
```


```
dtype('int64')
```


```
s = pd.Series([1, 2, 3.14])
s.dtype
```


```
dtype('float64')
```


```
s = pd.Series([1, 2, "a string"])
s.dtype
```


```
dtype('O')
```

As you can see, integer (`int64`) Series can *only* hold integers. If we add one number with a decimal component, the whole thing becomes a `float64`. Similarly, floating point Series can only hold numbers. If we add a single string, the whole thing becomes an Object (`O`) type.
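The same upgrade happens with missing values. Because integer Series cannot hold them, a single `np.nan` (`numpy`’s “not a number” marker, which `pandas` uses for missing data) turns an otherwise whole-number Series into `float64`:

```python
import numpy as np
import pandas as pd

# Two integers plus one missing value
s = pd.Series([1, 2, np.nan])
print(s.dtype)
print(s)
```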

### Converting datatypes

If you want to change the datatype of a Series, you can do so with the `.astype()` method… provided a conversion is possible! For example, you can convert integer Series to floating point Series, because whole numbers can be represented as floating point numbers (just trust me on this for now… we’ll discuss why later!).


```
s = pd.Series([1, 2, 3])
s = s.astype('float64')
s
```


```
0 1.0
1 2.0
2 3.0
dtype: float64
```

But be careful: since integers can’t ever hold decimals, if you try and convert a floating point Series to an integer Series, it will just drop the decimal part of numbers with decimals!


```
s = pd.Series([1, 2, 3.14])
s = s.astype('int64')
s
```


```
0 1
1 2
2 3
dtype: int64
```

Note that `pandas` is just doing the same thing regular Python would do:


```
int(3.14)
```


```
3
```
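Dropping the decimal part means truncating toward zero, which is easy to forget with negative numbers. If you want the nearest whole number instead, one option is to call `.round()` before converting. A short sketch:

```python
import pandas as pd

s = pd.Series([-1.7, 2.9])

truncated = s.astype("int64")        # drops the decimals: -1 and 2
rounded = s.round().astype("int64")  # rounds first: -2 and 3
print(truncated)
print(rounded)
```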

But if you try to convert an “object” Series to numeric and there are values that can’t be converted, `pandas` will throw an error:


```
s = pd.Series([1, 2, "a string"])
s.astype('float64')
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-c0312cc33b10> in <module>
      1 s = pd.Series([1, 2, "a string"])
----> 2 s.astype('float64')
...
ValueError: could not convert string to float: 'a string'
```
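If you would rather turn unconvertible entries into missing values than get an error, `pandas` also provides `pd.to_numeric()`; with `errors="coerce"`, anything it cannot parse as a number becomes `NaN`. A sketch on the same Series:

```python
import pandas as pd

s = pd.Series([1, 2, "a string"])

# Values that can't be parsed as numbers become NaN instead of raising
converted = pd.to_numeric(s, errors="coerce")
print(converted)
```

Whether silently producing missing values is better than an error depends on your data, so it is worth checking how many `NaN`s appear afterwards.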

## 3. Series Arithmetic

One of the nice things about Series is that, like `numpy` arrays, they make it easy to do things like multiply *all* the values by another number. For example, suppose tickets to my zoo cost $15 per person. What is the total money generated by ticket sales each day? Let’s find out!


```
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])
attendance
```


```
Monday 132
Tuesday 94
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: int64
```


```
revenue = attendance * 15
revenue
```


```
Monday 1980
Tuesday 1410
Wednesday 1680
Thursday 1260
Friday 3810
Saturday 4830
Sunday 7080
dtype: int64
```

Now what if we want to know the total amount raised in a week, instead of just the amount on each day? We can use one of `pandas`’ many helper methods (in this case `.sum()`), which adds up all the values of a Series:


```
revenue.sum()
```


```
22050
```

Cool!

These are examples of two of the three forms of Series arithmetic:

- A Series combined with a single value (like `attendance * 15`).
- A Series modified by a function (like `revenue.sum()`).
- Two Series with the same number of elements. **When working with two Series, elements are matched based on index values, not row numbers.**
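A sketch of that last rule, using two hypothetical Series (`tickets` and `gift_shop`) whose labels are in different orders: `pandas` pairs Monday with Monday and Tuesday with Tuesday, regardless of position. A label that appears in only one of the two Series has nothing to pair with, so its result is a missing value (`NaN`):

```python
import pandas as pd

tickets = pd.Series([100, 200], index=["Monday", "Tuesday"])
gift_shop = pd.Series([5, 10], index=["Tuesday", "Monday"])

# Matched by label: Monday = 100 + 10, Tuesday = 200 + 5
total = tickets + gift_shop
print(total)

# Tuesday has no partner in monday_only, so its sum is NaN
monday_only = pd.Series([1], index=["Monday"])
partial_sum = tickets + monday_only
print(partial_sum)
```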

But note that the types of things you can do with a Series depend on the Series’ `dtype`. Math functions, for example, can only be applied to numeric datatypes!

### Summarizing with Functions

We often want to get summary statistics from a Series — that is, learn something general about it by looking beyond its constituent elements. If we have a Series in which each element represents a person’s height, we may want to know who the shortest or tallest person is, what the median or mean height is, what the standard deviation is, etc. Here are common summary facts for numeric Series (some also work for object types):

```
import pandas as pd

my_numbers = pd.Series([1, 2, 3, 4])
my_numbers.dtype           # check the dtype
len(my_numbers)            # number of elements
my_numbers.max()           # maximum value
my_numbers.min()           # minimum value
my_numbers.sum()           # sum of all values in the Series
my_numbers.mean()          # mean
my_numbers.median()        # median
my_numbers.var()           # variance
my_numbers.std()           # standard deviation
my_numbers.quantile()      # specified quantile (0.5, the median, if none is given)
my_numbers.describe()      # many of the summary stats above in one table
my_numbers.value_counts()  # tabulate all the values; add `normalize=True` to get the share of each value
```

Of those, two of the most powerful are `.describe()` (for numeric Series that take on lots of values):


```
my_numbers = pd.Series(range(100))
my_numbers.describe()
```


```
count 100.000000
mean 49.500000
std 29.011492
min 0.000000
25% 24.750000
50% 49.500000
75% 74.250000
max 99.000000
dtype: float64
```

and `.value_counts()` (for numeric Series with only a few unique values):


```
my_numbers = pd.Series([1, 2, 2, 2, 2, 1, 1, -1, -1])
my_numbers.value_counts()
```


```
2 4
1 3
-1 2
dtype: int64
```

Note that `.value_counts()` can be combined with the `normalize=True` argument to get the share of observations that have each unique value, rather than the count:


```
my_numbers.value_counts(normalize=True)
```


```
2 0.444444
1 0.333333
-1 0.222222
dtype: float64
```
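`.max()` and `.min()` return the values themselves. For the “*who* is the tallest?” kind of question, `.idxmax()` and `.idxmin()` return the index label of the largest and smallest value instead. A sketch with the zoo data:

```python
import pandas as pd

attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])

busiest_day = attendance.idxmax()   # label of the largest value
quietest_day = attendance.idxmin()  # label of the smallest value
print(busiest_day, attendance.max())
print(quietest_day, attendance.min())
```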

## 4. Modifying Series Elements

The subsetting logic from above can be used to modify Series. The idea here is that instead of keeping elements that meet a logical condition or occur at a specific index, we can change them. For example, what if we had mis-entered attendance for our zoo? We can fix it using a logical test, row-number indexing (`iloc`), or by index value (`loc`).


```
attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])
attendance
```


```
Monday 132
Tuesday 94
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: int64
```

Oops! Turns out Tuesday attendance was 194, not 94! (It was a holiday.)


```
# Edit with a logical test:
attendance.loc[attendance == 94] = 194
```


```
# Edit with `iloc`:
attendance.iloc[1] = 194
```


```
# Edit with `loc`:
attendance.loc['Tuesday'] = 194
```


```
attendance
```


```
Monday 132
Tuesday 194
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: int64
```
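The same selectors can change several entries at once. A sketch, with made-up corrected weekend numbers: pass a list of labels to `.loc` and a list of new values of the same length, and each value goes to the label in the same spot:

```python
import pandas as pd

attendance = pd.Series([132, 194, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])

# Hypothetical corrections for both weekend days, matched to the labels in order
attendance.loc[["Saturday", "Sunday"]] = [330, 480]
print(attendance)
```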

### Element Modification and DataTypes

One of the big differences between Series and `numpy` arrays is that Series are “dynamically typed”, meaning that if you have an integer Series and you try to add a number like 3.14 (which has a decimal component, and thus cannot be represented as an integer), `pandas` will just convert your whole Series to floating point numbers so that it can hold 3.14. Similarly, if you try to add a string to a floating point Series, `pandas` will just convert the whole Series to an Object Series.


```
attendance.loc['Tuesday'] = 3.14
attendance
```


```
Monday 132.00
Tuesday 3.14
Wednesday 112.00
Thursday 84.00
Friday 254.00
Saturday 322.00
Sunday 472.00
dtype: float64
```


```
attendance.loc['Tuesday'] = 'no one showed up on Tuesday! :('
attendance
```


```
Monday 132
Tuesday no one showed up on Tuesday! :(
Wednesday 112
Thursday 84
Friday 254
Saturday 322
Sunday 472
dtype: object
```
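One caveat, which depends on your `pandas` version: starting with `pandas` 2.1, this kind of silent conversion during assignment emits a `FutureWarning`, and the warning says a future release will raise an error instead. Converting the Series explicitly with `.astype()` before assigning sidesteps the issue. A sketch on a shortened version of the zoo data:

```python
import pandas as pd

attendance = pd.Series([132, 94, 112], index=["Monday", "Tuesday", "Wednesday"])

# Ask for floats up front, so storing 3.14 needs no implicit conversion
attendance = attendance.astype("float64")
attendance.loc["Tuesday"] = 3.14
print(attendance)
```

The same idea applies to strings: convert with `.astype("object")` first if you really do want to mix text and numbers.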

This is different from `numpy`, where once an array has a type, it will only change types if you ask `numpy` to change the type explicitly. If you try to stick 3.14 into an integer array, `numpy` will just coerce the value 3.14 into an integer by dropping the decimal component:


```
import numpy as np
my_array = np.array([1, 2, 3], dtype='int64')
my_array
```


```
array([1, 2, 3])
```


```
my_array[0] = 3.14
my_array
```


```
array([3, 2, 3])
```


```
my_array.dtype
```


```
dtype('int64')
```

And if you try and insert a string, you’ll get an exception:


```
my_array[0] = 'first entry'
```

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-50-39825a86f88a> in <module>
----> 1 my_array[0] = 'first entry'
ValueError: invalid literal for int() with base 10: 'first entry'
```
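To actually store 3.14 in a `numpy` array, you have to ask for a floating point array explicitly, for example with `.astype()`, which returns a *new* array with the requested dtype and leaves the original alone. A sketch (`float_array` is an illustrative name):

```python
import numpy as np

my_array = np.array([1, 2, 3], dtype='int64')

# astype returns a new float64 array; my_array itself is unchanged
float_array = my_array.astype('float64')
float_array[0] = 3.14
print(float_array)
print(my_array)
```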

## 5. numpy Under The Hood

As you may recall, at the start of this tutorial I recommended thinking of Series as augmented one-dimensional `numpy` arrays. It turns out that’s more than just a metaphor: behind every Series *is* a `numpy` array, which you can access with the `.values` attribute:


```
attendance.values
```


```
array([132.0, 'no one showed up on Tuesday! :(', 112.0, 84.0, 254.0,
322.0, 472.0], dtype=object)
```

This is good to know because every now and then you may find a tool that works with `numpy` arrays but *not* `pandas`. And when that happens, you now know how to pull out the `numpy` array underlying your Series and use it directly!
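Newer `pandas` code often uses the `.to_numpy()` method instead of `.values`; both give you the Series’ values as a `numpy` array. A sketch that hands the (original, all-numeric) zoo attendance numbers to a plain `numpy` function:

```python
import numpy as np
import pandas as pd

attendance = pd.Series([132, 94, 112, 84, 254, 322, 472],
                       index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                              'Friday', 'Saturday', 'Sunday'])

raw = attendance.to_numpy()  # plain numpy array: values only, no index labels
print(type(raw))
print(np.median(raw))
```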

## 6. Exercises!

*If you are enrolled in Practical Data Science at Duke, don’t do these exercises on your own – we’ll do them in class!*