# Index Alignment Exercises, Discussion

If you didn’t see the problem coming and change your code to account for it, you probably did the following:

```
import pandas as pd
attendees = pd.DataFrame({'names': ["Jill", "Kumar", "Zaira"],
                          'prizes': [0, 0, 0],
                          'arrival_order': [2, 1, 3]})
arrival_prizes = pd.Series([20, 10, 0])
arrival_prizes
```

```
0    20
1    10
2     0
dtype: int64
```

```
attendees = attendees.sort_values('arrival_order')
attendees
```

| | names | prizes | arrival_order |
|---|---|---|---|
| 1 | Kumar | 0 | 1 |
| 0 | Jill | 0 | 2 |
| 2 | Zaira | 0 | 3 |

```
attendees['prizes'] = attendees['prizes'] + arrival_prizes
attendees
```

| | names | prizes | arrival_order |
|---|---|---|---|
| 1 | Kumar | 10 | 1 |
| 0 | Jill | 20 | 2 |
| 2 | Zaira | 0 | 3 |

## The Problem

Uh oh… as you can see, 20 dollars went to the person who arrived second, and 10 dollars went to the person who arrived first… Why did that happen?

The answer is index alignment.

In `numpy` or `R`, when you add two arrays of the same length, the first entry of the first array is added to the first entry of the second array to create the first entry of the result; the second entry is added to the second entry, etc. For example:

```
import numpy as np
np.array([1, 2, 3]) + np.array([1, 2, 3])
```

```
array([2, 4, 6])
```

But that is NOT how `pandas` operates. Instead, `pandas` aligns `Series` and `DataFrame` objects based on **index values**. And when you sort data, the index value associated with each row doesn’t change. Take a look at `attendees`: when we sorted the data, the rows were re-ordered, and the index values moved along with them. Jill is now in the second row, but her index value is still 0; Kumar is now in the first row, but his index value is still 1.

```
attendees
```

| | names | prizes | arrival_order |
|---|---|---|---|
| 1 | Kumar | 10 | 1 |
| 0 | Jill | 20 | 2 |
| 2 | Zaira | 0 | 3 |

The result is that when you add `arrival_prizes` to `attendees['prizes']`, the entry of `arrival_prizes` with index value `0` (20 dollars) is added to Jill’s row, and the entry with index value `1` (10 dollars) is added to Kumar’s row.
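
To isolate the behavior, here is a minimal sketch using two made-up `Series` (the names `left` and `right` and their values are just for illustration, not part of the exercise). Both hold the same index labels, but `left` stores them in a shuffled order, the way a sorted column would:

```python
import pandas as pd

# Same labels (0, 1, 2) in both Series, but stored in different row orders.
left = pd.Series([100, 200, 300], index=[1, 0, 2])
right = pd.Series([1, 2, 3], index=[0, 1, 2])

# The addition pairs entries by index label, not by position.
total = left + right

print(total.loc[0])  # label 0: 200 + 1
print(total.loc[1])  # label 1: 100 + 2
print(total.loc[2])  # label 2: 300 + 3
```

If the addition were positional, the first entries (100 and 1) would have been paired. Instead, the value labeled `0` in `left` (200) is paired with the value labeled `0` in `right` (1).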

### Forcing row-by-row alignment

Thankfully, it’s not too hard to work around index alignment. When you reset an index, by default your data gets a new index where each row’s index value is its row number. To see this in action, let’s start our exercise over by creating our original data structures again:

```
import pandas as pd
attendees = pd.DataFrame({'names': ["Jill", "Kumar", "Zaira"],
                          'prizes': [0, 0, 0],
                          'arrival_order': [2, 1, 3]})
arrival_prizes = pd.Series([20, 10, 0])
```

Now let’s sort `attendees` by `arrival_order` just like last time:

```
attendees = attendees.sort_values('arrival_order')
attendees
```

| | names | prizes | arrival_order |
|---|---|---|---|
| 1 | Kumar | 0 | 1 |
| 0 | Jill | 0 | 2 |
| 2 | Zaira | 0 | 3 |

But now, *before* we add `arrival_prizes` to `attendees`, let’s reset the index of `attendees`:

```
attendees = attendees.reset_index()
attendees
```

| | index | names | prizes | arrival_order |
|---|---|---|---|---|
| 0 | 1 | Kumar | 0 | 1 |
| 1 | 0 | Jill | 0 | 2 |
| 2 | 2 | Zaira | 0 | 3 |

As you can see, the new index (the unlabeled numbers on the left side) is now just the row numbers.

However, as you can also see, the old index has been moved over to create a new column. Confusingly, `pandas` likes to call that new column… `index`. Yeah, I know. It’s not *the* index, it’s just a column *named* index. 😫

(To avoid this problem, you can use the `drop=True` option (`reset_index(drop=True)`). But I wanted to show you the behavior if you don’t specify that so you aren’t confused when you see this for the first time.)
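
For comparison, here is a small sketch of the `drop=True` variant. It builds its own copy of the data under a separate, made-up name (`demo`) so it doesn’t disturb the `attendees` DataFrame used in the rest of this walkthrough:

```python
import pandas as pd

# A separate copy of the example data, sorted by arrival order.
demo = pd.DataFrame({'names': ["Jill", "Kumar", "Zaira"],
                     'prizes': [0, 0, 0],
                     'arrival_order': [2, 1, 3]})
demo = demo.sort_values('arrival_order')

# drop=True discards the old index instead of saving it as a column.
demo = demo.reset_index(drop=True)
print(demo)
```

With `drop=True`, the index is still replaced by row numbers, but no extra `index` column shows up in the result.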

OK, so now the index for `attendees` is just row numbers, and that’s also the organization we have for `arrival_prizes`:

```
arrival_prizes
```

```
0    20
1    10
2     0
dtype: int64
```

NOW we can add them together and they will add up row-by-row:

```
attendees['prizes'] = attendees['prizes'] + arrival_prizes
attendees
```

| | index | names | prizes | arrival_order |
|---|---|---|---|---|
| 0 | 1 | Kumar | 20 | 1 |
| 1 | 0 | Jill | 10 | 2 |
| 2 | 2 | Zaira | 0 | 3 |

And we get the result we expected!
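
As an aside, resetting the index isn’t the only way to get row-by-row addition. One alternative, which is not part of the exercise solution, is to convert one operand to a plain `numpy` array with `to_numpy()`. An array has no index, so there is nothing for `pandas` to align on. A minimal sketch, using the made-up name `sorted_attendees` to keep it separate from the data above:

```python
import pandas as pd

# Rebuild the example data, sorted but with the original index left in place.
sorted_attendees = pd.DataFrame({'names': ["Jill", "Kumar", "Zaira"],
                                 'prizes': [0, 0, 0],
                                 'arrival_order': [2, 1, 3]})
sorted_attendees = sorted_attendees.sort_values('arrival_order')
arrival_prizes = pd.Series([20, 10, 0])

# A NumPy array carries no index, so this addition is purely positional.
sorted_attendees['prizes'] = sorted_attendees['prizes'] + arrival_prizes.to_numpy()
print(sorted_attendees)
```

This keeps the original index values on `sorted_attendees`, which may or may not be what you want. It also relies on the two objects having the same length and the intended row order.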

### When Index Alignment Comes Up

The other thing about index alignment is that it thankfully doesn’t come up all that often. Indeed, that’s why it’s often not emphasized in intro exercises. **That’s because different columns in the same DataFrame always share the same index, so when you execute operations using columns from the same DataFrame, index alignment looks like order alignment.** This issue only comes up when you are doing an operation on `Series` that are *not* from the same `DataFrame`.
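
To see why same-DataFrame operations are safe, here is a short sketch with a made-up DataFrame (`scores` and its column names are illustrative only). Even after sorting shuffles the index, both columns carry the same shuffled index, so aligning by label and aligning by row give the same answer:

```python
import pandas as pd

scores = pd.DataFrame({'first': [1, 2, 3],
                       'second': [10, 20, 30]})

# Sorting in descending order shuffles the index to 2, 1, 0.
scores = scores.sort_values('second', ascending=False)

# Both columns share that shuffled index, so each row is added to itself.
scores['total'] = scores['first'] + scores['second']
print(scores)
```

Each row’s `total` is the sum of the values in that same row, regardless of how the rows were reordered.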