Pandas tools you didn’t know you needed 1: Apply
Dataframe cells can hold whole arrays rather than single values. This is the case with DataJoint’s “blobs.” Let’s examine one such dataframe, which I fetched from a DataJoint table:

| | fid | brain_region | single_unit | single_unit_phy | spike_time | waveform | snr |
|---|---|---|---|---|---|---|---|
| 0 | bucket_1_m026_1568757659_ | pfc | 0 | 12 | [ 1.71006667 2.14576667 3.35306667 3.6982 22.4359 22.80743333 26.00536667 26.61663333 28.39046667 29.45536667] | [] | [] |
| 1 | bucket_1_m026_1568757659_ | pfc | 1 | 15 | [17.8181 37.5422 37.90253333 39.36106667 46.77196667 60.7908 77.62783333 78.71806667 82.33186667 93.15203333] | [] | [] |
| 2 | bucket_1_m026_1568757659_ | pfc | 2 | 19 | [ 1.552 115.68683333 115.69416667 160.0344 342.9736 346.51526667 346.8301 348.25513333 348.2767 348.29066667] | [] | [] |
| 3 | bucket_1_m026_1568757659_ | pfc | 3 | 37 | [ 64.1048 145.53183333 185.3421 187.57793333 281.31683333 304.80466667 326.5742 339.3119 348.08556667 350.2595 ] | [] | [] |
| 4 | bucket_1_m026_1568757659_ | pfc | 4 | 40 | [ 0.66776667 3.9761 4.8187 16.72106667 22.01286667 25.6975 29.069 29.25703333 30.07223333 31.55446667] | [] | [] |

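
If you don't have access to the DataJoint table, a dataframe of the same shape can be sketched by hand. The spike times below are the first few values from rows 0 and 1 above. Building the frame from a plain dict is only an assumption for illustration, not how the original fetch was done.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the fetched table: every cell of 'spike_time' holds a whole array.
df = pd.DataFrame({
    "single_unit": [0, 1],
    "spike_time": [
        np.array([1.71006667, 2.14576667, 3.35306667, 3.6982]),
        np.array([17.8181, 37.5422, 37.90253333]),
    ],
})

# pandas stores each array as a generic Python object, so the column dtype is 'object'.
print(df["spike_time"].dtype)
print(type(df["spike_time"].iloc[0]))
```

Because the column has `object` dtype, pandas treats each array as an opaque value, which is why the operations that follow have to be told to work inside each cell.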
We’d like to get the interspike intervals (ISIs). In this case, `np.diff(df['spike_time'])` will not work, because it takes differences across rows, subtracting each row's array from the next row's array:

```
        37.98336667, 51.62246667, 52.10143333, 53.9414, 63.69666667]),
 array([-16.2661, 78.14463333, 77.79163333, 120.67333333,
        296.20163333, 285.72446667, 269.20226667, 269.53706667,
        265.94483333, 255.13863333]),
 array([ 62.5528, 29.845, 69.64793333, 27.54353333,
        -61.65676667, -41.7106, -20.2559, -8.94323333,
        -0.19113333, 1.96883333]),
 ...,
 array([29.80516667, 38.70003333, 40.26796667, 53.5446, 45.7118,
        55.4116, 55.9258, 79.12003333, 77.03993333, 83.48073333]),
 array([-11.45576667, -11.35606667, -9.69276667, -23.187,
        -22.03416667, -31.00213333, -29.06093333, -51.92653333,
        -51.33696667, -57.59203333]),
 array([ 72.3652, 92.1018, 157.1723, 173.74453333,
        185.13646667, 188.13473333, 187.84116667, 187.3681,
        206.6341, 206.1082])], dtype=object)
```
You can use a `for` loop, but there is an easier way.
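
For comparison, a minimal sketch of the loop version might look like this. The toy dataframe and the column name `ISI_loop` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "spike_time": [
        np.array([1.71006667, 2.14576667, 3.35306667, 3.6982]),
        np.array([17.8181, 37.5422, 37.90253333, 39.36106667]),
    ],
})

# Explicit loop: compute the ISIs of one row at a time and collect them.
isi_per_row = []
for spikes in df["spike_time"]:
    isi_per_row.append(np.diff(spikes))

# Wrap the list in a Series aligned to the dataframe's index before assigning.
df["ISI_loop"] = pd.Series(isi_per_row, index=df.index)
print(df["ISI_loop"].iloc[0])
```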
To get our operation of interest to work row by row without looping, use `apply`:
```python
df['ISI'] = df['spike_time'].apply(np.diff)
df['ISI']
```

| | fid | brain_region | single_unit | single_unit_phy | spike_time | waveform | snr | ISI |
|---|---|---|---|---|---|---|---|---|
| 0 | bucket_1_m026_1568757659_ | pfc | 0 | 12 | [ 1.71006667 2.14576667 3.35306667 3.6982 22.4359 22.80743333 26.00536667 26.61663333 28.39046667 29.45536667] | [] | [] | [ 0.4357 1.2073 0.34513333 18.7377 0.37153333 3.19793333 0.61126667 1.77383333 1.0649 ] |
| 1 | bucket_1_m026_1568757659_ | pfc | 1 | 15 | [17.8181 37.5422 37.90253333 39.36106667 46.77196667 60.7908 77.62783333 78.71806667 82.33186667 93.15203333] | [] | [] | [19.7241 0.36033333 1.45853333 7.4109 14.01883333 16.83703333 1.09023333 3.6138 10.82016667] |
| 2 | bucket_1_m026_1568757659_ | pfc | 2 | 19 | [ 1.552 115.68683333 115.69416667 160.0344 342.9736 346.51526667 346.8301 348.25513333 348.2767 348.29066667] | [] | [] | [1.14134833e+02 7.33333333e-03 4.43402333e+01 1.82939200e+02 3.54166667e+00 3.14833333e-01 1.42503333e+00 2.15666667e-02 1.39666667e-02] |
| 3 | bucket_1_m026_1568757659_ | pfc | 3 | 37 | [ 64.1048 145.53183333 185.3421 187.57793333 281.31683333 304.80466667 326.5742 339.3119 348.08556667 350.2595 ] | [] | [] | [81.42703333 39.81026667 2.23583333 93.7389 23.48783333 21.76953333 12.7377 8.77366667 2.17393333] |
| 4 | bucket_1_m026_1568757659_ | pfc | 4 | 40 | [ 0.66776667 3.9761 4.8187 16.72106667 22.01286667 25.6975 29.069 29.25703333 30.07223333 31.55446667] | [] | [] | [ 3.30833333 0.8426 11.90236667 5.2918 3.68463333 3.3715 0.18803333 0.8152 1.48223333] |

Using `apply` this way gets the ISIs without having to write a loop over rows. If you have a more complex procedure in mind, simply define it as a function before using `apply`:

```python
import numpy as np

def inverse_isi(spikes):
    """
    Inverse mean ISI of spikes is an alternative measure of firing rate
    """
    return 1 / np.mean(np.diff(spikes))
```

and then apply it in the same way to the blob column (here `spike_time`):

```python
df['ISI-based firing rate'] = df['spike_time'].apply(inverse_isi)
```
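
As an end-to-end check, the sketch below runs the same idea on a made-up dataframe. The guard for units with fewer than two spikes is my own addition, not part of the function above: without it, NumPy warns about taking the mean of an empty array. The spike times are invented and assumed to be in seconds, so the result reads as spikes per second.

```python
import numpy as np
import pandas as pd

def inverse_isi_safe(spikes):
    """Inverse mean ISI; returns NaN when fewer than two spikes are available."""
    spikes = np.asarray(spikes, dtype=float)
    if spikes.size < 2:
        # No interval can be formed; returning NaN is an assumed convention.
        return np.nan
    return 1 / np.mean(np.diff(spikes))

df = pd.DataFrame({
    "spike_time": [
        np.array([0.0, 0.5, 1.0, 1.5]),  # regular firing every 0.5 s (made-up values)
        np.array([2.0]),                 # a single spike: no ISI is defined
    ],
})

df["ISI-based firing rate"] = df["spike_time"].apply(inverse_isi_safe)
rates = df["ISI-based firing rate"].tolist()
print(rates)
```

The first unit fires every 0.5 s, so its mean ISI is 0.5 and the inverse is 2.0. The second unit has no interval to average and comes back as NaN.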