Pandas tools you didn’t know you needed 1: Apply

A dataframe cell can hold an array rather than a single value. This is the case with DataJoint’s “blobs.” Let’s examine such a dataframe, fetched from a DataJoint table:

| | fid | brain_region | single_unit | single_unit_phy | spike_time | waveform | snr |
|---|---|---|---|---|---|---|---|
| 0 | bucket_1_m026_1568757659_ | pfc | 0 | 12 | [1.71006667 2.14576667 3.35306667 3.6982 22.4359 22.80743333 26.00536667 26.61663333 28.39046667 29.45536667] | [] | [] |
| 1 | bucket_1_m026_1568757659_ | pfc | 1 | 15 | [17.8181 37.5422 37.90253333 39.36106667 46.77196667 60.7908 77.62783333 78.71806667 82.33186667 93.15203333] | [] | [] |
| 2 | bucket_1_m026_1568757659_ | pfc | 2 | 19 | [1.552 115.68683333 115.69416667 160.0344 342.9736 346.51526667 346.8301 348.25513333 348.2767 348.29066667] | [] | [] |
| 3 | bucket_1_m026_1568757659_ | pfc | 3 | 37 | [64.1048 145.53183333 185.3421 187.57793333 281.31683333 304.80466667 326.5742 339.3119 348.08556667 350.2595] | [] | [] |
| 4 | bucket_1_m026_1568757659_ | pfc | 4 | 40 | [0.66776667 3.9761 4.8187 16.72106667 22.01286667 25.6975 29.069 29.25703333 30.07223333 31.55446667] | [] | [] |

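We’d like to get the interspike intervals (ISIs). In this case, `np.diff(df['spike_time'])` will not give them: it treats the column as a one-dimensional sequence of arrays and subtracts each row’s array from the next row’s, instead of taking differences within each row: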

```
array([array([..., 37.98336667, 51.62246667, 52.10143333, 53.9414    , 63.69666667]),
       array([-16.2661    ,  78.14463333,  77.79163333, 120.67333333,
       296.20163333, 285.72446667, 269.20226667, 269.53706667,
       265.94483333, 255.13863333]),
       array([ 62.5528    ,  29.845     ,  69.64793333,  27.54353333,
       -61.65676667, -41.7106    , -20.2559    ,  -8.94323333,
        -0.19113333,   1.96883333]),
       ...,
       array([29.80516667, 38.70003333, 40.26796667, 53.5446    , 45.7118    ,
       55.4116    , 55.9258    , 79.12003333, 77.03993333, 83.48073333]),
       array([-11.45576667, -11.35606667,  -9.69276667, -23.187     ,
       -22.03416667, -31.00213333, -29.06093333, -51.92653333,
       -51.33696667, -57.59203333]),
       array([ 72.3652    ,  92.1018    , 157.1723    , 173.74453333,
       185.13646667, 188.13473333, 187.84116667, 187.3681    ,
       206.6341    , 206.1082    ])], dtype=object)
```
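To see why this happens, here is a small sketch on a made-up two-row frame. The `toy` data is an assumption for illustration only; just the `spike_time` column name is taken from the table above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the fetched table: each cell of "spike_time" holds an array.
toy = pd.DataFrame({
    "spike_time": [
        np.array([1.0, 2.0, 4.0]),
        np.array([10.0, 20.0, 40.0]),
    ]
})

# np.diff sees a 1-D object array with one element per row,
# so it subtracts row 0's array from row 1's array.
across_rows = np.diff(toy["spike_time"])
print(across_rows[0])  # row 1 minus row 0, element by element

# What we actually want for row 0: differences within its own array.
within_row = np.diff(toy["spike_time"].iloc[0])
print(within_row)
```

With two rows, the first call returns a single array of row-to-row differences, which is not an interspike interval of either unit.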


You can use a `for` loop, but there is an easier way.
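For reference, the loop version might look like the sketch below. The `toy` frame is a made-up stand-in for the fetched table; only the `spike_time` column name comes from the example above.

```python
import numpy as np
import pandas as pd

# Toy stand-in: two units with different numbers of spikes.
toy = pd.DataFrame({
    "spike_time": [
        np.array([0.5, 1.0, 2.5]),
        np.array([1.0, 4.0, 4.5, 6.0]),
    ]
})

# Explicit loop: take the differences within each row's array.
isi_rows = []
for spikes in toy["spike_time"]:
    isi_rows.append(np.diff(spikes))

# Store the per-row results as a new object column.
toy["ISI"] = isi_rows
```

It works, but it takes four lines of bookkeeping for what is conceptually one operation per row.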

To get our operation of interest to work row by row without looping, use `apply`:
```python
import numpy as np

df['ISI'] = df['spike_time'].apply(np.diff)  # np.diff runs once per row, on that row's array
df.head()
```

| | fid | brain_region | single_unit | single_unit_phy | spike_time | waveform | snr | ISI |
|---|---|---|---|---|---|---|---|---|
| 0 | bucket_1_m026_1568757659_ | pfc | 0 | 12 | [1.71006667 2.14576667 3.35306667 3.6982 22.4359 22.80743333 26.00536667 26.61663333 28.39046667 29.45536667] | [] | [] | [0.4357 1.2073 0.34513333 18.7377 0.37153333 3.19793333 0.61126667 1.77383333 1.0649] |
| 1 | bucket_1_m026_1568757659_ | pfc | 1 | 15 | [17.8181 37.5422 37.90253333 39.36106667 46.77196667 60.7908 77.62783333 78.71806667 82.33186667 93.15203333] | [] | [] | [19.7241 0.36033333 1.45853333 7.4109 14.01883333 16.83703333 1.09023333 3.6138 10.82016667] |
| 2 | bucket_1_m026_1568757659_ | pfc | 2 | 19 | [1.552 115.68683333 115.69416667 160.0344 342.9736 346.51526667 346.8301 348.25513333 348.2767 348.29066667] | [] | [] | [1.14134833e+02 7.33333333e-03 4.43402333e+01 1.82939200e+02 3.54166667e+00 3.14833333e-01 1.42503333e+00 2.15666667e-02 1.39666667e-02] |
| 3 | bucket_1_m026_1568757659_ | pfc | 3 | 37 | [64.1048 145.53183333 185.3421 187.57793333 281.31683333 304.80466667 326.5742 339.3119 348.08556667 350.2595] | [] | [] | [81.42703333 39.81026667 2.23583333 93.7389 23.48783333 21.76953333 12.7377 8.77366667 2.17393333] |
| 4 | bucket_1_m026_1568757659_ | pfc | 4 | 40 | [0.66776667 3.9761 4.8187 16.72106667 22.01286667 25.6975 29.069 29.25703333 30.07223333 31.55446667] | [] | [] | [3.30833333 0.8426 11.90236667 5.2918 3.68463333 3.3715 0.18803333 0.8152 1.48223333] |

This gets the interspike intervals for every row without having to loop over rows. If you have a more complex procedure in mind, simply define it as a function before using `apply`:

```python
import numpy as np

def inverse_isi(spikes):
    """
    Inverse of the mean ISI of a spike train, an alternative measure of firing rate.
    """
    return 1 / np.mean(np.diff(spikes))
```

and then apply it in the same way:

```python
# 'spike_time' is the blob column in this example
df['ISI-based firing rate'] = df['spike_time'].apply(inverse_isi)
```
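A self-contained sketch of the same pattern on a toy frame follows. The toy data, the `inverse_isi_safe` name, and the `min_spikes` guard are all assumptions added for illustration, not part of the original table or function: a unit with fewer than two spikes has no intervals, so this version returns NaN for it rather than averaging an empty array.

```python
import numpy as np
import pandas as pd

def inverse_isi_safe(spikes, min_spikes=2):
    """Inverse mean ISI, or NaN when there are too few spikes (hypothetical guard)."""
    spikes = np.asarray(spikes, dtype=float)
    if spikes.size < min_spikes:
        return np.nan
    return 1 / np.mean(np.diff(spikes))

# Made-up data standing in for the fetched table.
toy = pd.DataFrame({
    "single_unit": [0, 1, 2],
    "spike_time": [
        np.array([0.0, 0.5, 1.0, 1.5]),  # regular intervals of 0.5
        np.array([0.0, 2.0, 4.0]),       # regular intervals of 2.0
        np.array([3.0]),                 # single spike: no interval
    ],
})

# Extra keyword arguments given to apply are forwarded to the function.
toy["ISI-based firing rate"] = toy["spike_time"].apply(inverse_isi_safe, min_spikes=2)
print(toy[["single_unit", "ISI-based firing rate"]])
```

Because the function returns one number per row, the new column is an ordinary float column, which makes it easy to sort or filter units afterwards.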

Maxym "Max" Myroshnychenko
