pandas groupby - How to merge a grouped and aggregated df?

I have grouped and aggregated transactions per account number (to calculate monthly statistics) and now I want to merge the output with another dataframe on account numbers. However, the account numbers are no longer in the index/columns. Group transactions per account and month and perform aggregated calculations:

```
df1 = df.groupby(['AcctNr', 'Month']).sum().groupby(level=0).agg(
    {'Amount': ['mean', 'median', max, 'std', percentile(75), iqr]})
df1.columns = ["_".join(x) for x in df1.columns.ravel()]
```

This results in the following output from df1.colum...
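A minimal sketch of one way to get back to a mergeable frame (the sample data and the `Region` column here are hypothetical, and the aggregation is reduced to `mean`/`max`): flatten the MultiIndex columns, then call `reset_index()` so `AcctNr` becomes an ordinary column again before merging.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout.
df = pd.DataFrame({
    'AcctNr': [1, 1, 2, 2],
    'Month': [1, 2, 1, 2],
    'Amount': [10.0, 20.0, 30.0, 40.0],
})
other = pd.DataFrame({'AcctNr': [1, 2], 'Region': ['N', 'S']})

# Aggregate per account, flatten the MultiIndex columns, then
# bring AcctNr back as a regular column with reset_index().
df1 = df.groupby(['AcctNr', 'Month']).sum().groupby(level=0).agg(
    {'Amount': ['mean', 'max']})
df1.columns = ['_'.join(col) for col in df1.columns]
df1 = df1.reset_index()        # AcctNr becomes a column again

merged = df1.merge(other, on='AcctNr', how='left')
print(merged)
```

`reset_index()` is the key step: after it, `merge(..., on='AcctNr')` works like any ordinary column join. Alternatively, `other.merge(df1, left_on='AcctNr', right_index=True)` would merge against the index directly.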

How to do a mean and a max at the same time with a groupby?

I have the same list of destinations (Italy, Greece, Spain, etc.) across a list of airline companies (Easyjet, Panam, Ryan Air, etc.). Each of these destinations has a length. Depending on the company, the same destination can have a different length, so:

1. I want to find the mean of the destinations' lengths.
2. I want to find the 5 longest destinations.

As a beginner in pandas, I'm wondering whether there is a way to calculate this with a single groupby. Thanks for your help!...
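Assuming a layout like the question describes (the column names and sample lengths here are made up), a single groupby on the destination gives the per-destination mean, and `nlargest(5)` on that result picks out the 5 longest:

```python
import pandas as pd

# Hypothetical data: flight lengths per (company, destination).
df = pd.DataFrame({
    'company': ['Easyjet', 'Easyjet', 'Ryan Air', 'Ryan Air', 'Panam'],
    'destination': ['Italy', 'Greece', 'Italy', 'Spain', 'Greece'],
    'length': [2.0, 3.0, 2.5, 1.5, 3.5],
})

# One groupby answers both questions: the per-destination mean
# length, and the 5 destinations with the largest mean length.
mean_length = df.groupby('destination')['length'].mean()
top5 = mean_length.nlargest(5)
print(top5)
```

Both answers come from the same grouped result, so the groupby only runs once.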

Assign values to pandas column based on the index of the groupby operation

I have a dataframe like so:

```
rng = np.random.RandomState(0)
tf = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': [1, 2, 3, 2, 2, 3],
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
tf
  key  data1  data2
0   A      1      5
1   B      2      0
2   C      3      3
3   A      2      3
4   B      2      7
5   C      3      9
```

I have an array x = np.arange(4), which is the same length as the number of groups in columns ['key', 'data1'].

```
grouped = tf.groupby(['key', 'data1'])
print(grouped.get_group(('A', 1)), '...
```
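One way to broadcast such an array onto the rows, sketched under the question's setup: `GroupBy.ngroup()` labels each row with its group number (0 through n_groups - 1, in sorted group order), and those labels can index straight into `x`.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
tf = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': [1, 2, 3, 2, 2, 3],
                   'data2': rng.randint(0, 10, 6)})

# One value of x per group of ('key', 'data1'):
# groups in sorted order are (A,1), (A,2), (B,2), (C,3).
x = np.arange(4)

# ngroup() gives each row its group's number, which indexes into x.
tf['x'] = x[tf.groupby(['key', 'data1']).ngroup()]
print(tf)
```

Every row in the same group receives the same value of `x`, aligned by the group's position in sorted order.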

Pandas series.groupby().apply( .sum() ), .sum() not summing values

I have the following test code:

```
import pandas as pd
import numpy as np

df = pd.DataFrame({'MONTH': [1,2,3,1,1,1,1,1,1,2,3,2,2,3,2,1,1,1,1,1,1,1],
                   'HOUR': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                   'CIGFT': [np.NaN,12000,2500,73300,73300,np.NaN,np.NaN,np.NaN,
                             np.NaN,12000,100,100,15000,2500,np.NaN,15000,11000,
                             np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]})

cigs = pd.DataFrame()
cigs['cigsum'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).sum())
cigs['cigcount'] = df.groupby(['MONTH','HOUR'])...
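What is happening, reduced to a tiny hypothetical frame: `c >= 0.0` produces a boolean mask (and `NaN >= 0.0` is `False`), so `.sum()` on that mask *counts* the non-NaN values rather than summing them. Summing the series itself gives the actual total.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'MONTH': [1, 1, 2, 2],
    'HOUR': [0, 0, 0, 0],
    'CIGFT': [100.0, np.nan, 200.0, 300.0],
})

g = df.groupby(['MONTH', 'HOUR'])['CIGFT']

# (c >= 0.0) is a boolean mask, so .sum() on it COUNTS the
# non-NaN values (True == 1) instead of summing them.
counts = g.apply(lambda c: (c >= 0.0).sum())

# To actually sum the values, call .sum() on the series itself;
# NaNs are skipped by default.
sums = g.sum()
print(counts)
print(sums)
```

So `(c >= 0.0).sum()` is effectively a non-NaN count (the same as `c.count()` here), while `g.sum()` is the numeric total.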

quantile method on groupby of xarray dataset

I have a classic xarray Dataset. These are monthly data (38 years of monthly data). I am interested in calculating the quantile values for each month separately.

```
<xarray.Dataset>
Dimensions:  (lat: 26, lon: 71, time: 456)
Coordinates:
  * lat      (lat) float32 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 ...
  * lon      (lon) float32 -130.0 -129.0 -128.0 -127.0 -126.0 -125.0 ...
  * time     (time) datetime64[ns] 1979-01-31 1979-02-28 1979-03-31 ...
Data variables:
    var1     (time, lat, lon) float32 nan nan nan nan nan ...
```
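One way to do this in xarray, sketched on a small synthetic dataset with the same layout (2 years of monthly data on a tiny grid, random values): group by the month component of the time coordinate, then take the quantile over the grouped `time` dimension, leaving one field per calendar month.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic dataset mirroring the question's layout.
time = pd.date_range('1979-01-01', periods=24, freq='MS')
ds = xr.Dataset(
    {'var1': (('time', 'lat', 'lon'),
              np.random.rand(24, 2, 3).astype('float32'))},
    coords={'time': time,
            'lat': [25.0, 26.0],
            'lon': [-130.0, -129.0, -128.0]},
)

# groupby('time.month') pools all Januaries together, all
# Februaries together, etc.; quantile() then reduces over the
# grouped time dimension, leaving dims (month, lat, lon).
q = ds.groupby('time.month').quantile(0.9, dim='time')
print(q['var1'].dims)
```

With the full 38-year dataset, each of the 12 monthly groups would hold 38 samples per grid cell; passing a list of quantiles instead of a scalar adds a `quantile` dimension to the result.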

Including NaN values in function applied to Pandas GroupBy object

I would like to calculate the mean of replicate measurements and return NaN when one or both replicates have a NaN value. I am aware that groupby excludes NaN values, but it took me some time to realize apply was doing the same thing. Below is an example of my code. It only returns NaN when both replicates have missing data. In this example I would like it to return NaN for Sample 1, Assay 2. Instead, it behaves as if I had applied np.nanmean and returns the one non-NaN element, 27.0. Any ideas on a strategy to include NaN values in the functi...
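One strategy, sketched on hypothetical replicate data (the `Sample`/`Assay`/`Value` columns are assumptions): pass `skipna=False` to `Series.mean` inside `apply`, so any group containing a NaN propagates it instead of averaging around it.

```python
import numpy as np
import pandas as pd

# Hypothetical replicates: Sample 1 / Assay 2 has one NaN.
df = pd.DataFrame({
    'Sample': [1, 1, 1, 1],
    'Assay': [1, 1, 2, 2],
    'Value': [10.0, 12.0, np.nan, 27.0],
})

# Series.mean skips NaN by default; skipna=False propagates it,
# so any group containing a NaN yields NaN.
means = df.groupby(['Sample', 'Assay'])['Value'].apply(
    lambda s: s.mean(skipna=False))
print(means)
```

The same `skipna=False` flag works for other reductions (`sum`, `std`, ...), so the pattern generalizes beyond the mean.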

How to access a column of grouped data to perform linear regression in pandas?

I want to perform a linear regression on the groups of a grouped data frame in pandas. The function I am calling throws a KeyError that I cannot resolve. I have an environmental data set called dat that includes concentration data of a chemical in different tree species of various age classes at different country sites over the course of several time steps. I now want to do a regression of concentration over time steps within each group of (site, species, age). This is my code:

```
import pandas as pd
import statsmodels.api as sm

dat = pd.read_csv('data...
```