How to Mean Center Data in Pandas

Method 1: Mean centering just one column in a dataframe

You can use the mean() function to mean center a single column of a pandas DataFrame: subtract the column's mean from every value in that column. First, we generate some sample data.

# Generate sample data
import pandas as pd
new_df = pd.DataFrame({'Col_1': [20, 10, 50, 30],
                       'Col_2': [50, 50.5, 88, 99]})

# Print it out
print(new_df)
   Col_1  Col_2
0     20   50.0
1     10   50.5
2     50   88.0
3     30   99.0

Next, we use mean() to mean center the column Col_1 and store the result in a new column, Col_1_centered.

# Mean center one column: subtract the column's mean from each value
new_df["Col_1_centered"] = new_df["Col_1"] - new_df["Col_1"].mean()
print(new_df)
   Col_1  Col_2  Col_1_centered
0     20   50.0            -7.5
1     10   50.5           -17.5
2     50   88.0            22.5
3     30   99.0             2.5

We can also check that it worked: the mean of the centered column should be 0.

# Double-check that the centered column has a mean of 0
print(new_df.mean())
Col_1             27.500
Col_2             71.875
Col_1_centered     0.000
dtype: float64
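
If we need to center several columns one at a time, the step above can be wrapped in a small helper. The sketch below is only an illustration: the function name center_column and its suffix parameter are our own assumptions, not part of pandas. It also checks the mean with numpy's isclose(), since floating-point rounding can leave a tiny nonzero mean for some data.

```python
import numpy as np
import pandas as pd


def center_column(df, column, suffix="_centered"):
    """Hypothetical helper: return a copy of df with a mean-centered
    version of `column` added under the name column + suffix."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out[column + suffix] = out[column] - out[column].mean()
    return out


new_df = pd.DataFrame({"Col_1": [20, 10, 50, 30],
                       "Col_2": [50, 50.5, 88, 99]})

centered = center_column(new_df, "Col_2")
print(centered)

# Compare the mean to 0 with a tolerance instead of exact equality
print(np.isclose(centered["Col_2_centered"].mean(), 0.0))
```

Returning a copy keeps the original dataframe unchanged, which makes it easier to rerun the step while experimenting.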

Method 2: Mean centering all columns in a dataframe

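We can combine apply(), a lambda function, and mean() to mean center every column in a pandas DataFrame. apply() passes each column to the lambda, which subtracts that column's mean from its values.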

# Generate sample data
import pandas as pd
new_df = pd.DataFrame({'Col_1': [20, 10, 50, 30],
                       'Col_2': [50, 50.5, 88, 99]})

# Print it out
print(new_df)
   Col_1  Col_2
0     20   50.0
1     10   50.5
2     50   88.0
3     30   99.0

# Center every column by combining apply(), a lambda, and mean()
new_df_centered = new_df.apply(lambda column: column - column.mean())
print(new_df_centered)
   Col_1   Col_2
0   -7.5 -21.875
1  -17.5 -21.375
2   22.5  16.125
3    2.5  27.125

# Double-check that every centered column has a mean of 0
print(new_df_centered.mean())
Col_1    0.0
Col_2    0.0
dtype: float64
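
As an alternative to the lambda, we can subtract the column means directly: new_df - new_df.mean() gives the same result, because pandas aligns the Series of means with the dataframe's column labels. The sketch below applies this to a dataframe that also has a text column, which cannot be averaged. The Name column and the variable names are made up for illustration, and selecting the numeric columns first is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical sample data with one non-numeric column
mixed_df = pd.DataFrame({"Name": ["a", "b", "c", "d"],
                         "Col_1": [20, 10, 50, 30],
                         "Col_2": [50, 50.5, 88, 99]})

# Pick out the numeric columns only
numeric_cols = mixed_df.select_dtypes(include="number").columns

# Subtract each column's mean; the means align on column labels
centered_numeric = mixed_df[numeric_cols] - mixed_df[numeric_cols].mean()

# Put the untouched text column back together with the centered columns
centered_df = mixed_df.drop(columns=numeric_cols).join(centered_numeric)
print(centered_df)
```

The text column passes through unchanged, while Col_1 and Col_2 end up with the same centered values as in Method 2.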

Other Resources

How to use Lambda functions in Python (Pandas)