Method 1: Mean centering just one column in a dataframe
You can use the mean() function to mean center a single column of a pandas dataframe: subtract the column's mean from every value in that column. First, we generate some sample data.
# Generate sample data
import pandas as pd
new_df = pd.DataFrame({'Col_1': [20, 10, 50, 30],
                       'Col_2': [50, 50.5, 88, 99]})
# Print it out
print(new_df)
   Col_1  Col_2
0     20   50.0
1     10   50.5
2     50   88.0
3     30   99.0
The following code uses mean() to mean center the column Col_1 and adds the centered values as a new column, Col_1_centered.
# Mean centering one column
new_df["Col_1_centered"]=new_df["Col_1"]-new_df["Col_1"].mean()
print(new_df)
   Col_1  Col_2  Col_1_centered
0     20   50.0            -7.5
1     10   50.5           -17.5
2     50   88.0            22.5
3     30   99.0             2.5
We can also check that it worked as planned: the mean of Col_1_centered should be 0.
# Double-check that the centered column has a mean of 0
print(new_df.mean())
Col_1             27.500
Col_2             71.875
Col_1_centered     0.000
dtype: float64
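On this sample data the centered mean comes out as exactly 0, but with other values floating-point rounding can leave a tiny nonzero remainder. The following is a minimal sketch of a tolerance-based check, assuming NumPy is available; the names centered_mean and is_centered are only illustrative and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Same sample data and centering step as above
new_df = pd.DataFrame({'Col_1': [20, 10, 50, 30],
                       'Col_2': [50, 50.5, 88, 99]})
new_df["Col_1_centered"] = new_df["Col_1"] - new_df["Col_1"].mean()

# Compare the mean against 0 with a tolerance instead of ==,
# since rounding error can leave a value such as 1e-16 on other data
centered_mean = new_df["Col_1_centered"].mean()
is_centered = bool(np.isclose(centered_mean, 0.0))
print(is_centered)
```

np.isclose() uses an absolute tolerance of 1e-8 by default, which is usually strict enough for this kind of sanity check.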
Method 2: Mean centering all columns in a dataframe
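We can combine a lambda function with mean() to mean center all columns of a pandas dataframe. The apply() method passes each column to the lambda in turn, and the lambda subtracts that column's mean from its values.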
# Generate sample data
import pandas as pd
new_df = pd.DataFrame({'Col_1': [20, 10, 50, 30],
                       'Col_2': [50, 50.5, 88, 99]})
# Print it out
print(new_df)
   Col_1  Col_2
0     20   50.0
1     10   50.5
2     50   88.0
3     30   99.0
# Apply a lambda to every column that subtracts the column's own mean
new_df_centered = new_df.apply(lambda column: column - column.mean())
print(new_df_centered)
   Col_1   Col_2
0   -7.5 -21.875
1  -17.5 -21.375
2   22.5  16.125
3    2.5  27.125
# Double-check that every centered column has a mean of 0
print(new_df_centered.mean())
Col_1    0.0
Col_2    0.0
dtype: float64
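As an alternative to the lambda, the same centering can probably be written as a plain subtraction, because pandas aligns the Series returned by mean() with the dataframe's column names. This is a sketch of that alternative, not the method described above; the names new_df_centered_alt and same_result are only illustrative.

```python
import pandas as pd

# Same sample data as above
new_df = pd.DataFrame({'Col_1': [20, 10, 50, 30],
                       'Col_2': [50, 50.5, 88, 99]})

# new_df.mean() returns one mean per column; subtracting that Series
# from the dataframe aligns on column names and centers every column
new_df_centered_alt = new_df - new_df.mean()

# Compare with the lambda-based version from Method 2
new_df_centered = new_df.apply(lambda column: column - column.mean())
same_result = new_df_centered_alt.equals(new_df_centered)
print(same_result)
```

Both versions assume that every column is numeric. If the dataframe also holds text columns, select the numeric ones first, for example with new_df.select_dtypes("number").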