The following are the rules for using ddof in np.std() in NumPy.
Rule 1: If you are calculating the standard deviation of a sample, set ddof=1 in np.std().
np.std(sample_name, ddof=1)
Rule 2: If you are calculating the standard deviation of a population, set ddof=0 in np.std(). Because ddof=0 is the default, np.std(population_name) gives the same result.
np.std(population_name, ddof=0)
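np.std() uses ddof=0 when the argument is omitted, so leaving it out gives the population SD. The two rules change only the divisor (N versus N-1), so the sample SD is the population SD scaled by \( \sqrt{N/(N-1)} \). A minimal sketch with a small made-up array (the values are arbitrary and not taken from the examples below):

```python
import numpy as np

# Arbitrary illustrative data (not from the examples in this post)
data = np.array([1.0, 2.0, 3.0, 4.0])
n = len(data)

sd_default = np.std(data)             # ddof omitted, so NumPy uses ddof=0
sd_population = np.std(data, ddof=0)  # divisor N
sd_sample = np.std(data, ddof=1)      # divisor N-1

print(sd_default == sd_population)  # True: the default is the population SD
# Only the divisor differs, so the results differ by a factor of sqrt(N/(N-1))
print(np.isclose(sd_sample, sd_population * np.sqrt(n / (n - 1))))  # True
```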
Example of ddof = 1
The following Python code shows how to use np.std() with ddof=1 to calculate the standard deviation of a sample.
# import numpy
import numpy as np
# set the random seed so the output is reproducible (10 is arbitrary; any integer works)
np.random.seed(10)
# generate 5 numbers from the standard normal distribution
Array_numbers = np.random.randn(5)
print("Array of Numbers: \n", Array_numbers)
# ddof=1 because Array_numbers is treated as a sample
print("Use np.std for a sample (ddof=1): \n", np.std(Array_numbers, ddof=1))
The following is the output. The standard deviation of this sample is 1.09667.
Array of Numbers:
 [ 1.3315865 0.71527897 -1.54540029 -0.00838385 0.62133597]
Use np.std for a sample (ddof=1):
 1.0966713483434376
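As an independent cross-check (not part of the original example), Python's standard-library statistics.stdev() also uses the N-1 divisor, so it should agree with np.std(..., ddof=1) up to floating-point rounding:

```python
import statistics
import numpy as np

np.random.seed(10)
Array_numbers = np.random.randn(5)

numpy_sample_sd = np.std(Array_numbers, ddof=1)
# statistics.stdev computes the sample SD (divisor N-1)
stdlib_sample_sd = statistics.stdev(Array_numbers.tolist())

print(numpy_sample_sd, stdlib_sample_sd)
print(np.isclose(numpy_sample_sd, stdlib_sample_sd))  # True
```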
Example of ddof = 0
The following Python code shows how to use np.std() with ddof=0 to calculate the standard deviation of a population.
# import numpy
import numpy as np
# set the random seed so the output is reproducible (10 is arbitrary; any integer works)
np.random.seed(10)
# generate 10 numbers from the standard normal distribution
Array_numbers = np.random.randn(10)
print("Array of Numbers: \n", Array_numbers)
# ddof=0 because Array_numbers is treated as a population
print("Use np.std for a population (ddof=0): \n", np.std(Array_numbers, ddof=0))
The following is the output, and we can see the standard deviation for this population is 0.7519756.
Array of Numbers:
 [ 1.3315865 0.71527897 -1.54540029 -0.00838385 0.62133597 -0.72008556 0.26551159 0.10854853 0.00429143 -0.17460021]
Use np.std for a population (ddof=0):
 0.7519756036909285
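The standard library offers a matching check for the population case: statistics.pstdev() divides by N, so it should agree with np.std(..., ddof=0). A short sketch reusing the same seed and data:

```python
import statistics
import numpy as np

np.random.seed(10)
Array_numbers = np.random.randn(10)

numpy_population_sd = np.std(Array_numbers, ddof=0)
# statistics.pstdev computes the population SD (divisor N)
stdlib_population_sd = statistics.pstdev(Array_numbers.tolist())

print(numpy_population_sd, stdlib_population_sd)
print(np.isclose(numpy_population_sd, stdlib_population_sd))  # True
```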
Formulas for np.std()
1. General Formula
np.std() computes the standard deviation with the following general formula:
\[\sqrt{\frac{1}{N-\mathrm{ddof}} \sum_{i=1}^N (x_i - \bar{x})^2}\]
where
- \( x_i \): the \( i \)-th element in the data set
- \( \bar{x} \): the mean of the data set
- \( N \): the number of elements in the data set
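The general formula can be translated almost line by line into Python. The helper name std_with_ddof below is hypothetical (it is not a NumPy function); it is a minimal sketch that checks the formula against np.std() for several ddof values on made-up data:

```python
import numpy as np

def std_with_ddof(x, ddof=0):
    """Hypothetical helper: sqrt(sum((x_i - mean)^2) / (N - ddof))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n - ddof <= 0:
        raise ValueError("N - ddof must be positive")
    squared_deviations = (x - x.mean()) ** 2
    return np.sqrt(squared_deviations.sum() / (n - ddof))

# Made-up data: mean is 9 and the squared deviations sum to 144
data = [3.0, 7.0, 7.0, 19.0]
for ddof in (0, 1, 2):
    print(ddof, std_with_ddof(data, ddof), np.std(data, ddof=ddof))
```

With ddof=0 both columns print 6.0 (144/4 = 36); larger ddof values shrink the divisor and increase the SD.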
2. For population SD
When you have the whole population, you do NOT need ddof=1: the mean is computed from every member of the population, so it is the true population mean rather than an estimate, and no degree of freedom is lost.
In this case, ddof=0, and the formula below calculates the SD of population data.
\[ \text{population: } \sqrt{\frac{1}{N-\mathrm{ddof}} \sum_{i=1}^N (x_i - \bar{x})^2}=\sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2}\]
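A small worked example of the population formula, using the made-up data set \( x = (2, 4, 4, 4, 5, 5, 7, 9) \) with \( N = 8 \) (chosen only because the arithmetic comes out cleanly):
\[ \bar{x} = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \]
\[ \sum_{i=1}^{8} (x_i - \bar{x})^2 = 9+1+1+1+0+0+4+16 = 32 \]
\[ \text{population: } \sqrt{\frac{1}{N} \sum_{i=1}^{8} (x_i - \bar{x})^2} = \sqrt{\frac{32}{8}} = \sqrt{4} = 2 \]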
3. For sample SD
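When calculating the standard deviation of a sample, set ddof=1. Because the population mean is unknown, it is estimated by the sample mean \( \bar{x} \). The deviations \( x_i - \bar{x} \) always sum to zero, so only \( N-1 \) of them are free to vary; setting ddof=1 divides by \( N-1 \) instead of \( N \) to account for this one lost degree of freedom (Bessel's correction).
In this case, ddof=1, and the following formula calculates the SD of a sample.
\[ \text{sample: } \sqrt{\frac{1}{N-\mathrm{ddof}} \sum_{i=1}^N (x_i - \bar{x})^2}=\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2}\]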
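To see how the \( N-1 \) divisor changes the result, here is a sketch on the made-up data set (2, 4, 4, 4, 5, 5, 7, 9), whose mean is 5 and whose squared deviations sum to 32. The sample SD is \( \sqrt{32/7} \approx 2.138 \), slightly larger than the population SD of 2:

```python
import numpy as np

# Made-up illustrative data: mean is 5, squared deviations sum to 32
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

sum_sq_dev = np.sum((x - x.mean()) ** 2)           # 32.0
sample_sd_formula = np.sqrt(sum_sq_dev / (n - 1))  # divide by N-1 = 7
sample_sd_numpy = np.std(x, ddof=1)
population_sd_numpy = np.std(x, ddof=0)            # divide by N = 8

print(sample_sd_formula, sample_sd_numpy)  # both about 2.138
print(population_sd_numpy)                 # 2.0
```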
Calculate Population SD from Scratch
We can also calculate the population standard deviation (SD) from scratch in Python and check it against np.std(). The following is the full Python code.
# import numpy
import numpy as np
# set the random seed so the output is reproducible (10 is arbitrary; any integer works)
np.random.seed(10)
# generate 10 numbers from the standard normal distribution
Array_numbers = np.random.randn(10)
print("Array of Numbers: \n", Array_numbers)
# ddof=0 because Array_numbers is treated as a population
print("Use np.std for a population (ddof=0): \n", np.std(Array_numbers, ddof=0))
# population SD from scratch: square root of the sum of squared deviations divided by N
mean_number = np.mean(Array_numbers)
sd_from_scratch_population = np.sqrt((1/len(Array_numbers))*np.sum(np.square(Array_numbers-mean_number)))
print('SD function from scratch for a population:\n', sd_from_scratch_population)
The following is the output. np.std() and our from-scratch calculation return the same population SD (0.7519756).
Array of Numbers:
 [ 1.3315865 0.71527897 -1.54540029 -0.00838385 0.62133597 -0.72008556 0.26551159 0.10854853 0.00429143 -0.17460021]
Use np.std for a population (ddof=0):
 0.7519756036909285
SD function from scratch for a population:
 0.7519756036909285
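Multiplying the sum of squared deviations by 1/N is the same as taking their mean, so the from-scratch population SD can also be written with np.mean. A sketch of that equivalent, more compact form, using the same seed and data:

```python
import numpy as np

np.random.seed(10)
Array_numbers = np.random.randn(10)

# population variance = mean of the squared deviations (divisor N)
deviations = Array_numbers - Array_numbers.mean()
sd_population_compact = np.sqrt(np.mean(deviations ** 2))

print(sd_population_compact)
print(np.isclose(sd_population_compact, np.std(Array_numbers, ddof=0)))  # True
```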
Calculate Sample SD from Scratch
We can also calculate the sample standard deviation (SD) from scratch in Python and check it against np.std(). The following is the full Python code.
# import numpy
import numpy as np
# set the random seed so the output is reproducible (10 is arbitrary; any integer works)
np.random.seed(10)
# generate 5 numbers from the standard normal distribution
Array_numbers = np.random.randn(5)
print("Array of Numbers: \n", Array_numbers)
# ddof=1 because Array_numbers is treated as a sample
print("Use np.std for a sample (ddof=1): \n", np.std(Array_numbers, ddof=1))
# sample SD from scratch: square root of the sum of squared deviations divided by N-1
mean_number = np.mean(Array_numbers)
sd_from_scratch_sample = np.sqrt((1/(len(Array_numbers)-1))*np.sum(np.square(Array_numbers-mean_number)))
print('SD function from scratch for a sample:\n', sd_from_scratch_sample)
The following is the output. np.std() and the from-scratch calculation return the same sample SD (1.09667).
Array of Numbers:
 [ 1.3315865 0.71527897 -1.54540029 -0.00838385 0.62133597]
Use np.std for a sample (ddof=1):
 1.0966713483434376
SD function from scratch for a sample:
 1.0966713483434376
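One edge case the from-scratch sample formula does not handle is a sample with a single value, where \( N-1 = 0 \). The helper sample_sd below is hypothetical (not part of NumPy); it is a sketch that wraps the same formula in a function and rejects such input explicitly, whereas np.std(..., ddof=1) emits a RuntimeWarning and returns nan in that case:

```python
import warnings
import numpy as np

def sample_sd(values):
    """Hypothetical helper: sample SD with divisor N-1."""
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < 2:
        # with a single value N-1 = 0, so the sample SD is undefined
        raise ValueError("sample SD needs at least two values")
    return np.sqrt(np.sum((values - values.mean()) ** 2) / (n - 1))

np.random.seed(10)
Array_numbers = np.random.randn(5)
print(sample_sd(Array_numbers))  # same value as np.std(Array_numbers, ddof=1)

# for comparison: np.std with ddof=1 on one value warns and returns nan
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.std([3.0], ddof=1))  # nan
```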