When to Use ddof=1 in np.std()

The rule for choosing ddof in NumPy's np.std() is as follows.

Rule 1: If you are calculating standard deviation for a sample, set ddof = 1 in np.std().

np.std(sample_name, ddof=1)

Rule 2: If you are calculating standard deviation for a population, set ddof = 0 in np.std().

np.std(population_name, ddof=0)
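As a quick illustration of the two rules, the sketch below compares both settings on a small made-up data set (the values in data are hypothetical, chosen only so the arithmetic is easy to follow). It also shows that np.std() uses ddof=0 when the argument is omitted, and cross-checks the results against statistics.pstdev() and statistics.stdev() from the Python standard library.

```python
import statistics

import numpy as np

# Hypothetical data set, used only for illustration (its mean is exactly 5)
data = [2, 4, 4, 4, 5, 5, 7, 9]

default_sd = np.std(data)             # ddof is omitted, so NumPy uses ddof=0
population_sd = np.std(data, ddof=0)  # divides by N
sample_sd = np.std(data, ddof=1)      # divides by N - 1

print("Default (no ddof):", default_sd)
print("Population (ddof=0):", population_sd)
print("Sample (ddof=1):", sample_sd)

# Cross-check against the standard library equivalents
print("statistics.pstdev:", statistics.pstdev(data))
print("statistics.stdev:", statistics.stdev(data))
```

Because the default is ddof=0, calling np.std() on a sample without passing ddof=1 gives the population formula, which is usually not what you want for sample data.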


Example of ddof = 1

The following Python code uses ddof = 1 in np.std() to calculate the standard deviation of a sample.

# import numpy
import numpy as np

# set the seed for reproducibility (10 is arbitrary; you can change it)
np.random.seed(10)

# Generate 5 numbers following standard normal distribution
Array_numbers = np.random.randn(5) 
print("Array of Numbers: \n", Array_numbers)

# setting ddof=1, if Array_numbers is a sample
print("Use np.std for a sample (ddof=1): \n",np.std(Array_numbers,ddof=1))

The output is shown below. The standard deviation of this sample is approximately 1.09667.

Array of Numbers: 
 [ 1.3315865   0.71527897 -1.54540029 -0.00838385  0.62133597]

Use np.std for a sample (ddof=1): 
 1.0966713483434376

Example of ddof = 0

The following Python code uses ddof = 0 in np.std() to calculate the standard deviation of a population.

# import numpy
import numpy as np

# set the seed for reproducibility (10 is arbitrary; you can change it)
np.random.seed(10)

# Generate 10 numbers following standard normal distribution
Array_numbers = np.random.randn(10) 
print("Array of Numbers: \n", Array_numbers)

# setting ddof=0, if Array_numbers is a population
print("Use np.std for a population (ddof=0): \n",np.std(Array_numbers,ddof=0))

The output is shown below. The standard deviation of this population is approximately 0.7519756.

Array of Numbers: 
 [ 1.3315865   0.71527897 -1.54540029 -0.00838385  0.62133597 -0.72008556
  0.26551159  0.10854853  0.00429143 -0.17460021]

Use np.std for a population (ddof=0): 
 0.7519756036909285

Formulas for np.std()

1. General Formula

The general formula used by np.std() is:

\[\sqrt{\frac{1}{N-\text{ddof}} \sum_{i=1}^N (x_i - \overline{x})^2}\]

where

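  • \( x_i \): the ith element in the data set
  • \( \overline{x} \): the mean of the data set
  • N: the number of elements in the data set
  • ddof: the delta degrees of freedom, so the divisor is N - ddof (0 for a population, 1 for a sample)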
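The general formula can be written out as a small function. The sketch below is only an illustration: std_with_ddof is a hypothetical helper name, not part of NumPy, and it handles one-dimensional input only. The data values are the five numbers printed in the ddof = 1 example above.

```python
import numpy as np


def std_with_ddof(values, ddof=0):
    """Standard deviation with divisor N - ddof (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n - ddof <= 0:
        raise ValueError("N - ddof must be positive")
    # Sum of squared deviations from the mean of the data
    squared_deviations = (x - x.mean()) ** 2
    return np.sqrt(squared_deviations.sum() / (n - ddof))


# The five numbers printed in the ddof = 1 example
data = [1.3315865, 0.71527897, -1.54540029, -0.00838385, 0.62133597]

for ddof in (0, 1):
    print("ddof =", ddof)
    print("  from the formula:", std_with_ddof(data, ddof))
    print("  from np.std():   ", np.std(data, ddof=ddof))
```

For each value of ddof, the two printed numbers should agree up to floating-point rounding.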

2. For population SD

When you have the whole population, you do not need ddof=1. The mean computed from the data is the true population mean rather than an estimate, so no correction is needed.

In this case, ddof=0, and the general formula reduces to the population SD:

\[ \text{population: } \sqrt{\frac{1}{N-\text{ddof}} \sum_{i=1}^N (x_i - \overline{x})^2}=\sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}\]

3. For sample SD

When calculating the standard deviation for a sample, set ddof=1. The sample mean is only an estimate of the population mean, and estimating it uses up one degree of freedom. Dividing by N-1 instead of N (Bessel's correction) compensates for this.

With ddof=1, the general formula becomes the sample SD:

\[ \text{sample: } \sqrt{\frac{1}{N-\text{ddof}} \sum_{i=1}^N (x_i - \overline{x})^2}=\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}\]
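Since the population and sample formulas differ only in the divisor, the sample SD equals the population SD multiplied by sqrt(N / (N-1)). The sketch below checks this on the five numbers printed in the ddof = 1 example and then prints the factor for a few illustrative values of N (chosen arbitrarily) to show that it shrinks toward 1 as N grows.

```python
import numpy as np

# The five numbers printed in the ddof = 1 example
data = np.array([1.3315865, 0.71527897, -1.54540029, -0.00838385, 0.62133597])
n = data.size

population_sd = np.std(data, ddof=0)  # divisor N
sample_sd = np.std(data, ddof=1)      # divisor N - 1

# Rescale the population formula's result by sqrt(N / (N - 1))
rescaled = population_sd * np.sqrt(n / (n - 1))

print("Sample SD (ddof=1):          ", sample_sd)
print("Population SD times factor:  ", rescaled)

# The factor approaches 1 as N grows (N values are arbitrary examples)
for n_items in (5, 10, 100, 1000):
    print(n_items, np.sqrt(n_items / (n_items - 1)))
```

For small data sets the choice of ddof changes the result noticeably, while for large N the two formulas give nearly the same number.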


Calculate Population SD from Scratch

We can also calculate the population standard deviation (SD) from scratch in Python by applying the formula directly. The following is the full Python code.

# import numpy
import numpy as np

# set the seed for reproducibility (10 is arbitrary; you can change it)
np.random.seed(10)

# Generate 10 numbers following standard normal distribution
Array_numbers = np.random.randn(10) 
print("Array of Numbers: \n", Array_numbers)

# setting ddof=0, if we assume it is a population
print("Use np.std for a population (ddof=0): \n",np.std(Array_numbers,ddof=0))

# Population standard deviation from scratch: divide by N
mean_number = np.mean(Array_numbers)
sd_from_scratch_population = np.sqrt(
    (1 / len(Array_numbers)) * np.sum(np.square(Array_numbers - mean_number))
)
print('SD function from scratch for a population:\n', sd_from_scratch_population)

The following is the output. np.std() and the from-scratch calculation of the population SD give the same value (approximately 0.7519756).

Array of Numbers: 
 [ 1.3315865   0.71527897 -1.54540029 -0.00838385  0.62133597 -0.72008556
  0.26551159  0.10854853  0.00429143 -0.17460021]

Use np.std for a population (ddof=0): 
 0.7519756036909285

SD function from scratch for a population:
 0.7519756036909285

Calculate Sample SD from Scratch

We can also calculate the sample standard deviation (SD) from scratch in Python by applying the formula directly. The following is the full Python code.

# import numpy
import numpy as np

# set the seed for reproducibility (10 is arbitrary; you can change it)
np.random.seed(10)

# Generate 5 numbers following standard normal distribution
Array_numbers = np.random.randn(5) 
print("Array of Numbers: \n", Array_numbers)

# setting ddof=1, if we assume it is a sample
print("Use np.std for a sample (ddof=1): \n",np.std(Array_numbers,ddof=1))

# Sample standard deviation from scratch: divide by N - 1
mean_number=np.mean(Array_numbers)
sd_from_scratch_sample=np.sqrt((1/(len(Array_numbers)-1))*np.sum(np.square(Array_numbers-mean_number)))
print('SD function from scratch for a sample:\n',sd_from_scratch_sample)

The following is the output. np.std() and the from-scratch calculation of the sample SD give the same value (approximately 1.09667).

Array of Numbers: 
 [ 1.3315865   0.71527897 -1.54540029 -0.00838385  0.62133597]

Use np.std for a sample (ddof=1): 
 1.0966713483434376

SD function from scratch for a sample:
 1.0966713483434376