Pandas: How to Select Rows Based on Column Values
This tutorial shows how to select rows based on a specific column value, or on several column values, by using loc[] or query() in Python Pandas.
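As a minimal sketch of the two approaches named above, the following filters a small DataFrame with both the loc indexer and query(). The data and the column names (brand, price) are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "brand": ["Tesla", "Ford", "Tesla", "Toyota"],
    "price": [60000, 35000, 45000, 30000],
})

# One value: boolean mask passed to the loc indexer (square brackets, not a call)
tesla_loc = df.loc[df["brand"] == "Tesla"]

# The same filter written as a query string
tesla_query = df.query("brand == 'Tesla'")

# Several values: isin() with loc, or the "in" operator inside query()
multi_loc = df.loc[df["brand"].isin(["Tesla", "Ford"])]
multi_query = df.query("brand in ['Tesla', 'Ford']")

print(tesla_loc)
print(multi_query)
```

Both forms return the same rows. query() tends to read more easily for long conditions, while loc also lets you pick columns in the same step.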
This short tutorial shows how to use the melt() function in Pandas. It is often used to change the format of a dataframe so that it fits certain statistical functions. Example 1 of Using melt() City1 City2 City3 0 6 2 4 1 2 1 1 2 3 3 2 3 4 … Read more
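A rough sketch of what melt() does, using only the three complete rows visible in the excerpt above (the fourth row is cut off, so it is left out). The output column names City and Value are assumptions chosen for this illustration.

```python
import pandas as pd

# Wide format: one column per city (first three rows from the excerpt)
wide = pd.DataFrame({
    "City1": [6, 2, 3],
    "City2": [2, 1, 3],
    "City3": [4, 1, 2],
})

# Long format: one row per (city, value) pair.
# var_name and value_name are illustrative choices, not fixed by melt().
long = wide.melt(var_name="City", value_name="Value")

print(long)
```

The long layout has one observation per row, which is the shape many statistical routines expect for grouped data.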
One-way ANOVA compares the means of different groups to see whether the differences between them are statistically significant. For instance, you would like to compare the average household size of three cities. You can collect three samples, one from each city, and conduct a one-way ANOVA to check the difference. Formulas of One-way ANOVA … Read more
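The excerpt is cut off before its formulas, so the sketch below uses the standard textbook definition of the one-way ANOVA F statistic (between-group mean square divided by within-group mean square), which may differ in notation from the full tutorial. The household sizes are made-up numbers.

```python
import numpy as np

# Hypothetical household sizes sampled from three cities
groups = [
    np.array([2, 3, 4, 3, 3]),
    np.array([4, 5, 4, 5, 6]),
    np.array([2, 2, 3, 1, 2]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)        # number of groups
n = all_values.size    # total number of observations

# Sum of squares between groups and within groups
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f_stat)
```

A large F value suggests the group means differ by more than the within-group spread would explain. A p-value would come from the F distribution with (k - 1, n - k) degrees of freedom.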
What is Correlation? Correlation is a statistical measure of the relationship between two variables, X and Y. For instance, you can measure to what extent temperature (X) is related to the production of ice cream (Y). You probably would expect that higher temperatures correspond with higher production of ice cream. On the plot shown below, … Read more
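To make the temperature and ice cream example concrete, here is a small sketch that computes the Pearson correlation coefficient with NumPy. The numbers are invented for illustration and are not taken from the tutorial's plot.

```python
import numpy as np

# Hypothetical data: temperature (X) and ice cream production (Y)
temperature = np.array([20, 22, 25, 27, 30, 33])
ice_cream = np.array([110, 125, 140, 160, 175, 200])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two variables
r = np.corrcoef(temperature, ice_cream)[0, 1]
print(r)
```

A value close to 1 indicates a strong positive linear relationship, which matches the expectation that higher temperatures go with higher production.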
How to Write Null and Alternative Hypotheses
Since both correlation and t-test are about relationships between X and Y, what is the difference between them and when do you use t-test (or correlation)? This tutorial aims to answer these two questions. The following figure presents the difference between t-test and correlation. In particular, t-test deals with situations where X is a binary … Read more
This tutorial explains what a t-test is, and the difference between the independent sample t-test and the paired sample t-test. It also explains what two-sample and one-sample t-tests are. What is independent sample t-test? The independent sample t-test examines whether the means from 2 separate groups of people or objects are statistically significantly different. That is, we calculate two … Read more
This short tutorial shows how you can calculate standard deviation in Python using NumPy.
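A minimal sketch, assuming the tutorial relies on np.std(). The data values are hypothetical. One point worth noting is that np.std() divides by n by default (population SD); passing ddof=1 divides by n - 1 (sample SD).

```python
import numpy as np

# Hypothetical data
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation (default ddof=0, divides by n)
population_sd = np.std(data)

# Sample standard deviation (ddof=1, divides by n - 1)
sample_sd = np.std(data, ddof=1)

print(population_sd, sample_sd)
```

Use ddof=1 when the data are a sample and you want to estimate the spread of the wider population.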
This short tutorial shows how you can calculate the mean in Python using NumPy. First, we generate random data with a mean of 5 and a standard deviation (SD) of 1. Then, you can use NumPy's mean() function. As you can see, the mean of the sample is close to 5. 4.943504497663466 Regarding of how … Read more
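The two steps described above can be sketched as follows. The sample size and the seeded generator (np.random.default_rng) are assumptions, so the printed mean will not match the value quoted in the excerpt, but it should land close to 5.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed is an arbitrary choice)
rng = np.random.default_rng(seed=0)

# Step 1: random data with mean 5 and standard deviation 1
sample = rng.normal(loc=5, scale=1, size=1000)

# Step 2: compute the sample mean
sample_mean = np.mean(sample)
print(sample_mean)
```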
This tutorial shows how to generate a sample from a normal distribution using NumPy in Python. The following shows the syntax of two methods. Method 1: It can change the default values (Default: mu=0 and sd=1). np.random.normal(loc=0, scale=1, size=None), where loc is the mean (mu) and scale is the standard deviation (sd). Method 2: It can only generate numbers of standard normal (mu=0 and sd=1). But, it can have different … Read more
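A short sketch of both ideas. Method 1 uses np.random.normal() as named in the excerpt. The excerpt is cut off before naming the function behind Method 2, so np.random.standard_normal() is used here purely as an assumption: it is one NumPy function that only draws from the standard normal and accepts a shape.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

# Method 1: custom mean and standard deviation
# (loc is the mean, scale is the standard deviation)
custom = np.random.normal(loc=10, scale=2, size=5)

# Standard normal only (mean 0, SD 1); the choice of this function
# is an assumption, since the excerpt does not name it
standard = np.random.standard_normal(size=(2, 3))

print(custom)
print(standard)
```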