Pandas: How to Select Rows Based on Column Values
This tutorial covers methods for selecting rows based on a specific column value, or on several column values, using loc[] or query() in Python Pandas.
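As a minimal sketch of the two approaches named above, the following selects rows with loc[] and with query(). The dataframe, its column names (brand, price), and its values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "brand": ["A", "B", "C", "A"],
    "price": [10, 25, 40, 15],
})

# One specific column value: boolean mask inside loc[]
rows_a = df.loc[df["brand"] == "A"]

# A few column values: isin() builds the mask
rows_ab = df.loc[df["brand"].isin(["A", "B"])]

# The same two selections written with query()
rows_a_q = df.query("brand == 'A'")
rows_ab_q = df.query("brand in ['A', 'B']")

print(rows_a)
print(rows_ab_q)
```

Both styles should return the same rows; query() tends to read more compactly when several conditions are combined.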
This tutorial shows how to apply correlation in the real world. I will use the Peloton and Covid example to illustrate this concept. In early 2021, many consumers wanted to buy Peloton bikes, but Peloton had difficulty meeting the demand. One year later, in Jan. 2022, Forbes instead reported that Peloton faced challenges … Read more
This short tutorial shows how to use the melt() function in Pandas. It is often used when we need to change the format of a dataframe to fit a certain statistical function. Example 1 of Using melt() City1 City2 City3 0 6 2 4 1 2 1 1 2 3 3 2 3 4 … Read more
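A rough sketch of what melt() does, assuming only the first three complete rows visible in the excerpt above (the fourth row is cut off and is left out). The names var_name="city" and value_name="value" are illustrative choices, not taken from the original post.

```python
import pandas as pd

# First three complete rows from the excerpt; the truncated row is omitted
wide = pd.DataFrame({
    "City1": [6, 2, 3],
    "City2": [2, 1, 3],
    "City3": [4, 1, 2],
})

# Wide to long: one row per (city, value) pair
long = wide.melt(var_name="city", value_name="value")

print(long)
```

The long format has one column naming the group and one column holding the measurement, which is the shape many statistical functions expect.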
One-way ANOVA compares the means of different groups to see whether the mean difference is statistically significant. For instance, you would like to compare the average household size of three cities. You can collect 3 samples from these three cities and conduct a one-way ANOVA to check the difference. Formulas of One-way ANOVA … Read more
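To make the three-city comparison concrete, here is a small sketch that computes the one-way ANOVA F statistic from the standard between-group and within-group sums of squares. The household sizes are made up for illustration, and the helper name one_way_anova_f is hypothetical.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a list of samples (one list per group)."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical household sizes sampled from three cities
city_a = [2, 3, 4]
city_b = [3, 4, 5]
city_c = [5, 6, 7]

f_stat = one_way_anova_f([city_a, city_b, city_c])
print(f_stat)
```

A larger F value means the group means differ more relative to the spread within each group; judging significance still requires comparing F against the F distribution with (k - 1, N - k) degrees of freedom.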
This page includes statistics formulas in raw LaTeX code. It is sometimes painful to write a complex formula, and thus I hope this page is useful for those who need to write them. In case you need to find symbols in LaTeX, this linked pdf could be useful. LaTeX Code for Correlation Formula The following are … Read more
What is Correlation? Correlation is a statistical measure of the relationship between two variables, X and Y. For instance, you can measure to what extent temperature (X) is related to the production of ice cream (Y). You probably would expect that higher temperatures correspond with higher production of ice cream. On the plot shown below, … Read more
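As an illustration of the temperature and ice cream example, the sketch below computes the Pearson correlation coefficient by hand. The numbers are invented for demonstration, and pearson_r is a hypothetical helper, not code from the original post.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (denominator)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical data: temperature (X) and ice cream production (Y)
temperature = [20, 25, 30, 35]
ice_cream = [100, 140, 150, 210]

r = pearson_r(temperature, ice_cream)
print(round(r, 3))
```

A value close to 1 indicates that higher temperatures go together with higher production in this made-up sample.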
How to Write Null and Alternative Hypotheses
Since both correlation and t-test are about relationships between X and Y, what is the difference between them and when do you use t-test (or correlation)? This tutorial aims to answer these two questions. The following figure presents the difference between t-test and correlation. In particular, t-test deals with situations where X is a binary … Read more
This tutorial explains what a t-test is, and the difference between the independent sample t-test and the paired sample t-test. It also explains what two-sample and one-sample t-tests are. What is independent sample t-test? An independent sample t-test examines whether the means from 2 separate groups of people or objects are statistically significantly different. That is, we calculate two … Read more
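The difference between the two tests can be sketched by computing both t statistics directly. This is an illustration under assumptions: the data are invented, the helper names are hypothetical, and the independent version uses the unpooled (Welch) standard error, which may differ from the formula in the full tutorial.

```python
import math
import statistics


def independent_t(sample_1, sample_2):
    """t statistic for two separate groups (Welch, unpooled variances)."""
    se = math.sqrt(statistics.variance(sample_1) / len(sample_1)
                   + statistics.variance(sample_2) / len(sample_2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / se


def paired_t(before, after):
    """t statistic for paired measurements on the same subjects."""
    diffs = [a - b for a, b in zip(after, before)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical data: two separate groups
group_a = [5, 6, 7]
group_b = [8, 9, 10]
t_ind = independent_t(group_a, group_b)

# Hypothetical data: the same three subjects measured twice
before = [5, 6, 7]
after = [7, 7, 9]
t_paired = paired_t(before, after)

print(t_ind, t_paired)
```

The independent version compares two group means, while the paired version works on the per-subject differences, so it needs the two lists to be aligned by subject.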
This tutorial explains the difference between scatter plots and line charts in data visualization. I will use actual data and Python code to illustrate the nuanced difference between them. Data is pulled from GitHub. It includes keywords of Peloton and Covid as the search queries in Google Trends from early 2020 to early 2022. For … Read more