How to Create Dummy Variable in Python

This tutorial shows two methods of creating dummy variables in Python. The key syntax is as follows.
Method 1: Use numpy.where() to create a dummy variable: np.where(df['column_of_interest'] == 'value', 1, 0)
Method 2: Use apply() with a lambda function to create a dummy variable: df['column_of_interest'].apply(lambda x: 1 if x == 'value' else 0)
Example 1: Use numpy.where() to create … Read more
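As a minimal sketch of the two methods, the snippet below builds a small DataFrame and derives the same dummy column both ways. The column name `city` and its values are hypothetical and are not taken from the original tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column name and values are for illustration only.
df = pd.DataFrame({"city": ["City1", "City2", "City1", "City3"]})

# Method 1: numpy.where() returns 1 where the condition holds, else 0.
df["is_city1_where"] = np.where(df["city"] == "City1", 1, 0)

# Method 2: apply() evaluates the lambda once per element.
df["is_city1_apply"] = df["city"].apply(lambda x: 1 if x == "City1" else 0)

print(df)
```

Both columns hold the same values. On large DataFrames, numpy.where() is usually the faster choice because it is vectorized.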

Linear Regression: Python Numpy Implementation from Scratch

This tutorial shows how to implement linear regression from scratch in Python with NumPy.
1. Math and Matrix of Linear Regression
We can use pure matrix calculation to estimate the regression coefficients in a linear regression model. Below is the process. Thus, we can simplify the function above to the function below. We can … Read more
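The excerpt above is truncated and its equations are not shown, so the following is only a sketch under an assumption: that the matrix calculation refers to the ordinary least squares normal equation, (XᵀX)β = Xᵀy. The data are simulated without noise so that the known coefficients are recovered.

```python
import numpy as np

# Simulated data (assumption: true intercept 2.0 and slope 3.0, no noise).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Solve (X'X) beta = X'y; solve() is numerically safer than inverting X'X.
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # estimated [intercept, slope]
```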

How to Use numpy.random.seed()

numpy.random.seed() sets a seed, which acts as the starting point for the random number generator algorithm. For the same seed, we will always get the same set of random numbers on any machine. If you prefer to have different sets of random numbers every time you run the code, do not set the seed. In contrast, if you … Read more
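A short sketch of this behavior: resetting the same seed reproduces the same draws, while a different seed gives a different sequence. The seed values 123 and 456 are arbitrary choices for illustration.

```python
import numpy as np

# Same seed, same sequence.
np.random.seed(123)
first = np.random.rand(3)

np.random.seed(123)
second = np.random.rand(3)

# A different seed starts the generator from a different point.
np.random.seed(456)
third = np.random.rand(3)

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```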

Use seaborn to Plot Histogram in Python (3 Examples)

Introduction
You can use histplot() from the seaborn module to plot a histogram. The following is the basic syntax for the 3 examples.
Example 1: Core syntax: sns.histplot(data=dataset, x='column_name')
Example 2: Group the histogram by a column: sns.histplot(data=dataset, x='column_name', hue='column_groupby')
Example 3: Add a kernel density estimate: sns.histplot(data=dataset, x='column_name', kde=True) … Read more

Built-in Sample Datasets in Python

Several Python packages provide built-in sample datasets that you can use for practice, so you do not need to prepare external datasets yourself. The following provides a list of built-in sample datasets in Python.
1. penguins in seaborn
The penguins dataset was collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. … Read more
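As a hedged sketch, the penguins dataset can be loaded with seaborn's load_dataset(). Note that seaborn downloads the file on first use and caches it locally, so the first call needs network access. The helper name `load_penguins` is hypothetical.

```python
import seaborn as sns


def load_penguins():
    """Return the penguins DataFrame, or None if it cannot be fetched."""
    try:
        return sns.load_dataset("penguins")
    except Exception as exc:  # broad on purpose: e.g. no network on first use
        print(f"Could not load penguins: {exc}")
        return None


penguins = load_penguins()
if penguins is not None:
    print(penguins.head())
```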

Calculate Means Group by Two Columns in Pandas (3 Examples)

The following provides 3 different methods of calculating means grouped by two columns in pandas.
Method 1: df.groupby(["column_1", "column_2"]).mean()
Method 2: df.groupby(["column_1", "column_2"]).agg('mean')
Method 3: pd.crosstab(index=df['column_1'], columns=df['column_2'], values=df['dv'], aggfunc='mean')
Prepare the data
Output:
    city   store  sales
0  City1  store1     10
1  City1  store2     20
2  City1  store1     20
3  City1  store2     50
4  City1  store1     30
5  City2  store2     10
… Read more
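The three methods can be compared with a sketch like the following. It uses only the six rows visible in the truncated output above, so the results will differ from the full tutorial. Here `sales` plays the role of the `dv` column in the syntax.

```python
import pandas as pd

# Only the rows visible in the truncated excerpt; the full data has more.
df = pd.DataFrame({
    "city": ["City1", "City1", "City1", "City1", "City1", "City2"],
    "store": ["store1", "store2", "store1", "store2", "store1", "store2"],
    "sales": [10, 20, 20, 50, 30, 10],
})

# Method 1: groupby() followed by mean().
means_1 = df.groupby(["city", "store"]).mean()

# Method 2: groupby() followed by agg('mean'); same result as Method 1.
means_2 = df.groupby(["city", "store"]).agg("mean")

# Method 3: crosstab() returns a wide table (cities as rows, stores as columns).
means_3 = pd.crosstab(index=df["city"], columns=df["store"],
                      values=df["sales"], aggfunc="mean")

print(means_1)
print(means_3)
```

Methods 1 and 2 return a long table indexed by (city, store). Method 3 returns one cell per combination and fills combinations with no rows with NaN.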

Plot Two-Way ANOVA in Python (with Example)

This tutorial shows how to plot a two-way ANOVA interaction in Python. In particular, you can use the interaction_plot() function from statsmodels.graphics.factorplots to plot the interaction.
Step 1: Prepare the data
Suppose that there are two categorical variables, namely city (city 1 and city 2) and store (store 1 and store 2). The dependent variable … Read more
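A minimal sketch of such a plot follows. The excerpt is truncated before the dependent variable is named, so the `sales` column and all values below are assumptions for illustration. The factors are coded as 1 and 2 to mirror "city 1 / city 2" and "store 1 / store 2".

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show the window
import pandas as pd
from statsmodels.graphics.factorplots import interaction_plot

# Hypothetical data: two cities x two stores, two observations per cell.
df = pd.DataFrame({
    "city": [1, 1, 1, 1, 2, 2, 2, 2],
    "store": [1, 1, 2, 2, 1, 1, 2, 2],
    "sales": [10, 20, 20, 50, 30, 10, 20, 20],
})

# x is drawn on the horizontal axis, trace defines one line per level,
# and response is averaged within each (x, trace) cell by default.
fig = interaction_plot(x=df["city"], trace=df["store"], response=df["sales"])
fig.savefig("interaction.png")
```

Lines that are far from parallel suggest an interaction between the two factors.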

Python: Type I, Type II, and Type III ANOVA

1. Introduction
Type I, Type II, and Type III ANOVA are 3 different ways of calculating sum of squares in ANOVA.
Type I ANOVA:
SS(A) for factor A
SS(B | A) for factor B
SS(AB | A, B) for interaction AB
Type II ANOVA:
SS(A | B) for factor A
SS(B | A) for factor … Read more

Outer Merge in Pandas

Introduction
An outer merge returns all records from both the left and right dataframes. When a row in one dataframe has no match in the other, the merged dataframe will have NaN in the unmatched cells. We can use how='outer' in merge() to outer merge two dataframes in Pandas. The basic syntax is as follows, in which df_1 and df_2 … Read more
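A small sketch of an outer merge is given below. The excerpt is truncated before the full syntax, so merging on the index (left_index=True, right_index=True) is an assumption, and the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical dataframes that share only the index label "b".
df_1 = pd.DataFrame({"Brand": ["Brand1", "Brand2"]}, index=["a", "b"])
df_2 = pd.DataFrame({"Location": ["Loc2", "Loc3"]}, index=["b", "c"])

# how='outer' keeps every index label found in either dataframe.
outer = df_1.merge(df_2, how="outer", left_index=True, right_index=True)

print(outer)
```

Row "a" has no match in df_2, so its Location is NaN. Row "c" has no match in df_1, so its Brand is NaN.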

Left Merge in Pandas Python

We can use how='left' in merge() to left merge two dataframes. The following is the Pandas syntax, in which df_1 and df_2 are the two dataframes to be merged: df_1.merge(df_2, how='left', left_index=True, right_index=True)
Step 1: Prepare the data to be left merged
The following are the two dataframes to be left merged. df_1: Brand Location a … Read more
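The syntax above can be tried with a sketch like this one. The column names Brand and Location come from the truncated df_1 header. The values, the index labels beyond "a", and the `Year` column of df_2 are hypothetical.

```python
import pandas as pd

# Hypothetical values; only the Brand/Location headers come from the excerpt.
df_1 = pd.DataFrame(
    {"Brand": ["Brand1", "Brand2", "Brand3"],
     "Location": ["Loc1", "Loc2", "Loc3"]},
    index=["a", "b", "c"],
)
df_2 = pd.DataFrame({"Year": [2001, 2004]}, index=["a", "d"])

# how='left' keeps every row of df_1 and adds matching columns from df_2.
left = df_1.merge(df_2, how="left", left_index=True, right_index=True)

print(left)
```

All three rows of df_1 are kept. Label "d" exists only in df_2, so it is dropped, and rows "b" and "c" get NaN for Year.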