Category: Pandas
Python Code to Plot F-distribution Density Function
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def plot_f_distribution(df1, df2, alpha=0.05):
    # Create x values for the plot
    x = np.linspace(0, 5, 1000)
    # Calculate F-distribution values
    y = stats.f.pdf(x, df1, df2)
    # Calculate critical F-value
    f_crit =...
Read Full Article →
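The excerpt above is cut off before the critical value is computed. As a minimal sketch, assuming the critical value is the upper-tail quantile at level alpha, it could be obtained with `stats.f.ppf`. The helper name `f_density_and_critical` and its `x_max` and `n` parameters are hypothetical and not taken from the article, and plotting is left out so the block runs without a display.

```python
import numpy as np
from scipy import stats


def f_density_and_critical(df1, df2, alpha=0.05, x_max=5.0, n=1000):
    """Return x values, F-distribution densities, and an upper-tail critical value.

    Hypothetical helper for illustration; not the article's own function.
    """
    # Evenly spaced x values over [0, x_max]
    x = np.linspace(0, x_max, n)
    # Density of the F-distribution with (df1, df2) degrees of freedom
    y = stats.f.pdf(x, df1, df2)
    # Assumption: the critical value is the (1 - alpha) quantile
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return x, y, f_crit


x, y, f_crit = f_density_and_critical(5, 20)
print(f"Critical F-value: {f_crit:.3f}")
```

The arrays `x` and `y` can then be passed to `plt.plot(x, y)`, with `f_crit` marked by a vertical line if desired.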
How to Avoid Including Dataframe Index when Saving as CSV file in Python
To avoid including the index column when saving a Pandas dataframe as a CSV file, pass index=False to df.to_csv(). df.to_csv('filename.csv', index=False) Output: brands models 0...
Read Full Article →
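A short sketch of the difference `index=False` makes. The column names follow the excerpt; the values are made up for illustration. Calling `to_csv()` without a path returns the CSV text as a string, which makes the two results easy to compare.

```python
import pandas as pd

# Sample data; the values are hypothetical
df = pd.DataFrame({"brands": ["A", "B"], "models": ["m1", "m2"]})

# Without a path, to_csv() returns the CSV content as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # header starts with an empty field for the index
print(without_index)  # header contains only the column names
```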
How to Fix: Data must be 1-dimensional
You might encounter the following error when trying to convert Numpy arrays to a pandas dataframe. Exception: Data must be 1-dimensional 1. Reproduce the Error Output: Exception: Data must be 1-dimensional 2. Why the Error Happens It happens because pd.DataFrame...
Read Full Article →
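One situation in which this kind of error can arise, offered here as an assumption rather than as the article's own example, is passing 2-dimensional column arrays (shape `(n, 1)`) as dictionary values. Flattening them with `ravel()` first gives 1-dimensional data. The exact exception type and message vary across pandas versions, so the sketch catches a generic `Exception`.

```python
import numpy as np
import pandas as pd

# 2-D column vectors of shape (3, 1); values are hypothetical
X = np.array([[1], [2], [3]])
Y = np.array([[4], [5], [6]])

try:
    # Each dictionary value is expected to be 1-dimensional
    pd.DataFrame({"X": X, "Y": Y})
except Exception as err:
    # The message wording depends on the pandas version
    print("Error:", err)

# Flatten each array to 1-D before building the dataframe
df = pd.DataFrame({"X": X.ravel(), "Y": Y.ravel()})
print(df)
```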
How to Fix: if using all scalar values, you must pass an index
This tutorial shows how to fix the following Pandas error: If using all scalar values, you must pass an index. You encounter this error because you are trying to create a dataframe with all scalar values, but without adding...
Read Full Article →
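A minimal sketch of the error and two common ways around it: passing an explicit index, or wrapping each scalar in a list. The column names `a` and `b` are placeholders.

```python
import pandas as pd

try:
    # All values are scalars and no index is given
    pd.DataFrame({"a": 1, "b": 2})
except ValueError as err:
    print("Error:", err)

# Option 1: pass an index so pandas knows how many rows to create
df_indexed = pd.DataFrame({"a": 1, "b": 2}, index=[0])

# Option 2: wrap each scalar in a list
df_listed = pd.DataFrame({"a": [1], "b": [2]})

print(df_indexed)
print(df_listed)
```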
How to Combine Multiple Numpy Arrays into a Dataframe
This tutorial will show how you can combine multiple arrays (e.g., 2 arrays of X and Y) into a Pandas dataframe. The following summarizes the two methods. Method 1: pd.DataFrame({'X': X, 'Y': Y}) Method 2: combined_array = np.column_stack((X, Y)); pd.DataFrame(combined_array, columns=['X', 'Y']) Two Examples of...
Read Full Article →
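The two methods can be sketched side by side as follows; the array values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two 1-D arrays of equal length; values are hypothetical
X = np.array([1, 2, 3])
Y = np.array([4, 5, 6])

# Method 1: build the dataframe from a dictionary of arrays
df_1 = pd.DataFrame({"X": X, "Y": Y})

# Method 2: stack the arrays as columns, then name the columns
combined_array = np.column_stack((X, Y))
df_2 = pd.DataFrame(combined_array, columns=["X", "Y"])

print(df_1)
print(df_2)
```

Note that `np.column_stack` produces a single array with one dtype, so Method 1 is usually the safer choice when the arrays hold different types.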
How to Create Dummy Variable in Python
This tutorial shows two methods of creating dummy variables in Python. The following shows the key syntax. Method 1: Use np.where() to create a dummy variable np.where(df['column_of_interest'] == 'value', 1, 0) Method 2: Use apply() and a lambda function to create a...
Read Full Article →
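A small sketch of both approaches on a made-up `brand` column. The data and the new column names are hypothetical, and the `apply()` version is one plausible reading of Method 2, since the excerpt is truncated.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"brand": ["Tesla", "Ford", "Tesla"]})

# Method 1: np.where() returns 1 where the condition holds, else 0
df["is_tesla_where"] = np.where(df["brand"] == "Tesla", 1, 0)

# Method 2: apply() with a lambda evaluates the condition row by row
df["is_tesla_apply"] = df["brand"].apply(lambda v: 1 if v == "Tesla" else 0)

print(df)
```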
Calculate Means Group by Two Columns in Pandas (3 Examples)
The following provides 3 different methods of calculating means grouped by two columns in Python. Method 1: df.groupby(["column_1", "column_2"]).mean() Method 2: df.groupby(["column_1", "column_2"]).agg('mean') Method 3: pd.crosstab(index=df['column_1'], columns=df['column_2'], values=df['dv'], aggfunc='mean') Prepare the data Output: city store sales 0 City1 store1 10 1 City1 store2 20 2...
Read Full Article →
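The three methods can be sketched on a small dataframe. The column names follow the excerpt, but the rows are illustrative assumptions because the original data is truncated.

```python
import pandas as pd

# Hypothetical data using the excerpt's column names
df = pd.DataFrame({
    "city": ["City1", "City1", "City2", "City2", "City1"],
    "store": ["store1", "store2", "store1", "store2", "store1"],
    "sales": [10, 20, 30, 40, 30],
})

# Method 1: groupby() followed by mean()
means_groupby = df.groupby(["city", "store"])["sales"].mean()

# Method 2: groupby() followed by agg('mean')
means_agg = df.groupby(["city", "store"]).agg("mean")

# Method 3: crosstab() with values and aggfunc, giving a city-by-store table
means_crosstab = pd.crosstab(
    index=df["city"], columns=df["store"], values=df["sales"], aggfunc="mean"
)

print(means_groupby)
print(means_agg)
print(means_crosstab)
```

Methods 1 and 2 return results indexed by (city, store) pairs, while Method 3 lays the same means out as a two-way table.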
Outer Join in Pandas
An outer join returns all records from both the left and right dataframes. When rows in one dataframe do not match the other dataframe, the joined dataframe will have NaN in the cells of the unmatched rows. We can use how='outer' in join() to...
Read Full Article →
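A minimal sketch of an outer join on the index, using two made-up dataframes that share only the label `y`.

```python
import pandas as pd

# Hypothetical dataframes with partially overlapping indexes
df_1 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
df_2 = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])

# Keep every index label from both sides; unmatched cells become NaN
joined = df_1.join(df_2, how="outer")
print(joined)
```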
Left Merge in Pandas Python
We can use how='left' in merge() to left merge two dataframes. The following is the Pandas syntax, in which df_1 and df_2 are the two dataframes to be merged. df_1.merge(df_2, how='left', left_index=True, right_index=True) Step 1: Prepare the data to be left...
Read Full Article →
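A short sketch of the left merge syntax above on two made-up dataframes. Only rows from `df_1` are kept, and columns from `df_2` are filled with NaN where no index label matches.

```python
import pandas as pd

# Hypothetical dataframes with partially overlapping indexes
df_1 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
df_2 = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])

# Keep all rows of df_1 and match df_2 on the index
merged = df_1.merge(df_2, how="left", left_index=True, right_index=True)
print(merged)
```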
How to Save Pandas Dataframe as csv file
To save a Pandas dataframe as a CSV file, you can use df.to_csv(). The following shows the steps. df.to_csv("file_name.csv") Step 1: Dataframe example The following Python code generates the sample dataframe. The following is the printout...
Read Full Article →
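As a rough sketch of the full round trip, the following writes a made-up dataframe to a temporary directory and reads it back. The temporary directory is an assumption made to keep the example self-contained; in practice any writable path works.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample dataframe
df = pd.DataFrame({"brands": ["A", "B"], "models": ["m1", "m2"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file_name.csv")
    # By default, to_csv() writes the index as the first column
    df.to_csv(path)
    # index_col=0 restores that first column as the index
    restored = pd.read_csv(path, index_col=0)

print(restored)
```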