What Is the Difference Between the `sep` and `delimiter` Parameters of read_csv() and read_table() in Pandas

This tutorial explains the difference between `sep` and `delimiter` in read_csv() and read_table() in Pandas. In short, there is none: `delimiter` is an alias for `sep` in both functions, so you can use either one. The documentation of both functions contains the following statement: delimiter : str, default None. Alias for sep. Brand … Read more
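As a minimal sketch of the point above, the following reads the same data once with `sep` and once with `delimiter` and compares the results. The semicolon-separated sample text and the variable names are assumptions made for illustration; they are not taken from the tutorial.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for a semicolon-separated file.
raw = "Brand;Location;Number\nBrand 1;CA;200\nBrand 2;NY;300\n"

# Read the same text twice: once with sep, once with its alias delimiter.
df_sep = pd.read_csv(io.StringIO(raw), sep=";")
df_delim = pd.read_csv(io.StringIO(raw), delimiter=";")

# read_table() accepts the same two parameters.
df_table = pd.read_table(io.StringIO(raw), delimiter=";")

# All three dataframes hold identical data.
print(df_sep.equals(df_delim))  # True
print(df_sep.equals(df_table))  # True
```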

How to Read Text (txt) Files in Pandas

This tutorial uses example Python code to show two methods for reading a text (txt) file into Python. The following is the screenshot of the txt file. Method 1: Use the read_csv() function to read txt files. You can also use the read_csv() function to read txt files. The basic syntax structure is as … Read more
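A rough sketch of Method 1 follows. The file name `sample.txt`, its tab-separated contents, and the use of a temporary directory are all assumptions for illustration, since the tutorial's own file is not reproduced here.

```python
import os
import tempfile

import pandas as pd

# Create a small tab-separated txt file in a temporary directory
# (hypothetical contents, used only so the example is self-contained).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as f:
        f.write("Brand\tLocation\tNumber\nBrand 1\tCA\t200\nBrand 2\tNY\t300\n")

    # read_csv() reads txt files too; sep tells it how fields are separated.
    df = pd.read_csv(path, sep="\t")

print(df)
```

If the fields in your file are separated by something other than tabs, pass that separator as `sep` instead.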

How to Use groupby() in Pandas Dataframes

You can call groupby() in Pandas and pass the name of the column that you want to group on. Then, specify the column on which you want to perform the aggregation. The basic syntax is as follows, where function is an aggregation method such as sum(): df.groupby("column_name1")["column_name2"].function() The following generates the sample dataframe. Brand Location Number 0 Brand … Read more
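The syntax above can be sketched as follows. The four sample rows and the choice of sum() as the aggregation are assumptions for illustration, not the tutorial's exact data.

```python
import pandas as pd

# Hypothetical sample dataframe with the same column names as the tutorial.
df = pd.DataFrame({
    "Brand": ["Brand 1", "Brand 1", "Brand 2", "Brand 1"],
    "Location": ["CA", "CA", "CA", "NY"],
    "Number": [200, 20, 300, 400],
})

# Group on Brand, then aggregate the Number column with sum().
totals = df.groupby("Brand")["Number"].sum()

print(totals)
```

Other aggregation methods such as mean() or count() can be used in place of sum().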

How to Create a Contingency Table in Pandas

Introduction to the crosstab() function: You can use the pandas.crosstab() function to create a contingency table. It computes a simple cross tabulation of two (or more) factors. The following is the sample data: Brand Location Number 0 Brand 1 CA 200 1 Brand 1 CA 20 2 Brand 2 CA 300 3 Brand 1 NY 400 4 Brand … Read more
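A minimal sketch of pandas.crosstab() follows. It uses only the first four rows of the sample data shown above (the remaining rows are cut off in the excerpt), so the counts are illustrative rather than the tutorial's results.

```python
import pandas as pd

# First four rows of the sample data; later rows are omitted.
df = pd.DataFrame({
    "Brand": ["Brand 1", "Brand 1", "Brand 2", "Brand 1"],
    "Location": ["CA", "CA", "CA", "NY"],
    "Number": [200, 20, 300, 400],
})

# Count how often each Brand/Location combination occurs.
table = pd.crosstab(df["Brand"], df["Location"])

print(table)
```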

How to Combine a Pandas Dataframe and a NumPy Matrix

You can combine a Pandas dataframe and a NumPy matrix by using the pd.concat() function in Pandas: pd.concat([df, pd.DataFrame(Matrix)], axis=1) The following are the steps to combine a Pandas dataframe and a NumPy matrix. Step 1: Generate a dataframe. The following generates a dataframe and a matrix first. The following is the printout of the dataframe. Dataframe: Brand … Read more
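The pd.concat() call above can be sketched as follows. The dataframe contents and the 2x2 matrix are hypothetical values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe and matrix with the same number of rows.
df = pd.DataFrame({"Brand": ["Brand 1", "Brand 2"], "Number": [200, 300]})
matrix = np.array([[1, 2], [3, 4]])

# Wrap the matrix in a DataFrame, then concatenate column-wise (axis=1).
combined = pd.concat([df, pd.DataFrame(matrix)], axis=1)

print(combined)
```

Because pd.concat() aligns rows on the index, this works as shown only when both objects share the same index, here the default 0, 1, .... The matrix columns get the integer labels 0 and 1 unless you pass `columns=` to pd.DataFrame().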

How to Add Numpy Arrays to a Pandas DataFrame

You can add a NumPy array as a new column to a Pandas dataframe by using the tolist() method. The following shows the syntax as well as examples of how to do it. df['new_column_name'] = array_name.tolist() Step 1: Generate a sample dataframe. The following generates a dataframe and then prints it out. … Read more
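A short sketch of the syntax above follows. The dataframe and the array values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample dataframe and an array of matching length.
df = pd.DataFrame({"Brand": ["Brand 1", "Brand 2", "Brand 3"]})
array_name = np.array([10, 20, 30])

# Convert the array to a plain Python list and assign it as a new column.
df["new_column_name"] = array_name.tolist()

print(df)
```

The array must have as many elements as the dataframe has rows; otherwise the assignment raises a ValueError.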

How to Use Lambda Functions in Python (Pandas)

This short tutorial shows how you can use lambda functions in Python, and especially in Pandas. Introduction: The following is the basic structure of a lambda function: lambda arguments: expression Lambda functions can have any number of arguments but only one expression, and are typically written on one line. The following is an example of adding the number … Read more
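The structure above can be sketched as follows. The function name `add_ten`, the constant 10, and the sample dataframe are hypothetical and chosen only for illustration.

```python
import pandas as pd

# A lambda with one argument and one expression.
add_ten = lambda x: x + 10
print(add_ten(5))  # 15

# A lambda with two arguments.
multiply = lambda a, b: a * b
print(multiply(3, 4))  # 12

# In Pandas, lambdas are often passed to apply() to transform a column.
df = pd.DataFrame({"Number": [200, 20, 300]})
df["Number_plus_10"] = df["Number"].apply(lambda x: x + 10)

print(df)
```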

How to Check Data Types in Pandas

You can use the dtypes attribute to check the data types of all columns in a Pandas dataframe, or the dtype attribute to check a single column. The following is the sample code. Check Data Type for All Columns in Pandas: Brand Location Year DateTime 0 Tesla CA 2019 2019-03-10 1 Tesla CA 2018 … Read more
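A minimal sketch of both checks follows. The second DateTime value is cut off in the excerpt above, so the date used for that row here is a placeholder assumption.

```python
import pandas as pd

# Sample dataframe; the second DateTime value is a placeholder assumption.
df = pd.DataFrame({
    "Brand": ["Tesla", "Tesla"],
    "Location": ["CA", "CA"],
    "Year": [2019, 2018],
    "DateTime": pd.to_datetime(["2019-03-10", "2018-03-10"]),
})

# Data types of all columns (dtypes is an attribute, not a method).
all_dtypes = df.dtypes
print(all_dtypes)

# Data type of a single column.
year_dtype = df["Year"].dtype
print(year_dtype)
```

Note that neither `dtypes` nor `dtype` takes parentheses, because both are attributes rather than functions.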

How to Drop Rows or Columns with Missing Data (NaN) in Pandas

You can drop rows or columns with missing data (e.g., NaN) using dropna() in Pandas. Drop rows with NaN: df.dropna() Drop columns with NaN: df.dropna(axis="columns") Example of dropping rows with NaN: By default, dropna() drops rows that have at least one NaN. The following is an example. The following shows the original dataframe … Read more
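Both calls above can be sketched as follows. The dataframe with columns `A` and `B` is a hypothetical example, not the tutorial's data.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with one missing value in column A.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

# Drop every row that contains at least one NaN (the default behavior).
rows_dropped = df.dropna()

# Drop every column that contains at least one NaN.
cols_dropped = df.dropna(axis="columns")

print(rows_dropped)
print(cols_dropped)
```

dropna() returns a new dataframe and leaves the original unchanged, so assign the result to a variable if you want to keep it.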

How to Get Frequency Counts of a Column in Pandas

To get frequency counts of a column in Pandas, you can use value_counts() or groupby().size(). The following shows the two methods. Method 1: df["column_name"].value_counts() Method 2: df.groupby(["column_name"]).size() Data Example: The following generates a sample dataframe in Python. The following is the sample dataframe to be used later. Brand Location … Read more
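The two methods above can be sketched side by side. The sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample dataframe.
df = pd.DataFrame({
    "Brand": ["Brand 1", "Brand 1", "Brand 2", "Brand 1"],
    "Location": ["CA", "CA", "CA", "NY"],
})

# Method 1: value_counts() sorts the result by count, largest first.
counts_vc = df["Brand"].value_counts()

# Method 2: groupby().size() sorts the result by the group key.
counts_gb = df.groupby(["Brand"]).size()

print(counts_vc)
print(counts_gb)
```

Both methods give the same counts; they differ only in how the result is ordered and labeled.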