• author: The PyCoach

Working with Pandas AI: A Comprehensive Guide

Are you looking for a Python library that can integrate artificial intelligence capabilities with your data frames? If yes, then you have landed at the right place. In this article, we will introduce a newly released data science library called Pandas AI.

This library enables you to make your data frames conversational, meaning you can talk to your data set and get answers quickly using only simple prompts. Moreover, you can even automatically generate plots with prompts. In this tutorial, we will walk you through working with Pandas AI.

Before we start, we would like to inform you that this tutorial is sponsored by Brilliant.

What is Pandas AI?

Pandas AI is a Python library that integrates artificial intelligence capabilities with Pandas data frames. It allows users to ask questions in plain English and get insights from their data frames immediately, without writing the pandas code themselves.
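In broad strokes, Pandas AI asks a large language model to write pandas code for your question and then runs that code on your data frame. The sketch below imitates that loop offline with a hard-coded stand-in for the model. `fake_llm` and `ask` are hypothetical names, not the library's API, and the real library's prompts and safeguards differ.

```python
import pandas as pd


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a language model.

    A real model would read the prompt and write code; this stub always
    returns the same snippet so the example runs without an API key.
    """
    return "result = df.groupby('Gender')['Total'].sum()"


def ask(df: pd.DataFrame, question: str):
    # Describe the data frame (column names and a few rows) plus the question
    prompt = (
        f"Columns: {list(df.columns)}\n"
        f"Sample rows:\n{df.head(3).to_string()}\n"
        f"Question: {question}\n"
        "Write pandas code that stores the answer in a variable named result."
    )
    code = fake_llm(prompt)
    # Run the generated code with `df` in scope, then read back `result`
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)
    return namespace["result"]


# Hypothetical toy rows with the same columns as the tutorial's data set
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Product line": ["Health and beauty", "Sports and travel", "Fashion accessories"],
    "Total": [10.0, 20.0, 5.0],
})
print(ask(toy, "Calculate the total spent by each gender."))
```

This also explains why the answers can be wrong: the result is only as good as the code the model writes.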

Getting Started with Pandas AI

To get started with Pandas AI, you can download the official demo notebook from the GitHub repository.

Before working with Pandas AI, you need to install the library by running the following command:

!pip install pandasai

Import Libraries

After installing Pandas AI, you need to import three libraries:

  1. Pandas: the standard library for data manipulation
  2. Pandas AI: the library we are interested in
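  3. OpenAI: the language-model wrapper bundled with Pandas AI, which uses your OpenAI API key

import pandas as pd
# Early (0.x) Pandas AI interface; newer releases changed it
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI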

Working with a Sample Data Set

Here is a sample data set we will use to see the strengths and weaknesses of Pandas AI:

Sample Data Set

We will be working with three columns: Gender, Product line, and Total. The data set shows how much each gender spends on each product line.

To perform the tasks in this tutorial, you can load the data set by running the following code:

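url = 'https://raw.githubusercontent.com/priyanka-panag/Marketing-Analysis-Using-Python/master/Sales_Data.csv'
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(url, on_bad_lines='skip')
# Keep only the three columns used in this tutorial
df = df[['Gender', 'Product line', 'Total']]
df.head()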
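Before handing the data frame to Pandas AI, it can help to confirm that the three columns survived loading and that `Total` is numeric, since the questions below sum that column. This is a minimal sketch; `check_sales_frame` is a hypothetical helper, not part of Pandas AI.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

REQUIRED = ["Gender", "Product line", "Total"]


def check_sales_frame(df: pd.DataFrame) -> None:
    """Raise ValueError if the columns used in this tutorial are missing or malformed."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if not is_numeric_dtype(df["Total"]):
        raise ValueError("'Total' must be numeric to be summed")
    if df[REQUIRED].isna().any().any():
        raise ValueError("Found missing values in the required columns")
```

Calling `check_sales_frame(df)` right after loading either passes silently or tells you which assumption failed.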

Using an OpenAI API Key with Pandas AI

To use Pandas AI, you need an OpenAI API key. You can obtain one from the OpenAI website by following these steps:

  1. Visit the OpenAI website.
  2. Create an account and log in.
  3. Go to the API keys page.
  4. Click "Create new secret key" and copy the key. OpenAI shows it only once, so store it somewhere safe.

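After getting the API key, store it in the OPENAI_API_KEY environment variable rather than pasting it into the notebook. Then use the following code to create an OpenAI object and pass it to Pandas AI:

import os
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Read the key from the environment instead of hard-coding it
api_key = os.environ.get("OPENAI_API_KEY")
assert api_key, "No API key found in OPENAI_API_KEY"

llm = OpenAI(api_token=api_key)  # OpenAI model wrapper shipped with Pandas AI
pai = PandasAI(llm)              # the object we will ask questions with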
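If you would rather not set an environment variable, a small helper can fall back to prompting for the key without echoing it. This is a sketch; `get_openai_key` is a hypothetical helper name, and `OPENAI_API_KEY` is just the conventional variable name.

```python
import os
from getpass import getpass


def get_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the OpenAI key from `env_var`, or prompt for it without echoing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed key, so it does not end up in the notebook output
        key = getpass("OpenAI API key: ").strip()
    if not key:
        raise RuntimeError("No OpenAI API key provided")
    return key
```

Its return value can be used in place of `api_key` in the setup code above.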

Asking Simple Questions

Now we can start asking Pandas AI questions. For instance, we can ask the following question to list the unique products in the Product line column:

pai.run(df, prompt="Which products are in Product line?")

And we get the answer in no time!

The Product line include:
- Health and beauty
- Electronic accessories
- Home and lifestyle
- Sports and travel
- Food and beverages
- Fashion accessories

Moreover, we can verify whether the answer is correct with the unique() method:

df['Product line'].unique()
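Checking by eye works for six items; for longer answers, a small helper can parse the bulleted reply and compare it with `unique()`. `parse_bullets` and `answer_matches` are hypothetical helpers, and the parsing assumes the reply uses "- " bullets, as in the answer above.

```python
import pandas as pd


def parse_bullets(answer: str) -> set[str]:
    """Collect the text of every line that starts with '- '."""
    return {
        line.strip()[2:].strip()
        for line in answer.splitlines()
        if line.strip().startswith("- ")
    }


def answer_matches(answer: str, column: pd.Series) -> bool:
    """True if the bulleted items are exactly the distinct values in `column`."""
    return parse_bullets(answer) == set(column.dropna().unique())
```

For the reply above, `answer_matches(reply, df['Product line'])` returns True only if the listed items and the column's distinct values are the same set.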

Asking Complex Questions

Pandas AI can handle complex queries as well. For example, we can ask Pandas AI to calculate the total spent by each gender by running the following code:

pai.run(df, prompt="Calculate the total spent by each gender.")

It returns a useful answer:

The total spent by females is $101174, and the total spent by males is $108184.
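Because the answer comes from model-generated code, it is worth cross-checking numbers like these with plain pandas. A `groupby` sum gives the per-gender totals the answer should match. The toy rows below are hypothetical; on the tutorial's data you would call the function on `df` instead.

```python
import pandas as pd


def total_by_gender(df: pd.DataFrame) -> pd.Series:
    """Sum the 'Total' column for each value of 'Gender'."""
    return df.groupby("Gender")["Total"].sum()


# Hypothetical toy rows with the tutorial's column names
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Product line": ["Health and beauty", "Sports and travel",
                     "Fashion accessories", "Food and beverages"],
    "Total": [100.0, 250.0, 50.0, 25.0],
})
print(total_by_gender(toy))  # Female 150.0, Male 275.0
```

These are also the values a correct bar plot of the total spent by each gender should show.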

But the library does not always get plots right. When we asked for a bar plot of the total spent by each gender, the result was wrong: notice that the female value is incorrect.

This is the code we used to request the bar plot:

pai.run(df, prompt="Plot a bar plot that shows the total spent by each gender.")

Incorrect Bar Plot

However, Pandas AI still has great potential. It can produce a correct plot when the data frame you give it has already been processed. Here, we do that processing ourselves by aggregating the data with a pivot table, and plotting the result gives the correct bar plot:

# Sum 'Total' for each product line (rows) and gender (columns)
pt = pd.pivot_table(df, index='Product line', columns='Gender', values='Total', aggfunc='sum')
pt.plot(kind='bar', title='Total spent by gender', xlabel='Product line', ylabel='Total')

Correct Bar Plot
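The pivot table is the preprocessing step here: each cell is the sum of `Total` for one product line and one gender. The sketch below builds the same table with `groupby` on hypothetical toy rows, which can make it easier to see what each bar represents.

```python
import pandas as pd

# Hypothetical toy rows with the tutorial's column names
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Product line": ["Health and beauty", "Health and beauty",
                     "Sports and travel", "Sports and travel", "Health and beauty"],
    "Total": [10.0, 20.0, 30.0, 40.0, 5.0],
})

# Product lines as rows, genders as columns, summed totals in the cells
pt = pd.pivot_table(toy, index="Product line", columns="Gender",
                    values="Total", aggfunc="sum")

# Same table via groupby: sum per (product line, gender), then move gender to columns
grouped = toy.groupby(["Product line", "Gender"])["Total"].sum().unstack("Gender")

pd.testing.assert_frame_equal(pt, grouped, check_names=False)
print(pt)
```

Either form works; the pivot table is just a more compact way to express the two-level aggregation.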

Conclusion

Pandas AI is a powerful library that can save you time while working with data frames. However, it is better suited to answering questions about a data frame than to generating pivot tables or plots. You still need to know how to code to guide Pandas AI and check its answers, because it sometimes makes mistakes.

To get started in data science, we recommend using Brilliant. Brilliant's interactive courses help you learn the fundamentals interactively and develop your analytical thinking. Visit brilliant.org and start learning data science interactively today.

In conclusion, Pandas AI is a great addition to your data science toolkit. If you have used it already, let us know your experience in the comments below.
