- author: Python and Pandas with Reuven Lerner
Selecting Columns and Rows from a Pandas DataFrame in Python
In this article, we will explore how to select particular columns and rows from a Pandas DataFrame in Python using various techniques. We will load a dataset containing information about 10,000 taxi rides in New York City, and demonstrate the different methods to extract the desired data.
Loading the Dataset
First, let's load the necessary libraries and import the dataset into our Python environment:
import pandas as pd
df = pd.read_csv("taxi_rides.csv")  # Load the dataset into a DataFrame
The dataset consists of several columns, such as "vendor ID," "pickup datetime," "drop-off datetime," "passenger count," "trip distance," and more. To get an overview of the dataset, we can use the head() method:
df.head()  # Display the first five rows of the DataFrame
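For readers without the taxi file at hand, the loading step can be tried on a tiny stand-in. This is only a sketch: the CSV text, its column names, and its values are invented for illustration and are not the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for taxi_rides.csv; the rows are made up.
csv_text = """vendor ID,passenger count,trip distance,total amount
1,1,1.5,9.3
2,2,3.2,17.8
1,1,0.8,6.3
2,4,5.1,24.0
1,1,2.0,11.5
2,3,7.4,31.2
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header line
print(df.head())         # first five rows by default
print(df.head(2))        # or ask for a specific number of rows
```

Checking the shape and the column names right after loading is a quick way to confirm that the header line was parsed as expected.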
Selecting Columns
To retrieve specific columns from the DataFrame, we can use square brackets [].
For instance, if we want to select only the "passenger count" column, we can write:
df["passenger count"]
To select multiple columns, we can pass a list of column names inside the square brackets:
df[["passenger count", "total amount", "trip distance"]]
This will return a new DataFrame with only the selected columns.
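The difference between a single name and a list of names inside the brackets is easy to miss, so here is a minimal sketch of it. The small DataFrame is an invented stand-in for the taxi data.

```python
import pandas as pd

# Tiny stand-in for the taxi data; the values are invented.
df = pd.DataFrame({
    "passenger count": [1, 2, 1],
    "trip distance": [1.5, 3.2, 0.8],
    "total amount": [9.3, 17.8, 6.3],
})

one_column = df["passenger count"]          # a single name gives a Series
one_column_frame = df[["passenger count"]]  # a one-element list gives a DataFrame
several = df[["passenger count", "total amount"]]

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(list(several.columns))
```

Note that asking for a column name that does not exist raises a KeyError with square brackets.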
Alternatively, we can pass the list of column names to the filter() method. For example:
df.filter(["passenger count", "total amount", "trip distance"])
This method is especially useful when we want to select columns by a pattern in their names rather than by listing them. The like parameter keeps every column whose name contains a given substring (for true regular expressions, filter() has a separate regex parameter):
df.filter(like="amount")
This will return all columns that contain the word "amount" in their names.
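The three ways of telling filter() which columns to keep can be compared side by side. The sketch below uses an invented four-column DataFrame; the "tip amount" column is an assumption added only so that the patterns have more than one match.

```python
import pandas as pd

# Invented stand-in data; "tip amount" is a hypothetical column.
df = pd.DataFrame({
    "passenger count": [1, 2],
    "trip distance": [1.5, 3.2],
    "tip amount": [1.0, 2.5],
    "total amount": [9.3, 17.8],
})

# items: exact names, returned in the order given.
by_items = df.filter(items=["total amount", "passenger count"])

# like: plain substring match on the column name.
by_like = df.filter(like="amount")

# regex: regular expression searched for in the column name.
by_regex = df.filter(regex=r"^t.*amount$")

# Unlike square brackets, filter() silently skips names that do not exist.
with_missing = df.filter(items=["passenger count", "no such column"])

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
print(list(with_missing.columns))
```

The items, like, and regex parameters are mutually exclusive, so each call uses exactly one of them.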
Selecting Rows
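To select particular rows from the DataFrame, we can use the filter() method with the axis parameter set to "rows". By default, the axis is set to "columns".
Note that filter() always matches against labels, never against the values stored in the DataFrame. With axis set to "rows", it looks at the index labels, so the examples in this section assume that the pickup datetime has been made the index (for example, with set_index()). With the default integer index, these patterns match nothing.
For example, let's say we want to select all the rows with a specific pickup date, such as "2015-06-02". With the pickup datetime as the index, we can do so by executing the following code:
df.filter(like="2015-06-02", axis="rows")
This will return a new DataFrame with only the rows whose index label contains "2015-06-02".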
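Because filter() looks at index labels rather than cell values, the row example only finds something once the dates are in the index. The following sketch shows both situations on an invented three-row DataFrame; the column name "pickup datetime" and the timestamps are assumptions.

```python
import pandas as pd

# Invented stand-in data with the pickup time stored as text.
df = pd.DataFrame({
    "pickup datetime": [
        "2015-06-02 11:19:29",
        "2015-06-02 11:47:52",
        "2015-06-03 09:05:10",
    ],
    "passenger count": [1, 2, 1],
})

# With the default index the labels are 0, 1, 2, so nothing matches.
no_match = df.filter(like="2015-06-02", axis="rows")

# Move the pickup time into the index, then filter on the labels.
indexed = df.set_index("pickup datetime")
june_2 = indexed.filter(like="2015-06-02", axis="rows")

print(len(no_match))
print(len(june_2))
print(list(june_2.index))
```

Setting the index first is what makes the row filter meaningful here.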
We can also use regular expressions to match patterns in the row labels. For instance, the following pattern keeps every row whose label contains two digits, a colon, and two more digits, that is, any label that includes a time. To keep only a specific time, such as "11:00", use that literal text as the pattern instead:
df.filter(regex=r"\d{2}:\d{2}", axis="rows")
This will retrieve all rows whose index label matches the given regular expression pattern.
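How narrow or broad a regular expression is makes a large difference to what comes back. The sketch below contrasts a broad and a narrow pattern on row labels, and then shows a boolean mask, which is a different technique from filter(), for the case where the text to match lives in a column rather than in the index. The data and the column name "pickup datetime" are invented for illustration.

```python
import pandas as pd

# Invented stand-in data with the pickup time stored as text.
df = pd.DataFrame({
    "pickup datetime": [
        "2015-06-02 11:19:29",
        "2015-06-02 11:47:52",
        "2015-06-03 09:05:10",
    ],
    "passenger count": [1, 2, 1],
})
indexed = df.set_index("pickup datetime")

# Broad pattern: any label containing something that looks like hh:mm.
any_time = indexed.filter(regex=r"\d{2}:\d{2}", axis="rows")

# Narrow pattern: only labels whose time part starts with hour 11.
hour_11 = indexed.filter(regex=r" 11:\d{2}:\d{2}$", axis="rows")

# Not filter(): a boolean mask tests the values of a column instead.
mask = df["pickup datetime"].str.contains(r" 11:\d{2}:\d{2}$", regex=True)
hour_11_by_value = df[mask]

print(len(any_time))
print(len(hour_11))
print(len(hour_11_by_value))
```

The mask version works with the default integer index, since it never consults the row labels.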
Conclusion
In this article, we explored various techniques to select specific columns and rows from a Pandas DataFrame in Python. We learned how to use square brackets to retrieve columns, as well as the filter() method for selecting columns by name, substring, or regular expression. Additionally, we discovered how to use the filter() method with the axis parameter set to "rows" to select specific rows by matching substrings and regular expressions against their index labels.
Understanding these methods will help you efficiently extract the desired data from your DataFrame and perform further analysis. If you have any questions or need further assistance, feel free to leave a comment below. Happy coding with Python and Pandas!