How to Read SQL Queries in a Jupyter Notebook 📚
In a previous article, we explored the process of Importing CSV Data into PostgreSQL Using Python. Building upon that foundation, let’s now dive into the exciting world of SQL queries in a Jupyter Notebook.
In the field of data, SQL (Structured Query Language) is a reliable tool. What if I told you that you can use it effectively within your Jupyter Notebook (.ipynb)?
Jupyter Notebook can also significantly simplify documentation. In the world of data-driven decision-making, well-documented code is not just important but a valuable asset. Whether you work with data independently or as part of a team, documenting your SQL queries in a Jupyter Notebook can greatly benefit you.
Prerequisites
Before we dive into the tutorial, ensure you have the following prerequisites in place:
- Python and Jupyter Notebook: You should have Python installed on your system, along with Jupyter Notebook, which is an interactive development environment ideal for data analysis.
- Pandas: Pandas is a Python library that provides high-performance, easy-to-use data structures for data manipulation and analysis.
- SQLAlchemy: SQLAlchemy is a popular SQL toolkit and Object-Relational Mapping (ORM) library for Python. To talk to PostgreSQL, it also needs a database driver such as psycopg2.
- A SQL Database: You’ll need access to a SQL database. In this tutorial, we’ll assume you have a PostgreSQL database running locally, but you can adapt the code to other database systems as well.
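If any of these packages are missing, you can install them from a notebook cell. Here's a minimal sketch using Jupyter's %pip magic (psycopg2-binary is the precompiled PostgreSQL driver that SQLAlchemy will use under the hood):
# Install the required packages (run once, then restart the kernel if needed)
%pip install pandas sqlalchemy psycopg2-binary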
Writing SQL Queries in Jupyter Notebook
Let’s start by writing SQL queries in a Jupyter Notebook. Here’s a Python script that demonstrates how to connect to a PostgreSQL database and execute a SQL query:
Step 1: Importing Libraries
Import the necessary libraries, including Pandas for data manipulation and SQLAlchemy for database interaction.
# Pandas for data manipulation, SQLAlchemy for the database connection
import pandas as pd
from sqlalchemy import create_engine
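As a quick sanity check, you can print the versions you have installed; mismatched Pandas and SQLAlchemy versions are a common source of confusing errors:
# Confirm the installed versions of both libraries
import sqlalchemy
print('pandas:', pd.__version__)
print('SQLAlchemy:', sqlalchemy.__version__)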
Step 2: Defining Database Connection Parameters and Creating a SQLAlchemy Engine
Define the parameters required to establish a connection to the PostgreSQL database, replacing these values with your own connection details. Then use them to create a SQLAlchemy engine, which acts as a bridge between your Python code and the database.
# Define the database connection parameters
db_params = {
    'host': 'localhost',
    'database': 'soccer',
    'user': 'postgres',
    'password': 'admin',
    'port': '5432'  # PostgreSQL default port
}
# Create a SQLAlchemy engine from the connection parameters
engine = create_engine(
    f'postgresql://{db_params["user"]}:{db_params["password"]}'
    f'@{db_params["host"]}:{db_params["port"]}/{db_params["database"]}'
)
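Before running real queries, you can optionally verify that the engine can reach the database. A minimal sketch, assuming SQLAlchemy 2.x, where raw SQL strings must be wrapped in text():
from sqlalchemy import text
# Open a short-lived connection and run a trivial query to confirm connectivity
try:
    with engine.connect() as conn:
        conn.execute(text('SELECT 1'))
    print('Connection successful!')
except Exception as e:
    print(f'Connection failed: {e}')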
Step 3: Writing the SQL Query
Write the SQL query inside a triple-quoted string. Note that you can add comments within the SQL query using double hyphens (--). The query itself is executed in the next step.
# Write the SQL query (executed in the next step)
query = """
-- I want to see all data
SELECT *
FROM match;
"""
Step 4: Reading and Displaying SQL Query Results in a Pandas DataFrame
Use the Pandas read_sql_query function to execute the SQL query through the SQLAlchemy engine and store the result in a Pandas DataFrame named result. Finally, display the query result in the Jupyter Notebook output.
# Execute the query and load the result into a DataFrame
result = pd.read_sql_query(query, engine)
# Display the result (the last expression in a cell is rendered automatically)
result
Result! The notebook renders the DataFrame as a formatted table directly below the cell.
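From here, result is an ordinary Pandas DataFrame, so all of the usual inspection tools apply. For example:
# Quick sanity checks on the query result
print(result.shape)  # (rows, columns) returned by the query
result.head()        # preview the first five rows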
Conclusion
In this tutorial, we’ve explored how to read SQL queries in a Jupyter Notebook using the Pandas library. This combination of tools lets you seamlessly integrate database queries into your data analysis workflow, making it easier to work with data stored in a SQL database. With the power of SQL and the flexibility of Jupyter Notebook, you can perform complex data analysis tasks efficiently, and, most importantly, keep your code well documented.
Feel free to adapt the provided code to your specific database and SQL query needs. Happy coding!