Supermarket Sales Report

Using Python, MySQL, and Microsoft Power BI

Data Source: https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales/

Introduction

This Supermarket Sales Report portfolio aims to demonstrate the ability to utilize Python, MySQL, and Microsoft Power BI to draw meaningful insight from the given data set.

Importing Data



To analyze and query data contained within files such as .txt, .csv, .xlsx, or other common formats, import the dataset into a MySQL database. Begin by establishing a new schema and creating an appropriately named table. Ensure that each column name and corresponding datatype is aligned with the structure of the provided dataset. To import the data, use the 'Table Data Import Wizard' accessible via a right-click on the main schema. The import process will be successful if the datatypes in the table match with that dataset. If any discrepancies are encountered, check the error log and alter the table before reattempting the import. Once imported, verify the data by executing `SELECT * FROM schema.table;`.

Python Analysis

The right image illustrates a straightforward instance of Python's capability to streamline database queries. In the provided example, a for loop with the numpy library is employed to efficiently verify the accuracy of 'cogs' and 'Total'. 

Furthermore, Python can be utilized to manipulate the MySQL database directly. The example demonstrated uses an if statement to remove the 'gross income' column should it be found identical to the 'Tax 5%' column. Upon code execution and subsequent inspection of the MySQL database, it can be observed that the 'gross income' column has been successfully dropped.



The Python module mysql.connector and the scientific library numpy (referred to as np) should be installed if not already present, using the commands `pip install mysql-connector-python` and `pip install numpy`, respectively. Python serves as a powerful tool for analyzing SQL data while preserving database integrity, reducing the risk of unintended modifications or alterations. Scientific libraries such as numpy and pandas are particularly useful for such analytical tasks, offering extensive functionality for complex operations on datasets.

Data Visualization



Upon completion of data manipulation in Python and the corresponding database adjustments within MySQL, the dataset may be transferred to Microsoft Power BI for advanced analysis and visualization. Power BI facilitates additional functionalities, such as merging diverse tables or models, which is comparatively difficult to do in MySQL or Python. For instance, Power BI streamlines processes like organizing total sales figures from Monday to Sunday by generating an extra column, a task that is cumbersome in MySQL and Python environments.

Drawing an Insight



The charts presented in the Data Visualization section offer several insights, including that products within the Health and Beauty category generate the lowest total sales and that female customers outspend male customers in aggregate. By examining the chart specific to female customers, it can be observed that females purchase fewer Health and Beauty items than males. This discrepancy may suggest a negative correlation between sales of Health and Beauty products and overall revenue generated by female customers. After drawing a hypothesis from the initial findings, further statistical analysis can be conducted using Python or other statistical software.