I had been looking for a project that would let me sharpen my skills as a data analyst and have fun at the same time.
So I started searching Google and YouTube, and found a project called YouTube Channel Analytics. It had everything: data gathering through APIs, ETL and data cleaning (the incoming data was not up to the mark), data visualization, and finally the most important part of a data analyst's job, finding insights from the data (remember my previous article).
Project: Exploratory Data Analysis Using YouTube Video Data of Ankur Warikoo and Akshat Srivastava
Overview and the steps needed for the project:
- Obtain video metadata via the YouTube API for the above-mentioned channels (this includes several small steps: create a developer key, request data, and transform the responses into a usable data format).
- Use Python to gather the data and save it in CSV format.
- Preprocess data and engineer additional features for analysis.
- Load the data into Power BI and do all the ETL steps in Power Query.
- Exploratory data analysis.
- Visualize the data in Power BI itself using custom charts and look for insights.
- Conclusions and insights (the most important part).
- Lessons and skills I learnt from this project (important for me).
Data Gathering: Finding the Metadata for the Mentioned YouTube Channels Using the YouTube API.
“I am assuming that you have a basic knowledge of Python and APIs. If not, please find a good YouTube channel and learn the basics. Everything else can be learned along with the project.”
“If you do not want to do this part of the process, you can search Kaggle for a dataset of top trending YouTube videos and just do the ETL and visualization parts.” Here is the link
After a lot of learning and research, I found this code and tweaked it for the Indian context, using Ankur Warikoo’s and Akshat Srivastava’s channel IDs for the analysis (I refer to their channels frequently).
The output of this code is a pandas DataFrame. I then saved the data to separate CSV files and loaded them into Power Query for ETL.
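To give a rough idea of the "transform the responses into a usable data format" step, here is a minimal sketch. It is not the code I used. It assumes the response is shaped like a YouTube Data API v3 `videos.list` result requested with the snippet and statistics parts, where counts arrive as strings and a count can be missing. The sample values, the column names and the helper names (`flatten_item`, `rows_to_csv`) are made up for illustration, and it uses only the standard library so it runs without an API key.

```python
import csv
import io

# Hypothetical sample shaped like one page of a `videos.list` response.
# All values are invented; a real response has many more fields.
sample_response = {
    "items": [
        {
            "id": "abc123",
            "snippet": {
                "title": "Example video",
                "channelTitle": "Example Channel",
                "publishedAt": "2021-05-01T10:00:00Z",
            },
            # commentCount is left out on purpose to show the missing-value case.
            "statistics": {"viewCount": "1500", "likeCount": "120"},
        }
    ]
}

# Column names are an assumption, chosen for this example only.
FIELDS = ["video_id", "channel", "title", "published_at", "views", "likes", "comments"]


def to_int(value):
    """Convert a string count to int; keep missing counts as None."""
    return int(value) if value is not None else None


def flatten_item(item):
    """Turn one nested response item into a flat row (a plain dict)."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "video_id": item.get("id"),
        "channel": snippet.get("channelTitle"),
        "title": snippet.get("title"),
        "published_at": snippet.get("publishedAt"),
        "views": to_int(stats.get("viewCount")),
        "likes": to_int(stats.get("likeCount")),
        "comments": to_int(stats.get("commentCount")),
    }


def rows_to_csv(rows):
    """Write the rows as CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [flatten_item(item) for item in sample_response["items"]]
csv_text = rows_to_csv(rows)
```

The same list of row dictionaries could be passed to `pandas.DataFrame(rows)` if you prefer to work with a DataFrame before exporting.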
Data Manipulation: Making changes to the data itself.
I got two files, which I merged in Power Query itself. Some of the steps included in the process:
- Getting the data into Power BI and merging both tables.
- Renaming the columns and changing their data types for better analysis.
- Filtering the data to videos published from 1 March 2020 to date.
- Creating measures such as a correlation coefficient, plus a calendar table, etc.
The output table looked like this. The data looks clean now.
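I did these steps in Power Query and Power BI, but for readers who would rather stay in Python, the same ideas can be sketched as below. This is only an illustration under stated assumptions: the rows are made-up numbers, "merging" is taken to mean stacking two tables that share the same columns, the correlation is the Pearson coefficient, and the names `pearson` and `correlation_for` are hypothetical helpers, not part of any library.

```python
from datetime import date
from math import sqrt

# Made-up rows standing in for the two per-channel CSV files.
channel_a = [
    {"channel": "A", "published": date(2020, 1, 15), "views": 1000, "comments": 10},
    {"channel": "A", "published": date(2020, 6, 1), "views": 2000, "comments": 20},
    {"channel": "A", "published": date(2021, 2, 1), "views": 3000, "comments": 30},
]
channel_b = [
    {"channel": "B", "published": date(2020, 4, 10), "views": 500, "comments": 9},
    {"channel": "B", "published": date(2020, 9, 5), "views": 1500, "comments": 4},
    {"channel": "B", "published": date(2021, 3, 20), "views": 2500, "comments": 8},
]

START = date(2020, 3, 1)  # keep videos published on or after 1 March 2020

# Stack the two tables, then apply the date filter.
combined = channel_a + channel_b
recent = [row for row in combined if row["published"] >= START]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_for(rows, channel):
    """Correlation between views and comments for one channel's rows."""
    subset = [r for r in rows if r["channel"] == channel]
    return pearson([r["views"] for r in subset], [r["comments"] for r in subset])


corr_a = correlation_for(recent, "A")
corr_b = correlation_for(recent, "B")
```

Computing the coefficient per channel, rather than on the combined table, keeps one channel's scale from dominating the other.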
Exploratory Data Analysis: Finding insights from the data through data visualization.
When I first looked at the data, I wanted to find out how the videos were performing, how frequently videos were published, whether there were any outliers in the data, and why these outlier videos performed better or worse than the others. I also wanted to know which words are used most frequently in the titles. Here is what I came up with:
- As we can see, the most-watched videos are Complete Financial Planning in your 20’s, How to Repay Loans and Best Ways to use Credit Cards. We can assume that the majority of the viewers are in their 20s.
- The correlation between comments and views is higher for Akshat Srivastava’s videos, but correlation does not guarantee that more likes mean more views.
- Consistency is key on YouTube. We can see that the initial videos did not perform very well, and the more frequently videos are uploaded, the more likely they are to get views.
- The most frequently used words are Ankur, Warikoo, Investing, Hindi and How, which shows us what is working for the channel.
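The word counts behind the last point can be reproduced outside Power BI with a few lines of Python. This is a minimal sketch, not the method used for the charts above: the titles are invented examples, the stop-word list is a small assumption of mine, and `title_word_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Invented example titles; real ones would come from the "title" column.
titles = [
    "How to Start Investing in Your 20s",
    "Investing Basics in Hindi",
    "How to Use Credit Cards",
]

# A tiny, hand-picked stop-word list (an assumption for this example).
STOP_WORDS = {"to", "in", "your", "the", "a", "of", "and"}


def title_word_counts(titles):
    """Count lowercase words across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


counts = title_word_counts(titles)
top_words = counts.most_common(2)  # the two most frequent words with counts
```

Lowercasing before counting means "Investing" and "investing" are treated as the same word, which is usually what you want for a title word cloud.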
Skills and Lessons Learned:
- The most important thing I wanted to learn from this project was data gathering through APIs using Python. I knew we can get clean data from sources like Kaggle and Our World in Data, but it is more rewarding to gather and clean the data ourselves.
- How APIs work. I wanted to learn how to get real data from different sources. Now I can say where the data comes from. Next I am going to learn web scraping in Python.
- Though I had some knowledge of how a data analytics project works overall, I now have first-hand experience of handling a project on my own. This is a game changer for me.