MediaPipe Python Tutorial [How to Install + Real-Time Hand Tracking Example]

Aug 24, 2021

What is “MediaPipe” Python?

MediaPipe is Google's open-source framework for media processing. It is cross-platform, meaning it runs everywhere: on Android, iOS, the web, and YouTube's servers.

What do you think all these pictures have in common?

Think for a moment and guess what the images below share!

You guessed correctly: MediaPipe is common to all of these images.

What Are the Uses of MediaPipe?


Every YouTube video we watch is processed by machine learning models using MediaPipe. Google has not hired thousands of employees to watch every uploaded video, because even thousands of people could not review the amount of data Google receives every day. Machine learning models are developed to make our lives easier: machine learning and deep learning models complete tasks that are hard for us in far less time, and they also save the cost of hiring reviewers.

Yes, Google uses machine learning and deep learning models to check whether videos comply with its policies and are free of copyright issues.

Basically, MediaPipe is a framework for computer vision and deep learning that builds perception pipelines. For now, you just need to know that a perception pipeline is a chain of processing steps that takes sensory input, such as audio, video, or time-series data, and turns it into useful output, such as detected objects or key points.

Why Does Google Use MediaPipe?

Google has been using MediaPipe for a long time, mainly for two tasks.

1. Dataset preparation for Machine learning training

Pose Estimation

Pose estimation means finding the key points of a person or an object. A person's key points include the elbows, knees, and wrists. MediaPipe can be used to train an ML model to learn these key points, and that knowledge can then be applied to specific tasks such as action recognition.

2. ML inference pipelines

Live Data

ML inference is the process of running a trained model on live data points to get predictions.

Example: We have all used Snapchat and Instagram filters and may have recorded videos with them. Applying such a filter to a live camera feed is what ML inference means.

What is possible with MediaPipe?

MediaPipe can solve a number of AI problems. Here are some of them:

Object Tracking
Box Tracking
Face Mesh
Hair Segmentation
Live Hand Tracking and many more.

Real-Time Hand Tracking Project

Here, I developed a live hand tracking project using MediaPipe.

Hand tracking uses two models on the backend:

1. Palm detection

Runs on the complete image, detects palms, and crops the image around each hand so that the next step only has to work on the hand region.

2. Hand Landmarks

From the cropped image, the landmark model finds 21 different landmarks on the hand.
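For reference, the 21 landmarks are numbered 0 to 20. MediaPipe's legacy Solutions API exposes their names as the `mp.solutions.hands.HandLandmark` enum; the sketch below simply lists the same names in index order in plain Python so an id can be looked up without importing MediaPipe. The `landmark_name` helper is hypothetical.

```python
# Hand landmark names in index order (0-20), mirroring the names of
# MediaPipe's mp.solutions.hands.HandLandmark enum.
HAND_LANDMARK_NAMES = (
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
)


def landmark_name(landmark_id: int) -> str:
    """Return the name of a hand landmark given its id (0-20)."""
    if not 0 <= landmark_id < len(HAND_LANDMARK_NAMES):
        raise ValueError(f"landmark id must be 0-20, got {landmark_id}")
    return HAND_LANDMARK_NAMES[landmark_id]
```

For example, `landmark_name(8)` returns `"INDEX_FINGER_TIP"`; every name ending in `_TIP` sits at id 4, 8, 12, 16, or 20.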

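How to Install MediaPipe

For this specific task, we require three modules: cv2 (OpenCV), MediaPipe, and time. The time module is part of Python's standard library, so only the first two need to be installed.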

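I started by installing pyforest in the Jupyter Notebook. pyforest lazily imports many popular Python libraries so that you don't have to write each import statement yourself; note that it does not install packages for you.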

Installing Modules

Once the modules are installed, running the same command again prints "Requirement already satisfied".


If MediaPipe is still not installed or does not work, install it separately; MediaPipe was a new module at the time and may not be covered by pyforest. I first planned to work directly in a Kaggle notebook, but MediaPipe did not work there, so I installed it and worked in a local Jupyter Notebook instead. Once everything is installed, a local Jupyter Notebook does not require an internet connection, which is a plus.

This is how MediaPipe is installed in a Jupyter Notebook.
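A minimal sketch of checking for and installing the two third-party packages from inside a notebook, assuming a standard pip setup. The helper names (`missing_packages`, `pip_install_command`, `install_missing`) are hypothetical; in a Jupyter cell, `%pip install mediapipe opencv-python` does the same job in one line.

```python
import importlib.util
import subprocess
import sys

# Import name -> pip package name. `time` ships with Python, so only
# OpenCV (imported as cv2) and MediaPipe have to be installed.
REQUIRED = {"cv2": "opencv-python", "mediapipe": "mediapipe"}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages whose modules cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


def pip_install_command(packages):
    """Build `python -m pip install ...` for the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install_missing(required=REQUIRED):
    """Install only what is missing and return the list of packages installed."""
    packages = missing_packages(required)
    if packages:
        subprocess.check_call(pip_install_command(packages))
    return packages
```

Using `sys.executable` makes pip install into the same interpreter the notebook kernel runs, and calling `install_missing()` a second time installs nothing, much like pip's "Requirement already satisfied" message.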

How to Import Modules in Jupyter

Importing libraries

How to Create Camera Object in Python

First, I created a camera object just to check that the camera works properly.
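A minimal camera-check sketch using OpenCV's `cv2.VideoCapture`, assuming the default webcam is at index 0. The function names are hypothetical, and OpenCV is imported inside `preview_camera` only so that the small `should_quit` helper can be tried without OpenCV installed.

```python
def should_quit(key: int, quit_key: str = "q") -> bool:
    """True when the key code returned by cv2.waitKey matches quit_key."""
    return key & 0xFF == ord(quit_key)


def preview_camera(camera_index: int = 0) -> None:
    """Show the webcam feed in a window until 'q' is pressed."""
    import cv2  # imported here so should_quit() works without OpenCV

    cap = cv2.VideoCapture(camera_index)  # 0 is usually the built-in webcam
    if not cap.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")
    try:
        while True:
            success, frame = cap.read()  # frame is a BGR NumPy array
            if not success:
                break
            cv2.imshow("Camera check", frame)
            if should_quit(cv2.waitKey(1)):  # waitKey returns -1 if no key
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Run `preview_camera()` in a cell; a window with the live feed should appear and close when you press q.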

Here is the output.

How to Create an Object from the Hands Class

Next, I created a hands object from the Hands class. OpenCV captures frames in BGR order, while the hands object only accepts RGB images, so each frame is converted from BGR to RGB before it is processed.
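A minimal sketch of creating the hands object and processing one frame, assuming the legacy `mp.solutions.hands` API used in this tutorial. The helper names `create_hands` and `process_frame` are hypothetical; the parameter values shown are that API's defaults.

```python
def create_hands(max_num_hands: int = 2,
                 min_detection_confidence: float = 0.5,
                 min_tracking_confidence: float = 0.5):
    """Create a MediaPipe Hands object configured for a video stream."""
    import mediapipe as mp  # legacy "Solutions" API

    return mp.solutions.hands.Hands(
        # False: detect palms once, then track landmarks across frames
        # instead of running palm detection on every frame.
        static_image_mode=False,
        max_num_hands=max_num_hands,
        min_detection_confidence=min_detection_confidence,
        min_tracking_confidence=min_tracking_confidence,
    )


def process_frame(hands, frame_bgr):
    """Convert an OpenCV BGR frame to RGB and run hand detection on it."""
    import cv2

    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # Hands expects RGB
    return hands.process(frame_rgb)
```

Inside the camera loop, `results = process_frame(hands, frame)` gives the results object that the next steps inspect.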

Extracting Information from the Object Results

Before extracting further details about the hands, check what the results object holds: add a print statement and print results. It only prints MediaPipe's SolutionOutputs type and nothing else, even when a hand is in view.

How to Check if the Hand is Being Detected or Not

Update the print statement to print results.multi_hand_landmarks and see whether the camera is detecting hands.

Now that I have updated the print statement, the output is "None" because no hand is shown.

Let's see what information is extracted when one or more hands are shown.

As you can see, when the camera detects a hand, the landmark values (x, y, and z coordinates) are printed.
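To make this check explicit rather than reading raw print output, a small hypothetical helper can summarise the results object. It relies only on the `multi_hand_landmarks` attribute, which is `None` when no hand is visible and otherwise holds one entry per detected hand. The `SimpleNamespace` objects are stand-ins for real results so the snippet runs without a camera.

```python
from types import SimpleNamespace


def describe_results(results) -> str:
    """Summarise what hands.process() found in one frame."""
    hands = results.multi_hand_landmarks  # None when no hand is visible
    if hands is None:
        return "No hand detected"
    return f"{len(hands)} hand(s) detected"


# Stand-ins that only mimic the multi_hand_landmarks attribute.
no_hand = SimpleNamespace(multi_hand_landmarks=None)
two_hands = SimpleNamespace(multi_hand_landmarks=[object(), object()])
print(describe_results(no_hand))
print(describe_results(two_hands))
```

In the camera loop, pass the object returned by `hands.process(...)` instead of a stand-in.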

How to Detect Landmarks and Draw Points on the Hand

Next, a drawing object (mp_draw) is created. An if statement checks whether any landmarks were detected; if so, a for loop runs and draws a point on every detected landmark.
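A minimal sketch of that if statement and for loop, assuming `mp_draw` is `mediapipe.solutions.drawing_utils` and `results` comes from `hands.process()`. The function name is hypothetical; passing the drawing object in as a parameter keeps the function free of MediaPipe imports.

```python
def draw_landmark_points(frame, results, mp_draw) -> int:
    """Draw a point on every detected landmark; return the number of hands drawn."""
    # multi_hand_landmarks is None when no hand is in the frame.
    if not results.multi_hand_landmarks:
        return 0
    for hand_landmarks in results.multi_hand_landmarks:
        # With no connections argument, draw_landmarks draws only the 21 points.
        mp_draw.draw_landmarks(frame, hand_landmarks)
    return len(results.multi_hand_landmarks)
```

In the camera loop, create `mp_draw = mp.solutions.drawing_utils` once and call `draw_landmark_points(frame, results, mp_draw)` before `cv2.imshow`, drawing on the original BGR frame.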

Interesting, right?

Landmarks are detected and points are drawn

How to Draw Connections Between Landmarks

Connections between the landmarks are drawn by passing the hands module's HAND_CONNECTIONS constant (mp_hand.HAND_CONNECTIONS) to the drawing function.
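A minimal sketch, assuming `mp_draw` is `mediapipe.solutions.drawing_utils` and `connections` is `mp_hand.HAND_CONNECTIONS`, a collection of (start_id, end_id) landmark pairs. The function name is hypothetical.

```python
def draw_hand_connections(frame, results, mp_draw, connections) -> int:
    """Draw each detected hand's landmarks joined by the lines in `connections`."""
    if not results.multi_hand_landmarks:  # None when no hand is visible
        return 0
    for hand_landmarks in results.multi_hand_landmarks:
        # The third positional argument of draw_landmarks is the set of
        # landmark pairs to connect with lines.
        mp_draw.draw_landmarks(frame, hand_landmarks, connections)
    return len(results.multi_hand_landmarks)
```

Typical call: `draw_hand_connections(frame, results, mp.solutions.drawing_utils, mp.solutions.hands.HAND_CONNECTIONS)`.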

Frame Rate

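To compute the frame rate (FPS), two variables are declared: p_time and c_time (previous and current time). The FPS is the reciprocal of the time between two consecutive frames, 1 / (c_time - p_time).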
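A minimal sketch of the p_time/c_time pattern using `time.time()`; `compute_fps` and `FpsCounter` are hypothetical names.

```python
import time


def compute_fps(p_time: float, c_time: float) -> float:
    """Frames per second from the timestamps of the previous and current frame."""
    elapsed = c_time - p_time
    return 1.0 / elapsed if elapsed > 0 else 0.0  # avoid dividing by zero


class FpsCounter:
    """Keeps p_time between frames so each call to tick() returns the FPS."""

    def __init__(self) -> None:
        self.p_time = time.time()

    def tick(self) -> float:
        c_time = time.time()
        fps = compute_fps(self.p_time, c_time)
        self.p_time = c_time  # the current frame becomes the previous one
        return fps
```

Inside the loop, call `fps = counter.tick()` once per frame and draw it with, for example, `cv2.putText(frame, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)`. Starting p_time at the current time rather than 0 avoids a meaningless near-zero reading on the first frame.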

Extracting Value of Each Landmark

This is useful when a specific point needs to be tracked for some purpose.

As we know, a hand has 21 landmarks (0 to 20). The landmark information gives the x, y, and z coordinates of each landmark, listed in order of their ids. We can use the x and y coordinates to find the location of a landmark on the hand.

Here, I checked the height, width, and channels (h, w, c) of the image. The previous code gave decimal values because x and y are normalized to the range 0 to 1 by the image width and height. To get exact pixel positions, I multiplied them by w and h and converted the circle coordinates (cx, cy) to integers.
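A minimal sketch of that conversion, assuming `hand_landmarks` is one entry of `results.multi_hand_landmarks` (it exposes the 21 points through its `landmark` field) and `frame_shape` is the OpenCV frame's `shape`. The function name is hypothetical.

```python
def landmark_positions(hand_landmarks, frame_shape):
    """Return [(id, cx, cy), ...] in pixel coordinates for one detected hand."""
    h, w = frame_shape[:2]  # frame.shape is (height, width, channels)
    positions = []
    for landmark_id, lm in enumerate(hand_landmarks.landmark):
        # x is normalised by the width and y by the height, so scale each
        # by the matching dimension and truncate to whole pixels.
        cx, cy = int(lm.x * w), int(lm.y * h)
        positions.append((landmark_id, cx, cy))
    return positions
```

For a 640×480 frame, a landmark at x = 0.5, y = 0.25 maps to pixel (320, 120).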

Drawing Circle on a Specific Landmark

For drawing, I created a drawing object (mp_draw) and added an if condition for landmark 0, because I wanted a filled circle on landmark 0.
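A minimal sketch of the filled circle, assuming pixel positions are already available as a list of (id, cx, cy) tuples. The function names are hypothetical, and the radius of 15 and the magenta color (255, 0, 255) in OpenCV's BGR order are illustrative assumptions.

```python
def select_landmarks(positions, ids):
    """Keep only the (id, cx, cy) entries whose id is in `ids`."""
    wanted = set(ids)
    return [p for p in positions if p[0] in wanted]


def highlight_landmarks(frame, positions, ids=(0,), radius=15, color=(255, 0, 255)):
    """Draw a filled circle on each selected landmark (landmark 0 by default)."""
    import cv2  # imported here so select_landmarks() works without OpenCV

    for _, cx, cy in select_landmarks(positions, ids):
        cv2.circle(frame, (cx, cy), radius, color, cv2.FILLED)
    return frame
```

Passing a different `ids` tuple highlights other landmarks with the same code.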

Highlighting Fingertips

The fingertip landmarks are 4, 8, 12, 16, and 20.
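Putting the steps together, here is a minimal end-to-end sketch that draws landmarks and connections, highlights the fingertips, and shows the FPS. It assumes a MediaPipe release that still ships the legacy `mp.solutions` API used throughout this tutorial; the function names, window titles, colors, and circle radius are assumptions for illustration.

```python
import time

# Fingertip landmark ids: thumb, index, middle, ring and pinky tips.
FINGERTIP_IDS = (4, 8, 12, 16, 20)


def fingertip_pixels(hand_landmarks, width, height):
    """Return the (cx, cy) pixel position of each fingertip of one hand."""
    return [
        (int(hand_landmarks.landmark[i].x * width),
         int(hand_landmarks.landmark[i].y * height))
        for i in FINGERTIP_IDS
    ]


def run_hand_tracking(camera_index=0):
    """Webcam loop with landmarks, connections, fingertips and FPS. Press q to quit."""
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    cap = cv2.VideoCapture(camera_index)
    p_time = time.time()
    try:
        with mp_hands.Hands(max_num_hands=2) as hands:
            while cap.isOpened():
                success, frame = cap.read()
                if not success:
                    break
                # OpenCV frames are BGR; the Hands solution expects RGB.
                results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                h, w = frame.shape[:2]
                if results.multi_hand_landmarks:
                    for hand_landmarks in results.multi_hand_landmarks:
                        mp_draw.draw_landmarks(
                            frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
                        for cx, cy in fingertip_pixels(hand_landmarks, w, h):
                            cv2.circle(frame, (cx, cy), 10, (255, 0, 255), cv2.FILLED)
                # FPS = 1 / time elapsed since the previous frame.
                c_time = time.time()
                fps = 1 / (c_time - p_time) if c_time > p_time else 0.0
                p_time = c_time
                cv2.putText(frame, f"FPS: {int(fps)}", (10, 40),
                            cv2.FONT_HERSHEY_PLAIN, 2, (0, 255, 0), 2)
                cv2.imshow("Hand tracking", frame)
                if cv2.waitKey(1) & 0xFF == ord("q"):
                    break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Call `run_hand_tracking()` in a notebook cell to start the webcam window. Using `Hands` as a context manager releases MediaPipe's resources when the loop ends.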

This is how we can use these landmarks for different tasks. I'll end the article here, though it's not the end of the study; there is still a lot to explore.


I hope you found this article instructional and informative. If you have any feedback or queries, please let me know in the comments below. And follow SelectFrom for more tutorials and guides on topics like Big Data, Spark, and data warehousing.


More content at PlainEnglish.io. Sign up for our free weekly newsletter. Follow us on Twitter and LinkedIn. Check out our Community Discord and join our Talent Collective.