Jahleel A., Connor R., Bobby A., Isabel H
After learning more about AI and machine learning technology, how it works, and what can be accomplished with it, we decided as a group to apply a machine learning model to recognising sign language, in order to improve communication between deaf and hearing-impaired people and modern computers.
Ethical Aspects of Neural Networking
Legal aspects of Neural Networking
Finding and Recognising a Hand and Predicting the Sign
Suggested Improvements to the Model
Suggested Improvements to the Reader
By students from
[REDACTED FOR PRIVACY]
With help from John B. from The University of Cambridge
In the modern age, computers and other devices on the market are catered to people without any impairments to their mobility, vision, or hearing. This means that much of this technology is not available to people who struggle in these areas. We wanted to bridge the gap between people with and without disabilities and create accessible tech in the process. At its core, our project's goal is to address this issue through a proof of concept.
Our approach to tackling this problem is divided into two main parts:
- Training a Model for Sign Language Recognition: We used a pre-captured dataset covering 24 of the 26 letters of the alphabet to train a model for recognising sign language. Two of the letters (J and Z) involve movement, so our current model cannot translate them, as it requires still images. We hope to enable translation of these letters, and potentially signs for whole words, in the future.
- Real-time Hand Gesture Recognition: Our system focuses on identifying, isolating, and transforming a live frame of a hand captured by a computer webcam. This information is then fed into the trained model.
We decided to use American Sign Language (ASL) because its alphabet signs use only a single hand, whereas British Sign Language (BSL) uses both hands. This made the model easier to train, as we only needed a single hand per frame. In the future, we hope to be able to translate BSL, but at the moment it is too time-consuming and expensive. By using experimental data, we can build guidelines for future self-created datasets.
We based our model on a previously successful one from the well-known MNIST database, as both tasks involve classifying images into a set of characters.
Identifying, isolating, and transforming hand frames was carried out in a separate program. We used computer vision algorithms and a pre-trained model by Google created specifically for hand recognition. The data is processed through this pipeline and then forwarded to our model to be recognised as signs.
In conclusion, our project addresses accessibility issues in technology by training an ASL recognition model to recognise live hand gestures and translate them into text. By focusing on ASL for practical reasons, we were able to ensure we could create a high-quality model and meet our deadlines. We hope this project can help those who need ASL translation, and also inspire others to create similar projects by building on our strengths and learning from our weaknesses.
The key issue in neural networking is ethics, as neural networks are almost incapable of generating content that can be used in any industry without raising concerns. Training a neural network requires an enormous dataset; the more data, the more sophisticated the model will be. Access to this information, however, can create issues related to intellectual property rights and confidentiality.
There are legal issues to consider as well. Within the artificial intelligence community, there are several approaches to modelling human intelligence. One approach applicable to the legal domain is the use of symbolic reasoning systems, known as expert systems. These systems are called symbolic because they transform symbols representing things in the real world into other symbols according to explicit rules.
AI also requires companies to confront an evolving host of questions across different areas of law, including how to avoid breaching users' privacy and how to keep their information safe from a cybersecurity standpoint.
Our dataset was a CSV file. The first column held an identifier for which letter the sign represented; for example, '2' was assigned to 'B'. The remaining 784 columns each contained a greyscale value between 0 and 255, which determined the brightness of one pixel in the 28x28 image of the sign.
We used the PyTorch Dataset class to create our own custom dataset. The spreadsheet was read and converted to a NumPy array. We then extracted the labels into a separate array and removed the label column from the original array, so that only the pixel values remained. Each image was transformed into a tensor and then normalised with a mean and standard deviation of 0.5, which maps the pixel values into the range -1 to 1.
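A minimal sketch of how such a dataset class might look is shown below, assuming the CSV layout described above (label in the first column, 784 pixel columns per 28x28 image); the class name, file handling, and exact transform settings are illustrative rather than copied from our code.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision import transforms


class SignLanguageDataset(Dataset):
    """Loads the sign-language CSV: column 0 is the letter label,
    the remaining 784 columns are 28x28 greyscale pixel values (0-255)."""

    def __init__(self, csv_path):
        data = pd.read_csv(csv_path).to_numpy()
        self.labels = data[:, 0].astype(np.int64)     # extract the labels
        self.images = data[:, 1:].astype(np.float32)  # drop the label column
        # ToTensor gives a 1x28x28 tensor; Normalize(0.5, 0.5) then maps
        # values from [0, 1] into the range [-1, 1]
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,)),
        ])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = self.images[idx].reshape(28, 28) / 255.0  # scale pixels to [0, 1]
        return self.transform(image), self.labels[idx]
```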
The model was defined with these layers:
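As an illustration of the kind of architecture this refers to, a small convolutional network for 28x28 greyscale inputs might look like the sketch below; the layer sizes, channel counts, and class count are assumptions rather than our exact configuration.

```python
import torch.nn as nn


class SignModel(nn.Module):
    """Illustrative CNN for 28x28 greyscale sign images; num_classes is
    assumed to cover the static alphabet letters used in the dataset."""

    def __init__(self, num_classes=25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> 64x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 64x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),                  # one score per letter
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```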
We trained and tested the model, achieving an accuracy of 87.1%.
This version of the model was saved as 'Version 3'.
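A condensed sketch of the training, evaluation, and saving steps, assuming the illustrative dataset and model classes sketched above; the file names, batch size, learning rate, and number of epochs are placeholders rather than our actual values.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical CSV paths; substitute the real training and test files
train_set = SignLanguageDataset("sign_train.csv")
test_set = SignLanguageDataset("sign_test.csv")
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64)

model = SignModel()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Evaluate on the held-out test split
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.1%}")

# Save this version of the trained weights
torch.save(model.state_dict(), "sign_model_v3.pt")
```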
We used OpenCV (cv2) to handle image processing. MediaPipe, a Google library, was used to recognise hands. The webcam feed is initialised at 30 fps and the images are relayed back to the user on screen, while a copy of each frame is sent to MediaPipe.
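In outline, the capture loop looks something like the sketch below, written against the standard OpenCV and MediaPipe Hands APIs; the detection confidence and window name are placeholder choices rather than our exact settings.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

capture = cv2.VideoCapture(0)              # default webcam
capture.set(cv2.CAP_PROP_FPS, 30)          # request 30 fps
hands = mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break

    # MediaPipe expects RGB, so a converted copy of the frame is sent to it
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)

    # Draw the detected hand skeleton back onto the frame shown to the user
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    cv2.imshow("Sign reader", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

capture.release()
cv2.destroyAllWindows()
```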
In order for the model to be able to predict the sign, it first needs to find the hand and the points used to determine the hand's position. To make hand identification easier for the model, we converted the image to greyscale so the hand would stand out better from the background. The next step was to find the important points on the hand. These are called landmarks and represent the joints of the hand. Landmarks are used both in the dataset and in live images, and we can identify signs by matching landmark configurations to letters.
The program takes the two landmarks furthest apart and creates a box around the hand based on them. The image is then flipped to be the right way around, and 35 pixels of padding are added on all sides to ensure none of the hand is cropped out. The image is then converted to greyscale and resized to 28x28 pixels before being sent for processing.
The hand must be within a certain area of the camera frame to be read properly; if it is too close to the edge, it is not picked up. This prevents the program from crashing.
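A sketch of this isolation step is shown below, assuming MediaPipe's normalised landmark format; the 35-pixel padding matches the description above, while the function name and structure are illustrative.

```python
import cv2

PADDING = 35  # pixels of padding on all sides so none of the hand is cropped out


def extract_hand_image(frame, hand_landmarks):
    """Crop, flip, greyscale, and resize the hand region to the 28x28 format
    the model was trained on. Returns None if the hand is too close to the
    frame edge to be read safely."""
    h, w = frame.shape[:2]

    # MediaPipe landmark coordinates are normalised (0-1), so convert to pixels
    xs = [int(lm.x * w) for lm in hand_landmarks.landmark]
    ys = [int(lm.y * h) for lm in hand_landmarks.landmark]

    # Box around the hand based on the landmarks furthest apart, plus padding
    x1, y1 = min(xs) - PADDING, min(ys) - PADDING
    x2, y2 = max(xs) + PADDING, max(ys) + PADDING

    # If the padded box falls outside the frame, the hand is too close to the
    # edge; skip this frame instead of crashing on an empty crop
    if x1 < 0 or y1 < 0 or x2 > w or y2 > h:
        return None

    hand = frame[y1:y2, x1:x2]
    hand = cv2.flip(hand, 1)                       # mirror so it is the right way around
    hand = cv2.cvtColor(hand, cv2.COLOR_BGR2GRAY)  # greyscale
    hand = cv2.resize(hand, (28, 28))              # match the training images
    return hand
```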
In the screenshot above, you can see all three stages of recognition. First, in the frame captured by the webcam, you can see the box surrounding the hand with an overlaid skeleton. The white frame on the outside is the bounding box, which marks the boundary for reading the predicted letter. In this case, the sign is 'A', and it is being correctly identified.
Secondly, the coloured, cropped image of the hand in the top left is the isolated hand with 35 pixels of padding.
Finally, the greyscale image of the hand in the top left is the resized 28x28 version that is sent to the model.
We could improve the model by creating our own dataset, recording our own images from scratch. This would let us choose the conditions the hand is in, such as lighting and background, to ensure the data is varied enough. We could record it in different ways, such as capturing images with different backgrounds, isolating the hand from the background, or taking the coordinates of the skeleton landmarks.
Another improvement would be to separate the hand from the background entirely. We attempted this by using the colour difference between the hand and the background, finding contours around the hand, and combining the two operations to cut out any unnecessary pixels.
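One way this combination could be coded is sketched below; the HSV colour thresholds are placeholder guesses that would need tuning for real lighting and skin tones, and this is an approximation of the attempted approach rather than final working code.

```python
import cv2
import numpy as np


def isolate_hand(bgr_image):
    """Attempted background removal: threshold on colour, find the largest
    contour, and mask out everything outside it."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))  # rough skin-colour range

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return bgr_image                     # nothing found; return the frame unchanged

    # Keep only the region inside the largest contour (assumed to be the hand)
    hand_contour = max(contours, key=cv2.contourArea)
    clean_mask = np.zeros(mask.shape, dtype=np.uint8)
    cv2.drawContours(clean_mask, [hand_contour], -1, 255, thickness=cv2.FILLED)

    return cv2.bitwise_and(bgr_image, bgr_image, mask=clean_mask)
```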
Challenges encountered:
Complexity in bounding box resizing – One of the significant challenges we faced was correctly resizing the bounding box and normalising the hand landmark coordinates relative to this box. This process is vital in ensuring that gestures are recognised consistently, regardless of their position or size in the video frame. The goal was to standardise the hand's position and scale across frames, which proved complex to implement.
The first step involved dynamically determining the dimensions of a bounding box that encases the hand. This task was crucial, requiring accurate calculation of the boundaries of the hand landmarks in each frame. We completed this successfully and stored the values in variables.
A problem we encountered was resizing the bounding box to maintain a consistent scale across different frames. This consistency is crucial for accurate gesture recognition, as variations in scale could lead to misinterpretation of gestures.
The problem was worsened by the need to maintain the aspect ratio of the hand within the bounding box. Improper handling of the aspect ratio could result in distorted images of hands, leading to inaccurate gesture recognition.
Rough sketch of what we planned the resizing and repositioning of the bounding box to look like:
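One plausible way to code the consistent-scale, aspect-ratio-preserving box (a guess at the planned approach, not our final implementation) is to expand the shorter side of the box so that the crop is always square before resizing:

```python
def square_bounding_box(x1, y1, x2, y2):
    """Expand the shorter side of the box so it becomes a square centred on
    the hand. Resizing a square crop to a fixed size then keeps the hand's
    aspect ratio consistent from frame to frame."""
    width, height = x2 - x1, y2 - y1
    side = max(width, height)                          # longest side sets the scale
    centre_x, centre_y = (x1 + x2) // 2, (y1 + y2) // 2
    half = side // 2
    return centre_x - half, centre_y - half, centre_x + half, centre_y + half
```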
Normalisation of the hand landmarks – After resizing, the hand landmarks needed to be normalised relative to the bounding box. This normalisation involved translating and scaling the landmarks so that they remain consistent across different hand sizes and positions.
We used the following formula to normalise each hand landmark point:

normalised_xy = (Point_xy - LL_xy) / max(H, W)

where:
- Point_xy is the hand landmark coordinate
- LL_xy is the lower-left boundary of the bounding box
- max(H, W) is the larger of the bounding box's height and width, by which the numerator is divided
A further difficulty arose when deciding on the appropriate scale factor for normalisation. Using either the height or the width of the bounding box as the scale factor required careful consideration to avoid distorting the relative positions of the landmarks. We chose to use the larger of the height and width after consulting Mr John X.
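In code, the normalisation might look like the sketch below; the function and parameter names are illustrative, but the arithmetic follows the formula above.

```python
def normalise_landmarks(landmarks, box_left, box_bottom, box_width, box_height):
    """Normalise (x, y) landmark coordinates relative to the bounding box:
    subtract the lower-left corner, then divide by the larger of the box's
    height and width so relative positions are not distorted."""
    scale = max(box_width, box_height)
    return [((x - box_left) / scale, (y - box_bottom) / scale) for x, y in landmarks]


# Example: a landmark at (150, 220) in a 100-wide, 120-tall box whose
# lower-left corner is (100, 180) becomes ((150-100)/120, (220-180)/120),
# i.e. roughly (0.42, 0.33).
```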
By the end of the day, we had not been able to fully implement our findings in our code; however, we made progress towards making our sign language predictions more accurate.
In testing, the model achieved an accuracy of 87%. In practice, however, its predictions ended up following a roughly normal distribution, suggesting that the model behaves near-randomly in real-world conditions.
The project has made significant strides in developing a real-time hand gesture recognition system. Key functionalities, including hand tracking and bounding box calculation, were successfully implemented. However, the normalisation and repositioning of hand landmarks need further refinement. Overcoming this challenge is essential for the success of the hand gesture recognition system, as it directly impacts the accuracy and reliability of gesture detection.
We have a functioning AI module; however, it cannot yet accurately predict which sign language letters are being shown. Future research in this field could focus on developing more advanced algorithms that dynamically adjust to different hand sizes and orientations, and machine learning techniques could be applied further to improve accuracy and efficiency.