Making an Offline Translator in Android.
- TensorFlow (latest version)
- TensorFlow Nightly (tf-nightly, latest version)
- Android Studio
English to Bengali.
Source Language - ENGLISH.
Target Language - BENGALI.
Use the package manager pip to install TensorFlow and TensorFlow Nightly:
pip install tensorflow
pip install tf-nightly
I have taken the file from here and edited the parts I needed. The edited file is also provided in this repository.
Here I am generating two JSON files.
- word_dict_beng.json
- word_dict_eng.json
These JSON files are the tokenized vocabularies of the Bengali and English datasets we are using.
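On the Android side these two dictionaries are shipped with the app and read back into maps. Below is a minimal sketch of loading one of them, assuming each file is a flat `{"word": index, ...}` JSON object (the helper name `loadWordDict` is just for illustration):

```java
import android.content.Context;
import org.json.JSONException;
import org.json.JSONObject;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Reads a word -> token-index dictionary (e.g. word_dict_eng.json) from the assets folder.
// Assumes the JSON is a flat {"word": index, ...} object.
private Map<String, Integer> loadWordDict(Context context, String fileName)
        throws IOException, JSONException {
    InputStream is = context.getAssets().open(fileName);
    byte[] buffer = new byte[is.available()];
    is.read(buffer);
    is.close();
    JSONObject json = new JSONObject(new String(buffer, StandardCharsets.UTF_8));
    Map<String, Integer> dict = new HashMap<>();
    Iterator<String> keys = json.keys();
    while (keys.hasNext()) {
        String word = keys.next();
        dict.put(word, json.getInt(word));
    }
    return dict;
}
```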
The model is trained, reaches an accuracy of 93%, and is finally converted into the TensorFlow Lite format.
To convert it into the TensorFlow Lite format I am using the following code:
import tensorflow as tf

# Load the trained SavedModel and grab its default serving signature.
filename = 'model_05_12_2019_v1_OUT_1_Beng'
model = tf.saved_model.load(filename)
concrete_func = model.signatures[
    tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]

# Fix the input shape to a single sentence of 8 tokens.
concrete_func.inputs[0].set_shape([1, 8])

# Convert the concrete function to a TensorFlow Lite flatbuffer and save it.
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
tflite_model = converter.convert()
with open("nmt_05_12_19_test_beng.tflite", "wb") as f:
    f.write(tflite_model)
The *.tflite file is kept as an asset, in the "assets" folder.
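To use the model in the app, it has to be loaded from the assets folder into a TensorFlow Lite `Interpreter`. A minimal sketch is shown below; the file name matches the converter output above, and if your build compresses assets you may need `aaptOptions { noCompress "tflite" }` in `build.gradle`:

```java
import android.content.res.AssetFileDescriptor;
import android.content.res.AssetManager;
import org.tensorflow.lite.Interpreter;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Memory-maps the .tflite file from the assets folder so the Interpreter can use it directly.
private MappedByteBuffer loadModelFile(AssetManager assets, String modelPath) throws IOException {
    AssetFileDescriptor fileDescriptor = assets.openFd(modelPath);
    try (FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor())) {
        FileChannel fileChannel = inputStream.getChannel();
        return fileChannel.map(FileChannel.MapMode.READ_ONLY,
                fileDescriptor.getStartOffset(), fileDescriptor.getDeclaredLength());
    }
}

// Usage, e.g. in onCreate():
// Interpreter tflite = new Interpreter(loadModelFile(getAssets(), "nmt_05_12_19_test_beng.tflite"));
```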
- We will be using an EditText to take the input from the user.
- We will be separating the words from the user's input sentence.
- We will be creating a 2D float array and filling it with the token number of each word, looked up in the JSON. If a word is not in the dictionary, we simply use 0 as its token.
- We will run the model with the generated input (see the sketch after the model summary below).
- We will get a result after doing so. The size of the output array depends on the dataset you train on. In my case, it is 3329:

float[][][] outputVal = new float[1][8][3329];
Because if you check my model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, 8, 512)            963584
_________________________________________________________________
lstm (LSTM)                  (None, 512)               2099200
_________________________________________________________________
repeat_vector (RepeatVector) (None, 8, 512)            0
_________________________________________________________________
lstm_1 (LSTM)                (None, 8, 512)             2099200
_________________________________________________________________
dense (Dense)                (None, 8, 3329)           1707777
=================================================================
Total params: 6,869,761
Trainable params: 6,869,761
Non-trainable params: 0
_________________________________________________________________
My output Dense layer has 3329 units.
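With the interpreter loaded and the tokenized input prepared (a sketch of the 8-token preprocessing appears at the end of this section), running the model is a single call. This is a sketch, assuming `tflite` is the `Interpreter` created earlier:

```java
// Input: one sentence of 8 token ids (padded with 0), matching the [1, 8] shape fixed at conversion time.
float[][] inputVal = new float[1][8];
// ... fill inputVal[0][i] with the token ids of the English words ...

// Output: for each of the 8 positions, a probability distribution over the 3329-word Bengali vocabulary.
float[][][] outputVal = new float[1][8][3329];

tflite.run(inputVal, outputVal);
```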
- After doing so, we get the argmax value at each output position:
// Returns the index of the largest value in the array, i.e. the most likely vocabulary entry.
private static int argMax(float[] floatArray) {
    float max = floatArray[0];
    int index = 0;
    for (int i = 1; i < floatArray.length; i++) {
        if (max < floatArray[i]) {
            max = floatArray[i];
            index = i;
        }
    }
    return index;
}
- After getting the argmax values, we will use "word_dict_beng.json" to look up the corresponding Bengali words and build a string out of them (see the decoding sketch below).
- Finally, show the translation to the user.
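Putting the last two steps together, here is a minimal decoding sketch. It assumes `outputVal` from the inference call above, the `argMax` helper shown earlier, a `bengaliDict` map loaded from "word_dict_beng.json" with the dictionary helper sketched at the start, and a `TextView` named `textViewResult` (all of these names are illustrative):

```java
// Invert word_dict_beng.json so we can go from a token index back to a Bengali word.
Map<Integer, String> indexToBengali = new HashMap<>();
for (Map.Entry<String, Integer> entry : bengaliDict.entrySet()) {
    indexToBengali.put(entry.getValue(), entry.getKey());
}

// Pick the most likely word at each of the 8 output positions and join them into a sentence.
StringBuilder translation = new StringBuilder();
for (int t = 0; t < outputVal[0].length; t++) {
    int tokenId = argMax(outputVal[0][t]);
    String word = indexToBengali.get(tokenId);
    if (tokenId != 0 && word != null) {   // skip the padding / unknown token
        translation.append(word).append(' ');
    }
}
textViewResult.setText(translation.toString().trim());
```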
Here we are considering only 8 words to translate: if the sentence is longer than 8 words we simply ignore it, and if it is shorter we pad the input with 0.
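A minimal sketch of this rule, assuming `englishDict` is the map loaded from "word_dict_eng.json" and `inputSentence` is the text taken from the EditText:

```java
// Split the sentence into words; lower-casing assumes the training tokenizer also lower-cased.
String[] words = inputSentence.trim().toLowerCase().split("\\s+");

float[][] inputVal = new float[1][8];   // zero-initialised, so shorter sentences stay padded with 0
if (words.length <= 8) {
    for (int i = 0; i < words.length; i++) {
        Integer token = englishDict.get(words[i]);
        inputVal[0][i] = (token != null) ? token : 0;   // unknown words fall back to token 0
    }
} else {
    // More than 8 words: the sentence is skipped in this proof of concept.
}
```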
Here I have tried to make an offline translation application that can be used where there is little or no network coverage.
This is just a Proof of Concept.
If this project is of any help, please add a star.
A special thanks to Haoliang Zhang, who helped me a lot during this project.