$ pip install ellzaf_ml
If there are papers that have not been implemented in PyTorch, or not yet implemented in any framework, you can open an issue to request them.
Any model that can be used in a way that differs from its paper is placed under the Experimental tag.
🦾Data Augmentation
⚡ Models
- GhostFaceNets
- SpectFormer
- LBP and CNN Feature Fusion for face anti-spoofing
- LDnet with the combination of 2D and 3D
- SimMIM
- MixMobileNet
🛠️ Tool
An implementation of PatchSwap: Boosting the Generalizability of Face Presentation Attack Detection by Identity-aware Patch Swapping.
PatchSwap applied to the same person:
PatchSwap applied to a different person:
The first (Image A) and third (Image B) pictures from the left are the original images.
The second picture from the left shows Image A with the eyes, nose, and mouth from Image B applied.
The fourth picture from the left shows Image B with the eyes, nose, and mouth from Image A applied.
from ellzaf_ml.augments import PatchSwap
swapper = PatchSwap()
image_a, image_b = swapper.swap_features('path/to/face_imageA.jpg', 'path/to/face_imageB.jpg')
# you can specify which facial features to swap; the default is ["right_eye", "left_eye", "nose", "lips"]
image_c, image_d = swapper.swap_features('path/to/face_imageC.jpg',
                                         'path/to/face_imageD.jpg',
                                         features_to_swap=["left_eye", "nose"])
# optionally display the images
if image_a is not None and image_b is not None:
    swapper.show_image(image_a, 'Image A with features from B', image_b, 'Image B with features from A')
# go through images in folder
input_dir = 'path/to/real_face_folder'
output_dir = 'path/to/fake_face_folder'
# Call the class method with the input and output directories
swapper.swap_features_in_directory(input_dir, output_dir)
Key differences:
- Instead of dlib, I use MediaPipe for face landmark detection.
- I only swap the eyes, instead of the eyes and eyebrows.

If you want to follow the paper's method, use an input folder consisting of images of the same person for `swap_features_in_directory`.
PyTorch version of GhostFaceNetsV1.
GhostNet code from Huawei Noah's Ark Lab.
import torch
from ellzaf_ml.models import GhostFaceNetsV1
IMAGE_SIZE = 112
# return embedding
model = GhostFaceNetsV1(image_size=IMAGE_SIZE, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
# return classification
model = GhostFaceNetsV1(image_size=IMAGE_SIZE, num_classes=3, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
PyTorch version of GhostFaceNetsV2.
GhostNetV2 code from Huawei Noah's Ark Lab.
import torch
from ellzaf_ml.models import GhostFaceNetsV2
IMAGE_SIZE = 112
# return embedding
model = GhostFaceNetsV2(image_size=IMAGE_SIZE, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
# return classification
model = GhostFaceNetsV2(image_size=IMAGE_SIZE, num_classes=3, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
To avoid using GAP, as described in the paper, you need to specify the image size.
You also need to have `image_size >= 33`.
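As a minimal sketch of that constraint (reusing only the constructor arguments already shown above), the smallest accepted configuration would look like this:

import torch
from ellzaf_ml.models import GhostFaceNetsV2

# assumption: 33 is the smallest image_size accepted when GAP is not used
model = GhostFaceNetsV2(image_size=33, width=1, dropout=0.)
model(torch.randn(1, 3, 33, 33)) # returns an embedding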
- Replicate model.
- Create training code.
Implementation of SpectFormer vanilla architecture.
The code is a modified version of ViT from Vit-PyTorch.
import torch
from ellzaf_ml.models import SpectFormer
model = SpectFormer(
    image_size = 224,
    patch_size = 16,
    num_classes = 1000,
    dim = 512,
    depth = 12,
    heads = 16,
    mlp_dim = 1024,
    spect_alpha = 4, # number of spectral blocks (depth - spect_alpha = attention blocks)
)
img = torch.randn(1, 3, 224, 224)
preds = model(img) # prediction -> (1,1000)
SpectFormer utilizes both spectral blocks and attention blocks. The number of spectral blocks is specified with `spect_alpha`, and the remaining blocks out of `depth` will be attention blocks.

depth - spect_alpha = attention blocks
12 - 4 = 8

From the code and the calculation above, with `spect_alpha` of 4 and a `depth` of 12, the model ends up with 8 attention blocks. If `spect_alpha == depth`, it becomes GFNet, while if `spect_alpha == 0`, it becomes ViT.
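To make the two extremes concrete, here is a small sketch (using only the constructor arguments shown above) of a GFNet-like and a ViT-like configuration:

import torch
from ellzaf_ml.models import SpectFormer

# spect_alpha == depth -> all 12 blocks are spectral blocks (GFNet-like)
gfnet_like = SpectFormer(image_size=224, patch_size=16, num_classes=1000,
                         dim=512, depth=12, heads=16, mlp_dim=1024, spect_alpha=12)

# spect_alpha == 0 -> all 12 blocks are attention blocks (ViT-like)
vit_like = SpectFormer(image_size=224, patch_size=16, num_classes=1000,
                       dim=512, depth=12, heads=16, mlp_dim=1024, spect_alpha=0)

img = torch.randn(1, 3, 224, 224)
gfnet_like(img) # prediction -> (1, 1000)
vit_like(img)   # prediction -> (1, 1000)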
🔬 Experimental [Click Here]
This is a ViT architecture that differs from the one in the SimMIM repo. I changed the architecture slightly by adding a spectral gating network to each attention block.
from functools import partial
from torch import nn
from ellzaf_ml.models import ViTSpectral

small_vitspectral = ViTSpectral(
    img_size=224,
    patch_size=16,
    in_chans=3,
    num_classes=2,
    embed_dim=368, # base is 768
    depth=12,
    num_heads=6, # base is 12
    mlp_ratio=4,
    qkv_bias=True,
    drop_rate=0.,
    drop_path_rate=0.,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
    init_values=0.1,
    use_abs_pos_emb=False,
    use_rel_pos_bias=True,
    use_shared_rel_pos_bias=False,
    use_mean_pooling=True)
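For a quick sanity check (assuming the model takes a standard (B, 3, H, W) image tensor, like the other models in this repo), a forward pass would look like this:

import torch

img = torch.randn(1, 3, 224, 224)
preds = small_vitspectral(img) # expected prediction shape -> (1, 2)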
Implementation of LBP and CNN Feature Fusion for face anti-spoofing
This model is primarily used for face liveness detection.
import torch
from ellzaf_ml.models import LBPCNNFeatureFusion
model = LBPCNNFeatureFusion(num_classes=2)
img = torch.rand(1, 3, 224, 224)
preds = model(img) # prediction -> (1,2)
🔬 Experimental [Click Here]
I also modified it so that other models can be used as a backbone after the features from the two blocks are concatenated. You need to specify the number of classes in the backbone model instead of in `LBPCNNFeatureFusion`. You can modify the number of channels after the features are concatenated using `adapt` and `adapt_channels`.
To obtain the image size for the backbone model, divide your current image size by 8.
We need to use `adapt=True` so that the number of channels will be 3 instead of 512.
import torch
import timm
from ellzaf_ml.models import LBPCNNFeatureFusion
mobilenetv3 = timm.create_model('mobilenetv3_large_100.ra_in1k', pretrained=True)
mobilenetv3.classifier = torch.nn.Linear(mobilenetv3.classifier.in_features, 2) # specify the number of classes here
model = LBPCNNFeatureFusion(backbone="mobilenetv3", adapt=True, backbone_model=mobilenetv3)
img = torch.rand(3, 3, 224, 224)
preds = model(img) # prediction -> (3,2)
You can either keep the 512 channels from the concatenated block output or adapt them, as in the MobileNetV3 example above.
import torch
from ellzaf_ml.models import LBPCNNFeatureFusion, SpectFormer
spect_m = SpectFormer(
    image_size = 28,
    patch_size = 7,
    num_classes = 2, # specify the number of classes here
    channels = 512, # 512 channels if you want to change only the backbone
    dim = 256,
    depth = 12,
    heads = 4,
    mlp_dim = 512,
    att_dropout = 0.01,
    ff_dropout = 0.1,
    spect_alpha = 4, # number of spectral blocks (depth - spect_alpha = attention blocks)
)
model = LBPCNNFeatureFusion(backbone="spectformer", backbone_model=spect_m)
img = torch.rand(3, 3, 224, 224)
preds = model(img) # prediction -> (3,2)
If you prefer a different number of channels, you can specify it using `adapt_channels`.
Note: GhostFaceNets only works with an `image_size` higher than 32.
import torch
from ellzaf_ml.models import LBPCNNFeatureFusion, GhostFaceNetsV2
gfn_m = GhostFaceNetsV2(image_size=33, width=1, num_classes=3, channels=10, dropout=0.)
model = LBPCNNFeatureFusion(backbone="ghostfacenets", adapt=True, adapt_channels=10, backbone_model=gfn_m)
img = torch.rand(3, 3, 264, 264)
preds = model(img) # prediction -> (3,2)
I also modified it further so that you can choose whether or not to perform convolution. With this method, you only get the combined RGB and LBP features.
import torch
from ellzaf_ml.models import LBPCNNFeatureFusion, GhostFaceNetsV2
gfn_m = GhostFaceNetsV2(image_size=224, width=1, num_classes=3, channels=4, dropout=0.)
model = LBPCNNFeatureFusion(backbone="ghostfacenets", do_conv=False, backbone_model=gfn_m)
img = torch.rand(3, 3, 224, 224)
preds = model(img) # prediction -> (3,2)
Implementation of A novel Deep CNN based LDnet model with the combination of 2D and 3D CNN for Face Liveness Detection.
This model's primary use is face liveness detection.
import torch
from ellzaf_ml.models import LDnet
model = LDnet(image_size=64)
img = torch.rand(1, 3, 64, 64)
preds = model(img) # prediction -> (1,2)
Modified SimMIM code from the original repo; the architecture is still the same.
from functools import partial
from torch import nn
from ellzaf_ml.models import ViTSpectralForSimMIM, SimMIM

encoder = ViTSpectralForSimMIM(
    img_size=224,
    patch_size=16,
    in_chans=3,
    num_classes=0,
    embed_dim=384,
    depth=12,
    num_heads=6,
    mlp_ratio=4,
    qkv_bias=True,
    drop_rate=0.,
    drop_path_rate=0.1,
    norm_layer=partial(nn.LayerNorm, eps=1e-6),
    init_values=0.1,
    use_abs_pos_emb=False,
    use_rel_pos_bias=False,
    use_shared_rel_pos_bias=True,
    use_mean_pooling=False)

encoder_stride = 16
simmim = SimMIM(encoder=encoder, encoder_stride=encoder_stride)
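As a hedged usage sketch (assuming the forward pass mirrors the original SimMIM repo, where the model takes an image plus a patch-level mask and returns the reconstruction loss), a pretraining step could look like this:

import torch

img = torch.randn(1, 3, 224, 224)
# assumption: boolean mask over the 14x14 grid of 16x16 patches; 1 = masked patch
mask = torch.randint(0, 2, (1, 224 // 16, 224 // 16)).bool()
loss = simmim(img, mask) # reconstruction loss on the masked patches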
Implementation of MixMobileNet. There are three variants: XXS, XS and S.
import torch
from ellzaf_ml.models import MixMobileNet
img = torch.randn(1, 3, 224, 224)
model = MixMobileNet(variant="S", img_size=224, num_classes=2)
model(img)
Since the paper does not explicitly mention how to handle image heights and widths that are not powers of two, I decided to pad the downsampler output with zeros on the right and bottom, so that, for example, a 7x7 feature map becomes 8x8.
Eases the early-stopping process during model training. This code is from here, with some modifications based on the issues in that GitHub repo.
import torch
from torch import nn, optim
from ellzaf_ml.tools import EarlyStopping

# Assuming 'model', 'train_loader', and 'val_loader' have been defined,
# and 'model' has been modified for a 2-class output
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# initialize the early_stopping object
early_stopping = EarlyStopping(patience=10, verbose=True)

num_epochs = 100  # define the number of epochs you want to train for

for epoch in range(num_epochs):
    # training loop
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predictions = torch.max(outputs, 1)
        correct_predictions += (predictions == labels).sum().item()

    epoch_loss = running_loss / len(train_loader.dataset)
    epoch_acc = correct_predictions / len(train_loader.dataset)

    # validation loop
    model.eval()
    val_running_loss = 0.0
    val_correct_predictions = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item()
            _, predictions = torch.max(outputs, 1)
            val_correct_predictions += (predictions == labels).sum().item()

    val_epoch_loss = val_running_loss / len(val_loader.dataset)
    val_epoch_acc = val_correct_predictions / len(val_loader.dataset)

    print(f'Epoch {epoch+1}/{num_epochs}')
    print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
    print(f'Val Loss: {val_epoch_loss:.4f} Acc: {val_epoch_acc:.4f}')

    # early_stopping needs the validation loss to check if it has decreased,
    # and if it has, it will make a checkpoint of the current model
    early_stopping(val_epoch_loss, model)
    if early_stopping.early_stop:
        print("Early stopping")
        break