Predict the rotation angle of a given picture through a CNN. This project can be used for rotate-captcha cracking.
Test results:

Two models are implemented, as shown in the table below.
| Name | Backbone | Cross-Domain Loss (less is better) | Params | MACs |
| --- | --- | --- | --- | --- |
| RotNet | ResNet50 | 75.6512° | 24.246M | 4.09G |
| RotNetR | yolo11n-cls | 15.1818° | 18.117M | 3.18G |
RotNet is a PyTorch implementation of d4nst/RotNet. RotNetR is based on RotNet, with yolo11n-cls as its backbone and 128 classes. Its average prediction error is 15.1818°, obtained after 64 epochs of training (3 hours) on the Google Street View dataset.
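As a rough illustration of how such a classification output maps back to an angle (a sketch, not this repo's actual inference code; it assumes the 128 bins evenly cover 360°):

```python
import torch

NUM_CLASSES = 128  # assumption: the 128 bins evenly cover 360 degrees

def logits_to_degrees(logits: torch.Tensor) -> torch.Tensor:
    """Convert per-class logits to a predicted rotation angle in degrees."""
    idx = logits.argmax(dim=-1)            # most likely angle bin
    return idx.float() * (360.0 / NUM_CLASSES)

logits = torch.randn(4, NUM_CLASSES)       # a dummy batch of logits
print(logits_to_degrees(logits))
```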
The Cross-Domain Test uses Google Street View and Landscape-Dataset for training, and captcha pictures from Baidu (thanks to @xiangbei1997) for testing.

The captcha picture used in the demo above comes from RotateCaptchaBreak.
- A computing device with >=8 GB of memory for training
- Python>=3.9,<3.13
- PyTorch>=2.0
- Clone the repository.

```shell
git clone https://github.com/lumina37/rotate-captcha-crack.git --depth 1
cd ./rotate-captcha-crack
```
- Install all required dependencies.

This project strongly suggests using uv for package management. If you already have uv, run:

```shell
uv pip install .
```

The dependency resolution strategy of uv may have issues, so `uv sync` is not recommended for environment setup.
Or, if you prefer conda: the following steps will create a virtual env under the working directory. You can also use a named env.

```shell
conda create -p .conda
conda activate ./.conda
conda install matplotlib tqdm tomli
conda install pytorch torchvision pytorch-cuda=12.4 ultralytics -c pytorch -c nvidia -c conda-forge
```
Or, if you prefer plain pip:

```shell
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install .
```
Download the `*.zip` files from the Releases and unzip them all into the `./models` directory.

The directory structure will look like `./models/RotNetR/230228_20_07_25_000/best.pth`.
Model names change frequently while the project is still in beta, so if a FileNotFoundError occurs, try rolling back to the corresponding tag first.
If no GUI is available, try changing the debugging behavior from showing images to saving them.

```shell
uv run test_captcha.py
```

If you do not have uv, use:

```shell
python test_captcha.py
```
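If you are on a headless machine, a change along these lines writes the debug visualization to a file instead of opening a window (a sketch assuming matplotlib is used for display, which is among the project's dependencies; the actual debug code may differ):

```python
import matplotlib
matplotlib.use("Agg")                 # headless backend: render without a display
import matplotlib.pyplot as plt
import numpy as np

img = np.random.rand(224, 224, 3)     # stand-in for the rotated captcha image
plt.imshow(img)
plt.title("predicted angle: 42.0 deg")
plt.savefig("debug.png")              # write to a file instead of plt.show()
```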
- Install the extra dependencies.

With uv:

```shell
uv pip install .[server]
```

Or with conda:

```shell
conda install aiohttp
```

Or with pip:

```shell
pip install .[server]
```
- Launch the server.

```shell
uv run server.py
```

If you do not have uv, use:

```shell
python server.py
```
- Send images from another shell.

Using curl:

```shell
curl -X POST --data-binary @test.jpg http://127.0.0.1:4396
```

Or using Windows PowerShell:

```powershell
irm -Uri http://127.0.0.1:4396 -Method Post -InFile test.jpg
```
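Equivalently, a small Python client can post the raw image bytes (a sketch; requests is not among this project's dependencies, and the exact response format is an assumption):

```python
import requests  # assumption: installed separately, e.g. `pip install requests`

with open("test.jpg", "rb") as f:
    resp = requests.post("http://127.0.0.1:4396", data=f.read())

print(resp.status_code)
print(resp.text)  # whatever the server returns, e.g. the predicted angle
```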
- This project uses Google Street View and Landscape-Dataset for training. You can also collect your own photos and place them all in one directory; there is no size or shape requirement.
- Modify the `dataset_root` variable in `train.py` so that it points to the directory containing your images.
- No manual labeling is required. All cropping, rotation, and resizing are done right after each image is loaded (see the sketch below).
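Below is only a conceptual sketch of such on-the-fly sample generation, not this repo's dataset code; the helper name and the normalized label are illustrative assumptions:

```python
import random

import torchvision.transforms.functional as TF
from PIL import Image


def make_sample(path: str, size: int = 224):
    """Load an image, rotate it by a random angle, and return (tensor, label)."""
    img = Image.open(path).convert("RGB")
    angle = random.uniform(0.0, 360.0)            # ground-truth rotation angle
    img = TF.center_crop(img, min(img.size))      # square crop keeps the content centered
    img = img.rotate(angle)                        # PIL rotates counter-clockwise
    img = img.resize((size, size))
    return TF.to_tensor(img), angle / 360.0        # label as a fraction of a full turn
```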
Then start training and validate the result:

```shell
uv run train_RotNetR.py
uv run test_RotNetR.py
```
Most rotate-captcha cracking methods are based on d4nst/RotNet, with ResNet50 as the backbone. RotNet treats angle prediction as a classification task with 360 classes and uses cross-entropy to compute the loss.
Yet CrossEntropyLoss with one-hot labels assigns a uniform metric distance between all angles, no matter how far apart they actually are. CSL (Circular Smooth Label) instead spreads the label over neighboring angle bins (e.g. [0,1,0,0] -> [0.1,0.8,0.1,0]), providing a loss measurement closer to our intuition: predictions near the true angle are penalized less than predictions far from it.
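A minimal sketch of building such a circular smooth label (a Gaussian window that wraps around 360°; the bin count and sigma are illustrative assumptions, not this repo's exact settings):

```python
import torch


def circular_smooth_label(true_idx: int, num_classes: int = 360, sigma: float = 2.0) -> torch.Tensor:
    """Gaussian-smoothed label that wraps around the circle."""
    idx = torch.arange(num_classes)
    # circular distance (in bins) between every bin and the true bin
    dist = torch.minimum((idx - true_idx) % num_classes, (true_idx - idx) % num_classes)
    label = torch.exp(-dist.float() ** 2 / (2 * sigma ** 2))
    return label / label.sum()   # normalize so it behaves like a probability distribution


print(circular_smooth_label(90)[88:93])  # mass concentrated around the 90° bin
```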
Meanwhile, the angle_error_regression proposed by d4nst/RotNet is less effective, because the gradient produced by outliers can keep the training from converging. A SmoothL1Loss is a better choice for regression.
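As a hedged sketch (not this repo's actual loss), SmoothL1Loss can be combined with the wrap-around angular distance like this:

```python
import torch
from torch.nn.functional import smooth_l1_loss


def angle_regression_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """SmoothL1 loss on the circular distance between predicted and true angles.

    Both tensors hold angles normalized to [0, 1), i.e. degrees / 360.
    """
    diff = torch.remainder(pred - target, 1.0)   # wrap the difference into [0, 1)
    diff = torch.minimum(diff, 1.0 - diff)       # take the shortest way around the circle
    return smooth_l1_loss(diff, torch.zeros_like(diff))


pred = torch.tensor([0.98, 0.25])
target = torch.tensor([0.02, 0.30])
print(angle_regression_loss(pred, target))  # small: both pairs are close on the circle
```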