K-Nearest-Neighbour model of Percept-generated data

Builds, tests, runs and exports a model of human perception data for street view images based on feature encodings from CLIP and the K-Nearest-Neighbour algorithm.

Please see the Percept project for the mobile web survey app used to gather and generate the raw data.

Huggingface demos:

Requirements

Using PIP

pip install -r requirements.txt

clip-retrieval

The script invokes the clip-retrieval tool to encode images into vectors efficiently. It must be installed separately, e.g. with pip install clip-retrieval, or as part of the requirements.txt mentioned above.

We use the same naming convention for CLIP models as clip-retrieval, i.e. from their docs:

--clip-model <CLIP model to load>

   "(default ViT-B/32). Specify it as "open_clip:ViT-B-32" to use the Open CLIP or "hf_clip:patrickjohncyh/fashion-clip" to use the Huggingface clip model."

Command-line usage

usage: clip_retrieval_knn [-h] [--images-dir DIRECTORY] [--embeddings-dir DIRECTORY] --clip-model MODELNAME [--other-clip-retrieval-args ARGS] --geojson FILENAME [--demographics FILENAME] [-k K] [--training-split FLOAT]
						  [--randomize] [--random-seed INT] [--stratified] [--environmental] [--environmental-method METHOD] [--environmental-text-dir DIR] [--prompt-style NUM]
						  [--results-log FILENAME] [--normalization-method METHOD] [--skip-cache] [--read-only] [--quiet] [--extra-assertions] [--gender GENDER,...] [--region REGION] [--age AGE_MIN,AGE_MAX]
						  [--education LEVEL,...] [--export FILENAME]

K-nearest neighbour on CLIP encoded vectors

options:
  -h, --help            show this help message and exit
  --images-dir DIRECTORY, -i DIRECTORY
						Directory with images to be processed with clip-retrieval tool
  --embeddings-dir DIRECTORY, -e DIRECTORY
						Directory for embeddings output of clip-retrieval tool
  --clip-model MODELNAME, -M MODELNAME
						CLIP model name (see clip-retrieval tool help)
  --other-clip-retrieval-args ARGS
						Other command line args to pass to clip-retrieval
  --geojson FILENAME, -g FILENAME
						File with GeoJSON data from survey
  --demographics FILENAME, -d FILENAME
						CSV File with demographic data per rating from survey
  -k K                  Value of K (number of nearest neighbours to include in cluster) or comma-separated list of k-values to try.
  --training-split FLOAT
						Portion of data to use for 'training', value between 0 and 1 (default: 0.8)
  --randomize           Randomly shuffle the data before splitting into training and testing sets.
  --random-seed INT     Seed for random number generator.
  --stratified          Use stratified sampling (stratified by rating).
  --environmental       Add environmental features into the model
  --environmental-method METHOD
						One of: append, average, slerp
  --environmental-text-dir DIR
						Path to dir containing prompt files for environmental vars
  --prompt-style NUM    One of: 0, 1
  --results-log FILENAME, -L FILENAME
						Append the results to this file (CSV format)
  --normalization-method METHOD
						softmax10** (default), softmax or divbysum
  --skip-cache          Do not look for or read any cached data.
  --read-only           Do not write any data to disk (cache or otherwise).
  --quiet, -q           Reduce output to minimum.
  --extra-assertions    Run additional assertions for testing purposes.
  --gender GENDER,...   Comma-separated list of surveyed people's genders to include in analysis
  --region REGION       Include in analysis only those ratings from people who claim to be from this stated region (NL, non-NL)
  --age AGE_MIN,AGE_MAX 
						Include in analysis only those ratings from people who claim to be within this stated age range
  --education LEVEL,... 
						Comma-separated list of surveyed people's education levels to include in analysis (Primary, Secondary, Tertiary, University, Postgraduate)
  --export FILENAME     Instead of running KNN, export numpy arrays with CLIP vectors and scores to the given file.

Examples

python3 clip_retrieval_knn.py -g data.geojson --images-dir images/ \
    -k 20 --clip-model open_clip:ViT-B-32

Load image filename and score data from data.geojson, load the image files themselves from the directory images/, use K = 20 and the ViT-B-32 model from Open CLIP.

python3 clip_retrieval_knn.py -g data.geojson --images-dir images/ \
    -k 40 --clip-model open_clip:ViT-H-14-378-quickgelu \
    --environmental --environmental-method slerp --prompt-style 1

Load image filename and score data from data.geojson, load the image files themselves from the directory images/, use K = 40 and the ViT-H-14-378-quickgelu model from Open CLIP. Also include the complementary environmental variables for each image location, which must also be present in data.geojson. Build prompts using Prompt Style 1, encode the resulting text into vectors, and then combine them with the image vectors using Spherical Linear Interpolation (slerp).

python3 clip_retrieval_knn.py -g data.geojson --images-dir images/ \
    -k 10,20,30,40,50 --clip-model open_clip:ViT-H-14-378-quickgelu \
    --training-split 0.7 --randomize --random-seed 1000 \
    --results-log results.csv

Load image filename and score data from data.geojson, load the image files themselves from the directory images/, run the tests multiple times with different K values from 10 to 50 and use the ViT-H-14-378-quickgelu model from Open CLIP. Randomly shuffle the order of the images with a random seed of 1000. Put 70% of the (shuffled) data into the training set and the rest into the testing set. Write the results of the tests into the file results.csv (appending them to the end).

python3 clip_retrieval_knn.py -g data.geojson --images-dir images/ \
    -k 10,20,30,40,50 --clip-model open_clip:ViT-H-14-378-quickgelu \
    --demographics demo.csv --age 30,49

Load image filename and score data from data.geojson, load the image files themselves from the directory images/, run the tests multiple times with different K values from 10 to 50 and use the ViT-H-14-378-quickgelu model from Open CLIP. Filter the responses according to the demographic information found in demo.csv, keeping only those scores that were given by participants between the ages of 30 and 49.

python3 clip_retrieval_knn.py -g data.geojson --images-dir images/ \
    -k 40 --clip-model open_clip:ViT-H-14-378-quickgelu --export model.npz

Load image filename and score data from data.geojson, load the image files themselves from the directory images/, use K = 40 and the ViT-H-14-378-quickgelu model from Open CLIP. Export the resulting model to the file model.npz instead of running the tests.
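
The exported file bundles NumPy arrays of CLIP vectors and their associated scores. The array names below ('vectors', 'scores') are illustrative rather than the actual keys, and the plain mean over neighbours ignores the --normalization-method weighting, so treat this only as a sketch of how the export could be consumed:

import numpy as np

data = np.load("model.npz")
vectors = data["vectors"]  # hypothetical key: (n_images, dim) CLIP embeddings
scores = data["scores"]    # hypothetical key: (n_images,) perception scores

def predict(query_vec, k=40):
    """Predict a perception score for one CLIP vector via cosine-similarity KNN."""
    q = query_vec / np.linalg.norm(query_vec)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                      # cosine similarity to every stored vector
    nearest = np.argsort(-sims)[:k]   # indices of the k most similar images
    return scores[nearest].mean()     # unweighted average of their scores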

Complementary environmental variables

Introduction

What do we mean by 'complementary environmental variables'?

These are environmental data (e.g. 'average street length') that complement the imagery we already have. Each image is associated with a geographic location, and using that location we can download information from OpenStreetMap about the street network and other surrounding points of interest or features. For example, for a given image location X and a buffer size of 300 metres, we take data such as 'the number of shops within 300 metres of location X' or 'the proportion of greenspace within a circle of radius 300 metres centred on location X'.
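
As an illustration of how one such variable can be computed (not necessarily how this repository computes it), the sketch below counts shops within 300 metres of an image location using OSMnx; the coordinates and the tag are placeholders, and features_from_point requires OSMnx 1.5 or later (older releases call it geometries_from_point):

import osmnx as ox

# Placeholder coordinates for an image location X (latitude, longitude)
lat, lon = 52.0907, 5.1214

# Fetch all OpenStreetMap features tagged as shops within 300 metres of X
shops = ox.features_from_point((lat, lon), tags={"shop": True}, dist=300)
print(f"shops count (within buffer of size 300m) is {len(shops)}")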

The variables

under construction

The complementary environmental variables generated for each image location can be added to the model with the --environmental option: in that case, we produce a text prompt for each image describing its environmental variables and run CLIP on the prompt to create a text vector. We then combine the text vector with the image vector in one of three ways (selected with --environmental-method):

  • append: put the two vectors end-to-end and create a new vector that is twice as long as the originals
  • average: take the element-by-element average of the two vectors
  • slerp: 'Spherical Linear Interpolation' finds a vector halfway between the text vector and the image vector along the arc joining them. Effectively, both vectors are rotated towards each other at the same rate until they meet. In 3-D space this would be the point on a sphere that sits halfway between two other points on the sphere, along the great circle through them; CLIP vectors live in a much higher-dimensional space, so the same construction is generalized there (see the sketch after this list).
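
A minimal sketch of the three combination methods, assuming both embeddings are already L2-normalised (that normalisation is an assumption here, not something the repository necessarily enforces):

import numpy as np

def combine(img_vec, txt_vec, method="slerp", t=0.5):
    """Combine an image embedding with a text embedding (both 1-D and L2-normalised)."""
    if method == "append":
        return np.concatenate([img_vec, txt_vec])  # twice the original length
    if method == "average":
        return (img_vec + txt_vec) / 2             # element-wise mean
    if method == "slerp":
        # Spherical linear interpolation: move along the arc from img_vec to txt_vec
        omega = np.arccos(np.clip(np.dot(img_vec, txt_vec), -1.0, 1.0))
        if np.isclose(omega, 0.0):
            return img_vec                         # vectors (nearly) coincide
        return (np.sin((1 - t) * omega) * img_vec
                + np.sin(t * omega) * txt_vec) / np.sin(omega)
    raise ValueError(f"unknown method: {method}")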

Prompt styles

Prompt style 0

The prompts generated by style 0 have raw numbers in them and look like this:

greenspace count (within buffer of size 100m) is 9; shops count (within buffer of size 100m) is 28; public transport count (within buffer of size 100m) is 8; sustenance count (within buffer of size 100m) is 6; education count (within buffer of size 100m) is 0; [...]; street length avg (within buffer of size 300m) is 29.089045454545474; orientation entropy (within buffer of size 300m) is 2.5227772640841017; median speed (within buffer of size 300m) is 30.0

Prompt style 1

The prompts generated by style 1 replace the raw numbers with quintile labels, one of ('very low', 'low', 'medium', 'high', 'very high'), and look like this:

greenspace count (within buffer of size 100m) is very low; shops count (within buffer of size 100m) is medium; public transport count (within buffer of size 100m) is low; sustenance count (within buffer of size 100m) is very low; education count (within buffer of size 100m) is very low; [...]; street length avg (within buffer of size 300m) is very low; orientation entropy (within buffer of size 300m) is low; median speed (within buffer of size 300m) is low
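
A sketch of how raw values might be turned into quintile labels and joined into a style-1 prompt; binning each variable against its distribution over all image locations is an assumption, as are the example numbers:

import numpy as np

LABELS = ["very low", "low", "medium", "high", "very high"]

def quintile_label(value, all_values):
    """Map a value to a quintile label relative to that variable's full distribution."""
    edges = np.quantile(all_values, [0.2, 0.4, 0.6, 0.8])
    return LABELS[np.searchsorted(edges, value, side="right")]

# Hypothetical values of one variable across all image locations
greenspace_counts = np.array([0, 2, 9, 14, 3, 27, 5, 1, 8, 40])

parts = [f"greenspace count (within buffer of size 100m) is "
         f"{quintile_label(9, greenspace_counts)}"]
print("; ".join(parts))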

Demographics

under construction
