Skip to content

dpaleka/stealing-part-lm-supplementary

Repository files navigation

Supplementary code for Stealing Part of a Production Language Model, Carlini et al., 2024

optimize_logit_queries

Implements logprob-free attacks on a known vector of logits (no API calls). Use run_attacks.py to access all implemented attacks, described in the paper in varying levels of detail. Running run_attacks.py without modification runs several methods on a small random vector of logits. This directory is useful as a starting point for further research on logprob-free attacks.

Other, less directly useful code

query_logprobs_emulator.

Emulates the OpenAI API to test any logprob attack on a local model, as well as possible mitigation strategies. (Could also use it on the OpenAI API before March 3 2024, but not anymore.) Defaults to the logprob attack that uses top_logprobs - 1 tokens per query. May be useful for research on mitigations and ways to bypass them.

distribution_logits

Very unpolished script investigating the distribution of logits over the vocabulary for various open-source models. Pythia seems to be an outlier with very low probabilities on the long tail of the vocabulary.

openai_api_intricacies

Verifying undocumented properties of logit_bias in the OpenAI API that are necessary for the attack to work as described in the paper.

Disclaimer

Note: this repo does not contain any parameters of OpenAI or Google proprietary models, nor any code that can directly extract weights from any API known to the authors.

About

Some code for "Stealing Part of a Production Language Model"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages