<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ddkang.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ddkang.github.io/" rel="alternate" type="text/html" /><updated>2026-02-06T04:12:10+00:00</updated><id>https://ddkang.github.io/feed.xml</id><title type="html">Daniel Kang</title><subtitle>Personal website for Daniel Kang</subtitle><author><name>Daniel Kang</name></author><entry><title type="html">Fighting AI-generated Audio with Attested Microphones and ZK-SNARKs: the Attested Audio Experiment</title><link href="https://ddkang.github.io/blog/2023/06/13/fighting/" rel="alternate" type="text/html" title="Fighting AI-generated Audio with Attested Microphones and ZK-SNARKs: the Attested Audio Experiment" /><published>2023-06-13T00:00:00+00:00</published><updated>2023-06-13T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2023/06/13/fighting</id><content type="html" xml:base="https://ddkang.github.io/blog/2023/06/13/fighting/"><![CDATA[<p>AI-generated audio is becoming increasingly indistinguishable from human-produced sound. This emerging technology, while impressive, is unfortunately increasingly misused. We’re witnessing instances where this convincingly replicated audio is being manipulated to <a href="https://www.wsj.com/articles/i-cloned-myself-with-ai-she-fooled-my-bank-and-my-family-356bd1a3">conduct scams</a>, perpetrate <a href="https://www.npr.org/2023/03/22/1165448073/voice-clones-ai-scams-ftc">identity theft</a>, and commit fraud in <a href="https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/?sh=6f22e4a97559">other ways</a>. How can we safeguard ourselves and effectively combat the misuse of this advanced technology?</p>

<p>In an environment where AI-generated audio can mimic human voices flawlessly, we need a reliable chain of trust stretching from the initial capture of audio to its final playback. This chain of trust can be established with cryptographic technologies: attested microphones to capture the audio, and <a href="https://z.cash/technology/zksnarks/">ZK-SNARKs</a> to carry that trust through any edits to the final playback.</p>

<p>In the remainder of this blog post, we’ll describe how to use these tools to fight AI-generated audio. We’ll also describe how the open-source framework <a href="https://github.com/ddkang/zkml">zkml</a> can generate computational proofs of audio edits, like noise reduction. To demonstrate this end-to-end process, we’ve simulated everything from capturing the audio to performing verified edits. We’ll describe how we did this below!</p>

<h1 id="cryptographic-tools-for-fighting-ai-generated-audio">Cryptographic tools for fighting AI-generated audio</h1>

<p>Establishing a chain of trust from the audio capture to final playback requires trusting how the audio is captured and how the audio is edited. We will use cryptographic tools to establish this chain of trust.</p>

<h1 id="attested-microphones-for-trusted-audio-capture">Attested microphones for trusted audio capture</h1>

<p>The first tool we will use is the <em>attested microphone</em>. Attested microphones have a hardware unit that <a href="https://en.wikipedia.org/wiki/Digital_signature">cryptographically signs</a> the audio signal as soon as it is captured. This cryptographic signature is <em>unforgeable</em>, even with AI tools. With this signature, anyone can verify that the audio came from a specific microphone. To verify that audio came from a specific individual, that person can publish the <em>public key</em> of their attested microphone.</p>
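<p>To make the capture-then-verify flow concrete, here is a minimal toy sketch in Python. It is purely illustrative: Python’s standard library has no asymmetric signatures, so an HMAC stands in for the hardware signature, and the key name is hypothetical. A real attested microphone would use a public-key scheme (e.g., ECDSA) so that verification never requires the secret key.</p>

```python
import hashlib
import hmac

# Toy stand-in for an attested microphone. A real device would use an
# asymmetric signature (e.g., ECDSA) with the private key sealed in
# hardware; here a shared HMAC key plays that role for illustration only.
DEVICE_KEY = b"sealed-in-hardware"  # hypothetical; never leaves the device

def sign_audio(audio: bytes) -> bytes:
    """Tag the raw audio at the moment of capture."""
    return hmac.new(DEVICE_KEY, audio, hashlib.sha256).digest()

def verify_audio(audio: bytes, tag: bytes) -> bool:
    """Check that the audio is byte-for-byte what the microphone captured."""
    return hmac.compare_digest(sign_audio(audio), tag)
```

<p>Any tampering after capture, including swapping in an AI-generated clip, invalidates the tag.</p>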

<p>Unfortunately, there are two limitations of attested microphones. The first (which we will address below) is that attested microphones don’t allow you to perform edits on the audio, such as noise reduction or cutting out sensitive information. The second is that attested microphones don’t exist yet, even though the technology is here. We hope that hardware manufacturers consider building attested microphones to combat AI-generated audio!</p>

<h1 id="zk-snarks-for-verified-edits">ZK-SNARKs for verified edits</h1>

<p>Once we have the raw audio, there are many circumstances where we want to privately edit the original audio. For example, intelligence agencies can use background noise to <a href="https://www.c4isrnet.com/intel-geoint/2020/10/23/the-ambient-audio-in-your-video-could-give-away-its-location-this-intelligence-agency-wants-to-use-that/">identify your location</a>, which compromises privacy. To preserve privacy, we may want to perform edits like removing the background noise or cutting out parts of an interview that might contain sensitive information.</p>

<p>In order to perform these edits, we can use <a href="https://z.cash/technology/zksnarks/">ZK-SNARKs</a>. ZK-SNARKs provide computational integrity. For audio, ZK-SNARKs allow the producer of the audio to privately edit the audio without revealing the original. Similar to cryptographic signatures, ZK-SNARKs are unforgeable, allowing us to extend the chain of trust to edits.</p>

<h1 id="demonstrating-the-technology">Demonstrating the technology</h1>

<p>To showcase the power of attested microphones and ZK-SNARKs, we’ve constructed an <a href="https://wav.zkpod.ai/">end-to-end demonstration</a> of the chain of trust for audio. In our demonstration, each of us (Anna, Daniel, and Kobi) recorded a short 30-second clip on our own microphone, so there are three 30-second clips in total.</p>

<p>Because attested microphones don’t exist yet, we simulated the attested microphone by signing the individual audio clips with <a href="https://ethereum.org/en/wallets/">Ethereum wallets</a>. These wallets contain private keys that would be similar to the secure hardware elements in the attested microphone. The signatures we’ve produced are also unforgeable, assuming our wallets aren’t compromised.</p>

<p>During the recording process, Daniel’s microphone picked up some background echo, so we wanted to cut it out and combine the clips into one. We produced a ZK-SNARK that verifies these edits were done honestly from the original audio clips. Furthermore, the ZK-SNARK hides the input audio, so you won’t be able to extract the background noise in Daniel’s clip! This helps preserve privacy.</p>

<p>In the following demo, the final audio file is presented together with a proof and a set of signatures. The verification program checks both, establishing the exact chain of operations that transformed the input audio files into the audio you hear.</p>

<h1 id="technical-deep-dive">Technical deep dive</h1>

<p>To understand how our demonstration works at a deeper level, we’ve done a technical deep dive below. You can skip to the conclusion without missing anything!</p>

<p>We’ve outlined the overall architecture below:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-06-13-fighting/fig1.png" alt="Overall architecture of trusted audio" /></p>

<p>As we can see, the first step (after capturing the audio) is to produce the signatures. Since we don’t have attested microphones, we used Ethereum wallet addresses, which are publicly associated with us (Anna, Daniel, and Kobi), to sign hashes of the original audio. Ethereum uses <a href="https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm">ECDSA</a>, which allows anyone to verify the signatures we produced with our public keys. The private key must remain hidden; in hardware, this can be done with trusted enclaves. The manufacturer can destroy its copy of the private key after provisioning it on the device, leaving the key inaccessible to anyone outside the secure hardware!</p>

<p>Given the signed input audio clips, we want to be able to edit them with computational integrity while preserving the privacy of the original audio. Under the random oracle model of hashing, the hashes reveal nothing about the input, so we can combine the hashes with ZK-SNARKs to preserve privacy.</p>

<p>ZK-SNARKs allow a prover to produce a proof that a function executed honestly while keeping parts of the input hidden (and selectively revealing certain inputs or outputs). In our setting, the function hashes the input clips and produces the edited audio from them. By revealing the hashes, we can be assured that the inputs match the recorded audio! We’ve shown what happens within the ZK-SNARK below:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-06-13-fighting/fig2.webp" alt="ZK-SNARK" /></p>
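<p>The statement proved inside the ZK-SNARK can be sketched as an ordinary function over the private inputs. This toy Python version only shows which values stay private (the raw clips) and which are revealed (the input hashes and the edited audio); the real circuit uses a SNARK-friendly hash and finite-field arithmetic, and the cut positions here are hypothetical.</p>

```python
import hashlib

# Toy sketch of the relation proved inside the ZK-SNARK: hash each private
# input clip, then apply the edit (concatenate the clips and cut out one
# noisy region). Only the hashes and the edited audio are made public.
def edit_relation(clips, cut_start, cut_end):
    input_hashes = [hashlib.sha256(c).hexdigest() for c in clips]
    combined = b"".join(clips)
    edited = combined[:cut_start] + combined[cut_end:]
    return input_hashes, edited
```

<p>A verifier who checks the proof against the published hashes learns that the edited audio was honestly derived from the signed originals, without ever seeing them.</p>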

<h1 id="conclusions">Conclusions</h1>

<p>As we’ve seen, attested microphones and ZK-SNARKs can provide a chain of trust for audio while preserving privacy. With the rise of AI-generated audio, we’re seeing an increasing need to establish this chain of trust. We hope that our demonstration will spur hardware manufacturers to consider building attested microphones.</p>

<p>Stay tuned for more posts on this topic as we delve deeper into other tools to fight malicious AI-generated content. And if you’d like to discuss your idea or brainstorm with us, fill out <a href="https://forms.gle/5uxR4woshe4Ug9M36">this form</a> and join our <a href="https://t.me/+_iFeU8FQ4p0zOWZh">Telegram group</a>. Follow me on <a href="https://twitter.com/daniel_d_kang">Twitter</a> for the latest updates as well!</p>

<p><em>Important note: the code for this demonstration has not been audited and should not be used in production.</em></p>]]></content><author><name>Daniel Kang, Kobi Gurkan, and Anna Rose</name></author><category term="Blog" /><summary type="html"><![CDATA[AI-generated audio is becoming increasingly indistinguishable from human-produced sound. This emerging technology, while impressive, is unfortunately increasingly misused. We’re witnessing instances where this convincingly replicated audio is being manipulated to conduct scams, perpetrate identity theft, and misused in other ways. How can we safeguard ourselves and effectively combat the misuse of this advanced technology?]]></summary></entry><entry><title type="html">Verified Execution of GPT, Bert, CLIP, and more</title><link href="https://ddkang.github.io/blog/2023/05/22/verified/" rel="alternate" type="text/html" title="Verified Execution of GPT, Bert, CLIP, and more" /><published>2023-05-22T00:00:00+00:00</published><updated>2023-05-22T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2023/05/22/verified</id><content type="html" xml:base="https://ddkang.github.io/blog/2023/05/22/verified/"><![CDATA[<p>In the current era of AI-driven applications, the use of language models like <a href="https://openai.com/product/gpt-4">GPT</a> and <a href="https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html">BERT</a> is pervasive. These models are the engines behind tasks ranging from <a href="https://openai.com/blog/chatgpt">chatbots</a> to <a href="https://keras.io/examples/structured_data/movielens_recommendations_transformers/">recommendation systems</a>. However, these AI systems are increasingly executed in opaque ways, such as behind closed APIs.</p>

<p>Because these models are behind closed APIs, users have no guarantees over which models/weights are used. For example, OpenAI recently changed ChatGPT’s behavior (and has on an ongoing basis), leading to <a href="https://twitter.com/petergyang/status/1660314935208054785">speculation</a> that OpenAI is using cheaper models to save costs. However, OpenAI’s models are trade secrets so they have strong incentives to keep them private. How can we verify that these AI models are behaving as claimed without compromising trade secrets?</p>

<p>A few weeks ago, we <a href="https://medium.com/@danieldkang/f5ee1dbf2499">announced</a> the open-source release of <a href="https://github.com/ddkang/zkml">zkml</a>, which allows for the trustless execution of ML models. In this blog post, we’ll recap the capabilities of zkml and how it can be used to verify the outputs of common natural language processing (NLP) models like <a href="https://openai.com/product/gpt-4">GPT</a>, <a href="https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html">BERT</a>, and <a href="https://openai.com/research/clip">CLIP</a>, all without revealing proprietary weights or private data.</p>

<p>Currently, we can produce proofs of <a href="https://huggingface.co/distilgpt2">GPT-2</a>, <a href="https://arxiv.org/abs/2004.02984">MobileBert</a>, and <a href="https://openai.com/research/clip">CLIP</a>. We’ve released proofs of GPT-2 and CLIP <a href="https://github.com/ddkang/zkml/tree/main/examples/nlp">here</a>, with more models coming soon. In the rest of the post, we’ll describe how to do trustless execution of ML models with ZK-SNARKs and how to use <a href="https://github.com/ddkang/zkml">zkml</a> to produce these proofs!</p>

<h1 id="trustless-execution-with-zero-knowledge-proofs">Trustless Execution with Zero-Knowledge Proofs</h1>

<p><a href="https://en.wikipedia.org/wiki/Zero-knowledge_proof">Zero-knowledge proofs</a> (ZKPs) are cryptographic protocols that allow one party to prove to another that they know a specific piece of information without revealing that information. When applied to machine learning (ML) models, ZKPs can allow a model runner to prove that they ran the model correctly without revealing any specifics about the input data or model parameters.</p>

<p>zkml produces <a href="https://z.cash/technology/zksnarks/">ZK-SNARKs</a>, which are succinct (i.e., short) zero-knowledge proofs. For NLP models such as GPT and BERT, zkml can produce these ZK-SNARKs. We’ve previously described how to use ZK-SNARKs to <a href="https://medium.com/@danieldkang/empowering-users-to-verify-twitters-algorithmic-integrity-with-zkml-65e56d0e9dd9">verify the Twitter algorithm</a> and <a href="https://medium.com/@danieldkang/bridging-the-gap-how-zk-snarks-bring-transparency-to-private-ml-models-with-zkml-e0e59708c2fc">ML models more broadly</a>. Check out those posts for more details!</p>

<h1 id="applying-zkml-to-gpt-bert-and-clip">Applying zkml to GPT, BERT, and CLIP</h1>

<p>Let’s take a look at how we can apply zkml to models like GPT, BERT, and CLIP. The GPT series of models has achieved state-of-the-art performance on language tasks. When fine-tuned on human-generated text, GPT can produce <a href="https://openai.com/blog/chatgpt">state-of-the-art chatbots</a>. Similarly, BERT can be used for a <a href="https://www.tensorflow.org/text/tutorials/bert_glue">range of NLP tasks</a>. In contrast, CLIP is typically used in conjunction <a href="https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e">with vision models</a>.</p>

<p>Suppose a user interacts with a service powered by GPT-4, like a chatbot. The chatbot provider can use zkml to prove that the responses are generated by an unaltered version of GPT-4, without disclosing the actual weights of the model or the specifics of the user’s inputs. The process works as follows:</p>

<ol>
  <li>Generating a Proof: The service provider runs the model with the user’s input and generates a proof that the model was run correctly using zkml.</li>
  <li>Proof Verification: The user or an auditor can then verify the proof. If the proof holds, they can be sure that the model ran correctly without needing to know model weights or the input data.</li>
</ol>

<p>This process can be easily extended to CLIP, BERT, and any other language model.</p>

<h1 id="trustless-execution-in-action">Trustless Execution in Action</h1>

<p>To illustrate the process, let’s take an example where a user is interacting with a GPT-powered chatbot. The chatbot provider can use zkml to generate a proof of correct execution.</p>

<p>To use zkml for GPT, we can run the following commands:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># You’ll need to install zkml as described in the instructions here: https://github.com/ddkang/zkml  </span>
<span class="c"># You’ll also need to download the parameters from here: https://drive.google.com/file/d/1bhAYXOzMnAI-tB6VbUkCY7tThQ9L5K6G/view?usp=sharing   </span>
<span class="c"># Place them in the directory params_kzg  </span>
<span class="c"># Because the proving is resource intensive, we’ve provided a proof that you can verify as follows:  </span>
<span class="nb">cd </span>examples/nlp/gpt-2  
<span class="nb">tar</span> <span class="nt">-zxvf</span> vkey.tar.gz  
<span class="nb">tar</span> <span class="nt">-zxvf</span> public_vals.tar.gz  
<span class="nb">cd</span> ../../../  
cargo build <span class="nt">--release</span>  
./target/release/verify_circuit examples/nlp/gpt-2/config.msgpack examples/nlp/gpt-2/vkey examples/nlp/gpt-2/proof examples/nlp/gpt-2/public_vals kzg
</code></pre></div></div>

<p>This process ensures that the chatbot provider is not manipulating the model or the user’s input in any way. Currently, we can produce proofs for GPT-2, MobileBert, and CLIP. Expect more models soon!</p>

<h1 id="toward-a-future-of-trustless-ai">Toward a Future of Trustless AI</h1>

<p>As AI models become an integral part of our lives, the need for transparency grows more important. Trustless execution of AI models using zkml represents a significant step in this direction. By allowing service providers to prove that they’re running models correctly without revealing any sensitive information, zkml brings us closer to a future where we can fully trust AI systems without compromising trade secrets or privacy.</p>

<p>Stay tuned for more posts on this topic, as we delve deeper into the applications of zkml and other tools for AI transparency and accountability. And if you’d like to discuss your idea or brainstorm with us, fill out <a href="https://forms.gle/5uxR4woshe4Ug9M36">this form</a> and join our <a href="https://t.me/+_iFeU8FQ4p0zOWZh">Telegram group</a>. Follow me on <a href="https://twitter.com/daniel_d_kang">Twitter</a> for the latest updates as well!</p>

<p><em>Special thanks to Pun Waiwitlikhit, Yi Sun, Tatsunori Hashimoto, and Ion Stoica for their help.</em></p>]]></content><author><name>Daniel Kang</name></author><category term="Blog" /><summary type="html"><![CDATA[In the current era of AI-driven applications, the use of language models like GPT and BERT is pervasive. These models are the engines behind tasks ranging from chatbots to recommendation systems. However, these AI systems are increasingly executed in opaque ways, such as behind closed APIs.]]></summary></entry><entry><title type="html">Empowering Users to Verify Twitter’s Algorithmic Integrity with zkml</title><link href="https://ddkang.github.io/blog/2023/04/17/empowering/" rel="alternate" type="text/html" title="Empowering Users to Verify Twitter’s Algorithmic Integrity with zkml" /><published>2023-04-17T00:00:00+00:00</published><updated>2023-04-17T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2023/04/17/empowering</id><content type="html" xml:base="https://ddkang.github.io/blog/2023/04/17/empowering/"><![CDATA[<p>Last week, Twitter <a href="https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm">open-sourced</a> their algorithm for selecting and ranking the posts in the “For You” timeline. While this is a major step towards transparency, users cannot verify that the algorithm is being run correctly! One major reason is that the weights for the ML models used to rank tweets have been withheld to protect user privacy. However, these weights are a <a href="https://www.engadget.com/what-did-twitters-open-source-algorithm-actually-reveal-not-a-lot-194652809.html">key factor</a> in determining what tweets show up in each user’s customized feed. 
Without the weights, it is difficult to verify what biases or censorship, if any, may have led to specific tweets being shown, leading to <a href="https://www.theregister.com/2023/04/13/reveal_all_recommendation_algorithms/">calls</a> to reveal all recommendation algorithms. Can we resolve the tension between privacy and transparency?</p>

<p>Today, we’ll show how Twitter can prove that the Tweets you’re shown are exactly those ranked by a specific, un-altered version of their ML model using our system <a href="https://github.com/ddkang/zkml">zkml</a>! Specifically, we show how to balance privacy and transparency by producing <em>proofs</em> that the Twitter algorithm ran honestly without releasing the model weights. In doing so, Twitter can produce <a href="https://medium.com/p/6f648fd8ba88">trustless audits</a> of their model.</p>

<p>To accomplish this, we use recently developed tools from cryptography called <a href="https://z.cash/technology/zksnarks/">ZK-SNARKs</a>. ZK-SNARKs allow an algorithm runner to produce a proof that some computation happened honestly — anyone can use the proof to check that the algorithm ran correctly without rerunning the algorithm. In our setting, we’ll focus on the key driver of the decision: the ranking ML model. We’ve previously described how to <a href="https://medium.com/@danieldkang/e0e59708c2fc">construct ZK-SNARKs</a> for <a href="https://medium.com/@danieldkang/trustless-verification-of-machine-learning-6f648fd8ba88">real-world vision models</a> using our <a href="https://medium.com/@danieldkang/open-sourcing-zkml-trustless-machine-learning-for-all-f5ee1dbf2499">open-source framework</a> zkml, so in this post we will discuss how to use our framework to support this core piece of the Twitter timeline.</p>

<p>In the rest of the post, we’ll describe how Twitter’s “For You” page is generated, how to use zkml to prove the Twitter ranking model executed honestly, and how to use our framework in the broader context of verifying the “For You” page. If you haven’t seen ZK-SNARKs before, <a href="https://medium.com/@danieldkang/e0e59708c2fc">our explainer</a> for how ZK-SNARKs can be used for ML may be helpful for understanding the rest of the post!</p>

<h1 id="how-twitters-for-you-page-works">How Twitter’s “For You” page works</h1>

<p>The Twitter algorithm operates by collecting data, training a model, and using the model to rank posts to show on your “For You” timeline. At a high level, the algorithm:</p>

<ol>
  <li>Generates features from user interactions with the site, such as likes and retweets.</li>
  <li>Learns what users would engage with via the <em>ranking model</em>.</li>
  <li>Generates a candidate set of tweets to show to users.</li>
  <li>Ranks the candidate set of tweets using the learned ranking model.</li>
</ol>
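<p>Steps 3 and 4 can be sketched in a few lines of Python. The scoring function below is a hypothetical stand-in for the learned heavy ranker, which is actually a neural network over engagement features; only the shape of the pipeline is faithful.</p>

```python
# Toy stand-in for steps 3-4: score each candidate tweet with a learned
# model and rank best-first. The weights are hypothetical; the real heavy
# ranker is a neural network over engagement features.
HYPOTHETICAL_WEIGHTS = [0.5, 1.0, 2.0]  # e.g., likes, retweets, replies

def score(features):
    return sum(w * f for w, f in zip(HYPOTHETICAL_WEIGHTS, features))

def rank_candidates(candidates):
    """candidates: list of (tweet_id, feature_vector) pairs."""
    ordered = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
    return [tweet_id for tweet_id, _ in ordered]
```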

<p>Twitter produced a helpful diagram to show the overall steps:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-04-17-empowering/fig1.webp" alt="Source: https://github.com/twitter/the-algorithm" /></p>

<p>The most important part of the algorithm is the heavy ranker (in the middle), which is the ranking model. Despite the open-source release (including the ranker architecture), users have no guarantees as to what Twitter is actually running! One way for Twitter to show that this algorithm is the one run in production would be to release the weights, training data, and inputs. However, this would violate users’ privacy, so Twitter can’t release the inputs or weights. To allow Twitter to prove that their algorithm ran honestly, we’ll turn to ZK-SNARKs.</p>

<h1 id="using-zkml-to-zk-snark-twitters-ranking-model">Using zkml to ZK-SNARK Twitter’s ranking model</h1>

<p>As we mentioned, ZK-SNARKs can allow Twitter to prove that the model was run honestly. At a high level, Twitter can use ZK-SNARKs to commit to a specific version of their ranking model and then publish a proof that, when the model is applied to a specific user and tweet, it produces the specific final output ranking. The model commitment (a model signature or hash), the proof, and the per-user and per-tweet input features can be shared with users. See our <a href="https://medium.com/@danieldkang/bridging-the-gap-how-zk-snarks-bring-transparency-to-private-ml-models-with-zkml-e0e59708c2fc">explainer post</a> for more details on how zkml integrates with the traditional ML workflow.</p>

<p>Once the proof is made available, there is no need for additional trust for this stage of the algorithm: users can verify on their own that the computations were performed as promised. Furthermore, the specific committed ranking model can be audited by third parties without revealing its weights publicly.</p>
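<p>The commitment itself can be as simple as a hash of the serialized weights, as in this sketch (a production commitment would add a random blinding factor so the hash also hides the weights):</p>

```python
import hashlib

# Sketch of a model commitment: publish one hash of the serialized ranking
# model; every proof is then checked against that same hash, so silently
# swapping in different weights changes the commitment and is detectable.
def commit_to_model(weight_bytes: bytes) -> str:
    return hashlib.sha256(weight_bytes).hexdigest()

def matches_commitment(weight_bytes: bytes, published_commitment: str) -> bool:
    return commit_to_model(weight_bytes) == published_commitment
```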

<p><img src="https://ddkang.github.io/assets/images/blog/2023-04-17-empowering/fig2.webp" alt="Architecture" /></p>

<p>To do so with <a href="https://github.com/ddkang/zkml">zkml</a>, we can simply run the following commands (after building zkml with the instructions in the <a href="https://github.com/ddkang/zkml">README</a>), where model.msgpack contains the model weights and inp.msgpack is the input to the model:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># This constructs the proof, which the Twitter engineer would do.  </span>
<span class="c"># For demonstration purposes, we’ve decided to withhold the model  </span>
<span class="c"># weights for the time being!  </span>
./target/release/test_circuit examples/twitter/model.msgpack examples/twitter/inp1.msgpack  
  
<span class="c"># This verifies the proof with only the open-source model signature,   </span>
<span class="c"># which we would do to check our “For You” timeline.  </span>
<span class="c"># You don’t need the model to do the verification!  </span>
<span class="c"># First, we need to decompress the verification key (see our explainer post for details)  </span>
<span class="nb">tar</span> <span class="nt">-xzvf</span> examples/twitter/vkey.tar.gz <span class="nt">-C</span> examples/twitter  
<span class="c"># You’ll also need to download the parameters to the directory `params_kzg` from here: https://drive.google.com/file/d/1vesRlcIiMkFdoiISYVUO4RQgY-6-6M6V/view?usp=share_link   </span>
<span class="c"># We’ve provided the public values in the repository  </span>
./target/release/verify_circuit examples/twitter/config.msgpack examples/twitter/vkey examples/twitter/proof1 examples/twitter/public_vals1 kzg
</code></pre></div></div>

<p>The actual proofs take a while to generate so we’ve pre-generated them for three examples, which you can verify with the second command. It will take some time to load the data, but the verification itself only takes 8.4ms on a standard laptop!</p>

<p>The final output you’ll see is the set of probabilities for different sub-model scores, which are aggregated into a single final ranking score per tweet. In this example, we’ve converted a trained model and are testing that the circuit evaluates correctly; the pre-generated proofs are available <a href="https://github.com/ddkang/zkml/tree/main/examples/twitter">here</a>.</p>

<p>What’s important about our framework is that the weights can be <em>hidden</em> while anyone can be convinced from the proof that the execution happened honestly!</p>

<h1 id="verify-your-timeline">Verify your timeline</h1>

<p>Now that we can be assured that the ranking algorithm is scoring tweets correctly with zkml, we’ll describe how to use it to verify your timeline. Although the ranking model is the most important component in the algorithm, there are other components that need to be verified as well. We won’t be modeling the other components in this post, but they could be verified by other cryptographic tools, e.g., <a href="https://eprint.iacr.org/2017/1146">vSQL</a> or others.</p>

<p>However, there’s still one issue: we need to be assured that Twitter isn’t censoring certain tweets or manually down-ranking them. To do this, we can “spot check” a given timeline using an interactive protocol.</p>

<p>Let’s say I think my timeline is suspect and that there’s a tweet that I think should rank highly. If Twitter incorporated zkml proofs into their product, I would be able to request proofs of the ranking model execution for tweets in my timeline as well as the suspected censored tweet. Then, I can check to see how the suspected censored tweet ranks compared to the other tweets in my timeline!</p>

<p>If I find that the proof isn’t valid, I’ll have reason to believe that the ranking model (identified by a hash signature in the diagram) Twitter supposedly committed to was altered.</p>
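<p>Assuming every score comes with a verified proof, the spot check itself is a one-line comparison, sketched here with hypothetical scores:</p>

```python
# Toy spot check: given proof-verified ranking scores for the tweets that
# were shown, flag a suspect tweet that outscores some of them yet never
# appeared in the timeline.
def looks_suppressed(shown_scores, suspect_score):
    return suspect_score > min(shown_scores)
```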

<p><img src="https://ddkang.github.io/assets/images/blog/2023-04-17-empowering/fig3.webp" alt="Censorship" /></p>

<h1 id="stay-tuned-for-more">Stay tuned for more!</h1>

<p>In this post, we’ve described how to use zkml to verify your Twitter timeline. Our code is <a href="https://github.com/ddkang/zkml">open-source</a> and we’re excited to see how it can be used to improve transparency in social media without requiring trust in the complex and opaque systems powering them. Nonetheless, there’s still a lot of work to do to make zkml practical and accessible to everyone. We’ll be working on increasing its efficiency in the coming months.</p>

<p>We’ll also be describing other uses of zkml in the coming weeks — stay tuned! And if you’d like to discuss your idea or brainstorm with us, fill out <a href="https://forms.gle/5uxR4woshe4Ug9M36">this form</a> and join our <a href="https://t.me/+_iFeU8FQ4p0zOWZh">Telegram group</a>. Follow me on <a href="https://twitter.com/daniel_d_kang">Twitter</a> for latest updates as well!</p>]]></content><author><name>Daniel Kang, Edward Gan, Ion Stoica, and Yi Sun</name></author><category term="Blog" /><summary type="html"><![CDATA[Last week, Twitter open-sourced their algorithm for selecting and ranking the posts in the “For You” timeline. While this is a major step towards transparency, users cannot verify that the algorithm is being run correctly! One major reason is because the weights for the ML models used to rank tweets have been withheld to protect user privacy. However, these weights are a key factor in determining what tweets show up in each user’s customized feed. Without the weights it is difficult to verify what, if any, biases and censorship may have led to specific tweets being shown, leading to calls to reveal all recommendation algorithms. Can we resolve the tensions between privacy and transparency?]]></summary></entry><entry><title type="html">Bridging the Gap: How ZK-SNARKs Bring Transparency to Private ML Models with zkml</title><link href="https://ddkang.github.io/blog/2023/04/12/bridging/" rel="alternate" type="text/html" title="Bridging the Gap: How ZK-SNARKs Bring Transparency to Private ML Models with zkml" /><published>2023-04-12T00:00:00+00:00</published><updated>2023-04-12T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2023/04/12/bridging</id><content type="html" xml:base="https://ddkang.github.io/blog/2023/04/12/bridging/"><![CDATA[<p>ML is becoming integrated into our lives, ranging from what we see on social media to making medical decisions. However, ML models are increasingly being executed behind closed APIs. 
There are good reasons for this: model weights may not be releasable for privacy reasons if the models are trained on user data (e.g., medical data), and companies want to protect trade secrets. For example, Twitter recently open-sourced their “For You” timeline ranking algorithm but couldn’t release the weights for privacy reasons. OpenAI has also not released the weights for GPT-3 or GPT-4.</p>

<p>As ML becomes integrated with our lives, there’s a growing need for assurances that the ML models have desirable properties and were executed honestly. For ML, we need assurances that the model ran as promised (computational integrity) and that the correct weights were used (while maintaining privacy). For example, for the GPT and Twitter ranking models, we would like to confirm that the model results are consistently unbiased and uncensored. How can we balance the need for privacy (of the user data and model weights) and transparency?</p>

<p>To accomplish this, we can use a cryptographic technique called ZK-SNARKs. ZK-SNARKs have the seemingly magical property of allowing an ML model owner to prove that a model executed honestly without revealing the weights!</p>

<p>In the rest of the post, we’ll describe what ZK-SNARKs are and how to use them to balance the goals of privacy and transparency. We’ll also describe how to use our recently <a href="https://medium.com/@danieldkang/open-sourcing-zkml-trustless-machine-learning-for-all-f5ee1dbf2499">open-sourced</a> framework <a href="https://github.com/ddkang/zkml">zkml</a> to generate ZK-SNARKs of ML models.</p>

<h1 id="zk-snarks-for-ml">ZK-SNARKs for ML</h1>

<p>It’s important to understand one of the key cryptographic building blocks that enable privacy-preserving computations: Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (ZK-SNARK). ZK-SNARKs are a powerful cryptographic primitive that allows one party to prove the validity of a computation without revealing any information about the inputs to the computation itself! ZK-SNARKs also don’t require any interaction beyond the proof and don’t require the verifier of the computation to execute the computation itself.</p>

<p>ZK-SNARKs are also <em>succinct</em>, meaning they are small (typically constant or logarithmic in size)
relative to the computation! Concretely, even for large models, the proofs are typically less than
5kb. This is desirable since many kinds of cryptographic protocols require gigabytes or more of
communication; ZK-SNARKs can save up to <em>six orders of magnitude</em> of communication.</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-04-12-bridging/fig1.webp" />
</p>

<p>Given a set of public inputs (x) and private inputs (w), ZK-SNARKs can prove that a relation F(x,w)
holds between the values without revealing the private inputs. For example, a prover can prove they
know the solution to a sudoku puzzle. Here, the public inputs are the starting squares and the
private inputs are the remaining squares that constitute the solution.</p>
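<p>To make the sudoku example concrete, here is a toy Python sketch (not zkml code, and the function name is hypothetical) of what the relation F(x,w) asserts. The actual SNARK machinery is omitted, and we use a 4x4 mini-sudoku for brevity:</p>

```python
# Toy illustration of the relation F(x, w) a ZK-SNARK would prove for sudoku.
# x = the public starting squares, w = the prover's private full solution.
# Only the relation itself is shown; the proof system is omitted entirely.

def sudoku_relation(public_squares, solution):
    """Return True iff `solution` is valid and matches the public squares."""
    n, box = 4, 2
    # Consistency with the public input x.
    for (r, c), v in public_squares.items():
        if solution[r][c] != v:
            return False
    # Every row, column, and box must contain 1..n exactly once.
    digits = set(range(1, n + 1))
    for i in range(n):
        if set(solution[i]) != digits:                    # row i
            return False
        if {solution[r][i] for r in range(n)} != digits:  # column i
            return False
    for br in range(0, n, box):
        for bc in range(0, n, box):
            cells = {solution[br + dr][bc + dc]
                     for dr in range(box) for dc in range(box)}
            if cells != digits:                           # 2x2 box
                return False
    return True

# The prover knows the private solution w...
solution = [[1, 2, 3, 4],
            [3, 4, 1, 2],
            [2, 1, 4, 3],
            [4, 3, 2, 1]]
# ...and proves it is consistent with the public starting squares x.
public_squares = {(0, 0): 1, (1, 2): 1, (3, 3): 1}
assert sudoku_relation(public_squares, solution)
```

A real proof would convince a verifier that this relation holds without revealing `solution`.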

<p>In the context of ML, ZK-SNARKs can prove that an ML model executed correctly <em>without</em> revealing
the model weights. In this case, the model weights w are the private input, while the model input features
F and output O are part of the public input. In order to identify the model, we also include a
<em>model commitment</em> C in the public input. The model commitment functions like a hash, so that,
with high probability, modified weights yield a different commitment. Thus x = (C,F,O). The relation
we want to prove is that, for some private weights w with commitment C, the model outputs O on
inputs F.</p>
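<p>The commitment idea can be sketched with a plain hash. This only illustrates the binding property: real systems (including zkml’s KZG backend) use cryptographic commitments that also keep the weights hidden, and the function name here is hypothetical:</p>

```python
# Sketch of the model-commitment idea: C acts like a hash of the weights w,
# so modified weights yield a different commitment with high probability.
# A plain hash only shows *binding*; real commitment schemes (e.g., KZG
# polynomial commitments) also add randomness so the weights stay hidden.
import hashlib
import struct

def commit(weights):
    """Commit to a list of float weights by hashing their serialization."""
    data = b"".join(struct.pack("<d", w) for w in weights)
    return hashlib.sha256(data).hexdigest()

w = [0.12, -0.5, 3.25]      # stand-in for model weights
C = commit(w)

# The same weights always give the same commitment...
assert commit(w) == C
# ...and any change to the weights changes the commitment.
assert commit([0.12, -0.5, 3.2500001]) != C
```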

<p>If a verifier is then given the proof π (and x), they can verify that the ML model ran correctly:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-04-12-bridging/fig2.webp" />
</p>

<p>To give a concrete example, Twitter can prove they ran their ranking algorithm honestly to generate
your timeline. A medical ML provider can also provide a proof that a specific regulator-approved
model was executed honestly as well.</p>

<h1 id="using-zkml-for-trustless-execution-of-ml-models">Using zkml for trustless execution of ML models</h1>

<p>We’ve
<a href="https://medium.com/@danieldkang/open-sourcing-zkml-trustless-machine-learning-for-all-f5ee1dbf2499">open-sourced</a>
our library <a href="https://github.com/ddkang/zkml">zkml</a> to construct proofs of ML model execution and
allow anyone to verify these proofs. But first, let’s look at what a standard ML provider would do:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-04-12-bridging/fig3.png" alt="Standard ML pipeline with an ML model consumer and provider" /></p>

<p>As we can see here, the ML consumer provides the inputs but has no assurance that the model executed correctly. With zkml, we can add a single step that provides this guarantee, and the ML consumer learns nothing about the weights:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-04-12-bridging/fig4.webp" alt="Augmenting the standard ML pipeline with zkml" /></p>

<p>To demonstrate how to use zkml to trustlessly execute an ML model, we’ll construct a proof for a model that achieves 99.5% accuracy on <a href="https://en.wikipedia.org/wiki/MNIST_database">MNIST</a>, a standard image recognition dataset. zkml generates the proof along with a proving key, which allows the prover to produce the proof, and a verification key, which allows the verifier to check that the execution happened honestly. To construct the proof, proving key, and verification key, simply execute the following commands:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Installs rust, skip if you already have rust installed  </span>
curl <span class="nt">--proto</span> <span class="s1">'=https'</span> <span class="nt">--tlsv1</span>.2 <span class="nt">-sSf</span> https://sh.rustup.rs | sh  
  
git clone https://github.com/ddkang/zkml.git  
<span class="nb">cd </span>zkml  
rustup override <span class="nb">set </span>nightly  
cargo build <span class="nt">--release</span>  
<span class="nb">mkdir </span>params_kzg  
  
<span class="c"># This should take ~8s to run the first time and ~4s to run the second time  </span>
./target/release/time_circuit examples/mnist/model.msgpack examples/mnist/inp.msgpack kzg
</code></pre></div></div>

<p>This will construct the proof, generating the proving key in the process. Here, model.msgpack contains the model weights and inp.msgpack contains the input to the model (in this case, an image of a handwritten 5). Proof generation also produces the public values x (including the model commitment), which we’ll use in the next step, as well as the verification key described above. You’ll see the following output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;snip&gt;  
final out[0] x: -5312 (-10.375)  
final out[1] x: -8056 (-15.734375)  
final out[2] x: -8186 (-15.98828125)  
final out[3] x: -1669 (-3.259765625)  
final out[4] x: -4260 (-8.3203125)  
final out[5] x: 6614 (12.91796875)  
final out[6] x: -5131 (-10.021484375)  
final out[7] x: -6862 (-13.40234375)  
final out[8] x: -3047 (-5.951171875)  
final out[9] x: -805 (-1.572265625)  
&lt;snip&gt;
</code></pre></div></div>

<p>The outputs you see here are the model’s logits, which can be converted to probabilities. The output at index 5 is the largest, meaning the model correctly classified the handwritten 5.</p>
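<p>As a sanity check, the dequantized logits printed above can be converted to probabilities with a standard softmax (plain Python, nothing zkml-specific):</p>

```python
# Convert the fixed-point logits printed above into probabilities
# with a numerically stable softmax (no dependencies needed).
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The dequantized logits from the output above (classes 0-9).
logits = [-10.375, -15.734375, -15.98828125, -3.259765625, -8.3203125,
          12.91796875, -10.021484375, -13.40234375, -5.951171875, -1.572265625]
probs = softmax(logits)

# The model puts essentially all probability mass on class 5.
assert probs.index(max(probs)) == 5
assert probs[5] > 0.999
```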

<p>Given the proof, a verifier can verify the ML model executed correctly <em>without</em> the model weights. Here, the vkey is the verification key, the proof is π, and the public_vals is the public output:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./target/release/verify_circuit examples/mnist/config.msgpack vkey proof public_vals kzg  
  
Proof verified!
</code></pre></div></div>

<p>This should show you that the proof verified correctly. Notice that the verifier only needs the configuration, verification key, proof, and public values!</p>

<h1 id="stay-tuned">Stay tuned!</h1>

<p>We’ll have more posts upcoming to describe the applications of <a href="https://github.com/ddkang/zkml">zkml</a> in more detail! Check out our <a href="https://github.com/ddkang/zkml">open-source repository</a> and if you’d like to discuss your idea or brainstorm with us, fill out <a href="https://forms.gle/5uxR4woshe4Ug9M36">this form</a>!</p>

<p><em>Thanks to</em> <a href="https://yisun.io/"><em>Yi Sun</em></a> <em>for comments on this blog post.</em></p>]]></content><author><name>Daniel Kang, Edward Gan</name></author><category term="Blog" /><summary type="html"><![CDATA[ML is becoming integrated into our lives, ranging from what we see on social media to making medical decisions. However, ML models are increasingly being executed behind closed APIs. There are good reasons for this: model weights may be unable to be revealed for privacy reasons if they are trained on user data (e.g., medical data) and companies want to protect trade secrets. For example, Twitter recently open-sourced their “For You” timeline ranking algorithm but couldn’t release the weights for privacy reasons. OpenAI has also not released weights for GPT-3 or 4.]]></summary></entry><entry><title type="html">Open-sourcing zkml: Trustless Machine Learning for All</title><link href="https://ddkang.github.io/blog/2023/04/03/open-source/" rel="alternate" type="text/html" title="Open-sourcing zkml: Trustless Machine Learning for All" /><published>2023-04-03T00:00:00+00:00</published><updated>2023-04-03T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2023/04/03/open-source</id><content type="html" xml:base="https://ddkang.github.io/blog/2023/04/03/open-source/"><![CDATA[<p>We’re excited to announce the open-source release of zkml, our framework for producing zero-knowledge proofs of ML model execution. zkml builds on our <a href="https://medium.com/@danieldkang/trustless-verification-of-machine-learning-6f648fd8ba88">earlier paper</a> on scaling zero-knowledge proofs to ImageNet models but contains many improvements for usability, functionality, and scalability. With our improvements, we can verify execution of models that achieve <strong>92.4%</strong> accuracy on ImageNet, a 13% improvement compared to our initial work! zkml can also prove an MNIST model with 99% accuracy <strong>in four seconds</strong>.</p>

<p>In this post, we’ll describe our vision for zkml and how to use zkml. In future posts, we’ll describe several applications of zkml in detail, including trustless audits, decentralized prompt marketplaces, and privacy-preserving face ID. We’ll also describe the technical challenges and details behind zkml. In the meantime, check out our <a href="https://github.com/ddkang/zkml">open-source code</a>!</p>

<h1 id="why-do-we-need-trustless-machine-learning">Why do we need trustless machine learning?</h1>

<p>Over the past few years, we’ve seen two inescapable trends: more of our world moving online and ML/AI methods becoming increasingly powerful. These ML/AI technologies have enabled new forms of art and incredible productivity increases. However, these technologies are increasingly concealed behind closed APIs.</p>

<p>Although these providers want to protect trade secrets, we want assurances about their models: that the training data doesn’t contain copyrighted material and that it isn’t biased. We also want assurances that a specific model was executed in high-stakes scenarios, such as in the medical industry.</p>

<p>In order to do so, the model provider can take two steps: commit to a model trained on a <em>hidden</em> dataset, and provide audits of the hidden dataset after training. In the first step, the model provider releases proofs of training on a given dataset and a commitment to the weights at the end of the process. Importantly, the <em>weights can be kept hidden</em>! By doing so, any third party can be assured that the training happened honestly. Then, the audit can be done using zero-knowledge proofs over the hidden data.</p>

<p>We have been imagining a future where ML models can be executed trustlessly. As we’ll describe in future posts, trustless execution of ML models will enable a range of applications:</p>

<ol>
  <li>Trustless audits of ML-powered applications, such as proving that no copyrighted images were used in a training dataset, as we described above.</li>
  <li>Verification that specific ML models were run by ML-as-a-service providers for regulated industries.</li>
  <li>Decentralized prompt marketplaces for generative AI, where creators can sell access to their prompts.</li>
  <li>Privacy-preserving biometric authentication, such as enabling smart contracts to use face ID.</li>
</ol>

<p>And many more!</p>

<h1 id="zk-snarks-for-trustless-ml">ZK-SNARKs for trustless ML</h1>

<p>In order to trustlessly execute ML models, we can turn to powerful tools in cryptography. We focus on <a href="https://consensys.net/blog/developers/introduction-to-zk-snarks/">ZK-SNARKs</a> (zero-knowledge succinct non-interactive argument of knowledge), which are tools that allow a prover to prove an arbitrary computation was done correctly using a short proof. ZK-SNARKs also have the amazing property that the inputs and intermediate variables (e.g., activations) can be hidden!</p>

<p>In the context of ML, we can use a ZK-SNARK to prove that a model was executed correctly on a given input, <em>while hiding the model weights, inputs, and outputs</em>. We can further choose to selectively reveal any of the weights, inputs, or outputs depending on the application at hand.</p>

<p>With this powerful primitive, we can enable trustless audits and all the other applications we described above!</p>

<h1 id="zkml-a-first-step-towards-trustless-ml">zkml: a first step towards trustless ML</h1>

<p>As a first step towards trustless ML model execution for all, we’ve open-sourced <a href="https://github.com/ddkang/zkml">zkml</a>. To use zkml, consider proving the execution of an MNIST model by producing a ZK-SNARK. Using zkml, we can run the following commands:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Installs rust, skip if you already have rust installed</span>
curl <span class="nt">--proto</span> <span class="s1">'=https'</span> <span class="nt">--tlsv1</span>.2 <span class="nt">-sSf</span> https://sh.rustup.rs | sh

git clone https://github.com/ddkang/zkml.git
<span class="nb">cd </span>zkml
rustup override <span class="nb">set </span>nightly
cargo build <span class="nt">--release</span>
<span class="nb">mkdir </span>params_kzg

<span class="c"># This should take ~8s to run the first time and ~4s to run the second time</span>
./target/release/time_circuit examples/mnist/model.msgpack examples/mnist/inp.msgpack kzg
</code></pre></div></div>

<p>On a regular laptop, proving takes as little as 4 seconds and consumes ~2GB of RAM. We’re also the first framework able to compute ZK-SNARKs at ImageNet scale. As a sneak preview, we can achieve non-trivial accuracy on ImageNet in under 4 minutes of proving and 92.4% accuracy in under 45 minutes:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-04-03-open-source/acc.png" alt="Accuracy on imagenet" /></p>

<p>We’ve increased accuracy by 13%, decreased proving cost by 6x, and decreased verification times by 500x compared to our initial work!</p>

<p>Our primary focus for zkml is high efficiency. Existing approaches are resource-intensive: they take days to prove small models, require many gigabytes of RAM, or produce large proofs. We’ll describe how zkml works under the hood in future posts.</p>

<p>We believe that efficiency is critical <em>because</em> it enables the future where anyone can execute ML trustlessly and we’ll continue pushing towards that goal. Currently, models like GPT-4 and Stable Diffusion are out of reach and we hope to change that soon!</p>

<p>Furthermore, zkml can enable trustless audits and all of the other applications we’ve mentioned! In addition to performance improvements, we’ve also been working on new features, including enabling proofs of training and trustless audits. We’ve also been adding features for models beyond vision models.</p>

<h1 id="and-theres-much-more">And there’s much more…</h1>

<p>In this post, we’ve described our vision of zkml and how to use zkml. <a href="https://github.com/ddkang/zkml">Check it out</a> yourself!</p>

<p>There’s still a lot of work to be done to improve zkml. In the meantime, join our <a href="https://t.me/+_iFeU8FQ4p0zOWZh">Telegram group</a> to discuss your ideas for improving or using zkml. You can check out our code directly on GitHub, but we’d also love to hear your ideas for how to build with zkml. If you’d like to discuss your idea or brainstorm with us, fill out <a href="https://forms.gle/5uxR4woshe4Ug9M36">this form</a>. We’ll be developing actively on <a href="https://github.com/ddkang/zkml">GitHub</a> and we’re happy to accept contributions!</p>

<p>In upcoming posts, we’ll describe how to use zkml to:</p>

<ol>
  <li>Perform trustless audits of ML-powered applications</li>
  <li>Build decentralized prompt marketplaces for generative AI</li>
  <li>Enable privacy-preserving biometric ID</li>
</ol>

<p>We’ll also describe the inner workings of zkml and its optimizations. Stay tuned!</p>

<p><em>Thanks to Ion Stoica, Tatsunori Hashimoto, and Yi Sun for comments on this post.</em></p>]]></content><author><name>Daniel Kang</name></author><category term="Blog" /><summary type="html"><![CDATA[We’re excited to announce the open-source release of zkml, our framework for producing zero-knowledge proofs of ML model execution. zkml builds on our earlier paper on scaling zero-knowledge proofs to ImageNet models but contains many improvements for usability, functionality, and scalability. With our improvements, we can verify execution of models that achieve 92.4% accuracy on ImageNet, a 13% improvement compared to our initial work! zkml can also prove an MNIST model with 99% accuracy in four seconds.]]></summary></entry><entry><title type="html">Attacking ChatGPT with Standard Program Attacks</title><link href="https://ddkang.github.io/blog/2023/02/14/attacks/" rel="alternate" type="text/html" title="Attacking ChatGPT with Standard Program Attacks" /><published>2023-02-14T00:00:00+00:00</published><updated>2023-02-14T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2023/02/14/attacks</id><content type="html" xml:base="https://ddkang.github.io/blog/2023/02/14/attacks/"><![CDATA[<p>Large language models (LLMs) have become increasingly powerful in their capabilities: they can now <a href="https://arxiv.org/abs/2212.14402">pass the Bar</a>, <a href="https://arxiv.org/abs/2212.13138">answer clinical questions</a>, and <a href="https://github.com/features/copilot">write code</a>. Their capabilities are driven by larger models but also new capabilities in <em>instruction following</em>, in which LLMs are trained to follow instructions. Instruction-tuned models outperform standard LLMs on a <a href="https://crfm.stanford.edu/helm/latest/">range of benchmarks</a>. Unfortunately, as these LLMs increase in capabilities, the potential for their dual-use, or misuse, and the economic incentives for dual-use both increase.</p>

<p>In our <a href="http://arxiv.org/abs/2302.05733">recent paper</a>, we show that instruction-following LLMs can generate malicious content <em>effectively</em> and <em>economically</em>, and that dual-use is difficult to prevent. To generate this malicious content, we demonstrate that attacks based on traditional computer security attacks can bypass in-the-wild defenses against misuse. Although our attacks modify or obfuscate prompts, we show that these LLMs can generate natural and convincing <em>personalized</em> malicious content. Our results suggest that the new capabilities of these LLMs increase the economic incentives for adversaries to misuse LLMs and circumvent defenses, given that the cost of operation is estimated to be substantially lower compared to human effort alone.</p>

<p>In addition to our work, there has been a lot of great work by folks including <a href="http://twitter.com/goodside">@goodside</a> and others on prompt injections and other forms of attacks! Our work aims to highlight how these attacks can be connected to classic security settings and economic incentives around attacking LLM providers.</p>

<p>In the rest of the post, we describe our attacks from the lens of computer security and our analysis of the effectiveness and economics of malicious generations. Read our <a href="http://arxiv.org/abs/2302.05733">full paper</a> for extended details!</p>

<h1 id="black-box-llm-setting">Black box LLM setting</h1>

<p>In our work, we study the black box LLM setting, where an API provider serves text generations from an LLM. These API providers are becoming increasingly popular due to the improved capabilities of LLMs and in particular instruction-following LLMs.</p>

<p>As these LLMs have improved in capabilities, so has their potential for misuse. API providers have created mitigations against such misuse, including input filters, output filters, and useless generations. We show two examples below:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/defense1.png" alt="An example of the input and output filter triggering." /></p>
<p align="center">
An example of the input and output filter triggering.
</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/defense2.png" alt="An example of a useless generation" /></p>
<p align="center">
An example of a useless generation.
</p>

<p>Our setting raises two natural questions. Is it possible to produce prompts that bypass defenses that an API provider may put in place? And for malicious actors that are economically motivated, can these attackers use these APIs to produce malicious content <em>economically</em>?</p>

<h1 id="program-attacks-and-their-applicability-to-llms">Program attacks and their applicability to LLMs</h1>

<p>We first studied whether it is possible to bypass the in-the-wild defenses deployed by LLM providers. Our key observation was that instruction-following LLMs behave more like traditional programs than earlier LLMs do. So, to bypass these defenses, we designed three classes of attacks inspired by <em>traditional</em> computer program attacks.</p>

<h2 id="obfuscation">Obfuscation</h2>

<p>In standard program attacks, <a href="https://www.extrahop.com/resources/attacks/malware-obfuscation/#:~:text=Malware%20obfuscation%20is%20the%20act,to%20hide%20its%20presence%20completely.">obfuscation</a> changes the program bytecode to evade detection mechanisms, such as hash-based or fingerprinting detection methods. Similarly, we can obfuscate prompts to evade input filters, for example by using typos or synonyms. OpenAI’s content filters aim to filter misinformation and disinformation about COVID-19. Instead of using the term “COVID-19,” an attacker could use the term “CVID.”</p>
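<p>A minimal sketch of why naive keyword filtering is brittle, using a hypothetical blocklist and the substitution from the example above (this is an illustration only, not OpenAI’s actual filter):</p>

```python
# Toy illustration of prompt obfuscation against a naive keyword filter.
# The blocklist and substitution here are hypothetical stand-ins; real
# provider filters are more sophisticated, but purely keyword-based
# input filtering remains brittle for exactly this reason.

BLOCKLIST = {"covid-19"}  # hypothetical keyword filter

def input_filter_blocks(prompt):
    """Return True if the prompt contains any blocked keyword."""
    return any(term in prompt.lower() for term in BLOCKLIST)

def obfuscate(prompt):
    # The substitution from the post: "COVID-19" -> "CVID".
    return prompt.replace("COVID-19", "CVID")

prompt = "Write an article about COVID-19."
assert input_filter_blocks(prompt)                 # the original is caught...
assert not input_filter_blocks(obfuscate(prompt))  # ...the obfuscated one is not
```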

<h2 id="code-injectionpayload-splitting">Code injection/payload splitting</h2>

<p>Another standard program attack is <a href="https://owasp.org/www-community/attacks/Code_Injection">code injection</a>, in which the malicious code is executed by forcing the program to process invalid data. One standard way of performing code injection is to split the malicious payload and force the program to execute the malicious code by changing the instruction pointer.</p>

<p>Similarly, we can break the prompt into multiple pieces and have the LLM reassemble them. As a concrete example, consider the following prompt:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/injection.png" />
</p>

<p>At the time of conducting our experiments, this prompt bypassed both the input and output filters.</p>
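<p>The splitting idea can be sketched as plain string manipulation: no single fragment contains the flagged payload, so a scan over the prompt text misses it, while reassembly recovers it. The flagged term below is a neutral placeholder:</p>

```python
# Sketch of payload splitting: the payload never appears whole in any one
# fragment, so an input filter scanning the prompt text misses it; the
# model is instructed to reassemble the fragments itself. The flagged
# string here is a placeholder, not real malicious content.
FLAGGED = "forbidden-term"

# Split the payload into fragments assigned to "variables" in the prompt.
fragments = {"a": FLAGGED[:4], "b": FLAGGED[4:9], "c": FLAGGED[9:]}
prompt = "Let a = {a!r}, b = {b!r}, c = {c!r}. Act on a + b + c.".format(**fragments)

# A filter scanning the prompt never sees the whole payload...
assert FLAGGED not in prompt
# ...but concatenating the fragments (as the LLM is told to) recovers it.
assert fragments["a"] + fragments["b"] + fragments["c"] == FLAGGED
```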

<h2 id="virtualization">Virtualization</h2>

<p>One sophisticated attack on traditional computer programs is to embed a virtual machine in the malicious payload and execute the malicious code within the virtual machine. A recent nation-state attack on the iPhone used <a href="https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html">virtualization</a> to target journalists via a zero-day exploit. The attack operates by embedding the payload in image metadata and forcing the execution of the virtual machine.</p>

<p>We can adapt this attack to LLMs by encoding the desired attack within a “virtual machine.” Similar to how a standard VM executes instructions that are emulated by the base operating system/hardware, an LLM can “emulate” instructions via prompts that construct fictitious scenarios. For example, we can use the following series of prompts to bypass filters:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/virtualization.png" />
</p>

<h2 id="feasibility-of-defenses">Feasibility of defenses</h2>

<p>Our attacks are targeted towards OpenAI’s systems. Certain defenses may be able to defend against the specific instantiations of the attacks, such as robust encodings to defend against typos. Nonetheless, our attacks raise important questions regarding the security of model providers in the wild.</p>

<p>For example, both our indirection and virtualization attacks reveal an important property that is apparent in standard programs: since programs are Turing complete, the behavior of sufficiently complex programs can only be predicted by executing the program. A formal version of this statement follows from the undecidability of the halting problem. In LLMs, this suggests that input filtering is limited in its ability to stop attacks.</p>

<h1 id="effectiveness-of-malicious-generations">Effectiveness of malicious generations</h1>

<p>Attacks inspired by traditional security highlight the cat-and-mouse nature of preventing malicious use, which has been the case for standard computers for 75 years. Many traditional computer hackers dedicate substantial resources to finding new attacks due to the economic benefits of doing so. Will malicious actors similarly be motivated to bypass LLM defenses?</p>

<p>We believe the answer is yes, since jailbreaking LLMs allows malicious actors to derive considerable economic benefits. For example, we asked ChatGPT to generate a scam email to con an unsuspecting citizen to purchase fake tickets to a Taylor Swift concert:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/taylor1.png" />
</p>
<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/taylor2.png" />
</p>

<p>As we can see, ChatGPT can generate convincing emails <em>and</em> personalize them to the victim’s response. We also tried four other diverse responses from the citizen and found that ChatGPT successfully personalizes its replies in each case!</p>

<p>To further study the effectiveness of malicious generations, we picked five scams from the US government’s list of common scams and generated emails based on these scams.</p>

<p>In our first experiment, we generated a malicious email, wrote a reply, and responded to the reply. We used five models: ChatGPT, text-davinci-003, text-ada-001, davinci, and GPT2-XL. We measured the convincingness and consistency of the initial email and the response on a five-point Likert scale using human raters:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/convincing.png" />
</p>

<p>As we can see, the large, instruction-following models strongly outperform baselines!</p>

<p>We ran a similar experiment for <em>personalized</em> scams, where we targeted the initial email to a fictitious individual’s specific circumstance and demographic information. For the personalization experiment, we measured the personalization, convincingness, consistency, and fluency on a five-point Likert scale using human raters:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/personal.png" />
</p>

<p>As before, the large, instruction-following models strongly outperformed baselines.</p>

<p>Our results show that large, instruction-following LLMs can produce convincing and personalized scams. As LLMs increase in capabilities, so do the economic incentives to use them for malicious purposes.</p>

<h1 id="instruction-following-llms-behave-like-programs">Instruction-Following LLMs Behave like Programs</h1>

<p>The property of instruction-following LLMs we leverage in our attacks is their <em>program-like behavior</em>. This observation is also key in a class of traditional program attacks called return-oriented programming (ROP). In ROP, an attacker gains control of the call stack and executes instructions already present in memory. These instructions are often isolated into “gadgets” and can be chained together.</p>

<p>Similarly, new LLMs also have several gadgets including:</p>

<ol>
  <li>String concatenation</li>
  <li>Variable assignment</li>
  <li>Sequential composition (i.e., following a sequence of instructions)</li>
  <li>Branching</li>
</ol>

<p>We can illustrate the first three capabilities with the following prompt and generation:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/program1.png" />
</p>

<p>And branching with the following prompt and generation:</p>

<p align="center">
  <img src="https://ddkang.github.io/assets/images/blog/2023-02-14-attacks/program2.png" />
</p>

<p>These capabilities give LLMs the ability to emulate complex computation (memory, assignment, branching, and individual operations, i.e., “opcodes”). We use these capabilities in our attacks.</p>
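<p>To make the program analogy concrete, here is a toy interpreter (hypothetical, not from our paper) for the four gadgets listed above; an instruction-following LLM effectively emulates something like this in prose:</p>

```python
# Toy interpreter for the four "gadgets": variable assignment, string
# concatenation, sequential composition, and branching. An LLM emulates
# these in natural language; writing them as code shows the analogy.

def run(program, env=None):
    env = {} if env is None else env
    for op, *args in program:            # sequential composition
        if op == "set":                  # variable assignment
            env[args[0]] = args[1]
        elif op == "concat":             # string concatenation
            env[args[0]] = "".join(env[name] for name in args[1])
        elif op == "if":                 # branching
            cond, then_prog, else_prog = args
            run(then_prog if env[cond] else else_prog, env)
    return env

env = run([
    ("set", "greeting", "hello "),
    ("set", "subject", "world"),
    ("set", "polite", True),
    ("if", "polite",
        [("concat", "out", ["greeting", "subject"])],
        [("set", "out", "go away")]),
])
assert env["out"] == "hello world"
```

Chaining such gadgets is what lets complex payloads be expressed indirectly, as in the payload-splitting attack above.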

<h1 id="economic-analysis">Economic Analysis</h1>

<p>To better understand the economic feasibility of generating such malicious text, we estimated the cost of generating the text via human effort and via LLM APIs.</p>

<p>We first estimated the cost of human-generated text using the wages of call center operators. In 2007, the lowest hourly wage of a call center employee was <a href="https://ecommons.cornell.edu/bitstream/handle/1813/74325/GCC_Intl_Rept_US_Version.pdf">around $1.24</a>. Adjusting for inflation gives an estimate of around $1.80. If we estimate that it takes 5–15 minutes to generate a personalized scam, we arrive at an estimate of $0.15 to $0.45 per email.</p>

<p>We then estimated the cost of generating text via ChatGPT. Surprisingly, ChatGPT is cheaper than even text-davinci-003, costing only $0.00064 per email! Using text-davinci-003 would cost $0.0064.</p>

<p>Although there is uncertainty in our estimates, ChatGPT is likely substantially cheaper than human labor. This holds even though our estimates exclude other expenses, including facilities, worker retraining, and management overhead! As hardware and software optimizations continue to advance, the costs of LLMs will likely drop further.</p>
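<p>The back-of-the-envelope comparison above can be reproduced in a few lines (the dollar figures are the estimates from this post):</p>

```python
# Reproducing the cost comparison from the post.
INFLATION_ADJUSTED_WAGE = 1.80   # $/hour: the 2007 wage of $1.24, adjusted
minutes_per_email = (5, 15)      # estimated time per personalized scam

human_cost = tuple(round(INFLATION_ADJUSTED_WAGE * m / 60, 2)
                   for m in minutes_per_email)
assert human_cost == (0.15, 0.45)   # $0.15-$0.45 per email

chatgpt_cost = 0.00064   # $ per email, from the post
davinci_cost = 0.0064    # $ per email via text-davinci-003

# ChatGPT is hundreds of times cheaper than even the low-end human estimate.
assert human_cost[0] / chatgpt_cost > 200
```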

<h1 id="conclusions">Conclusions</h1>

<p>As we have shown, improvements in LLMs allow for convincing and economically feasible generations of malicious content (scams, spam, hate speech, etc.) <em>without</em> any additional training. This dual use is difficult to prevent as instruction-following LLMs become closer to standard programs: attack mechanisms against standard programs begin to apply to LLMs. We hope that our work spurs further work on viewing LLMs through the lens of traditional computer security, both for attacks and defenses.</p>

<p>See <a href="http://arxiv.org/abs/2302.05733">our paper</a> for full details!</p>

<p><em>Note: We updated our blog post with OpenAI’s new API costs.</em></p>]]></content><author><name>Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto</name></author><category term="Blog" /><summary type="html"><![CDATA[Large language models (LLMs) have become increasingly powerful in their capabilities: they can now pass the Bar, answer clinical questions, and write code. Their capabilities are driven by larger models but also new capabilities in instruction following, in which LLMs are trained to follow instructions. Instruction-tuned models outperform standard LLMs on a range of benchmarks. Unfortunately, as these LLMs increase in capabilities, the potential for their dual-use, or misuse, and the economic incentives for dual-use both increase.]]></summary></entry><entry><title type="html">zk-img: Fighting Deepfakes with Zero-Knowledge Proofs</title><link href="https://ddkang.github.io/blog/2022/11/13/zk/" rel="alternate" type="text/html" title="zk-img: Fighting Deepfakes with Zero-Knowledge Proofs" /><published>2022-11-13T00:00:00+00:00</published><updated>2022-11-13T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2022/11/13/zk</id><content type="html" xml:base="https://ddkang.github.io/blog/2022/11/13/zk/"><![CDATA[<p>Over the past few years, the use of <a href="https://www.businessinsider.com/guides/tech/what-is-deepfake">deepfakes</a> for malicious purposes has exploded. Deepfakes have been used to <a href="https://www.boston25news.com/news/local/better-business-bureau-warns-about-deepfake-elon-musk-investment-scam/S35FGICEKJADDC4SUHE7FEFJM4/">steal funds from unsuspecting investors</a> and even to <a href="https://www.bbc.com/news/technology-60780142">convince soldiers to surrender</a>. The rise of deepfakes has partially been driven by the rise of generative artificial intelligence (AI) methods, like <a href="https://stability.ai/blog/stable-diffusion-public-release">Stable Diffusion</a>. 
As these AI methods become more powerful, an important question emerges: how can we trust the validity of the images we see?</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2022-11-13-zk/deepfake.jpg" alt="Example of a deep fake" /></p>
<p align="center">
<a href="https://spectrum.ieee.org/facebook-ai-launches-its-deepfake-detection-challenge">Example</a> of a deepfake.
</p>

<p>There have been some steps towards verifying that an image was taken by a real camera. <a href="https://www.sony.eu/presscentre/news/sony-unlocks-in-camera-forgery-proof-technology">Attested cameras</a> can prove that an image was taken by a particular sensor by using digital signatures. However, the majority of photos released on the internet are edited to remove sensitive information and to improve legibility of the images. These edits are not attested by the camera and must be verified some other way. Recent work has proposed verifying these image edits using zero-knowledge proofs, in particular <a href="https://z.cash/technology/zksnarks/">ZK-SNARKs</a> (zero-knowledge succinct non-interactive argument of knowledge). ZK-SNARKs can prove that the edits on the image were legitimate.</p>

<p>Unfortunately, <a href="https://ieeexplore.ieee.org/document/7546506">this</a> <a href="https://eprint.iacr.org/2020/1579">work</a> <a href="https://medium.com/@boneh/using-zk-proofs-to-fight-disinformation-17e7d57fe52f">on</a> ZK-SNARKs for images has several drawbacks. First, <em>they require revealing the original or intermediate images.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></em> This breaks the privacy of the original image and can potentially leak sensitive information! Second, they are impractical, operating on images smaller than 128x128.<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> Third, they require custom cryptographic arguments<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> or trusted third parties.<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>

<p>To address these issues, we have built <a href="http://arxiv.org/abs/2211.04775">zk-img</a>, which allows image edits to be verified with <em>no assumptions on trust, while keeping the original and intermediate images private</em>. zk-img is also efficient: it is the first library that can privately and securely verify arbitrary transformations on <strong>HD images</strong>.</p>

<p>In this blog post, we’ll briefly describe the internals and illustrate the usage of zk-img. For further details, please read our <a href="http://arxiv.org/abs/2211.04775">paper</a>.</p>

<h1 id="using-zk-img">Using zk-img</h1>

<p>Before we describe zk-img, we’ll first describe why zk-img is needed.</p>

<h2 id="hiding-the-original-image">Hiding the original image</h2>
<p>In most cases, the original image must be hidden. The original image may contain sensitive or embarrassing information that should be edited out before the image is released. For example, the piglet below is carrying sensitive information in its mouth:</p>

<p align="center">
<img src="https://ddkang.github.io/assets/images/blog/2022-11-13-zk/piglet.png" />
<br />
How can we keep the card hidden? (image from <a href="https://thesmartlocal.jp/pig-balancing-on-ball-japan/">here</a>.)
</p>

<p>How can we keep the original image hidden while simultaneously allowing an image consumer to verify that the transformations were done honestly?</p>

<p>zk-img accomplishes this by computing the hash of the original image, and the transformations themselves, inside the ZK-SNARK. It reveals only the original image’s hash and the output image. Since ZK-SNARKs allow a verifier to be convinced that a computation was done correctly, the image consumer can verify that both the hash and the transformations were computed correctly. We illustrate the process below:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2022-11-13-zk/system.png" alt="" /></p>

<p>Here, H1 and H2 are the hashes of the original image and intermediate image (after cropping and resizing). Since only the hash of the intermediate image is released, privacy is preserved.</p>
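<p>To make the commitment concrete, here is a minimal Python sketch of the statement such a proof attests to, using a toy image (a list of pixel rows), a crop transformation, and SHA-256 as a stand-in for the ZK-friendly hash that zk-img computes in-circuit. The function names and pixel data are illustrative, not zk-img’s API:</p>

```python
import hashlib

def crop(img, x0, y0, w, h):
    """Toy image as a list of pixel rows; crop a w-by-h window at (x0, y0)."""
    return [row[x0:x0 + w] for row in img[y0:y0 + h]]

def commit(img):
    """Hash commitment to an image (zk-img uses a ZK-friendly hash in-circuit)."""
    return hashlib.sha256(bytes(p for row in img for p in row)).hexdigest()

# Prover side: knows the original image.
original = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]
h1 = commit(original)                # revealed: hash of the original
output = crop(original, 1, 1, 2, 2)  # revealed: the edited image

# The statement the ZK-SNARK proves WITHOUT revealing `original`:
# "I know an image whose hash is h1, and cropping it yields `output`."
def statement(secret_img, h1, output, params):
    return commit(secret_img) == h1 and crop(secret_img, *params) == output

assert statement(original, h1, output, (1, 1, 2, 2))
```

<p>The real circuit enforces exactly this conjunction, so the verifier learns nothing about the original image beyond its hash.</p>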

<h2 id="hiding-the-original-and-output-image">Hiding the original and output image</h2>
<p>In some cases, both the original and output image must be hidden. For example, the image producer might want to attest that they took an image before some point in time and release the hashes before releasing the output image. A biometric identification system may wish to hide the original and edited image, but still be able to prove that the images came from an attested source.</p>

<p>zk-img accomplishes this by computing the hashes of the original and transformed images. It only reveals the hashes, and not the images.</p>

<h1 id="how-zk-img-works">How zk-img works</h1>
<p>Now that we’ve described how to use zk-img in a variety of scenarios, we’ll describe how zk-img works internally.</p>

<p>From an application developer perspective, zk-img takes as input an image from an attested camera, a sequence of image transformations, and whether or not the final image should be revealed. Given these inputs, zk-img produces a ZK-SNARK of the transformation.</p>

<p>We’ve built zk-img leveraging the recent developments in proving systems, which have dramatically improved in efficiency and usability. In particular, we built zk-img using the <a href="https://github.com/zcash/halo2">halo2</a> library. Internally, zk-img has a standardized interface for image transformations so they can be composed. This interface allows the addition of image transformations beyond what is currently implemented in zk-img.</p>

<p>The standardized interface for zk-img is critical as it allows a registry of valid transformations that users can trust. We have currently implemented the following transformations:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2022-11-13-zk/transforms.png" alt="" /></p>

<p>Developers can add other implementations to zk-img’s registry!</p>
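<p>To give a feel for the registry-plus-interface design, here is a hypothetical Python sketch: each transformation registers itself under a name, and a pipeline chains registered transformations. The actual zk-img interface is a halo2 circuit interface, and the names below (<code>Transform</code>, <code>REGISTRY</code>, and the two toy transformations) are invented for illustration:</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[int]]

@dataclass
class Transform:
    """A named, composable image transformation (illustrative only)."""
    name: str
    apply: Callable[[Image], Image]

# A registry of "valid" transformations that consumers can recognize.
REGISTRY: Dict[str, Transform] = {}

def register(t: Transform) -> None:
    REGISTRY[t.name] = t

register(Transform("flip_h", lambda img: [row[::-1] for row in img]))
register(Transform("brighten", lambda img: [[min(p + 10, 255) for p in row] for row in img]))

def run_pipeline(img: Image, names: List[str]) -> Image:
    """Apply registered transformations in sequence (each step would be proven)."""
    for n in names:
        img = REGISTRY[n].apply(img)
    return img

print(run_pipeline([[0, 250]], ["flip_h", "brighten"]))  # [[255, 10]]
```

<p>Because every pipeline is built only from registered transformations, a consumer verifying the proof also knows exactly which edits were permitted.</p>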

<p>In order to implement these transformations efficiently, we leverage features from the Plonkish arithmetization, which halo2 is based on. See our <a href="http://arxiv.org/abs/2211.04775">paper</a> for details.</p>

<h1 id="benchmarks">Benchmarks</h1>

<p>We benchmarked zk-img on image transformations over HD images, which are over 50x larger than the images considered in prior work in this space. We first benchmarked zk-img on input-privacy-preserving transformations:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2022-11-13-zk/bench1.png" alt="" /></p>

<p>Although the proving times are high, the verification times are as low as 4.59 ms: low enough to verify proofs on-device.</p>

<p>The costs increase when keeping the output image hidden as well:</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2022-11-13-zk/bench2.png" alt="" /></p>

<p>However, the verification times still remain small, taking at most 9.32 ms.</p>

<p>In our <a href="http://arxiv.org/abs/2211.04775">paper</a>, we show that zk-img is orders of magnitude faster than prior work, up to <strong>112x</strong> faster for proving and <strong>94x</strong> faster for verifying.</p>

<h1 id="conclusion">Conclusion</h1>
<p>The world is becoming increasingly connected, and information spreads faster than ever. Just this year, images from the front lines of the Ukraine conflict spread rapidly over social media, with malicious images mixed in among the real information.</p>

<p>Imagine if every image on the internet was attested. Social media platforms could indicate which images were attested and which ones were not. Furthermore, anyone could see and verify <em>on their own computer</em> if an image came from a real camera!</p>

<p>As a first step towards this vision, we’ve built zk-img. zk-img can securely and privately attest to image transformations while keeping the original image hidden. By optimizing these transformations, zk-img can be orders of magnitude faster than prior work. For more details, see our <a href="http://arxiv.org/abs/2211.04775">paper</a> and look out for our code release soon!</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>https://ieeexplore.ieee.org/document/7546506 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>https://medium.com/@boneh/using-zk-proofs-to-fight-disinformation-17e7d57fe52f <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>https://eprint.iacr.org/2020/1579 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun</name></author><category term="Blog" /><summary type="html"><![CDATA[Over the past few years, the use of deepfakes for malicious purposes has exploded. Deepfakes have been used to steal funds from unsuspecting investors and even to convince soldiers to surrender. The rise of deepfakes has partially been driven by the rise of generative artificial intelligence (AI) methods, like Stable Diffusion. As these AI methods become more powerful, an important question emerges: how can we trust the validity of the images we see?]]></summary></entry><entry><title type="html">Trustless Verification of Machine Learning</title><link href="https://ddkang.github.io/blog/2022/10/18/trustless/" rel="alternate" type="text/html" title="Trustless Verification of Machine Learning" /><published>2022-10-18T00:00:00+00:00</published><updated>2022-10-18T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2022/10/18/trustless</id><content type="html" xml:base="https://ddkang.github.io/blog/2022/10/18/trustless/"><![CDATA[<p>Machine learning (ML) deployments are becoming increasingly complex as ML increases in its scope and accuracy. Many organizations are now turning to “ML-as-a-service” (MLaaS) providers (e.g., Amazon, Google, Microsoft, etc.) to execute complex, proprietary ML models. As these services proliferate, they become increasingly difficult to understand and audit. Thus, a critical question emerges: how can consumers of these services trust that the service has correctly served the predictions?</p>

<p><img src="https://ddkang.github.io/assets/images/blog/2022-10-18-trustless/fig1.png" alt="How can the model consumer verify the inference?" /></p>

<p>To address these concerns, we have developed the first system to trustlessly verify ML model predictions for production-level models. In order to do so, we use a cryptographic technology called <a href="https://z.cash/technology/zksnarks/">ZK-SNARKs</a> (zero-knowledge succinct non-interactive argument of knowledge), which allow a <em>prover</em> to prove the result of a computation without revealing any information about the inputs or intermediate steps of the computation. ZK-SNARKs allow an MLaaS provider to prove that the model was executed correctly <em>post-hoc</em>, so model consumers can verify predictions as they wish. Unfortunately, existing work on ZK-SNARKs can require up to two days of computation to verify a single ML model prediction.</p>

<p>In order to address this computational overhead, we have created the first ZK-SNARK circuit of a model on ImageNet (MobileNet v2) achieving <strong>79% accuracy</strong> while being verifiable in <strong>10 seconds</strong> on commodity hardware. We further construct protocols to use these ZK-SNARKs to verify ML model accuracy, ML model predictions, and trustlessly retrieve documents in cost-efficient ways.</p>

<p>In this blog post, we’ll describe our protocols and how we constructed the ZK-SNARK circuit in more detail. Further details are also in our <a href="https://arxiv.org/abs/2210.08674">preprint</a>.</p>

<h1 id="using-zk-snarks-for-trustless-applications">Using ZK-SNARKs for trustless applications</h1>

<p>Building on our efficient ZK-SNARKs, we show that it’s possible to use them for a variety of applications. We show how to use ZK-SNARKs to verify ML model accuracy, and we show that ZK-SNARKs of ML models can be used to trustlessly retrieve images (or documents) matching an ML model classifier. Importantly, these protocols can be verified by third parties, so they can be used to resolve disputes.</p>

<p>First, consider the setting where a model provider (MP) has a model they wish to serve to a model consumer (MC). The MC wants to verify the model’s accuracy to ensure that the MP is not malicious, lazy, or erroneous (i.e., has bugs in the serving code). To verify model accuracy, the MP first commits to a model by hashing its weights. The MC then sends a test set to the MP, on which the MP provides outputs and ZK-SNARK proofs of correct execution. By verifying the ZK-SNARKs on the test set, the MC can be confident that the MP executed the model correctly. After the model accuracy is verified, the MC can purchase the model or use the MP as an MLaaS provider. To ensure that both parties are honest, we design a set of economic incentives, with details in our <a href="https://arxiv.org/abs/2210.08674">preprint</a>.</p>
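<p>The protocol flow can be sketched in a few lines of Python. Everything here is a toy stand-in: the “model” is a threshold classifier, the commitment is a SHA-256 hash of the weights, and the per-prediction ZK-SNARK proofs are elided as comments, since generating them requires the circuit machinery described in the next section:</p>

```python
import hashlib, json

def commit(weights):
    """Step 1: the MP commits to the model by hashing its weights."""
    return hashlib.sha256(json.dumps(weights, sort_keys=True).encode()).hexdigest()

def model(weights, x):
    """A toy stand-in for the MP's model: a threshold classifier."""
    return int(x * weights["scale"] > weights["bias"])

weights = {"scale": 2, "bias": 3}
commitment = commit(weights)       # published before the MC sends anything

test_set = [0, 1, 2, 3]            # Step 2: the MC sends a test set
outputs = [model(weights, x) for x in test_set]
# Step 3: the MP attaches a ZK-SNARK proof per output, attesting that
# "the model whose weights hash to `commitment` maps x to this output".
# Step 4: the MC verifies each proof, then computes accuracy itself.
labels = [0, 0, 1, 1]
accuracy = sum(o == l for o, l in zip(outputs, labels)) / len(labels)
```

<p>Because the commitment is published first, the MP cannot swap in a different model per query without the proofs failing to verify.</p>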

<p>We instantiated our protocol with our ZK-SNARKs. Verifying the accuracy of an ML model to within 5% costs $99.93. For context, collecting an expert-annotated dataset can cost as much as $85,000, so our protocol adds as little as 0.1% overhead in this scenario.</p>

<p>Second, consider the setting where a judge has ordered a legal subpoena. This may occur when a plaintiff requests documents for legal discovery or when a journalist requests documents under the Freedom of Information Act (FOIA). When the judge approves the subpoena, the responder must divulge the documents or images matching the request, which can be specified by an ML model. To keep the remaining documents private, the protocol proceeds as follows: the responder commits to the dataset by producing hashes of the documents, the requester sends the model to the responder, and the responder produces ZK-SNARK proofs of valid inference on the documents. The responder then sends only the documents that match the ML model classifier.</p>
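<p>A stripped-down sketch of the retrieval protocol, with byte strings standing in for documents and a substring check standing in for the ML classifier; in the real protocol, every yes/no decision comes with a ZK-SNARK proof of inference against the committed hash, so the responder cannot withhold a match:</p>

```python
import hashlib

# Responder's dataset and public commitment (one hash per document).
docs = [b"invoice 2021", b"cat photo", b"invoice 2022"]
commitments = [hashlib.sha256(d).hexdigest() for d in docs]

def classifier(doc: bytes) -> bool:
    """Toy stand-in for the requester's ML model classifier."""
    return b"invoice" in doc

# The responder reveals only the matching documents; for each committed
# hash it would attach a proof that the (hidden) preimage was classified
# honestly against the requester's model.
revealed = [d for d in docs if classifier(d)]
```
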

<h1 id="constructing-the-first-imagenet-scale-zk-snark">Constructing the first ImageNet-scale ZK-SNARK</h1>

<p>As we mentioned, ZK-SNARKs allow a prover to prove the result of a computation without revealing any information about the inputs or intermediate steps of the computation. ZK-SNARKs have several non-intuitive properties that make them amenable for verifying ML models:</p>
<ol>
  <li>They have succinct proofs, which can be as small as 100 bytes (for non-ML applications) and as small as 6 kB for the ML models we consider.</li>
  <li>They are non-interactive, so the proof can be verified by anyone at any point in time.</li>
  <li>A prover cannot generate invalid proofs (knowledge soundness) and correct proofs will verify (completeness).</li>
  <li>They are zero-knowledge: the proof doesn’t reveal anything about the inputs (the model weights or model inputs in our setting) beyond the information already contained in the outputs.</li>
</ol>

<p>Constructing most ZK-SNARKs involves two steps: arithmetization (turning the computation into an arithmetic circuit, i.e., a system of polynomial equations over a large prime field) and using a cryptographic proof system to generate the ZK-SNARK proof.</p>

<p>To construct ZK-SNARKs for ML models at ImageNet scale, we use recent developments in proving systems, which have dramatically improved in efficiency and usability in the past few years. We specifically use the <a href="https://github.com/zcash/halo2">halo2</a> library. Prior work on DNN ZK-SNARKs uses the Groth16 proving system or sum-check-based systems specific to neural networks. Groth16 is less amenable to DNN inference because non-linearities cannot easily be represented with quadratic constraints. Furthermore, neural-network-specific proving systems do not benefit from the broader ZK proving ecosystem, and they currently perform worse than our approach of leveraging advances in that ecosystem.</p>

<p>Scaling out ZK-SNARKs in halo2 still requires several advances in efficient arithmetization. We design new methods of arithmetizing quantized DNNs, performing non-linearities efficiently via lookup arguments, and efficiently packing the circuits. Please see our <a href="https://arxiv.org/abs/2210.08674">preprint</a> for more details!</p>
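<p>To give intuition for the lookup-argument trick: instead of expressing a non-linearity like ReLU as polynomial constraints, the circuit checks that each (input, output) pair appears in a precomputed table. A Python sketch over int8-quantized values (the dict is a stand-in; in-circuit this is a table-membership argument, not a computation):</p>

```python
# In a Plonkish circuit, proving y = relu(x) over quantized values becomes
# proving that the pair (x, y) appears as a row of a precomputed table,
# which a lookup argument makes cheap. A dict stands in for that table here.
relu_table = {x: max(x, 0) for x in range(-128, 128)}  # int8 input domain

def relu_via_lookup(x: int) -> int:
    # In-circuit this is a table-membership check, not a computation.
    return relu_table[x]
```
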

<p>We constructed ZK-SNARKs of MobileNet v2 on ImageNet. By varying the complexity of the models, we can trade off accuracy against verification time. We constructed ZK-SNARK proofs for inference for a variety of models and show the empirical tradeoff of accuracy and verification time in the plot below:
<img src="https://ddkang.github.io/assets/images/blog/2022-10-18-trustless/acc_time.png" alt="Accuracy vs verification time for our ZK-SNARK proofs of ImageNet-scale models" /></p>

<p>As we can see, our models can achieve an accuracy of 79% while taking only 10s to verify on commodity hardware!</p>

<h1 id="conclusion">Conclusion</h1>

<p>In our recent paper, we’ve scaled ZK-SNARKs on ML models that achieve high accuracy on ImageNet. Our ZK-SNARKs constructions can achieve 79% accuracy while being verifiable within 10s. We’ve also described protocols for using these ZK-SNARKs to verify ML model accuracy and trustlessly retrieve documents. Please see our <a href="https://arxiv.org/abs/2210.08674">preprint</a> for more information. Also be on the lookout for our open-source release!</p>]]></content><author><name>Daniel Kang, Tatsunori Hashimoto, Ion Stoica, Yi Sun</name></author><category term="Blog" /><summary type="html"><![CDATA[Machine learning (ML) deployments are becoming increasingly complex as ML increases in its scope and accuracy. Many organizations are now turning to “ML-as-a-service” (MLaaS) providers (e.g., Amazon, Google, Microsoft, etc.) to execute complex, proprietary ML models. As these services proliferate, they become increasingly difficult to understand and audit. Thus, a critical question emerges: how can consumers of these services trust that the service has correctly served the predictions?]]></summary></entry><entry><title type="html">Installing CUDA 10.2, CuDNN 7.6.5, TensorRT 7.0 on AWS, Ubuntu 18.04</title><link href="https://ddkang.github.io/blog/2020/01/02/installing-tensorrt/" rel="alternate" type="text/html" title="Installing CUDA 10.2, CuDNN 7.6.5, TensorRT 7.0 on AWS, Ubuntu 18.04" /><published>2020-01-02T00:00:00+00:00</published><updated>2020-01-02T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2020/01/02/installing-tensorrt</id><content type="html" xml:base="https://ddkang.github.io/blog/2020/01/02/installing-tensorrt/"><![CDATA[<p>How to install CUDA 10.2, CuDNN 7.6.5, TensorRT 7.0 in the AWS T4 instance.</p>

<h3 id="step-0-aws-setup-1-minute">Step 0: AWS setup (~1 minute)</h3>

<p>Create a <code class="language-plaintext highlighter-rouge">g4dn.xlarge</code> AWS instance. Attach at least 30 GB of HDD space with Ubuntu 18.04.</p>

<h3 id="step-1-installing-cuda-55-minutes">Step 1: Installing CUDA (~5.5 minutes)</h3>

<p>You can also install CUDA directly from the offline installer, but this is a little easier.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt update
sudo apt upgrade -y

mkdir install ; cd install
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda-10-2
</code></pre></div></div>

<h3 id="step-2-installing-cudnn-2-minutes">Step 2: Installing CuDNN (~2 minutes)</h3>

<ol>
  <li>Download CuDNN <a href="https://developer.nvidia.com/rdp/cudnn-download">here</a> (BOTH the runtime and dev, deb). Use version 7.6.5.</li>
  <li>scp the deb files to AWS.</li>
  <li>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="step-3-installing-tensorrt-2-minutes">Step 3: Installing TensorRT (~2 minutes)</h3>

<ol>
  <li>Download TensorRT <a href="https://developer.nvidia.com/nvidia-tensorrt-7x-download">here</a>. Use version
7.0.</li>
  <li>scp the deb files to AWS.</li>
  <li>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.2-trt7.0.0.11-ga-20191216_1-1_amd64.deb
sudo apt update
sudo apt install tensorrt libnvinfer7
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="step-35-add-to-bashrc">Step 3.5: Add to .bashrc</h3>

<p>I don’t actually know if this step is required, but it might be helpful for other things.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib64:$DYLD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH
export C_INCLUDE_PATH=$CUDA_HOME/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$CUDA_HOME/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export LD_RUN_PATH=$CUDA_HOME/lib64:$LD_RUN_PATH
</code></pre></div></div>]]></content><author><name>Daniel Kang</name></author><category term="Blog" /><summary type="html"><![CDATA[How to install CUDA 10.2, CuDNN 7.6.5, TensorRT 7.0 in the AWS T4 instance.]]></summary></entry><entry><title type="html">Installing CUDA 10.1, CuDNN 7.6.3, TensorRT 5.0.1 on AWS, Ubuntu 18.04</title><link href="https://ddkang.github.io/blog/2019/09/19/installing-tensorrt/" rel="alternate" type="text/html" title="Installing CUDA 10.1, CuDNN 7.6.3, TensorRT 5.0.1 on AWS, Ubuntu 18.04" /><published>2019-09-19T00:00:00+00:00</published><updated>2019-09-19T00:00:00+00:00</updated><id>https://ddkang.github.io/blog/2019/09/19/installing-tensorrt</id><content type="html" xml:base="https://ddkang.github.io/blog/2019/09/19/installing-tensorrt/"><![CDATA[<p>How to install CUDA 10.1, CuDNN 7.6.3, TensorRT 6.0.1 on an AWS instance, Ubuntu 18.04.</p>


<h3 id="step-0-gcp-setup-1-minute">Step 0: AWS setup (~1 minute)</h3>

<p>Create a GPU AWS instance. Attach at least 30 GB of HDD space with Ubuntu 18.04.</p>

<h3 id="step-1-installing-cuda-55-minutes">Step 1: Installing CUDA (~5.5 minutes)</h3>

<p>You can also install CUDA directly from the offline installer, but this is a little easier.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt update
sudo apt upgrade -y

mkdir install ; cd install
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda-10-1
</code></pre></div></div>

<h3 id="step-2-installing-cudnn-2-minutes">Step 2: Installing CuDNN (~2 minutes)</h3>

<ol>
  <li>Download CuDNN <a href="https://developer.nvidia.com/rdp/cudnn-download">here</a> (BOTH the runtime and dev, deb). Use version 7.6.3.</li>
  <li>scp the deb files to AWS.</li>
  <li>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo dpkg -i libcudnn7_7.6.3.30-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.3.30-1+cuda10.1_amd64.deb
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="step-3-installing-tensorrt-2-minutes">Step 3: Installing TensorRT (~2 minutes)</h3>

<ol>
  <li>Download TensorRT <a href="https://developer.nvidia.com/nvidia-tensorrt-6x-download">here</a>. Use version
6.0.1.</li>
  <li>scp the deb files to AWS.</li>
  <li>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.1-trt6.0.1.5-ga-20190913_1-1_amd64.deb
sudo apt update
sudo apt install tensorrt libnvinfer6
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="step-35-add-to-bashrc">Step 3.5: Add to .bashrc</h3>

<p>I don’t actually know if this step is required, but it might be helpful for other things.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib64:$DYLD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH
export C_INCLUDE_PATH=$CUDA_HOME/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$CUDA_HOME/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export LD_RUN_PATH=$CUDA_HOME/lib64:$LD_RUN_PATH
</code></pre></div></div>]]></content><author><name>Daniel Kang</name></author><category term="Blog" /><summary type="html"><![CDATA[How to install CUDA 10.1, CuDNN 7.6.3, TensorRT 6.0.1 on an AWS instance, Ubuntu 18.04.]]></summary></entry></feed>