zkPass Proof of Accredited Investorship

<img src="https://raw.githubusercontent.com/l2iterative/zkpass-accredited-investors/main/title.png" align="right" style="margin: 20px;" alt="a card game in which players build a modern city" width="300"/>

We are partnering with zkPass to create an interactive proof of IRS-reported taxable income, retrieved from the IRS website (https://www.irs.gov), which is then used to establish accredited-investor status under the most commonly used financial criterion for individuals:

Income over $200,000 (individually) or $300,000 (with a spouse or partner) in each of the prior two years, with a reasonable expectation of the same for the current year

This can be done with privacy and integrity by having the user interact with the data requestor, as illustrated in the following diagram.

```mermaid
flowchart LR
  A[Login to IRS] --> B{zkPass<br/>3P-TLS<br/>protocol}
  B --> C[Retrieve source-authenticated<br/>account transcripts]
  C --> D{zkPass<br/>data processing<br/>protocol}
  D --> E[Proof of the income data in<br/>the account transcripts]
```
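Once the income figures are available from the account transcripts, the criterion above reduces to a small predicate. The following is a minimal sketch, not code from this repository: `meets_income_test` and its parameters are hypothetical, amounts are whole US dollars, one threshold (individual or joint) is applied to both prior years, and the current-year expectation is modeled as a plain attestation flag because it cannot be read from past transcripts.

```rust
// Hypothetical thresholds taken from the income criterion, in whole USD.
const INDIVIDUAL_THRESHOLD: u64 = 200_000;
const JOINT_THRESHOLD: u64 = 300_000;

/// Returns true if income strictly exceeds the applicable threshold in each
/// of the two prior years AND the user attests to expecting the same for the
/// current year. `joint` selects the spouse/partner threshold for both years.
fn meets_income_test(prior_two_years: [u64; 2], joint: bool, expects_same_this_year: bool) -> bool {
    let threshold = if joint { JOINT_THRESHOLD } else { INDIVIDUAL_THRESHOLD };
    prior_two_years.iter().all(|&income| income > threshold) && expects_same_this_year
}

fn main() {
    println!("{}", meets_income_test([250_000, 210_000], false, true)); // prints "true"
    println!("{}", meets_income_test([250_000, 190_000], false, true)); // prints "false"
    println!("{}", meets_income_test([320_000, 310_000], true, true)); // prints "true"
}
```

In a privacy-preserving deployment one would expect this comparison to run inside the proof, so that only its outcome is revealed; the sketch shows just the arithmetic.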

Background on zkPass

zkPass is a full-stack solution for data ownership. It consists of various tools and applications that enable verifiable data sharing, with privacy and integrity guarantees.

Its testnet already supports a long list of data feeds, including internet companies, traditional industries, and government agencies.

This repository aims to add the IRS to that list, but the techniques presented here (PDF proofs) generalize to many other settings.

Further development of our PDF proofs would enable a large class of such applications.

For the IRS, we rely on the zkPass 3P-TLS protocol to prove that the account transcripts were retrieved over a TLS connection with the IRS website, i.e., that they are source-authenticated.

This family of protocols has been studied for many years, starting with TLSNotary more than a decade ago (now a PSE project funded by the Ethereum Foundation). Academic work, including BlindCA (IEEE S&P 2019), DECO (ACM CCS 2020), Oblivious TLS (CT-RSA 2021), MPCAuth (IEEE S&P 2023), and DiStefano from Brave, has moved it forward since.

Note: zkPass also has a version of the 3P-TLS protocol, implemented in its TransGate extension, that additionally secret-shares the TLS keys between the user and the validator. We found this unnecessary in most of today's network environments (IP spoofing on public networks is nearly impossible given additional security mechanisms such as Cloudflare's IP hiding and the modern port randomization now in Linux), and its overhead works poorly for users with slow network connections, such as users in certain firewalled regions.

RISC Zero backend for zkPass

As part of our partnership, we are working with zkPass specifically on the RISC Zero backend. Here, we briefly compare it with the existing IZK backend and the Groth16 backend.

IZK and RISC Zero are both more general and more performant than Groth16. In particular, Groth16 is ill-suited to computation that lacks a fixed pattern (such as parsing a PDF) and therefore cannot easily be expressed as a circuit. IZK and RISC Zero do not suffer from this limitation.
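To make the "fixed pattern" point concrete, here is a toy contrast, assuming a single-byte search as a stand-in for one step of PDF parsing. `find_data_dependent` exits early, so its step count depends on the input; that is ordinary code in a zkVM. `find_fixed_shape` mimics what a circuit requires: input padded to a hypothetical bound `MAX_LEN`, a loop that visits every slot regardless of the data, and branch-free 0/1 arithmetic to select the first match. It is illustrative only, not how any particular Groth16 toolchain compiles code.

```rust
/// Hypothetical upper bound on input size, fixed when the "circuit" is built.
const MAX_LEN: usize = 64;

/// zkVM style: stop at the first match; the number of steps depends on the data.
fn find_data_dependent(data: &[u8], target: u8) -> Option<usize> {
    data.iter().position(|&b| b == target)
}

/// Circuit style: always visit all MAX_LEN slots of the padded buffer and
/// select the first match with 0/1 arithmetic instead of an early exit.
/// `len` is the number of real (unpadded) bytes in `padded`.
fn find_fixed_shape(padded: &[u8; MAX_LEN], len: usize, target: u8) -> Option<usize> {
    let mut found = 0usize; // becomes 1 once a match has been seen
    let mut index = 0usize; // accumulates the position of the first match
    for i in 0..MAX_LEN {
        let in_range = (i < len) as usize; // padding bytes never count as a match
        let hit = in_range * (padded[i] == target) as usize;
        let take = hit * (1 - found); // 1 only for the first hit
        index += take * i;
        found |= hit;
    }
    if found == 1 { Some(index) } else { None }
}

fn main() {
    let data = b"12 0 obj";
    let mut padded = [0u8; MAX_LEN];
    padded[..data.len()].copy_from_slice(data);
    // Both versions agree; only the amount (and shape) of work differs.
    println!("{:?}", find_data_dependent(data, b' ')); // prints "Some(2)"
    println!("{:?}", find_fixed_shape(&padded, data.len(), b' ')); // prints "Some(2)"
}
```

The fixed-shape version pays for all `MAX_LEN` slots on every input and cannot accept anything longer than the bound, which is why variable-layout formats such as PDF are awkward to express as a fixed circuit.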

IZK and RISC Zero, on the other hand, are close competitors, although there are two fundamental differences between them.

We have been looking at the IZK area for a while. See our presentation at the Decompute conference, held during Token2049 Singapore 2023:

<p align="center"> <a href="https://github.com/l2iterative/zkpass-accredited-investors/blob/main/readme/izktalk.pdf"><img src = "readme/izktalk_cover.png" width="400" /></a> </p>

Implementation of the PDF proofs

The PDF proof protocol in this repository requires very little domain expertise in zero knowledge. In fact, we wrote the entire thing in plain Rust, using existing crates (md5, rc4, and libflate) out of the box without any RISC Zero-specific optimization, and then copied the same Rust code into the RISC Zero guest program, where it works as is. Compare irs/src/test.rs with irs0/methods/guest/src/main.rs for details.
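To give a sense of the plain Rust involved, here is textbook RC4, the cipher one would expect behind the decryption step given the rc4 dependency. It is a std-only stand-in written for this sketch, not the `rc4` crate's API, and nothing in it is specific to RISC Zero. RC4 encryption and decryption are the same operation: XOR with a key-dependent keystream.

```rust
/// Textbook RC4: returns `data` XORed with the keystream derived from `key`.
/// Applying it twice with the same key restores the input.
fn rc4_apply(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty() && key.len() <= 256, "RC4 keys are 1 to 256 bytes");

    // Key-scheduling algorithm (KSA): start from the identity permutation
    // of 0..=255 and shuffle it under control of the key.
    let mut s = [0u8; 256];
    for (i, slot) in s.iter_mut().enumerate() {
        *slot = i as u8;
    }
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }

    // Pseudo-random generation algorithm (PRGA): one keystream byte per input byte.
    let (mut i, mut j) = (0u8, 0u8);
    data.iter()
        .map(|&byte| {
            i = i.wrapping_add(1);
            j = j.wrapping_add(s[i as usize]);
            s.swap(i as usize, j as usize);
            let k = s[s[i as usize].wrapping_add(s[j as usize]) as usize];
            byte ^ k
        })
        .collect()
}

fn main() {
    let ciphertext = rc4_apply(b"Key", b"Plaintext");
    let hex: String = ciphertext.iter().map(|b| format!("{b:02X}")).collect();
    println!("{hex}"); // prints "BBF316E8D940AF0AD3", a widely published test vector
    assert_eq!(rc4_apply(b"Key", &ciphertext), b"Plaintext".to_vec());
}
```

For a PDF protected by the standard security handler, the key passed in would be a per-object key derived with MD5 from the document's file key.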

The current implementation of the PDF proofs consists of the following steps.

```mermaid
flowchart LR
  A[parsing] --> B[decryption]
  B --> C[decompression]
```
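The parsing step can be pictured with the following deliberately simplified sketch, which is not the repository's parser. The hypothetical `stream_of_object` finds the indirect object `N G obj` and returns the raw bytes between its `stream` keyword and `endstream`; those bytes would then go through decryption and decompression. A real parser would also use the cross-reference table and the stream's `/Length` entry, which this sketch ignores.

```rust
/// Index of the first occurrence of `needle` in `haystack` at or after `from`.
fn find(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    haystack
        .get(from..)?
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|p| p + from)
}

/// Hypothetical, heavily simplified: raw (still encrypted and compressed)
/// stream bytes of the indirect object `num generation obj`.
fn stream_of_object(pdf: &[u8], num: u32, generation: u16) -> Option<&[u8]> {
    let header = format!("{num} {generation} obj");
    // Skip false matches such as "11 0 obj" when looking for "1 0 obj".
    let mut search_from = 0;
    let obj_start = loop {
        let pos = find(pdf, header.as_bytes(), search_from)?;
        if pos == 0 || pdf[pos - 1].is_ascii_whitespace() {
            break pos;
        }
        search_from = pos + 1;
    };
    let obj_end = find(pdf, b"endobj", obj_start)?;
    let body = &pdf[..obj_end];
    // Data starts after the `stream` keyword and its CRLF or LF.
    let mut start = find(body, b"stream", obj_start)? + b"stream".len();
    if body[start..].starts_with(b"\r\n") {
        start += 2;
    } else if body[start..].starts_with(b"\n") {
        start += 1;
    }
    // Data ends before `endstream`, minus the EOL that precedes it.
    let mut end = find(body, b"endstream", start)?;
    if end > start && body[end - 1] == b'\n' {
        end -= 1;
    }
    if end > start && body[end - 1] == b'\r' {
        end -= 1;
    }
    Some(&pdf[start..end])
}

fn main() {
    let pdf = b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n";
    let data = stream_of_object(pdf, 1, 0).expect("object 1 0 has a stream");
    println!("{}", String::from_utf8_lossy(data)); // prints "hello"
}
```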

The rest of the code walks through the body object of the IRS account transcript, as illustrated in the following figure. Our example focuses on two fields. More fields can be looked up if needed without adding much overhead, since most of the RISC-V cycles are spent on key derivation and decompression.

<p align="center"> <img src = "readme/pdf-parse.png" width="500" /> </p>
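Since key derivation is one of the two main costs mentioned above, here is a sketch of one piece of it: the per-object key of the PDF standard security handler (Algorithm 1 in ISO 32000-1, RC4 case). The file key is extended with the low 3 bytes of the object number and the low 2 bytes of the generation number (little-endian), hashed with MD5, and truncated to min(n + 5, 16) bytes. MD5 is inlined from its RFC 1321 description so the block has no dependencies; the repository itself uses the md5 crate. Deriving the file key (Algorithm 2, which for revision 3 and later reruns MD5 50 times) is not shown, and the example key is made up.

```rust
/// MD5 as described in RFC 1321 (std-only, for illustration).
fn md5(input: &[u8]) -> [u8; 16] {
    // Per-step left-rotation amounts, four rounds of 16 steps.
    const S: [u32; 64] = [
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21,
    ];
    // Additive constants: integer part of |sin(i + 1)| * 2^32.
    let k: Vec<u32> = (0..64)
        .map(|i: u32| ((f64::from(i) + 1.0).sin().abs() * 4_294_967_296.0) as u32)
        .collect();
    let mut state: [u32; 4] = [0x6745_2301, 0xefcd_ab89, 0x98ba_dcfe, 0x1032_5476];

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (little-endian).
    let mut msg = input.to_vec();
    let bit_len = (input.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_le_bytes());

    for block in msg.chunks_exact(64) {
        // Sixteen little-endian 32-bit words per 64-byte block.
        let m: Vec<u32> = block
            .chunks_exact(4)
            .map(|w| u32::from_le_bytes([w[0], w[1], w[2], w[3]]))
            .collect();
        let [mut a, mut b, mut c, mut d] = state;
        for i in 0..64 {
            let (f, g) = match i / 16 {
                0 => ((b & c) | (!b & d), i),
                1 => ((d & b) | (!d & c), (5 * i + 1) % 16),
                2 => (b ^ c ^ d, (3 * i + 5) % 16),
                _ => (c ^ (b | !d), (7 * i) % 16),
            };
            let f = f.wrapping_add(a).wrapping_add(k[i]).wrapping_add(m[g]);
            a = d;
            d = c;
            c = b;
            b = b.wrapping_add(f.rotate_left(S[i]));
        }
        for (s, v) in state.iter_mut().zip([a, b, c, d]) {
            *s = s.wrapping_add(v);
        }
    }
    // Serialize the state words little-endian.
    let mut out = [0u8; 16];
    for (chunk, word) in out.chunks_exact_mut(4).zip(state) {
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    out
}

/// Per-object RC4 key: MD5(file_key || obj_num[0..3] || gen_num[0..2]),
/// truncated to min(file_key.len() + 5, 16) bytes.
fn object_key(file_key: &[u8], obj_num: u32, gen_num: u16) -> Vec<u8> {
    let mut buf = file_key.to_vec();
    buf.extend_from_slice(&obj_num.to_le_bytes()[..3]);
    buf.extend_from_slice(&gen_num.to_le_bytes());
    let n = (file_key.len() + 5).min(16);
    md5(&buf)[..n].to_vec()
}

fn main() {
    // Hypothetical 40-bit (5-byte) file key, for illustration only.
    let file_key = [0x01, 0x23, 0x45, 0x67, 0x89];
    let key = object_key(&file_key, 12, 0);
    println!("per-object key for 12 0 obj: {} bytes", key.len()); // prints "per-object key for 12 0 obj: 10 bytes"
}
```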

Proof generation on the author's Mac Studio (with an M2 Ultra chip) takes about 13 seconds.

Testing the code

Running the code in the irs/ or irs0/ folder requires an IRS account transcript. The one we used for internal testing is the author's real 2022 IRS tax transcript, which he prefers not to include in the public GitHub repository. US residents should have little difficulty downloading an IRS account transcript from their online account. If you genuinely need one for testing but cannot obtain it, please contact the author at weikeng.chen@l2iterative.com.

Future work

Further optimization of PDF proofs is very plausible. For example, part of the proof generation can be delegated: the user shares a redacted version of the PDF so that most of the proving can happen elsewhere, and the user only handles the fraction of proof generation related to the sensitive data in the unredacted PDF, like a finishing touch. We are keen to formalize this as "patchwork proofs".

License

This work is a partnership between L2 Iterative and zkPass, with a focus on integrating the RISC Zero backend into zkPass.

At the moment, the code in this repository is specific to the demo of proofs of IRS account transcripts. Since we have not used much third-party code, we would like to license it under MIT or Apache 2.0. Future development that introduces new code may call for a different license, in which case the license will be updated in a future version of this repository.