Storage With Access Negotiation (SWAN)

(yet another bird upholding privacy)

<p align="center"> <img width="60%" height="60%" src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/CygneVaires.jpg/640px-CygneVaires.jpg"><br> <a href="https://en.wikipedia.org/wiki/Swan">Image of a Swan from Wikipedia</a> </p>

We propose to generalize Turtledove and First-party sets to enable full profile-based targeting while remaining compliant with the Chromium privacy model. More precisely, the goal is to target users based on behavioral data collected across domains and/or parties while making sure this data never leaves the browser.

Most importantly, this proposal addresses the restriction on third-party data access caused by the removal of third-party cookies, which is particularly problematic for market actors that have limited first-party data. We propose to extend the notion of first-party sets with a notion of third-party sets to enable in-browser data collaborations that are negotiated between parties.

In the following, we use the notions of "domain" and "party" as defined in the First-party sets proposal.

The capabilities of Turtledove

In a nutshell, the Turtledove framework consists of the following steps to show a targeted ad:

From our perspective, the Turtledove approach is well suited to target users for the following types of targeting:

Since interest groups can be computed using data from only one domain, the semantics of these groups are necessarily constrained to the activity on that domain or to its content. In other words, interest groups are not well suited to model user properties that can only be computed from behavioral cross-domain data. For example, whether a user is male or female cannot be reliably estimated using only data from "wereallylikeshoes.com": the group will be biased toward women, and there is no way to compensate for the bias by using data collected on other domains.

Generalization in a nutshell

We propose to introduce in the browser a "sandboxed private storage" that has limited read capabilities. The role of the storage is to keep a profile that consists of data collected across domains. Then, only pre-registered scripts with a constrained signature are allowed to read this data and to output interest groups. No other read/write capabilities are allowed for these scripts. The rest remains identical to Turtledove. In the following, we use the term "audience" instead of "interest group" to better reflect the increased targeting capabilities. The pre-registered script is called an audience definition script.

The following diagram gives an overview of the proposal.

<p align="center"> <img width="70%" height="70%" src="./overview.svg"> </p>

The difference from Turtledove is that partial profiles, not audiences (interest groups), are registered in the browser. The audience memberships are instead computed directly in the browser. One partial profile consists of several items/attributes that are collected on one domain only.
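To make the notion of a partial profile concrete, here is a minimal sketch of how a browser might keep one partial profile per writing domain. The class name `PartialProfiles`, its methods, and the `.example` domains are hypothetical and only serve as an illustration; the proposal does not prescribe an internal data model.

```javascript
// Hypothetical in-browser data model: one partial profile per writing domain.
class PartialProfiles {
  constructor() {
    this.byDomain = new Map(); // domain -> Map(key -> value)
  }

  // Record an item in the partial profile of the domain that collected it.
  write(domain, key, value) {
    if (!this.byDomain.has(domain)) this.byDomain.set(domain, new Map());
    this.byDomain.get(domain).set(key, value);
  }

  // Return a copy of one domain's partial profile (empty if none exists).
  profileOf(domain) {
    return new Map(this.byDomain.get(domain) ?? []);
  }
}

const profiles = new PartialProfiles();
profiles.write('shop.example', 'interests', 'athletic-shoes');
profiles.write('news.example', 'femaleProb', 0.3);
```

Keeping items grouped by the collecting domain is what later allows the browser to decide, per item, whether a given audience definition script may read it.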

The Turtledove bidding process that is used to select an ad is not affected by this proposal. Next, we discuss how the browser controls the access to partial profiles and their items.

Access to the private storage

As depicted in the diagram, the audience definition script always has access to the partial profiles that originate from the same party (that is, the party on whose domain the audience definition was registered). Which domains belong to the same party is defined by the server-side first-party-set declaration, which is known to the browser as described in the First-party sets proposal.

As explained so far, it is possible to perform targeting based on client-side first-party profiles. This is similar to the scenario where First-party sets and Turtledove are implemented simultaneously, except that in that scenario the profile is kept server-side (assuming first-party sets are used to implement cross-domain first-party cookies).

To increase data sharing capabilities, we propose to naturally extend the First-party sets proposal to also support third-party declarations. Just as first-party sets enable browsers to understand the first-party relationships between domains, third-party sets enable browsers to understand relationships between parties. With this extension, the involved domains depicted in the diagram shall be able to serve the following resources:

https://a.com/.well-known/first-party-set
{
  "owner": "a.com",
  "version": 1,
  "members": ["b.com"],
  ...
}

https://b.com/.well-known/first-party-set
{ "owner": "a.com" }

https://c.com/.well-known/third-party-set
{
  "owner": "c.com",
  "version": 1,
  "members": ["a.com"]
}

where a.com and c.com are owners of their respective first-party set as defined in the proposal. In the context of this proposal, this makes it possible to grant the appropriate reading permissions to audience definition scripts to also access third-party items stored in the private storage. This is depicted in the following diagram:

<p align="center"> <img width="70%" height="70%" src="./overview2.svg"> </p>
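The access rule described above could be resolved roughly as follows. This is a sketch under stated assumptions: the helper names `partyOf` and `canRead` are hypothetical, the elided fields of the declarations are left out, and in particular the direction of the grant (scripts of the third-party set's owner may read items written by its member parties) is our reading of the example flow, not something the declarations themselves spell out.

```javascript
// Declarations as served by the example domains (elided fields are omitted).
const firstPartySets = [{ owner: 'a.com', members: ['b.com'] }];
const thirdPartySets = [{ owner: 'c.com', members: ['a.com'] }];

// Map a domain to the owner of its first-party set;
// a domain that appears in no set is treated as its own party.
function partyOf(domain) {
  const set = firstPartySets.find(
    (s) => s.owner === domain || s.members.includes(domain));
  return set ? set.owner : domain;
}

// ASSUMPTION: the owner of a third-party set may read items
// written by the parties listed as its members.
function canRead(scriptDomain, itemDomain) {
  const reader = partyOf(scriptDomain);
  const writer = partyOf(itemDomain);
  if (reader === writer) return true; // same party: always readable
  return thirdPartySets.some(
    (s) => s.owner === reader && s.members.includes(writer));
}

console.log(canRead('b.com', 'a.com')); // same first-party set
console.log(canRead('c.com', 'b.com')); // via the third-party declaration
```

Under this reading the relationship is not symmetric: `canRead('a.com', 'c.com')` is false unless a.com serves a third-party set of its own that lists c.com.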

Next, we briefly show how to programmatically use the private storage to perform targeting based on a cross-party profile.

API Example Flow

We describe an illustrative scenario where a cross-party profile is built in the browser as a user anonymously navigates the web, i.e., without using a login.

As I anonymously browse "weReallyLoveShopping.com" my behavior reveals that I am interested in athletic shoes. The online shop writes this information in the private storage:

window.privateStorage.setItem('interests', 'athletic-shoes');

I also regularly and anonymously visit the publisher "myLocalNewspaper.com". The domain is not owned by the same party as "weReallyLoveShopping.com" but the latter is declared as a third-party of the former. Based on the pages I read, the publisher estimates a probability of 0.3 that I am female and writes this to the private storage:

window.privateStorage.setItem('femaleProb', 0.3);

Note that privateStorage does not provide reading capabilities at this point (i.e. the method privateStorage.getItem is not accessible).
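The write-only behaviour can be illustrated with a small sketch of the object a page might see. The factory `makePageHandle` and the array `internalStore` are hypothetical stand-ins for browser internals; in a real browser the internal store would not be reachable from page script at all.

```javascript
// Hypothetical page-facing handle: exposes setItem only,
// and tags every write with the domain that performed it.
function makePageHandle(store, domain) {
  return Object.freeze({
    setItem(key, value) {
      store.push({ domain, key, value }); // append to the browser-internal store
    },
  });
}

const internalStore = []; // stands in for browser-internal state
const privateStorage = makePageHandle(internalStore, 'myLocalNewspaper.com');

privateStorage.setItem('femaleProb', 0.3);
console.log(typeof privateStorage.getItem); // "undefined": no read method exists
```

Freezing the handle prevents page script from attaching its own `getItem` to it, although that alone would of course not give access to the stored values.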

In addition, domain owners always have the possibility to register audience definition scripts. In our example, the publisher registers an audience definition script as follows:

const maleAthleticShoes =
  {'name': 'maleAthleticShoesAudience',
   'readers': ['first-ad-network.com',
               'second-ad-network.com'],
   // must return a boolean that indicates audience membership
   'script': "\
       return privateStorage.getItem('femaleProb') < 0.5 && \
              privateStorage.getItem('interests') === 'athletic-shoes';"
  };
window.privateStorage.addAudienceDefinition(maleAthleticShoes);

The audience definition scripts must return a boolean value to indicate audience membership and are evaluated before fetching the ad bundles. In this example, the script needs to access data that originates from two domains. Because of the declared third-party relationship, and because the script is running in a sandboxed environment, both getItem calls return a value. Since audience definition scripts are only allowed to determine audience membership, no private information can leak from the private storage.

The rest of the API flow remains identical to Turtledove; the browser then contacts first-ad-network.com and requests ads targeted at this audience:

curl -X GET 'https://first-ad-network.com/.well-known/fetch-ads?audience=maleAthleticShoesAudience'

See the Turtledove proposal for the rest of the flow.
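The evaluation step described above, from audience definition script to ad request, might look roughly like the sketch below. Everything here is illustrative: the helpers `isMember` and `adRequests`, the audience name `athleticShoesAudience`, and the idea that one request is issued per entry in `readers` are assumptions, and `new Function` is used only to show the call shape. It is not a sandbox.

```javascript
// Items written earlier in the flow (hypothetical in-memory stand-in).
const items = new Map([
  ['interests', 'athletic-shoes'],
  ['femaleProb', 0.3],
]);

// Read-only view that is handed to audience definition scripts only.
const readOnlyStorage = Object.freeze({
  getItem: (key) => (items.has(key) ? items.get(key) : null),
});

// Evaluate one definition; only a strict boolean `true` counts as membership.
function isMember(definition) {
  // NOTE: `new Function` is NOT a real sandbox; a browser would need an
  // isolated environment without network or storage access.
  const run = new Function('privateStorage', definition.script);
  return run(readOnlyStorage) === true;
}

// ASSUMPTION: one fetch-ads request per reader of a matching definition.
function adRequests(definition) {
  if (!isMember(definition)) return [];
  return definition.readers.map(
    (reader) =>
      `https://${reader}/.well-known/fetch-ads?audience=` +
      encodeURIComponent(definition.name));
}

const definition = {
  name: 'athleticShoesAudience', // hypothetical audience name
  readers: ['first-ad-network.com', 'second-ad-network.com'],
  script: "return privateStorage.getItem('interests') === 'athletic-shoes';",
};
console.log(adRequests(definition));
```

Accepting only a strict `true` keeps the output of a script down to a single bit per audience, which is the property the privacy argument relies on.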

Privacy

The following design aspects shall guarantee that the browser does not leak private data:

Other design considerations

UI controls

In addition to the browser UI controls of Turtledove, this approach would enable the following additional controls:

Related work and impact on other proposals

This proposal generalizes Turtledove and extends the First-party sets proposal. In addition, it affects the SCAUP proposal as explained next.