Amber

Decoupling User Data from Web Applications

Jon Gjengset, Tej Chajed, Jelle van den Hooff, M. Frans Kaashoek,
James Mickens, Robert Morris, and Nickolai Zeldovich

The Web is dominated by users
sharing with each other.

But sharing is limited by application.

  • Users can't freely use their own data.
  • They can't share their data with any user they choose.
  • Sharing is restricted and requires custom APIs.
  • Users do not care about application boundaries.

This is not the cloud we were promised!

Amber stores data by user;
you are in control of your data.

A global file system with global names.
[Diagram: four Amber providers]

Users pick a single provider that they trust to store their data and act as its gatekeeper: an Amber provider.

What do Amber applications look like?

  • They exist only as local applications
  • They need only Amber providers as their backends
  • They can access all data, access control permitting

Amber.login(username, password)
Amber.create(object) -> handle
Amber.setACL(handle, principal...)
Amber.get(handle) -> object
Amber.subscribe(SQL)
Amber.scan(SQL, SQL) -> object...
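
To make the shape of these calls concrete, here is a minimal sketch of an application using them against an in-memory stand-in. `MockAmber` is hypothetical and is not the Amber implementation: it ignores passwords, keeps everything in one process, and assumes that `"*"` on an ACL means public access, as in the tweet example in these slides.

```javascript
// Hypothetical in-memory stand-in for an Amber provider; not the real API.
class MockAmber {
  constructor() {
    this.objects = new Map(); // handle -> { owner, object, acl }
    this.user = null;
    this.nextId = 0;
  }

  // Password checking is omitted; the sketch only records who is logged in.
  login(username, _password) {
    this.user = username;
  }

  create(object) {
    const handle = `${this.user}/${this.nextId++}`;
    this.objects.set(handle, { owner: this.user, object, acl: new Set() });
    return handle;
  }

  // Only the owner may change who can read an object (an assumption).
  setACL(handle, ...principals) {
    const entry = this.objects.get(handle);
    if (!entry || entry.owner !== this.user) throw new Error("not the owner");
    entry.acl = new Set(principals);
  }

  get(handle) {
    const entry = this.objects.get(handle);
    if (!entry) throw new Error("no such object");
    const allowed =
      entry.owner === this.user || entry.acl.has("*") || entry.acl.has(this.user);
    if (!allowed) throw new Error("access denied");
    return entry.object;
  }
}

const amber = new MockAmber();
amber.login("bob@p1", "secret");
const handle = amber.create({ type: "email", subject: "Hello Alice" });
amber.setACL(handle, "alice@p0");

amber.login("alice@p0", "secret");
const received = amber.get(handle); // allowed: Alice is on the ACL

amber.login("carol@p0", "secret");
let carolDenied = false;
try {
  amber.get(handle);
} catch (err) {
  carolDenied = true; // Carol is not on the ACL
}
```

The sketch follows the `setACL(handle, principal...)` form from the listing; the real handle format and error behaviour are not specified in these slides.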
			

The Amber universe

Bob wants to send a Tweet

His tweet application creates an object at his provider:

const tweet = Amber.create({
  "type":    "tweet",
  "content": "I wish Alice could see this; she's so great!"
});

And grants the public at large access to the object:

tweet.setACL(["*"]);
			

Amber will transparently ensure that the object is made available to other users.

Bob wants to send Alice an Amber e-mail

His e-mail application creates an object at his provider:

const email = Amber.create({
  "type":    "email",
  "subject": "Hello Alice",
  "content": "You're the best!"
});
			

And grants Alice access to the object:

email.setACL(["alice@p0"]);
			

Amber will again transparently ensure that the object is made available to Alice.

Bob can easily fetch his own data, but how do applications find Bob's data?

Amber provides global queries across all data of all users at all providers.

Alice wants to get Bob's tweets

Her tweet application calls Amber.subscribe() with:


SELECT OWNER() as tweeter, content
 WHERE OWNER() = "bob@p1"
   AND type = "tweet"
			

This query finds Bob's tweet, even though it was created at a different provider from the one Alice sends the query to.
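
One way to read such a subscription is as a predicate (the WHERE clause) plus a projection (the SELECT list) applied to each object. The sketch below assumes that reading; `tweetQuery`, `runQuery`, and the `owner` field used to model OWNER() are hypothetical names, and the slides do not say how Amber actually evaluates queries.

```javascript
// Hypothetical representation of the subscription above as plain functions.
// OWNER() is modelled as an `owner` field stored next to each object.
const tweetQuery = {
  where: (rec) => rec.owner === "bob@p1" && rec.object.type === "tweet",
  select: (rec) => ({ tweeter: rec.owner, content: rec.object.content }),
};

// Keep the records that satisfy WHERE, then project the SELECT columns.
function runQuery(query, records) {
  return records.filter(query.where).map(query.select);
}

// Made-up records for illustration.
const records = [
  { owner: "bob@p1", object: { type: "tweet", content: "Hello from p1" } },
  { owner: "bob@p1", object: { type: "email", subject: "Hi", content: "Not a tweet" } },
  { owner: "carol@p0", object: { type: "tweet", content: "Hello from p0" } },
];

const rows = runQuery(tweetQuery, records);
console.log(rows);
```

Only Bob's tweet satisfies both conditions; his e-mail and Carol's tweet are filtered out.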

Alice wants to get her Amber e-mails

Her e-mail program calls Amber.subscribe() with:


SELECT subject, content
 WHERE type = "email"
			

This query finds and returns all matching objects at all providers that Alice has access to.
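
The semantics of "all matching objects at all providers that Alice has access to" can be sketched as a filter over every provider's store that checks access before matching. This is only an illustration of the meaning, not of an efficient execution: the `providers` table, `canRead`, and `globalSelect` are hypothetical, and the ACL rule (owner, `"*"`, or an explicitly listed principal) is an assumption based on the earlier examples.

```javascript
// Hypothetical provider stores: provider name -> records held there.
const providers = {
  p0: [
    { owner: "carol@p0", acl: ["alice@p0"],
      object: { type: "email", subject: "Lunch?", content: "Noon?" } },
    { owner: "carol@p0", acl: [],
      object: { type: "email", subject: "Draft", content: "Private" } },
  ],
  p1: [
    { owner: "bob@p1", acl: ["alice@p0"],
      object: { type: "email", subject: "Hello Alice", content: "You're the best!" } },
    { owner: "bob@p1", acl: ["*"],
      object: { type: "tweet", content: "Public tweet" } },
  ],
};

// Assumed rule: owners, listed principals, and everyone (via "*") may read.
function canRead(rec, principal) {
  return rec.owner === principal || rec.acl.includes("*") || rec.acl.includes(principal);
}

// Visit every provider; return projected rows the principal may read.
function globalSelect(principal, where, select) {
  const rows = [];
  for (const recs of Object.values(providers)) {
    for (const rec of recs) {
      if (canRead(rec, principal) && where(rec)) rows.push(select(rec));
    }
  }
  return rows;
}

const inbox = globalSelect(
  "alice@p0",
  (rec) => rec.object.type === "email",
  (rec) => ({ subject: rec.object.subject, content: rec.object.content })
);
```

Carol's private draft is excluded by the access check, and Bob's public tweet is excluded by the `type` condition.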

How can we execute such queries efficiently?

[Diagram, repeated on each step: users carol@p0 and alice@p0 at provider p0; user bob@p1 at provider p1]

Alice subscribes to #catphotos.

p0 registers a subscription for matching photos.

p0 sends the subscription to all other providers.

p1 registers the subscription locally.

Bob creates a cat photo on his machine.

And sends it to his provider p1.

p1 stores his photo locally.

p1 then notifies p0 as the photo subscription matches.

p0 caches the photo in its local store.

p0 then sends the photo to Alice's application.

Alice's application shows her the photo.

Now assume Carol also wants to see #catphotos.

She issues the same query as Alice to p0.

p0 sees that the query is identical to Alice's.

p0 can therefore send the results directly to Carol.

And Carol's application can show the photo to Carol.
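
The walkthrough above can be sketched as a small single-process simulation. Everything here is hypothetical (the `Provider` class, its method names, the string key used to recognise identical queries, and the photo's fields); it only mirrors the steps on the slides and leaves out networking, access control, and failures.

```javascript
// Hypothetical model of the walkthrough; all names are invented for illustration.
class Provider {
  constructor(name) {
    this.name = name;
    this.peers = [];             // every other provider
    this.store = [];             // objects created here or cached from peers
    this.localSubs = new Map();  // query key -> { match, clients }
    this.remoteSubs = [];        // subscriptions registered by peers: { match, from }
    this.pushes = 0;             // number of objects sent to peers
  }

  // A user's application subscribes through the user's own provider.
  subscribe(key, match, deliver) {
    let sub = this.localSubs.get(key);
    if (!sub) {
      // New query: register it locally and send it to all other providers.
      sub = { match, clients: [] };
      this.localSubs.set(key, sub);
      for (const peer of this.peers) peer.remoteSubs.push({ match, from: this });
    }
    sub.clients.push(deliver);
    // Objects already in the local store are served directly.
    this.store.filter(match).forEach((obj) => deliver(obj));
  }

  // A user creates an object at their own provider.
  create(obj) {
    this.store.push(obj);
    this.notifyClients(obj);
    // Push to each interested peer once, however many of its subscriptions match.
    const targets = new Set(
      this.remoteSubs.filter((s) => s.match(obj)).map((s) => s.from)
    );
    for (const peer of targets) {
      this.pushes += 1;
      peer.receive(obj);
    }
  }

  // An object pushed by a peer is cached, then delivered to local subscribers.
  receive(obj) {
    this.store.push(obj);
    this.notifyClients(obj);
  }

  notifyClients(obj) {
    for (const sub of this.localSubs.values()) {
      if (sub.match(obj)) sub.clients.forEach((deliver) => deliver(obj));
    }
  }
}

const p0 = new Provider("p0");
const p1 = new Provider("p1");
p0.peers.push(p1);
p1.peers.push(p0);

const isCatPhoto = (obj) => obj.type === "photo" && obj.tags.includes("catphotos");
const aliceSaw = [];
const carolSaw = [];

p0.subscribe("#catphotos", isCatPhoto, (obj) => aliceSaw.push(obj.name)); // Alice
p1.create({ type: "photo", tags: ["catphotos"], name: "whiskers.jpg" });  // Bob
p0.subscribe("#catphotos", isCatPhoto, (obj) => carolSaw.push(obj.name)); // Carol
```

In this run p1 holds a single subscription from p0 and pushes the photo once; Carol's identical query is answered from p0's cache without contacting p1.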

Scaling Amber: pushed objects

  • Objects are pushed at most once
  • Objects are only pushed when necessary
  • Won't every provider mirror all the data?
    • Personal data: one provider
    • E-mail: a handful of providers
    • Group chat: tens of users and providers
    • Twitter: <100 followers, some providers
    • Social graph: 100s of friends, most providers
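
The estimates in this list follow from counting distinct providers rather than users. As a rough sketch, assuming a principal such as `alice@p0` names its provider after the `@` (consistent with the diagrams, but not stated explicitly), the providers that end up holding a copy of an object are the owner's provider plus the providers of the users who subscribe to it. `providerOf` and `providersHolding` are hypothetical helpers.

```javascript
// Assumption: a principal such as "alice@p0" names its provider after the "@".
function providerOf(principal) {
  return principal.slice(principal.lastIndexOf("@") + 1);
}

// Providers holding a copy: the owner's provider plus the distinct
// providers of the users whose subscriptions match the object.
function providersHolding(owner, subscribers) {
  const holders = new Set([providerOf(owner)]);
  for (const user of subscribers) holders.add(providerOf(user));
  return [...holders];
}

// Personal data: nobody else subscribes, so it stays at one provider.
const personal = providersHolding("bob@p1", []);

// E-mail to two users at the same provider: one extra copy, not two.
const emailCopies = providersHolding("bob@p1", ["alice@p0", "carol@p0"]);
```

Two recipients at p0 cost one push to p0, which is the sense in which an object is pushed at most once per provider.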

Scaling Amber: sharing queries

Exploit similar queries across users and applications to avoid duplicate work.

Subscriptions are shared; the number of subscriptions does not have to depend on the total number of users.

The number can instead depend on the number of providers involved.
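
A small counting sketch illustrates the difference. It assumes that identical queries from users of the same provider collapse into one subscription, keyed by provider and query text; the user list (including `dave@p0`) and `sharedSubscriptionCount` are made up for illustration.

```javascript
// Each entry is one user's subscription; the data is made up for illustration.
const userSubscriptions = [
  { user: "alice@p0", query: "#catphotos" },
  { user: "carol@p0", query: "#catphotos" },
  { user: "dave@p0", query: "#catphotos" },
  { user: "bob@p1", query: "#catphotos" },
];

// Count distinct (provider, query) pairs: one shared subscription per pair.
function sharedSubscriptionCount(subs) {
  const pairs = new Set();
  for (const { user, query } of subs) {
    // Assumption: the provider is the part of the principal after "@".
    const provider = user.slice(user.lastIndexOf("@") + 1);
    pairs.add(JSON.stringify([provider, query]));
  }
  return pairs.size;
}

const unshared = userSubscriptions.length;                 // one per user
const shared = sharedSubscriptionCount(userSubscriptions); // one per provider and query
```

Adding more p0 users with the same query raises `unshared` but leaves `shared` unchanged.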

Challenges

  • Ad-hoc browsing.
  • Users are tied to their provider.
  • All applications have access to all data.
  • Abusive applications and providers.

In conclusion, Amber provides

  • user-centric, application-agnostic cloud storage.
  • global queries for easier multi-user data access.
  • transparent, user-focused sharing.