Today, Userbase can be used in apps that don’t require end-to-end encryption, for those who like that Userbase can handle authentication, data storage and real-time syncing in a few lines of code.
We also completed a security review by an independent team, and wrote up a comprehensive specification of our architecture.
Personally, I started working on Userbase to store end-to-end encrypted data in a performant way for an accounting app. Under the hood, each write to a Userbase database is an append-only transaction to a log stored in DynamoDB (so writes take constant time), which is then pushed to connected clients over a WebSocket. Each client then decrypts the transaction and applies it to its local in-memory state of the database (so real-time syncing is provided out of the box). In this process, the server ensures each client receives transactions in a consistent order, 100% of the time. This is unlike some of the (very awesome) decentralized alternatives that exist today (OrbitDB, GunDB, Scuttlebot), which generally rely on CRDTs to stay in sync, since CRDT updates can be pushed in any order. For certain applications (such as an accounting app), the consistent ordering guarantee a central server provides may be extremely useful, on top of the added reliability and performance.
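To make the ordering idea concrete, here is a rough sketch in Python (not Userbase's actual code). The `Server`, `Client` and `Transaction` names are made up for illustration, a plain list stands in for the DynamoDB log, a method call stands in for the WebSocket push, and encryption is left out entirely.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Transaction:
    seq_no: int        # assigned by the server, strictly increasing
    command: str       # "insert", "update" or "delete"
    item_id: str
    value: object = None


@dataclass
class Client:
    """Hypothetical client holding the database state in memory."""
    state: dict = field(default_factory=dict)
    last_seq: int = 0

    def receive(self, tx):
        # Refuse anything that is not the next transaction in server order.
        if tx.seq_no != self.last_seq + 1:
            raise ValueError("out-of-order transaction")
        if tx.command in ("insert", "update"):
            self.state[tx.item_id] = tx.value
        elif tx.command == "delete":
            self.state.pop(tx.item_id, None)
        self.last_seq = tx.seq_no


@dataclass
class Server:
    """Hypothetical stand-in for the server: an append-only log plus fan-out."""
    log: list = field(default_factory=list)
    clients: list = field(default_factory=list)
    _next_seq: count = field(default_factory=lambda: count(1))

    def write(self, command, item_id, value=None):
        tx = Transaction(next(self._next_seq), command, item_id, value)
        self.log.append(tx)            # append only, earlier entries never change
        for client in self.clients:    # push to every connected client
            client.receive(tx)
        return tx
```

Because every client sees the same sequence numbers in the same order, two clients connected to the same `Server` end up with identical `state` dicts after any series of writes.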
You can read more on this process and how we optimize it when databases get large here: https://github.com/smallbets/userbase/blob/master/docs/userb...
I'm also working on an offline-first Google Docs alternative that will write to IndexedDB, and stay in sync with Userbase using CRDTs. The tutorial on how to do it will be here: https://userbase.com/docs/
How do concurrent modifications get resolved (several clients trying to modify the shared state at the same time)?
This is what clients do initially, until the database grows in size. Every time the transaction log grows by 50 KB, the client takes a snapshot of the database's state at a particular point in time, compresses and encrypts it, and uploads this state to the server. We call this a "bundle". This way, when clients reopen a database, they load from the bundle first and then apply any new transactions that come after it, rather than needing to query for the entire transaction history, decrypt each transaction individually, and reapply them all.
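A rough sketch of the bundle idea in Python, for anyone who wants to see it spelled out. Everything here is an assumption for illustration: the function names, the dict-shaped transactions, and JSON plus zlib as the snapshot format. The encryption step is skipped.

```python
import json
import zlib

BUNDLE_THRESHOLD_BYTES = 50 * 1024  # the 50 KB figure mentioned above


def apply(state, tx):
    """Apply one transaction dict to an in-memory state dict."""
    if tx["command"] in ("insert", "update"):
        state[tx["item_id"]] = tx["value"]
    elif tx["command"] == "delete":
        state.pop(tx["item_id"], None)


def should_bundle(bytes_since_last_bundle):
    """True once the log has grown by the threshold since the last bundle."""
    return bytes_since_last_bundle >= BUNDLE_THRESHOLD_BYTES


def make_bundle(state, seq_no):
    """Snapshot the state as of seq_no and compress it (encryption omitted)."""
    payload = json.dumps(state, sort_keys=True).encode()
    return {"seq_no": seq_no, "data": zlib.compress(payload)}


def open_database(bundle, log):
    """Rebuild state from the bundle, then replay only newer transactions."""
    state, start = {}, 0
    if bundle is not None:
        state = json.loads(zlib.decompress(bundle["data"]))
        start = bundle["seq_no"]
    for tx in log:
        if tx["seq_no"] > start:
            apply(state, tx)
    return state
```

Opening with a bundle and opening with `None` (replaying the whole log) should give the same state. The bundle just cuts down how many transactions need to be fetched, decrypted and applied.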
>How do concurrent modifications get resolved (several clients trying to modify the shared state at the same time)?
The server assigns each transaction a distinct sequence number via an atomic operation, so every client applies the same transactions in the same sequential order. The client relies on this to enforce uniqueness and versioning. If 2 clients insert with the same itemId at the same time, only the insert with the lowest sequence number gets applied to the database. Similarly, if 2 clients update or delete the same item at the same time, only the update or delete with the lowest sequence number gets applied.
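Here is roughly how that rule plays out, sketched in Python. The `base_version` field and the version counter are my assumptions about how a client could detect a stale update or delete; they are only there to illustrate "lowest sequence number wins" and are not taken from the Userbase source.

```python
def apply_in_order(transactions):
    """Apply transactions in sequence-number order.

    Returns (state, rejected), where state maps item_id -> (version, value)
    and rejected lists the sequence numbers that lost a conflict.
    """
    state = {}
    rejected = []
    for tx in sorted(transactions, key=lambda t: t["seq_no"]):
        item_id = tx["item_id"]
        current = state.get(item_id)
        if tx["command"] == "insert":
            if current is not None:
                # A lower sequence number already inserted this itemId.
                rejected.append(tx["seq_no"])
                continue
            state[item_id] = (1, tx["value"])
        else:  # "update" or "delete"
            # base_version (hypothetical) is the version the writer had seen.
            if current is None or current[0] != tx["base_version"]:
                rejected.append(tx["seq_no"])
                continue
            if tx["command"] == "update":
                state[item_id] = (current[0] + 1, tx["value"])
            else:
                del state[item_id]
    return state, rejected
```

Since every client runs the same deterministic rule over the same ordered log, they all reject the same transactions and converge on the same state without any extra coordination.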
With regard to bundling, it's a bit more complicated and there are layers to our approach in safely handling it under high concurrency. When a client uploads a bundle, the database records what sequence number the bundling took place at so clients can use it to retrieve the latest bundle. And the server retains copies of bundles at prior sequence numbers. This way, if two clients attempt to open a database right around the moment a bundling process completes (client 1 receives a bundle at a lower sequence number, and client 2 receives a bundle at a higher sequence number), both clients still end up with the same state. The server sends all transactions in the log after the bundle sequence number, so client 1 just needs to decrypt and apply more individual transactions to rebuild the state compared to client 2.
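The two-clients-around-a-bundle case can be sketched like this in Python. `DatabaseRecord` and its methods are hypothetical names, transactions are simplified to plain key/value sets, and encryption is left out; the point is only that an older bundle plus a longer tail gives the same result as a newer bundle plus a shorter tail.

```python
class DatabaseRecord:
    """Hypothetical server-side record: the full log plus every uploaded bundle."""

    def __init__(self):
        self.log = []              # (seq_no, key, value), ordered by seq_no
        self.bundles = {}          # bundle seq_no -> snapshot dict
        self.bundle_seq_no = None  # seq_no of the latest bundle

    def append(self, key, value):
        self.log.append((len(self.log) + 1, key, value))

    def upload_bundle(self, seq_no, snapshot):
        self.bundles[seq_no] = dict(snapshot)  # earlier bundles are retained
        self.bundle_seq_no = seq_no

    def open_from(self, bundle_seq_no):
        """Return the bundle at bundle_seq_no plus every later transaction."""
        snapshot = dict(self.bundles.get(bundle_seq_no, {}))
        tail = [tx for tx in self.log if tx[0] > (bundle_seq_no or 0)]
        return snapshot, tail


def rebuild(snapshot, tail):
    """Client side: start from the snapshot and replay the tail in order."""
    state = dict(snapshot)
    for _, key, value in tail:
        state[key] = value
    return state


def snapshot_at(record, seq_no):
    """The state a client would have bundled after applying up to seq_no."""
    return rebuild({}, [tx for tx in record.log if tx[0] <= seq_no])
```

A client that was handed the bundle at sequence number 2 replays more of the log than one handed the bundle at 5, but `rebuild` returns the same dict for both.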
Some may find this interesting too -- we specifically test for safe concurrent behavior across 2 clients using a makeshift testing framework that opens 2 browsers at the same time and does some neat tests: https://github.com/smallbets/userbase/tree/master/test
If you clone the repo and run `npm run test:concurrency`, it will run those tests and output test results to the consoles of the 2 browsers.
It's incredibly simple, which is its main draw for me (besides being able to offer e2e encryption to users, which is a huge win for privacy).
Hang on, so the process of retrieving the salt gives the remote client information about whether the user exists? Doesn't this mean that an attacker could take a list of possible usernames, and confirm which of them are using your service?
Seems like you could return a salt even when the user doesn't exist, and that would prevent this information disclosure.
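Something like this, sketched in Python. The names (`SERVER_SECRET`, `salt_for`, the `stored_salts` dict) are made up, and a real deployment would persist the secret instead of regenerating it at startup. The fake salt is derived deterministically from the username so that asking twice gives the same answer; a random fake salt would change between requests and give the game away.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)  # hypothetical; would be persisted in practice
SALT_BYTES = 16


def salt_for(username, stored_salts):
    """Return the user's real salt, or a stable fake one if the user is unknown."""
    real = stored_salts.get(username)
    if real is not None:
        return real
    # Keyed hash of the username: looks random, but is the same on every request.
    digest = hmac.new(SERVER_SECRET, username.encode(), hashlib.sha256).digest()
    return digest[:SALT_BYTES]
```

This only closes the salt endpoint as an oracle. Any other endpoint that behaves differently for existing and non-existing usernames would still leak the same information.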
Additionally, practically defending against user enumeration beyond rate limiting sacrifices a level of security and privacy (for example, by requiring users to provide an email to sign up to your service, or some other means that likely ties the user to an identity, and storing this in our database in plaintext), rather than allowing them to use pseudonymous usernames alone.
While we do recognize username enumeration is an issue (because users tend to reuse passwords from other sites, or don’t want to be found out using a site), we concluded that properly defending against user enumeration by default would have too large a negative impact on user experience for little gain on top of what we already provide in the way of protecting user data. Instead, we focused on defending against potential follow-up attacks by limiting brute force login attempts, and recommending that you tell your users to use a password manager at sign up.
The place where defending against enumeration matters most is sign up. When a user’s account already exists, we say the username is already taken, which isn't possible when properly defending against enumeration.
We're planning to allow you to enable email verification in your app if you want to, so users will need an email to successfully create an account. Once that's in place, we'll defend against enumeration more concretely. There are other places in addition to the salt retrieval that would be modified in similar fashion. For example, password reset will need to always successfully return even if a user provided the wrong username, and sharing a database with another user will always successfully return even if the other user doesn't exist (e.g. from a typo).
You want to focus on implementing good soft and hard rate limiting on all your endpoints.
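As a sketch of what I mean (Python, names and thresholds made up): I'm reading "soft" as a threshold past which you slow the caller down or ask for a CAPTCHA, and "hard" as a threshold past which you refuse outright. This is a simple in-memory sliding window keyed by IP, username, or both; a production setup would normally keep the counters in shared storage.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter with a soft (throttle) and hard (block) threshold."""

    def __init__(self, soft_limit, hard_limit, window_seconds, clock=time.monotonic):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.window = window_seconds
        self._clock = clock                # injectable so tests can fake time
        self._hits = defaultdict(deque)    # key -> timestamps of counted requests

    def check(self, key):
        """Return "allow", "throttle" or "block" for one request from key."""
        now = self._clock()
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window:
            hits.popleft()                 # forget requests outside the window
        if len(hits) >= self.hard_limit:
            return "block"
        hits.append(now)
        if len(hits) > self.soft_limit:
            return "throttle"
        return "allow"
```

With `RateLimiter(soft_limit=2, hard_limit=4, window_seconds=60)`, the first two requests in a minute are allowed, the next two are throttled, and anything after that is blocked until old requests age out of the window.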
You can obfuscate the login function to return an unhelpful error message, but unless you harden every possible public API against user enumeration (and most sites do not), you are just hurting the UX for no actual security gain.
This would include constant timing for returning results when there is or isn’t a user, so for example, running your hash function even when you don’t have a password to compare it to.
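For example, something along these lines in Python. This is a generic server-side password check, not how Userbase does it; the `users` dict, the dummy record and the scrypt parameters are all illustrative assumptions. The idea is that the expensive hash runs on every request, whether or not the username exists.

```python
import hashlib
import hmac
import os

# Illustrative scrypt parameters, not a tuning recommendation.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)

# Dummy record used when the username is unknown, so the same work is done.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = hashlib.scrypt(b"dummy-password", salt=DUMMY_SALT, **SCRYPT_PARAMS)


def hash_password(password, salt):
    return hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)


def verify_login(username, password, users):
    """users maps username -> (salt, stored_hash). Returns True on success."""
    record = users.get(username)
    salt, stored = record if record is not None else (DUMMY_SALT, DUMMY_HASH)
    candidate = hash_password(password, salt)          # always runs the KDF
    matches = hmac.compare_digest(candidate, stored)   # constant-time compare
    return matches and record is not None
```

Both the "no such user" and "wrong password" paths pay for one scrypt call and one constant-time comparison, so response time alone shouldn't tell an attacker which case they hit.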
Years ago there was a big push to return unhelpful error messages, but then the signup or password reset functions would act as a user-exists oracle anyway. Login got harder for zero actual gain in security.
Security isn't about one feature. It's layered. You need to have layers because there is no such thing as guaranteed security.
Bank safes are my favorite analogy. Safes are given a time rating. "How long can this safe resist being broken into." A bank with a 15 minute safe means that it might take an attacker 15 minutes to open the safe.
A 15 minute safe is not secure. In fact, it is guaranteed to be compromised past 15 minutes. How do you secure an insecure 15m safe? With a 5m guard duty. Now you have a safe to buy you 15 minutes and a guard to ensure that nobody has 15m worth of access to the safe.
You built a safe with no guard... and by allowing enumeration you're telling attackers where you put the safe. You are almost guaranteeing someone will compromise it eventually.
Security doesn't always mean that successful attacks are impossible. Oftentimes security just means you've made the cost of intrusion higher than the return on investment. If you allow enumeration you're giving the attacker an advantage.
Userbase is built on the assumption our entire database and server will be compromised, and the attacker would still not be able to access protected user data. Validating that we protect user data in that scenario was the goal of our security review. 
On top of this, requiring users to provide an email or some other identifiable means to sign up, which is the practical way to defend against enumeration, compromises a level of privacy AND security for the average user (since this data would be leaked in the event of a breach). So this is a significant tradeoff, not as simple as one way is secure and the other is not.
Finally, we recognize the impact of allowing user enumeration. We will offer protection from user enumeration for those who are comfortable with the tradeoffs in user experience, and with sacrificing a level of privacy and security for their users.
Apologies if this is covered somewhere in the docs, but I couldn’t find it.
Ultimately we strongly recommend that developers using the end-to-end encryption mode of Userbase recommend their users use a password manager, since losing their password means losing their data (and we try to make this extremely clear to any developers using Userbase via the admin panel and docs). A password manager randomly generating passwords makes this a non-issue.
But alas, we do recognize not everyone will, which is where scrypt comes into play.
(pg. 14): https://www.tarsnap.com/scrypt/scrypt.pdf
Just to be clear, the scrypt paper assumes attackers use ASICs fabricated with 2002-era technology. Obviously there weren't any scrypt ASICs in 2002; but I was able to estimate what their performance and cost would have been.
And thank you for the algorithm!