j-berman 43 days ago [-]
Hi HN,

Userbase is a tool for developers to build secure and private apps. We launched 1 year ago [1], and have worked hard to widen its use cases. Userbase offers built-in user accounts and authentication, an end-to-end encrypted zero-management database, file storage, streaming and sharing, and logic to process and manage subscriptions (via Stripe). All Userbase features are accessible through a simple JavaScript SDK, directly from the client. 100% open source, and all platforms are supported (browser, iOS, Android, desktop).

As of today, Userbase can also be used in apps that don’t require end-to-end encryption, for those who like that Userbase handles authentication, data storage and real-time syncing in a few lines of code.

We also completed a security review by an independent team [2], and wrote up a comprehensive specification of our architecture [3].

Personally, I joined the work on Userbase to store end-to-end encrypted data in a performant way for an accounting app. Under the hood, each write to a Userbase database is an append-only transaction to a log stored in DynamoDB (so writes are constant time), which is then pushed to connected clients over a WebSocket. Each client then decrypts the transaction and applies it to its local in-memory state of the database (real-time syncing is provided out of the box). In this process, the server ensures each client receives transactions in a consistent order, 100% of the time. This is unlike some of the (very awesome) decentralized alternatives that exist today (OrbitDB, GunDB, Scuttlebot), which generally rely on CRDTs to stay in sync, and CRDT updates can be pushed in any order. For certain applications (such as an accounting app), the consistent ordering guarantee a central server provides may be extremely useful, on top of the added reliability and performance.

[1] https://news.ycombinator.com/item?id=22145168

[2] https://userbase.com/announcements/#1-security-review

[3] https://github.com/smallbets/userbase/blob/master/docs/userb...

ajconway 43 days ago [-]
How do you deal with updating the local database when a client was offline for an extended period of time and missed a lot of transactions?
j-berman 43 days ago [-]
openDatabase loads the database's state into memory from the server, and then keeps it in sync with the server using the WebSocket. When you insert a transaction via one of the database operations, the server assigns the transaction a monotonically increasing sequence number, and broadcasts the transaction along with its sequence number to the clients connected to that database. Clients then apply transactions in sequential order to the local state, and keep track of the latest sequence number applied. When a client goes offline and comes back on, it automatically reconnects the WebSocket and re-requests any transactions above its currently applied sequence number that it may have missed. We handle reconnection logic automatically under the hood, retrying on failure with backoff delays.
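
To make the sequence-number bookkeeping above concrete, here is a minimal sketch in plain JavaScript. It is not the Userbase SDK or its source: the `SyncClient` class, the transaction shape (`seqNo`, `command`, `itemId`, `item`) and the `catchUp` helper are all assumptions made for illustration, and the encryption and WebSocket layers are left out.

```javascript
// Hypothetical in-memory client, for illustration only (not the Userbase SDK).
class SyncClient {
  constructor() {
    this.items = new Map();   // local state of the database
    this.lastSeqNo = 0;       // latest sequence number applied
    this.pending = new Map(); // transactions that arrived ahead of a gap
  }

  // Called for every transaction the server pushes to this client.
  receive(tx) {
    if (tx.seqNo <= this.lastSeqNo) return; // already applied, ignore duplicate
    this.pending.set(tx.seqNo, tx);
    // Apply strictly in sequence order, stopping at the first gap.
    while (this.pending.has(this.lastSeqNo + 1)) {
      const next = this.pending.get(this.lastSeqNo + 1);
      this.pending.delete(next.seqNo);
      this.apply(next);
      this.lastSeqNo = next.seqNo;
    }
  }

  apply(tx) {
    if (tx.command === 'Insert' || tx.command === 'Update') {
      this.items.set(tx.itemId, tx.item);
    } else if (tx.command === 'Delete') {
      this.items.delete(tx.itemId);
    }
  }

  // After a reconnect: take everything in the server's log above lastSeqNo.
  catchUp(serverLog) {
    serverLog
      .filter((tx) => tx.seqNo > this.lastSeqNo)
      .sort((a, b) => a.seqNo - b.seqNo)
      .forEach((tx) => this.receive(tx));
  }
}

// Stand-in for the server's append-only log.
const log = [
  { seqNo: 1, command: 'Insert', itemId: 'a', item: 'apple' },
  { seqNo: 2, command: 'Insert', itemId: 'b', item: 'banana' },
  { seqNo: 3, command: 'Update', itemId: 'a', item: 'apricot' },
  { seqNo: 4, command: 'Delete', itemId: 'b' },
];

const client = new SyncClient();
client.receive(log[0]); // online: applies seqNo 1
// ...client goes offline and misses seqNo 2 through 4...
client.catchUp(log);    // reconnect: re-request everything above seqNo 1
```

Because the client only ever advances `lastSeqNo` by one, a duplicate or out-of-order delivery cannot change the final state in this sketch.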

You can read more on this process, and how we optimize it when databases get large, here: https://github.com/smallbets/userbase/blob/master/docs/userb...

I'm also working on an offline-first Google docs alternative that will write to IndexedDB, and stay in sync with Userbase using CRDTs. The tutorial on how to do it will be here: https://userbase.com/docs/

ajconway 43 days ago [-]
What happens when a new client joins? Does it download the entire history of all transactions and replay them into the local database?

How do concurrent modifications get resolved (several clients trying to modify the shared state at the same time)?

j-berman 43 days ago [-]
>Does it download the entire history of all transactions and replay them into the local database?

This is what clients do initially, until the database grows in size. Every time the transaction log grows by 50 KB, the client takes a snapshot of the database's state at a particular point in time, compresses and encrypts it, and uploads it to the server. We call this a "bundle". This way, when clients reopen a database, they load from the bundle first and then apply any new transactions that come after it, rather than querying for the entire history of transactions, decrypting them individually and reapplying them.
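
A rough sketch of the bundle idea, in plain JavaScript, may help here. It is an illustration under assumptions, not Userbase's code: `createBundle`, `openDatabase` (a local helper, unrelated to the SDK function of the same name), `shouldBundle` and the transaction shape are hypothetical, the size check uses JSON string length as a crude proxy for bytes, and compression and encryption are omitted.

```javascript
// 50 KB threshold taken from the comment above; everything else is assumed.
const BUNDLE_THRESHOLD_BYTES = 50 * 1024;

function applyTx(state, tx) {
  if (tx.command === 'Insert' || tx.command === 'Update') state.set(tx.itemId, tx.item);
  else if (tx.command === 'Delete') state.delete(tx.itemId);
}

// Snapshot the state at a given sequence number.
// A real bundle would be compressed and encrypted before upload.
function createBundle(state, seqNo) {
  return { seqNo, snapshot: JSON.stringify([...state]) };
}

// Rebuild state from an optional bundle plus the transactions after it.
function openDatabase(bundle, log) {
  const state = bundle ? new Map(JSON.parse(bundle.snapshot)) : new Map();
  const startSeqNo = bundle ? bundle.seqNo : 0;
  let applied = 0;
  for (const tx of log) {
    if (tx.seqNo > startSeqNo) {
      applyTx(state, tx);
      applied += 1;
    }
  }
  return { state, applied };
}

// Crude size check: has the log grown enough since the last bundle?
function shouldBundle(log, lastBundleSeqNo) {
  const size = log
    .filter((tx) => tx.seqNo > lastBundleSeqNo)
    .reduce((total, tx) => total + JSON.stringify(tx).length, 0);
  return size >= BUNDLE_THRESHOLD_BYTES;
}

const log = [
  { seqNo: 1, command: 'Insert', itemId: 'a', item: 1 },
  { seqNo: 2, command: 'Insert', itemId: 'b', item: 2 },
  { seqNo: 3, command: 'Update', itemId: 'a', item: 10 },
  { seqNo: 4, command: 'Delete', itemId: 'b' },
  { seqNo: 5, command: 'Insert', itemId: 'c', item: 3 },
];

// One client bundles at seqNo 3; later opens start from that snapshot.
const bundle = createBundle(openDatabase(null, log.slice(0, 3)).state, 3);
const full = openDatabase(null, log);         // replays all 5 transactions
const fromBundle = openDatabase(bundle, log); // replays only seqNo 4 and 5
```

Both `full.state` and `fromBundle.state` end up with the same contents; the bundle only reduces how many individual transactions have to be replayed.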

>How do concurrent modifications get resolved (several clients trying to modify the shared state at the same time)?

The server assigns each transaction a distinct sequence number via an atomic operation, so every client sees the same sequence number for a given transaction and applies transactions in the same sequential order. The client relies on this to enforce uniqueness and versioning. If 2 clients insert with the same itemId at the same time, only the insert with the lowest sequence number gets applied to the database. Similarly, if 2 clients update or delete the same item at the same time, only the update or delete with the lowest sequence number is applied.
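
The "lowest sequence number wins" rule can be sketched in a few lines of JavaScript. This is a toy model under stated assumptions, not Userbase's implementation: the `Server` class stands in for the atomic sequence assignment, and the `version` field (the item version an update or delete was based on) is a hypothetical way to express the versioning mentioned above.

```javascript
// Toy server: stamping and appending happen in one step, which stands in
// for the atomic sequence-number assignment described above.
class Server {
  constructor() {
    this.log = [];
    this.nextSeqNo = 1;
  }
  append(tx) {
    const stamped = { ...tx, seqNo: this.nextSeqNo++ };
    this.log.push(stamped);
    return stamped;
  }
}

// Every client runs the same deterministic rules over the same ordered log,
// so every client rejects the same transactions.
function applyInOrder(log) {
  const items = new Map(); // itemId -> { item, version }
  const rejected = [];     // sequence numbers that lost a race
  for (const tx of log) {
    const existing = items.get(tx.itemId);
    if (tx.command === 'Insert') {
      if (existing) { rejected.push(tx.seqNo); continue; } // itemId already taken
      items.set(tx.itemId, { item: tx.item, version: 1 });
    } else if (tx.command === 'Update' || tx.command === 'Delete') {
      // Hypothetical rule: the operation must name the current version.
      if (!existing || existing.version !== tx.version) {
        rejected.push(tx.seqNo);
        continue;
      }
      if (tx.command === 'Update') {
        items.set(tx.itemId, { item: tx.item, version: existing.version + 1 });
      } else {
        items.delete(tx.itemId);
      }
    }
  }
  return { items, rejected };
}

const server = new Server();
// Clients A and B insert the same itemId at (nearly) the same time.
server.append({ command: 'Insert', itemId: 'invoice-1', item: { total: 100 } }); // A
server.append({ command: 'Insert', itemId: 'invoice-1', item: { total: 999 } }); // B
// Both then update the item, each believing it is still at version 1.
server.append({ command: 'Update', itemId: 'invoice-1', version: 1, item: { total: 120 } }); // A
server.append({ command: 'Update', itemId: 'invoice-1', version: 1, item: { total: 80 } });  // B

const result = applyInOrder(server.log);
```

In this run, A's insert and update (sequence numbers 1 and 3) are applied, and B's (2 and 4) are rejected on every client.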

With regard to bundling, it's a bit more complicated, and there are layers to our approach to handling it safely under high concurrency. When a client uploads a bundle, the database records the sequence number the bundling took place at, so clients can use it to retrieve the latest bundle. The server also retains copies of bundles at prior sequence numbers. This way, if two clients attempt to open a database right around the moment a bundling process completes (client 1 receives a bundle at a lower sequence number, and client 2 receives a bundle at a higher sequence number), both clients still end up with the same set of transactions. The server sends all transactions in the log after the bundle's sequence number, so client 1 just needs to decrypt and apply more individual transactions than client 2 to rebuild the state.

Some may find this interesting too -- we specifically test for safe concurrent behavior across 2 clients using a makeshift testing framework that opens 2 browsers at the same time and does some neat tests: https://github.com/smallbets/userbase/tree/master/test

If you clone the repo and run `npm run test:concurrency`, it will run those tests and output test results to the consoles of the 2 browsers.

jamescampbell 43 days ago [-]
I had a similar setup using redis and zsets so this type of use case and implementation makes sense to me.
jamescampbell 43 days ago [-]
I also love the security write up.
j-berman 43 days ago [-]
Thanks for all the kind words :)
alexobenauer 43 days ago [-]
Big fan of Userbase here — I started using it just under a year ago when it launched, and have built a number of different things with it.

It's incredibly simple, which is its main draw for me (besides being able to offer e2e encryption to users, which is a huge win for privacy).

nickodell 43 days ago [-]
>First, the client makes an unauthenticated request to the server to retrieve the password salts associated with the username. If no user is found, an error is returned to the client. If a user is found, the server sends the client the user's password salt and password token salt, which the client uses to rebuild the password token. The password token is then passed to the server for authentication. To prevent brute force password guesses, clients get 25 incorrect attempts in a row before the server locks the user out of their account for 24 hours (Note we are aware this introduces a DoS vulnerability. Our first priority is to protect user data. We plan to implement a more sophisticated lockout mechanism in the future).

Hang on, so the process of retrieving the salt gives the remote client information about whether the user exists? Doesn't this mean that an attacker could take a list of possible usernames, and confirm which of them are using your service?

Seems like you could return a salt even when the user doesn't exist, and that would prevent this information disclosure.
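
For reference, the lockout rule in the quoted passage (25 incorrect attempts in a row, then a 24-hour lock) could look roughly like the JavaScript sketch below. This is a guess at the behavior, not Userbase's code: the `LoginLimiter` class and its method names are hypothetical, and it assumes the lock is triggered by the 25th consecutive failure and that a successful login resets the counter.

```javascript
const MAX_FAILED_ATTEMPTS = 25;           // from the quoted passage
const LOCKOUT_MS = 24 * 60 * 60 * 1000;   // 24 hours, from the quoted passage

// Hypothetical per-username failure tracker.
class LoginLimiter {
  constructor() {
    this.records = new Map(); // username -> { failures, lockedUntil }
  }

  isLocked(username, now) {
    const record = this.records.get(username);
    return Boolean(record) && record.lockedUntil > now;
  }

  // Returns 'ok', 'failed' or 'locked' for a single login attempt.
  recordAttempt(username, passwordCorrect, now) {
    if (this.isLocked(username, now)) return 'locked'; // even a correct password is refused
    if (passwordCorrect) {
      this.records.delete(username); // success resets the streak (assumption)
      return 'ok';
    }
    const record = this.records.get(username) || { failures: 0, lockedUntil: 0 };
    record.failures += 1;
    if (record.failures >= MAX_FAILED_ATTEMPTS) {
      record.failures = 0;
      record.lockedUntil = now + LOCKOUT_MS;
    }
    this.records.set(username, record);
    return record.lockedUntil > now ? 'locked' : 'failed';
  }
}

const limiter = new LoginLimiter();
const start = 0; // timestamps are plain millisecond numbers in this sketch
const outcomes = [];
for (let i = 0; i < MAX_FAILED_ATTEMPTS; i++) {
  outcomes.push(limiter.recordAttempt('alice', false, start));
}
```

The sketch also makes the DoS concern in the quote easy to see: anyone who knows a username can trigger the lock for that account.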

j-berman 43 days ago [-]
Userbase is built on the assumption that in the event an attacker compromises the Userbase server and database, the attacker would not be able to access protected user data. We chose this assumption to build on because we figure that users and developers alike should assume that data stored at rest in cloud-based databases will eventually be leaked, as we've seen countless examples at almost every major company. Thus, we figured the default assumption is that usernames would not be expected to be private (and so yes, to answer your question, user enumeration is currently possible).

Additionally, practically defending against user enumeration beyond rate limiting sacrifices a level of security and privacy (for example, by requiring users to provide an email to sign up to your service, or through some other means that likely ties the user to an identity and stores this in our database in plaintext), rather than allowing them to use pseudonymous usernames alone.

While we do recognize username enumeration is an issue (because users tend to reuse passwords from other sites, or don’t want to be found out using a site), we concluded that properly defending against user enumeration by default would have too large a negative impact on user experience for little gain on top of what we already provide in the way of protecting user data. Instead, we focused on defending against potential follow-up attacks by limiting brute force login attempts, and by recommending that you tell your users to use a password manager at sign up.

The place most affected by defending against enumeration is sign up. When a user’s account already exists, we say the username is already taken, which isn't possible when properly defending against enumeration.

We're planning to allow you to enable email verification in your app if you want to, so users will need an email to successfully create an account. Once that's in place, we'll defend against enumeration more concretely. There are other places in addition to the salt retrieval that would be modified in similar fashion. For example, password reset will need to always successfully return even if a user provided the wrong username, and sharing a database with another user will always successfully return even if the other user doesn't exist (e.g. from a typo).

zaroth 43 days ago [-]
I know this goes contrary to “best practice” but I am very much in favor of this approach.

You want to focus on implementing good soft and hard rate limiting on all your endpoints.

You can obfuscate the login function to return an unhelpful error message, but unless you harden every possible public API against user enumeration (and most sites do not), you are just hurting the UX for no actual security gain.

This would include constant timing for returning results when there is or isn’t a user, so for example, running your hash function even when you don’t have a password to compare it to.

Years ago there was a big push to return unhelpful error messages, but then the signup or password reset functions would act as a user-exists oracle anyway. Login got harder for zero actual gain in security.

nickodell 42 days ago [-]
That's a thoughtful answer; thank you.
zelon88 43 days ago [-]
WordPress works the same way and it's awful. No offense. They don't see the impact of allowing user enumeration.

Security isn't about one feature. It's layered. You need to have layers because there is no such thing as guaranteed security.

Bank safes are my favorite analogy. Safes are given a time rating. "How long can this safe resist being broken into." A bank with a 15 minute safe means that it might take an attacker 15 minutes to open the safe.

A 15 minute safe is not secure. In fact, it is guaranteed to be compromised past 15 minutes. How do you secure an insecure 15m safe? With a 5m guard duty. Now you have a safe to buy you 15 minutes and a guard to ensure that nobody has 15m worth of access to the safe.

You built a safe with no guard... and by allowing enumeration you're telling attackers where you put the safe. You are almost guaranteeing someone will compromise it eventually.

Security doesn't always mean that successful attacks are impossible. Oftentimes security just means you've made the cost of intrusion higher than the return on investment. If you allow enumeration you're giving the attacker an advantage.

j-berman 43 days ago [-]
>You built a safe with no guard... and you're telling attackers where you put the safe. You are almost guaranteeing someone will compromise it eventually.

Userbase is built on the assumption our entire database and server will be compromised, and the attacker would still not be able to access protected user data. Validating that we protect user data in that scenario was the goal of our security review. [1]

On top of this, requiring users to provide an email or some other identifiable means to sign up, which is the practical way to defend against enumeration, compromises a level of privacy AND security in the average user (since this data would be leaked in the event of a breach). So this is a significant tradeoff, not as simple as one way is secure and the other is not.

Finally, we recognize the impact of allowing user enumeration. We will offer protection from user enumeration for those who are comfortable with the tradeoffs in user experience, and with sacrificing a level of privacy and security for their users.

[1]: https://userbase.com/announcements/#1-security-review

j-berman 43 days ago [-]
Adding to this, properly defending against enumeration also sacrifices a level of security in addition to privacy, since the average user would likely need to store some additional identifiable data (such as an email) in our database in plaintext that would be compromised in a breach.
jamescampbell 43 days ago [-]
The pricing model is amazing. Either free up to 100 users or a modest under $100 USD per year for unlimited. Perfection.
jhunter1016 43 days ago [-]
Just rolled my own auth...yet again. Next time, I’m using this. Been following Userbase since it was first announced and it is now at a point where I’m comfortable using it on a project. Nice work!
alfongj 42 days ago [-]
Would e2ee really be guaranteed if a user sets an 8 char password? With a password that short, couldn't an attacker with control of the server brute-force the encryption key and, in turn, decrypt all DB contents for that user?

Apologies if this is covered somewhere in the docs, but I couldn’t find it.

j-berman 41 days ago [-]
We use scrypt for password hashing. From the scrypt paper (which keep in mind is assuming hardware from 2002, and isn't assuming an attacker is using ASICs which have been developed since then), the estimated cost of hardware to brute force guess an 8 char password in 1 year is $4.8 million with our chosen parameters. [1]

Ultimately we strongly recommend that developers using the end-to-end encryption mode of Userbase recommend their users use a password manager, since losing their password means losing their data (and we try to make this extremely clear to any developers using Userbase via the admin panel and docs). A password manager randomly generating passwords makes this a non-issue.

But alas, we do recognize not everyone will, which is where scrypt comes into play.

[1](pg. 14): https://www.tarsnap.com/scrypt/scrypt.pdf

cperciva 41 days ago [-]
>From the scrypt paper (which keep in mind is assuming hardware from 2002, and isn't assuming an attacker is using ASICs which have been developed since then)

Just to be clear, the scrypt paper assumes attackers use ASICs fabricated with 2002-era technology. Obviously there weren't any scrypt ASICs in 2002; but I was able to estimate what their performance and cost would have been.

j-berman 41 days ago [-]
Should have been clearer, thank you!

And thank you for the algorithm!

chaz6 42 days ago [-]
Can this be self-hosted, or does it depend on a 3rd party service? I am extremely anxious about depending on a company that could go out of business or, maybe even worse, be acquired by someone else, leaving users subject to bait-and-switch price rises and data privacy issues.
DVassallo 42 days ago [-]
It can be self-hosted easily on AWS (it uses EC2, S3 and DynamoDB). It’s fully open source, MIT license.
d33lio 43 days ago [-]
Wow this is awesome!