For Aleo, Startup Time is Table-stakes

When I was running Mozilla Firefox engineering, mobile apps had recently been launched by Apple. The Android store was emerging. Android was quickly gaining traction.

I kept in touch with Mark Zuckerberg and Andy Rubin (the guy who started Android, sold it to Google, etc.) to keep my finger on the pulse of the key adopters we had to win to keep the Web a first-class citizen of the Internet.

A big threat for Firefox was closed markets like app stores, which locked the Web out of mobile devices as a platform completely. To put it in perspective, fast forward to today: The Web has been taken over by mobile apps. People now spend more time in native apps than on the Web itself. That is exactly what I had to keep from happening. We all failed. The Web took a bullet.

So, I went to the number one app on the planet at the time: Facebook. I met with Mark Zuckerberg, and I asked him my most important question:

“What would it take for you to use the Web as a platform for the Facebook app instead of native libraries?”

His answer:

“Startup time is terrible. The most important thing for us is to get the content the user cares about in front of their eyeballs as fast as possible, and starting up a Web app on mobile is 50 times slower than just using a native app.”

He was totally right. Just loading a Web page, with all the CSS styling and layout plus the dynamic elements that JavaScript creates, added a huge amount of overhead that slowed everything down. And I also knew at that very moment that the Web was at least two years behind in performance, even with 500 engineers on my team building the Web. I was crushed.

Facebook developers just refused to use the Web because it took too much time to start the app, test the content (i.e., timelines and photos), and then shut down the app (not to mention crashes on mobile, which happen all the time).

App developers (all software devs really) have a critical set of tests called T/s, which means Time for Startup. They often have thousands of specific test cases that are run automatically every time a code change is made, and T/s is checked every single time for regressions. If someone adds an animation to an app's startup splash screen, T/s goes up, and that’s a regression.
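To make the idea of a T/s regression check concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the 5% tolerance, and the choice of the median are mine, not taken from any real test suite. It just times a startup routine several times and flags the change if the result is slower than a recorded baseline.

```python
import time
from typing import Callable


def measure_startup(start: Callable[[], None], runs: int = 5) -> float:
    """Return the median wall-clock startup time in seconds over several runs."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        start()  # hypothetical: whatever launches the app or node under test
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2]


def is_regression(measured: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when startup is slower than baseline by more than tolerance."""
    return measured > baseline * (1.0 + tolerance)


if __name__ == "__main__":
    # Stand-in startup routine; a real check would launch the actual binary.
    elapsed = measure_startup(lambda: time.sleep(0.01))
    print(f"median startup: {elapsed:.3f}s, regression: {is_regression(elapsed, 0.5)}")
```

The median is used here only because it is less sensitive to one slow run than the mean; a real suite might track percentiles instead.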

Every time T/s goes up, thousands and thousands of users are lost. Not kidding. This is real.

What’s the first thing a potential Aleo developer will do? They will download Aleo software and run it. What’s the only software they can run today? snarkOS. That’s right. snarkOS is the ONLY thing they can run, and it’s the FIRST impression that will be made. And their FIRST impression: T/s.

Before a developer can do anything on Aleo, they will have to run a node. And what contributes to startup time for a node? SYNC. It has to pull down the chain before anything will work. Ever.

We just celebrated on Twitter our “One Million Blocks!” and when I read that tweet, I felt a cold chill. That means, before anyone can use Aleo, they have to download 1,000,000 blocks.

Imagine taking that requirement to a Facebook dev whose main priority is app performance, and saying, “Hey, add this feature into Facebook, oh, you’ll need to check out their node first and run it.” And when they do, they see that it literally takes DAYS to start. Aleo is dead. Right. There.

Step one for Aleo: Create a T/s goal of “Sub-sixty second startup time.”

And, really, that’s not enough. It should be sub-ten second, but let’s start with something I think we could hit.
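Some back-of-the-envelope arithmetic shows how far apart "days" and "sixty seconds" are. The sketch below assumes a node syncs block by block from genesis at a constant rate; the 5 blocks per second figure is purely hypothetical, picked only to show the shape of the math, and is not a measured snarkOS number.

```python
SECONDS_PER_DAY = 86_400


def estimate_sync_seconds(block_count: int, blocks_per_second: float) -> float:
    """Time to sync from genesis at a constant (assumed) rate."""
    if blocks_per_second <= 0:
        raise ValueError("blocks_per_second must be positive")
    return block_count / blocks_per_second


def required_rate(block_count: int, target_seconds: float) -> float:
    """Blocks per second needed to finish a full sync within target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return block_count / target_seconds


if __name__ == "__main__":
    blocks = 1_000_000   # from the "One Million Blocks!" milestone
    assumed_rate = 5.0   # hypothetical sync rate, not a measurement
    days = estimate_sync_seconds(blocks, assumed_rate) / SECONDS_PER_DAY
    print(f"full sync at {assumed_rate} blocks/s: {days:.1f} days")
    print(f"rate needed for 60 s: {required_rate(blocks, 60):,.0f} blocks/s")
    print(f"rate needed for 10 s: {required_rate(blocks, 10):,.0f} blocks/s")
```

Under these assumptions, syncing faster alone is unlikely to close the gap, since the chain keeps growing. A sixty-second goal probably means not replaying every block at startup at all.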

In my proposed Aleo Technical Roadmap, that goal is encapsulated in Phase 1: Stability and Scalability, Item 1: Network Optimization and Performance Enhancements, Bullet One: Improve Node Synchronization. It’s table-stakes.

Important: When we were testing Aleo network mainnet requirements, we had to create reproducible tests, and that meant that we had to work around the whole startup time problem entirely. We couldn’t run a test over and over again and each time wait for a sync to finish so we could run the test again. Just wasn’t feasible. So, what did we do?

We built tooling which eliminated the sync problem entirely: We started the entire network from scratch each time using special tools which generated a new genesis block (which could be re-used) and spawned a new topology.

We maniacally chiseled down to just what was required to rapidly start a network, deploy an app, test it, and stop. Rinse, repeat. All done in seconds.
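The start, deploy, test, stop loop can be sketched roughly like this. To be clear, this is not our actual tooling and not a snarkOS API: `LocalNetwork`, `fresh_network`, and `run_case` are hypothetical in-memory stand-ins that only show the shape of the loop, where every test case gets a brand-new network from a reusable genesis block and the network is always torn down afterwards.

```python
from contextlib import contextmanager
from typing import Iterator, List


class LocalNetwork:
    """Hypothetical stand-in for a locally spawned test network."""

    def __init__(self, genesis: str, node_count: int) -> None:
        self.genesis = genesis        # reusable genesis block identifier
        self.node_count = node_count  # size of the spawned topology
        self.running = False
        self.deployed: List[str] = []

    def start(self) -> None:
        self.running = True

    def deploy(self, app: str) -> None:
        if not self.running:
            raise RuntimeError("network is not running")
        self.deployed.append(app)

    def stop(self) -> None:
        self.running = False


@contextmanager
def fresh_network(genesis: str, node_count: int = 4) -> Iterator[LocalNetwork]:
    """Start a new network from genesis and always stop it, even if a test fails."""
    net = LocalNetwork(genesis, node_count)
    net.start()
    try:
        yield net
    finally:
        net.stop()


def run_case(genesis: str, app: str) -> bool:
    """One cycle: start from scratch, deploy, check, stop. No sync involved."""
    with fresh_network(genesis) as net:
        net.deploy(app)
        return app in net.deployed


if __name__ == "__main__":
    for attempt in range(3):  # rinse, repeat
        print(attempt, run_case("genesis-0", "hello.aleo"))
```

The point of the context manager is that no state leaks from one run to the next, which is what makes a test reproducible.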

This enabled us to create other test suites which tested our specific zkApp functionality reliably and repeatably.

Today, without that tooling, a potential Aleo developer would just give up and leave. And Aleo’s network wouldn’t be live on mainnet because that’s what it took for us to find the critical threading issue that was crippling all canary nodes and killing them in seconds from startup, over and over and over again.

This is why developer tools are critical.
