Startup Engineers and Our Mistakes with MongoDB

MongoDB got rave reviews for its usability. But other features mattered too when choosing a database for a growing startup.

This is part two of a three part series. You can see Hacker News comments on this post or read the rest of the three part series.

Building a startup is like eating nails. It requires insane levels of perseverance, tolerance for ambiguity, and a strong work ethic. There are critical internal debates to ensure the team is on the right track. What feature should we build next? How will we attract more users? How will we get revenue or funding to keep the lights on?

In early 2012, I heard a much different startup debate: should we switch our database to MongoDB? This company had chosen Postgres, a “traditional” relational database. Now, their lead engineer was adamant that switching to MongoDB was critical for success.

Source: Podeschin, Oleg. Lean and mean MongoDB (Slideshare)

A number of top companies (Etsy, Foursquare, and many other startups) were “using”¹ it, and his recent meetups and friends all indicated that MongoDB was the future. The developer especially cited future scaling needs, the coming obsolescence of SQL, and engineer recruiting. He proposed to integrate MongoDB at the cost of several weeks of the roadmap for his tiny team.

Sadly, he misestimated how much work the still evolving MongoDB would take to learn and integrate, costing much more time than he realized. And even if the company was tremendously successful, the scale provided by most NoSQL databases would be immaterial, given his company’s product. Just as importantly, he didn’t know the tradeoffs he was making by choosing MongoDB.

In this case, the cost was relatively painless: simply a few wasted weeks and thousands of investor dollars. On the other hand, I know a number of teams that had much more challenging issues.

In Part 1, we looked at how the NoSQL hype enabled the early success of MongoDB. Now, let’s look at MongoDB’s benefits and the mistakes some startup developers made when choosing MongoDB in 2012. Understanding past issues can give us context about how to make better engineering decisions in the future.

The Benefits of Mongo in the early 2010s

10gen’s key contribution to databases, and to our industry, was its laser focus on four critical things: onboarding, usability, libraries, and support. For startup teams, these were important factors in choosing MongoDB, and a key reason for its powerful word of mouth.

Their engineers and product marketers thought deeply about how to make MongoDB dead simple to install and get started. Their team smartly chose a Javascript-like DSL and a JSON-like data structure which increasingly mattered with the hyper growth of Node JS. The community supported 10gen, with Mongoose becoming one of the top database layers in the early Node ecosystem — and a critical differentiator.

Their support team was smart, responsive and plain nice, with 10gen correctly seeing the quality of their support as a key strength (“Support is the new marketing”). Typifying their support mindset and desire to understand user concerns, MongoDB’s CTO and cofounder Eliot Horowitz was gracious in answering questions as part of this series. And their engineers, technical writers, and designers created documentation and libraries that were a pleasure to read and use.

Supported by 10gen sponsorships, MongoDB became a default in many hackathons as the lack of schema made it simple to quickly iterate. Not having to worry about migrations made life significantly easier in the early stages of a project. The Javascript-like DSL and great Node support made it well suited to those who didn’t have much experience with SQL. For coding academies, this also meant you could teach a single language and — with Node — enable front end developers to quickly work across the stack.

And yet, when choosing a database for a growing startup, startup engineers should have also weighed other considerations.

Mistakes with Mongo

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Feel free to remix or share it.

The importance of schemas, relationships, and transactions

In 2012, critical needs that startup engineers often ignored included:

Schemas: Schemaless does not mean no schema; instead, it means an implicit schema in the app (a particularly challenging misnomer for anyone outside our industry)
Relational Structures: Relational data is common and generally suited to relational databases (MongoDB’s CTO disagrees with this statement, arguing that nearly 90% of database installations today would benefit from being replaced with MongoDB)²
Transactions: A number of use cases, especially financial ones, do benefit from transactions

By far the most consistent mistake was choosing a non-relational database, when your data was strongly relational. Mongoose’s ODM made this mistake surprisingly easy to make, which led to issues down the line.

To pick on another, transactions are critical in a number of financial use cases, which caught a number of teams by surprise. In the case of Flexcoin, an attacker created many concurrent withdrawals against a single balance, bankrupting the company, though there are debates about whether this was due to a mistaken use of MongoDB. UC Berkeley’s Professor Joseph Hellerstein suggests that rather than thinking NoSQL, we think of “No transactions” instead.
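To make the transaction point concrete, here is a minimal Python sketch of the check-then-act race behind concurrent withdrawals. It is not a reconstruction of Flexcoin’s code or of any MongoDB API: the in-memory `Account` class and both function names are hypothetical, and the lock merely stands in for whatever atomicity a transaction or a conditional update would provide.

```python
import threading


class Account:
    """Hypothetical in-memory account standing in for a stored balance."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_atomic(self, amount):
        """Check and debit as one indivisible step."""
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def racy_withdrawals(start_balance, amount, requests):
    """Replay the unsafe interleaving: every request reads the balance
    before any request writes its result back."""
    stale_reads = [start_balance] * requests
    balance = start_balance
    paid_out = 0
    for seen in stale_reads:
        if seen >= amount:           # check passes against the stale read
            balance = seen - amount  # last write wins
            paid_out += amount
    return balance, paid_out


def atomic_withdrawals(start_balance, amount, requests):
    """Run the same requests concurrently with an atomic check-and-debit."""
    account = Account(start_balance)
    outcomes = []

    def worker():
        outcomes.append(account.withdraw_atomic(amount))

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    paid_out = amount * sum(outcomes)  # True counts as 1
    return account.balance, paid_out
```

With a balance of 100 and five requests for 100 each, the racy replay pays out 500, while the atomic version pays out 100 and rejects the other four requests.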

MongoDB also had some key constraints in 2012 that many teams were unaware of, such as the global lock, “unsafe” writes, or the challenges with getting sharding right. (MongoDB’s CTO acknowledges the challenges with the “unsafe” writes, saying it would be the key thing he would do differently, but also notes that few teams had issues with them; early 10gen customers explicitly valued this behavior, even though it was different from most other databases.)

For startups, the database decision makers were application developers (not ops veterans) who favored immediate productivity. Unfortunately, short-term productivity was outweighed by many medium-term issues.

Over time, many startups created application code to impose an implicit schema, which would require ever greater maintenance costs. And optimistic visions of switching over to a relational database when the company grew were often shelved due to the challenges of migrating the data store.
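As an illustration of what an implicit schema in application code can look like, here is a small Python sketch. The field names (`email`, `name`, `full_name`, `plan`) and the `normalize_user` helper are invented for this example and do not come from any team I spoke with; the point is only that every rule a database schema would have enforced now lives in code that someone has to maintain.

```python
def normalize_user(doc):
    """Coerce a stored user document into the shape the app expects.

    Each branch below is a schema rule that lives in application code
    because the data store does not enforce it.
    """
    email = doc.get("email")
    if not isinstance(email, str):
        raise ValueError("user document is missing a string 'email'")
    return {
        "email": email.strip().lower(),
        # Hypothetical history: older documents used 'name',
        # newer ones use 'full_name'.
        "full_name": doc.get("full_name") or doc.get("name") or "",
        # Field added later; older documents simply lack it.
        "plan": doc.get("plan", "free"),
    }
```

Every reader of the collection needs to apply the same rules, and each new field or rename adds another branch, which is one way the maintenance cost grows over time.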

I’m empathetic to the desire to use a single data structure across one’s stack. One 10gen engineer made this point in analogizing SQL to Cobol, arguing that “SQL is Annoying”:

Source: Karpov, Valeri. The MEAN Stack: MongoDB, ExpressJS, AngularJS and Node.js (Slideshare)

Still, this is dangerous, and frankly daft, if it’s the primary decision criterion for a growing company. Some liken choosing MongoDB to choosing a dynamic language over a static one. And yet, a dynamic language in beta can be a problematic choice until it matures, especially if it targets a different use case than the one you are facing.³

MongoDB did provide tremendous value to some, but the cost for many early stage startups often outweighed the very real short-term benefits. MongoDB’s CTO disagrees, arguing that many startups were only able to survive and grow due to the value MongoDB provided in the early 2010s, and that these successes far outstripped the very few who had issues.

Someone Else’s Solution for Your Problem

Solutions that benefit others may not work for your own, widely different, environment. Startup engineers need to differentiate between what is good for a side project or a company with internet scale versus what’s appropriate for their startup.

Large tech companies have a unique set of challenges, and their solutions — like NoSQL to manage the huge data flows they face — may not apply to you. And yet, much of the excitement in technical circles will be about technology these companies pioneered for their own unique needs.

Professor Hellerstein makes this point in the context of MapReduce, though he could well be speaking about NoSQL⁴:

The thing is there are five companies in the world that run jobs that big … People got … Google mania in the 2000s: we’ll do everything the way Google does … The reasoning was … they’re supposed to be smart, but they had a different problem than most people had. They were optimizing for different things than most people should be optimizing for.

According to 10gen, the majority of MongoDB users still don’t have large enough datasets to shard — though some do value the option to.

Even when customers do use a tool, marketing testimonials neglect to mention that it is often used in a less important system or is simply an experiment. Often, prestigious tech startups will try new tools in a non-core system and publicize the experiment, partially because it helps recruiting. The broader message that startup engineers can hear is “X company believes strongly in Y tool, and you should consider switching.”

Another common mistake for first-time startup engineers (especially for those who’ve previously worked at products with massive scale) is to overweight performance or scalability in tool choices. Solving these issues early — such as by using a brand new NoSQL tool with still maturing norms — is often costly and doesn’t generally increase the odds of startup success (MongoDB’s CTO disagrees with me here, arguing that the JSON-like data store was transformational for their users). By the time they are a huge bottleneck, you often have huge financial and engineering resources to attack them.⁵ Premature scaling is a common technical trap in startups.

On the flip side, choices made in a prototype have a habit of persisting far beyond their expected life. As such, there’s a balance between favoring myopically early productivity and — at the other extreme — trying to prematurely solve problems that might not cause issues for years.

Early usability and later scalability were both key explanations startup teams gave for choosing MongoDB.

Taking the Time to Learn a New Tool

Whenever a new tool appears, we apply old ways of thinking to the new tool. You may try to use NoSQL databases exactly how you used your SQL database, which is a recipe for failure. You can’t use new tools without setting aside the time to truly learn them.

For example, a common mistake was not understanding how data models should be different in document databases.
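A rough Python sketch of the difference, using plain dictionaries rather than any database driver. The author and post data are made up, and `rename_author` is a hypothetical helper: in a document model you often embed a copy of related data so one read serves a page, and in exchange your code has to keep those copies consistent.

```python
def relational_style():
    """Each fact is stored once and joined on read."""
    authors = {1: {"name": "Ann"}}
    posts = [
        {"id": 10, "author_id": 1, "title": "Hello"},
        {"id": 11, "author_id": 1, "title": "Again"},
    ]
    return authors, posts


def document_style():
    """Author data is embedded in each post document."""
    return [
        {"_id": 10, "title": "Hello", "author": {"id": 1, "name": "Ann"}},
        {"_id": 11, "title": "Again", "author": {"id": 1, "name": "Ann"}},
    ]


def rename_author(docs, author_id, new_name):
    """With embedded copies, a rename must touch every document holding one.

    Returns how many documents were changed.
    """
    changed = 0
    for doc in docs:
        if doc["author"]["id"] == author_id:
            doc["author"]["name"] = new_name
            changed += 1
    return changed
```

In the relational layout a rename is a single update to `authors`; in the embedded layout it is one update per post, and a missed copy leaves stale data behind.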

To help others, Russell Smith⁶ created a great overview of other gotchas in late 2012, including:

Manually turning on authentication and encryption (not on by default) and turning off remote connections
Ensuring an odd number of replica set members (not an even number)
Being aware of the asynchronous, “unsafe” writes
Understanding the 2 GB data limit of the 32-bit MongoDB
Using the official repository, rather than package managers given how quickly MongoDB was changing
(and many others)
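The odd-number advice in the list above follows from majority voting: a replica set needs a majority of its voting members to elect a primary. The arithmetic can be sketched in a few lines of Python. This is a simplification that assumes every member votes and ignores arbiters and priorities; the function names are mine, not MongoDB’s.

```python
def majority(members):
    """Smallest number of votes that is more than half of the members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority can still be formed."""
    return members - majority(members)
```

Going from three members to four raises the majority from two to three but still tolerates only one failure, so the extra even-numbered member adds cost without adding fault tolerance; a fifth member is what raises the tolerance to two.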

His list should have been required reading for anyone planning to use MongoDB in production, and yet many of the teams I saw didn’t know these critical details. (MongoDB’s CTO mentions that the documentation was always up to date.)

Server Density’s David Mytton goes through his own long list of mistakes seen in the wild and then highlights the broader issue:

[Many] of the problems cited in these posts seem like basic mistakes in deployment and understanding.

With any product, if you decide to deploy it to production you need to be sure you fully understand its architecture and scaling profile.

He and Russell argued that the community concerns about MongoDB in 2012 were “misguided”: after the initial hype, there were a number of angry posts on HN. But it isn’t unfair to expect saner defaults and better upfront communication before starting to pitch a product at hackathons or to junior developers as the future of web development.⁷ It also might cause issues for some junior engineers if you pitch a database lacking transactions at a FinTech hackathon without appropriate caveats.⁸

Any new foundational technology takes time to learn, which makes it a challenging choice for consistent developer productivity. This was especially true if your startup had limited technical resources and little time to truly learn how to use MongoDB.

Other Issues

Beyond these core failures, there were others:

Choosing modern technologies simply because that’s what engineering leads thought it took to recruit “good” engineers⁹
Spending precious time learning new dev tools, rather than focusing on what technical work is most critical to the success of a startup
Choosing a database for one use case, and then not changing when the company pivoted to a wholly different use case (see early Bitcoin exchanges)

While this story is about MongoDB, the broader message is that fashionable choices made without deep reflection can put startups at risk.

Making Better Engineers

I’ll highlight a few broader points that underlie these engineering mistakes.

First, it’s challenging to make engineering decisions based on what is popular on Hacker News or Reddit, such as the posts around NoSQL, the MEAN stack, or even the later anger at MongoDB. Social networks are a key input into engineering decisions early in your career or when you’re working on a tiny team (such as a startup). As engineers, we need to understand the issues with doing this.

On one side, many different types of software engineers congregate together in social media — and what is right for one is not right for the other. We don’t consistently discuss our use cases (or our experience/background) when commenting on a technology choice, instead often arguing that a technology choice is good or bad.

Think about how the decision of the “right” database changes if you’re using it for just a hackathon/side project, or for logging, or for a webscale product at Google. Or how about if you knew a blog post writer was knocking a technology they had barely used or was opining on a part of the stack where they had little experience or worked at a competing company?

Many of the best engineers I know spend their time on engineering problems, not blogging — which limits the amount of great content available. It’s also hard to blog about failures when it impacts your company’s reputation or your own job prospects, which further dictates what content is available. Few of the teams I talked to with MongoDB issues were willing to go on record.

On the other side, it’s easy to game social media, which dev tool vendors and engineering marketers know. A passionate few can get favored topics upvoted, and vendors will reach out to proxies to write or share their message (as we’ll see in detail next time).¹⁰ Cornell’s Professor Emin Gün Sirer, who contributes to his own competing database, explicitly blames NoSQL vendor marketing strategy in engineering social media for the issues with MongoDB: “Engineers did what anyone would do after reading one too many astroturf articles on Hacker News.”¹¹

Democracy (upvotes or commenting, with everyone’s vote treated equally) is not the right choice for many forms of learning, even though it is the standard for Reddit-type networks. Can you imagine for example, if we taught high school and university engineering students based on student upvotes alone? We’d naturally favor sexy technology and make it easy for organized parties that personally benefit to set our agenda, not favor the content that makes the most thoughtful engineer. And yet, for some, this is a key part of how engineering is learned.

Second, coding bootcamps and online programs increasingly teach a substantial minority of developers, with nearly a third of new software engineers coming from bootcamps in 2016. They play a critical role in determining how engineers think and what technologies succeed, and I’m heartened by the new perspectives that their students bring.

My worry is that some bootcamps themselves favor technology choices (which change quickly), rather than core concepts. Learning concepts, rather than quickly changing DSLs, lets you reason from first principles. Concepts can inoculate you against weak engineering arguments, while future-proofing your career so you can navigate future changes.

For example, there’s value in learning about basic build automation concepts generally (and Make specifically), even in this Webpack/Gulp/Grunt era. Anyone who has taken the time to understand the basics of build automation and survey the tradeoffs of the various tools will find it much easier to assess future tools. This is also a reason why coding bootcamps should spend at least some time teaching SQL and the difference between transactional and non-transactional databases, as these are important paradigms that their students will likely encounter.

Even though university curriculums are often pilloried for lagging behind, the flip side is that they can more effectively withstand hype — and professors aren’t as easily swayed by marketing or focused on placement rate. And even then, the best university programs focus on computer science, without spending adequate time on the art of software engineering.¹²

There are broader lessons that need to be taught to all software engineers, from thinking in tradeoffs rather than good/bad to valuing boring technology. Cloudflare’s CTO John Graham-Cumming notes, “The only useful piece of advice I can give a younger developer is… be careful when drinking the newtech koolaid.” Every single senior engineer I know would attest to his statement, and all of us have the scars to show where these lessons came from.

Finally, some commenters on HN were scared to challenge technology decisions because of the risk to their job prospects. In my experience, the best engineering managers¹³ look for critical thinkers, not those who thoughtlessly extol or hate new technology. I’ll also argue that great engineers make thoughtful assessments, and revisit assumptions when the data changes. Our teams are not well served by either keeping quiet or having strongly held views that can’t be revisited (including a view like “never use MongoDB”, once many issues were fixed).

Next Time

In the final part of this series, we’ll look into a critical part of MongoDB’s success in the early 2010s: marketing and testimonials.


This essay is based on several years of informal discussions, interviews with key stakeholders, parsing countless blog posts/presentations, and reading 3,000 HN comments. To dig in more, you can see select commentary excerpts and my other thoughts. All opinions are solely my own. I welcome feedback (email mongodb at this domain).

Thanks to Mathieu Jouhet for countless hours spent on design and to Shay Maunz for edits. I especially have to thank the many software engineers who shared their experiences and provided feedback.

As we’ll see, “using” a database in a side project or a non-critical system is far different from using the same database as the primary data store. ↩
And if you duplicate data in a document database, maintaining consistency is up to you. ↩
If there was a Maslow’s hierarchy for databases, usability would be trumped by many other more critical needs (such as ACID guarantees). Still, a critical lesson for open source developers is the growing demand for usability in dev tools. ↩
Hat tip to Ozan Onay for pointing me to this lecture. ↩
Sam Altman makes a similar point in his Startup Playbook:
> One is that if the company is growing like crazy but everything seems incredibly broken and inefficient, everyone worries that things are going to come unraveled. In practice, this seems to happen rarely (Friendster is the most recent example of a startup dying because of technical debt that I can point to.) Counterintuitively, it turns out that it’s good if you’re growing fast but nothing is optimized—all you need to do is fix it to get more growth! My favorite investments are in companies that are growing really fast but incredibly un-optimized—they are deeply undervalued. ↩
Russell and I were in the same Y Combinator batch, but I don’t recall us chatting at the time or afterward. ↩
Marketers also are effective at pitching their content in ways that their audience wants to read (care about blockchains or microservices? MongoDB has you covered). ↩
And of course, there are a number of fintech use cases where you don’t need transactional databases. But at the same time, if you’re a fintech junior developer iterating on ideas, you have to know the difference between transactional and non-transactional databases. ↩
In fact, by favoring the shiniest technology, you may be recruiting developers who will consistently switch when newer technology appears. ↩
Here’s one recent, unrelated example on Reddit. ↩
While there was a lot of anger at MongoDB on HN, there was also a lot of content around NoSQL and the MEAN stack. ↩
Thanks to Ozan Onay for pointing this out. That said, the university model of computer science teaching also struggles to stay current when secular shifts happen. ↩
Though recruiters may ignore this. ↩