AI is going to get really, really good at writing code. It is also thousands of times cheaper than humans. What does this mean for the software profession?
This is a kind of companion piece to my recent CACM article, The End of Programming. In that article, I asked the question of what happens when writing code gets replaced by training AIs. In this one, I’m looking at the near-term implications of AI getting really good at writing software for us.
It seems clear that AI for generating, debugging, and refactoring code is only going to get much, much better in the next couple of years. I don’t think this is controversial, but to bolster the argument a bit: (1) CoPilot is already friggin’ amazing at generating new code given appropriate context and prompting; and (2) There is nothing but more data and bigger models standing in the way of CoPilot getting an order of magnitude better.
Before using CoPilot regularly, I thought it was only going to parrot back solutions to toy undergraduate homework assignments. What I have found is that, in practice, CoPilot reads my mind way more often than it ought to. I’ll be thinking of the next code to write — say a unit test or the next few lines of some code manipulating some data structure or whatever — and CoPilot just magically writes the code that I was about to type. It’s nuts.
So, what happens when AI code generation gets 100x better than it is today?
From PRD to Code in 0.001 seconds
I’ve heard a tremendous amount of skepticism that AI can do a good job writing code, doing code reviews, or fixing bugs: it’s just too hard a problem, and AI tools currently kinda suck and will continue to suck. I think this is a classic example of humans being really bad at extrapolating from recent data points.
My guess is that in the not-too-distant future — maybe 3 years — it will be possible to instruct an AI to take a high-level, natural language spec of a piece of software — a PRD, or a bug report, or a Slack thread, say — and generate “perfectly fine” code from it. What do I mean by “perfectly fine”? Well, first of all, let’s get serious for a minute about the quality of code produced by typical human dev teams. I’ve worked at places like Google and Apple, and let me tell you, the code quality there is, ahem, not always the best. Even at Google, where code review is a religion, style guides are the gospel, and every team’s code is open to scrutiny and sometimes exploitation by every other engineer at the company — I have to say, there is a lot of shitty code. (Maybe less than when I left, not that there is, er, any correlation.)
We need to get away from this idea that human software teams are somehow capable of writing correct and performant code, if only they put enough time and energy into it. This has never been true in any non-trivial software project. In the real world, we tolerate a huge amount of slop and error in human-produced software, so why set expectations for AI-generated code any differently?
By 2026, will we fully trust AI to the point where we can just let it spew out code without any input from red-blooded humans? Probably not for a while, but fortunately, we can have humans review the AI-generated code and iterate on it (with AI assistance). Even if the AI-generated code is more difficult to scrutinize than human-written code, this is going to be orders of magnitude more cost-effective and time-efficient than relying on slow humans to do the coding.
AI is way, way, way cheaper than humans.
How much would an AI-powered “software team” cost?
Let’s use the current pricing for the GPT-3 davinci model, $0.02/1K tokens, as a conservative benchmark (this price is bound to go way, way down over time). Let’s say that a typical human software engineer produces about 100 lines of checked-in new or changed code every day. (I’m not counting all of the abandoned or experimental code that never lands on the main branch.) Yes, this number is completely made up, but even if I’m off by a factor of 10, or even 100, my point is still valid. A typical source file line is around 10 tokens, so that equates to about 1,000 tokens a day.
(To ground this a bit, I took a typical source file from our codebase consisting of a few hundred lines of pretty tight Python code and tokenized it using OpenAI’s Codex tokenizer. This yielded an average of 9.55 tokens per line. So my guess of 10 tokens a line seems like a reasonable estimate.)
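If you want to run the same sanity check on your own code, here is a minimal sketch of the measurement. Big caveat: crude_tokenize below is a hypothetical stand-in based on a regex, not OpenAI’s Codex tokenizer, so its counts will differ from the 9.55 figure above. The function takes the tokenizer as a parameter so you can plug in a real one. Skipping blank lines is also my assumption, not necessarily how the original number was computed.

```python
import re
from typing import Callable, List


def crude_tokenize(line: str) -> List[str]:
    """Hypothetical stand-in tokenizer (NOT the Codex tokenizer).

    Splits a line into runs of word characters and single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", line)


def avg_tokens_per_line(
    source: str,
    tokenize: Callable[[str], List[str]] = crude_tokenize,
) -> float:
    """Average token count over the non-blank lines of a source file."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(len(tokenize(ln)) for ln in lines) / len(lines)


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    # 8 tokens on the first line, 4 on the second, under the crude tokenizer.
    print(avg_tokens_per_line(sample))
```

A real subword tokenizer also spends tokens on indentation and splits long identifiers, so expect it to report more tokens per line than this stand-in does.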
GPT-3 bills by both the input and the output tokens, so let’s assume for the sake of argument that the input context to a future CoPilot-powered software creation agent would be, say, 5x the size of the eventual code output (again, spitballing here). This equates to 5,000 tokens of input plus the aforementioned 1,000 tokens of output, 6,000 in all. At $0.02 per 1K tokens, that means using GPT-3, with its current pricing, costs a whopping $0.12 to generate the same amount of code as a human engineer would in a day.
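For anyone who wants to plug in their own guesses, the arithmetic above can be sketched in a few lines of Python. All of the defaults are the made-up numbers from the text (100 lines a day, 10 tokens a line, 5x input context, $0.02 per 1K tokens); the function and parameter names are just mine for illustration.

```python
def daily_ai_cost(
    lines_per_day: int = 100,          # guess from the text
    tokens_per_line: int = 10,         # rounded from the measured 9.55
    input_to_output_ratio: int = 5,    # spitballed context size
    price_per_1k_tokens: float = 0.02, # GPT-3 davinci price quoted in the text
) -> float:
    """Dollar cost of generating one engineer-day of code, billed on input + output."""
    output_tokens = lines_per_day * tokens_per_line
    input_tokens = output_tokens * input_to_output_ratio
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    print(f"${daily_ai_cost():.2f} per day")  # $0.12 per day
```

Change any one of the defaults and the answer scales linearly, which is why being off by a factor of 10 on lines per day only moves the cost from 12 cents to $1.20.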
How much does a human software engineer cost per day? Well, if they drink as many Diet Cokes as I do, that is already way more than 12 cents. Conservatively assuming an all-in annual cost of $300K (including salary, benefits, Diet Cokes, the lot), with about 250 working days in a calendar year, a typical software engineer costs about $1,200 a day. In other words, 10,000x more than an AI doing the same job.
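Here is a rough sketch of that comparison, plus a check on my earlier claim that the point survives even if my numbers are off by a factor of 10 or 100. The $300K, 250 days, and $0.12 figures come from the text. The ai_cost_multiplier parameter is a name I made up: it models the case where the AI needs 10x or 100x more tokens than I guessed to match a human’s daily output, which is only one way of reading "off by a factor of 10".

```python
def human_cost_per_day(annual_cost: float = 300_000, working_days: int = 250) -> float:
    """All-in daily cost of one engineer, using the figures from the text."""
    return annual_cost / working_days


def cost_ratio(ai_cost_per_day: float = 0.12, ai_cost_multiplier: float = 1.0) -> float:
    """How many times more a human costs than the AI for one day of code.

    ai_cost_multiplier is a hypothetical knob: 10 means the AI estimate
    was too optimistic by a factor of 10.
    """
    return human_cost_per_day() / (ai_cost_per_day * ai_cost_multiplier)


if __name__ == "__main__":
    print(f"Human: ${human_cost_per_day():,.0f} per day")
    for multiplier in (1, 10, 100):
        ratio = cost_ratio(ai_cost_multiplier=multiplier)
        print(f"AI estimate off by {multiplier}x: human is still ~{ratio:,.0f}x more expensive")
```

Even in the worst case here, with the AI estimate off by 100x, the human still comes out around 100 times more expensive.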
I would take an AI that generates code roughly on par with a human engineer, which costs 10,000x less, and produces results roughly 10,000x faster — a cost and efficiency savings of 100 million times — any day of the week. And that’s just to replace one human engineer. (Sorry, Rob on the frontend team, you’ve been automated!) You’d be crazy to hire humans to do this job, once the AI gets good enough.
What happens to the art of programming?
If you believe even a tenth of the above, it seems clear that software engineers are an endangered species. But what happens to the art and science of developing computer software when humans are no longer in the loop?
In any software system, there are tradeoffs to be made in terms of code complexity, generality, performance, and time-to-completion. So even if we fire all of our devs and replace them with CoPilot 2030, we’ll still need a way for the (probably still human?) PMs and the AI-based dev teams to have a dialog about the tradeoffs and options when building a software solution.
Wait… What the hell am I talking about?!? If an AI is writing the code, who cares how long it will take to write — the result will be instantaneous, no matter what you ask the AI to do. You can get a fully-general, high-performing solution in the same amount of time that it would take to generate a rough-and-ready prototype. And it no longer really matters how complex the resulting code is, since it no longer needs to be maintainable in the conventional sense. If the code you shipped on Tuesday ends up not doing the job very well on Wednesday, just fire up the ol’ AI and get 1000 new versions in a matter of seconds.
The point is that all of our human-centric notions of The Art of Software Engineering go out the window when humans are no longer writing and maintaining the code. So much of what we do as programmers — the stuff that takes the most time — is contrived work to make things easier for slow and error-prone humans: writing comments, structuring code to facilitate later refactoring, making the code more general-purpose so it can be reused. The thing is, all of this “extra” work is totally unnecessary if code is primarily generated and maintained by machines.
Clearly, AI coding tech has a way to go to get to this point, but I don’t think it’s crazy to imagine AI-generated code being the norm within the next few years. The same K-T Event that hit the art world with Stable Diffusion is likely to hit the software world before very long, given the huge cost and time savings. What we need to figure out is what the post-AI software industry looks like and what we can do now to get ready for it.