9 Dec

Turning AI Toward Software Itself

I have been a longtime Nachi Nagappan fanboy. Nachi’s research in the Empirical Software Engineering Group (ESE) at Microsoft Research has focused on studying code itself. Years ago, when I was a budding PM at Microsoft with a feature idea, he explained to me that his team already had bots crawling large piles of code and recording various data points about it, building maps of the data features of the code itself. Cool, huh?

This article looks at the notion of software forensics, patterns, empirical software excellence, and the emergence of AI as a way to study these topics more deeply than a human can.

Software Forensics

I have always wanted to take Nachi’s ideas a step further: correlate code metrics, especially quality-related ones, with points in time where we can see outside-the-editor influences on our code. I call this software forensics.

A simple example is my very real hypothesis that code quality on a right-sized team (3-9 members) drops when a new manager is hired. If the manager is a good one, the code quality metrics will come back up and eventually surpass the team’s earlier measurements. We could also measure how long that improvement takes and compare leaders against each other, which would let us identify, and learn from, managers who are:

  1. Positively impactful to the code of the team
  2. Quickest at helping their new team improve
  3. Perhaps not best suited to be managers
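To make the idea concrete, here is a minimal sketch of the core forensics measurement. It assumes you already have a time series of some quality score per team (higher is better) and the date of an outside-the-editor event, such as a new manager starting. The score, the dates, and the function name recovery_days are all hypothetical; a real study would need a carefully chosen metric and far more data than three baseline samples.

```python
from datetime import date
from statistics import mean

def recovery_days(samples, event, window=3):
    """Measure the dip and recovery of a quality score around an event.

    samples: list of (date, score) tuples sorted by date; higher score is better.
    event:   date of the outside-the-editor event (e.g. a new manager starting).
    window:  how many pre-event samples form the baseline.

    Returns (baseline, dip, days), where days counts from the event to the first
    sample at or after the lowest point that is back at baseline, or None if
    the score never recovers.
    """
    before = [score for day, score in samples if day < event]
    after = [(day, score) for day, score in samples if day >= event]
    if not before or not after:
        raise ValueError("need samples on both sides of the event")

    baseline = mean(before[-window:])
    # Locate the lowest post-event score; recovery is only counted from there on.
    low = min(range(len(after)), key=lambda i: after[i][1])
    dip = after[low][1]
    for day, score in after[low:]:
        if score >= baseline:
            return baseline, dip, (day - event).days
    return baseline, dip, None

# Hypothetical monthly quality scores for one team.
SAMPLES = [
    (date(2018, 1, 1), 0.80),
    (date(2018, 2, 1), 0.82),
    (date(2018, 3, 1), 0.81),
    (date(2018, 4, 1), 0.70),  # hypothetical new manager starts this month
    (date(2018, 5, 1), 0.74),
    (date(2018, 6, 1), 0.83),
]
NEW_MANAGER_START = date(2018, 4, 1)

baseline, dip, days = recovery_days(SAMPLES, NEW_MANAGER_START)
print(f"baseline={baseline:.2f} dip={dip:.2f} recovered after {days} days")
# baseline=0.81 dip=0.70 recovered after 61 days
```

Running the same function across many teams and managers would give the comparison described above: how deep each dip goes and how quickly each leader gets the team back above its old baseline.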

I’m not claiming we can boil all the good and bad of being a team leader down to empirical data points, but I am asserting that organizational behavior affects code. Things happening outside the editor deeply affect what happens in the editor. This is an extrapolation of Conway’s Law, which holds that organizations design systems that mirror their own communication structures; if you don’t already know it, it is well worth looking up.

Patterns – Nature or Nurture?

Searching code for desired patterns is a pretty hard problem. For example, let’s say I have a 500,000-line codebase in a statically typed language. We know a “good” pattern to find would be the factory pattern, which we use when creating complex objects. We program code crawlers to look for specific implementations of the factory pattern, and when they find this pattern or others we already know, we call the code “well-formed.” That is to say, the code is considered good in terms of maintainability and quality because it uses known, standard patterns.
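A code crawler for this does not have to be exotic. The sketch below uses Python’s standard ast module, rather than a statically typed language, purely to keep it short, and it applies a deliberately naive heuristic of my own: a function counts as a factory candidate if it returns instances of two or more different classes. The function name find_factory_candidates is hypothetical, and real factory detection would need type information and far more nuance.

```python
import ast

def find_factory_candidates(source: str) -> list[str]:
    """Return names of functions that look like simple factories.

    Heuristic (an assumption, not a formal definition): a function is a
    candidate when its return statements construct two or more distinct
    CapitalizedName classes, i.e. it chooses which class to instantiate.
    Note that ast.walk also visits nested functions, which is fine for a sketch.
    """
    tree = ast.parse(source)
    candidates = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        constructed = set()
        for child in ast.walk(node):
            # Look for `return SomeClass(...)`, using capitalization as a proxy for "class".
            if (isinstance(child, ast.Return)
                    and isinstance(child.value, ast.Call)
                    and isinstance(child.value.func, ast.Name)
                    and child.value.func.id[:1].isupper()):
                constructed.add(child.value.func.id)
        if len(constructed) >= 2:
            candidates.append(node.name)
    return candidates

SAMPLE = '''
class Circle: pass
class Square: pass

def make_shape(kind):
    if kind == "circle":
        return Circle()
    return Square()

def area(r):
    return 3.14 * r * r
'''

print(find_factory_candidates(SAMPLE))  # ['make_shape']
```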

The Gang of Four book, Design Patterns: Elements of Reusable Object-Oriented Software, cataloged and popularized many of the formal patterns still in use today. What it described, along with the many me-too books and training courses that came after, was a way to get code under cognitive control quickly and easily: “Ah yes! A service broker, I know what that is!”

This march toward design patterns has been invaluable, and it has climbed to higher and higher levels of abstraction, to the point that we now have cloud design patterns. As effective as patterns are at wrangling code and system configurations, how do we know which patterns are best for the problems they seek to solve? What if there are patterns that exist but that we haven’t discovered yet? Maybe some patterns span a codebase too large for us to get our minds around all the factors contributing to a mega-pattern (more than 11 dimensions of thought)? Normally this is where we would reach for an abstraction layer, but we build those to help our human brains, not the end software or the compiler.

Exploring Code with Software

Higher-end IDEs typically provide refactoring tools that recognize opportunities to refine code into a “better” form. These are very helpful tools, and I depend on them to help teach me to be a better developer. But does the compiler care?

Another simple example of exploring code to help its creators is a GitHub feature I’ve found useful for my JavaScript applications hosted on the site. I may go to my project page and see a message like, “You’ve got a dependency on a risky package.” That feature required looking for patterns in code, and it definitely counts as software forensics as far as I’m concerned. I don’t know exactly how GitHub extracts this data (it seems like a simple pattern match at scale), but it’s hugely useful, and it was enabled by software reading code.
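If it really is a pattern match at scale, the core check might look something like the sketch below: read package.json, look up each dependency in an advisory list, and flag versions older than the patched release. This is not how GitHub implements the feature; the advisory entries, package names, and helper names are made up, and only plain major.minor.patch versions with an optional ^ or ~ prefix are handled.

```python
import json

# Hypothetical advisory data: package name -> first patched version.
# Real services rely on curated vulnerability databases; these entries are invented.
ADVISORIES = {
    "example-left-pad": "1.3.0",
    "example-parser": "2.0.1",
}

def parse_version(spec: str) -> tuple[int, ...]:
    """Turn '^1.2.3', '~1.2.3', or '1.2.3' into (1, 2, 3).

    For ^ and ~ ranges this is the lower bound of the range, so the check
    below is conservative. Other range syntaxes are out of scope here.
    """
    return tuple(int(part) for part in spec.lstrip("^~").split("."))

def risky_dependencies(package_json: str, advisories=ADVISORIES) -> list[str]:
    """Flag dependencies whose declared version predates the patched version."""
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    flagged = []
    for name, spec in sorted(deps.items()):
        patched = advisories.get(name)
        if patched and parse_version(spec) < parse_version(patched):
            flagged.append(f"{name}@{spec} (patched in {patched})")
    return flagged

manifest = '{"dependencies": {"example-parser": "^1.4.0", "example-left-pad": "1.3.0"}}'
print(risky_dependencies(manifest))  # ['example-parser@^1.4.0 (patched in 2.0.1)']
```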

Finally, SonarQube is a fantastic tool to put into your build and deployment pipelines. It finds code smells, anti-patterns, potential exception states, and other good and bad things about code. It also provides a dashboard to visualize patterns in the code and potentially gamifies code excellence. I digress.

Adding a Dash of AI

These tools are a great start, but what will happen as we begin to apply AI to our code?

What if undiscovered patterns are out there, waiting to be exploited? What a great opportunity to apply machine learning or some other form of AI to find those patterns of effectiveness, both in static codebases and at runtime. An AI might be able to show the negative impact of a valued design pattern when it is bloated with data or costly to instantiate, or what have you.
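As one illustration of finding patterns we don’t know about, unsupervised clustering can group modules by their measured characteristics without being told what a good or bad module looks like. The sketch below is a minimal k-means (Lloyd’s algorithm) over hypothetical per-module metrics: cyclomatic complexity, lines changed per month, and defects per KLOC. It is one simple technique chosen for illustration, not a claim about how any particular team applies AI to code; the module names and numbers are invented.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Minimal Lloyd's-algorithm k-means.

    Seeds centroids with the first k points, which keeps the sketch
    deterministic but is a naive choice (k-means++ is a common alternative).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical metrics: (cyclomatic complexity, lines changed per month, defects per KLOC)
MODULES = {
    "billing": (12, 40, 3.1),
    "auth": (11, 35, 2.8),
    "ui_forms": (4, 10, 0.4),
    "reports": (5, 12, 0.6),
}

labels, centroids = kmeans(list(MODULES.values()), k=2)
for name, label in zip(MODULES, labels):
    print(f"{name}: cluster {label}")
# billing and auth land in one cluster, ui_forms and reports in the other
```

The features here are left unscaled, so monthly churn dominates the distance; a real analysis would normalize each metric and choose k with care before reading anything into the clusters.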

I have no doubt that organic patterns of software design exist, at varying levels of software abstraction, and have yet to be discovered. I also believe AI can find them, perhaps redefining what we think of as code quality.

I am not an AI expert, nor do I pretend to be one, but I am lucky enough to have a manager who is an AI genius and has been doing this for many years. Through conversations with him, I can imagine a compelling future for us developers as we apply AI to our work, perhaps even the code of the AI itself (spooky B-movie idea here).


Microsoft’s recent acquisition of GitHub is a prime opportunity to marry Dr. Nagappan’s passion with what may well be the largest single repository of code in the world. Adding AI experts and data scientists to this space could surface amazingly deep insights and new ways to slice and dice the data features of the code itself.


That said, ownership only has so much value, since an outside party could clone every public repo on GitHub and run analytics on the code it contains.

We are undoubtedly in the age of software helping humans better understand software construction, along with runtime patterns and anti-patterns. This is exciting, and I look forward to the automated cataloging of currently unknown patterns and anti-patterns of software creation.

Given the depth of insight a sufficiently purposeful AI could provide, I believe we are also on the verge of correlating these findings with organizational development, capitalizing on Conway’s Law. What if we managed our organizations according to what proves effective in the end product? That could produce some interesting organizational structures based on AI-discovered “best practices.”

Surely someone is already working on this. Who is it? What are they finding? Would love to learn more from your comments.