Many moving parts had to come together for Sourcery to come to life. Founders Nick Thapen and Brendan Maginnis spent the majority of their careers working for others as programmers, but entrepreneurship was in their blood. Nick grew up with parents who owned and operated a printing and photocopying shop, and Brendan’s father was self-employed as an optician. It was a matter of time before they would go out on their own, and the software world would be glad they did.
In 2007 Nick and Brendan met at work, at their first job out of university. With Nick having graduated from Oxford and Brendan from Cambridge, they were meant to be rivals but instead became friends and have been ever since. While at work, they would often talk over lunch about what they could build together, especially given the many issues they experienced on the job. The company they worked for had a codebase dating back to the 80s. It was written in RPG (Report Program Generator), an IBM mainframe language. Remember those old-school green-screen terminals?
The code wasn’t great so they experienced firsthand how difficult it was to make changes, how often bugs were introduced, and how much of the business’s time was spent on issues and slow development. They eventually came to the realization that these issues didn’t only exist at their workplace. It was endemic anywhere that had legacy code.
From 2012 onward they grew fascinated by artificial intelligence and machine learning and kept up with the latest trends. Brendan was eager to start a company but Nick was reluctant. They ended up going their separate ways professionally, but maintained their close friendship.
Nick attended Imperial College, a top artificial intelligence and machine learning institution, to work on his master’s degree and eventually graduated with distinction in computer science with a focus on artificial intelligence. Upon graduation he continued at Imperial, working as a research associate for the next six years.
Brendan took a more practical approach to learning about machine learning. He attended a weekly deep learning reading group at Imperial College, then would go home and implement the papers he read. Gaining a strong interest in reinforcement learning, he implemented many of the cutting-edge algorithms for playing Atari games, and created a version of AlphaGo for playing Connect 4.
The idea was to see if there was a way to automatically refactor code, in other words, improve the quality of the code without changing what it does. This could help programmers code more quickly and bring about stability to their work with less time and effort.
Brendan initially tried a pure machine learning approach applied to Clojure code. It was a tricky problem: using AI to turn source code back on itself. The biggest challenge was getting it to make changes that were both correct and useful. He loved the idea of tackling a hard problem under tight constraints, where the code may change only in a very restricted way. He spent six months trying to make it work.
When it didn’t, he played around with a few other ideas and ended up going in a different direction. This new approach felt more convincing to Nick so Brendan asked him once again to join him. Nick hesitated. He had never had the guts to do it before, but then he thought, why not?
They decided to narrow their focus and worked on the Gilded Rose Refactoring Kata, a well-known coding exercise meant to help programmers hone their refactoring skills. The code in the Gilded Rose is hard to understand, and any changes would likely introduce a lot of bugs. It takes a really experienced programmer about an hour to turn it into a nice form that’s understandable right away. They thought this was a great challenge to test their idea against.
They knuckled down with an approach of writing small refactoring changes they knew were correct, and decided to switch to Python, an easier-to-understand language with plenty of libraries and strong machine learning support. They also started out with simple code metrics, which are tools that give developers insight into the code they’re writing. They tried all the existing code metrics they could find. When they couldn’t find any more, they started writing their own. They tried to get something, anything, to work on Gilded Rose. Then after about five months, Nick found the final piece of the puzzle. He showed Brendan a demo of it improving the code. That’s when they knew it would work. Together, they were able to get something working fairly quickly, but that was just the beginning.
The nearly two years since they started have been difficult, but they’ve found it really rewarding to be self-directed. Along the way, they’ve learned a lot about business and all the things there are to do that are not related to programming.
What Is Sourcery?
In short, they describe Sourcery as “Grammarly for code,” meaning it takes your code and automatically reworks and improves it. Improving readability cuts down on issues and helps programmers write new code faster. Programmers can spend about 70% of their time trying to understand existing code with only 5% of their time writing new code (Minelli et al., 2015). If there’s code with complicated logic, badly named variables, and deeply nested structures, it can take a long time to understand how the code works, even for the programmer who wrote it.
With Sourcery, instead of taking hours to make a change that’s satisfactory, it takes a few minutes. It turns code into something that, as Nick says in his English accent, “reads like a nice bit of prose.”
Sourcery aims to improve productivity by making the code as easy to read as possible.
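To make the readability point concrete, here is a small hand-written before-and-after in Python. It is only an illustration of the kind of code described above (cryptic names, nested conditions), not actual Sourcery output, and both function names are made up for the example.

```python
def f(l):
    # Before: single-letter names and two levels of nested conditions
    # hide what is really a simple filter-and-transform.
    r = []
    for x in l:
        if x is not None:
            if x > 0:
                r.append(x * 2)
    return r


def double_positive_values(values):
    # After: descriptive names and a single comprehension.
    # The behaviour is intended to be identical to f().
    return [value * 2 for value in values if value is not None and value > 0]


sample = [1, None, -2, 3]
print(f(sample) == double_positive_values(sample))
```

Both versions return the same result for the same input; only the second can be read at a glance.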
How Sourcery Ensures Existing Code Doesn’t Break
The goal is to ensure the changes made by Sourcery don’t affect the job the code is meant to do. There are three key pieces to achieving this: static analysis, automated tests, and open source library testing. Essentially, Nick and Brendan have programmed rules and tests that determine when it’s safe to make a change and which changes are safe to make. According to Maginnis, “the key innovation is a search algorithm that composes the small refactorings, guided by a probabilistic model; accepting or rejecting proposed refactorings based on whether they improve code quality as determined by our custom suite of metrics. Together these mean that Sourcery is able to make large, correct improvements to the structure of existing code.”
They currently add new rules manually, but within the next year they plan to start a project that uses machine learning. As a backup, open source libraries are downloaded, and Sourcery is run over them to check that all the tests still pass. Occasionally something slips past, but it is fixed as soon as possible. Trust is the primary focus for Nick and Brendan. To learn how it works in greater detail, check out Brendan’s blog post, “How do you test code written by code?”
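The idea of composing small refactorings and keeping only those that improve a metric can be sketched in a few lines of Python. This is a toy illustration, not Sourcery’s implementation: the two refactoring classes, the `quality_cost` metric, and the `refactor` function are all invented for the example, and a plain greedy search stands in for the probabilistic model mentioned in the quote. It needs Python 3.9 or later for `ast.unparse`.

```python
import ast
import copy

SOURCE = '''
def is_heavy(item):
    if item is not None:
        if item.weight > 10:
            return True
        else:
            return False
    else:
        return False
'''


def _returns_const(stmt, value):
    return (isinstance(stmt, ast.Return)
            and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is value)


class ReturnComparisonDirectly(ast.NodeTransformer):
    """if <comparison>: return True / else: return False -> return <comparison>.

    Assumes the comparison itself evaluates to a bool.
    """

    def visit_If(self, node):
        self.generic_visit(node)
        if (isinstance(node.test, ast.Compare)
                and len(node.body) == 1 and len(node.orelse) == 1
                and _returns_const(node.body[0], True)
                and _returns_const(node.orelse[0], False)):
            return ast.Return(value=node.test)
        return node


class DropElseAfterReturn(ast.NodeTransformer):
    """An else branch is redundant when the if body always ends in a return."""

    def visit_If(self, node):
        self.generic_visit(node)
        if node.orelse and isinstance(node.body[-1], ast.Return):
            rest, node.orelse = node.orelse, []
            return [node] + rest  # hoist the old else body to the outer level
        return node


def quality_cost(tree):
    """Toy metric (an assumption): line count plus total indentation; lower is better."""
    lines = ast.unparse(tree).splitlines()
    return len(lines) + sum(len(line) - len(line.lstrip()) for line in lines)


def refactor(source, refactorings=(ReturnComparisonDirectly, DropElseAfterReturn)):
    """Greedy search: apply the best-scoring refactoring until none improves the metric."""
    best = ast.parse(source)
    best_cost = quality_cost(best)
    while True:
        candidates = [r().visit(copy.deepcopy(best)) for r in refactorings]
        winner = min(candidates, key=quality_cost)
        if quality_cost(winner) >= best_cost:
            return ast.unparse(best)  # every proposal rejected: stop
        best, best_cost = winner, quality_cost(winner)


print(refactor(SOURCE))
```

Neither small step alone produces the final form here; the flat, `else`-free version only appears once both have been applied in sequence, which is the point of searching over compositions.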
Who is Sourcery for?
This version is considered an MVP (minimum viable product). It’s currently most useful for beginner and intermediate programmers. Aside from making their code more readable and more stable, it’s also helping them understand how refactoring works. Educating programmers was an unexpected benefit of Sourcery, so another area of focus is improving the documentation to explain why it’s doing what it’s doing.
This is timely with COVID-19 and everyone working remotely: there’s less of a chance for new programmers to look over each other’s shoulders. That’s where they think Sourcery might be able to help, acting as a sort of AI assistant, like the person sitting next to you who can help.
As much as Sourcery is helping beginner and intermediate programmers, they are helping Sourcery. Beginning programmers are the most active with giving feedback. There’s an organic, symbiotic community growing which is exciting to see.
At first Sourcery was going to be completely cloud based. That was the alpha version. Initially programmers said the concept was interesting, but they couldn’t use it since their company forbade sending their code to a third party. Some people were even hostile to the idea because of a predecessor in the space, Kite. Their shady practices had soured the atmosphere.
Nick and Brendan had to convert to running everything locally, meaning on the programmer’s own computer and not in the browser. This was the biggest of the unexpected challenges. It took a long time, two to three months, to convert everything. Running locally rather than in the cloud is more challenging because it needs to run well on Linux, Windows, and Mac.
The good thing that came out of this change was performance. Previously there was a lag between typing something in and it getting sent to the server then coming back with a suggestion. With running locally, it got snappier and shaved seconds off the response time. Now that they’re in beta they believe they’ve handled most of the major challenges.
So far the feedback from users has been positive. They like that it suggests changes they can learn from and feel the sessions are robust and improve their code. In fact, Sourcery is so effective at refactoring that a teaching assistant who wanted to show students an example of bad code, in order to teach them how to refactor, couldn’t because Sourcery would automatically refactor it. So he had to uninstall it.
A few customer reviews:
As an intermediate programmer, what I like about Sourcery is it hasn’t broken anything.
I am very impressed with Sourcery, it is able to do some very complex refactorings I haven't seen other tools do. And it does it fast, the PyCharm plugin works perfectly too.
Thanks for this kick-ass tool!
In its current form it’s most useful for beginner and intermediate programmers. Business owners and team leaders will be pleased about this. There are risks and costs associated with hiring junior developers. Sourcery helps reduce and mitigate that risk.
There’s also an exciting trend emerging of the “citizen developer”. These are typically people in organizations who are not developers professionally but use low-code and cloud solutions to create business applications to help their team perform. For example, they might have used Excel for a specific task but then start doing it in Python to better meet their needs.
Surprisingly, citizen developers are often in a management role. They’re typically new to programming and don’t have computer science degrees. Their solutions become part of the process for their teams but still need to be maintainable. Sourcery enables these citizen developers to focus on their core job.
Sourcery is a great opportunity for someone looking to invest in a talented, entrepreneurial technical team with founders who have an excellent track record of working together. They have a deep understanding of the problem space since they witnessed firsthand the challenges organizations face with the expenses associated with low quality code and slow development. Their deep and diverse experience combining theory and practice with machine learning is being applied in a methodical, practical way that goes beyond the hype to build a desirable solution in an expanding market.
Sourcery will be pitching at our May 22, 2022 Virtual Demo Day. Register to hear about how Fast closed their funding round during Covid-19, and learn how companies and investors are navigating it.
For more information, Nick and Brendan can be reached at
Minelli, Roberto, Andrea Mocci, and Michele Lanza. “I know what you did last summer: an investigation of how developers spend their time.” Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension. IEEE Press, 2015.
The developer coefficient. (2018). Retrieved from https://stripe.com/reports/developer-coefficient-2018
Shah, R. (2019, December 19). Citizen development is here to stay. Retrieved from https://www.mendix.com/blog/citizen-development-is-here-to-stay/