NEW RESEARCH: A Concordia study looks at the risks of using open-source coding blocks
Remember when one programmer nearly broke the internet in 2016?
He deleted 11 lines of code. Those lines made up left-pad, a small open-source package that big websites such as Reddit, Twitter, Spotify and Facebook depended on. The ensuing panic inspired a Concordia researcher to investigate what we can learn from the incident.
“When the original programmer who created left-pad decided to delete his package, it almost brought the internet to its knees,” says Emad Shihab, Concordia University Research Chair in Analytics for Quality Mobile Software and associate professor in the Department of Computer Science and Software Engineering.
“I wanted to know how common it is for developers to use these trivial, ready-made packages and why they use them.”
Shihab delivered his findings at the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE) in Germany in September 2017.
He co-wrote the paper with PhD student Rabe Abdalkareem, Master’s students Sultan Wehaibi and Suhaib Mujahid, and undergraduate software engineering student Olivier Nourry.
Popular web apps depend on “trivial packages”
Shihab and his team were among the first to conduct an empirical study of how prevalent code that depends on "trivial packages" is. Trivial packages are short, ready-made pieces of code that perform a simple task. The findings caught the research team by surprise.
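To give a sense of how small a trivial package can be, here is a rough sketch in the spirit of left-pad: a function that pads a string on the left until it reaches a given length. The name `leftPad` and its parameters are illustrative assumptions, not the actual source of the left-pad package. Modern JavaScript also provides the built-in `String.prototype.padStart` for the same job.

```typescript
// Hypothetical left-pad-style helper (illustrative only, not the real package source).
// Pads `str` on the left with the first character of `ch` until it is at least `len` long.
function leftPad(str: string | number, len: number, ch: string = " "): string {
  let s = String(str);
  // Use only the first character of `ch`; fall back to a space if `ch` is empty.
  const pad = ch.length > 0 ? ch.charAt(0) : " ";
  while (s.length < len) {
    s = pad + s;
  }
  return s;
}

console.log(leftPad(5, 3, "0"));  // "005"
console.log(leftPad("abc", 2));   // "abc" (already long enough, returned unchanged)
console.log(leftPad("7", 3));     // "  7" (default pad character is a space)
```

A function like this takes a developer only a few minutes to write, which is exactly why depending on an external package for it raises the question the study asks.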
“More than 38,000 npm packages — or 16.8 per cent of the packages on npm — are trivial,” says Shihab, director of Concordia’s Data-driven Analysis of Software (DAS) Lab. (npm is a large software registry of reusable code.)
“These trivial packages are also very popular. In fact, 113 of the 1,000 most downloaded npm packages are trivial. This can be problematic, since these dependencies can break popular web apps when, for example, a package is updated, leading to extra work for developers and even slowing down some web apps.”
Take, for example, an online retailer. If their web developer used a trivial package to process credit card payments — a very common occurrence — then their website is dependent on a trivial package. Everything is fine until that package updates and causes a glitch that suspends credit card purchases.
The drawbacks of a coding shortcut
“Why risk using trivial packages, then? Because developers don’t want to reinvent the wheel. If there’s a really popular trivial package used in part of the credit card payment processing, it probably works fine — until it doesn’t,” says Shihab.
“But should retailers really build code from scratch to process credit card payments, or do other relatively simple tasks? It’s safer, yes, but time-consuming.”
Shihab calls his research one of the largest empirical studies in software engineering. His team of students mined more than 230,000 npm packages and 38,000 JavaScript applications. They also surveyed 88 JavaScript developers to understand why they use these trivial packages.
“Our findings illustrate that the way we develop software has evolved, for good and bad, to include the widespread use of trivial packages — and that has inherent risks. There’s no free lunch,” says Shihab.
“The public cannot assume that the software they depend on is 100 per cent perfect. In fact, the more we study complex software systems, the more we realize systems are becoming increasingly brittle. Our findings on the widespread use of trivial packages are proof that the public cannot assume these software systems are bulletproof.”
Read the cited conference paper: “Why Do Developers Use Trivial Packages? An Empirical Case Study on npm”
Learn more about Concordia’s DAS Lab and the Department of Computer Science and Software Engineering.