Balancing speed vs. security in the shadow software supply chain

Originally published at OpenSource.net.

Modern package management is one of the most transformative aspects of contemporary application development. Application teams have achieved historic efficiency as libraries handle the repetitive grunt work, freeing corporate developers to focus on high-level, domain-specific logic. The explosion of JavaScript, Python, Golang and other languages is tightly linked to the vast ecosystems of Open Source libraries available.

These package managers make it easy to track dependencies for security and licensing concerns. With relative ease, software owners can assemble a full dependency tree, including nested transitive dependencies. While changes in dependencies can inadvertently have significant downstream effects (such as a library that Ruby on Rails depends on switching to the GNU General Public License version 2.0 (GPL v2), causing license incompatibility for most Rails projects), the fact that these downstream effects can be tracked at all is evidence of a mature software supply-chain management system.
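
To make the idea concrete, here’s a minimal sketch (my own illustration, not any particular ecosystem’s tooling) that walks the declared dependency tree of an installed Python distribution using only the standard library:

# Minimal sketch: walk a distribution's declared dependency tree using
# the standard library's importlib.metadata (Python 3.8+).
import re
from importlib.metadata import PackageNotFoundError, requires

def print_dependency_tree(dist_name, depth=0, seen=None):
    seen = set() if seen is None else seen
    if dist_name.lower() in seen:
        return  # avoid cycles and duplicate subtrees
    seen.add(dist_name.lower())
    print("  " * depth + dist_name)
    for requirement in requires(dist_name) or []:
        # Keep only the bare name; drop version specifiers, extras and
        # environment markers (e.g. "idna<4,>=2.5; extra == 'socks'").
        name = re.split(r"[ ;<>=!~(\[]", requirement, maxsplit=1)[0]
        try:
            print_dependency_tree(name, depth + 1, seen)
        except PackageNotFoundError:
            print("  " * (depth + 1) + name + " (not installed)")

print_dependency_tree("requests")  # any installed distribution works

Real tooling (npm ls, pip’s resolver, SBOM generators) does this walk with far more rigor; the point is that the metadata exists and is machine-readable, which is exactly what shadow source lacks.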

The management of these interlocking components depends on efficient collection and notification. The problem is that a shadow software supply chain, with unclear licensing and risky code, is almost indistinguishable from other contributions.

Stack Overflow and the modern software engineer

Stack Overflow is a valuable resource for today’s software engineers. It’s a great place to learn how to solve problems that you may not otherwise know how to solve. However, younger programmers tend to over-rely on third-party snippet websites.

It’s not just Stack Overflow; countless blogs and YouTube tutorials cover most of what most programmers need. A person can get through a software engineering undergraduate degree with relative ease if they know what to search for online. Copying and pasting from online sources has become central to young coders’ meme culture. While these engineers should read through and re-implement code (adhering to security best practices), copying and pasting is so easy that there’s little perceived need to understand what the code does, provided that it works.

Traceability and fragmented code

What distinguishes shadow source from Open Source, and why is it a concern? Shadow source is “shadow” because it’s untrusted, untraceable code that is indistinguishable from internally developed code. In contrast, when first-party code is used, at least one person internally comprehends what’s being implemented. Teams can incorporate changes into their software development lifecycle (SDLC) practices in order to comply with security standards, including threat modeling and development best practices that mitigate malformed-data bugs.

Developers prioritize the inputs and outputs of methods when utilizing third-party code, whether it’s obtained from an Open Source library or an online source. As long as the inputs and outputs match what they expect, not much more thought is given to the internals, overlooking risks such as malformed queries.
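
A hypothetical example of how that mindset goes wrong: the helper below has a perfectly sensible input/output contract (a username in, matching rows out), yet hides an injection flaw in its internals.

# Hypothetical forum-style snippet: the input/output contract looks fine,
# but the internals are injectable.
import sqlite3

def find_user(conn, username):
    # Dangerous: user input is concatenated straight into the SQL string.
    query = "SELECT id, username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
print(find_user(conn, "alice"))         # expected use: one row
print(find_user(conn, "x' OR '1'='1"))  # malformed input: leaks every row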

Large Open Source libraries reduce risk by undergoing intense security scrutiny, especially from large adopters. Like any software, they will have bugs and exploits, but those are much more difficult to find. Package management makes it relatively easy to update if a bug is found by security researchers or through a bug bounty program. There will be some remediation nightmares (like Log4Shell), but those are relatively rare.

The fundamental difference between large-scale Open Source and shadow source? Shadow source has the additional disadvantage of being less visible to security researchers and lacks a proper notification infrastructure. Online examples are often as straightforward as possible while relying on as few libraries as possible. This is a dangerous combination.

Security isn’t a concern, nor a responsibility, for online contributors. Writing secure code is unnecessary and a time-sink for people on unpaid forums. Online, simple solutions drive engagement. Chasing clicks and shares, people with blogs or YouTube channels are actively incentivized to publish samples that aren’t production-ready.

The artificial intelligence problem

Shadow source is nothing new, but there’s another factor that will cause it to explode at scale: generative AI. I first used GitHub Copilot in the early public-beta period. While I often still write code by hand to prevent my skills from atrophying, it’s a fantastic tool when I need to code quickly.

At the same time, generative AI enables subpar development practices. Developers can easily generate code, which means that achieving a working solution can be quick and effortless. As a result, the advantages of standardizing development processes and tools become less apparent.

Standardization brings a host of benefits for security, reliability and quality. These benefits include easier remediation across a codebase if a bug is found, shorter onboarding when contributors switch teams and faster development. The speed of development is the most convincing reason to invest in standardization.

AI removes that specific benefit while leaving the others in place. In fact, I’ve stopped using Copilot for projects where long-term sustainability matters after noticing that my software devolves into unmanageable spaghetti code with lots of repeated (but slightly different) blocks.

Widespread, rapid remediation when a bug or vulnerability is discovered remains a concern, and small, needless variability increases the risk of a bug or vulnerability occurring. So, with this in mind, how do you shrink shadow code and promote standardization when the self-interested incentives are diminishing?

A path forward

Application security teams must adapt their developer relations strategies to organizational culture and objectives. Generally, however, I believe that AppSec teams should support and guide developers toward a more secure codebase rather than take a heavy-handed approach. (More on this in a future post, lest I digress too much here.) For most teams, the best approach is to make the path of least resistance the one that also brings the most valuable security benefits.

Do it once, do it right

I’ve long supported a “do it once, do it right” development philosophy. I coined this phrase on my high-school robotics team to describe our commitment to high-quality, reliable code, even when it requires more upfront investment. If you do something the right way the first time, you’ll end up with a better solution in less time than if you cobble together a quick solution that requires replacing later.

Security and software leadership should push for a culture that values taking longer for a high-quality product rather than a “move fast and break things” attitude. Moving fast is good, but users have little tolerance for things breaking. Instead, teams should emphasize building a solid foundation that allows for faster iteration on top of it. That way, development teams can continue to think at a high level but have trust in the underlying technologies.

If a problem arises in one of the underlying technologies, it can be remediated once and pushed across the various applications. Since more effort can be devoted to developing the core technologies, the risk of an incident is much lower than if several slightly different variations of the same end goal are deployed.

A robust set of libraries reduces the need for shadow sources by providing an easy and trusted way to perform common operations. For example, a generative AI might build Structured Query Language (SQL) queries using string concatenation. If the app uses an object–relational mapping (ORM), then the ORM reduces the need to interact with direct SQL and provides a unified place for input sanitization.
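
As a sketch of that difference, assuming SQLAlchemy as the ORM (any mature ORM behaves similarly; the model below is illustrative), the query is assembled from bound parameters, so a hostile string is treated as data rather than as SQL:

# Sketch assuming SQLAlchemy 2.x as the ORM; the model is illustrative.
from sqlalchemy import create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(User(name="alice"))
    session.commit()
    hostile = "x' OR '1'='1"
    # The ORM emits a parameterized query; the hostile string matches nothing.
    rows = session.execute(select(User).where(User.name == hostile)).scalars().all()
    print(rows)  # []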

By moving to higher-level abstractions, developers get a better experience and security teams have more opportunities to influence data flows. Developers will then use the preferred methods out of self-interest, because doing so is more convenient than relying on a shadow source.

Driving adoption

Service-oriented security can be achieved through API proxies, third-party cloud providers, Open Source libraries, or first-party internal libraries. It simplifies the developer experience through the use of libraries and services, while also aligning security teams with development teams.

Open Source libraries

Open Source libraries provide many useful abstractions, and for most teams they are a very good starting point. Most of the functionality an app needs provides no meaningful competitive advantage, so redundant development work is simply unnecessary duplication.

While Open Source has seen widespread success in powering enterprise applications, further investments in adoption, creation and standardization of libraries can help improve security and productivity. This still requires proper software supply chain management, but if developers can be given even easier interfaces without having to rely on shadow sources, it will increase traceability.

Cloud providers

Cloud providers can be an easy way to quickly standardize certain technologies and offload much of the security work to a third party. For small companies, or companies that want to keep small IT and security teams, cloud services are one of the best ways to build quickly on secure code. They take responsibility for some aspects of security (though a secure implementation is still required) and offer a very smooth developer experience.

Cloud providers are not an ideal solution for long-term growth, however. They create significant lock-in, which can cause headaches if a product is ever deprecated or becomes prohibitively expensive. Cost is another major concern, especially for very large organizations. Compute resources are a very cost-competitive area, so the profit margins are small; instead, cloud service providers justify their value through services and APIs.

The costs can add up quickly, and the bureaucratic world of enterprise procurement means that you may end up paying more for an inferior product from a preferred vendor than you could get elsewhere. There’s certainly a value argument for cloud services in some contexts. However, as costs continue to increase, it will become less justifiable for many organizations.

Internal abstractions

Internal abstractions provide value when a company has many different products being developed or is in a regulated environment. These abstractions can be better tailored for specific business constraints, creating a better developer experience. However, they require much greater investment than Open Source or cloud solutions.

A good use of internal abstractions is as a wrapper around trusted Open Source libraries. With this approach, there can be both a well-tested Open Source library and additional opportunities for security engineering teams to funnel data flow and add their own controls. The developer experience of the wrapper libraries is likely better than that of the core library alone.
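
As a minimal sketch of the pattern (the wrapper’s name, allow-list and policy choices below are hypothetical), an internal module might wrap the widely used requests library like this:

# Hypothetical internal wrapper around requests; the names and policies
# are illustrative, not a real internal API.
from urllib.parse import urlparse
import requests

ALLOWED_HOSTS = {"api.internal.example.com", "partner.example.com"}
DEFAULT_TIMEOUT = 5  # seconds; requests never times out unless told to

def internal_get(url, **kwargs):
    # One choke point where security engineering can enforce policy and
    # later add logging, auth headers, or rate limits.
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host {host!r} is not on the approved list")
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
    kwargs["verify"] = True  # keep TLS verification on, always
    return requests.get(url, **kwargs)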

Code reviews

Automated and manual code reviews are an important step for software security. As part of the merge approval process, reviewers should look for functions that could be generalized but were instead written inline. Extra scrutiny should be given to inline, high-risk functionality that is repeated in multiple parts of the codebase (such as database queries).
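
The automated side of this can start simple. The sketch below (an illustration, not a production linter) flags execute() calls whose SQL is assembled dynamically instead of passed as a constant string with bound parameters:

# Illustrative review check: flag execute() calls whose first argument
# is not a constant string (f-strings, concatenation, .format(), etc.).
import ast

class DynamicSQLFinder(ast.NodeVisitor):
    def __init__(self):
        self.findings = []
    def visit_Call(self, node):
        if (isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute" and node.args
                and not isinstance(node.args[0], ast.Constant)):
            self.findings.append(node.lineno)
        self.generic_visit(node)

sample = '''cur.execute("SELECT * FROM users WHERE id = ?", (user_id,))
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
cur.execute("SELECT * FROM logs WHERE msg = '" + msg + "'")'''
finder = DynamicSQLFinder()
finder.visit(ast.parse(sample))
print("dynamic SQL at lines:", finder.findings)  # [2, 3]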

Security-oriented generative AI

I don’t pretend to understand how generative AI works. However, since security-oriented companies (Microsoft and Google) dominate the industry, code generation could be pushed to emphasize strong security practices and permissive licenses.

In short tests, ChatGPT and Google Bard tend to output highly insecure results but radically increase security when “securely” is added to the prompt. (For example, in a test of ASP.NET code, adding “securely” shifted the output from raw string concatenation in SQL queries to language-specific SQL construction functions with built-in sanitization.)

These vendors should also place heavier training weights on large, permissively licensed projects. Much of the problem with shadow source is that the code provided has not been tested and the licensing is unclear. Licensing for AI-generated code is still unsettled (and sources like blogs and forums don’t help), but tracking attribution and prioritizing permissive licensing would help as the licensing issues get resolved in the courts.

What’s next

Shadow source code is pervasive in online forums and will continue to grow with generative AI. The mainstream adoption of generative AI will make it easier to produce poorly structured, poorly written code, leading to much more vulnerable applications and longer remediation times. As maintaining poor codebases becomes easier, teams will need to be more intentional about adhering to best practices.

The best way to encourage best practices is to make the most secure path also the path of least effort for developers, leading to the natural adoption of best practices.
