Newsgather
Visualizing Massive Malware Datasets: A Staggering Comparison
TechCrunch · 13.05.2026 · Technology · 2 min read · United States

In brief

Malware repositories vx-underground (30TB) and VirusTotal (31PB) are compared in scale by hypothetically stacking 1TB hard drives, revealing VirusTotal's data would form a structure nearly as tall as the Burj Khalifa.

AI-generated summary

Why it matters

Cybersecurity companies, AI researchers, and threat intelligence firms rely on malware repositories like these to train detection models and to understand how attacks evolve.

Malware research group vx-underground, which says it has the largest collection of malware source code, said in a post on X that its archive amounts to about 30 terabytes. In a reply, Bernardo Quintero, founder of VirusTotal, an online service that scans files for malware across multiple antivirus engines at once, said his service holds about 31 petabytes of malware samples that users have contributed to date. (A petabyte is roughly 1,000 times larger than a terabyte.)

In both cases, that’s a lot of data. For context, cybersecurity companies, AI researchers, and threat intelligence firms treat repositories like these as critical for training detection models and understanding how attacks evolve.

But this had us wondering: What would these enormous datasets actually look like as hard drives stacked one on top of the other and side by side? And how would they compare to, say, the Eiffel Tower? Someone in our newsroom asked an AI chatbot this question, and it got it incredibly wrong. Instead, we did some rough back-of-a-napkin math to figure out how tall these data banks would be. Since vx-underground and VirusTotal both have “about” that much data each, “about” is good enough for us in this case.

Let’s say we’re using internal hard drives with 1 terabyte of capacity, since these are generally designed to be the same physical size to fit inside any computer. These standardized 3.5-inch internal hard drives are 1 inch in height, which is the measurement that matters for stacking them. We’re also assuming that each drive holds exactly 1 terabyte, even though in reality the total usable capacity of a hard drive is generally somewhat less.

Using this online conversion tool, it looks like vx-underground’s 30 terabytes of malware data could fill 30 hard drives stacked on top of one another, reaching 30 inches, or about 2.5 feet tall. For reference, this reporter is 6 feet tall. (See visual below, and yes, terrible opsec, I know.)

With that same logic, VirusTotal’s 31 petabytes of submitted data would fill 31,744 hard drives (counting 1,024 terabytes per petabyte), which stacked on top of one another would reach about 2,645 feet. The world’s tallest building, the Burj Khalifa in Dubai, is slightly taller at 2,722 feet. The Eiffel Tower is 1,083 feet tall. By that logic, VirusTotal has about two-and-a-half Eiffel Towers’ worth of data.
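The napkin math above can be reproduced in a few lines of Python. This is a minimal sketch, not the conversion tool the article used: the constant and function names are illustrative, and it assumes 1,024 terabytes per petabyte, the convention that reproduces the 31,744-drive figure.

```python
import math

# Figures taken from the article
DRIVE_HEIGHT_IN = 1.0      # height of one 3.5-inch internal hard drive, in inches
DRIVE_CAPACITY_TB = 1.0    # each drive is assumed to hold exactly 1 TB
EIFFEL_TOWER_FT = 1083
BURJ_KHALIFA_FT = 2722

# Assumption: binary convention (1 PB = 1,024 TB), which matches 31,744 drives
TB_PER_PB = 1024
INCHES_PER_FOOT = 12


def drives_needed(terabytes: float) -> int:
    """Number of 1 TB drives required to hold the given amount of data."""
    return math.ceil(terabytes / DRIVE_CAPACITY_TB)


def stack_height_ft(n_drives: int) -> float:
    """Height in feet of n_drives stacked flat on top of one another."""
    return n_drives * DRIVE_HEIGHT_IN / INCHES_PER_FOOT


vx_drives = drives_needed(30)               # vx-underground: about 30 TB
vt_drives = drives_needed(31 * TB_PER_PB)   # VirusTotal: about 31 PB

vx_feet = stack_height_ft(vx_drives)
vt_feet = stack_height_ft(vt_drives)

print(f"vx-underground: {vx_drives:,} drives, {vx_feet:,.1f} ft")
print(f"VirusTotal:     {vt_drives:,} drives, {vt_feet:,.1f} ft")
print(f"Eiffel Towers:  {vt_feet / EIFFEL_TOWER_FT:.2f}")
print(f"Burj Khalifa:   {vt_feet / BURJ_KHALIFA_FT:.0%} of its height")
```

With the decimal convention instead (`TB_PER_PB = 1000`), the VirusTotal stack would be 31,000 drives, or about 2,583 feet, so the comparison with the two landmarks holds either way.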

Open questions

  • How do these datasets impact AI training efficiency?

This article was originally published by TechCrunch.
