Your AI Toolbox: 16 Must Have Products
February 21, 2024

With all the AI/ML tools on the market, businesses have a lot of things to sort through. Here's a list of some of the biggest tools out there and how they're being used.

A decorative image showing a chip networked to several tech icon images, including a computer and a cloud, with a box that says AI above the image.

Folks, it’s an understatement to say that the explosion of AI has been a wild ride. And, like any new, high-impact technology, the market initially floods with new companies. The normal lifecycle, of course, is that money is invested, companies are built, and then there will be winners and losers as the market narrows. Exciting times. 

That said, we thought it was a good time to take you back to the practical side of things. One of the most pressing questions these days is how businesses may want to use AI in their existing or future processes, what options exist, and which strategies and tools are likely to survive long term. 

We can’t predict who will sink or swim in the AI race—we might be able to help folks predict drive failure, but the Backblaze Crystal Ball (™) is not on our roadmap—so let’s talk about what we know. Things will change over time, and some of the tools we’ve included on this list will likely go away. And, as we fully expect all of you to have strong opinions, let us know what you’re using, which tools we may have missed, and why we’re wrong in the comments section.

Tools Businesses Can Implement Today (and the Problems They Solve)

As AI has become more accessible, we’ve seen it offered both as standalone tools and as features incorporated into existing software. It’s probably easiest to think about these tools in terms of the problems they solve, so here is a non-exhaustive list.

The Large Language Model (LLM) “Everything Bot”

LLMs are useful in generative AI tasks because they work largely on a model of association. They intake huge amounts of data, use that to learn associations between ideas and words, and then use those learnings to perform tasks like creating copy or natural language search. That makes them great for a generalized use case (an “everything bot”) but it’s important to note that it’s not the only—or best—model for all AI/ML tasks. 

These generative AI models are designed to be talked to in whatever way suits the querier best, and are generally accessed via browser. That’s not to say that the models behind them aren’t being incorporated elsewhere in things like chat bots or search, but that they stand alone and can be identified easily. 

ChatGPT

In many ways, ChatGPT is the tool that broke the dam. It’s a large language model whose multi-faceted capabilities were easily apparent and translatable across both business and consumer markets. Never say it came from nowhere, however: OpenAI and Microsoft Azure have been in cahoots for years creating the tool that (ahem) broke the internet. 

Google Gemini, née Google Bard

It’s undeniable that Google has been on the front lines of AI/ML for quite some time. Some experts even say that their networks are the best poised to build a sustainable AI architecture. So why is OpenAI’s ChatGPT the tool on everyone’s mind? Simply put, Google has had difficulty commercializing their AI product—until, that is, they announced Google Gemini, and folks took notice. Google Gemini represents a strong contender for the type of function that we all enjoy from ChatGPT, powered by all the infrastructure and research they’re already known for.

Machine Learning (ML)

ML tasks cover a wide range of possibilities. When you’re looking to build an algorithm yourself, however, you don’t have to start from ground zero. There are robust, open source communities that offer pre-trained models, community support, integration with cloud storage, access to large datasets, and more. 

  • TensorFlow: TensorFlow was originally developed by Google for internal research and production. It supports various programming languages like C++, Python, and Java, and is designed to scale easily from research to production.  
  • PyTorch: PyTorch, on the other hand, is built for rapid prototyping and experimentation, and is primarily designed for Python. That makes the learning curve for most devs much shorter, and lots of folks will layer it with Keras for additional API support (without sacrificing the speed and lower-level control of PyTorch). A minimal training sketch follows below. 

Given the amount of flexibility in having an open source library, you see all sorts of things being built. A photo management company might grab a facial recognition algorithm, for instance, or use another to help tune the parameters and hyperparameters of its own algorithm. Think of it like wanting to build a table, but making the hammer and nails yourself instead of purchasing them. 
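To give you a feel for what that looks like in practice, here's a minimal, hypothetical PyTorch training sketch. The model shape and the random stand-in dataset are illustrative only; the point is how little scaffolding the framework requires:

```python
# A minimal, hypothetical PyTorch training loop on made-up data.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),   # 4 input features -> 16 hidden units
    nn.ReLU(),
    nn.Linear(16, 2),   # 2 output classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 4)            # stand-in dataset: 256 samples
y = torch.randint(0, 2, (256,))    # stand-in labels: class 0 or 1

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)    # forward pass plus loss
    loss.backward()                # backpropagation
    optimizer.step()               # weight update

print(f"final training loss: {loss.item():.4f}")
```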

Building Products With AI

You may also want or need to invest more resources—maybe you want to add AI to your existing product. In that scenario, you might hire an AI consultant to help you design, build, and train the algorithm, buy processing power from CoreWeave or Google, and store your data on-premises or in cloud storage.

In reality, most companies will likely do a mix of things depending on how they operate and what they offer. The biggest thing I’m trying to get at by presenting these scenarios, however, is that most people likely won’t set up their own large scale infrastructure, instead relying on inference tools. And, there’s something of a distinction to be made between whether you’re using tools designed to create efficiencies in your business versus whether you’re creating or incorporating AI/ML into your products.

Data Analytics

Without being too contentious, data analytics is one of the most powerful applications of AI/ML. While we measly humans may still need to provide context to make sense of the identified patterns, computers are excellent at identifying them more quickly and accurately than we could ever dream. If you’re looking to crunch serious numbers, these two tools will come in handy.

  • Snowflake: Snowflake is a cloud-based data as a service (DaaS) company that specializes in data warehouses, data lakes, and data analytics. They provide a flexible, integration-friendly platform with options for both developing your own data tools or using built-out options. Loved by devs and business leaders alike, Snowflake is a powerhouse platform that supports big names and diverse customers such as AT&T, Netflix, Capital One, Canva, and Bumble. 
  • Looker: Looker is a business intelligence (BI) platform powered by Google. It’s a good example of a platform that takes the core functionality of a product we’re already used to and layers AI on top to make it more powerful. So, while BI platforms have long had robust data management and visualization capabilities, they can now do things like use natural language search or get automated data insights. (For the developer-minded, there's a small querying sketch after this list.)
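If you'd rather reach a warehouse from code than from a dashboard, here's a hypothetical sketch using Snowflake's official Python connector (snowflake-connector-python). The account, credentials, and table names below are placeholders, not real values:

```python
# Hypothetical sketch: querying Snowflake from Python.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",     # placeholder
    user="your_user",           # placeholder
    password="your_password",   # placeholder: use a secrets manager in practice
    warehouse="ANALYTICS_WH",   # placeholder warehouse/database/schema names
    database="SALES",
    schema="PUBLIC",
)
try:
    cur = conn.cursor()
    cur.execute("SELECT region, COUNT(*) FROM orders GROUP BY region")
    for region, count in cur:
        print(region, count)
finally:
    conn.close()
```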

Development and Security

It’s no secret that one of the biggest pain points in the world of tech is having enough developers and having enough high quality ones, at that. It’s pushed the tech industry to work internationally, driven the creation of coding schools that train folks within six months, and compelled people to come up with codeless or low-code platforms that users of different skill levels can use. This also makes it one of the prime opportunities for the assistance of AI. 

  • GitHub Copilot: Even if you’re not in tech or working as a developer, you’ve likely heard of GitHub. Started in 2007 and officially launched in 2008, it’s a bit hard to imagine coding before it existed as the de facto center to find, share, and collaborate on code in a public forum. Now, they’re responsible for GitHub Copilot, which allows devs to generate code with a simple query. As with all generative tools, however, users should double check for accuracy and bias, and make sure to consider privacy, legal, and ethical concerns while using the tool. 

Customer Experience and Marketing

Customer relationship management (CRM) tools assist businesses in effectively communicating with their customers and audiences. You use them to glean insights as broad as trends in how you’re finding and converting leads to customers, or as granular as a single user’s interactions with marketing emails. A well-honed CRM means being able to serve your target and existing customers effectively. 

  • HubSpot and Salesforce Einstein: Two of the largest CRM platforms on the market, these tools are designed to make everything from marketing emails to lead scoring to customer service interactions easy. AI has started popping up in almost every function offered, including social media post generation, support ticket routing, website personalization suggestions, and more.    

Operations, Productivity, and Efficiency

These kinds of tools take onerous everyday tasks and make them easy. Internally, these kinds of tools can represent massive savings to your OpEx budget, letting you use your resources more effectively. And, given that some of them also make processes external to your org easier (like scheduling meetings with new leads), they can also contribute to new and ongoing revenue streams. 

  • Loom: Loom is a specialized tool designed to make screen recording and subsequent video editing easy. Given how much time it takes to make video content, Loom’s targeting of this once-difficult task has certainly saved time and increased collaboration. Loom includes things like filler word and silence removal, auto-generating chapters with timestamps, summarizing the video, and so on. All features are designed for easy sharing and ingesting of data across video and text mediums.  
  • Calendly: Speaking of collaboration, remember how many emails it used to take to schedule a meeting, particularly if the person was external to your company? How about when you were working a conference and wanted to give a new lead an easy way to get on your calendar? And, of course, there’s the joy of managing multiple inboxes. (Thanks, Calendly. You changed my life.) Moving into the AI future, Calendly is doing similar small but mighty things: predicting your availability, detecting time zones, automating meeting schedules based on team member availability or round robin scheduling, cancellation insights, and more.  
  • Slack: Ah, Slack. Business experts have been trying for years to summarize the effect it’s had on workplace communication, and while it’s not the only tool on the market, it’s definitely a leader. Slack has been adding a variety of AI functions to its platform, including the ability to summarize channels, organize unreads, search and summarize messages—and then there’s all the work they’re doing with integrations rumored to be on the horizon, like creating meeting invite suggestions purely based on your mentioning “putting time on the calendar” in a message. 

Creative and Design 

Like coding and developer tools, creative work of all kinds—image, video, copy—has long been resource intensive. These skills are not traditionally suited to corporate structures, and measuring whether one brand’s creative is better or worse than another’s is complex, though absolutely possible and important. Generative AI, again like above, is giving teams the ability to create first drafts, or even train libraries, and then move the human oversight to a higher, more skilled tier of work. 

  • Adobe and Figma: Both Adobe and Figma are reputable design collaboration tools. Though a merger was recently called off by both sides, both are incorporating AI to make it much, much easier to create images and video for all sorts of purposes. Generative AI means that large swaths of canvas can be filled by a generative tool that predicts background, for instance, or add stock versions of things like buildings with enough believability to fool a discerning eye. Video tools are still in beta, but early releases are impressive, to say the least. With the preview of OpenAI’s text-to-video model Sora making waves to the tune of a 7% drop in Adobe’s stock, video is the space to watch at the moment.
  • Jasper and Copy.ai: Just like the image generation above, these bots are creating usable copy for tasks of all kinds. And, just like all generative tools, AI copywriters deliver a baseline level of quality best suited to some human oversight. How much oversight they’ll need as time goes on remains to be seen.

Tools for Today; Build for Tomorrow

At the end of this roundup, it’s worth noting that there are plenty of tools on the market, and we’ve just presented a few of the bigger names. Honestly, we had trouble narrowing the field of what to include so to speak—this very easily could have been a much longer article, or even a series of articles that delved into things we’re seeing within each use case. As we talked about in AI 101: Do the Dollars Make Sense? (and as you can clearly see here), there’s a great diversity of use cases, technological demands, and unexplored potential in the AI space—which means that companies have a variety of strategic options when deciding how to implement AI or machine learning.

Most businesses will find it easier and more in line with their business goals to adopt software as a service (SaaS) solutions that are either sold as a whole package or integrated into existing tools. These types of tools are great because they’re almost plug and play—you can skip training the model and go straight to using them for whatever task you need. 

But, when you’re a hyperscaler and you’re talking about building infrastructure to support the processing and storage demands of the AI future, it’s a different scenario than when other types of businesses are talking about using or building an AI tool or algorithm specific to their internal strategy or products. We’ve already seen that hyperscalers are going for broke in building data centers and processing hubs, investing in companies that are taking on different parts of the tech stack, and, of course, doing longer-term research and experimentation as well.

So, with a brave new world at our fingertips—being built as we’re interacting with it—the best thing for businesses to remember is that periods of rapid change offer opportunity, as long as you’re thoughtful about implementation. And, there are plenty of companies creating tools that make it easy to do just that. 

Data Storage Beyond the Hardware: 4 Surprising Questions
January 18, 2024

One could argue that data storage powers the world. Read four questions and answers about some of the more surprising ways data storage interacts with everything from physics to real estate.

A decorative image showing several types of data storage media, like a floppy disk, a USB stick, a CD, and the cloud.

We’ve gathered you together here today to address some of the weirdest questions (and answers) about everyone’s favorite topic: data storage. 

From the outside looking in, it’s easy to think it’s a subject as dry as Ben Stein in “Ferris Bueller’s Day Off.” But, given that everyday functions are increasingly moving to the internet, data storage is, in some ways, the secret backbone of modern society. 

Today it’s estimated that there are over 8,000 data centers (DCs) in the world, built on a variety of storage media, connected to various networks, consuming vast amounts of power, and taking up valuable real estate. Plus, the drive technology itself brings together engineering foci affected by (driving?) everything from clean room technology to DNA research. 

Fertile ground for strange, surprising questions, certainly. So, without further ado, here are some of our favorite questions about data storage. 

1. Does a Hard Drive Weigh More When It’s Full?

Short answer: for all practical purposes, no. Long answer: technically yes, but by such a minuscule amount that you wouldn’t be able to measure it. Shout out to David Zaslavsky for doing all the math; here’s the summary. 

As Einstein famously showed, E = mc². If it’s been a while since you took physics, that formula says that energy is equal to mass multiplied by the speed of light squared. Since energy and mass are equivalent, we can infer that stored energy has a weight, even if it’s negligible. 

Now, hard drives record data by magnetizing a thin film of ferromagnetic material. Basically, you’re forcing the atoms in a magnetic field to align in a different direction. And, since magnetic fields have differing amounts of energy depending on whether they’re aligned or antialigned, technically the weight does change. According to David’s math, it’d be approximately 10⁻¹⁴ g for a 1TB hard drive. 
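If you want to sanity check that figure, here's a quick back-of-the-envelope sketch. The one joule of total stored magnetic energy is an assumption chosen to line up with the estimate above, not a measured value:

```python
# Back-of-the-envelope: how much mass does ~1 joule of stored energy add?
# ASSUMPTION: ~1 J total magnetic energy difference for a full 1TB drive,
# chosen to match the estimate cited above -- not a measured quantity.
C = 299_792_458                     # speed of light, m/s

energy_joules = 1.0
mass_kg = energy_joules / C**2      # rearranging E = mc^2 to m = E / c^2
mass_g = mass_kg * 1_000

print(f"{mass_g:.1e} g")            # ~1.1e-14 g: far too small to weigh
```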

2. How Loud Is the Cloud?

In the past, we’ve talked about how heavy the Backblaze Storage Cloud is, and we’ve spent some ink on how loud a Backblaze DC is. All that noise comes from a combination of factors, largely cooling systems. Back in 2017, we measured our DCs at approximately 78 dB, but other sources report that DCs can reach up to 96 dB. 
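As an acoustics side note, incoherent noise sources add on a logarithmic scale, which is why a room full of servers is only somewhat louder than a single server. Here's a small sketch; the per-server 78 dB figure and the source counts are illustrative, not measurements of any particular facility:

```python
# Sketch: incoherent sound sources combine on a logarithmic (dB) scale.
import math

def combined_db(levels_db):
    # Convert each level to relative power, sum, then convert back to dB.
    total_power = sum(10 ** (db / 10) for db in levels_db)
    return 10 * math.log10(total_power)

print(combined_db([78]))         # 78.0 dB: one source
print(combined_db([78, 78]))     # ~81 dB: doubling sources adds ~3 dB
print(combined_db([78] * 50))    # ~95 dB: dozens of sources approach the high-end reports
```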

When you’re talking about building your own storage, my favorite research data point was one Reddit user’s opinion:

A screenshot of a comment from Reddit user EpicEpyc that says:

I think a good rule of thumb will be “if you care about noise, don’t get rackmount equipment.” Go with a used workstation from your favorite brand and your ears will thank you.

But, it’s still worth investing in ways to reduce the noise—if not for worker safety, then to reduce the environmental impact of DCs, including noise pollution. There are a wealth of studies out there connecting noise pollution to cardiovascular disease, hypertension, high stress levels, sleep disturbance, and good ol’ hearing loss in humans. In our animal friends, noise pollution can disrupt predator/prey detection and avoidance, echolocation, and interfere with reproduction and navigation. 

The good news is that there are technologies to keep data centers (relatively) quiet when they become disruptive to communities.  

3. How Long Does Data Stay Where You Stored It?

As much as we love old-school media here at Backblaze, we’re keeping this conversation to digital storage—so let’s chat about how long your data storage will retain your media, unplugged, in ideal environmental conditions. 

We like the way Enterprise Storage Forum put it: “Storage experts know that there are two kinds of drive in this world—those that have already failed, and those that will fail sooner or later.” Their article encompasses a pretty solid table of how long (traditional) storage media lasts.

Their table compares drive types and how long they typically last:

  • Hard disk drives: 4–7 years
  • Solid state drives: 5–10 years
  • Flash drives: 10 years of average use

However, with new technologies—and their consumer applications—emerging, we might see a challenge to the data storage throne. The Institute of Physics reports that data written to a glass memory crystal could remain intact for a million years, a product they’ve dubbed the “Superman crystal.” So, look out for lasers altering the optical properties of quartz at the nanoscale. (That was just too cool not to say.)

4. What’s the Most Expensive Data Center Site?

And why? 

One thing we know from the Network Engineering team at Backblaze is that optimizing your connectivity (getting your data from point A to point B) across the strongest networks is no simple feat. Take this back to the real world: when you’re talking about what the internet truly is, you’re just connecting one computer to every other computer, and there are, in fact, cables involved. 

The hardware infrastructure combines with population dispersion in murky ways. We’ll go ahead and admit that’s out of scope for this article. But, working backwards from the below image, let’s just say that where there are more data centers, it’s likely there are more network exchanges. 

An infographic depicting data center concentration on a global map.
Source.

From an operational standpoint, you’d likely assume it’s a bad choice to put your data center in the middle of the most expensive real estate and power infrastructures in the world, but there are tangible benefits to joining up all those networks at a central hub and to putting them in or near population centers. We call those spaces carrier hotels. 

Here’s the best definition we found: 

There is no industry standard definition of a carrier hotel versus merely a data center with a meet-me room (MMR). But, generally, the term is reserved for the facilities where metro fiber carriers meet long-haul carriers—and the number of network providers numbers in the dozens.
Source: Data Center Dynamics

Some sources go so far as to say that carrier hotels have to be in cities by definition. Either way, the result is that carrier hotels sit on some of the most expensive real estate in the world. Citing DGTL Infra from April 2023, here are the top 25 U.S. carrier hotels: 

A chart showing the top 25 carrier hotels in the United States and their locations.

Let’s take #12 on this list, the NYC listing. According to PropertyShark, it’s worth $1.15 billion. With a b. That’s before you even get to the tech inside the building. 

If you’re so inclined, flex those internet research skills and look up some of the other property values on the list. Some of them are a bit hard to find, and there are other interesting tidbits along the way. (And tell us what you find in the comments, of course.)

Bonus Question: Is It Over Already?

Look, do I want it to be over? No, never. But, the amount of weird and wonderful data storage questions that I could include in this article is infinite. Here’s a shortlist that other folks from Backblaze suggested: 

  • How broken is too broken when it comes to restoring files from a hard drive? (This is a whole article in and of itself.)
  • When I send an email, how does it get to where it goes? (Check out Backblaze CEO Gleb Budman’s Bookblaze recommendation if you’re curious.) 
  • What happens to storage drives when we’re done with them? What does recycling look like? 

So, the real question is, what do you want to know? Sound off in the comments—we’ll do our best to research and answer.

Bookblaze: The Second Annual Backblaze Book Guide
December 27, 2023

Enjoy a diverse list of book recommendations from Backblaze authors.

A decorative image showing a book and bookshelves.

It’s that time again—cozy season is upon us and your Backblaze authors are eager to share some of their favorite reads. Feel free to use them as a gift guide (if you still have gifts to give, that is), as a list of recs to start your New Year’s resolutions off right, or just some excellent excuses to take some much-needed solo time away from the family. 

So, whether the weather outside is frightful, or, like at our home office in San Mateo, weird and drizzly, we hope you enjoy! And, as always, feel free to let us know what you thought in the comments. 

Tech Expertise and Whimsical Reads, All in One List

Pat Patterson, Chief Technical Evangelist

An image of the cover of the book Too Much Too Young by Daniel Rachel.

Too Much Too Young: The 2 Tone Records Story, by Daniel Rachel

In 1979, a clutch of young, multiracial bands burst onto the music scene in the UK, each offering their own take on ska, the precursor to reggae that originated in 1950s Jamaica. “Too Much Too Young”, named after The Specials’ 1980 UK number one hit, tells the fascinating story of how bands such as The Specials, The Selecter, and The Beat (ok, “The English Beat” in the U.S.) took punk’s do-it-yourself ethic, blended it with reggae rhythms, and, as the 70s turned into the 80s, released a string of singles and albums that dominated the pop charts. 

Looking back from 2023, it’s astonishing to realize that this was the first time many audiences had seen black and white musicians on stage together, and musician-turned-author Daniel Rachel does a great job telling the 2 Tone story in the context of the casual racism, economic recession, and youth unemployment of the time. Highly recommended for any music fan, whether or not you remember moonstomping back in the day!

Vinodh Subramanian, Product Marketing Manager

An image of the book cover for Build: An Unorthodox Guide To Making Things Worth Making, by Tony Fadell.

Build: An Unorthodox Guide To Making Things Worth Making, by Tony Fadell

I picked up this book while waiting for a flight at an airport and it quickly became a source of inspiration. Authored by Tony Fadell, who played a significant role in building successful tech products like iPod, iPhone, and the Nest thermostat, the book provides insights and strategies on how to build yourself, build your career, and ultimately build products that users love. What I love about the book is how it creates a practical roadmap for building things in life and business, and it makes those things seem more possible and achievable regardless of what stage of career (or life) you’re in. I’d highly recommend this for anyone who loves to build things, but is not sure what to focus on in what order. 

nathaniel wagner, Senior Site Reliability Engineer

An image of the cover of the book Designing Data-Intensive Applications by Martin Kleppmann.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, by Martin Kleppmann

Backblaze has created several data-intensive applications, and while normally I am not a fan of deeply technical books because I am a learn-by-doing type of person, I think this book does a fantastic job of explaining the strengths and weaknesses of various strategies for handling large amounts of data. It also helps that I am a big fan of the freedom/speed of NoSQL, and here at Backblaze we use Cassandra to keep our index of over 500 billion Backblaze B2 files. :)

Nicole Gale, Marketing Operations Manager

An image of the cover of the book Before the coffee gets cold by Toshikazu Kawaguchi.

Before the Coffee Gets Cold, by Toshikazu Kawaguchi

It’s probably the shortest book I read this year, but the one that stuck with me the most. “Before the Coffee Gets Cold” is a new take (at least for me) on time traveling that dives into what you would do if you could go back in time, knowing it wouldn’t change anything (or would it?). Each chapter is a short story following a different character’s journey to decide to sit in the chair and drink the coffee. You won’t regret picking up this book!

Andy Klein, Principal Cloud Storage Storyteller

An image of the book cover for Stephen Hawking's A Brief History of Time.

A Brief History of Time, by Stephen Hawking

I reread “A Brief History of Time” by Stephen Hawking this past year. I read it years ago to understand the science. This time as I read it I felt an appreciation for the elegance that is the universe. The book is an approachable scientific read, but it does demand your full attention while reading, and if you slept through your high school and college physics classes, the book may not be for you.

Molly Clancy, Senior Content Editor

An image of the book cover for Demon Copperhead by Barbara Kingsolver.

Demon Copperhead, by Barbara Kingsolver

“Demon Copperhead” is the book that brought me back to reading for pleasure after having a baby. Some perspective for new parents—he’s almost one and a half, so… go easy on yourselves. Anyway, about this book: you probably never thought you wanted to get inside the head of a teenage boy from the hollers of coal country, but you do. Trust me, you do. Barbara Kingsolver doesn’t hold back when it comes to, let’s say, the authenticity of what a teenage boy from the hollers of coal country thinks about, and she somehow manages to do it without being cringe. It’s a damning critique of social services, the foster care system, the school system to some extent, Big Pharma to a huge extent, and even Big City Liberals in a way that’s clarifying for this Big City Liberal who now lives …in the hollers of coal country.

Troy Liljedahl, Director, Solutions Engineering

An image of the book cover for Radical Candor by Kim Scott.

Radical Candor: Be a Kick-Ass Boss Without Losing Your Humanity, by Kim Scott

The book that really stuck with me this year is “Radical Candor” by Kim Scott. This was the best book on leadership and management I’ve ever read, and I’ve been recommending it to my friends and colleagues who are looking for ways to improve in those skills. I love how Scott gives you actionable items to take with you into the workplace rather than generalized advice that’s less applicable to specific situations. I loved the book so much I started listening to the Radical Candor podcast, which has quickly become a favorite of mine as well.

Kari Rivas, Senior Product Marketing Manager

A cover image of the book The Grace Year by Kim Liggett.

The Grace Year, by Kim Liggett

For fans of “The Handmaid’s Tale”, “Hunger Games”, and any other books where women are badasses (can I say that?) fighting a dystopian empire, “The Grace Year” will not disappoint. This book examines the often fraught and complex relationships between women, with a magical bent. Think Lady of the Flies. Just like the mentioned references, this thrilling read will leave you feeling both hopeful and sad—exactly the mix of feelings we’re all looking for at the end of the year, amIright?

Yev Pusin, Senior Director, Marketing

An image of the book cover The Aeronaut's Windlass by Jim Butcher.

The Aeronaut’s Windlass, by Jim Butcher

I do not feel like I need to sell this book too hard. Here’s the gist. Jim Butcher (of Dresden Files and Codex Alera fame) wrote this book. It’s about an airship-filled steampunk society that’s divided into living habitats they call spires. It has air ship battles. Magic. Snarky characters. And possibly most important of all: TALKING CATS AS A MAIN CHARACTER. Enjoy.

Mark Potter, Chief Information Security Officer

An image of the cover of the book To Shape a Dragon's Breath by Moniquill Blackgoose.

To Shape a Dragon’s Breath: The First Book of Nampeshiweisit, by Moniquill Blackgoose (and some other bonus books!)

I don’t really have a book recommendation, but I have a few books that I’m reading at the moment: “To Shape a Dragon’s Breath” (a recommendation from a fellow Backblazer that I’m only a couple of chapters into) and Robert Jordan’s “The Eye of the World” (has been on my list for over a decade, so far I’m underwhelmed).

Gleb Budman, Chief Executive Officer

An image of the book cover of Tubes by Andrew Blum.

Tubes: A Journey to the Center of the Internet, by Andrew Blum

The idea that the internet is “a series of tubes” may have been widely mocked when former Senator Ted Stevens of Alaska famously described it. But he wasn’t entirely wrong. I love how Blum starts with a simple question: “Where does this cord that comes out of my modem actually go?” and then that takes him on a journey of exploration around the world.

Alison McClelland, Senior Technical Editing Manager

An image of the cover of the book Packing for Mars by Mary Roach.

Packing for Mars: The Curious Science of Life in the Void, by Mary Roach

Mary Roach presents a unique view of the challenges of space, investigating the comical side of planetary exploration, from zero-gravity hijinks to the surprisingly practical challenges of personal hygiene in orbit. Forget packing trendy outfits in your stylish carry-on; in the cosmos, it’s all about zero-gravity hairstyles and toothpaste that doesn’t float away mid-brush.

Stephanie Doyle, Associate Editor and Writer

An image of the book cover for All the Birds in the Sky by Charlie Jane Anders.

All the Birds in the Sky, by Charlie Jane Anders

This book is a wonderful mashup of near-future sci-fi, magical realism, strong character arcs, and so much more. It’s brilliant at taking things that seem familiar—urban San Francisco for example, or science as a concept—and inserting chaos and whimsy in ways that challenge our base assumptions and create a totally unexpected, but absolutely believable, universe. It’s so matter-of-fact in tone that you may just question whether magic does exist. And, with all that, the book delivers a poignant and thoughtful ending that turns all that quirkiness inside out, and forces you to wonder about the world you’re living in right now, and how you can change things. It’s one of my go-to recommendations for fans of all kinds of fiction.

Patrick Thomas, Senior Director, Publishing

An image of the book cover for Mr. Penumbra's 24-Hour Bookstore by Robin Sloan.

Mr. Penumbra’s 24-Hour Bookstore, by Robin Sloan

So, full disclosure—I continue to struggle with being a toddler dad when it comes to reading. (Evidence: I’ve read “The Grinch” 10 times in the last 24 hours and my heart is feeling three sizes too small.) So this isn’t a new recommendation, but rather a recommendation I’m realizing not enough people in tech have received yet. “Mr. Penumbra’s 24-Hour Bookstore” brings together my two worlds: books and tech… and, well, fantasy and mystery sort of (not my worlds, but I like to dwell in the idea that there’s a near-real fantasy world at the edge of our experience). If you like data and narrative structure, or if you like a spooky adventure, or if you like dusty old bookshops, Robin Sloan has you covered with this one. And, once you’ve read this, get on his email lists; he writes about history, fiction, and technology (and olive oil) beautifully. P.S.: I don’t know why Picador insists on this terrible cover; it does little to convey the world inside the book. Don’t make my mistake and judge this book by its cover.

Happy Reading From Backblaze

We hope this list piques your interest—we may be a tech company, but nothing beats a good, old fashioned book (or audiobook) to help you unwind, disconnect, and lose yourself in someone else’s story for a while. (Okay, we may be biased on the Publishing team.) 

Any reading recommendations to give us? Let us know in the comments.

10 Holiday Security Tips for Your Business
December 5, 2023

Cyberattacks surge during the holiday season. Here are 10 tips to protect your business from ransomware attacks.

A decorative image showing a pig with an eyepatch hacking a computer and displaying the words 10 Business Security Tips to Use This Holiday Season.

’Tis the season—for ransomware attacks, that is. The Federal Bureau of Investigation (FBI) and the Cybersecurity and Infrastructure Security Agency (CISA) have observed increases in cyberattacks on weekends and holidays. Several of the largest ransomware attacks in 2021 happened over holiday weekends, including Mother’s Day, Memorial Day, and the Fourth of July. This tactic may be attractive because it gives cyber attackers a head start to map networks and propagate ransomware throughout networks when organizations are at limited capacity.  

The reason for this is simple: one of the easiest and most effective ways for bad actors to gain access to secure networks is by targeting the people who use them through phishing attacks and other social engineering techniques. Employees are already behind the eight ball, so to speak, as email volume can increase up to 100x during the holiday season. Add to the equation that businesses often have increased workloads with fewer folks in the office, or even office closures, and you have an ideal environment for a ransomware attack. 

Phew! Aren’t we supposed to be celebrating this time of year? Absolutely. So, let’s talk about ten things you can do to help protect your business from cyberattacks and organized crime during the holiday season. 

Get the Ransomware Ebook

There’s never been a better time to strengthen your ransomware defenses. Get our comprehensive guide to defending your business against ransomware this holiday season.

Read Up on Ransomware ➔ 

10 Security Tips for Your Business This Holiday Season

1. Update Your Tech

Teams should ensure that systems are up to date and that any new patches are tested and applied as soon as they are released, no matter how busy the company is at this time. This is, of course, important for your core applications, but don’t forget cell phones and web browsers. Additionally, personnel should be assigned to monitor alerts remotely when the business is closed or workers are out of the office so that critical patches aren’t delayed.

2. Review Your Company Security Policy With All of Your Employees

All businesses should review company security policies as the holiday season approaches. Ensure that all employees understand the importance of keeping access credentials private, know how to spot cybercrime, and know what to do if a crime happens. Whether your staff is in-office or remote, all employees should be up to date on security policies and special holiday circumstances.

3. Conduct Phishing Simulation Training

Another important step that organizations can take to ensure security over the holidays is to conduct phishing simulation training at the beginning of the season, and ideally on a monthly basis. This kind of training gives employees a chance to practice their ability to identify malicious links and attachments without a real threat looming. It’s a good opportunity to teach workers not to share login information with anyone over email and the importance of verifying emails.

4. Then, Make Sure Recommended Measures Are Set Up, Especially MFA

Multifactor authentication (MFA) fatigue happens when workers get tired of logging in and out with an authenticator app, push notification, or text message—but MFA is one of the single best tools in your security arsenal. During the holidays, workers might be busier than usual, and therefore more frustrated by MFA requirements. But, MFA is crucial for keeping your business safe from ransomware and distributed denial of service (DDoS) attacks. 
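If you're curious what's happening inside those authenticator apps, most of them implement time-based one-time passwords (TOTP). Here's a minimal sketch using the pyotp library; it illustrates the mechanism, not a production login flow (real systems need secure secret storage, rate limiting, and clock-drift allowances):

```python
# Minimal TOTP sketch with the pyotp library (pip install pyotp).
import pyotp

secret = pyotp.random_base32()   # provisioned once per user, stored server-side
totp = pyotp.TOTP(secret)

code = totp.now()                # what the user's authenticator app displays
print("Current code:", code)

# On login, compare the submitted code against the current time window.
print("Valid?", totp.verify(code))
```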

5. Have an Offline Backup

It’s easy to forget, in our ever-more-connected world, that taking business data offline is one of the best protections you can offer. You still need to have a process to make sure those offline backups are regularly updated, so set a cadence. But, particularly with your business-critical data, offline backups represent a last line of defense that can make all the difference.  

6. Adjust Property Access Privileges

You might be surprised to know that physical security is a cybercrime prevention tool as well. Doors and devices should be the most highly protected areas of your space. Before the holidays, be sure to do a thorough review of your business’ access privileges so that no one has more access than is necessary to perform their duties. And, before shutting down for a much-needed break, check all exterior doors, windows, and other entry points to ensure they are fully secured. Don’t forget to update any automated systems to keep everything locked down before your return to work.

7. Don’t Advertise That You Will Be Closed

It’s common practice to alert customers when your business will be closed so that you can avoid any inconvenience. However, this practice could put your business at risk during times of the year when the crime rate is elevated, including the holiday season. Instead of posting signage or on social media declaring that no one will be in the building for a certain period, it’s better to use an automated voice or email response to alert customers of your closing. This way, crime opportunists will be less tempted.

8. Check In on Your Backup Strategy

For years, the industry standard was the 3-2-1 backup strategy. A 3-2-1 strategy means having at least three total copies of your data, two of which are local but on different media, and at least one off-site copy (in the cloud). These days, the 3-2-1 backup strategy is table stakes: still necessary, but there are now even more advanced approaches. Consider a cyber resilience stance for your company. 

9. Consider Cyber Insurance

Cyber insurance adoption rates are hard to track, but all data points to an increase in businesses getting coverage. Cyber insurance can cover everything from forensic post-breach reviews to litigation expenses. It also forces us all to review security policies and bring everything up to industry best practices. 

10. Test Your Disaster Recovery Strategy

If you don’t have a disaster recovery strategy, this is the time to create one. If you do have one, this is also a great time to put it to the test. You should know going into the holidays that you can respond quickly and effectively should your company suffer a security breach.

Protecting Business Data During the Holidays

Here’s the secret eleventh tip: The best thing you can do for your security is, ironically, the same thing that cyber criminals do—treat your employees as humans. Studies have shown that one of the long-term costs of ransomware is actually employee stress. We can’t expect humans to be perfect, and a learning-based (versus punitive) approach will help you in two ways: you’ll be setting up processes with the real world in mind, and your employees won’t feel disincentivized to report incidents early and to improve when they make mistakes in training (or even in the real world). 

While it may be impossible to prevent all instances of data theft and cybercrime from happening, there are steps that companies can take to protect themselves. So, train, prepare, back up your data, and then celebrate knowing that you’ve done what you can. 

AI 101: Training vs. Inference
November 9, 2023

When you're building an artificial intelligence (AI) or machine learning (ML) algorithm, training and inference are two distinct phases. Read more to understand the differences between the two.

A decorative image depicting a neural network identifying a cat.

What do Sherlock Holmes and ChatGPT have in common? Inference, my dear Watson!

“We approached the case, you remember, with an absolutely blank mind, which is always an advantage. We had formed no theories. We were simply there to observe and to draw inferences from our observations.”
—Sir Arthur Conan Doyle, “The Adventure of the Cardboard Box”

As we all continue to refine our thinking around artificial intelligence (AI), it’s useful to define terminology that describes the various stages of building and using AI algorithms—namely, the AI training stage and the AI inference stage. As we see in the quote above, these are not new concepts: they’re based on ideas and methodologies that have been around since before Sherlock Holmes’ time. 

If you’re using AI, building AI, or just curious about AI, it’s important to understand the difference between these two stages so you understand how data moves through an AI workflow. That’s what I’ll explain today.

The TL;DR

The difference between these two terms can be summed up fairly simply: first you train an AI algorithm, then your algorithm uses that training to make inferences from data. To create a whimsical analogy, when an algorithm is training, you can think of it like Watson—still learning how to observe and draw conclusions through inference. Once it’s trained, it’s an inferring machine, a.k.a. Sherlock Holmes. 

Whimsy aside, let’s dig a little deeper into the tech behind AI training and AI inference, the differences between them, and why the distinction is important. 

Obligatory Neural Network Recap

Neural networks have emerged as the brainpower behind AI, and a basic understanding of how they work is foundational when it comes to understanding AI.  

Complex decisions, in theory, can be broken down into a series of yeses and nos, which means that they can be encoded in binary. Neural networks have the ability to combine enough of those smaller decisions, weigh how they affect each other, and then use that information to solve complex problems. And, because more complex decisions require more points of information to come to a final decision, they require more processing power. Neural networks are one of the most widely used approaches to AI and machine learning (ML). 

A diagram showing the inputs, hidden layers, and outputs of a neural network.
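To make that recap concrete, here's a toy forward pass written with nothing but NumPy. The layer sizes and weights are arbitrary stand-ins; a real network would learn those weights during training:

```python
# Toy forward pass through a two-layer network, using only NumPy.
import numpy as np

def relu(x):
    return np.maximum(0, x)          # a simple "fire or don't" nonlinearity

x = np.array([0.5, -1.2, 3.0])       # three input features

W1 = np.random.randn(4, 3) * 0.1     # hidden layer: 4 units, random weights
b1 = np.zeros(4)
hidden = relu(W1 @ x + b1)           # weigh inputs, combine, threshold

W2 = np.random.randn(2, 4) * 0.1     # output layer: 2 units
b2 = np.zeros(2)
output = W2 @ hidden + b2

print(output)                        # raw scores for two possible decisions
```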

What Is AI Training? Understanding Hyperparameters and Parameters

In simple terms, training an AI algorithm is the process through which you take a base algorithm and then teach it how to make the correct decision. This process requires large amounts of data, and can include various degrees of human oversight. How much data you need has a relationship to the number of parameters you set for your algorithm as well as the complexity of a problem. 

We made this handy dandy diagram to show you how data moves through the training process:

A diagram showing how data moves through an AI training algorithm.
As you can see in this diagram, the end result is model data, which then gets saved in your data store for later use.

And hey—we’re leaving out a lot of nuance in that conversation because dataset size, parameter choice, etc. is a graduate-level topic on its own, and usually is considered proprietary information by the companies who are training an AI algorithm. It suffices to say that dataset size and number of parameters are both significant and have a relationship to each other, though it’s not a direct cause/effect relationship. And, both the number of parameters and the size of the dataset affect things like processing resources—but that conversation is outside of scope for this article (not to mention a hot topic in research). 

As with everything, your use case determines your execution. Some types of tasks actually see excellent results with smaller datasets and more parameters, whereas others require more data and fewer parameters. Bringing it back to the real world, here’s a very cool graph showing how many parameters different AI systems have. Note that they very helpfully identified what type of task each system is designed to solve:

So, let’s talk about what parameters are with an example. Back in our very first AI 101 post, we talked about ways to frame an algorithm in simple terms: 

Machine learning does not specify how much knowledge the bot you’re training starts with—any task can have more or fewer instructions. You could ask your friend to order dinner, or you could ask your friend to order you pasta from your favorite Italian place to be delivered at 7:30 p.m. 

Both of those tasks you just asked your friend to complete are algorithms. The first algorithm requires your friend to make more decisions to execute the task at hand to your satisfaction, and they’ll do that by relying on their past experience of ordering dinner with you—remembering your preferences about restaurants, dishes, cost, and so on. 

The factors that help your friend make a decision about dinner are called hyperparameters and parameters. Hyperparameters are those that frame the algorithm—they are set outside the training process, but can influence the training of the algorithm. In the example above, a hyperparameter would be how you structure your dinner feedback. Do you thumbs up or down each dish? Do you write a short review? You get the idea. 

Parameters are factors that the algorithm derives through training. In the example above, those are things like what time you prefer to eat dinner, which restaurants you’ve enjoyed, and so on. 
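Here's how that split tends to show up in code. In this sketch, the learning rate and epoch count are hyperparameters we set up front, while the weights are parameters the model derives through training. The toy model and data are illustrative only:

```python
# Sketch: hyperparameters are set before training; parameters are learned.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters: chosen by you, outside the training loop.
learning_rate = 0.01
epochs = 5

model = nn.Linear(3, 1)          # parameters: 3 weights and a bias, learned
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

x = torch.randn(100, 3)
y = x.sum(dim=1, keepdim=True)   # toy target: the sum of the inputs

for _ in range(epochs):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.data)         # learned parameters drift toward [1, 1, 1]
```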

When you’ve trained a neural network, there will be heavier weights between various nodes. That’s a shorthand way of saying that an algorithm will prefer a path it knows is significant. If you want to really get nerdy with it, this article is well-researched, has a ton of math explainers for various training methods, and includes some fantastic visuals. For our purposes, here’s one way people visualize a “trained” algorithm: 

An image showing a neural network that has prioritized certain pathways after training.
Source.

The “dropout method” randomly ignores a subset of nodes during each training pass so that the network can’t over-rely on any single pathway. The connections that emerge from training with heavy weights are the relationships the algorithm has found to be significant for the dataset it’s working on; the other relationships get de-prioritized (or sometimes even eliminated). 
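For the curious, here's roughly what dropout looks like in PyTorch. The network shape is an arbitrary example; the point is the behavior difference between training mode, where random nodes are ignored, and inference mode, where they aren't:

```python
# Sketch: dropout randomly zeroes activations during training only.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # each hidden unit has a 50% chance of being dropped
    nn.Linear(16, 2),
)

x = torch.randn(1, 8)
net.train()             # training mode: dropout is active
print(net(x))           # varies run to run as different units drop out
net.eval()              # inference mode: dropout is disabled
print(net(x))           # deterministic for a given input
```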

Once you have a trained algorithm, then you can use it with a reasonable degree of certainty that it will give you good results, and that leads us to inference. 

What Is AI Inference?

Once you’ve trained your algorithm, you can send it out in the world to do its job (and make yours easier). When you present a trained AI algorithm with a problem and it gives you an answer, that’s called inference. It’s using the way it was trained to draw conclusions or make predictions, depending on how it was built, and once an algorithm is in the “inference stage”, it’s no longer learning (usually). 

Here’s our diagram for how data might move through an inference process: 

A diagram showing how data moves through an inference workflow.
As you can see, if we want to add more training data to the algorithm, we’d do so as a separate input/output flow. The goal of this process is to output data.

As we’ve seen in the real world, this does not mean that the AI model will give you a perfect answer every time—depending on, for example, the AI’s access to new or current information, the complexity of training, etc., the algorithm might give you a better or worse (more or less accurate) answer. But, it will be able to look at a new piece of information or dataset, and return a response. 

When I asked our friend ChatGPT about this, it broke down inference and defined it in relation to the logical underpinnings of different types of algorithms. (As a mere human, how could I resist using inference while writing about inference?) Sure, ChatGPT may be merely mimicking human-like dialogue based on its ingestion of the whole internet, then giving me the most likely answer based on its specialized neural network that predicts the likelihood of the next word in response to my conversational query and without access to data after January 2022, but here’s what our AI friend had to say for itself: 

  • Machine Learning Inference: In machine learning, after a model has been trained on a dataset, it is deployed to make predictions or classifications on new, unseen data. During inference, the model takes the input data, processes it, and produces an output or a prediction based on the patterns and relationships it learned during training. For example, a trained machine learning model might be used to classify images, predict stock prices, or identify spam emails.
  • Deep Learning Inference: Deep learning, a subfield of machine learning, involves neural networks with multiple layers. In deep learning, inference means feeding an input through a trained deep neural network and obtaining an output. This is commonly used in applications like image recognition, natural language processing, and speech recognition. Inference in deep learning can be computationally intensive, and specialized hardware, such as GPUs and TPUs, are often used to accelerate the process.
  • Rule-Based Systems Inference: In rule-based AI systems, inference is the process of applying logical rules to make decisions or draw conclusions. These systems use a set of predefined rules and facts to reason and infer new information. For example, in an expert system for medical diagnosis, inference involves applying a set of rules to patient symptoms and medical knowledge to arrive at a diagnosis.
  • Probabilistic Inference: In probabilistic AI, inference is used to estimate probabilities and uncertainty. Bayesian networks and Markov decision processes are examples of probabilistic models that perform inference to compute probabilities of events or make decisions based on probabilities.

You’ll notice that each of these are saying basically the same thing: the AI algorithm applies its decision-making paradigm to a problem. 

Why Stop Learning During the Inference Stage?

In general, it’s important to keep these two stages—training and inference—of an AI algorithm separate for a few reasons: 

  • Efficiency: Training is typically a computationally intensive process, whereas inference is usually faster and less resource-intensive. Separating them allows for efficient use of computational resources.
  • Generalization: The model’s ability to generalize from training data to unseen data is a key feature. It should not learn from every new piece of data it encounters during inference to maintain this generalization ability.
  • Reproducibility: When using trained models in production or applications, it’s important to have consistency and reproducibility in the results. If models were allowed to learn during inference, it would introduce variability and unpredictability in their behavior.

There are some specialized AI algorithms that want to continue learning during the inference stage—your Netflix algorithm is a good example, as are self-driving cars, or dynamic pricing models used to set airfare pricing. On the other hand, the majority of problems we’re trying to solve with AI algorithms deliver better decisions by separating these two phases—think of things like image recognition, language translation, or medical diagnosis, for example.
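In code, that training/inference separation often comes down to a couple of explicit switches. Here's a minimal PyTorch-flavored sketch; the model is an untrained stand-in for one you'd normally load from storage:

```python
# Sketch: at inference time the model is frozen -- no gradients, no updates.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)    # stand-in for a model trained and saved earlier
model.eval()               # switch layers like dropout/batchnorm to inference

new_data = torch.randn(1, 4)
with torch.no_grad():      # skip gradient bookkeeping: faster, less memory
    prediction = model(new_data)
print(prediction)
```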

Training vs. Inference (But, Really: Training Then Inference)

To recap: the AI training stage is when you feed data into your learning algorithm to produce a model, and the AI inference stage is when your algorithm uses that training to make inferences from data. Here’s a quick reference: 

Training:
  • Feed training data into a learning algorithm.
  • Produces a model comprising code and data.
  • Happens one time(ish); retraining is sometimes necessary.

Inference:
  • Apply the model to inference data.
  • Produces output data.
  • Often continuous.

The difference may seem inconsequential at first glance, but defining these two stages helps to show the implications for AI adoption, particularly for businesses. That is, because inference is much less resource intensive (and therefore less expensive) than training, it will likely be much easier for businesses to integrate already-trained AI algorithms with their existing systems. 

And, as always, we’re big believers in demystifying terminology for discussion purposes. Let us know what you think in the comments, and feel free to let us know what you’re interested in learning about next.

The post AI 101: Training vs. Inference appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AI 101: Do the Dollars Make Sense? https://www.backblaze.com/blog/ai-101-do-the-dollars-make-sense/ https://www.backblaze.com/blog/ai-101-do-the-dollars-make-sense/#respond Thu, 28 Sep 2023 16:04:26 +0000 https://www.backblaze.com/blog/?p=109872 Next up in AI 101: let's talk about what it costs to build and use an AI.

The post AI 101: Do the Dollars Make Sense? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A decorative image showing a cloud reaching out with digital tentacles to stacks of dollar signs.

Welcome back to AI 101, a series dedicated to breaking down the realities of artificial intelligence (AI). Previously we’ve defined artificial intelligence, deep learning (DL), and machine learning (ML) and dove into the types of processors that make AI possible. Today we’ll talk about one of the biggest limitations of AI adoption—how much it costs. Experts have already flagged that the significant investment necessary for AI can cause antitrust concerns and that AI is driving up costs in data centers.

To that end, we’ll talk about: 

  • Factors that impact the cost of AI.
  • Some real numbers about the cost of AI components. 
  • The AI tech stack and some of the industry solutions that have been built to serve it.
  • And, uncertainty.

Defining AI: Complexity and Cost Implications

While ChatGPT, DALL-E, and the like may be the most buzz-worthy of recent advancements, AI has already been a part of our daily lives for several years now. In addition to generative AI models, examples include virtual assistants like Siri and Google Home, fraud detection algorithms in banks, facial recognition software, URL threat analysis services, and so on. 

That brings us to the first challenge when it comes to understanding the cost of AI: The type of AI you’re training—and how complex a problem you want it to solve—has a huge impact on the computing resources needed and the cost, both in the training and in the implementation phases. AI tasks are hungry in all ways: they need a lot of processing power, storage capacity, and specialized hardware. As you scale up or down in the complexity of the task you’re doing, there’s a huge range in the types of tools you need and their costs.   

To understand the cost of AI, several other factors come into play as well, including: 

  • Latency requirements: How fast does the AI need to make decisions? (e.g. that split second before a self-driving car slams on the brakes.)
  • Scope: Is the AI solving broad-based or limited questions? (e.g. the best way to organize this library vs. how many times is the word “cat” in this article.)
  • Actual human labor: How much oversight does it need? (e.g. does a human identify the cat in cat photos, or does the AI algorithm identify them?)
  • Adding data: When, how, and in what quantity will new data need to be ingested to update information over time? 

This is by no means an exhaustive list, but it gives you an idea of the considerations that can affect the kind of AI you’re building and, thus, what it might cost.

The Big Three AI Cost Drivers: Hardware, Storage, and Processing Power

In simple terms, you can break down the cost of running an AI to a few main components: hardware, storage, and processing power. That’s a little bit simplistic, and you’ll see some of these lines blur and expand as we get into the details of each category. But, for our purposes today, this is a good place to start to understand how much it costs to ask a bot to create a squirrel holding a cool guitar.

An AI generated image of a squirrel holding a guitar. Both the squirrel and the guitar are warped in strange, but not immediately noticeable, ways.
Still not quite there on the guitar. Or the squirrel. How much could this really cost?

First Things First: Hardware Costs

Running an AI takes specialized processors that can handle complex processing queries. We’re early in the game when it comes to picking a “winner” for specialized processors, but these days, the most common processor is a graphical processing unit (GPU), with Nvidia’s hardware and platform as an industry favorite and front-runner. 

The most common “workhorse chip” of AI processing tasks, the Nvidia A100, starts at about $10,000 per chip, and a set of eight of the most advanced processing chips can cost about $300,000. When Elon Musk wanted to invest in his generative AI project, he reportedly bought 10,000 GPUs, which equates to an estimated value in the tens of millions of dollars. He’s gone on record as saying that AI chips can be harder to get than drugs. 

Google offers folks the ability to rent their TPUs through the cloud starting at $1.20 per chip hour for on-demand service (less if you commit to a contract). Meanwhile, Intel released a sub-$100 USB stick with a full NPU that can plug into your personal laptop, and folks have created their own models at home with the help of open sourced developer toolkits. Here’s a guide to using them if you want to get in the game yourself. 
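
To put those numbers side by side, here’s a hedged back-of-envelope comparison in Python. The rates are the ones quoted above, and the around-the-clock usage pattern is our own assumption for illustration:

on_demand_rate = 1.20                  # $/chip-hour, the on-demand TPU rate above
chips, hours_per_day, days = 8, 24, 30
monthly_rent = on_demand_rate * chips * hours_per_day * days
print(f"Renting 8 chips around the clock: ${monthly_rent:,.0f}/month")        # $6,912
a100_set_price = 8 * 10_000            # the ~$10,000-per-A100 figure above
print(f"Months of rent to equal buying the set: {a100_set_price / monthly_rent:.1f}")  # ~11.6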

Clearly, the spectrum for chips is vast—from under $100 to millions—and the landscape for chip producers is changing often, as is the strategy for monetizing those chips—which leads us to our next section. 

Using Third Parties: Specialized Problems = Specialized Service Providers

Building AI is a challenge with so many moving parts that, in a business use case, you eventually confront the question of whether it’s more efficient to outsource it. It’s true of storage, and it’s definitely true of AI processing. You can already see one way Google answered that question above: create a network populated by their TPUs, then sell access.   

Other companies specialize in broader or narrower parts of the AI creation and processing chain. Just to name a few, diverse companies: there’s Hugging Face, Inflection AI, and Vultr. Those companies offer a wide array of products and resources, from open source communities like Hugging Face that provide a menu of models, datasets, no-code tools, and (frankly) rad developer experiments, to bare metal servers like Vultr that enhance your compute resources. How resources are offered also exists on a spectrum, from proprietary company platforms (i.e., Nvidia’s), to open source communities (looking at you, Hugging Face), to mixes of the two. 

An AI generated comic showing various iterations of data storage superheroes.
A comic generated on Hugging Face’s AI Comic Factory.

This means that, whichever piece of the AI tech stack you’re considering, you have a high degree of flexibility when you’re deciding where and how much you want to customize and where and how to implement an out-of-the box solution. 

Ballparking an estimate of what any of that costs would be so dependent on the particular model you want to build and the third-party solutions you choose that it doesn’t make sense to do so here. But, it suffices to say that there’s a pretty narrow field of folks who have the infrastructure capacity, the datasets, and the business need to create their own network. Usually it comes back to any combination of the following: whether you have existing infrastructure to leverage or are building from scratch, if you’re going to sell the solution to others, what control over research or dataset you have or want, how important privacy is and how you’re incorporating it into your products, how fast you need the model to make decisions, and so on. 

Welcome to the Spotlight, Storage

And, hey, with all that, let’s not forget storage. At the most basic level of consideration, AI uses a ton of data. How much? The going rule of thumb says you need at least an order of magnitude more examples than parameters to train an AI model. That means you want 10 times more examples than parameters. 

Parameters and Hyperparameters

The easiest way to think of parameters is to think of them as factors that control how an AI makes a decision. More parameters = more accuracy. And, just like our other AI terms, the term can be somewhat inconsistently applied. Here’s what ChatGPT has to say for itself:

A screenshot of a conversation with ChatGPT where it tells us it has 175 billion parameters.
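
To make “parameter” concrete, here’s a minimal sketch of where those counts come from. The 12,288 layer width below is the commonly reported GPT-3 hidden size, so treat it as an approximation:

def dense_params(n_in, n_out):
    # One weight per input/output pair, plus one bias per output.
    return n_in * n_out + n_out
print(dense_params(4, 3))                      # 15 parameters for a tiny layer
print(f"{dense_params(12288, 4 * 12288):,}")   # ~604 million for one GPT-3-width block
# Stack dozens of blocks like that and you approach billions of parameters.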

That 10x number is just the amount of data you store for the initial training model—clearly the thing learns and grows, because we’re talking about AI. 

Preserving both your initial training algorithm and your datasets can be incredibly useful, too. As we talked about before, the more complex an AI, the higher the likelihood that your model will surprise you. And, as many folks have pointed out, deciding whether to leverage an already-trained model or to build your own doesn’t have to be an either/or—oftentimes the best option is to fine-tune an existing model to your narrower purpose. In both cases, having your original training model stored can help you roll back and identify the changes over time. 

The size of the dataset absolutely affects costs and processing times. The best example is that ChatGPT, everyone’s favorite model, has been rocking GPT-3 (or 3.5) instead of GPT-4 on the general public release because GPT-4, which works from a much larger, updated dataset than GPT-3, is too expensive to release to the wider public. It also returns results much more slowly than GPT-3.5, which means that our current love of instantaneous search results and image generation would need an adjustment. 

And all of that is true because GPT-4 was updated with more information (by volume), more up-to-date information, and the model was given more parameters to take into account for responses. So, it has to both access more data per query and use more complex reasoning to make decisions. That said, it also reportedly has much better results.

Storage and Cost

What are the real numbers to store, say, a primary copy of an AI dataset? Well, it’s hard to estimate, but we can ballpark that, if you’re training a large AI model, you’re going to have at a minimum tens of gigabytes of data and, at a maximum, petabytes. OpenAI considers the size of its training database proprietary information, and we’ve found sources that cite that number as anywhere from 17GB to 570GB to 45TB of text data. 

That’s not actually a ton of data, and, even taking the highest number, it would only cost $225 per month to store that data in Backblaze B2 (45TB * $5/TB/mo), for argument’s sake. But let’s say you’re training an AI on video to, say, make a robot vacuum that can navigate your room or recognize and identify human movement. Your training dataset could easily reach into petabyte scale (for reference, one petabyte would cost $5,000 per month in Backblaze B2). Some research shows that dataset size is trending up over time, though other folks point out that bigger is not always better.
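
Those per-terabyte numbers are easy to sanity check. Here’s a quick sketch using the flat rate quoted above (confirm current pricing before you budget against it):

def b2_monthly_cost(terabytes, rate_per_tb=5.0):   # $5/TB/month, the rate cited above
    return terabytes * rate_per_tb
print(b2_monthly_cost(45))      # 225.0, the 45TB text corpus
print(b2_monthly_cost(1000))    # 5000.0, a petabyte-scale video dataset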

On the other hand, if you’re the guy with the Intel Neural Compute stick we mentioned above and a Raspberry Pi, you’re talking the cost of the ~$100 AI processor, ~$50 for the Raspberry Pi, and any incidentals. You can choose to add external hard drives, network attached storage (NAS) devices, or even servers as you scale up.

Storage and Speed

Keep in mind that, in the above example, we’re only considering the cost of storing the primary dataset, and that’s not very accurate when thinking about how you’d be using your dataset. You’d also have to consider temporary storage for when you’re actually training the AI as your primary dataset is transformed by your AI algorithm, and nearly always you’re splitting your primary dataset into discrete parts and feeding those to your AI algorithm in stages—so each of those subsets would also be stored separately. And, in addition to needing a lot of storage, where you physically locate that storage makes a huge difference to how quickly tasks can be accomplished. In many cases, the difference is a matter of seconds, but there are some tasks that just can’t handle that delay—think of tasks like self-driving cars. 
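
That stage-by-stage feeding pattern is simple to picture in code. Here’s a generic sketch, not any particular framework’s API:

def batches(dataset, batch_size):
    # Yield the primary dataset one discrete subset at a time.
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]
data = list(range(1_000))            # stand-in for your primary dataset
for epoch in range(3):               # repeated passes over the same dataset
    for batch in batches(data, 256):
        pass                         # each subset is staged, transformed, and fed in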

For huge data ingest periods such as training, you’re often talking about a compute process that’s assisted by powerful, and often specialized, supercomputers, with repeated passes over the same dataset. Having your data physically close to those supercomputers saves you huge amounts of time, which is pretty incredible when you consider that it breaks down to as little as milliseconds per task.

One way this problem is being solved is via caching, or creating temporary storage on the same chips (or motherboards) as the processor completing the task. Another solution is to keep the whole processing and storage cluster on-premises (at least while training), as you can see in the Microsoft-OpenAI setup or as you’ll often see in universities. And, unsurprisingly, you’ll also see edge computing solutions which endeavor to locate data physically close to the end user. 

While there can be benefits to on-premises or co-located storage, having a way to quickly add more storage (and release it if no longer needed), means cloud storage is a powerful tool for a holistic AI storage architecture—and can help control costs. 

And, as always, effective backup strategies require at least one off-site storage copy, and the easiest way to achieve that is via cloud storage. So, any way you slice it, you’re likely going to have cloud storage touch some part of your AI tech stack. 

What Hardware, Processing, and Storage Have in Common: You Have to Power Them

Here’s the short version: any time you add complex compute + large amounts of data, you’re talking about a ton of money and a ton of power to keep everything running. 

A disorganized set of power cords and switches plugged into what is decidedly too small of an outlet space.
Just flip the switch, and you have AI. Source.

Fortunately for us, other folks have done the work of figuring out how much this all costs. This excellent article from SemiAnalysis goes deep on the total cost of powering searches and running generative AI models. The Washington Post cites Dylan Patel (also of SemiAnalysis) as estimating that a single chat with ChatGPT could cost up to 1,000 times as much as a simple Google search. Those costs include everything we’ve talked about above—the capital expenditures, data storage, and processing. 

Consider this: Google spent several years putting off publicizing a frank accounting of their power usage. When they released numbers in 2011, they said that they used enough electricity to power 200,000 homes. And that was in 2011. There are widely varying claims for how much a single search costs, but even the most conservative say 0.3 Wh of energy. There are approximately 8.5 billion Google searches per day. (That’s just an incremental cost, by the way—as in, how much a single search costs in extra resources on top of what the system that powers it costs.) 
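
Multiplying those two figures shows the scale. The per-home divisor below is our own rough assumption of about 30 kWh per day for a U.S. household:

searches_per_day = 8.5e9
wh_per_search = 0.3                        # the conservative estimate above
daily_kwh = searches_per_day * wh_per_search / 1_000
print(f"{daily_kwh:,.0f} kWh per day")     # 2,550,000 kWh, for searches alone
print(f"roughly {daily_kwh / 30:,.0f} households' worth of daily electricity")  # ~85,000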

Power is a huge cost in operating data centers, even when you’re only talking about pure storage. One of the biggest single expenses that affects power usage is cooling systems. With high-compute workloads, and particularly with GPUs, the amount of work the processor is doing generates a ton more heat—which means more money in cooling costs, and more power consumed. 

So, to Sum Up

When we’re talking about how much an AI costs, it’s not just about any single line item cost. If you decide to build and run your own models on-premises, you’re talking about huge capital expenditure and ongoing costs in data centers with high compute loads. If you want to build and train a model on your own USB stick and personal computer, that’s a different set of cost concerns. 

And, if you’re talking about querying a generative AI from the comfort of your own computer, you’re still using a comparatively high amount of power somewhere down the line. We may spread that power cost across our national and international infrastructures, but it’s important to remember that it’s coming from somewhere—and that the bill comes due, somewhere along the way. 

The post AI 101: Do the Dollars Make Sense? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AI 101: GPU vs. TPU vs. NPU https://www.backblaze.com/blog/ai-101-gpu-vs-tpu-vs-npu/ https://www.backblaze.com/blog/ai-101-gpu-vs-tpu-vs-npu/#comments Wed, 02 Aug 2023 14:31:27 +0000 https://www.backblaze.com/blog/?p=109373 Decode the alphabet soup of new, specialized processors with this guide to CPUs, GPUs, TPUs, and NPUs.

The post AI 101: GPU vs. TPU vs. NPU appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Word bubbles that say "What's the Diff: GPU, TPU, NPU."
This article is part of an ongoing content arc about artificial intelligence (AI). The first article in the series is AI 101: How Cognitive Science and Computer Processors Create Artificial Intelligence. Stay tuned for the rest of the series, and feel free to suggest other articles you’d like to see on this content in the comments.

It’s no secret that artificial intelligence (AI) is driving innovation, particularly when it comes to processing data at scale. Machine learning (ML) and deep learning (DL) algorithms, designed to solve complex problems and self-learn over time, are exploding the possibilities of what computers are capable of. 

As the problems we ask computers to solve get more complex, there’s also an unavoidable, explosive growth in the number of processes they run. This growth has led to the rise of specialized processors and a whole host of new acronyms.

Joining the ranks of central processing units (CPUs), which you may already be familiar with, are neural processing units (NPUs), graphics processing units (GPUs), and tensor processing units (TPUs). 

So, let’s dig in to understand how some of these specialized processors work, and how they’re different from each other. If you’re still with me after that, stick around for an IT history lesson.  I’ll get into some of the more technical concepts about the combination of hardware and software developments in the last 100 or so years.

Central Processing Unit (CPU): The OG

Think of the CPU as the general of your computer. There are two main parts of a CPU: an arithmetic-logic unit (ALU) and a control unit. An ALU allows arithmetic (add, subtract, etc.) and logic (AND, OR, NOT, etc.) operations to be carried out. The control unit directs the ALU, memory, and input/output (IO) functions, telling them how to respond to the program that’s just been read from memory. 

The best way to track what the CPU does is to think of it as an input/output flow. The CPU will take the request (input), access the memory of the computer for instructions on how to perform that task, delegate the execution to either its own ALUs or another specialized processor, take all that data back into its control unit, then take a single, unified action (output). 

For a visual, this is the circuitry map for an ALU from 1970:

Circuitry map for an ALU from 1970.
From our good friends at Texas Instruments: the combinational logic circuitry of the 74181 integrated circuit, an early four-bit ALU. Image source.

But, more importantly, here’s a logic map about what a CPU does: 

Logic map of what a CPU does.
Image source.

CPUs have gotten more powerful over the years as we’ve moved from single-core processors to multicore processors. Basically, there are several ALUs executing tasks in parallel, all managed by the CPU’s control unit. That means a multicore CPU works well in combination with specialized AI processors like GPUs. 

The Rise of Specialized Processors

When a computer is given a task, the first thing the processor has to do is communicate with the memory, including program memory (ROM)—designed for more fixed tasks like startup—and data memory (RAM)—designed for things that change more often like loading applications, editing a document, and browsing the internet. The thing that allows these elements to talk is called the bus, and it can only access one of the two types of memory at one time.  

In the past, processors ran more slowly than memory access, but that’s changed as processors have gotten more sophisticated. Now, when CPUs are asked to do a bunch of processes on large amounts of data, the CPU ends up waiting for memory access because of traffic on the bus. In addition to slower processing, it also uses a ton of energy. Folks in computing call this the Von Neumann bottleneck, and as compute tasks like those for AI have become more complex, we’ve had to work out ways to solve this problem.

One option is to create chips that are optimized to specific tasks. Specialized chips are designed to solve the processing difficulties machine learning algorithms present to CPUs. In the race to create the best AI processor, big players like Google, IBM, Microsoft, and Nvidia have solved this with specialized processors that can execute more logical queries (and thus more complex logic). They achieve this in a few different ways. So, let’s talk about what that looks like: What are GPUs, TPUs, and NPUs?

Graphics Processing Unit (GPU)

GPUs started out as specialized graphics processors and are often conflated with graphics cards (which have a bit more hardware to them). GPUs were designed to support massive amounts of parallel processing, and they work in tandem with CPUs, either fully integrated on the main motherboard, or, for heavier loads, on their own dedicated piece of hardware. They also use a ton of energy and thus generate heat. 

GPUs have long been used in gaming, and it wasn’t until the 2000s that folks started using them for general computing—thanks to Nvidia. Nvidia certainly designs chips, of course, but they also introduced a proprietary platform called CUDA that allows programmers to have direct access to a GPU’s virtual instruction set and parallel computational elements. This means that you can write compute kernels, functions that run in parallel across the GPU’s many cores and are ideally suited to specific tasks, without taxing the rest of your resources. Here’s a great diagram that shows the workflow:

Processing flow on CUDA
Image source.
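
Here’s roughly what that looks like from Python: a minimal sketch using Numba’s CUDA bindings, assuming you have Numba, NumPy, and a CUDA-capable GPU (CUDA C++ follows the same shape):

from numba import cuda
import numpy as np
@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # this thread's global index
    if i < x.size:            # guard threads that overshoot the array
        out[i] = x[i] + y[i]
x = np.arange(1_000_000, dtype=np.float32)
y = 2 * x
out = np.zeros_like(x)
threads = 256
blocks = (x.size + threads - 1) // threads   # enough blocks to cover every element
add_kernel[blocks, threads](x, y, out)       # thousands of tiny tasks run in parallel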

This made GPUs wildly applicable for machine learning tasks, and they benefited from the fact that they leveraged existing, well-known processes. What we mean by that is: oftentimes when you’re researching solutions, the solution that wins is not always the “best” one based on pure execution. If you’re introducing something that has to (for example) fundamentally change consumer behavior, or that requires everyone to relearn a skill, you’re going to have resistance to adoption. So, GPUs playing nice with existing systems, programming languages, etc. aided wide adoption. They’re not quite plug-and-play, but you get the gist. 

As time has gone on, there are now also open source platforms that support GPUs that are supported by heavy-hitting industry players (including Nvidia). The largest of these is OpenCL. And, folks have added tensor cores, which this article does a fabulous job of explaining.

Tensor Processing Unit (TPU) 

Great news: the TL;DR of this acronym boils down to: it’s Google’s proprietary AI processor. They started using TPUs in their own data centers in 2015, released them to the public in 2016, and there are some commercially available models. They run on ASICs (hard-etched chips I’ll talk more about later) and Google’s TensorFlow software. 

Compared with GPUs, they’re specifically designed to work at slightly lower precision, which makes sense given that many AI workloads don’t need exact numbers, and giving up a little precision buys speed and energy efficiency. I think Google themselves sum it up best:

If it’s raining outside, you probably don’t need to know exactly how many droplets of water are falling per second—you just wonder whether it’s raining lightly or heavily. Similarly, neural network predictions often don’t require the precision of floating point calculations with 32-bit or even 16-bit numbers. With some effort, you may be able to use 8-bit integers to calculate a neural network prediction and still maintain the appropriate level of accuracy.

Google Cloud Blog

GPUs, on the other hand, were originally designed for graphics processing and rendering, which relies on each point’s relationship to each other to create a readable image—if you have less accuracy in those points, you amplify that in their vectors, and then you end up with Playstation 2 Spyro instead of Playstation 4 Spyro.

Another important design choice that deviates from CPUs and GPUs is that TPUs are designed around a systolic array. Systolic arrays create a network of processors that are each computing a partial task, then sending it along to the next node until you reach the end of the line. Each node is usually fixed and identical, but the program that runs between them is programmable. It’s called a data processing unit (DPU).  
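
As a loose analogy only (this is a plain Python loop, not real systolic hardware), picture each node performing one partial computation and handing its running result to the next:

def systolic_dot(weights, inputs):
    acc = 0
    for w, x in zip(weights, inputs):   # each step stands in for one fixed node
        acc += w * x                    # partial task, passed down the line
    return acc
print(systolic_dot([1, 2, 3], [4, 5, 6]))   # 32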

Neural Processing Unit (NPU)

“NPU” is sometimes used as the category name for all specialized AI processors, but it’s more often specifically applied to those designed for mobile devices. Just for confusion’s sake, note that Samsung also refers to its proprietary chipsets as NPU. 

NPUs contain all the necessary information to complete AI processing, and they run on a principle of synaptic weight. Synaptic weight is a term adapted from biology which describes the strength of connection between two neurons. Simply put, in our bodies, if two neurons find themselves sharing information more often, the connection between them becomes literally stronger, making it easier for energy to pass between them. At the end of the day, that makes it easier for you to do something. (Wow, the science behind habit formation makes a lot more sense now.) Many neural networks mimic this. 

When we say AI algorithms learn, this is one of the ways—they track likely possibilities over time and give more weight to the more strongly connected nodes. The impact is huge when it comes to power consumption. Parallel processing runs tasks side by side, but it isn’t great at accounting for the completion of tasks, especially as your architecture scales and processing units become more separate.

Quick Refresh: Neural Networks and Decision Making in Computers

As we discuss in AI 101, when you’re thinking about the process of making a decision, what you see is that you’re actually making many decisions in a series, and often the things you’re considering before you reach your final decision affect the eventual outcome. Since computers are designed on a strict binary, they’re not “naturally” suited to contextualizing information in order to make better decisions. Neural networks are the solution. They’re based on matrix math, and they look like this: 

An image showing how a neural network is mapped.
Image source.

Basically, you’re asking a computer to have each potential decision check in with all the other possibilities, to weigh the outcome, and to learn from their own experience and sensory information. That all translates to more calculations being run at one time. 
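
To make the matrix math concrete, here’s a minimal two-layer forward pass in NumPy. The sizes and random weights are purely illustrative:

import numpy as np
rng = np.random.default_rng(0)
x = rng.random(4)                 # input: four pieces of "sensory information"
W1 = rng.random((3, 4))           # synaptic weights, layer 1
W2 = rng.random((2, 3))           # synaptic weights, layer 2
hidden = np.maximum(W1 @ x, 0)    # weigh the inputs; keep positive activations
output = W2 @ hidden              # weigh the hidden layer to reach a decision
print(output)                     # two scores; training is what would tune W1 and W2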

Recapping the Key Differences

That was a lot. Here’s a summary: 

  1. Functionality: GPUs were developed for graphics rendering, while TPUs and NPUs are purpose-built for AI/ML workloads. 
  2. Parallelism: GPUs are made for parallel processing, ideal for training complex neural networks. TPUs take this specialization further, focusing on tensor operations to achieve higher speeds and energy efficiencies. 
  3. Customization: TPUs and NPUs are more specialized and customized for AI tasks, while GPUs offer a more general-purpose approach suitable for various compute workloads.
  4. Use Cases: GPUs are commonly used in data centers and workstations for AI research and training. TPUs are extensively utilized in Google’s cloud infrastructure, and NPUs are prevalent in AI-enabled devices like smartphones and Internet of Things (IoT) gadgets.
  5. Availability: GPUs are widely available from various manufacturers and accessible to researchers, developers, and hobbyists. TPUs are exclusive to Google Cloud services, and NPUs are integrated into specific devices.

Do the Differences Matter?

The definitions of the different processors start to sound pretty similar after a while. A multicore processor combines multiple ALUs under a central control unit. A GPU combines more ALUs under a specialized processor. A TPU combines multiple compute nodes under a DPU, which is analogous to a CPU. 

At the end of the day, there’s some nuance about the different design choices between processors, but their impact is truly seen at scale versus at the consumer level. Specialized processors can handle larger datasets more efficiently, which translates to faster processing using less electrical power (though our net power usage may go up as we use AI tools more). 

It’s also important to note that these are new and changing terms in a new and changing landscape. Google’s TPU was announced in 2015, just eight years ago. I can’t count the number of conversations I’ve had that end in a hyperbolic impression of what AI is going to do for/to the world, and that’s largely because people think that there’s no limit to what it is. 

But, the innovations that make AI possible were created by real people. (Though, maybe AIs will start coding themselves, who knows.) And, chips that power AI are real things—a piece of silicon that comes from the ground and is processed in a lab. Wrapping our heads around what those physical realities are, what challenges we had to overcome, and how they were solved, can help us understand how we can use these tools more effectively—and do more cool stuff in the future.

Bonus Content: A Bit of a History of the Hardware

Which brings me to our history lesson. In order to more deeply understand our topic today, you have to know a little bit about how computers are physically built. The most fundamental language of computers is binary code, represented as a series of 0s and 1s. Those values correspond to whether a circuit is open or closed, respectively. When a circuit is open, you cannot push power through it. When it’s closed, you can. Transistors regulate current flow, generate electrical signals, and act as a switch or gate. You can connect lots of transistors with circuitry to create an integrated circuit chip.   

The combination of open and closed patterns of transistors can be read by your computer. As you add more transistors, you’re able to express more and more numbers in binary code. You can see how this influences the basic foundations of computing in how we measure bits and bytes. Eight transistors store one byte of data: each of the eight transistors has two possible states, so there are 2^8 = 256 possible combinations of open/closed gates (bits). That’s why 8 bits = one byte, which can represent any number between 0 and 255.
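
You can check that arithmetic at any Python prompt:

print(2 ** 8)                # 256 distinct patterns for eight two-state gates
print(int("11111111", 2))    # 255, the largest value a single byte holds
print(bin(202))              # '0b11001010', one open/closed pattern spelling 202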

Diagram of how transistors combine to create logic.
Transistors combining to create logic. You need a bunch of these to run a program. Image source.

Improvements in reducing transistor size and increasing transistor density on a single chip have led to improvements in capacity, speed, and power consumption, largely due to our ability to purify semiconductor materials, leverage more sophisticated tools like chemical etching, and improve clean room technology. That all started with the integrated circuit chip. 

Integrated circuit chips were invented around 1958, fueled by the discoveries of a few different people who solved different challenges nearly simultaneously. Jack Kilby of Texas Instruments created a hybrid integrated circuit measuring about 7/16” by 1/16” (11.1 mm by 1.6 mm). Robert Noyce (eventual co-founder of Intel) went on to create the first monolithic integrated circuit chip (so, all circuits held on the same chip) and it was around the same size. Here’s a blown-up version of it, held by Noyce:

Image of Robert Noyce.
Image source.

Note that those first chips only held about 60 transistors. Current chips can have billions of transistors etched onto the same microchip, and are even smaller. Here’s an example of what an integrated circuit looks like when it’s exposed:

A microchip when it's exposed.
Image source.

And, for reference, that’s about this big:

Size comparison of a chip.
Image source.

And, that, folks, is one of the reasons you can now have a whole computer in your pocket in the guise of a smartphone. As you can imagine, something the size of a modern laptop or rack-mounted server can combine more of these elements more effectively. Hence, the rise of AI.

One More Acronym: What are FPGAs?

So far, I’ve described fixed, physical points on a chip, but chip performance is also affected by software. Software represents the logic and instructions for how all these things work together. So, when you create a chip, you have two options: you either know what software you’re going to run and create a customized chip that supports that, or you get a chip that acts like a blank slate and can be reprogrammed based on what you need. 

The first method is called application-specific integrated circuits (ASIC). However, just like any proprietary build in manufacturing, you need to build them at scale for them to be profitable, and they’re slower to produce. Both CPUs and GPUs typically run on hard-etched chips like this. 

Reprogrammable chips are known as field-programmable gate arrays (FPGA). They’re flexible and come with a variety of standard interfaces for developers. That means they’re incredibly valuable for AI applications, and particularly deep learning algorithms—as things rapidly advance, FPGAs can be continuously reprogrammed with multiple functions on the same chip, which lets developers test, iterate, and deliver them to market quickly. This flexibility is most notable in that you can also reprogram things like the input/output (IO) interface, so you can reduce latency and overcome bottlenecks. For that reason, folks will often compare the efficacy of the whole class of ASIC-based processors (CPUs, GPUs, NPUs, TPUs) to FPGAs, which, of course, has also led to hybrid solutions. 

Summing It All Up: Chip Technology is Rad

Improvements in materials science and microchip construction laid the foundation for providing the processing capacity required by AI, and big players in the industry (Nvidia, Intel, Google, Microsoft, etc.) have leveraged those chips to create specialized processors. 

Simultaneously, software has allowed many processing cores to be networked in order to control and distribute processing loads for increased speeds. All that has led us to the rise in specialized chips that enable the massive demands of AI. 

Hopefully you have a better understanding of the different chipsets out there, how they work, and the difference between them. Still have questions? Let us know in the comments.

The post AI 101: GPU vs. TPU vs. NPU appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Guide to How to Wipe a Mac or Macbook Clean https://www.backblaze.com/blog/how-to-wipe-a-mac-hard-drive/ https://www.backblaze.com/blog/how-to-wipe-a-mac-hard-drive/#comments Thu, 27 Jul 2023 16:48:37 +0000 https://www.backblaze.com/blog/?p=49811 Read this guide to learn effective and safe methods to wipe your Mac hard drive or SSD.

The post Guide to How to Wipe a Mac or Macbook Clean appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A decorative image showing two computers next to a cloud with the Backblaze logo.
This post was originally published in 2016. We’re sharing an update to this post to provide the latest information on how to wipe your Mac.

You’re about to upgrade your Mac. Maybe you want to sell it or trade it in, and maybe you’re just throwing it out—either way, you likely still have plenty of personal data on your old computer. Getting rid of that data isn’t straightforward, and it is important. Sure, you could live out the famous printer destruction scene from the movie “Office Space” and smash the computer to pieces. As satisfying as that might be, there are better ways to wipe your Mac clean. 

While there used to be two separate processes for wiping your Mac clean based on whether your computer had a hard disk drive (HDD) or a solid state drive (SSD), instructions for how to wipe your Mac are now based on your Mac’s processing chip—an Apple silicon (M-series) chip or an Intel-based chip. 

Do You Need to Know What Type of Drive You Have?

Around 2010, Apple started moving to only SSD storage in many of its devices. That said, some Mac desktop computers continued to offer the option of both SSD and HDD storage until 2020, a setup they called a Fusion Drive. The Fusion Drive is not to be confused with flash storage: in a Fusion Drive, the flash component is the internal storage that holds your readily available and most accessed data at lower power draw, while the HDD holds the rest. 

Note that as of November 2021, Apple does not offer any Macs with a Fusion Drive. Basically, if you bought your device before 2010 or you have a desktop computer from 2021 or earlier, there’s a chance you may be using an HDD. 

The good news here is twofold. First, it’s pretty simple to figure out what kind of drive you have, and we’ll detail those steps below (just in case you’re one of those HDD holdouts). Second, Mac’s Help Center directions to wipe your Mac are bifurcated not around your drive type, but around what internal processing chip you’re using (Apple silicon or Intel). Over the years, updates to the Mac operating system (macOS, or OS for general purposes) have made it much easier to wipe your Mac clean, but if you have an older OS, you may have to follow slightly different instructions. 

HDDs and SSDs: What’s The Difference?

There are good reasons that Apple switched to using mostly SSDs, and good reasons they kept HDDs around for as long as they did as well. If you want to know more about the differences in drive types, check out Hard Disk Drive (HDD) vs. Solid State Drive (SSD): What’s the Difference?

So, What Kind of Drive Do You Have?

To determine what kind of drive your Mac uses, click on the Apple menu and select About This Mac. 

Avoid the pitfall of selecting the Storage tab in the top menu. What you’ll find is that the default name of your drive is “Macintosh HD,” which is confusing, given that it refers to the internal storage of the computer as a hard drive when, in most cases, your drive is an SSD. While you can find information about your drive on this screen, we prefer the method that provides maximum clarity. 

So, on the Overview screen, click System Report. Bonus: You’ll also see what type of processor you have and your macOS version (which will be useful later). 

A screenshot of the Mac System Report, Overview tab.

Once there, select the Storage tab, then the volume name you want to identify. You should see a line called Medium Type, which will tell you what kind of drive you have. 

A screenshot of the Mac System report > Storage screen.

Identify Your Processing Chip

In November 2020, Apple launched its first Macs equipped with M1 chips, replacing the Intel-based processors of the past. The evolution of the M-series Apple chips has been notable largely for performance enhancements, but given that (at the time of publishing) this was only three years ago, there’s a good chance that many users will have an Intel processor. 

To see what kind of chip you have, follow the same instructions as above—go to your Apple menu and select About This Mac. If you have an M-series chip, you’ll see that listed as marked in the screenshot below.

A screenshot of the Mac System report > overview page.

If you have an Intel-based Mac, you will see Processor, followed by the name of an Intel processor.

A screenshot of the Mac System Report > Overview pane on an Intel-based Mac.

Now You Need to Know Your Mac OS

Great news! If you’re running macOS Monterey or later, it’s super easy to erase your Mac. Of course, you’ll have seen your current OS on our favorite About This Mac screen, but below is a list of all OS releases you can compare against, as well as the Apple Help article on the topic. 

A screenshot of a table describing existing Mac operating systems and their most recent versions.

One Last Thing Before You Get Started—And It’s Crucial

Before you get started, you’ll want to make sure any important data on your hard drive has been backed up. The Apple OS has a built-in backup capability called Time Machine. 

While Time Machine is a good start, it doesn’t fulfill the requirements of a 3-2-1 backup strategy. And (as we all know) Apple devices work best with other Apple devices—so if you want to point your Time Machine backups to a non-Apple network device, you’ll have some creative setup to do. Ideally, you’d pair Time Machine with a product like Backblaze Personal Backup for maximum flexibility and cyber resilience. Note that even though backup runs on a schedule, we recommend hitting the manual backup button before you wipe your Mac to ensure you’ve got the most recent information. 

How to Wipe Your Mac…Can Be Slightly Different Based on Your Computer

Once you’ve verified your data is backed up, roll up your sleeves and get to work. The key here is macOS Recovery—a part of the Mac operating system since OS X 10.7 Lion. You can use the apps in macOS Recovery on a Mac with an Apple processing chip to repair your internal storage device, reinstall macOS, restore your files from a Time Machine backup, set the security policy for different volumes, transfer files between two Mac computers, start up in safe mode, and more.

Okay, so now that you know your operating system, processing chip, and drive type, we can get to the actual how-to of how to wipe your Mac. The steps will be slightly different based on each of the above variables. Let’s dig in. 

Wipe a Mac With an Apple Chip and a Recent macOS Update

Assuming you’re rocking a recent macOS update, then you’re going to wipe your Mac using the Erase All Content and Settings function. (You might also see this called the Erase Assistant in Apple’s Help articles.) This will delete all your data, iCloud and Apple logins, Apple wallet information, Bluetooth pairings, fingerprint sensor profiles, and Find My Mac settings, as well as resetting your Mac to factory settings. Here’s how you find it. 

If you have macOS Ventura: 

  1. Select the Apple menu.
  2. Choose System Settings. 
  3. Click General in the sidebar. 
  4. Click Transfer or Reset on the right. 
  5. Click Erase all Content and Settings. 
A screenshot of the Mac System Settings > General screen in a computer running Mac operating system Ventura.

If you have macOS Monterey:

  1. Select the Apple Menu. 
  2. Choose System Preferences. 
  3. Once the System Preferences window is open, select the dropdown menu in your top navigation bar. Then, select Erase All Content and Settings.
A screenshot of Mac System Preferences > Erase All Content And Settings in a computer running Mac operating system Monterey.

After you’ve done that, then the steps will be the same for each process. Here’s what to expect. 

  1. You’ll be prompted to log in with your administrator credentials. 
  2. Next, you will be reminded to back up via Time Machine. Remember that if you choose this option, you’ll want to back up to an external device or cloud storage—because, of course, you’re about to get rid of all the data on this computer. 
  3. Click Continue to allow all your settings, data, accounts, etc. to be removed. 
A screenshot of the Erase All Content and Settings assistant.
  4. If you’re asked to sign out of Apple ID, enter your Apple password and hit Continue. 
  5. Click Erase all Content & Settings to confirm. 
A screenshot of the confirmation screen to erase all content and settings.
  6. Your Mac will automatically restart. If you have an accessory like a Bluetooth keyboard, you’ll be prompted to reconnect that device. 
  7. Select a WiFi network or attach a network cable. 
  8. After joining a network, your Mac activates. Click Restart. 
  9. After your device has restarted, a setup assistant will launch (just like when you first got your Mac). 

It’ll be pretty clear if you don’t meet the conditions to erase your drive using this method because you won’t see Erase All Content and Settings on the System Settings we showed you above. So, here are instructions for the other methods. 

How to Wipe a Mac With an Apple Chip Using Disk Utility

Disk Utility is exactly what it sounds like: a Mac system application that helps you to manage your various storage volumes. You’d use it to manage storage if you have additional storage volumes, like a network attached storage (NAS) device or external hard drive; to set up a partition on your drive; to create a disk image (basically, a backup); or to simply give your disks a check up if they’re acting funky. 

You can access Disk Utility at any time by selecting Finder > Go > Utilities, but you can also trigger Disk Utility on startup as outlined below. 

  1. Turn on your Mac and continue to press and hold the power button until the startup options window comes up. Click Options, then click Continue.
  2. You may be prompted to login with either your administrative password or your Apple ID.
  3. When the Utilities window appears, select Disk Utility and hit Continue.
A screenshot of the Utilities > Disk Utility on a Mac computer.
  4. If you’d previously added other drives to your startup disk, click the delete volume button (–) to erase them. 
  5. Then, choose Macintosh HD in the sidebar. 
  6. Click the Erase button, then select a file system format and enter a name for it. For Macs with an M1 chip, your option for a file system format is only APFS.
  7. Click Erase or, if it’s an option, Erase Volume Group. You may be asked for your Apple ID at this point. 
  8. You’ll be prompted to confirm your choice, then your computer will restart. 
  9. Just as in the other steps, when the computer restarts, it will attempt to activate by connecting to WiFi or asking you to attach a network cable. 
  10. After it activates, select Exit to Recovery Utilities. 

Once it’s done, the Mac’s hard drive will be clean as a whistle and ready for its next adventure: a fresh installation of the macOS, being donated to a relative or a local charity, or just sent to an e-waste facility. Of course, you can still drill a hole in your disk or smash it with a sledgehammer if it makes you happy, but now you know how to wipe the data from your old computer with much less ruckus.

How To Wipe a Mac With an Intel Processor Using Disk Utility

Last but not least, let’s talk about how to wipe an Intel-based Mac. (Fusion Drives fall into this category as well.) 

  1. Starting with your Mac turned off, press the power button, then immediately hold down the command (⌘) and R keys and wait until the Apple logo appears. This will launch macOS Recovery. 
  2. You may be prompted to log in with an administrator account password. 
  3. When the Recovery window appears, select Disk Utility.
  4. In the sidebar, choose Macintosh HD.
  5. Click the Erase button, then select a file system format and enter a name for it. Your options for a file system format include Apple File System (APFS), which is the file system used by macOS 10.13 or later, and macOS Extended, which is the file system used by macOS 10.12 or earlier.
  6. Click Erase or Erase Volume Group. You may be prompted to provide your Apple ID. 
  7. If you previously used Disk Utility to add other storage volumes, you can erase them individually using the process above. 
  8. When you’ve deleted all your drives, quit Disk Utility to return to the utilities window. You may also choose to restart your computer at this point. 

Securely Erasing Drives: Questions and Considerations

Some of you drive experts out there might remember that there is some nuance to security when it comes to erasing drives, and that there are differences in erasing HDDs versus SSDs. Without detouring into the nuances of why and how that’s the case, just know that on Fusion Drives or Intel-based Macs, you may see additional Security Options you can enable when erasing HDDs. 

There are four options in the “Security Options” slider. “Fastest” is quick but insecure—data could potentially be rebuilt using a file recovery app. Moving that slider to the right introduces progressively more secure erasing. Disk Utility’s most secure level erases the information used to access the files on your disk, then writes zeros across the disk surface seven times to help remove any trace of what was there. This setting conforms to the DoD 5220.22-M specification. Bear in mind that the more secure method you select, the longer it will take. The most secure methods can add hours to the process. For peace of mind, we suggest choosing the most secure option to erase your hard drive. You can always start the process in the evening and let it run overnight.

After the process is complete, restart your Mac and see if you can find any data. A quick inspection is not foolproof, but it can provide some peace of mind that the process finished without an interruption. 

Securely Erasing SSDs and Why Not To

If your Mac comes equipped with an SSD, Apple’s Disk Utility software won’t actually let you zero the drive. Sounds strange, right? Apple’s online Knowledge Base explains that secure erase options are not available in Disk Utility for SSDs.

Fortunately, you are not restricted to using the standard erasure option to protect yourself. Instead, you can use FileVault, a capability built into the operating system.

FileVault Keeps Your Data Safe

FileVault is an excellent option to protect all of the data on a Mac SSD with encryption. FileVault is whole-disk encryption for the Mac. With FileVault engaged, you need a password to access the information on your hard drive. Even without it, your data is encrypted and it would be very difficult for anybody else to access.

Before you use FileVault, know that there is a crucial downside: if you lose your password or the encryption key, your data may be gone for good! 

When you first set up a new Mac, you’re given the option of turning FileVault on. If you don’t do it then, you can turn on FileVault at any time by clicking on your Mac’s System Preferences, clicking on Security & Privacy, and clicking on the FileVault tab. Be warned, however, that the initial encryption process can take hours, as will decryption if you ever need to turn FileVault off.

A screenshot of the System Settings > Privacy and Security on a Mac computer.

With FileVault turned on, you can restart your Mac into its Recovery System following the directions above and erase your hard drive using Disk Utility, once you’ve unlocked it (by selecting the disk, clicking the File menu, and clicking Unlock). That deletes the FileVault key, which means any data on the drive is useless.

Nowadays, most Macs manage disk encryption through the T2 chip and its Secure Enclave, which is entirely separate from the main computer itself. This is why FileVault has no CPU overhead—it’s all handled by the T2 chip. Although FileVault doesn’t impact the performance of most modern Macs, we’d suggest only using it if your Mac has an SSD, not a conventional HDD.

Securely Erasing Free Space on Your SSD

If you don’t want to take Apple’s word for it, if you’re not using FileVault, or if you just want to, there is a way to securely erase free space on your SSD. It’s a little more involved, but it works. Before we get into the nitty-gritty, let me state for the record that this really isn’t necessary to do, which is why Apple’s made it so hard to do.

To delete all data from an SSD on an Apple computer, use Apple’s Terminal app. Terminal provides you with command line interface (CLI) access to the OS X operating system. Terminal lives in the Utilities folder, but you can access Terminal from the Mac’s Recovery System. Once your Mac has booted into the Recovery partition, click the Utilities menu and launch Terminal.

From a Terminal command line, type the following:

diskutil secureErase freespace VALUE /Volumes/DRIVE

That tells your Mac to securely erase the free space on your SSD. You’ll need to change VALUE to a number between 0 and 4. Zero is a single-pass run of zeroes, 1 is a single-pass run of random numbers, 2 is a seven-pass erase, 3 is a 35-pass erase. Finally, level 4 is a three-pass erase with random fills plus a final zero fill. DRIVE should be changed to the name of your hard drive. To run a seven-pass erase of your SSD drive in JohnB-MacBook, you would enter the following:

diskutil secureErase freespace 2 /Volumes/JohnB-MacBook

Note that while Mac’s Terminal typically uses forward slashes ( / ) in paths, if you have a space in the name of your hard drive, you’ll need a backslash ( \ ) to escape the space (so “Macintosh HD” becomes Macintosh\ HD in the path). For example, to run a 35-pass erase on a hard drive called Macintosh HD, enter the following:

diskutil secureErase freespace 3 /Volumes/Macintosh\ HD

If you’re like the majority of computer users, you’ve never opened your Terminal application—and that’s probably a good thing. If you’re providing the proper instructions, a CLI lets you directly edit the guts of your computer. If you’re not providing the proper instructions, things will just error out, and likely you won’t know why. All this to say: Apple has made specific choices about designing products for folks who aren’t computer experts. Sometimes it limits how customizable you can get on your device (i.e. it’s super hard to zero out an SSD), but usually it’s for good reason—in this case, it’s to preserve the health of your drive in the long term. 

When Erasing Is Not Enough: How To Destroy a Drive

If you absolutely, positively must be sure that all the data on a drive is irretrievable, see this Scientific American article (with contributions by Gleb Budman, Backblaze CEO), How to Destroy a Hard Drive—Permanently.

Since you’re interested in SSDs, you might enjoy reading other posts in our SSD 101 series. And if you’d like to learn how to securely erase a Windows PC HDD or SSD, take a look at our guide here.

The post Guide to How to Wipe a Mac or Macbook Clean appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Fire Works (or Does It?): How to Destroy Your Drives https://www.backblaze.com/blog/fire-works-or-does-it-how-to-destroy-your-drives/ https://www.backblaze.com/blog/fire-works-or-does-it-how-to-destroy-your-drives/#comments Tue, 04 Jul 2023 16:35:00 +0000 https://www.backblaze.com/blog/?p=109126 There are several ways to destroy your SSD or HDD. Let's talk about the more sensational options.

The post Fire Works (or Does It?): How to Destroy Your Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A decorative image that shows a hammer smashing a hard drive.

Erasing data from old devices is important, but it doesn’t have to be boring. Sure, you could just encrypt the data, wipe your drive, and so on, but you can also physically destroy a drive in a myriad of exciting ways. In honor of the United States’ favorite day to celebrate with explosives, let’s talk about not-so-standard ways to get rid of old data (permanently). 

Know Your Device

Effective data destruction starts with good planning. When you’re looking at how to securely erase your data, there are different options for hard disk drives (HDDs) and solid state drives (SSDs)

With an HDD, spinning disks are encased in a steel enclosure. In order to do sufficient levels of damage, it’s helpful to get through this steel layer first. Once you’re in, you can drill holes in it, wash it in acid, or shred it. 

With an SSD, it’s not just recommended to get through that steel layer, it’s almost essential. SSDs are more resilient because data is stored electronically in flash memory cells rather than on spinning magnetic platters. So, pull out that screwdriver, shuck that drive like an oyster, and expose your SSD. If you’re going the physical destruction route, make sure that you’re shredding with a narrow enough width that no forensic scientist can humpty-dumpty your data together again. 

Have a Blast

We do have a Sr. Infrastructure Software Engineer who’s gone on record recommending explosives. Note that while we don’t doubt the efficacy, we can’t recommend this option. On the other hand, we’re big fans of bots that smash things. 

Destroy Responsibly

We could be accused of overcomplicating things. It’s very effective to wipe your device, or just encrypt your data. Our guides to securely wiping a Mac (above) and a Windows PC walk through those options in more detail.

But, if you want more peace of mind that the data isn’t coming back—maybe you’re one of the protagonists of Dead to Me?—destroy responsibly.

The post Fire Works (or Does It?): How to Destroy Your Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.
