Hoarders anonymous for unstructured data

Are you a hoarder?

Do you have a storage area where you keep items you rarely or never use? Do you periodically sort out your stuff, or do you keep collecting items of debatable value? I think it's human nature to hold onto things for extended periods of time without ever using them.

It's like the junk drawer everybody has somewhere in their house that holds a collection of random items we think are important to keep, but we usually take the same couple of items out every time we open the drawer. It's called a "junk drawer" for a reason - because it's mostly junk, and most of it should be tossed in the trash. So why do we keep these items of questionable value or use?

In the American TV show Hoarding: Buried Alive, you notice that it's hard to identify the items of value mixed in with the piles and piles of trash. It's also difficult to identify who in the house owns a particular item so that you can ask them why they even have it, or get permission to remove it. In almost all cases, they don't use half the stuff they own and have forgotten that they even have it.

I can't help but draw parallels with unstructured data - tons and tons of PDFs, Word docs, spreadsheets, PowerPoint presentations, audio files, picture files, and pretty much anything else that doesn't live in a "structured" database - and with how people in most organisations just keep collecting it. (Structured data refers to information organised in a predefined format, such as the rows and columns of a database table, which makes it easy to search and query.)

The IT team, tasked with getting their arms around it for compliance and governance purposes, has no idea who owns it, whether it's important or if anyone is even using it - and for these reasons, they are afraid to delete it. In the meantime, the line-of-business people continue to create and hoard more data - year after year - storing it in places (e.g. SharePoint, Dropbox or SendSpace) that IT has no control over, and probably does not even know about.

Hoarding is dangerous

As with physical hoarding, where some of the junk found at the bottom of the pile isn't just annoying but downright dangerous - mould, broken items and so on - hoarded unstructured data can fall into this "dangerous" category when it includes confidential intellectual property (IP) such as literary and artistic work and designs, customer data, medical records or personally identifiable information.

All organisations have unstructured data, but most are collecting and maintaining large volumes of orphaned data - so called because it has no owner. Without an owner to determine its importance, lifespan or who should have access to it, we default to keeping this data of unknown value forever. I have seen organisations go through a consolidation or migration, or move data centres, and they take all of the unassigned, unstructured mess along with them.

No one wants to become a hoarder, but how do things get so messy? Most organisations have a poor governance process for granting access to unstructured data. Because they are unable to determine who owns the data, they continue to collect more orphaned files in repositories like file shares and group folders. I've found that many companies will admit to having a problem managing unstructured data, but they seem to accept it as a cost of doing business.

I am convinced that the current acceptance of this problem with unstructured data is the corporate version of household hoarding. IT doesn't know what is out there, whether it's necessary or if it's subject to compliance. Compliance rules, meanwhile, demand that the risky information contained in some unstructured data be protected, and auditors are becoming savvy about unstructured data and the lack of accountability for, or ownership of, the information it contains.

Let's start the clean-up process

With household hoarding, a cleaning crew is often called in to help the hoarder because the government mandates a "clean up or face the consequences" stance. Shortly after the cleaning crew leaves, however, the stuff starts to collect again because the person falls back into old habits, with no control over what they collect and no way to keep things clean and organised. It is not good enough to just clean up the mess. You need a governance process to keep your unstructured data clean and sorted. Don't wait for the authorities to tell you to put a plan in place to govern access to unstructured data. Start the clean-up process now, and stop being an unstructured data hoarder.

The good news is that you don't need a 12-step programme to conquer vast collections of unstructured data. You can overcome the problem and get control of the chaos with the following six steps:

Step 1: Discover users and resources - Determine what is important by rolling up your sleeves and digging through the piles of data

Step 2: Classify data and access rights - Prioritise by running the data through a risk engine; focus first on those areas of greatest risk

Step 3: Audit and report on usage - Track the actual usage of the data to help identify ownership and sort out stale data

Step 4: Assign ownership and approvers - Obtain validation from the business owners to determine what is important and facilitate acceptance of data stewardship

Step 5: Remediate - Clean up the mess with access certifications

Step 6: Automate control - Keep it clean with access requests and approvals
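To give a feel for what Steps 1 and 3 might look like in practice, here is a minimal Python sketch that walks a file share, records who owns each file and how long it has sat untouched, and then groups the stale files by owner. It is an illustration only, not a description of any particular product: the names FileRecord, discover_files and stale_by_owner and the one-year staleness threshold are my own assumptions, and the numeric owner ID it reads is only meaningful on Unix-like systems (on Windows you would need to look up the owner another way).

```python
import os
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileRecord:
    """One discovered file (hypothetical structure for this sketch)."""
    path: str
    owner_id: int      # numeric user ID from the file system; always 0 on Windows
    size_bytes: int
    days_idle: float   # days since the file was last read or modified


def discover_files(root, now=None):
    """Step 1 (sketch): walk a directory tree and record every file found."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            # Treat the later of last access and last modification as "last used".
            last_used = max(info.st_atime, info.st_mtime)
            idle = (now - last_used) / 86400
            records.append(FileRecord(path, info.st_uid, info.st_size, idle))
    return records


def stale_by_owner(records, stale_after_days=365):
    """Step 3 (sketch): group files idle for longer than the threshold by owner."""
    grouped = defaultdict(list)
    for record in records:
        if record.days_idle >= stale_after_days:
            grouped[record.owner_id].append(record)
    return dict(grouped)
```

Calling stale_by_owner(discover_files("/path/to/share")) would give you a first list of candidates to take to the business owners in Step 4. Keep in mind that many file systems do not update access times reliably, so treat the output as a starting point for a conversation rather than a deletion list.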

So, don't just wait for the auditor to show up and tell you what you have to do. Start cleaning up the unstructured data mess now, before it gets worse and you end up with a massive data breach that spins out of control.

Matthew Johnston is managing director for South Asia, Dell Software.