February 18

Data Cleanup Part 1: Primary UserIDs

Welcome to the February issue of Identity Management in 13 Easy Steps. In most parts of the country the weather is cold and dreary, and what better weather for an ID cleanup? 🙂

So roll up the sleeves, find the glasses, and brew a lot of extra-strong coffee – it’s time to tackle those primary userIDs.

Primary userIDs – what are they?

A primary userID is the main ID that each user has in an organization. This is the one ID that they *should* have on all systems, although that is often not the case. Typically, the primary ID is the user’s network ID – that is, the ID that each person uses to log into their computer in the morning, and probably also to log into their email. Many organizations call this the LDAP ID or (for Windows-heavy shops) the Active Directory ID. Organizations that are mainframe-heavy might store their primary IDs on the mainframe.

The task at hand

On the surface, this month’s activity is simple: correlate each user’s primary ID with their name and other identity information, as this will be the basis for the identity repository going forward. Hopefully everyone’s primary ID is already stored electronically somewhere (at least in a spreadsheet) and there is some useful data already associated with each ID – like a name, an employee number, or other identifying information. If not, well, that’s where the extra-strong coffee comes in (or maybe decaf would be better?).

The task may be easy to describe, but there are three significant challenges in this cleanup process:

Challenge #1: mapping primary IDs to people

It is likely that the list of primary IDs (assuming it exists) is missing information, or has data that’s so outdated as to be useless. Worse still is a list of IDs without any information (who are bassfisher68 and jedimaster84?). Equally frustrating is the same-name problem: how many John Smiths, Trong Nguyens, and Juan Gonzalezes are in your organization… and whose name goes with which ID?

Challenge #2: are they even still here?

It is often hard to map an ID to a person when the ID has persisted but the person is long gone. A common name only deepens the doubt.

Does jsmith3 belong to the contractor who was here two years ago, or does it belong to the guy downstairs in accounting?

A nasty – but necessary – part of cleaning up primary IDs is identifying orphaned accounts that should no longer be active. On the upside, this is a healthy security exercise that often gets put off – after all, who wants to deal with the screaming users when the wrong IDs get disabled? But for identity management to work, this HAS to be done – no more excuses or avoidance!
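One way to build a first list of orphan candidates is to flag accounts that have not logged in for a long time. The sketch below is only an illustration: it assumes the directory can export a last-login date per userID, and the 90-day threshold, the function name `find_stale_accounts`, and the sample data are all hypothetical. The result is a list for research, not a list to disable blindly.

```python
from datetime import date, timedelta


def find_stale_accounts(last_logins, today, max_idle_days=90):
    """Return userIDs whose last login is missing or older than the cutoff.

    last_logins maps userID -> date of last login (None = never logged in).
    The 90-day default is an assumption; pick a threshold that fits policy.
    """
    cutoff = today - timedelta(days=max_idle_days)
    stale = []
    for user_id, last_login in sorted(last_logins.items()):
        if last_login is None or last_login < cutoff:
            stale.append(user_id)
    return stale


# Hypothetical export from the directory
last_logins = {
    "jsmith3": date(2023, 1, 15),
    "bassfisher68": None,
    "tnguyen": date(2024, 2, 1),
}

candidates = find_stale_accounts(last_logins, today=date(2024, 2, 18))
print(candidates)  # ['bassfisher68', 'jsmith3']
```

Service accounts and people on extended leave will show up in a list like this too, which is one more reason to research before disabling.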

Challenge #3: mapping primary IDs to primary sources of record

Once the IDs are mapped to the correct names/people and orphaned accounts are retired, it’s time to map the IDs to the corresponding accounts in the sources of record that were identified in last month’s exercise. Remember, identity management is just a facilitator of actions. A key integration is between identity management and the HR system, as that enables the automation of access creation and removal based on hire, transfer, and termination events in the HR system. Identity management can also facilitate the auto-provisioning or password self-service of a user’s other accounts (like email) based on proper linking.

The biggest difficulty in this exercise is typically matching the userID with the right HR record, due to potential differences in legal vs. preferred name. Very often, email addresses and userIDs are set up based on the individual’s preferred name (e.g., Mike, Trish, Betsy), whereas the HR record will contain their legal name (e.g., Michael, Patricia, Elizabeth).

Is Mike Smith the same guy as Michael Smith – or not?

Guessing is not allowed here – matching a user to the wrong HR record can have very serious consequences. HR doesn’t take kindly to people seeing each other’s salary information. Getting someone else’s email is generally frowned upon as well, especially if some new junior analyst was confused with a senior VP (believe me, this has happened more than once!).
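A script can narrow the field without doing any guessing. The sketch below assumes a small, hand-maintained nickname table and records with `first` and `last` fields; the table, the field names, and the status labels are all hypothetical. Notice that it never links anything automatically: even a single candidate comes back as "needs_confirmation" so a human can check a second attribute, such as the employee number.

```python
# Hypothetical, deliberately tiny nickname table; a real one would be larger.
NICKNAMES = {
    "mike": "michael",
    "trish": "patricia",
    "betsy": "elizabeth",
}


def canonical_first_name(name):
    """Lower-case a first name and map a known nickname to its legal form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def candidate_hr_matches(account, hr_records):
    """Return HR records whose canonical first name and last name both match."""
    first = canonical_first_name(account["first"])
    last = account["last"].strip().lower()
    return [
        rec for rec in hr_records
        if canonical_first_name(rec["first"]) == first
        and rec["last"].strip().lower() == last
    ]


def review_status(account, hr_records):
    """Classify an account for human review; nothing is linked automatically."""
    matches = candidate_hr_matches(account, hr_records)
    if not matches:
        return "no_candidate", matches
    if len(matches) == 1:
        return "needs_confirmation", matches
    return "ambiguous", matches


# Hypothetical sample data
hr_records = [
    {"emp_no": "1001", "first": "Michael", "last": "Smith"},
    {"emp_no": "1002", "first": "Michael", "last": "Smith"},
    {"emp_no": "1003", "first": "Patricia", "last": "Jones"},
]

status, matches = review_status({"first": "Mike", "last": "Smith"}, hr_records)
print(status, [m["emp_no"] for m in matches])  # ambiguous ['1001', '1002']
```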


There is no *right* or *easy* way to execute this cleanup.

With little starting information and/or a large user base, this will be a painful and time-consuming process, but here are some things to help get organized:

-        Determine the data set that is needed. Make sure it is the bare minimum to start because once identity management is implemented and the records are linked, a lot of additional information will populate automatically. The goal here is to identify which data points are needed to accurately link records between systems – nothing more

-        Start with the cleanest source of record to build some momentum. While this is often the HR record, sometimes email is the best bet. Other sources may also be appropriate (like the mainframe). In general, the cleanest sources of record are ones that are carefully controlled and well automated in a database or a repository.

-        Enlist the help of someone good at scripting to automate some of the searches and comparisons. Done right, this saves immeasurable time!

-        Communication is key!

  • Make sure the user base knows a cleanup is underway and why it benefits them
  • Solicit assistance from department heads – they can help identify users and their correct/current information
  • Ask the leadership to alert their people that they may be polled for information, and specify the name of the team that will do the polling (provide the names of individuals if possible). Users need to know that these requests are legitimate and not a phishing attempt (especially if they just attended training on phishing or Michael has already worked to improve your awareness program)
  • Communicate the cleanup process to the leadership so they know the who, what, where, when and why of the effort. This is especially important when the team ends up with a pool of orphaned IDs and no other means of research. The only remaining option is to deactivate those accounts and see if anyone complains. Management needs to understand and support this decision before it can be executed

-        Don’t be afraid to disable IDs if reasonable research has not yielded results. Researching identities is extremely time consuming – there is a point where enough is enough, and the security risk to the company should outweigh the brief inconvenience that a handful of users may experience

-        Engage HR representatives and local technical support personnel. They tend to know the users personally, and can be of great help identifying them
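To give a flavor of the scripted comparisons mentioned above, here is a minimal sketch that compares two exports on a shared key. It assumes both the directory and the HR system can be exported as rows carrying an employee number; the field names `emp_no` and `user_id`, the function name, and the sample rows are hypothetical.

```python
def compare_sources(directory_rows, hr_rows, key="emp_no", id_field="user_id"):
    """Split records into linked, one-sided, and unkeyed groups."""
    dir_keys = {row[key] for row in directory_rows if row.get(key)}
    hr_keys = {row[key] for row in hr_rows if row.get(key)}
    return {
        # Present in both systems: safe starting point for linking
        "linked": sorted(dir_keys & hr_keys),
        # In the directory but not in HR: possible orphans or non-employees
        "directory_only": sorted(dir_keys - hr_keys),
        # In HR but not in the directory: people without a primary ID
        "hr_only": sorted(hr_keys - dir_keys),
        # Directory accounts with no key at all: need manual research
        "missing_key": sorted(
            row[id_field] for row in directory_rows if not row.get(key)
        ),
    }


# Hypothetical sample exports
directory_rows = [
    {"user_id": "jsmith3", "emp_no": "1001"},
    {"user_id": "tnguyen", "emp_no": "1002"},
    {"user_id": "jedimaster84", "emp_no": ""},
    {"user_id": "jgonzalez", "emp_no": "1009"},
]
hr_rows = [
    {"emp_no": "1001", "legal_name": "John Smith"},
    {"emp_no": "1002", "legal_name": "Trong Nguyen"},
    {"emp_no": "1003", "legal_name": "Patricia Jones"},
]

report = compare_sources(directory_rows, hr_rows)
for group, values in report.items():
    print(group, values)
```

The "missing_key" group is usually where the real detective work starts, since those accounts cannot be linked by script at all.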

If existing records are already in pretty good shape, sit back and smile smugly while everyone else beats their head against the wall for a while.

Keeping it clean

If there is no current identity management system in place, it is important to keep the new repository of primary userIDs reasonably clean until the new system is in place. Otherwise this fun exercise will need to be repeated.

Staying up-to-date manually requires a process to keep user data in good repair, but the process should not be complex or labor-intensive. Do the bare minimum necessary to keep the data decently clean. It’s OK if it’s not perfect – a small final cleanup is inevitable.

A word about userID naming standards

If this process reveals the lack of a userID naming standard, or a standard that no longer makes sense for the organization, this is the right time to establish a new, sensible one. This is a large and painful exercise in and of itself, but it is far better to enter into an identity management implementation with a solid and appropriate naming standard than to try to fix it later.

Here are the things to consider:

-        Grandfathering existing users vs. making them change their ID to match the new standard

  • Unless there are specific technical reasons for converting everyone, I recommend grandfathering. A primary ID can be created in identity management in the new format and mapped to the untouched existing IDs. This meets the needs of identity management while minimizing impact on the users

-        Helping users with multiple ID formats across various systems consolidate to one ID format

  • Although this can be a little painful, many users are happy to undergo the initial challenge in exchange for not having to remember which ID to use on which system

-        Having different ID formats for employees vs. non-employees

  • I recommend not doing this. Having visual segregation of ID is much more important in a manual paradigm. With identity management there are many ways to identify a user’s employment status without segregating by ID, and having different ID formats causes more problems than it solves

-        Make sure that the selected format will work on all systems – including those legacy dinosaurs with all their length and character limitations

-        If you choose to have userIDs based on name, establish a clear policy about changing the ID in the case of marriage, divorce, gender transition, etc.

  • Changing someone’s display name is easy. Changing their userID can be tricky, because on many systems this isn’t possible – the old ID has to be deleted and a new one created, which leaves a lot of room for error in copying permissions, files, scripts, etc. However, some people feel very strongly about their name, especially after a nasty divorce or a gender transition, so there has to be a provision for this

-        Make sure the new naming standard scales adequately for the expected growth of the company, and that it addresses situations where users may need more than one ID, or where individuals have the exact same name (possibly even same middle name or middle initial)
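To make the length-limit and same-name considerations concrete, here is a minimal sketch of one possible standard: first initial plus last name, trimmed to a maximum length, with a numeric suffix on collision. The format itself, the 8-character limit, and the function name `make_user_id` are assumptions for illustration, not a recommendation for any particular organization.

```python
import re


def make_user_id(first, last, taken, max_len=8):
    """Build a userID from first initial + last name, unique within `taken`.

    Only a-z and 0-9 are kept, so apostrophes, hyphens, and accented
    letters are dropped; a real standard would need a transliteration rule.
    """
    base = re.sub(r"[^a-z0-9]", "", (first[:1] + last).lower())
    if not base:
        raise ValueError("name contains no usable characters")
    candidate = base[:max_len]
    n = 1
    while candidate in taken:
        # Collision: append 2, 3, ... and trim the base so the length still fits
        n += 1
        suffix = str(n)
        candidate = base[:max_len - len(suffix)] + suffix
    return candidate


print(make_user_id("John", "Smith", set()))                  # jsmith
print(make_user_id("John", "Smith", {"jsmith", "jsmith2"}))  # jsmith3
print(make_user_id("Juan", "Gonzalez", {"jgonzale"}))        # jgonzal2
```

The third call shows why the length limit and the collision rule have to be designed together: the suffix has to fit inside the same limit as the name.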

Parking Lot

Doing a userID cleanup of this nature can uncover all kinds of interesting issues – like fields being used to store data that they were not meant to store, IDs that were created through unofficial channels and probably should not have been created, etc. Some of these discoveries might be security risks, some might just be sloppy administration, and still others might impact the identity management implementation down the road. In any case, it is important to document these discoveries along the way and do something about them – even if that something is just notifying the responsible manager.

Action Recap

This month, we covered the following key actions:

  1. Identify the primary IDs, and determine who owns each one
  2. Identify and retire obsolete IDs
  3. Connect primary IDs to the appropriate records in the sources of record identified in last month’s exercise
  4. Develop (and use!) a process for keeping the IDs clean until identity management can take over
  5. Make sure the current ID naming standard is adequate and fix it if it isn’t

None of these actions is quick and easy, but getting them done sets a firm foundation for a successful identity management implementation.

How can I help?

Do you need some clarification or additional assistance? Do you have an experience to share with others? Leave a comment below so we can all improve together.

