MeriTalk Q&A: NRC CIO Nelson on the March From Microfiche to AI

How long does it take to finally kill off one of the mainstays of 1970s-era records storage technology and in the process double the size of a Federal regulatory agency’s primary records database while vastly improving the accessibility and usefulness of the data contained within it?

If you ask David Nelson, chief information officer (CIO) at the Nuclear Regulatory Commission (NRC), the answer is just shy of four years – and with enormous benefits to the agency, its staff, and the public set to roll out for years to come.

Nelson sat down with MeriTalk last month to take us through NRC’s years-long effort to expand and reimagine the ADAMS (Agencywide Documents Access and Management System) library that forms the backbone of how NRC gathers, stores, and makes available crucial data about how the regulator oversees all things nuclear – from power plants to medical and construction applications.

In the following interview Q&A – edited for length – Nelson details the project’s challenges and how solutions have relied on creative deployments of up-to-date technologies. Here are a few thumbnails:

Digitizing 43 million pages of microfiche records;
Employing computer-visioning technology to create value from old images;
Training the technology to work better for the project’s needs;
Tapping cloud platforms for crucial portions of the work;
Carefully approaching the use of AI technologies for specific aspects of the work; and
Working through multiple iterations of new user interfaces to optimize the value of the expanded library for NRC staff, and soon for the public.

Here’s the full story…

MeriTalk: David, NRC is getting close to the culmination of a very large project to upgrade the content and usability of its Agencywide Documents Access and Management System (ADAMS), which serves as NRC’s official recordkeeping system and provides access to document libraries both for agency and public use. Please take us back a few years, or even decades, to the beginning of the story.

Nelson: ADAMS was developed and designed in 1999. We were looking for a way to better manage our records, which is of course very important to a regulator.

We have to maintain records when it comes to licensing actions, decisions that are made around those licensing actions, and also the types of things that our inspectors do in our regulatory work and oversight activity to make sure licensees are operating in a safe way. So, maintaining those records is very important to us, and what was being done before that was mostly on paper, so ADAMS was our way of moving to a much more electronic way of managing those records.

It’s basically an on-premise content management system. We are able to declare which documents are available for public access, and which documents have to remain restricted because of sensitive information and have to remain somewhat more secure. This entire system was built on premise and has been operated in that way since then.

MeriTalk: Technology has come a long way since 1999, but journalists of a certain age remember when records were the province of microfiche storage … when was the turning point for NRC to finally move past that era?

Nelson: 2019 was a turning point, that’s when we began our journey to really try to make this kind of information more accessible – particularly to support our mission and make sure that we have access to all of that rich, older data about licensing decisions, why they were made at the time, and how does that differ from today.

That information when it sat in cabinets and microfiche was not really very accessible. People would have to set up an appointment, go look in the cabinets and try to find the one thing that they were looking for, and while we had a filing system, of course, it was very difficult.

We had a filing system to manage the records and microfiche too, but you know, it’s still very, very difficult to find exactly what you’re looking for in that format. To add to that, these microfiche records were from 1974 to 1999, there were quite a few of them, and the quality of the microfiche could vary greatly as well; some were blurred, some covered with stamps.

MeriTalk: How does one even begin making those into digital-format records?

Nelson: We had to figure out how to create something that’s accessible, machine-readable, and how to bring that into the more contemporary ADAMS library. We worked with some vendors, and landed on a solution that was novel to us, but we had really good success with it.

What this particular technology did is it used computer visioning – a number of different ways of looking at the data as it was trying to extract it into machine-readable, OCR [optical character recognition] stuff. We had documents we pulled out of there that we really couldn’t read before, and we ended up with something that looked very much like something that was typed on a word processor yesterday.

So not only did we make it more accessible, but we recreated it as much more accurate and accessible documents for the agency and regulators to use.

MeriTalk: And within that process is technology to extract data from specific data fields, and the like?

Nelson: It’s the starting point. We were a forms-based organization, and you start with that because you’re expecting certain information in certain sections of a form. It’s partly people training the technology, but you can get fine-tuned to the point where you are picking up the expected information from those fields pretty accurately and pretty quickly.

But what’s really strange is when things that are unexpected show up in those fields, and then teaching your technology to really understand what that is. I mentioned stamps before and believe it or not stamps were useful back in those days, but a lot of times they were never quite in the correct field. But through some of the new technology you can pull the information out of that and create machine-readable information. That used to be nearly impossible because the stamp was treated like an image, which is not something that can be interpreted with OCR.
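The field-based extraction Nelson describes can be illustrated with a minimal sketch. It assumes an OCR step has already produced words with bounding boxes, and it is not NRC's or any vendor's actual method. The zone coordinates, field names, and sample words are all hypothetical. Words that fall outside every expected field, such as a stamp, are set aside for separate handling rather than discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


# Hypothetical form template: where each field is expected on the page.
FORM_ZONES = {
    "docket_number": Box(0, 0, 200, 40),
    "licensee": Box(0, 50, 400, 90),
}


def extract_fields(words, zones):
    """Assign each OCR word to the zone containing its center point.

    Returns (fields, unassigned): the joined text per field, plus the
    words that landed outside every expected field.
    """
    fields = {name: [] for name in zones}
    unassigned = []
    for word in words:
        cx = (word.box.left + word.box.right) / 2
        cy = (word.box.top + word.box.bottom) / 2
        for name, zone in zones.items():
            if zone.contains(cx, cy):
                fields[name].append(word.text)
                break
        else:
            # No zone matched: keep the word for review (e.g. a stamp).
            unassigned.append(word.text)
    joined = {name: " ".join(parts) for name, parts in fields.items()}
    return joined, unassigned


# Made-up sample page: two fields filled in, plus a stamp elsewhere.
words = [
    Word("50-123", Box(10, 10, 80, 30)),
    Word("Example", Box(10, 55, 90, 85)),
    Word("Plant", Box(100, 55, 160, 85)),
    Word("RECEIVED", Box(300, 200, 420, 240)),
]
fields, unassigned = extract_fields(words, FORM_ZONES)
print(fields)
print(unassigned)
```

Keeping the unmatched words in their own list reflects the point about unexpected content: text that is not where the form says it should be still gets captured as machine-readable text.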

MeriTalk: So how has that microfiche conversion effort gone?

Nelson: We’ve actually completed that entire set. We were able to do that in just short of two years. Most of that was done during the pandemic and by a small group of people in an area that was secured for health and safety reasons. We were able to complete the work because we knew it was important, and we had some dedicated people that were willing to come into the building.

MeriTalk: How many pieces of microfiche did you have to plow your way through?

Nelson: We ended up adding about 43 million pages to our repository, and that was from about two million documents. This essentially doubled the size of our ADAMS library. This information is not just for our internal people; a lot of these were public documents, so the project also benefits all of our public stakeholders.

MeriTalk: A lot of people might associate the Nuclear Regulatory Commission with nuclear plants and cooling towers, can you give us an idea of the scope of the NRC’s activities beyond just plants?

Nelson: We regulate the civilian use of nuclear materials. Those materials are extensively used in medicine, they are used in construction, in a lot of different tools. Some of the licensees are very small, they may be a construction company, for instance. We also regulate the storage of those materials, the transport of them, and the long-term storage of spent reactor fuel. It’s much more than regulating nuclear plants.

MeriTalk: So, after the couple years of effort to digitize all of the paper and fiche-based data, how is artificial intelligence playing a role in the ADAMS upgrade project?

Nelson: There was some AI involved in the tools we were using for the digitization, and some machine learning as well.

But regarding AI, the challenge is what to do after we have all these rich documents available. We pulled that off, but now how do you access it, how do you find what you want? It was already hard to find things that you wanted in there because the older record system and the content management systems were set up in such a way that was more like walking into a library set up on the Dewey Decimal System.

It’s all about record retention schedules, how long they need to be in there, how do we manage those, how do we move them around, how do we make sure that records are available – more than how do we use this to support the mission and how can we help people with decision-making using data? That was still a challenge, and in some ways even a greater challenge because we had a larger library.

We were really struggling to try to figure out how do we do that, and we tried a number of different things that just didn’t quite work out. We tried to use some AI tools that we had, but when we were approaching it in that direction, what we found is we were trying to respond to a specific question and trying to pull data out of unstructured data, and that became our structured data set that was appropriate for answering a single question. It was a very long process.

MeriTalk: How did you move forward from there?

Nelson: We worked with one of our vendors, a cloud service provider – I won’t mention the name but one of the major FedRAMP-approved cloud providers. We shared the challenge and talked to them about what kinds of cognitive tools are available to work with large data sets like this and create search engines that are much more powerful and quicker and work for the end-user.

We set up a pilot, and we crawled all of the data that we had in our content management system, pulled it all out, read it, and created what we call up in the cloud a text blob. What we were able to do with that is go through it and index all of that data and that metadata that’s there with it and create an index of the entire group of text in a very consumable format sitting up in the cloud.

Then we worked with our vendor and my own development team at NRC to create a user interface, basically a search engine, using the kinds of search engine technologies we use every day, they are very powerful. We created our own interface, and worked with our user community that includes some of our heaviest users, but also some of our users that really didn’t use the old system much because it was just so difficult to use that they never could find anything.

We put a heavy emphasis on user design and tried to develop a very simple interface – the first line on that interface looks just like the kinds featured on commercial search engines. So, you can put a text string in there, and you don’t have to look anything up by docket number, record number, the name of a plant, or any of that stuff.

And because we do have some very advanced users, you could break that down and do different kinds of filtering to get to the information you needed. We went through several iterations of that, but even with our initial tests we were able to find results in literally seconds. It was just amazing to us.

That started with only a section of our ADAMS library, so we moved the rest of it up in there and created an operation around it so that that text blob is updated very frequently – I think now we update it every 15 minutes. So, as we’re adding new things into our library, they go up into the cloud, they get indexed, and added to that area that we can actually do the right searches. Once you find what you’re looking for, you can go directly to the document that remains onsite in our content management system.
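The decoupled search described in this answer (crawl the content system, index text and metadata separately, refresh the index on a schedule, and return a pointer back to the stored document) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the cloud service NRC uses. The class name, document IDs, metadata keys, and sample text are all hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class SearchIndex:
    """Toy inverted index kept apart from the system that stores documents."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> document IDs
        self.terms_by_doc = {}            # document ID -> its terms
        self.metadata = {}                # document ID -> metadata dict

    def add(self, doc_id, text, metadata):
        """Index a document, replacing any earlier version of it."""
        self.remove(doc_id)
        terms = set(tokenize(text))
        for term in terms:
            self.postings[term].add(doc_id)
        self.terms_by_doc[doc_id] = terms
        self.metadata[doc_id] = metadata

    def remove(self, doc_id):
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
        self.metadata.pop(doc_id, None)

    def search(self, query, **filters):
        """Return sorted IDs of documents containing every query term.

        Optional keyword filters must match the metadata exactly.
        The IDs point back to the documents in the system of record.
        """
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(
            doc_id
            for doc_id in hits
            if all(self.metadata[doc_id].get(k) == v for k, v in filters.items())
        )


def refresh(index, changed_documents):
    """Apply one batch of new or changed documents, as a scheduled job might."""
    for doc_id, text, metadata in changed_documents:
        index.add(doc_id, text, metadata)


index = SearchIndex()
refresh(index, [
    ("doc-1", "Inspection report for reactor coolant pump", {"public": True}),
    ("doc-2", "License amendment request: coolant system", {"public": False}),
])
first_results = index.search("coolant")

# A later refresh cycle picks up a newly added document.
refresh(index, [("doc-3", "Coolant pump maintenance memo", {"public": True})])
later_results = index.search("coolant pump")
public_results = index.search("coolant", public=True)
print(first_results, later_results, public_results)
```

The free-text query stands in for the single search box, and the keyword filters stand in for the narrowing options offered to advanced users. A production system would add ranking, phrase matching, and access controls that this sketch leaves out.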

MeriTalk: One question about the user groups – are those the internal NRC people who are heavy users, or does it include users outside of the agency who are probably hitting that data heavily as well?

Nelson: We have not done that yet, but we do plan to do that in the future. The searching functionality was developed for our internal teams, but we do plan to improve the search for the public as well. The strangest thing about our public library is it gets crawled by all the commercial browsers anyway, so many times our public users are actually using those commercial search engines to find what they’re looking for.

MeriTalk: And it’s immeasurably easier since the microfiche conversion…

Nelson: Yes, there’s a lot more information in there for them to find. But those commercial search engines don’t have access to the other part of our ADAMS library, and that’s why we had to start with improving the search there.

In doing that, we were very much in an agile environment with human-centered design, so we went through a lot of iterations with that user group to make sure before we launched, we could improve as much as we could.

MeriTalk: How long did that process take to go through all of those iterations, is it months or years, and is it still continuing?

Nelson: I think the whole process from our original pilot through actually launching it publicly was probably just over six months. It wasn’t a long process. We went through less than 10 iterations of the human interface – probably around six or seven.

MeriTalk: So, after all of this work, is it accurate to say that the ADAMS library is still an on-prem system that uses cloud-based tools?

Nelson: It’s a great question. I think it’s a hybrid system at this point.

We’ve been struggling with the need to re-platform that particular system because that’s a very large system, and it goes into a lot of different parts of our work processes as well. We tried building search within the tools of that system, and we could never get anywhere. But by decoupling the search – actually taking the data and putting it into a different format that’s indexed and more accessible – the system no longer just sits in one place, to me the system is sitting in the cloud and sitting on-prem. But the basic content manager is still sitting on-prem.

When you have a large system like that, I think the approach of breaking off pieces of it that will solve the challenges first is a much better way than saying let’s just re-platform this thing, let’s figure out how we move everything for the purpose of moving to a different technology. That, to me, is not a good purpose. The purpose has to be to solve challenges that the agency has.

MeriTalk: Is there a vision down the road where you no longer need the on-prem element, and you just go to the cloud?

Nelson: We have a Cloud Smart strategy, like all the agencies do at this point in time, and we are moving as much as we possibly can to the cloud. But we would like to do that in a smart way. We’re refactoring and rethinking how that system should be working, and how do you take advantage of that migration to improve your business processes at the same time and simplify them. That’s the important thing.

So yes, everything’s on the table. We certainly are still looking at eventually how do we get to that state where most of what we have is up in the cloud. We’re actually moving pretty quickly.

MeriTalk: It sounds like you’ve done a lot to improve user experience, which is very high up in the President’s Management Agenda. Are there any next steps planned for user experience improvement?

Nelson: Yes, there are. One example has to do with AI. For the cloud service providers that work with the government, you’ve probably seen that all of them are looking to actually embed some of the generative AI-type tools into the products that they’re offering. That’s scary when you think about it, and how that impacts us is certainly a conversation we’re having at the CIO Council.

Some of the agencies are doing a really good job of trying to figure out how to add even better metadata to the data that is public in there so that when these generative AI tools are out there, that they’ll see that data and see that as source data and important data, and use that versus everything else that’s out in the wild.

What I think is really interesting when we talk to our vendors is how can we use those kinds of tools on a very specific set of data – our data – so it’s embedded in the product, and we have an interface with one of these types of AI tools. I think it really simplifies a lot of the kind of work that you have to do with heavy data scientists that are really trying to dig into the data to find different insights.

I think it gives an interface to people that are much more program-oriented or mission-oriented than data science-oriented to really work with that generative AI and ask the questions and pull new insights out of this heavily indexed data that we have right now. Those products should be available in the next year.

So, I really think that, beyond just having a very effective search engine, we’re now able to move around and find data, look at new facets across all that data and all the work that the agency has done since 1974, all the decisions that have been made, and combine different things in ways that will be much simplified with those kinds of interfaces.

MeriTalk: Any thoughts on what agencies ought to be doing to get ready for more AI use in general?

Nelson: I think all of us have to try – especially Federal agencies – to try to figure out how to set ourselves up for that so that people are using these tools appropriately and using our data appropriately – and not necessarily mixing it with other pieces of data that can be generated by anybody.

MeriTalk: Back to the ADAMS improvements, are there some lessons to be shared?

Nelson: I’ve had conversations about it with my peers on the Federal CIO Council. I’m also the chief data officer [CDO] at NRC, so with some of my CDO colleagues as well.

I’m hoping to do something similar to this interview we are doing now with the CIO Council, now that we have it up and running right. I want to be able to share this because I think it is a challenge, and not just at NRC. Federal agencies have a lot of older records, and also there is OMB M-19-21, which is all about digitizing records and managing those digital records.

So, it’s both ends of this – it’s new ways to do that digitization that give you a higher quality product, but then it’s also about how it can be used more efficiently within the agency. I want to share that story.

MeriTalk: The project has come a long way in four years, do you have one good piece of advice for your peers about taking on a big project like that?

Nelson: I’m always a believer that you’re not looking at the next shiny technology, you’re looking at the problem in front of you, and then working with others. You have to have awareness of what those technologies are, but I don’t think you want to go find a use for technology. Rather, you need to find the problem, and then use different approaches to solve those problems.

The other thing is to break down your problem. The answer is not necessarily that this isn’t working, so I have to move to a new system. Find what isn’t working, and then a way to decouple that from what other system you are working on, and then find something that can actually work with it and provide a much better result for you.

MeriTalk: A little bit more on the personal side, can you tell us a little bit about your technology path? Has it always been a natural interest, or something you acquired along the way?

Nelson: I’ve been heavily involved in tech for my entire career. I started in the Air Force in technology and telecoms, and then spent 20 years in the private sector, working in telecoms, satellites, software development, broadband over power lines, but always within technology organizations. And then I rejoined Federal service about 20 years ago. I started with Centers for Medicare and Medicaid, and worked several very large technology programs. I was the CIO there when I left. And then for the past seven years here at the Nuclear Regulatory Commission, so it’s always been technology driven.

MeriTalk: Last question – what do you enjoy doing in your “real” life that has little to do with technology?

Nelson: My wife and I truly enjoy getting out on the Chesapeake Bay and boating almost every weekend we can.
