Data Access, Sharing Vital to AI Success, Federal Officials Say

More freedom to share data is vital to successfully developing AI technologies that meet the needs of Federal agencies, government officials said today.

Damian Kostiuk, deputy chief data officer in the Office of Information Technology (OIT) at the Department of Homeland Security’s U.S. Citizenship and Immigration Services (USCIS) component, talked about the need to share data more freely during a discussion at ATARC’s Artificial Intelligence and Data Analytics Breakfast Summit on April 6.

“I think a lot of speakers may talk about how people for security [reasons] want to shield and isolate data all over the place and not share it,” Kostiuk said. “That is inherently just not going to facilitate the role of AI – you need to have data sharing.”

Kostiuk also called for a centralized data-sharing location that would enable the creation of robust AI tools to help agencies cut down on time spent on mundane tasks.

Kostiuk said he could think of “at least three right off the top of my head” situations within his offices where the discovery of “a beautiful thing” had “saved hundreds of thousands of hours of man time in working products.”

Amanda Mitchell, IT specialist in AI/ML integration and transformation at DHS, agreed that AI and machine learning (ML) technology has the potential to save Federal agency personnel time.

She described how USCIS uses a digital evidence classifier to sift through hundreds of thousands of artifacts in search of evidence, and how the agency developed ML technology that saved its personnel millions of “page scrolls” in that process.

Kostiuk discussed the need for government to adopt its own private AI tools in order to avoid some of the issues that have emerged around public AI tools like ChatGPT.

“You’re going to need your own private version of it” in order to “control the source of the content that’s in there,” he said. With that approach, he added, the output is “not going to be toxic, it’s not going to be racist or … sexist nonsense. You’re going to trust the provenance of it.”

Data quality also remains critically important in building AI technology, Mitchell said.

“Do we even have the data, or can you even create synthetic data, that will properly represent the question I’m trying to answer before I even buy anything from anybody?” she asked. “I can guarantee you most of the time, not really … AI is only as strong as the quality and accuracy of the data it evaluates.”