AI Coding Tools in State Government Engineering

I spent a few months working with the engineering department at an Oregon state agency, helping their developers adopt GitHub Copilot as their approved AI coding tool across the legacy environment they operate in. The experience looked nothing like the demos you see online.

No one was vibe coding. Nobody built an app in 20 minutes. What happened instead was slower, messier, and more useful than anyone expected, including me and the team. We saw some common themes that apply to other organizations sitting on legacy code and wondering whether AI tools are worth the trouble.

Here is what we did together.

The short version: AI coding tools became useful only after we stopped treating adoption like a tool rollout and started treating it like workflow integration. The work was less about clever prompts and more about environments, repos, security expectations, review habits, reusable context, and engineers staying in charge of judgment.

That is the same adoption problem I see in a lot of government AI work. Training helps, but AI training by itself is not enough when teams are trying to change how work happens inside real operating constraints.

It is also the same standard a useful pilot should meet. The goal is not tool exposure. The goal is workflow change that can hold under real conditions.

Stages Of AI Engineering Maturity

Most AI tool rollouts follow the same script: buy licenses, schedule training, send a follow-up email saying "let us know if you have questions." Then nothing happens, and folks mostly migrate back to their old ways rather than developing a new way of working and adjusting team process.

What worked with this team was splitting the effort into two distinct phases. The first phase was getting people set up to use the tool, and that took longer and was more complex than anyone expected. The second phase was building repeatable patterns inside each developer's actual daily work.

Skipping the first phase is a major reason adoption and rollout stall. Not spending enough time on the second phase is why people who do use the tools often end up just scratching the surface of what is possible.

Phase 1: Get The Environment Working

Before anyone could evaluate whether Copilot was useful, we had to deal with the government engineering environment.

The team had developers on Visual Studio 2019, Visual Studio 2022, and VS Code. Some repos lived in GitHub, others in Azure DevOps Git, and some were still in TFVC (Team Foundation Version Control), Microsoft's older centralized version control system, which does not support modern AI workflows cleanly.

The team was also broken into different specialty subteams, with individual setups across their tools and computers just to get their repos working. One size did not fit all. We had to build different plans and requirements across subgroups based on the IDEs, repo locations, source control systems, access patterns, and security expectations they were working within.

At this agency, setting up the Copilot environment was not a half-day task. It took weeks to sort out which developers could run Copilot, which repos were compatible, and which systems would need modernization before AI tools could touch them in a meaningful way. A couple of weeks into the work, I was still meeting with people who were planning deeper use cases but did not yet have access fully working.

This is the ugly part that you do not get to see in most conversations about AI coding tools. The hard part was not just logging into the tool. It was creating new team policies, reviewing readiness across tools, understanding differences across repos, mapping modernization needs and legacy unsupported systems and libraries, and setting up standards around security expectations.

A good many repos were not ready for current AI coding tools without modernization. Even the mid-aged and newer repos raised questions about where context, prompt, and instruction documents should live, who could edit them, how to use them, and how the team would support and own updates to those documents.

The team also had to agree on expectations for how much AI assistance was acceptable, when and how to introduce new workflows to the team, and where human review and judgment were required. Hint: all the time.

These were not small details or small conversations. They were the operating decisions that form a new way of working and determine whether the tools feel safe and practical enough for the team to keep using them, or whether the team would prefer to keep doing things the old way.

If your team is still in this stage, the Government AI Workflow Integration Checklist is a useful way to pressure-test governance, repo readiness, validation, and adoption support before you push for broader use.

Phase 2: Build Repeatable Patterns Into Daily Work

Once developers had working environments, the temptation was to say, "great, go explore." That works for basic surface-level adoption, but it does not show them what is possible with the tool.

Telling a developer who has been writing C# against a 15-year-old codebase to "just try Copilot" is like handing someone a chainsaw and saying, "just cut something." They might cut a piece of paper or trim a few leaves off a tree and then tell you the old tools, scissors or loppers, are just fine.

Working more deeply with AI and building it into the workflow is more like showing them that the chainsaw can cut down a tree, then showing them how to do it safely so the tree does not fall on their house.

What worked was sitting down with individual developers, understanding what they were actually working on, and finding a specific task where an AI tool could help, then building it out together.

Then we shared those examples with the rest of the team. That mattered because the team could see how their peers were using AI in practical and new ways, in their environment, with the same issues and constraints.

The common thread was that engineers stayed in charge, used AI on real problems they were facing in their work, and helped the rest of the team see new possibilities.

Here are five use cases that came directly from that work:

automated test generation
legacy refactoring
reusable scripts for local server operations
log analysis with Splunk
repo-level instructions as a team change

Those examples also shaped the broader list of AI use cases I would put near the top for state and local government teams.

1. Automated Test Generation

One developer worked on a repository that had almost no test coverage. The team had talked about building out a test framework for a while, but nobody had the time to figure out how to start.

We built an automated unit test prompt and supporting testing context documents that could generate tests against the existing codebase, scaffold the test framework, use real data structure examples, and produce initial test cases.
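To make the idea of a reusable test prompt concrete, here is a minimal sketch of how a prompt and its supporting context documents might be assembled before being handed to an AI tool. The function name, the prompt wording, and the example file name are all hypothetical; this is an illustration of the pattern, not the prompt the team used.

```python
def build_test_prompt(source_name, source_code, context_docs):
    """Assemble a test-generation prompt from code and context documents.

    context_docs maps a document title to its text, for example testing
    conventions or real data structure examples. All names are illustrative.
    """
    sections = [
        "You are generating unit tests for existing code.",
        "Do not change the code under test. "
        "Only add tests that describe its current behavior.",
    ]
    # Context documents come first so the code is read in light of them.
    for title, body in context_docs.items():
        sections.append(f"### Context: {title}\n{body.strip()}")
    sections.append(f"### Code under test: {source_name}\n{source_code.strip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = build_test_prompt(
        "Invoice.cs",  # hypothetical file name
        "class Invoice { }",
        {"Testing conventions": "Use one assertion per test."},
    )
    print(prompt)
```

Keeping the context documents separate from the prompt text is what makes the process repeatable: the same prompt can be pointed at a different project by swapping the documents.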

The developer went from "we should really have tests" to actually having them, largely generated by AI, while also building a repeatable process that could work on other projects.

For a team carrying brittle legacy code, getting test coverage in place is one of the highest-value things you can do. Using AI for this is a low-risk, high-reward use case to pilot because it helps the team build safety around future changes without asking AI to own the final engineering judgment.

This is also why legacy workflow integration matters more than broad tool enthusiasm. The value came from applying the tool to a reviewable workflow the team already cared about.

2. Legacy Refactoring

The legacy team needed to migrate from old, no-longer-supported library methods to a newer replacement.

This is the kind of work that usually takes weeks of reading through documentation for both libraries, trying to understand the code that is already there and how it works, manually mapping old methods to new equivalents, and then testing each substitution.

With AI, the developer imported the legacy code alongside the new library's documentation. The model correctly explained the logic in the current codebase, mapped the older methods to the new ones, and identified where the replacements needed minor adjustment.

What might have taken two to four weeks of research, troubleshooting, and decision-making was compressed into about one week of focused work.

The developer's knowledge still mattered, especially in how they used the AI tool to approach the problem. They used it to explain the existing logic, identify the narrow problem they needed to solve, bring in the appropriate context, map the solution, and then verify the output.

The AI handled a lot of the detailed cross-referencing that would have consumed most of that time. The biggest remaining step for this developer was manual testing because the older system did not have integrated automated tests, which circles back to use case #1.
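Once a mapping from old methods to new ones exists, a small script can list every call site that still needs attention, which gives the developer a checklist to verify by hand. The sketch below assumes a hypothetical mapping; the method names are invented for illustration, and the real ones would come from the two libraries' documentation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical mapping from retired methods to their replacements.
METHOD_MAP = {
    "OldClient.Fetch": "NewClient.GetAsync",
    "OldClient.Push": "NewClient.SendAsync",
}


@dataclass(frozen=True)
class CallSite:
    line_number: int
    old_method: str
    suggested: str
    text: str


def find_call_sites(source: str, method_map: dict[str, str]) -> list[CallSite]:
    """Return every line that calls a method listed in method_map."""
    sites = []
    for number, line in enumerate(source.splitlines(), start=1):
        for old, new in method_map.items():
            # Match the method name followed by an opening parenthesis.
            if re.search(re.escape(old) + r"\s*\(", line):
                sites.append(CallSite(number, old, new, line.strip()))
    return sites


if __name__ == "__main__":
    sample = "var a = OldClient.Fetch(id);\nvar b = 1;\nOldClient.Push(a);\n"
    for site in find_call_sites(sample, METHOD_MAP):
        print(f"line {site.line_number}: {site.old_method} -> {site.suggested}")
```

The script only reports candidates. Deciding whether each replacement is a straight swap or needs adjustment stays with the developer.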

If you are choosing where to start, this is why the first repo matters. The Legacy Repo AI Pilot Selection Guide is built for that decision: which repo has enough business value, reviewability, ownership, and repeatability to make a good first pilot.

3. Reusable Scripts For Local Server Operations

Before AI, writing a script for every repeated task could be annoying enough that people just kept doing the task by hand. That is no longer the case, and it is one reason I have seen people add a new folder to their projects just for repeatable scripts.

One developer was working on a project where changes involved restarting local servers through multiple steps across different repos. Done manually, the process was easy to get wrong. Miss a step, and the process needed to start over. The team used this process multiple times a day.

We used AI to build a script that automated the full restart sequence. Once it was in place, the developer could run the script or have an LLM run it automatically. That reduced human error in the restart process and freed up the developer to work on something else while the 15-minute restart ran.
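A restart script like this can be very small. The sketch below shows one possible shape, assuming the sequence can be expressed as an ordered list of commands. The step descriptions and commands are placeholders, not the team's actual restart steps.

```python
import subprocess
import sys

# Placeholder steps: (description, command). Each command here just prints
# a message; a real script would call the actual stop, build, and start
# commands for each repo.
STEPS = [
    ("stop local servers", [sys.executable, "-c", "print('stopping')"]),
    ("rebuild dependencies", [sys.executable, "-c", "print('building')"]),
    ("start local servers", [sys.executable, "-c", "print('starting')"]),
]


def run_steps(steps, run=subprocess.run):
    """Run each step in order and stop at the first failure.

    Returns the descriptions of the steps that completed. The run
    parameter exists so the sequence can be tested without side effects.
    """
    completed = []
    for description, command in steps:
        result = run(command)
        if result.returncode != 0:
            # Failing loudly is the point: a missed step should never
            # be silently skipped.
            raise RuntimeError(f"step failed: {description}")
        completed.append(description)
    return completed
```

Calling `run_steps(STEPS)` runs the whole sequence. Because the order lives in one list, fixing the process means editing the list once rather than retraining everyone on the manual steps.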

This is a strong example of using AI to build tools that are specific to your environment and help improve repeated processes. It is now much easier to write a useful script in a language or toolchain you do not use every day.

That matters because scripts can do work that plain LLM chats cannot do as reliably. They reduce the need to keep asking the model for the same thing and getting an inconsistent or incorrect response.

The greater pattern is to use AI to build tools that it can use, or that you can run, to save time, reduce manual error, and turn a fragile process into a reusable tool.

4. Log Analysis With Splunk

That same developer also worked with us on a pattern for reviewing Splunk logs. We gathered structured log data, then provided it to the LLM to identify errors and bugs faster than manual review.

This is a good example of a broader practice: gather structured data from your existing tools, bring it to the AI, and let it help you find what matters.

The anti-pattern is asking the LLM to find the data itself when you already have a reliable tool that can do the job. If a tool can retrieve the right information, it is usually better to give the AI access to that tool or that information, like the Splunk logs in this case, rather than asking the model to figure it out from scratch.
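One way to apply this is to reduce exported log events to a short summary before the model sees them. The sketch below assumes the events have been exported as newline-delimited JSON, and the `level`, `source`, and `message` field names are hypothetical; real exports will use whatever fields your logging setup defines.

```python
import json
from collections import Counter


def summarize_errors(json_lines, top=5):
    """Count error events by (source, message) in newline-delimited JSON.

    Returns the most frequent pairs first, so the summary handed to the
    model starts with the errors that occur most often.
    """
    counts = Counter()
    for raw in json_lines.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines in the export
        event = json.loads(raw)
        if event.get("level") == "ERROR":
            key = (event.get("source", "unknown"), event.get("message", ""))
            counts[key] += 1
    return counts.most_common(top)
```

The retrieval and counting are deterministic work that existing tools do reliably. The model is then asked only to interpret a compact, structured summary.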

You see this same pattern in CLIs, MCPs, and skills that give LLMs structured tools they can use appropriately. This was another use case many people on the team had not considered, and it opened up a new mental map for ways they could work with AI.

5. Repo-Level Instructions As A Team Change

For those unfamiliar, Copilot supports an instructions file that loads as context at the start of every chat session. We built one that told Copilot: you are working in a legacy C# application, these are the common patterns, and these are the known issues to watch for.

This became a template that needed to be customized, but it could quickly improve the output of Copilot for all members of the team once adopted as a standard practice.

In some legacy repos, however, adding new files is not as simple as dropping a markdown document into the repo and moving on. The team may need to decide where instructions should live, who can edit them, how they are reviewed, and whether the repo can support that kind of change cleanly.

It is the human part that makes this hard: making decisions, changing the way people work, and agreeing on how the team will own the new practice.

We worked to help the team outline standards and governance for adding shared instructions or other AI-focused configuration files to the repo context. Leads on the team had to help develop the instructions, test them, show the value, and then bring the team along on a new way of working.

Repo instructions are very useful, but they are also a new management problem because they touch team norms, repo governance, review ownership, and who is allowed to change the files that guide AI output.
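Part of that governance can be checked automatically. As a rough sketch, a team could verify that each repo's instructions file exists and still contains the sections they agreed on. The path below is the location GitHub Copilot uses for repository-wide custom instructions; the required section headings are hypothetical and stand in for whatever standard a team settles on.

```python
from pathlib import Path

# Assumed location: Copilot's repository-wide custom instructions file.
INSTRUCTIONS_PATH = Path(".github/copilot-instructions.md")

# Hypothetical headings a team might require in every instructions file.
REQUIRED_SECTIONS = [
    "## Application context",
    "## Common patterns",
    "## Known issues",
]


def missing_sections(text, required=REQUIRED_SECTIONS):
    """Return the required headings that do not appear as a line in text."""
    lines = {line.strip() for line in text.splitlines()}
    return [section for section in required if section not in lines]


def check_repo(repo_root):
    """Return a list of problems with the repo's instructions file."""
    path = Path(repo_root) / INSTRUCTIONS_PATH
    if not path.is_file():
        return [f"missing file: {INSTRUCTIONS_PATH.as_posix()}"]
    text = path.read_text(encoding="utf-8")
    return [f"missing section: {section}" for section in missing_sections(text)]
```

A check like this does not decide what the instructions should say. It only makes drift visible, which leaves the review and ownership decisions with the team leads.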

The pattern across all five use cases was consistent: do not just show people the tool. Help them build something specific to their work that they can reuse. A test framework. A migration workflow. A server script. A log analysis pattern. An instruction file.

Improving daily work and showing engineers how to get to higher-value use cases is what makes them curious. That curiosity leads to better solutions over time.

What I Wasn't Expecting

I went in thinking I would spend most of my time on prompting techniques and workflow design. Instead, the majority of early effort was environment assumptions, access issues, and figuring out which systems or projects were compatible at all.

The biggest gains came from simple things the team does every day: explaining legacy code, refactoring old confusing systems, and adding small process steps to the way they had already been working. Work that could have stretched across weeks or months was compressed into a single week. But getting there required expertise from the people doing the work to frame the problem and validate the results.

One developer's small win could easily become a win for the whole team. The developer who built the local server restart script, the log analysis pattern, and the Copilot instruction file generated value far beyond their own work, allowing the team to learn, understand, and adapt their own daily work.

Just as important was creating dedicated time to share what was working so the team could improve its learning and shared understanding.

How We Measured Progress

We kept measurement simple: surveys and direct feedback from each developer.

Across the board, adoption went from some developers not using AI at all to every participant using AI tools daily within the engagement period.

But the more honest metric is confidence. Developers went from "I'm not sure this is useful for what I do" to "I have specific workflows where this saves me real time."

That shift came from the hands-on work of building patterns together, reinforcing the training, and working in their actual codebase on real problems they were facing.

What I Would Tell A Government IT Leader

Do not start with the AI tool. Start with your environment.

Which IDEs are your developers on? Which version control systems? Are repos accessible with modern tooling? How much of your codebase can an AI tool actually read? Which systems are too sensitive, too old, or too hard to validate? Answer those questions first, because the answers will shape everything that follows.
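Those questions can be captured as a simple inventory so the answers are written down and comparable across repos. The sketch below is one hypothetical way to do that. The fields, categories, and ordering of the rules are assumptions for illustration, and each agency would set its own.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    source_control: str  # "github", "azure-devops-git", or "tfvc"
    has_tests: bool
    sensitive: bool


def readiness(repo):
    """Classify a repo using illustrative rules, most restrictive first."""
    if repo.sensitive:
        return "needs security review"
    if repo.source_control == "tfvc":
        return "needs modernization"
    if not repo.has_tests:
        return "pilot with added validation"
    return "ready to pilot"


if __name__ == "__main__":
    # Hypothetical repos, not the agency's actual systems.
    inventory = [
        Repo("billing", "tfvc", has_tests=False, sensitive=False),
        Repo("portal", "github", has_tests=True, sensitive=False),
    ]
    for repo in inventory:
        print(f"{repo.name}: {readiness(repo)}")
```

Even a rough classification like this makes it easier to see which repos can start now and which ones need other work first.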

Then resist the urge to run one big training and walk away. Pick two or three developers who are willing to experiment, sit with them on real work, and help them build something repeatable: a test framework, a small refactoring workflow, an operational script, a log review pattern, or a repo instruction file. Pick something specific to their job that they are actually working on.

Then share what works across teams. The developers on this team had different stacks, different problems, and different comfort levels with AI. But the patterns that emerged worked across teams once someone took the time to show how they applied the tools, shared their knowledge, and captured lessons learned.

That is the bridge many rollouts miss. For the broader rollout pattern, see Why Most Government AI Rollouts Fail After the Pilot.

Unlike a lot of private-sector AI narratives, this workflow is not about replacing developers. In government work, the purpose is public good: making systems better, safer, and stronger for the people who depend on them. The people on this team are carrying heavy workloads with years of technical debt, complex legacy systems, and demanding refactoring projects that are 20 years overdue.

AI tools cannot clear the backlog overnight, but they can make the work faster and less tedious, which matters a lot when you are a small team maintaining systems that a state government depends on.

That is what actually happened, and honestly it was way more interesting than vibe coding another to-do app. Just patient, specific work that made a real team's day-to-day work a little easier.
