
Tech Works: When Should Engineers Use Generative AI?

Create a generative AI policy that helps guide engineers in what to do — and not to do — with ChatGPT, Copilot or any other chatbot.
Sep 22nd, 2023 5:00am by Jennifer Riggins
Image by Diana Gonçalves Osterfeld. 
Tech Works is a monthly column by longtime New Stack contributor Jennifer Riggins that explores workplace conditions, management ideas, career development and the tech job market as it affects the people who build and run the software the world relies on. We welcome your feedback and ideas for future columns.

Your developers are already playing around with generative AI. You can’t stop them completely and you probably don’t want to, lest they fall behind the curve. After all, you want your developers focusing on meaningful work, and Large Language Model (LLM)-trained code-completion tools like Amazon Web Services’ CodeWhisperer and GitHub’s Copilot have great potential to increase developer productivity.

But, if you don’t have a generative AI policy in place, you’re putting your organization at risk, potentially harming your code quality and reliability.

ChatGPT’s code is inaccurate more often than not, according to an August study by Purdue University researchers. Yet more than 80% of Fortune 500 companies have accounts on it. You could also be putting your reputation on the line. Just look at Samsung, where an engineer accidentally leaked sensitive internal source code into ChatGPT, sparking a blanket ban on generative AI assistants across the company. That’s probably a reasonable short-term response, but it lacks long-term vision.

To take advantage of this productivity potential without the PR pitfalls, you need a clearly communicated generative AI policy for the engineering teams at your organization.

For this edition of Tech Works, I talked to engineering leaders who adopted GenAI early, to help you decide how and when to encourage your software engineers to use generative AI, and when to deter them from leveraging chatbots and risking your organization’s privacy and security.

Consumer vs. Enterprise Generative AI Tools

There are many generative AI tools out there — CodeWhisperer, Google’s Bard, Meta AI’s LLaMA, Copilot, and OpenAI’s ChatGPT. But thus far, it’s the last two that have gotten the buzz within engineering teams. Deciding which GenAI tool to use comes down to how you’re using it.

“People are just dropping stuff in ChatGPT and hoping to get the right answer. It’s a research tool for OpenAI you’re using for free. You’re just giving them free research,” Zac Rosenbauer, CTO and co-founder of Joggr, a developer documentation platform, told The New Stack. (By default, ChatGPT saves your chat history and uses the conversations to further train its models.)

Rosenbauer then showed me a series of slides to explain how an LLM works, which he likened more to guessing the most probable word to fill in a Mad Libs blank than to seeking the most accurate response. “That’s why you get really stupid answers,” he said. “Because it’s going to try to just answer the question no matter what.”
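
To make that concrete, here is a toy Python sketch of greedy next-token decoding. The probability distribution is invented for illustration; the point is that hedging tokens like “unsure” are rarely the most probable continuation, so the model confidently completes the sentence whether or not the completion is true.

```python
# Toy illustration of next-token decoding. The probabilities are
# invented; a real LLM scores tens of thousands of candidate tokens.
next_token_probs = {
    "Paris": 0.31,      # plausible-sounding, possibly wrong
    "Fredville": 0.24,  # made up, but fits the sentence pattern
    "London": 0.19,
    "unsure": 0.02,     # hedging tokens rarely win
}

# Greedy decoding: always emit the single most probable token.
token = max(next_token_probs, key=next_token_probs.get)
print(token)  # -> "Paris": a confident completion, right or wrong
```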

Public LLMs are trained to give an answer, even if they don’t know the right one, as shown by the Purdue study that found 52% of the code written by ChatGPT is simply wrong, even when it looks convincing. You need to explicitly tell a chatbot to answer only when it actually knows the right answer.
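
As a minimal sketch of that instruction, using the openai Python package’s 2023-era chat interface (the wording of the system message is illustrative, not a guaranteed fix):

```python
import openai  # the 0.x-era interface: pip install "openai<1"

openai.api_key = "YOUR_API_KEY"  # placeholder; never hard-code real keys

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # lower temperature discourages creative guessing
    messages=[
        {
            "role": "system",
            "content": (
                "If you are not confident an answer is correct, reply "
                "'I don't know' instead of guessing. Never invent APIs, "
                "flags or function signatures."
            ),
        },
        {"role": "user", "content": "Does Python's list type have a .find() method?"},
    ],
)
print(response.choices[0].message["content"])
```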

Add to this the very valid concern that employees from any department may be copy-pasting personally identifiable information or private company information into a public tool like ChatGPT, effectively training it on your private data.
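
A generative AI policy can back that concern with guardrails. Below is a deliberately simple sketch of a redaction pass that strips obvious identifiers before a prompt ever leaves the organization; the regex patterns are illustrative, and a real deployment would use a dedicated DLP or PII-scanning tool.

```python
import re

# Illustrative patterns only; a real deployment would use a
# dedicated DLP/PII scanner rather than a handful of regexes.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
    (re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"), "<API_KEY>"),
]

def redact(prompt: str) -> str:
    """Replace obvious identifiers before a prompt leaves the org."""
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

print(redact("Ping jane.doe@corp.com, api_key=sk-123 is failing"))
# -> "Ping <EMAIL>, <API_KEY> is failing"
```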

It’s probably too soon for any team to have gained a competitive edge from the brand-new ChatGPT Enterprise, but it does seem that, due to both quality and privacy concerns, you want to steer your engineers away from regular ChatGPT for much of their work.

“The first thing we say to any company we deal with is to make sure you’re using the right GenAI,” said James Gornall, cloud architect lead at CTS, which is focused on enabling Google customer business cases for data analytics, including for Vertex AI, the generative AI offering within an enterprise’s Google Cloud perimeter. “There’s enterprise tooling and there’s consumer tooling.”

“Every company now has GenAI usage and people are probably using things that you don’t know they’re using.”

— James Gornall, CTS

ChatGPT may be the most popular, but it’s also very consumer-focused. Always remind your team: just because a tool is free doesn’t mean there’s no cost to using it. That means never putting private information into a consumer-facing tool.

“No business should be doing anything in Bard or ChatGPT as a strategy,” Gornall told The New Stack. Free, consumer-facing tools are usually harmless at the individual level, but, “the second you start to ask it anything around your business, strategy approach or content creation” — including code — “then you want to get that in something that’s a lot more ring-fenced and a lot more secure.”

More often than not, generative AI benefits come from domain specificity. You want an internal developer chatbot to train on your internal strategies and processes, not the whole world.

“Every company is now kind of a GenAI company. Whether you like it or not, people are probably going to start typing in the questions to these tools because they’re so easy to get a hold of,” Gornall said.

“You don’t even need a corporate account or anything. You can register for ChatGPT and start copying and pasting stuff in, saying ‘Review this contract for me’ or, in Samsung’s case, ‘Review this code,’ and, invariably, that could go very badly, very, very quickly.”

You not only increase privacy and security by staying within your organizational perimeter, you also increase your speed to value.

GenAI “can save a lot of time; for example, generating documents or generating comments — things that developers generally hate doing. But other times, we will try using this and it’ll actually take us twice as long because now we’re having to double-check everything that it wrote.”

— Ivan Lee, Datasaur

Don’t use a consumer-facing GenAI tool for anything that is very proprietary or central to how your business operates, advised Karol Danutama, vice president of engineering at Datasaur. But if something is much more standardized, where you could imagine 100 other companies needing a function just like it, then he has advised his team to feel more comfortable using LLMs to suggest code.

Don’t forget to factor in ethical choices. A company-level AI strategy must cover explainability, repeatability and transparency, Gornall said. And it needs to do so in a way that’s understood by all stakeholders, even your customers.

Context Is Key to Developer Flow State

You will always gain more accuracy and speed to value if you are training an existing LLM within the context of your business, on things like internal strategies and documentation. A context-driven chatbot — like the enterprise-focused Kubiya — needs to speak to the human content creator, and hopefully speed up or erase the more mundane parts of developers’ work; a simplified sketch of that context-driven approach follows the list below. Early engineering use cases for generative AI include:

  • Creating code snippets.
  • Generating documentation and code samples.
  • Creating functions.
  • Importing libraries.
  • Creating classes.
  • Generating a wireframe.
  • Running quality and security scans.
  • Summarizing code.
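
Here is the simplified retrieval sketch promised above: before each question, pull the most relevant internal document and prepend it to the prompt, so the model answers from your material rather than the open web. The documents and the keyword-overlap scoring are stand-ins; a production system would use embeddings and a vector store.

```python
# Simplified retrieval-augmented prompting. The docs and the
# keyword-overlap scoring are stand-ins for a real vector store.
INTERNAL_DOCS = {
    "deploy.md": "Deploys go through the staging cluster first ...",
    "style.md": "All public functions need type hints and docstrings ...",
}

def retrieve(question: str) -> str:
    """Return the internal doc sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(
        INTERNAL_DOCS.values(),
        key=lambda doc: len(q_words & set(doc.lower().split())),
    )

def build_prompt(question: str) -> str:
    """Prepend retrieved internal context so answers stay in-house."""
    return (
        f"Using only this internal context:\n{retrieve(question)}\n\n"
        f"Question: {question}"
    )

print(build_prompt("What do our standards require for public functions?"))
```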

It has the potential to “really get rid of a lot of the overhead of the 200 characters you have to type before you start on a line of code that means anything to you,” Gornall said. You still have to manually review it for relevance and accuracy within your context. “But you can build something real by taking some guidance from it and getting some ideas of talking points.”

For coding, he said, these ideas may or may not be production-ready, but generative AI helps you talk out how you might solve a problem. So long as you’re using an internal version of GenAI, you can feed in your coding standards, coding styles, policy documents and guideline templates into the chatbot. It will add that content to its own continuous improvement from external training, but keep your prompts and responses locked up.

“You can scan your entire codebase in a ridiculously quick amount of time to say, ‘Find me anything that doesn’t conform to this,’ or ‘Find me anything that’s using this kind of thing that we want to deprecate,’” Gornall said.
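
As a rough sketch of what such a scan could look like (the rule, the `ask_llm` helper and the FLAG convention are all illustrative, not any particular product’s API):

```python
from pathlib import Path

# Illustrative conformance rule; in practice this would come from the
# coding standards and policy documents fed into your internal chatbot.
RULE = (
    "Reply with 'FLAG: <reason>' if this file swallows exceptions "
    "using a bare 'except:'; otherwise reply 'OK'."
)

def scan_codebase(root: str, ask_llm) -> dict[str, str]:
    """Send each Python file to an LLM with a conformance prompt.

    `ask_llm` is a stand-in callable for whatever enterprise
    chat-completion client your organization has approved.
    """
    findings = {}
    for path in Path(root).rglob("*.py"):
        source = path.read_text(encoding="utf-8")
        verdict = ask_llm(f"{RULE}\n\nFile: {path}\n{source}")
        if verdict.startswith("FLAG"):
            findings[str(path)] = verdict
    return findings
```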

Don’t close off your dataset, he advised. You need to continue to train on third-party data too, lest you create an “echo chamber within your model where, because you’re just feeding it your wrong answers, it is going to give you wrong answers.” With the right balance of the two, you can maintain control and mitigate risk.

Generative AI for Documentation

One of the most in-demand productivity enablers is documentation. Internal documentation is key to self-service, but is usually out of date — if it even exists at all — and difficult to find or search.

Add to that, documentation is typically decoupled from the software development workflow, triggering even more context switching and interrupted flow state to go to Notion, Confluence or an external wiki to look something up.

“If you know about developers, if it’s not in their [integrated development environment], if it’s not in their workflow, they will ignore it,” Rosenbauer said.

This makes docs ripe for internal generative AI.

“We felt that developer productivity recently had suffered because of how much [developers were] asked to do,” Rosenbauer said. “The cognitive load of the developer is so much higher than it was, in my opinion, 10 or 15 years ago, even with a lot more tooling available.”

He reflected on why he and Seth Rosenbauer, his brother and Joggr co-founder, quit their jobs as engineering team leads just over a year ago.

For example, Zac Rosenbauer noted, “DevOps, though well-intended, was very painful for a lot of non-DevOps software engineers. Because the ‘shift left’ methodology is important — I think of it as an empowering thing — but it also forces people to do work they weren’t doing before.”

So the Rosenbauers spent the following six months exploring what had triggered that dive in developer productivity and increase in cognitive load. What they realized is that inadequate or nonexistent internal documentation is a huge culprit.

As a result, they created Joggr, a generative AI tool — one that “regenerates content,” Zac Rosenbauer said. One of the company’s main focuses is automatically regenerating code snippets to keep documentation current: descriptions, portions of text, links to code and more. About a third of Joggr’s customers currently work in platform engineering, and the company expects that practice to grow.

Will GenAI Take Jobs Away?

“The question we get asked quite a lot is: Is it taking our jobs? I don’t think so. I think it’s changing people’s jobs and people will do well to learn how to work with these things and get the most out of them, but I think it is still very early days,” Gornall said.

“Generative AI is not helping the core current role of an engineer, but it’s getting rid of a lot of the noise. It’s getting rid of a lot of the stuff that can take time but not deliver value.”

It is unlikely that the rate of development and adoption of generative AI will slow down, so your organization needed a GenAI policy yesterday. And that policy must include a plan to train engineers on these tools.

Just as his generation of search engine natives learned with the help of Google and Stack Overflow, Ivan Lee, CEO and founder of Datasaur, believes that the next generation of computer science grads will learn by asking ChatGPT or Copilot. Everyone on a team will have to level up their GenAI knowledge. Don’t forget, identifying flaws in other people’s code is a key part of any engineering job — now you just have to apply that skill to machine-written code, too.

Lee added, “We need to be very careful about knowing how to spot check, understanding the strengths of this technology and the weaknesses.”
