For the past couple of years, OpenAI has felt like a one-man brand. With his showbiz style and fundraising glitz, CEO Sam Altman overshadows all other big names on the firm’s roster. Even his bungled ouster ended with him back on top—and more famous than ever. But look past the charismatic frontman and you get a clearer sense of where this company is going. After all, Altman is not the one building the technology on which its reputation rests. 

That responsibility falls to OpenAI’s twin heads of research—chief research officer Mark Chen and chief scientist Jakub Pachocki. Between them, they share the role of making sure OpenAI stays one step ahead of powerhouse rivals like Google.

I sat down with Chen and Pachocki for an exclusive conversation during a recent trip the pair made to London, where OpenAI set up its first international office in 2023. We talked about how they manage the inherent tension between research and product. We also talked about why they think coding and math are the keys to more capable all-purpose models; what they really mean when they talk about AGI; and what happened to OpenAI’s superalignment team, which the firm’s cofounder and former chief scientist Ilya Sutskever set up to prevent a hypothetical superintelligence from going rogue and which disbanded soon after he quit.

In particular, I wanted to get a sense of where their heads are at in the run-up to OpenAI’s biggest product release in months: GPT-5.

Reports are out that the firm’s next-generation model will be launched in August. OpenAI’s official line—well, Altman’s—is that it will release GPT-5 “soon.” Anticipation is high. The leaps OpenAI made with GPT-3 and then GPT-4 raised the bar for what was thought possible with this technology. And yet delays to the launch of GPT-5 have fueled rumors that OpenAI has struggled to build a model that meets its own—not to mention everyone else’s—expectations.

Altman has been uncharacteristically modest: “[GPT-5] is an experimental model that incorporates new research techniques we will use in future models,” he posted on X—which makes it sound more like a work in progress than another horizon-shifting release. 

But expectation management is part of the job for a company that for the last several years has set the agenda for the industry. And Chen and Pachocki set the agenda inside OpenAI.

Twin peaks 

The firm’s main London office is in St James’s Park, a few hundred meters east of Buckingham Palace. But I met Chen and Pachocki in a conference room in a coworking space near King’s Cross, which OpenAI keeps as a kind of pied-à-terre in the heart of London’s tech neighborhood (Google DeepMind and Meta are just around the corner). OpenAI’s head of research communications, Laurance Fauconnet, sat with an open laptop at the end of the table. 

Chen, who was wearing a maroon polo shirt, is clean-cut, almost preppy. He’s media trained and comfortable talking to a reporter. (That’s him flirting with a chatbot in the “Introducing GPT-4o” video.) Pachocki, in a black elephant-logo tee, has more of a TV-movie hacker look. He stares at his hands a lot when he speaks.

But the pair are a tighter double act than they first appear. Pachocki summed up their roles: Chen shapes and manages the research teams, he said, while “I am responsible for setting the research roadmap and establishing our long-term technical vision.”

“But there’s fluidity in the roles,” Chen said. “We’re both researchers, we pull on technical threads. Whatever we see that we can pull on and fix, that’s what we do.”

Chen joined the company in 2018 after working as a quantitative trader at the Wall Street firm Jane Street Capital, where he developed machine-learning models for futures trading. At OpenAI he spearheaded the creation of DALL-E, the firm’s breakthrough generative image model. He then worked on adding image recognition to GPT‑4 and led the development of Codex, the generative coding model that powers GitHub Copilot.

Pachocki left an academic career in theoretical computer science to join OpenAI in 2017 and replaced Sutskever as chief scientist in 2024. He is the key architect of OpenAI’s so-called reasoning models—especially o1 and o3—which are designed to tackle complex tasks in science, math, and coding. 

When we met they were buzzing, fresh off the high of two back-to-back wins for their company’s technology.

On July 16, one of OpenAI’s large language models came in second in the AtCoder World Tour Finals, one of the world’s most hardcore programming competitions. On July 19, OpenAI announced that one of its models had achieved gold-medal-level results on the 2025 International Math Olympiad, one of the world’s most prestigious math contests.

The math result made headlines, not only because of OpenAI’s remarkable achievement, but because rival Google DeepMind revealed two days later that one of its models had achieved the same score in the same competition. Google DeepMind had played by the competition’s rules and waited for its results to be checked by the organizers before making an announcement; OpenAI had in effect marked its own answers.

For Chen and Pachocki, the result speaks for itself. Anyway, it’s the programming win they’re most excited about. “I think that’s quite underrated,” Chen told me. A gold medal result in the International Math Olympiad puts you somewhere in the top 20 to 50 competitors, he said. But in the AtCoder contest OpenAI’s model placed in the top two: “To break into a really different tier of human performance—that’s unprecedented.”

Ship, ship, ship!

People at OpenAI still like to say they work at a research lab. But the company is very different from the one it was before the release of ChatGPT three years ago. The firm is now in a race with the biggest and richest technology companies in the world and is valued at $300 billion. Envelope-pushing research and eye-catching demos no longer cut it. It needs to ship products and get them into people’s hands—and boy, it does.

OpenAI has kept up a run of new releases—putting out major updates to its GPT-4 series, launching a string of generative image and video models, and introducing the ability to talk to ChatGPT with your voice. Six months ago it kicked off a new wave of so-called reasoning models with its o1 release, soon followed by o3. And last week it released its browser-using agent Operator to the public. It now claims that more than 400 million people use its products every week and submit 2.5 billion prompts a day. 

OpenAI’s incoming CEO of applications, Fidji Simo, plans to keep up the momentum. In a memo to the company, she told employees she is looking forward to “helping get OpenAI’s technologies into the hands of more people around the world,” where they will “unlock more opportunities for more people than any other technology in history.” Expect the products to keep coming.

I asked how OpenAI juggles open-ended research and product development. “This is something we have been thinking about for a very long time, long before ChatGPT,” Pachocki said. “If we are actually serious about trying to build artificial general intelligence, clearly there will be so much that you can do with this technology along the way, so many tangents you can go down that will be big products.” In other words, keep shaking the tree and harvest what you can.

A talking point that comes up with OpenAI folks is that putting experimental models out into the world has been a necessary part of research. The goal was to make people aware of how good this technology had become. “We want to educate people about what’s coming so that we can participate in what will be a very hard societal conversation,” Altman told me back in 2022. The makers of this strange new technology were also curious what it might be for: OpenAI was keen to get it into people’s hands to see what they would do with it.

Is that still the case? They answered at the same time. “Yeah!” Chen said. “To some extent,” Pachocki said. Chen laughed: “No, go ahead.” 

“I wouldn’t say research iterates on product,” said Pachocki. “But now that models are at the edge of the capabilities that can be measured by classical benchmarks and a lot of the long-standing challenges that we’ve been thinking about are starting to fall, we’re at the point where it really is about what the models can do in the real world.”

Like taking on humans in coding competitions. The person who beat OpenAI’s model at this year’s AtCoder contest, held in Japan, was a programmer named Przemysław Dębiak, also known as Psyho. The contest was a puzzle-solving marathon in which competitors had 10 hours to find the most efficient way to solve a complex coding problem. After his win, Psyho posted on X: “I’m completely exhausted … I’m barely alive.”  

Chen and Pachocki have strong ties to the world of competitive coding. Both have competed in international coding contests in the past and Chen coaches the USA Computing Olympiad team. I asked whether that personal enthusiasm for competitive coding colors their sense of how big a deal it is for a model to perform well at such a challenge.

They both laughed. “Definitely,” said Pachocki. “So: Psyho is kind of a legend. He’s been the number one competitor for many years. He’s also actually a friend of mine—we used to compete together in these contests.” Dębiak also used to work with Pachocki at OpenAI.

When Pachocki competed in coding contests he favored those that focused on shorter problems with concrete solutions. But Dębiak liked longer, open-ended problems without an obvious correct answer.

“He used to poke fun at me, saying that the kind of contest I was into will be automated long before the ones he liked,” Pachocki recalled. “So I was seriously invested in the performance of this model in this latest competition.”

Pachocki told me he was glued to the late-night livestream from Tokyo, watching his model come in second: “Psyho resists for now.” 

“We’ve tracked the performance of LLMs on coding contests for a while,” said Chen. “We’ve watched them become better than me, better than Jakub. It feels something like Lee Sedol playing Go.”

Lee is the master Go player who lost a series of matches to DeepMind’s game-playing model AlphaGo in 2016. The results stunned the international Go community and led Lee to give up professional play. Last year he told the New York Times: “Losing to AI, in a sense, meant my entire world was collapsing … I could no longer enjoy the game.” And yet, unlike Lee, Chen and Pachocki are thrilled to be surpassed.   

But why should the rest of us care about these niche wins? It’s clear that this technology—designed to mimic and, ultimately, stand in for human intelligence—is being built by people whose idea of peak intelligence is acing a math contest or holding your own against a legendary coder. Is it a problem that this view of intelligence is skewed toward the mathematical, analytical end of the scale?

“I mean, I think you are right that—you know, selfishly, we do want to create models which accelerate ourselves,” Chen told me. “We see that as a very fast factor to progress.”  

The argument researchers like Chen and Pachocki make is that math and coding are the bedrock for a far more general form of intelligence, one that can solve a wide range of problems in ways we might not have thought of ourselves. “We’re talking about programming and math here,” said Pachocki. “But it’s really about creativity, coming up with novel ideas, connecting ideas from different places.”

Look at the two recent competitions: “In both cases, there were problems which required very hard, out-of-the-box thinking. Psyho spent half the programming competition thinking and then came up with a solution that was really novel and quite different from anything that our model looked at.”

“This is really what we’re after,” Pachocki continued. “How do we get models to discover this sort of novel insight? To actually advance our knowledge? I think they are already capable of that in some limited ways. But I think this technology has the potential to really accelerate scientific progress.” 

I returned to the question about whether the focus on math and programming was a problem, conceding that maybe it’s fine if what we’re building are tools to help us do science. We don’t necessarily want large language models to replace politicians and have people skills, I suggested.

Chen pulled a face and looked up at the ceiling: “Why not?”

What’s missing

OpenAI was founded with a level of hubris that stood out even by Silicon Valley standards, boasting about its goal of building AGI back when talk of AGI still sounded kooky. OpenAI remains as gung-ho about AGI as ever, and it has done more than most to make AGI a mainstream multibillion-dollar concern. It’s not there yet, though. I asked Chen and Pachocki what they think is missing.

“I think the way to envision the future is to really, deeply study the technology that we see today,” Pachocki said. “From the beginning, OpenAI has looked at deep learning as this very mysterious and clearly very powerful technology with a lot of potential. We’ve been trying to understand its bottlenecks. What can it do? What can it not do?”  

At the current cutting edge, Chen said, are reasoning models, which break down problems into smaller, more manageable steps, but even they have limits: “You know, you have these models which know a lot of things but can’t chain that knowledge together. Why is that? Why can’t it do that in a way that humans can?”

OpenAI is throwing everything at answering that question.

“We are probably still, like, at the very beginning of this reasoning paradigm,” Pachocki told me. “Really, we are thinking about how to get these models to learn and explore over the long term and actually deliver very new ideas.”

Chen pushed the point home: “I really don’t consider reasoning done. We’ve definitely not solved it. You have to read so much text to get a kind of approximation of what humans know.”

OpenAI won’t say what data it uses to train its models or give details about their size and shape—only that it is working hard to make all stages of the development process more efficient.

Those efforts make them confident that so-called scaling laws—which suggest that models will continue to get better the more compute you throw at them—show no sign of breaking down.

“I don’t think there’s evidence that scaling laws are dead in any sense,” Chen insisted. “There have always been bottlenecks, right? Sometimes they’re to do with the way models are built. Sometimes they’re to do with data. But fundamentally it’s just about finding the research that breaks you through the current bottleneck.” 

The faith in progress is unshakeable. I brought up something Pachocki had said about AGI in an interview with Nature in May: “When I joined OpenAI in 2017, I was still among the biggest skeptics at the company.” He looked doubtful. 

“I’m not sure I was skeptical about the concept,” he said. “But I think I was—” He paused, looking at his hands on the table in front of him. “When I joined OpenAI, I expected the timelines to be longer to get to the point that we are now.”

“There’s a lot of consequences of AI,” he said. “But the one I think the most about is automated research. When we look at human history, a lot of it is about technological progress, about humans building new technologies. The point when computers can develop new technologies themselves seems like a very important, um, inflection point.

“We already see these models assist scientists. But when they are able to work on longer horizons—when they’re able to establish research programs for themselves—the world will feel meaningfully different.”

For Chen, that ability for models to work by themselves for longer is key. “I mean, I do think everyone has their own definitions of AGI,” he said. “But this concept of autonomous time—just the amount of time that the model can spend making productive progress on a difficult problem without hitting a dead end—that’s one of the big things that we’re after.”

It’s a bold vision—and far beyond the capabilities of today’s models. But I was nevertheless struck by how Chen and Pachocki made AGI sound almost mundane. Compare this with how Sutskever responded when I spoke to him 18 months ago. “It’s going to be monumental, earth-shattering,” he told me. “There will be a before and an after.” Faced with the immensity of what he was building, Sutskever switched the focus of his career from designing better and better models to figuring out how to control a technology that he believed would soon be smarter than himself.

Two years ago Sutskever set up what he called a superalignment team that he would co-lead with another OpenAI safety researcher, Jan Leike. The claim was that this team would funnel a full fifth of OpenAI’s resources into figuring out how to control a hypothetical superintelligence. Today, most of the people on the superalignment team, including Sutskever and Leike, have left the company and the team no longer exists.   

When Leike quit, he said it was because the team had not been given the support he felt it deserved. He posted this on X: “Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. But over the past years, safety culture and processes have taken a backseat to shiny products.” Other departing researchers shared similar statements.

I asked Chen and Pachocki what they make of such concerns. “A lot of these things are highly personal decisions,” Chen said. “You know, a researcher can kind of, you know—”

He started again. “They might have a belief that the field is going to evolve in a certain way and that their research is going to pan out and is going to bear fruit. And, you know, maybe the company doesn’t reshape in the way that you want it to. It’s a very dynamic field.”

“A lot of these things are personal decisions,” he repeated. “Sometimes the field is just evolving in a way that is less consistent with the way that you’re doing research.”

But alignment, both of them insist, is now part of the core business rather than the concern of one specific team. According to Pachocki, these models don’t work at all unless they work as you expect them to. There’s also little desire to focus on aligning a hypothetical superintelligence with your objectives when doing so with existing models is already enough of a challenge.

“Two years ago the risks that we were imagining were mostly theoretical risks,” Pachocki said. “The world today looks very different, and I think a lot of alignment problems are now very practically motivated.”

Still, experimental technology is being spun into mass-market products faster than ever before. Does that really never lead to disagreements between the two of them?

“I am often afforded the luxury of really kind of thinking about the long term, where the technology is headed,” Pachocki said. “Contending with the reality of the process—both in terms of people and also, like, the broader company needs—falls on Mark. It’s not really a disagreement, but there is a natural tension between these different objectives and the different challenges that the company is facing that materializes between us.”

Chen jumped in: “I think it’s just a very delicate balance.”  
