Improve Writing in the Atlassian Editor
How I achieved the first statistically significant quality improvement in the Atlassian Editor's most-used AI feature and more than doubled monthly active users to 289,000+.
Improve Writing was the most popular AI writing tool in the Atlassian Editor, but also a persistent source of negative feedback.
The output was verbose and robotic. A feature that was meant to make writing clearer was doing the opposite.
The machine learning team had tried to improve the feature through a tactical lens: tweaking parameters, testing latency, maintaining locales. But they hadn't examined it through the lens of what makes great writing.
I proposed a new collaboration model: content design x MLE, combining language expertise with experimentation infrastructure.
This approach has since become a defining model for prompt engineering at Atlassian:
1. Audit outputs and customer feedback to understand where and how the feature could be improved.
2. Define actionable rules that are specific enough for the LLM to follow and let the feature live up to its name.
3. Create a golden dataset of 60+ examples and run an evaluation pipeline across it to compare the outputs of different prompt iterations.
4. A/B test with customers to compare results between control and test variants.
5. Ship and measure impact.
During the A/B test, we saw a statistically significant lift in insertion rate — the first statistically significant quality change the feature had seen in its two-year history.
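For readers unfamiliar with what "statistically significant" means here, one common way to compare two rates is a two-proportion z-test. The sketch below assumes insertion rate is the share of generations a user inserts into their page, and the counts in the usage note are made up for illustration. It is not necessarily the test our experimentation platform ran.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    inserts_control: int,
    total_control: int,
    inserts_test: int,
    total_test: int,
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in insertion rate."""
    rate_control = inserts_control / total_control
    rate_test = inserts_test / total_test
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (inserts_control + inserts_test) / (total_control + total_test)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_control + 1 / total_test))
    z = (rate_test - rate_control) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Illustrative counts only, not real experiment data.
    z, p = two_proportion_z_test(4000, 10000, 4300, 10000)
    print(f"z = {z:.2f}, p = {p:.6f}")
```

A p-value below the chosen threshold (commonly 0.05) means a difference this large would be unlikely if the two variants truly performed the same.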
Following the release of our new prompt, daily usage more than tripled from 33,000 to 100,000+ uses per day, and monthly active users more than doubled from 120,000 to 289,000+ users. The feature is now the fastest-growing in Editor AI.
It also changed the way content and machine learning collaborate at Atlassian. Content design expertise is now seen as a key driver of prompt writing, golden dataset creation, and LLM evaluation — and our project became the template other teams follow.