World's largest software provider and second-largest cloud provider 

Developed a method to systematically measure and rank harm across multiple versions of the client’s foundational models while tracking model iterations and parameters.

About the Client

The world’s largest software provider and second-largest cloud provider needed a systematic method for measuring responses and ranking them by level of harm across foundational model iterations and parameters.


To meet its organizational goals, the client’s leading foundational model development team needed a way to measure the level of harm in different versions of the same foundational model.



  • Developed customized interfaces for ranking responses by level of harm, while tracking model iteration and parameters.
  • Created a customized certification and training process so that human experts fully understood the definitions of harm.
  • Fine-tuned model versions based on the harm-grading teams’ feedback.
  • Created and curated a set of taxonomies aligned with the client’s safety and governance policies.
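
As a rough illustration of what tracking harm grades against model iteration and parameters could look like, the sketch below stores one record per graded response and ranks model versions by mean severity. It is not the client's or the project's actual tooling: the `HarmGrade` fields, the 0 to 4 severity scale, the `temperature` parameter, and the `rank_versions` helper are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HarmGrade:
    """One human grade for one model response (all fields are hypothetical)."""
    model_version: str   # label for the model iteration being graded
    temperature: float   # example of a tracked generation parameter
    prompt_id: str       # identifier of the prompt that produced the response
    category: str        # taxonomy label chosen by the grader
    severity: int        # assumed scale: 0 (no harm) to 4 (severe harm)


def rank_versions(grades):
    """Return (model_version, mean severity) pairs, least harmful first."""
    by_version = defaultdict(list)
    for grade in grades:
        by_version[grade.model_version].append(grade.severity)
    return sorted(
        ((version, mean(scores)) for version, scores in by_version.items()),
        key=lambda pair: pair[1],
    )
```

Calling `rank_versions` on a list of `HarmGrade` records gives a simple per-version comparison. A real grading workflow would likely also break results down by taxonomy category and by parameter setting.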


This project resulted in substantial improvements to the safety and reliability of the client’s generative AI.