NLP Summarization: Synthetic Text Message Collection for Machine Learning

A leading technology company wanted to enhance the ability of its machine learning models to recognize and replicate conversational text messages.

The company wanted to create high-volume, high-quality synthetic text message datasets in Arabic, Mandarin, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.

How we helped:

Centific delivered over 1M high-quality synthetic text messages in 11 languages spanning the Americas, Europe, and Asia using its advanced AI and machine learning solution. Our team worked with the PowerApps and PowerBI teams to enable NLP capabilities that help citizen developers build their products.

  • Centific tapped into its global partners and its crowd resource pool of about 600,000 contributors to collect and deliver synthetic, crowd-made text messages.
  • A high volume of synthetic text messages arrived in a very short period, so the team had to adjust quickly to maintain a consistent level of data diversity and quality.
  • Centific engaged QA experts in each language to ensure high-quality deliverables.
  • Our team evaluated each AI-generated rephrase against the actual user query (1 if the intent matched, 0 if it did not) and provided suggested rephrases to train the AI.
  • Centific delivered insights for optimizing the AI engine, based on lessons learned from the solution.
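
The intent-match evaluation described above can be illustrated with a short sketch. This is only an assumption about how such judgments might be recorded and summarized, not a description of Centific's actual tooling: the `RephraseJudgment` record, the `match_rate` and `training_pairs` helpers, and the sample messages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RephraseJudgment:
    """One reviewer judgment (hypothetical record layout)."""
    user_query: str
    ai_rephrase: str
    intent_match: int            # 1 = rephrase keeps the user's intent, 0 = it does not
    suggested_rephrase: str = ""  # reviewer's correction, if any


def match_rate(judgments):
    """Share of AI rephrases judged to match the user's intent."""
    if not judgments:
        return 0.0
    return sum(j.intent_match for j in judgments) / len(judgments)


def training_pairs(judgments):
    """(query, corrected rephrase) pairs for the cases the AI got wrong."""
    return [
        (j.user_query, j.suggested_rephrase)
        for j in judgments
        if j.intent_match == 0 and j.suggested_rephrase
    ]


# Illustrative sample data only; not taken from the project.
examples = [
    RephraseJudgment("running late, be there in 10", "I will arrive in ten minutes.", 1),
    RephraseJudgment("can u grab milk?", "Could you pick up some milk?", 1),
    RephraseJudgment("call me when ur free", "I am free to call you.", 0,
                     "Please call me when you are free."),
]

print(f"Intent match rate: {match_rate(examples):.2f}")
print(training_pairs(examples))
```

Keeping the 0/1 label next to the reviewer's suggested rephrase lets the same records serve both as a quality metric and as corrective training data.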