AI based

Safety Sheets

extraction

Generative AI solution

to digitize information in tannery

In the leather industry, which includes major industrial districts in Italy (both in Tuscany and Veneto), safety is a crucial concern. Kode has been collaborating for years with key players in the sector, supporting them in an overall digitalization process aimed at improving operations and safety. One aspect of this process is the digital extraction of information from Safety Data Sheets, which tanning companies are required to catalog and store to inform operators about hazards and necessary precautions for the handling and transportation of these substances.

Our Solution

Challenge

Safety Data Sheets (SDS) are PDF files that provide detailed information on a chemical substance or mixture, using standardized codes according to international norms, such as hazards related to the substance’s handling. In many countries, it is legally required for every marketed chemical substance to be accompanied by an SDS. Furthermore, companies that use chemical substances in their processes are required to centrally organize and manage this information by extracting it from individual files and structuring it in a dedicated database. Manual execution of this task entails significant time and cost, as well as a high risk of errors.

The project’s goal is to develop dedicated software capable of managing and optimizing the information extraction process from each document, leaving human operators responsible only for final validation.

A key challenge in this project lies in the structure of an SDS: although it adheres to a common format defined by regulations, the editorial styles vary greatly, making automated extraction difficult with traditional algorithms.

Architecture

 

The software, developed as a SaaS (with data management delegated to the end user), leverages the power of OpenAI Inc’s generative language models (GPT) to extract specific information from Italian-language SDSs, alongside a network of components designed to integrate into the company’s existing IT ecosystem. The software is structured as a modular platform with components distributed across two interconnected server spaces:

  • Provider server, which hosts all the services forming the system’s operational core
  • User server, which hosts the database where the extracted data is saved.

Features

The compunded solution include features supporting the whole process:

  •  Information Extraction: The extraction algorithm organizes SDS texts into prompts optimized for the GPT generative language model by OpenAI. The prompts are crafted to minimize the input text while obtaining concise and structured outputs from GPT.
  • Pictogram Recognition: The system includes a computer vision algorithm developed by Kode for automatically identifying hazard pictograms, triggered when these icons are not accompanied by corresponding textual codes.
  • Validation and Saving: Users can manually edit all extracted field data, correcting any mistakes or omissions, and entering product and supplier codes generated by the company’s ERP system, enabling integration of the new data with the company’s existing records.
  • Other Features: The system also includes database browsing and extraction functionalities, as well as Authentication/Registration (using Kode’s proprietary Princess Security Service framework module) to ensure system security and privacy. 

Results

From the earliest tests, the software has proven highly effective both in identifying the correct codes and texts and in detecting pictograms.

Thanks to this solution, the company has significantly reduced the time needed to manage SDSs, while providing operators with improved oversight and intervention capabilities.

Want to know the tannery we developed this project for?

Contact us and learn more from its firsthand account.

Contact form

Thank you for your message