Blue Pencil Elevates Social Impact with AI-Driven Entity Resolution

The Problem

tāshi, a social impact SaaS platform owned by Blue Pencil, was struggling to handle massive amounts of user data. Disjointed, disparate records hampered its real-time tracking of social, environmental, and economic impact, and made reporting and social impact analysis difficult. The team needed an automated, effective way to handle millions of records.

The User

A leading innovator, tāshi drives quantifiable social, financial, and environmental impact by combining modern technology with strategic advisory services. Using our SaaS model, we enable smooth collaboration among companies, funding organizations, and implementation partners, so that our partners can carry out high-impact social initiatives. In 2022, we enabled companies to distribute ₹1.75 billion in social assistance, improving the lives of approximately 450,000 people. Real-time data tracking keeps our reporting transparent and in line with legal requirements, which makes us a reliable partner for companies that want to build social responsibility into their operations.

We talked with Lalit Pagaria, the VP of Technology at tāshi, to learn more about the company and how they are using Zingg.

What is your role within tāshi?

As the VP of Technology, I oversee the entire technology function at tāshi. My role focuses on driving innovation, managing our platform's architecture, and ensuring we utilize AI-driven solutions like Zingg to scale the social impact efforts of our partners.

Can you share details about your data stack?

Our platform is hosted on AWS, utilizing Kubernetes and Apache Airflow for orchestrating batch data processes, including identity resolution. This architecture allows us to process millions of records efficiently and deliver real-time tracking of social benefits. Our data stack is designed for high scalability, enabling us to meet the demands of complex, large-scale social programs.

Why is entity resolution essential for tāshi?

Entity resolution is crucial for maintaining data integrity across programs, which directly affects the accuracy and effectiveness of our social initiatives. As we manage programs with hundreds of thousands of users, resolving duplicate records and preventing fraud are essential to ensure resources are allocated correctly. With Zingg integrated into our platform, we have streamlined the process of identifying and merging duplicate records, enhancing the integrity of our data. This precise tracking is vital for evaluating program outcomes and maintaining trust with our stakeholders.

How did you approach entity resolution previously, and what challenges did you face?

Initially, we relied on manual processes and tools like OpenRefine (formerly Google Refine) for basic data matching, but this approach proved inefficient and did not scale. Custom Python solutions for fuzzy matching and deduplication were cumbersome when dealing with large datasets. These methods highlighted the need for a more advanced, automated system to handle complex entity resolution across various data sets.
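To illustrate why hand-rolled fuzzy matching becomes cumbersome at scale, here is a minimal sketch of the kind of pairwise comparison such a script might perform. It is not tāshi's actual code: the function names, the sample records, and the 0.85 threshold are all hypothetical, and it uses only Python's standard-library difflib module.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_candidate_duplicates(records: dict, threshold: float = 0.85) -> list:
    """Compare every pair of records and keep the pairs scoring at or above the threshold.

    `records` maps a record id to a name. The threshold is an illustrative value,
    not one taken from the interview.
    """
    pairs = []
    # combinations() yields n * (n - 1) / 2 pairs, so the work grows quadratically.
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical sample data.
records = {
    1: "Anita Sharma",
    2: "Anita  Sharma",  # extra space
    3: "Rahul Verma",
    4: "anita sharma",   # different casing
}

pairs = find_candidate_duplicates(records)
for id_a, id_b, score in pairs:
    print(f"records {id_a} and {id_b} look similar (score {score:.2f})")
```

Because every record is compared with every other record, one million records would require roughly 500 billion comparisons. Entity resolution tools avoid this by first grouping records into smaller candidate blocks and only comparing within each block.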

What has been your experience using Zingg for entity resolution?

The integration of Zingg significantly improved our ability to manage and resolve user data across multiple programs. We processed millions of records, efficiently clustering and cleansing the data, which enhanced program integrity. The setup took about a month due to initial dependencies, but once operational, Zingg enabled us to scale our data management efforts. Despite early challenges with its command-line interface, the results have been transformative, allowing us to maintain high standards of data accuracy.

How do you consume entity-resolved results from Zingg?

We use Zingg to cleanse and resolve duplicates within our application’s database. The results are integrated into a user interface for program members to review and resolve any discrepancies. For data points below a certain confidence threshold, we apply manual validation using custom SQL queries. We’ve also semi-automated this process using Zingg’s Python API and FastAPI, enabling more efficient, ongoing data management.
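The threshold-based routing described above can be sketched as follows. This is an assumption-laden illustration rather than tāshi's implementation: the column names z_cluster and z_minScore follow the cluster and score columns that Zingg's documentation describes for its match output, while the id field, the 0.8 threshold, and the split_clusters function are hypothetical. Plain dictionaries stand in for the database rows.

```python
from collections import defaultdict

# Hypothetical cut-off; the interview does not state the value actually used.
REVIEW_THRESHOLD = 0.8


def split_clusters(rows, threshold=REVIEW_THRESHOLD):
    """Group rows by cluster id and route each multi-record cluster.

    Clusters whose weakest member score meets the threshold go to `auto_merge`;
    the rest go to `needs_review` for manual validation.
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["z_cluster"]].append(row)

    auto_merge, needs_review = {}, {}
    for cluster_id, members in clusters.items():
        if len(members) < 2:
            continue  # a single-record cluster has no duplicates to resolve
        weakest = min(member["z_minScore"] for member in members)
        target = auto_merge if weakest >= threshold else needs_review
        target[cluster_id] = [member["id"] for member in members]
    return auto_merge, needs_review


# Hypothetical rows shaped like entity-resolution output.
rows = [
    {"id": "u1", "z_cluster": 10, "z_minScore": 0.95},
    {"id": "u2", "z_cluster": 10, "z_minScore": 0.95},
    {"id": "u3", "z_cluster": 11, "z_minScore": 0.55},
    {"id": "u4", "z_cluster": 11, "z_minScore": 0.55},
    {"id": "u5", "z_cluster": 12, "z_minScore": 0.0},
]

auto_merge, needs_review = split_clusters(rows)
print("auto-merge:", auto_merge)
print("needs review:", needs_review)
```

Using the weakest score in each cluster is a conservative choice: one doubtful member is enough to send the whole cluster to a reviewer, which suits a setting where a wrong merge could affect someone's benefits.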

What impact have resolved entities had on your business?

Accurate entity resolution has significantly enhanced our operational efficiency. By leveraging Zingg and other data validation processes, we identify leakage by merging duplicate records, which improves fraud detection and optimizes resource allocation. This precision in data management helps us ensure that social benefits reach the people who need them most.