Blue Pencil, owned by tāshi, is a leading innovator in creative ideas that drives quantifiable social, financial, and environmental impact by merging modern technologies with strategic advisory services. Through its SaaS model, Blue Pencil facilitates seamless collaboration among corporations, funding organizations, and execution partners to implement high-impact social initiatives. In 2022, the organization enabled companies to distribute ₹1.75 billion in social assistance, positively impacting approximately 450,000 individuals. With real-time data tracking and full compliance with legal standards, Blue Pencil ensures transparent reporting, making it a trusted partner for businesses integrating social responsibility into their operational strategies.
We at Zingg talked with Lalit Pagaria, VP of Technology at tāshi, to learn more about the company and how it uses Zingg.
Blue Pencil, the social impact SaaS platform, faced challenges handling massive amounts of user data. Disjointed, disparate records hampered its real-time tracking of social, environmental, and economic impact, and made reporting and social impact analysis difficult. This created a need for an automated, effective way to handle millions of records.
As the VP of Technology, I oversee the entire technology function at tāshi. My role focuses on driving innovation, managing our platform's architecture, and ensuring we utilize AI-driven solutions like Zingg to scale the social impact efforts of our partners.
Our platform is hosted on AWS, utilizing Kubernetes and Apache Airflow for orchestrating batch data processes, including identity resolution. This architecture allows us to process millions of records efficiently and deliver real-time tracking of social benefits. Our data stack is designed for high scalability, enabling us to meet the demands of complex, large-scale social programs.
Entity resolution is crucial for maintaining data integrity across programs, which directly affects the accuracy and effectiveness of our social initiatives. As we manage programs with hundreds of thousands of users, resolving duplicate records and preventing fraud are essential to ensure resources are allocated correctly. With Zingg integrated into our platform, we have streamlined the process of identifying and merging duplicate records, enhancing the integrity of our data. This precise tracking is vital for evaluating program outcomes and maintaining trust with our stakeholders.
Initially, we relied on manual processes and tools like OpenRefine (formerly Google Refine) for basic data matching, but this approach proved inefficient and did not scale. Custom Python solutions for fuzzy matching and deduplication were cumbersome when dealing with large datasets. These methods highlighted the need for a more advanced, automated system to handle complex entity resolution across various datasets.
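To illustrate why hand-rolled fuzzy matching becomes cumbersome at scale, here is a minimal sketch of naive pairwise deduplication using only Python's standard library. It is not tāshi's actual code; the record layout, the `name` field, and the 0.85 threshold are assumptions made for illustration. The point is that every record is compared with every other record, so the number of comparisons grows quadratically.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicate_pairs(records, threshold=0.85):
    """Compare every pair of records: n * (n - 1) / 2 comparisons."""
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        if similarity(left["name"], right["name"]) >= threshold:
            pairs.append((i, j))
    return pairs


def comparison_count(n: int) -> int:
    """Number of pairwise comparisons needed for n records."""
    return n * (n - 1) // 2


# Hypothetical sample records, not real program data.
records = [
    {"name": "Anita Sharma"},
    {"name": "Anitha Sharma"},
    {"name": "Rahul Verma"},
]
pairs = find_duplicate_pairs(records)
```

With three records this needs only three comparisons, but `comparison_count(1_000_000)` is roughly 500 billion, which is why an all-pairs approach stops being practical for millions of records.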
The integration of Zingg significantly improved our ability to manage and resolve user data across multiple programs. We processed millions of records, efficiently clustering and cleansing data, which enhanced program integrity. The setup took about a month due to initial dependencies, but once operational, Zingg enabled us to scale our data management efforts. Despite early challenges with its command-line interface, the results have been transformative, allowing us to maintain high standards of data accuracy.
We use Zingg to cleanse and resolve duplicates within our application’s database. The results are integrated into a user interface for program members to review and resolve any discrepancies. For data points below a certain confidence threshold, we apply manual validation using custom SQL queries. We’ve also semi-automated this process using Zingg’s Python API and FastAPI, enabling more efficient, ongoing data management.
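The confidence-threshold step described above can be sketched roughly as follows. This is an illustrative outline, not the production workflow: the field names (`cluster_id`, `record_id`, `min_score`) and the 0.8 cutoff are hypothetical stand-ins for whatever the matching output and the team's policy actually provide. Clusters whose weakest match score clears the threshold are treated as resolved, and the rest are queued for manual review.

```python
from collections import defaultdict

# Assumed cutoff for illustration; the real threshold is not stated.
REVIEW_THRESHOLD = 0.8


def route_clusters(rows, threshold=REVIEW_THRESHOLD):
    """Split matched clusters into auto-resolved and manual-review groups."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["cluster_id"]].append(row)

    auto_resolved, needs_review = {}, {}
    for cluster_id, members in clusters.items():
        if len(members) < 2:
            continue  # a single-record cluster has no duplicates to resolve
        # Judge the cluster by its weakest link.
        weakest = min(member["min_score"] for member in members)
        target = auto_resolved if weakest >= threshold else needs_review
        target[cluster_id] = [member["record_id"] for member in members]
    return auto_resolved, needs_review


# Hypothetical matching output, not real program data.
rows = [
    {"cluster_id": 1, "record_id": "A1", "min_score": 0.95},
    {"cluster_id": 1, "record_id": "A2", "min_score": 0.95},
    {"cluster_id": 2, "record_id": "B1", "min_score": 0.55},
    {"cluster_id": 2, "record_id": "B2", "min_score": 0.55},
    {"cluster_id": 3, "record_id": "C1", "min_score": 1.0},
]
auto_resolved, needs_review = route_clusters(rows)
```

Here cluster 1 would be resolved automatically, cluster 2 would go to a reviewer, and cluster 3 is skipped because it contains a single record. Using the minimum score per cluster is a conservative design choice: one weak match is enough to send the whole cluster to review.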
Accurate entity resolution has significantly enhanced our operational efficiency. By leveraging Zingg and other data validation processes, we identify leakage by merging duplicate records, which improves fraud detection and optimizes resource allocation. This precision in data management helps us ensure that social benefits reach the people who need them most.