The North Carolina Free Enterprise Foundation and the NC State Board of Elections (NCSBE) wanted to improve transparency around campaign finance data. While the NCSBE already stored campaign finance data, they lacked the capability to provide comprehensive historical patterns and insights.
Understanding who was spending money on campaigns and how that influenced political decisions within North Carolina was critical for public transparency.
Political funding data in North Carolina presented two major challenges. First, the records came in mixed formats: some were available digitally, while others were scanned PDFs that required manual transcription.
Second, the data was fraught with inconsistencies and inaccuracies caused by manual entry errors. Together, these issues made it nearly impossible to establish funding patterns or consolidate insights.
To address this, NCSBE engaged CrossroadsCX, a boutique data consultancy, to build a model data platform. The vision for the platform was bold: to make the data easily accessible to the public while keeping it clean and organized for deeper analysis.
The problems with inconsistent and unresolved data had stalled similar projects in the past.
"Entity resolution was just too much of a bear to successfully pull off a project like this," noted Steinmetz, co-founder at CrossroadsCX, underlining the need for a powerful solution.
The CrossroadsCX team leveraged Zingg to resolve inconsistencies, including identifying different forms of the same organization, such as "New Belgium Brewing Co." and "New Belgium Brewery PAC."
Steinmetz elaborated,
"Zingg helps identify if records are related to each other, like if names are spelled differently or if there are duplicated records. We can see these are clearly the same entity, but it’s not machine-readable. That's where Zingg comes in."
Zingg’s entity resolution engine is key in identifying and consolidating entities with varied spellings and entries, especially for manually entered data.
"We’re not just doing this based on name. We’re also matching based on address, city, and other fields."
Zingg groups together all instances of a single entity under a unified, cleaned-up name. Using a combination of name, address, and other metadata, Zingg matches up records and eliminates redundant entries, a crucial step for North Carolina funding data analysis.
For instance, Zingg found 22 distinct ways in which "Facebook" had been recorded among significant expenses.
All in all, Zingg found about 17,000 clusters (groups of names that refer to the same person or organization), revealing the true scope of financial transactions amounting to $250 million, which would otherwise have been difficult to understand.
Its effectiveness lies in matching entities across multiple fields rather than on name alone, which produces higher-quality matches.
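To make the idea of multi-field matching concrete, here is a minimal sketch in plain Python. It is not Zingg's API or algorithm: Zingg learns its matching model from labeled examples, while this toy version uses hand-picked field weights and a fixed threshold. The sample records, the field names, the weights, and the threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; the field names and values are assumptions.
RECORDS = [
    {"id": 1, "name": "New Belgium Brewing Co.", "address": "21 Craven St", "city": "Asheville"},
    {"id": 2, "name": "New Belgium Brewing Company", "address": "21 Craven Street", "city": "Asheville"},
    {"id": 3, "name": "Facebook", "address": "1 Hacker Way", "city": "Menlo Park"},
    {"id": 4, "name": "FACEBOOK INC", "address": "1 Hacker Way", "city": "Menlo Park"},
    {"id": 5, "name": "Belgian Waffle House", "address": "400 Main St", "city": "Raleigh"},
]

# Hand-picked weights and threshold (assumptions); a trained model would learn these.
WEIGHTS = {"name": 0.5, "address": 0.3, "city": 0.2}
THRESHOLD = 0.75


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_score(r1: dict, r2: dict) -> float:
    """Weighted similarity across several fields, not just the name."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


def cluster(records: list) -> list:
    """Group record ids into clusters by linking every pair that scores above the threshold."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare all pairs; fine for a toy example, far too slow for millions of rows.
    for i, r1 in enumerate(records):
        for r2 in records[i + 1:]:
            if record_score(r1, r2) >= THRESHOLD:
                parent[find(r1["id"])] = find(r2["id"])

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(cluster(RECORDS))
```

Comparing every pair of records is quadratic, so a sketch like this only works on small samples. Production entity resolution tools avoid that cost by first narrowing down which records are worth comparing.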
Steinmetz stated, "Our plan was to pull the digitally submitted records from the NCSBE website and transcribe scanned PDF reports."
All data files were stored in Google Cloud Storage, where Google Cloud Functions triggered the processing tasks.
The data was cleansed using a multi-step approach:
The project handles a database of about 50 million transactions. Not all transactions need processing, only those related to named entities. This reduces the data to about 500,000 unique organizations.
A transaction ID is used to map these organizations back to individual transactions, ensuring that each transaction can be accurately attributed to the correct entity.
For example, once "New Belgium Brewing Company" is selected as the master entity name, the system aggregates all transactions under this unified name, regardless of how the entity appeared initially in the raw data.
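The steps above can be sketched in plain Python: reduce the transactions to unique organization names, resolve those names, then use the transaction ID to attribute every transaction to its master entity and aggregate. This is an illustrative sketch, not the project's actual pipeline. The column names (`transaction_id`, `payee`, `amount`), the sample rows, and the `MASTER_NAMES` lookup that stands in for the entity resolution output are all assumptions.

```python
from collections import defaultdict

# Hypothetical raw transactions; the column names and amounts are assumptions.
TRANSACTIONS = [
    {"transaction_id": "t1", "payee": "New Belgium Brewing Co.", "amount": 500.0},
    {"transaction_id": "t2", "payee": "New Belgium Brewery PAC", "amount": 250.0},
    {"transaction_id": "t3", "payee": "New Belgium Brewing Co.", "amount": 100.0},
    {"transaction_id": "t4", "payee": "Facebook", "amount": 75.0},
    {"transaction_id": "t5", "payee": "", "amount": 20.0},  # no named entity, so it is skipped
]

# Stand-in for the entity resolution output: raw name -> master entity name.
MASTER_NAMES = {
    "New Belgium Brewing Co.": "New Belgium Brewing Company",
    "New Belgium Brewery PAC": "New Belgium Brewing Company",
    "Facebook": "Facebook",
}


def unique_organizations(transactions: list) -> dict:
    """Map each distinct raw name to the IDs of the transactions that use it."""
    orgs = defaultdict(list)
    for t in transactions:
        if t["payee"]:  # only transactions with a named entity are processed
            orgs[t["payee"]].append(t["transaction_id"])
    return dict(orgs)


def attribute(transactions: list, master_names: dict) -> dict:
    """Return transaction_id -> master entity name."""
    by_id = {}
    for raw_name, ids in unique_organizations(transactions).items():
        for tid in ids:
            by_id[tid] = master_names[raw_name]
    return by_id


def totals_by_entity(transactions: list, master_names: dict) -> dict:
    """Sum amounts under the master name, however the entity was spelled originally."""
    by_id = attribute(transactions, master_names)
    totals = defaultdict(float)
    for t in transactions:
        entity = by_id.get(t["transaction_id"])
        if entity is not None:
            totals[entity] += t["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_entity(TRANSACTIONS, MASTER_NAMES))
```

Resolving the deduplicated names instead of the raw transactions is what keeps the workload manageable: the expensive matching runs over the unique organizations, and the transaction ID carries the result back to every row.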
The cleaned data is then loaded into Snowflake using Snowpipe, and the analytic layers consume the Zingg results directly in Snowflake. Tableau Public and d3.js were used to build interactive dashboards.
Zingg entity resolution successfully resolves over 500,000 unique organizations from a dataset of over 50 million transactions. The platform now provides valuable insights into political spending in North Carolina.
Zingg successfully identifies multiple variations of the same entity.
The visualizations allow users to explore spending patterns and the relationships between political contributors and recipients.
The public dashboards include comparisons and trends in campaign funding, and users can filter the data by entity, committee, and spending category.
All Image Credits: Jimmy Steinmetz (CrossroadsCX)