Open source applications in Applied Generative AI and Language Processing
Real-world data is far from perfect. It often contains multiple records that belong to the same entity (e.g., a customer or a property). These records can come from different systems and vary across attributes, which makes them hard to combine, especially as data volumes grow. Unharmonized data is not fit for use in customer analytics or in risk and compliance, so data engineers and scientists often end up building rule- or heuristic-based systems to manage it.
This talk will cover entity resolution, also known as identity resolution, record linkage, deduplication, or fuzzy matching, including its needs and challenges. It will also introduce Zingg, an open source Python package for resolving entities at scale.
We will discuss the algorithms behind Zingg and how to use its Python API.
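To make the problem concrete, here is a minimal sketch of the general block-then-match idea behind entity resolution, written with only the Python standard library. It is not Zingg's API or algorithm. The sample records, the city-based blocking key, the `difflib` name similarity, and the 0.85 threshold are all illustrative assumptions. Hand-written rules like these are exactly the kind of heuristic system described above.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records: the same person appears with spelling and accent variations.
records = [
    {"id": 1, "name": "John Smith", "city": "New York"},
    {"id": 2, "name": "Jon Smith", "city": "New York"},
    {"id": 3, "name": "Maria Garcia", "city": "Austin"},
    {"id": 4, "name": "María García", "city": "Austin"},
    {"id": 5, "name": "Wei Chen", "city": "Seattle"},
]


def normalize(text: str) -> str:
    """Strip accents, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(recs, threshold=0.85):
    """Group record ids into clusters that (heuristically) refer to one entity."""
    parent = {r["id"]: r["id"] for r in recs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Blocking (assumed key: normalized city) limits comparisons to plausible pairs.
    blocks = defaultdict(list)
    for r in recs:
        blocks[normalize(r["city"])].append(r)

    # Pairwise matching inside each block only.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a["name"], b["name"]) >= threshold:
                union(a["id"], b["id"])

    # Collect transitive matches into one cluster per entity.
    clusters = defaultdict(list)
    for r in recs:
        clusters[find(r["id"])].append(r["id"])
    return sorted(clusters.values())


print(resolve(records))  # [[1, 2], [3, 4], [5]]
```

Blocking keeps the number of pairwise comparisons small, which matters as data volumes grow, and the union-find step merges chains of matches into a single cluster per entity.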
Join me at the fifth edition of the NLP Summit, the world’s largest Applied NLP, LLM, and Generative AI community (24-26 September)!