
Hello everyone,

Apologies for the slight delay. As promised, I’m sharing more details about our process when working with anonymized datasets.

To ensure we deliver the most value to our customers while maintaining full data privacy, we’ve developed two separate processes. This distinction is based on the different needs of our customers. Some customers are interested in data related to a specific region or product category. For them, we rely solely on anonymized datasets, following the first process. The second process is designed for customers who bring their own CRM datasets and wish to target specific outlets. This approach allows for more tailored and strategic insights.

First process:

Our standard approach begins with you sharing one month of data covering all your active outlets, along with a corresponding list of anonymized outlets. We typically recommend splitting this into two separate datasets.

The first dataset includes transactional details such as product and category names, prices, costs, quantities, number of guests, timestamps, and the anonymized outlet_id2.

The second dataset provides approximate geographical and contextual information, such as country, city, truncated postal code (e.g. 2636XX), rounded latitude and longitude (e.g. 51.9244° N, 4.4777° E becomes 51.9° N, 4.5° E), venue type if available (e.g. Bakery, Café, Restaurant, Florist), and the same anonymized outlet_id2.

Splitting the data in this way avoids repeating outlet details within each transactional record. It also enables us to link anonymized outlets with their corresponding transactions through outlet_id2.

Once we receive the data, our system classifies the products found in the transactions. This classification helps us gain deeper insights into the types of venues we’re analyzing, the products they sell, and their consumption patterns. As a final result, we generate an enriched outlet list that is then presented on our platform and made available to our clients.
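To illustrate the location generalization described for the second dataset, here is a minimal Python sketch. It is not our production code: the `OutletContext` record, the function names, and the `keep=4` default are assumptions chosen to reproduce the examples above (a postal code masked to 2636XX, coordinates rounded to one decimal place). Note that conventional rounding turns 4.4777 into 4.5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutletContext:
    """One row of the contextual dataset (field names are illustrative)."""
    outlet_id2: str
    country: str
    city: str
    postal_code: str
    latitude: float
    longitude: float
    venue_type: Optional[str] = None


def truncate_postal_code(postal_code: str, keep: int = 4) -> str:
    """Keep the first `keep` characters and mask the remainder with 'X'."""
    compact = postal_code.replace(" ", "").upper()
    return compact[:keep] + "X" * max(len(compact) - keep, 0)


def generalize_outlet(outlet_id2, country, city, postal_code,
                      latitude, longitude, venue_type=None):
    """Build a contextual row with coarsened location fields."""
    return OutletContext(
        outlet_id2=outlet_id2,
        country=country,
        city=city,
        postal_code=truncate_postal_code(postal_code),
        latitude=round(latitude, 1),    # one decimal place
        longitude=round(longitude, 1),
        venue_type=venue_type,
    )


# Hypothetical outlet; "a1b2" stands in for an anonymized outlet_id2.
row = generalize_outlet("a1b2", "NL", "Rotterdam", "2636 AB",
                        51.9244, 4.4777, "Café")
print(row.postal_code, row.latitude, row.longitude)
```

The exact address and full coordinates never appear in the resulting row, so only the coarsened values travel with outlet_id2.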
Clients can then use our platform to create anonymized clusters, leveraging a wide range of filters such as detailed market segments (e.g. Hotels, Restaurants, and Cafés > Full-Service Restaurants), estimated annual sales, country, region, city, types of products sold, and more. All analyses are conducted on clusters of at least three outlets to ensure further anonymization.

Second process:

To enable this process, we’ll need you to provide an identifiable outlet list that includes the outlet name, exact address, and outlet_id1. This dataset will be used solely for accurate CRM matching between your outlets and our clients’ CRM datasets, such as those from Coca-Cola, Heineken, and Diageo.

To provide more clarity, here is an overview of how the CRM matching process is structured and how the resulting data is intended to be used:

Step 1: We receive the client’s CRM dataset, which typically includes fields such as:

  • google_pid (Google Place ID, tracked by the client)

  • crm_id (internal CRM identifier)

  • Address details (country, city, street address, postal code)

  • brand_status (e.g. customer, prospect)

  • brand_segmentation (e.g. gold, silver, bronze — based on client-specific criteria)

  • brand_channel (e.g. café, bar, restaurant)

  • Additional segmentation fields, depending on the client

Step 2: We then match the CRM dataset against the enriched outlet list that includes your outlet_id1. After the matching process, we generate a list of matched outlets containing outlet_id1, along with any relevant brand segmentation fields. This matched list is then shared with you.

Step 3: You can use outlet_id1 to link each outlet to its anonymized outlet_id2 in the transactional data. Once this mapping is done, the corresponding brand segmentation fields are returned to us, now associated with outlet_id2.

Throughout this process, only you have access to the mapping between outlet_id1 and outlet_id2, meaning you are the only party able to identify individual outlets. To maintain privacy and comply with data protection standards, this segmentation data is used solely to build anonymized clusters on our platform. Each segment must include at least three outlets to prevent reidentification through unique combinations of segmentation attributes.
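Step 3 and the three-outlet minimum can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the dictionary-based row format, and the sample identifiers are hypothetical, and the re-keying step is meant to run on your side, since only you hold the outlet_id1 to outlet_id2 mapping.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 3  # from the text: at least three outlets per segment


def rekey_matches(matched_rows, id_map):
    """Swap outlet_id1 for outlet_id2 in each matched row (provider side)."""
    rekeyed = []
    for row in matched_rows:
        outlet_id2 = id_map.get(row["outlet_id1"])
        if outlet_id2 is None:
            continue  # no known mapping: skip rather than guess
        out = {k: v for k, v in row.items() if k != "outlet_id1"}
        out["outlet_id2"] = outlet_id2
        rekeyed.append(out)
    return rekeyed


def publishable_segments(rows, segment_fields, min_size=MIN_CLUSTER_SIZE):
    """Group outlets by their combination of segment values and keep
    only the groups that contain at least `min_size` distinct outlets."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row.get(field) for field in segment_fields)
        groups[key].add(row["outlet_id2"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


# Hypothetical matched list, as it might be shared in Step 2.
matched = [
    {"outlet_id1": "A", "brand_status": "customer", "brand_segmentation": "gold"},
    {"outlet_id1": "B", "brand_status": "customer", "brand_segmentation": "gold"},
    {"outlet_id1": "C", "brand_status": "customer", "brand_segmentation": "gold"},
    {"outlet_id1": "D", "brand_status": "prospect", "brand_segmentation": "silver"},
]
id_map = {"A": "x1", "B": "x2", "C": "x3", "D": "x4"}  # held only by you

rekeyed = rekey_matches(matched, id_map)
segments = publishable_segments(rekeyed, ["brand_status", "brand_segmentation"])
print(segments)
```

In this example the ("prospect", "silver") combination covers a single outlet, so it is dropped, while the three "customer"/"gold" outlets form a usable segment.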
