How India Loses 12 Billion Dollars Over a Misunderstanding

Data Sutram
4 min read · Jul 10, 2020


You might be thinking, ‘I could never lose that much money over a misunderstanding’. But if you are an E-Commerce company operating in India, you are likely losing about 15% of your revenue to this ‘misunderstanding’.

The Digital India Initiative, coupled with the ever-decreasing cost of internet access, has led to rapid growth in India's internet penetration rate.

[Source]

Nearly 100 million internet users were added between 2015 and 2020, and over 50% of current internet users prefer communicating in Hindi, Hinglish or their local dialect rather than English.

Phrases like “Hanuman mandir ke pass” (near the Hanuman temple) and “School ke peeche” (behind the school) are commonly used to describe locations in text messages and check-ins. So when these users place an order and need to enter their address, they write it the same way.

[Source]

Address-processing systems usually fail to interpret such phrases, so E-Commerce companies are unable to locate their customers. In 2019 alone, this ‘misunderstanding’ cost the E-Commerce industry approximately 12 billion dollars. [Source]

To make the ‘Digital India Initiative’ a true success, technology has to adapt to the way locals talk. In a country as diverse as India the problem is even harder, since over 720 local dialects are spoken among more than 135 crore (1.35 billion) people.

Addressing The Misunderstanding

DataSutram has been working on this problem and has built an NLP-based Address Wrapper that bridges the gap between local dialects and machine-interpretable addresses.

The Address Wrapper not only geocodes addresses written in a local dialect but also flags the fraudulent orders among them.

It does this in four layers:

Layer 1: Integrity Checks

The first layer runs basic integrity checks to make sure none of the input values are gibberish. For example, the integrity checks ensure that the:

  • Pincode has exactly 6 digits
  • Pincode corresponds to a location within India
  • City name is mentioned and is valid
  • House number is mentioned
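A minimal sketch of these checks, assuming a small hypothetical pincode-to-city table (KNOWN_PINCODES) in place of a full pincode directory, and a hypothetical AddressInput record for the form fields. It returns the list of failed checks rather than a single boolean, so the caller can tell the customer what to fix.

```python
import re
from dataclasses import dataclass

# Hypothetical reference data; a real system would load a complete
# Indian pincode directory instead of this toy sample.
KNOWN_PINCODES = {"110001": "New Delhi", "400001": "Mumbai", "560001": "Bengaluru"}
KNOWN_CITIES = {city.lower() for city in KNOWN_PINCODES.values()}


@dataclass
class AddressInput:
    """Hypothetical container for the fields a checkout form collects."""
    house_number: str
    address_text: str
    city: str
    pincode: str


def integrity_errors(addr: AddressInput) -> list[str]:
    """Return the failed checks; an empty list means the input passed."""
    errors = []
    pincode = addr.pincode.strip()
    if not re.fullmatch(r"\d{6}", pincode):
        errors.append("pincode must have exactly 6 digits")
    elif pincode not in KNOWN_PINCODES:
        errors.append("pincode not found in the Indian pincode list")
    city = addr.city.strip()
    if not city:
        errors.append("city is missing")
    elif city.lower() not in KNOWN_CITIES:
        errors.append("city name not recognised")
    if not addr.house_number.strip():
        errors.append("house number is missing")
    return errors


print(integrity_errors(AddressInput("148", "Mina Bazaar", "Mumbai", "40001")))
# ['pincode must have exactly 6 digits']
```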

Layer 2: Catchphrase Translation

A TF-IDF scoring algorithm identifies frequently used phrases that the system is unable to interpret, such as:

1. Hindi words spelled out in English letters: in “148, Mina Bazaar market building ke uss paar”, the customer means ‘opposite Mina Bazaar Market Building’, but the phrase ‘ke uss paar’ (roughly ‘across from’) prevents current systems from recognizing it as a valid address.

2. Spelling errors in English words: in “niyar hanuman temple”, ‘niyar’ has been used instead of ‘near’.
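The article does not say how the TF-IDF scores are aggregated, so the sketch below is one plausible reading under that assumption: score every token in every address, skip tokens that a hypothetical KNOWN_VOCAB says the geocoder already understands, and sum the scores across the corpus so that frequent, unrecognised words rise to the top. It ranks single words only; catching multi-word phrases such as ‘ke uss paar’ would need n-grams.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary of tokens the downstream geocoder already understands.
KNOWN_VOCAB = {"near", "opposite", "behind", "temple", "market", "building",
               "road", "school", "bazaar", "mina", "moti", "hanuman"}


def tokenize(address: str) -> list[str]:
    """Lowercase and keep alphabetic tokens only (house numbers are dropped)."""
    return re.findall(r"[a-z]+", address.lower())


def rank_unknown_terms(addresses: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Sum each unknown token's TF-IDF score over all addresses; return the top_k."""
    docs = [tokenize(a) for a in addresses]
    n_docs = len(docs)
    # Document frequency: in how many addresses each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    totals: Counter = Counter()
    for doc in docs:
        if not doc:
            continue
        tf = Counter(doc)
        for tok, count in tf.items():
            if tok in KNOWN_VOCAB:
                continue
            # Smoothed IDF, so even very common tokens keep a positive weight.
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1
            totals[tok] += (count / len(doc)) * idf
    return totals.most_common(top_k)


sample = [
    "148 Mina Bazaar market building ke uss paar",
    "niyar hanuman temple Moti Bazaar",
    "school ke peeche",
    "niyar school",
]
for term, score in rank_unknown_terms(sample, top_k=3):
    print(term, round(score, 2))
# niyar 1.06
# ke 0.72
# peeche 0.64
```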

These phrases are then mapped to their standard English equivalents.
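A minimal sketch of this translation step, assuming two hand-curated lookup tables: the hypothetical WORD_FIXES for misspellings and single transliterated words, and the hypothetical POSTPOSITIONS for Hindi phrases that follow their landmark. Because Hindi postpositions come after the noun they refer to, the sketch moves the English preposition to the front of each comma-separated segment.

```python
import re

# Hypothetical lookup tables, assumed to be curated from the mined phrases.
WORD_FIXES = {"niyar": "near", "mandir": "temple", "opp": "opposite"}
# Hindi postpositions that follow the landmark they refer to.
POSTPOSITIONS = {"ke uss paar": "opposite", "ke peeche": "behind", "ke pass": "near"}


def translate_segment(segment: str) -> str:
    """Fix individual words, then turn 'X ke uss paar' into 'opposite X'."""
    words = [WORD_FIXES.get(w.lower(), w) for w in segment.split()]
    text = " ".join(words)
    for hindi, english in POSTPOSITIONS.items():
        # Capture everything before the trailing postposition (the landmark).
        match = re.match(rf"^(.*?)\s+{re.escape(hindi)}$", text, flags=re.IGNORECASE)
        if match:
            return f"{english} {match.group(1)}"
    return text


def translate_address(address: str) -> str:
    """Translate each comma-separated segment independently."""
    return ", ".join(translate_segment(s) for s in address.split(",") if s.strip())


print(translate_address("148, Mina Bazaar market building ke uss paar"))
# 148, opposite Mina Bazaar market building
print(translate_address("Hanuman mandir ke pass"))
# near Hanuman temple
```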

Layer 3: Landmark Identification

Phrases such as ‘near’, ‘opposite’ and ‘close to’ are commonly used to refer to a landmark, so the landmarks in an address are identified by analysing the words before and after such phrases. For example, from “niyar hanuman temple Moti Bazaar”, ‘Hanuman temple’ is identified as a landmark.
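The cue-phrase idea can be sketched as follows, assuming the address has already been translated so that phrases like ‘ke uss paar’ appear as a preposition in front of the landmark; for that reason the sketch only looks forward from each cue. The CUES and LANDMARK_TYPES lists are hypothetical examples, not the module's actual vocabulary, and the landmark is taken as the words up to the last landmark-type noun within a small window.

```python
import re

# Hypothetical cue phrases and landmark-type nouns; a production system
# would likely learn these per region.
CUES = ("close to", "opposite", "near", "behind")
LANDMARK_TYPES = {"temple", "school", "building", "market",
                  "hospital", "mosque", "church", "station"}


def find_landmarks(address: str, window: int = 4) -> list[str]:
    """Return phrases that follow a cue and end in a landmark-type noun."""
    text = address.lower()
    cue_pattern = r"\b(" + "|".join(re.escape(c) for c in CUES) + r")\b"
    landmarks = []
    for match in re.finditer(cue_pattern, text):
        # Only look at words after the cue, within the same comma-separated segment.
        rest = text[match.end():].split(",")[0]
        following = re.findall(r"[a-z]+", rest)[:window]
        type_positions = [i for i, w in enumerate(following) if w in LANDMARK_TYPES]
        if type_positions:
            landmarks.append(" ".join(following[: type_positions[-1] + 1]))
    return landmarks


print(find_landmarks("near hanuman temple Moti Bazaar"))
# ['hanuman temple']
```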

Layer 4: Checking Address Deliverability

The identified landmark is geocoded to locate it on a map, and reverse geocoding of that location then gives its pincode. This pincode is cross-validated against the pincode the user entered to ensure that the address is genuine.

If the address is successfully verified, it is marked as a ‘deliverable address’.
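The cross-check can be sketched as below. The geocode and reverse_geocode callables are hypothetical placeholders for whatever mapping service is used, and the toy lookup tables and coordinates in the usage part are made up purely for illustration.

```python
from typing import Callable, Optional

# Hypothetical service signatures: (landmark, city) -> (lat, lon), and (lat, lon) -> pincode.
Geocoder = Callable[[str, str], Optional[tuple[float, float]]]
ReverseGeocoder = Callable[[tuple[float, float]], Optional[str]]


def is_deliverable(landmark: str, city: str, user_pincode: str,
                   geocode: Geocoder, reverse_geocode: ReverseGeocoder) -> bool:
    """Locate the landmark, look up its pincode, and compare with the user's pincode."""
    coords = geocode(landmark, city)
    if coords is None:
        return False  # the landmark could not be located, so it cannot be verified
    landmark_pincode = reverse_geocode(coords)
    return landmark_pincode is not None and landmark_pincode == user_pincode.strip()


# Toy stand-ins with made-up coordinates, for illustration only.
FAKE_PLACES = {("hanuman temple", "mumbai"): (19.0, 72.8)}
FAKE_PINCODES = {(19.0, 72.8): "400001"}


def fake_geocode(landmark: str, city: str) -> Optional[tuple[float, float]]:
    return FAKE_PLACES.get((landmark.lower(), city.lower()))


def fake_reverse(coords: tuple[float, float]) -> Optional[str]:
    return FAKE_PINCODES.get(coords)


print(is_deliverable("Hanuman temple", "Mumbai", "400001", fake_geocode, fake_reverse))  # True
print(is_deliverable("Hanuman temple", "Mumbai", "400050", fake_geocode, fake_reverse))  # False
```

Comparing exact pincodes is the strictest version of the check; a real deployment might also accept neighbouring pincodes when a landmark sits near a boundary.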

Impact

Currently, this module classifies 85% of user-input addresses as deliverable, and it is expected to improve as it is trained on additional data for each region. This makes sure that technology adapts to people and not the other way around, while also helping E-Commerce companies cut their costs.

Suppose you are an E-Commerce company. Today, around 35% of the addresses you receive are flagged as incomplete, which means a direct loss of opportunity.

In addition, around 10% of your orders are sent to wrong addresses and Returned to Origin (RTO), wasting the time and money spent on the attempted delivery.

DataSutram’s NLP-based Address Wrapper helps resolve these problems, letting you accept and deliver orders even when the address is typed out inaccurately. Deploying this model makes it possible to:

  • Deliver to 86% of the addresses previously deemed ‘incomplete’.
  • Cross-check user-input addresses against nearby landmarks to reduce RTO cases.
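To put these figures together: assuming the 35% incomplete rate and the 86% recovery rate quoted above apply to the same base of orders (the article does not state this explicitly), the share of orders recovered is simply their product, roughly 30 out of every 100 orders.

```python
def recovered_orders(total_orders: int, incomplete_rate: float = 0.35,
                     recovery_rate: float = 0.86) -> float:
    """Orders previously flagged incomplete that become deliverable (illustrative)."""
    return total_orders * incomplete_rate * recovery_rate


print(round(recovered_orders(1000)))  # 301
```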
