
Do-it-yourself NLP versus wit, LUIS, or api.ai

 


Alex and I have been building bots for about 1.5 years and have talked to hundreds of bot devs through our BotsBerlin meetup, which now has over 1,000 members. Something we get asked a lot is whether it’s worth investing in building your own NLP engine, or whether it makes sense to use a third party service like wit.ai, LUIS, or api.ai.

What does a chatbot’s NLP engine do?

Let’s say you’re building a restaurant bot. These tools will help you take a sentence typed by a human and turn it into structured data, for example:

 

[Figure: an NLP module turning a typed sentence into structured data]

 

Do you build yours or use third-party tools? Let us know in our DE Survey.

The structure on the right is something computers can actually work with, and you can pass this on to the business logic of your bot. For example, you would probably query the Foursquare API and fetch a list of restaurants. If there are some popular restaurants matching those constraints, you would probably suggest those to your user. If not, you might suggest a Chinese restaurant instead.
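As a rough sketch of this flow, the Python below hard-codes one parsed result and routes it through some simple business logic. The example sentence, the `intent` and `entities` field names, the `search_venues` helper, and the in-memory venue list are all illustrative assumptions. A real bot would receive the structure from its NLP engine and would call the Foursquare API where the stub is.

```python
from typing import Dict, List

# Hypothetical output of an NLP engine for the sentence
# "I'm looking for a Mexican place in the centre".
parsed = {
    "intent": "restaurant_search",
    "entities": {"cuisine": "mexican", "location": "centre"},
}

# Stand-in for a venue search API; the data is made up for illustration.
VENUES = [
    {"name": "Golden Dragon", "cuisine": "chinese", "location": "centre"},
    {"name": "Pasta Bar", "cuisine": "italian", "location": "north"},
]


def search_venues(cuisine: str, location: str) -> List[Dict[str, str]]:
    """Return venues matching both constraints (stub for a real API call)."""
    return [
        v for v in VENUES
        if v["cuisine"] == cuisine and v["location"] == location
    ]


def respond(parsed: Dict) -> str:
    """Business logic: suggest a match, or fall back to another cuisine."""
    if parsed["intent"] != "restaurant_search":
        return "Sorry, I can only help with restaurants."
    entities = parsed["entities"]
    matches = search_venues(entities["cuisine"], entities["location"])
    if matches:
        return "How about " + matches[0]["name"] + "?"
    # No match for the requested cuisine: try Chinese in the same area.
    fallback = search_venues("chinese", entities["location"])
    if fallback:
        return "Nothing matched, but how about " + fallback[0]["name"] + " instead?"
    return "Sorry, I couldn't find anything nearby."


print(respond(parsed))
```

The point of the structure is that `respond` never sees raw text: everything downstream of the NLP step works with plain dictionaries.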


Foursquare has already done the hard work of finding matching restaurants, so the trickiest part of building this MVP is finding a way to generate structured data from natural language. The great thing about tools like wit, LUIS, and api.ai is that they make this part so easy that you can build an MVP like the above in an afternoon. In our experience, 3rd party tools are an excellent way to build quick prototypes. You could just as quickly build a bot to find videos with the YouTube API, or products from Product Hunt.

Reasons to do it yourself

If your restaurant bot is a runaway success, you will inevitably want to become independent. We see that the more advanced bot teams are all developing their own NLP. Data from the Developer Economics surveys, which polled the opinions of thousands of developers interested in chatbots, are pointing towards a democratisation of chatbots through open source projects (there’s a live survey out now if you want to contribute to this knowledge pool).
Here are three real-life examples of why people switch.

API constraints

Databot was a Slack app we built at the start of 2016. It connected your data warehouse to Slack, so you could ask:

what was the ROI like for October’s facebook ads?

and databot would generate the corresponding SQL query and answer your question.

We started off using wit.ai, which would always default to guessing that October referred to the following October, not the previous one. So we had a lot of fun with our date library to build a workaround. Of course wit could add a feature to let you customise this default, but that’s missing the more general point. If you use an API, you have to live with someone else’s engineering decisions, and that friction tends to grow as your project matures.
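To make the date problem concrete, here is a minimal sketch of the kind of rule you might want to control yourself: resolving a bare month name to its most recent occurrence rather than the next one. This is not the workaround we actually shipped, and it does not use the wit.ai API. The function name `most_recent_month` and its `today` parameter are hypothetical, chosen for illustration.

```python
from datetime import date
from typing import Optional

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def most_recent_month(name: str, today: Optional[date] = None) -> date:
    """Return the first day of the latest `name` month that has already started.

    If the named month is the current month, the current month is returned.
    """
    today = today or date.today()
    month = MONTHS.index(name.lower()) + 1  # raises ValueError for unknown names
    # A month later in the calendar than the current one must refer to last year.
    year = today.year if month <= today.month else today.year - 1
    return date(year, month, 1)


# Asked in March 2017, "October" should mean October 2016.
print(most_recent_month("October", today=date(2017, 3, 1)))
```

For a reporting bot, looking backwards is almost always the right default, but a booking bot would want the opposite. That is exactly the sort of decision that is hard to make when it lives inside someone else's service.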

Data ownership

We talked to a startup building a commerce bot, specifically one which let you look for presents for friends and family and find good deals, e.g. “my sister likes running and craft coffee and I want to spend around $30”. For them, gathering the data around people’s purchasing intentions is core to the value of their business, and they want to make sure it belongs to them. Moreover, for privacy sensitive verticals like insurance, health, and banking, sending every message to a 3rd party is not an option: users and businesses just aren’t comfortable with it.

Performance

AdmitHub is an education startup. This team actually has one of the most technically advanced NLP modules I’ve seen: it can recognise thousands of intents. Their bot helps university students by updating them about events and deadlines, and can answer questions ranging from “when are housing applications due?” to “can I have a salamander in my dorm room?”.

AdmitHub found very quickly that third party tools weren’t up to this task (they tend to optimise for the small data use case, performing well even when a developer is getting started and there are only a few examples). Most also failed to handle misspelled words, which are common when chatting with teenagers. While simple bots are generalizable, sophisticated bots are all complicated in their own way. Every algorithm has trade-offs, and a one-size-fits-all approach can let you down when your use case becomes more advanced.
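As one illustration of the misspelling problem, the sketch below snaps each word of a message to the closest entry in a small domain vocabulary before any intent classification happens. This is not AdmitHub's approach, which we have not seen in detail. It is a simple fuzzy-matching step built on Python's standard `difflib` module, and the vocabulary, the `normalise` function, and the 0.8 cutoff are assumptions made for the example.

```python
import difflib
from typing import List

# Hypothetical domain vocabulary; a real bot would derive this from its training data.
VOCABULARY = ["housing", "application", "deadline", "dorm", "salamander", "tuition"]


def normalise(message: str, vocabulary: List[str] = VOCABULARY, cutoff: float = 0.8) -> str:
    """Replace each token with its closest vocabulary word, if one is similar enough."""
    corrected = []
    for token in message.lower().split():
        close = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        # Keep the original token when nothing in the vocabulary is close.
        corrected.append(close[0] if close else token)
    return " ".join(corrected)


print(normalise("when is the housng aplication dedline"))
```

The cutoff is itself a trade-off: set it too low and ordinary words get rewritten into domain terms, set it too high and real typos slip through. Owning this step means you can tune it for your own users.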

Bonus: Control your own fate

Ultimately, technological independence is compelling for many teams. It’s great to use free tools developed by big tech companies, but they may not stay free (Microsoft have started charging for LUIS) and they may disappear with little notice (like Parse did).

The rise of do-it-yourself NLP

{wit,LUIS,api}.ai are wonderful tools that make prototyping very quick. But from talking to dozens of bot teams, I’m convinced that everyone will eventually become independent. Early indications from the state of AI survey are that virtually all businesses are uncomfortable relying on APIs for their AI, and that doesn’t surprise me given the examples I’ve just talked about. The engineering case is that web APIs just aren’t the solution to every problem in programming. The business case is that you really want to own your data and be independent.

In 2017 we will see the bots that have traction moving away from 3rd party NLP services. The biggest drawback, until now, has been the engineering investment and machine learning talent required to build a custom NLP engine. It makes no sense for every bot team to reinvent the same things, so at LASTMILE we decided to open source ours. You can find out more at rasa.ai.

 

Are you involved in ML and/or AI? Take the Developer Economics Survey and shape the future of ML/AI development.