Semantic Fingerprinting: Natural Language for the Finance Industry

The financial industry relies on language as much as numbers. Communicating with potential customers, marketing, and documentation are all activities that require words. A financial services company may need a way to find documents based on meaning, with fewer false positives. A bank may need to extract topics from data sources such as social media or email so that statistical analysis can be performed. An online lender may need to automatically categorize messages so that their intent can be determined and they can be routed to the correct department.

But current language processing is based on statistical modeling instead of natural language understanding, and that leaves financial businesses scrambling to make their systems fit typical uses.

Semantic Fingerprinting With Boolean Logic

“A whole bunch of problems are implied by this,” Francisco Webber, CEO of the company behind semantic fingerprinting, said. “The statistical processing model is hard to apply to a business environment. Solutions tend to be labor-intensive, expensive, and insecure. I ended up changing the approach completely by using a brain-based approach.”

The human brain is the reference representation for natural language processing, so by understanding how the brain does it, Webber can get insight into how computers can do it. He said it works better than traditional methods and takes less effort.

Webber called this representation of language according to the principles of the human cortex “a semantic fingerprint of text.” Because it uses Boolean logic, the system does not rely on additional machine learning programming. It can represent any text in any language as a semantic fingerprint to classify, filter, or search documents within a business context.

The Genesis of Semantic Fingerprinting

The team came out of an initial startup specializing in patent retrieval, which was sold to partners. Part of the team regrouped as a new company in 2012.

“We used experience to provide a better solution,” Webber said.

Research funding helped them create the first prototype. In 2013 they found an angel who brought the total funding to $6.5 million by investing in the prototype. Now, there is a version 2.0.

“To convert a word into a semantic fingerprint, we use a Boolean vector,” Webber said. “It uses 128 x 128 bits and 16K features. Our specific process adds to the distribution of features a topology that makes it comparable. For example, the word ‘organ’ has a semantic fingerprint that overlaps with both ‘piano’ and ‘liver’ because the word can be used in different contexts. Our engine is trained to discern which ‘organ’ is wanted based on a collection of reference materials.”
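As a rough illustration of the overlap idea Webber describes, a fingerprint can be modeled as the set of active positions in a 128 x 128 grid. This is only a sketch: the helper names (`to_index`, `overlap`) and every bit position below are made up for the example, whereas in the real system the positions come from training on reference material.

```python
GRID_SIDE = 128                     # per the article: a 128 x 128 grid
NUM_BITS = GRID_SIDE * GRID_SIDE    # 16,384 positions, the "16K features"


def to_index(row: int, col: int) -> int:
    """Flatten a (row, col) grid position into a single bit index."""
    if not (0 <= row < GRID_SIDE and 0 <= col < GRID_SIDE):
        raise ValueError("position outside the grid")
    return row * GRID_SIDE + col


def overlap(a: frozenset, b: frozenset) -> int:
    """Count the bit positions that are active in both fingerprints."""
    return len(a & b)


# Hypothetical regions of the grid standing in for two contexts.
MUSIC = frozenset(to_index(10, c) for c in range(20, 30))
ANATOMY = frozenset(to_index(90, c) for c in range(60, 70))

# Invented fingerprints: "organ" is active in both regions.
organ = MUSIC | ANATOMY
piano = MUSIC | frozenset({to_index(11, 25)})
liver = ANATOMY | frozenset({to_index(91, 65)})

print(overlap(organ, piano))  # 10
print(overlap(organ, liver))  # 10
print(overlap(piano, liver))  # 0
```

In this toy setup “organ” shares bits with both “piano” and “liver,” while “piano” and “liver” share none, which mirrors the ambiguity in the quote.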

How Semantic Fingerprinting Works

The system defines a semantic map that folds in every context and holds for every word that occurs within that context. The team selected about 400,000 Wikipedia pages as reference material so that every word in the collection gets a fingerprint. This enables them to convert any word into a fingerprint, then add together the fingerprints of all the words in a sentence to form a fingerprint of the sentence itself. Thus, they can compare words to sentences and paragraphs, since any fingerprint can be compared to any other fingerprint.
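One way to picture “adding the words together” is to count how often each bit is active across the word fingerprints and keep only the most frequent bits. The sketch below does that. The `max_bits` cutoff, the Jaccard similarity measure, and the tiny `lexicon` are all assumptions made for illustration, not details confirmed by the article.

```python
from collections import Counter


def text_fingerprint(word_fps, max_bits=8):
    """Combine word fingerprints by keeping the most frequently active bits.

    Ties are broken by bit index so the result is deterministic.
    """
    counts = Counter()
    for fp in word_fps:
        counts.update(fp)
    ranked = sorted(counts, key=lambda bit: (-counts[bit], bit))
    return frozenset(ranked[:max_bits])


def similarity(a, b):
    """Jaccard similarity: shared bits divided by all bits active in either."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical word fingerprints (sets of active bit indices).
lexicon = {
    "bank": frozenset({1, 2, 3, 4}),
    "loan": frozenset({3, 4, 5, 6}),
    "interest": frozenset({4, 5, 6, 7}),
    "guitar": frozenset({100, 101, 102, 103}),
}

sentence = ["bank", "loan", "interest"]
sentence_fp = text_fingerprint([lexicon[w] for w in sentence], max_bits=4)

print(sorted(sentence_fp))                         # [3, 4, 5, 6]
print(similarity(sentence_fp, lexicon["loan"]))    # 1.0
print(similarity(sentence_fp, lexicon["guitar"]))  # 0.0
```

Because the sentence fingerprint has the same form as a word fingerprint, the same `similarity` function compares a word to a sentence, or a sentence to a paragraph.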

“We can throw a fingerprint at the engine and look for all contexts where that fingerprint can occur,” Webber said. “So for the possible context of the word, ‘organ,’ the algorithm finds all the contexts in which it appears including ‘liver’ and ‘piano.’ The building blocks are all packaged into a library so we can scale to any problem. We cope with terabytes of data because the algorithm is perfectly parallel. It’s the fastest you can do on modern computers.”

This semantic fingerprint system is more efficient than other natural language processing systems. It has more features and is more fine-grained. Webber said this is the effect of trying to do things the same way a human brain does. This flexibility means that search engines can be easily tailored to a specific site with contract risk analysis, document search, topic detection, and classification of messages, for example.

“Filtering in social media space is a great demonstration of the value of semantic fingerprinting,” Webber said, “because we want a 360-degree view of the brand. We want every tweet relevant to smartphones. We want to process with an analytics packet. You get about 20,000 tweets per second with traditional methods. Matching every tweet with 200 keywords, you get 20 million comparisons per second. We can do 50,000 fingerprints per second and filter out interesting messages.”
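The filtering Webber describes can be sketched as one fingerprint comparison per message instead of one comparison per keyword. The example below stores each fingerprint as a Python integer bitmask so that a comparison is a single bitwise AND plus a bit count. The function names, the threshold, and the message fingerprints are all hypothetical. A real deployment would obtain the fingerprints from the engine.

```python
def to_mask(bits):
    """Pack a collection of active bit indices into one integer bitmask."""
    mask = 0
    for b in bits:
        mask |= 1 << b
    return mask


def overlap(a, b):
    """Number of bits set in both masks."""
    return bin(a & b).count("1")


def filter_stream(messages, topic_mask, min_overlap):
    """Yield the text of each message whose fingerprint overlaps the topic enough."""
    for text, mask in messages:
        if overlap(mask, topic_mask) >= min_overlap:
            yield text


# Hypothetical topic fingerprint for "smartphones".
topic = to_mask({1, 2, 3, 4, 5})

# Hypothetical (text, fingerprint) pairs standing in for incoming tweets.
messages = [
    ("new phone camera review", to_mask({2, 3, 4, 9})),
    ("pasta recipe", to_mask({40, 41, 42})),
    ("battery life on my handset", to_mask({1, 5, 7})),
]

matches = list(filter_stream(messages, topic, min_overlap=2))
print(matches)  # ['new phone camera review', 'battery life on my handset']
```

Neither matching message needs to contain the word “smartphone,” because the match is made on overlapping bits rather than on keywords.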

Semantic Fingerprinting is Useful For a Variety of Business Processes

Other uses for semantic fingerprinting are highly individualized news feeds and product recommendations. Since the filtering is inexpensive, it can be offered on a “per user” basis. HR departments could utilize the semantic fingerprint instead of manually editing resumes, even if matching an English job description with a French resume. In another example, customer support cases often solve similar problems. Traditional methods of matching are complex, but by adequately fingerprinting documents, all possible duplicate representations can be matched and new customer support cases can be resolved with complete references to any pertinent data.

“Most of our customers come from the financial space,” Webber said. “In something like email compliance monitoring, we can find out if a message needs to be inspected for fraud or regulated items. A big New York bank is a large customer for contract intelligence. On the west coast, a network manufacturer uses it for support functionality. Everything is rooted in the fact that we evolve text into semantic fingerprints and overlap measurements to derive business value.”

Webber said that competition has had a marketing effect on the company. IBM Watson is the leader in natural language processing and the company’s main competitor, but it uses a conventional approach to the problem.

“We get called by customers who say they have tried everything including building themselves with elastic search and deploying IBM Watson, but their results are not good enough. If you get 20,000 false positives per day, you need a big team to filter through them. These companies come to us, and a couple of weeks later, they realize our technology does high-precision, high-volume work with low false positives. We solve their problem.”

This semantic approach can be used with higher volumes of data and filter for meaning without keywords in any technology sector where a language model is used, such as speech-to-text.

“Everybody is talking about bots, but a bot doesn’t actually understand what people tell them,” Webber said. “In any application where it makes sense to understand what the text means, our technology is performing well.”

The technology is being used in finance, fintech, and insurtech, and by manufacturers of complex goods and consumer goods companies. The lean team of 15 people handles about 20 concurrent projects and can ramp up to more.

What the Future Holds for the Internet and for the Company

“Our main approach is to create a proof-of-concept,” Webber said, “in order to get people acquainted with the technology. In the future, we plan to have smaller packages for smaller companies who want to run as a service in the cloud. Our current focus is on enterprise deployments, at about $100K per node per year. For very large companies, we also offer a flat fee of $1.7 million.” The company’s projected return for this year is $5 million.

The semantic engine could be a component that integrates with any other documentation system, which gives it many potential applications. Startups and small companies could be given API access to the functionality without up-front investment or needing to know how it works, since it can be deployed on a pay-per-use basis. In the future, it may be implemented in hardware, allowing web-scale applications.

“Advertising is not the future of the internet,” Webber said. “The future is in real personalization, getting a system so smart you can see the world the way you want to see it instead of the way some statistical engine puts it in front of you.”

For any financial entity that uses language to communicate with customers and potential customers, semantic fingerprinting opens the door wider by efficiently simplifying processes so the business can flourish.


Written by Nicki Jacoby.

