❗⚡ Content tagger

Monday, August 2, 2021

We're proud to announce the completion of two years of development on the final stage of the Jotup summarisation system: a high-accuracy universal text content tagger, and its associated AI backend.

This replaces the previous third-party tagging system, but retains an identical tag dictionary, enabling a seamless transition. At long last, the entirety of the Jotup summarisation system is in-house. One outcome of this is that we're no longer limited in how much content we're able to summarise per day or week (and our costs of doing so are drastically reduced).

Based on both machine and crowd learning, the system is aware of approximately 35 million different concepts and entities, and can reliably tag any text-based content with the relevant concepts within seconds.

The tagger is able to:

  • Identify both specific (e.g. technologies, people, other named entities) and abstract (e.g. industries, academic disciplines) concepts for any given text
  • Intelligently normalise every identified concept to its most commonly accepted term, chosen from multiple potential synonyms

The tagger is modular, so in addition to the broad-scope system, the following areas are given separate, highly accurate attention:

  • STEM
  • Biomed, incl. various pathologies (and, therefore, COVID-19, unlike many alternative tagging systems)
  • Geographic (place names) - these are then placed in the separate 'Geotags' field
  • Financial, incl. blockchain

More scope areas will be added in the future.

Tagging alone would be great. However, Jotup is all about information discovery, so it was also crucial that our content discovery system could match users and their manually entered interests with content tagged by AI. This was successfully achieved. The tagger normalises terms to their most commonly accepted form using crowd learning; likewise, when users enter their interests (in the form of comma-separated tags) into Newsfeeds, the same normalisation is applied to the user-entered tags. Both sides of the equation are therefore likely to end up with identical tags.
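As a rough illustration only (this is not Jotup's actual implementation), the idea of normalising both sides before matching can be sketched in Python. The synonym table, the function names and the sample tags below are all hypothetical; in the real system the mapping comes from crowd learning rather than a hand-written dictionary.

```python
# Hypothetical synonym table mapping variants to one canonical tag.
# Assumption: the real mapping is learned, not hand-written like this.
CANONICAL = {
    "ml": "machine learning",
    "machine learning": "machine learning",
    "covid": "covid-19",
    "coronavirus disease 2019": "covid-19",
    "covid-19": "covid-19",
}


def normalise(tag):
    """Lower-case, collapse whitespace, then map to the canonical term if known."""
    key = " ".join(tag.lower().split())
    return CANONICAL.get(key, key)


def parse_interests(raw):
    """Split a comma-separated interests string into a set of normalised tags."""
    return {normalise(part) for part in raw.split(",") if part.strip()}


def matching_tags(content_tags, raw_interests):
    """Return the normalised tags shared by a piece of content and a user's interests."""
    content = {normalise(tag) for tag in content_tags}
    return content & parse_interests(raw_interests)


if __name__ == "__main__":
    shared = matching_tags(["Machine Learning", "COVID-19"], "ML, covid , cooking")
    print(sorted(shared))
```

Because the same `normalise` step runs on both the AI-assigned tags and the user-entered interests, "ML" and "Machine Learning" resolve to the same tag and can be compared with a plain set intersection.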

Below are some comparisons between the old third-party tagging system ('Tags' field) and the new one ('Tags_testing' field):