In TheyBuyForYou we have been working on a layered architecture of data services, ontologies, core APIs and tools that allows different levels of access and use of our procurement knowledge graph.
A layer-based architecture separates the services so that most interaction occurs only between adjacent layers, and a change of technology in one service does not affect the rest. As shown in the figure "Tech Stack", five main layers have been defined: data, schemas, tools, core APIs and added-value services. Each is explained below.
Tech Stack
High-level Architecture
Data
This bottom layer contains the data that feeds both the knowledge graph and the document database. The knowledge graph data is obtained from the OpenOpps and OpenCorporates datasets and is transformed into RDF format by the data ingestion tool.
TBFY Knowledge Graph (KG)
Document repository
Database that contains the set of legal documents indexed by Harvester.
Schemas
This layer contains the vocabularies of our domain. These vocabularies act as intermediaries that allow tools such as the SPARQL GUI or R4R to interpret the knowledge graph.
TBFY ontology
The TBFY ontology imports the OCDS ontology (for procurement data) and the euBusinessGraph ontology (for company data). In addition, it contains a few extensions in order to represent additional meta information needed for the TBFY KG.
euBusinessGraph ontology
Tools
This layer contains the tools built or used to create the knowledge graph and provide access to it. We distinguish between tools created specifically for the project (internal) and tools that were not developed for this project but are used in it (external). The tools range from those that feed the databases to those that query the TheyBuyForYou SPARQL endpoint.
Internal
Harvester
Harvester downloads articles and legal documents from public procurement sources (OpenOpps, JRC-Acquis or TED) and indexes them into SOLR to allow performing complex queries and visualising results through Banana.
R4R
R4R allows building and deploying RESTful services from SPARQL queries. The core API uses it to browse the TBFY knowledge graph.
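The general idea of exposing SPARQL queries as REST resources can be sketched as follows. This is a minimal illustration and not R4R's actual mechanism or configuration format: the route table, the `resolve` function and the `http://example.org/tbfy/` namespace are all hypothetical names introduced for the example.

```python
import re
from string import Template
from typing import Optional

# Hypothetical route table: REST path pattern -> SPARQL query template.
# The namespace below is a placeholder, not the real TBFY KG namespace.
ROUTES = {
    "/tender/{id}": Template(
        "SELECT ?p ?o WHERE { <http://example.org/tbfy/tender/$id> ?p ?o }"
    ),
}


def resolve(path: str) -> Optional[str]:
    """Return the SPARQL query for a REST path, or None if no route matches."""
    for pattern, template in ROUTES.items():
        # Turn "{name}" placeholders into named groups restricted to a safe
        # character set, so path values cannot inject SPARQL syntax.
        regex = "^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[A-Za-z0-9_.-]+)", pattern) + "$"
        match = re.match(regex, path)
        if match:
            return template.substitute(match.groupdict())
    return None


print(resolve("/tender/ocds-123"))
```

A real service would then send the resolved query to the SPARQL endpoint and render the bindings as JSON; that step is omitted here.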
KG data ingestion pipeline
The data ingestion pipeline downloads OCDS releases and reconciled supplier-company records, both in JSON format, enriches the data, transforms it to RDF (using RML), and publishes it to the TBFY KG database.
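The JSON-to-RDF step can be illustrated with a small hand-written mapping. Note that the actual pipeline uses RML mappings; the code below only sketches the kind of transformation involved. The namespaces, the property names (`tenderTitle`, `hasSupplier`) and the function names are assumptions made for the example, while `ocid`, `tender.title`, `parties`, `roles` and `id` are standard OCDS release fields.

```python
from typing import List
from urllib.parse import quote

# Hypothetical namespaces for illustration; the TBFY KG defines its own URIs.
BASE = "http://example.org/tbfy/"
OCDS = "http://example.org/ocds#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Serialise a string as an N-Triples literal with the required escapes."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def release_to_ntriples(release: dict) -> List[str]:
    """Map a (simplified) OCDS release dict to a list of N-Triples lines."""
    subject = "<" + BASE + "release/" + quote(release["ocid"], safe="") + ">"
    triples = [f"{subject} <{RDF_TYPE}> <{OCDS}Release> ."]
    title = release.get("tender", {}).get("title")
    if title:
        triples.append(f"{subject} <{OCDS}tenderTitle> {literal(title)} .")
    # Link the release to every party that acts as a supplier.
    for party in release.get("parties", []):
        if "supplier" in party.get("roles", []):
            org = "<" + BASE + "organisation/" + quote(party["id"], safe="") + ">"
            triples.append(f"{subject} <{OCDS}hasSupplier> {org} .")
    return triples


example = {
    "ocid": "ocds-abc-1",
    "tender": {"title": "Road maintenance"},
    "parties": [
        {"id": "GB-1", "roles": ["supplier"]},
        {"id": "GB-2", "roles": ["buyer"]},
    ],
}
for line in release_to_ntriples(example):
    print(line)
```

A declarative mapping language such as RML keeps rules like these outside the code, which makes them easier to maintain as the source schema evolves.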
External
SPARQL GUI for TBFY KG
The SPARQL GUI uses YASGUI (Yet Another SPARQL GUI), a web application for querying any SPARQL endpoint.
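The same kind of query that the GUI issues can also be sent programmatically over the standard SPARQL 1.1 Protocol. The sketch below uses only the Python standard library; the endpoint URL is a placeholder (an assumption, not the real TBFY endpoint) and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Placeholder endpoint URL; replace it with the actual SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint: str, query: str) -> List[dict]:
    """Send the query and return the result bindings (performs network I/O)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_select(ENDPOINT, QUERY)` against a live endpoint returns one dictionary per result row, keyed by variable name (`s`, `p`, `o`).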
OptiqueVQS
Core APIs
This layer contains the set of core APIs built or used in the project. We distinguish between APIs created specifically for the project (internal) and APIs that were not developed for this project but are used in it (external). These core APIs expose the basic resources needed to extract information from the knowledge graph, from the document repository or from external data sources.
Internal
Knowledge graph API
Public procurement OCDS API
External
OpenCorporates companies API
OpenCorporates companies API provides access to data about 135 million companies from primary public sources.
OpenCorporates reconciliation API
OpenCorporates reconciliation API allows OpenRefine users to match company names to legal corporate entities and thereby obtain more information about those companies.
OpenOpps API
OpenOpps API provides access to tender and contract data from a range of European government bodies, formatted according to OCDS.
librAIry API
Wikifier Web Service
Wikifier takes a text document as input and annotates it with links to relevant Wikipedia concepts.
Spend Network’s Classification tool
The classification tool is an advanced classifier that adds multiple labels to procurement notices based on the Common Procurement Vocabulary (CPV). For each notice, the classifier assigns five scored Level 3 CPV codes based on the notice's text and description.
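The shape of this output, five labels each with a score, can be illustrated with a simple top-k selection. This is only a sketch of the final ranking step under the assumption that some model has already produced one score per candidate code; it says nothing about how the actual classifier computes its scores. The function name and the placeholder code labels are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_cpv_labels(scores: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (code, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Placeholder labels and made-up scores, for illustration only.
candidate_scores = {
    "code_a": 0.91,
    "code_b": 0.12,
    "code_c": 0.67,
    "code_d": 0.45,
    "code_e": 0.30,
    "code_f": 0.05,
    "code_g": 0.58,
}
for code, score in top_cpv_labels(candidate_scores):
    print(code, score)
```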
Added-value services
This top layer contains services and tools that go beyond the core APIs, offering extended features and add-ons built on top of the core functions.
API Gateway
Search API
Storytelling
Suppliers notebook
Organisation comparison notebook
Streamstory
Anomaly detection
Average payment period to suppliers
Average payment period to suppliers is an indicator that measures, in economic terms, the delay in the payment of commercial debts for entities associated with the Zaragoza city council.
COPIN (COmpra Pública INclusiva)
COPIN (COmpra Pública INclusiva) aims to provide a better understanding of how public administrations specify and evaluate public tenders.
Online KG data comparison tool
The tool provides, through a web interface, analysis of tender and award data extracted from the knowledge graph through the core API and the Search API.