library(ontologics)
library(dplyr, warn.conflicts = FALSE)
library(knitr) # provides kable(), used below to print tables
Any work with an ontology starts either by reading it in from an existing database or by creating a new ontology from scratch.
Although this package is still under development, we already provide functions that read in an ontology from an *.rds file (a format optimized for use within R) and write it to formats that are useful for triplestores or the semantic web. This vignette focuses on the basic building blocks for creating a new ontology. How to map new concepts from external ontologies, and how to export an ontology so that it is interoperable with the semantic web, is covered elsewhere in the package documentation.
# read in example ontology
crops <- load_ontology(path = system.file("extdata", "crops.rds", package = "ontologics"))

# ... has a pretty show-method
crops
#> sources : 1
#> -> 'harmonised' (73)
#>
#> classes : 3
#> ∟ group 20 Groups of crop or livestock commoditi...
#> ∟ class 53 Classes of crop or livestock commodi...
#> ∟ crop 0 Crop or livestock commodities
#>
#> top concepts: 73
#> -> group: 'CEREALS' (10), 'FRUIT' (8), 'VEGETABLES' (6), 'UNGULATES' (5), 'BIOENERGY CROPS' (4), ...
#> -> class: 'Bioenergy herbaceous' (20), 'Barley' (20), 'Fibre crops' (20), 'Flower herbs' (20), 'Grass crops' (20), ...
#> -> crop:
The onto class is an S4 class with the three slots @sources, @classes and @concepts, each of which is reflected by an entry in the show-method. Often the classes in an ontology have a hierarchical order, but this is not obligatory. In any case, the first three levels of the hierarchical structure are shown here, together with the number of concepts at each level and the description of that level. Moreover, the five most frequent concepts are shown together with a visual representation of the frequency distribution of all concepts at the first three levels.
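To make the slot structure more tangible, here is a minimal base-R sketch of an S4 class with three slots and a show-method. The class name toy_onto and its contents are hypothetical and only mimic the layout described above; this is not the definition used by ontologics.

```r
library(methods)

# Toy stand-in for an ontology object; NOT the ontologics definition.
setClass(
  "toy_onto",
  slots = c(sources = "data.frame", classes = "list", concepts = "list")
)

# A show-method that prints one summary line per slot.
setMethod("show", "toy_onto", function(object) {
  cat("sources :", nrow(object@sources), "\n")
  cat("classes :", sum(vapply(object@classes, nrow, integer(1))), "\n")
  cat("concepts:", sum(vapply(object@concepts, nrow, integer(1))), "\n")
})

toy <- new(
  "toy_onto",
  sources  = data.frame(id = 1L, label = "harmonised"),
  classes  = list(harmonised = data.frame(id = ".xx", label = "landcover")),
  concepts = list(harmonised = data.frame(id = c(".01", ".02"),
                                          label = c("Urban fabric", "Forests")))
)

toy                            # auto-printing calls the show-method
slotNames(toy)                 # names of the three slots
toy@concepts$harmonised$label  # slots are accessed with @
```

The tables inside the classes and concepts slots are kept in a list with one element per source, which is why later examples in this vignette access them as lulc@classes$harmonised.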
Each of the three main slots has a corresponding function for adding new items to it (new_source, new_class and new_concept), and an additional function creates mappings between your focal ontology and any external ontology (new_mappings). There is more detailed information about the architecture of the onto-class in the vignette Ontology database description.
A new ontology is built by calling the function start_ontology(). This requires some metadata that is stored in the ontology and that serves to properly link this ontology to other linked open data.
lulc <- start_ontology(name = "land_surface_properties",
                       version = "0.0.1",
                       path = tempdir(),
                       code = ".xx",
                       description = "showcase of the ontologics R-package",
                       homepage = "https://github.com/luckinet/ontologics",
                       license = "CC-BY-4.0")

lulc
#> sources : 1
#> -> 'harmonised' (0)
#>
#> classes : 0
#>
#> top concepts: 0
This information is stored in the @sources slot, just like any other external data source. It is recommended to always set the code used for building IDs so that it starts with a symbol that cannot be converted to a numeric/integer value. This avoids problems when the ontology is opened in a spreadsheet program that may perform such a conversion automatically, without asking or informing the author.
kable(lulc@sources)
id | label | version | date | description | homepage | license | notes |
---|---|---|---|---|---|---|---|
1 | harmonised | 0.0.1 | 2023-05-10 | showcase of the ontologics R-package | https://github.com/luckinet/ontologics | CC-BY-4.0 | |
Next, classes and their hierarchy need to be defined. Each concept is always a combination of a code, a label and a class. The code must be unique for each concept, but the label or the class can have the same value for two concepts. For instance, the concept football can have the class game or the class object and then mean two different things, despite having the same label.
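The football example can be written down as a small base-R table. The codes and classes below are made up for illustration and are not part of any ontology shipped with the package.

```r
# Hypothetical concepts: same label, different class, distinct codes.
concepts <- data.frame(
  code  = c(".01", ".02"),
  label = c("football", "football"),
  class = c("game", "object"),
  stringsAsFactors = FALSE
)

# Codes must be unique, whereas labels may repeat.
codes_unique  <- anyDuplicated(concepts$code) == 0
labels_repeat <- anyDuplicated(concepts$label) > 0

# The combination of label and class tells the two meanings apart.
meanings <- paste(concepts$label, concepts$class, sep = " / ")
meanings
```

Only the code identifies a concept on its own; label and class are needed together to disambiguate what a human reader sees.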
# currently it is only possible to set one class at a time
lulc <- new_class(
  new = "landcover",
  target = NA,
  description = "A good definition of landcover",
  ontology = lulc)

lulc <- new_class(
  new = "land-use",
  target = "landcover",
  description = "A good definition of land use",
  ontology = lulc)
# the class IDs are derived from the code that was previously specified
kable(lulc@classes$harmonised[, 1:6])
id | label | description | has_broader | has_close_match | has_narrower_match |
---|---|---|---|---|---|
.xx | landcover | A good definition of landcover | NA | NA | NA |
.xx.xx | land-use | A good definition of land use | landcover | NA | NA |
Then, new concepts that have these classes can be defined. In case classes are chosen that are not yet defined, you’ll get a warning.
lc <- c(
  "Urban fabric", "Industrial, commercial and transport units",
  "Mine, dump and construction sites", "Artificial, non-agricultural vegetated areas",
  "Temporary cropland", "Permanent cropland", "Heterogeneous agricultural areas",
  "Forests", "Other Wooded Areas", "Shrubland", "Herbaceous associations",
  "Heterogeneous semi-natural areas", "Open spaces with little or no vegetation",
  "Inland wetlands", "Marine wetlands", "Inland waters", "Marine waters"
)

lulc <- new_concept(
  new = lc,
  class = "landcover",
  ontology = lulc
)
kable(lulc@concepts$harmonised[, 1:5])
id | label | description | class | has_broader |
---|---|---|---|---|
.01 | Urban fabric | NA | landcover | NA |
.02 | Industrial, commercial and transport units | NA | landcover | NA |
.03 | Mine, dump and construction sites | NA | landcover | NA |
.04 | Artificial, non-agricultural vegetated areas | NA | landcover | NA |
.05 | Temporary cropland | NA | landcover | NA |
.06 | Permanent cropland | NA | landcover | NA |
.07 | Heterogeneous agricultural areas | NA | landcover | NA |
.08 | Forests | NA | landcover | NA |
.09 | Other Wooded Areas | NA | landcover | NA |
.10 | Shrubland | NA | landcover | NA |
.11 | Herbaceous associations | NA | landcover | NA |
.12 | Heterogeneous semi-natural areas | NA | landcover | NA |
.13 | Open spaces with little or no vegetation | NA | landcover | NA |
.14 | Inland wetlands | NA | landcover | NA |
.15 | Marine wetlands | NA | landcover | NA |
.16 | Inland waters | NA | landcover | NA |
.17 | Marine waters | NA | landcover | NA |
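The IDs in this table follow the pattern of the code ".xx" that was set in start_ontology(). As a rough illustration, the sketch below assumes that each "x" stands for one digit and that concepts are numbered consecutively. The helper make_ids() is hypothetical, and the package's internal logic may differ.

```r
# Sketch: build IDs from a code template such as ".xx".
# Assumption: every trailing "x" is a placeholder for one digit.
make_ids <- function(n, code = ".xx") {
  prefix <- sub("x+$", "", code)         # leading symbol(s), here "."
  width  <- nchar(code) - nchar(prefix)  # number of digits, here 2
  stopifnot(n < 10^width)                # all IDs must fit the template
  paste0(prefix, formatC(seq_len(n), width = width, flag = "0"))
}

ids <- make_ids(17)
head(ids, 3)
```

Zero-padding keeps all IDs at the same length, so they also sort correctly as text.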
An ontology differs from a vocabulary in that the concepts it contains are semantically related to one another. For example, concepts can be nested within other concepts. Hence, let's also create a second level of concepts that depend on the first level.
lu <- tibble(
  concept = c(
    "Fallow", "Herbaceous crops", "Temporary grazing",
    "Permanent grazing", "Shrub orchards", "Palm plantations",
    "Tree orchards", "Woody plantation", "Protective cover",
    "Agroforestry", "Mosaic of agricultural-uses",
    "Mosaic of agriculture and natural vegetation",
    "Undisturbed Forest", "Naturally Regenerating Forest",
    "Planted Forest", "Temporally Unstocked Forest"
  ),
  broader = c(
    rep(lc[5], 3), rep(lc[6], 6),
    rep(lc[7], 3), rep(lc[8], 4)
  )
)

lulc <- get_concept(label = lu$broader, ontology = lulc) %>%
  left_join(lu %>% select(label = broader), .) %>%
  new_concept(
    new = lu$concept,
    broader = .,
    class = "land-use",
    ontology = lulc
  )
#> Joining with `by = join_by(label)`
kable(lulc@concepts$harmonised[, 1:5])
id | label | description | class | has_broader |
---|---|---|---|---|
.01 | Urban fabric | NA | landcover | NA |
.02 | Industrial, commercial and transport units | NA | landcover | NA |
.03 | Mine, dump and construction sites | NA | landcover | NA |
.04 | Artificial, non-agricultural vegetated areas | NA | landcover | NA |
.05 | Temporary cropland | NA | landcover | NA |
.05.01 | Fallow | NA | land-use | .05 |
.05.02 | Herbaceous crops | NA | land-use | .05 |
.05.03 | Temporary grazing | NA | land-use | .05 |
.06 | Permanent cropland | NA | landcover | NA |
.06.01 | Permanent grazing | NA | land-use | .06 |
.06.02 | Shrub orchards | NA | land-use | .06 |
.06.03 | Palm plantations | NA | land-use | .06 |
.06.04 | Tree orchards | NA | land-use | .06 |
.06.05 | Woody plantation | NA | land-use | .06 |
.06.06 | Protective cover | NA | land-use | .06 |
.07 | Heterogeneous agricultural areas | NA | landcover | NA |
.07.01 | Agroforestry | NA | land-use | .07 |
.07.02 | Mosaic of agricultural-uses | NA | land-use | .07 |
.07.03 | Mosaic of agriculture and natural vegetation | NA | land-use | .07 |
.08 | Forests | NA | landcover | NA |
.08.01 | Undisturbed Forest | NA | land-use | .08 |
.08.02 | Naturally Regenerating Forest | NA | land-use | .08 |
.08.03 | Planted Forest | NA | land-use | .08 |
.08.04 | Temporally Unstocked Forest | NA | land-use | .08 |
.09 | Other Wooded Areas | NA | landcover | NA |
.10 | Shrubland | NA | landcover | NA |
.11 | Herbaceous associations | NA | landcover | NA |
.12 | Heterogeneous semi-natural areas | NA | landcover | NA |
.13 | Open spaces with little or no vegetation | NA | landcover | NA |
.14 | Inland wetlands | NA | landcover | NA |
.15 | Marine wetlands | NA | landcover | NA |
.16 | Inland waters | NA | landcover | NA |
.17 | Marine waters | NA | landcover | NA |
Here, get_concept() was used to extract the broader concepts into which the new level is nested. This ensures that a valid concept is provided, i.e., one that has already been included in the ontology.
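The idea of validating the broader concept before nesting can be sketched in base R. The helper nest_concepts() below is hypothetical and not part of ontologics. It looks up each parent by label, fails on unknown parents, and numbers the children within each parent so that the resulting IDs resemble those in the table above.

```r
# Toy top-level concepts (IDs and labels as in the table above).
top <- data.frame(
  id    = c(".05", ".06"),
  label = c("Temporary cropland", "Permanent cropland"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: nest new concepts under already known ones.
nest_concepts <- function(new, broader, known) {
  idx <- match(broader, known$label)
  if (anyNA(idx)) {
    stop("unknown broader concept(s): ",
         paste(unique(broader[is.na(idx)]), collapse = ", "))
  }
  parent <- known$id[idx]
  # running number of each child within its parent: 1, 2, ..., 1, ...
  child_no <- ave(seq_along(parent), parent, FUN = seq_along)
  data.frame(
    id          = paste0(parent, ".", formatC(child_no, width = 2, flag = "0")),
    label       = new,
    has_broader = parent,
    stringsAsFactors = FALSE
  )
}

nested <- nest_concepts(
  new     = c("Fallow", "Herbaceous crops", "Permanent grazing"),
  broader = c("Temporary cropland", "Temporary cropland", "Permanent cropland"),
  known   = top
)
nested
```

Failing early on an unknown parent avoids child concepts whose has_broader entry points to nothing.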