charlatan is a wee bit complex. This vignette aims to
help you contribute to the package. For a general introduction to
contributing to rOpenSci packages, see our Contributing
guide.
Let’s start with some definitions.
For the purposes of this package:
Provider: a type of data that can be generated with charlatan. For example, we have providers for phone
numbers, addresses and people’s names. Adding a provider may involve a
single file or more than one file, and a single R6 class or many R6
classes.
Locale: a language and region combination, identified by a locale code (e.g., en-US,
en-GB). Some fakers won’t have any locales, whereas others
can have many.
If you aren’t familiar with R6, have a look at the R6 website, in particular the introductory vignette.
Open an issue if you want to add a new provider or locale to an existing provider; it helps make sure there’s no duplicated effort and we can help make sure you have the knowledge you need.
Providers are generally first created by making an R6 class. Let’s
start with a heavily simplified base R6 class that defines some utility
methods. We call it BaseProvider in charlatan,
but here we’ll call it MyBaseProvider to avoid
confusion.
library(R6)
MyBaseProvider <- R6::R6Class(
  'MyBaseProvider',
  public = list(
    random_element = function(x) {
      if (length(x) == 0) return('')
      if (inherits(x, "character")) if (!any(nzchar(x))) return('')
      x[sample.int(n = length(x), size = 1)]
    },
    random_int = function(min = 0, max = 9999, size = 1) {
      stopifnot(max >= min)
      num <- max - min + 1
      sample.int(n = num, size = size, replace = TRUE) + (min - 1)
    }
  )
)
If you don’t need to handle locales it becomes simpler:
FooBar <- R6::R6Class(
  'FooBar',
  inherit = charlatan::BaseProvider,
  public = list(
    integer = function(n = 1, min = 1, max = 1000) {
      super$random_int(min, max, n)
    }
  )
)
We can create an instance of the FooBar class by calling
$new() on it. It defines only one method of its own,
integer(), which we can call to get a random integer; the other methods shown below are inherited from BaseProvider.
x <- FooBar$new()
x
#> <FooBar>
#>   Inherits from: <BaseProvider>
#>   Public:
#>     bothify: function (text = "## ??") 
#>     check_locale: function (x) 
#>     clone: function (deep = FALSE) 
#>     integer: function (n = 1, min = 1, max = 1000) 
#>     lexify: function (text = "????") 
#>     numerify: function (text = "###") 
#>     random_digit: function () 
#>     random_digit_not_zero: function () 
#>     random_digit_not_zero_or_empty: function () 
#>     random_digit_or_empty: function () 
#>     random_element: function (x) 
#>     random_element_prob: function (x) 
#>     random_int: function (min = 0, max = 9999, size = 1) 
#>     random_letter: function () 
#>     randomize_nb_elements: function (number = 10, le = FALSE, ge = FALSE, min = NULL, max = NULL)
x$integer()
#> [1] 40
If your provider needs to handle different locales, it gets a bit more complex. In the Python library faker, from which this package draws inspiration, you can create separate folders for each provider within the library.
However, R package structure doesn’t allow this, so instead we encode the locale for each provider in the file names. For example, for the address provider we have files in the package:
The latter two provide the data for each locale, and the
first file has the AddressProvider class that pulls in the
locale-specific data.
Here, we’ll create a very simplified AddressProvider
class using an example locale file.
library(charlatan)
file <- system.file("examples", "address-provider-en_US.R", package = "charlatan")
source(file)
MyAddressProvider <- R6::R6Class(
  'MyAddressProvider',
  inherit = MyBaseProvider,
  lock_objects = FALSE,
  public = list(
    locale = NULL,
    city_suffixes = NULL,
    initialize = function() {
      self$locale <- 'en_us'
      self$city_suffixes <-
        eval(parse(text = paste0("city_suffixes_", self$locale)))
    },
    city_suffix = function() {
      super$random_element(self$city_suffixes)
    }
  )
)
We can create an instance of the MyAddressProvider class
by calling $new() on it. Apart from initialize(), it defines only one method of its own,
city_suffix(), which we can call to get a random city
suffix.
x <- MyAddressProvider$new()
x
#> <MyAddressProvider>
#>   Inherits from: <MyBaseProvider>
#>   Public:
#>     city_suffix: function () 
#>     city_suffixes: town ton land ville berg burgh borough bury view port mo ...
#>     clone: function (deep = FALSE) 
#>     initialize: function () 
#>     locale: en_us
#>     random_element: function (x) 
#>     random_int: function (min = 0, max = 9999, size = 1)
x$city_suffix()
#> [1] "bury"
When you want to add a new locale to an existing provider, look in
the R/ folder of the package; the locales that are already
available appear in the file names.
Pick one of the locale files for the provider you’re extending, make a duplicate of it and rename the file with your new locale. Then modify the duplicate, copying the format but putting in place the appropriate information for the new locale.
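As a rough illustration of the duplicate-and-modify step, the sketch below shows what the contents of a new locale file might look like. The file name address-provider-xx_XX.R, the placeholder locale code xx_XX, the values, and the helper is_valid_locale_vector() are all hypothetical; only the naming pattern (data type, underscore, locale) is taken from the examples in this vignette.

```r
# Hypothetical contents of a new locale file, e.g. R/address-provider-xx_XX.R.
# "xx_XX" is a placeholder locale code and the values are illustrative only.
city_suffixes_xx_XX <- c("stad", "dorp", "burg")

# Hypothetical helper (not part of charlatan): checks that a locale data
# object is a non-empty character vector without missing or empty entries.
is_valid_locale_vector <- function(x) {
  is.character(x) && length(x) > 0 && !anyNA(x) && all(nzchar(x))
}

is_valid_locale_vector(city_suffixes_xx_XX)
```

A check like this is only a convenience while editing the duplicate; the format of the file you copied from remains the reference for what each object should look like.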
Where the data comes from for the new locale may vary. One easy way
to start may be porting over locales in the faker Python library that are
not yet in charlatan.
If it’s a locale that you can’t easily port over from another library, you’ll need to gather the data from a variety of sources. There are some R-based packages that should help:
When using data from any source, check its license, if any, and whether that license allows the data to be used in this package.
How locale-specific data gets loaded is a little tricky. In the
initialize() block of each main provider file (e.g.,
address-provider.R) we pull in the appropriate
locale-specific data based on the locale supplied by the user. For example, here’s an
abbreviated initialize block from the
AddressProvider:
initialize = function(locale = NULL) {
  if (!is.null(locale)) {
    # check global locales
    super$check_locale(locale)
    # check address provider locales
    check_locale_(locale, address_provider_locales)
    self$locale <- locale
  } else {
    self$locale <- 'en_US'
  }
  self$city_prefixes <- parse_eval("city_prefixes_", self$locale)
}
A few things to note:
The locale defaults to en_US when none is supplied.
We use parse_eval() to pull in the data. Essentially,
parse_eval() makes the string
city_prefixes_en_US, then finds that in the package
environment and eval()’s it to bring the data into the R6
object in the city_prefixes slot. We repeat this for each
data type. The result is an object initialized with the
locale-specific data the user asked for.
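The lookup described above can be sketched in a few lines of base R. This is a minimal sketch, not charlatan’s actual implementation: my_parse_eval() and my_check_locale() are hypothetical stand-ins for parse_eval() and check_locale_(), and the data vectors and my_locales are made-up examples.

```r
# Made-up locale data; in the package, vectors like these live in locale files.
city_prefixes_en_US <- c("North", "East", "West", "South")
city_prefixes_en_GB <- c("Great", "Little", "Upper", "Lower")

# Hypothetical list of locales supported by this provider.
my_locales <- c("en_US", "en_GB")

# Hypothetical stand-in for the provider-level locale check:
# stop with an informative error if the locale is not supported.
my_check_locale <- function(locale, available) {
  if (!(locale %in% available)) {
    stop("locale '", locale, "' not supported; choose one of: ",
         paste(available, collapse = ", "), call. = FALSE)
  }
  invisible(locale)
}

# Hypothetical stand-in for parse_eval(): build the object name from a
# prefix and a locale, then parse and evaluate it in the caller's environment.
my_parse_eval <- function(prefix, locale, envir = parent.frame()) {
  eval(parse(text = paste0(prefix, locale)), envir = envir)
}

my_check_locale("en_GB", my_locales)
selected <- my_parse_eval("city_prefixes_", "en_GB")
selected
```

Calling my_check_locale() with an unsupported code raises an error before any lookup is attempted, which mirrors the order of operations in the initialize block above.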