| Type: | Package | 
| Title: | Turn Clean Data into Messy Data | 
| Version: | 0.1.1 | 
| Description: | Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc. | 
| License: | MIT + file LICENSE | 
| Depends: | R (≥ 2.10) | 
| Imports: | assertthat, purrr, stringr | 
| Suggests: | charlatan, testthat (≥ 2.0.0), tibble, covr | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| URL: | https://github.com/mdlincoln/salty | 
| BugReports: | https://github.com/mdlincoln/salty/issues | 
| NeedsCompilation: | no | 
| Packaged: | 2024-08-31 04:04:06 UTC; mlincoln | 
| Author: | Matthew Lincoln  | 
| Maintainer: | Matthew Lincoln <matthew.d.lincoln@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-08-31 04:20:02 UTC | 
salty: Turn Clean Data into Messy Data
Description
Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
Author(s)
Maintainer: Matthew Lincoln <matthew.d.lincoln@gmail.com>
See Also
Useful links:
- https://github.com/mdlincoln/salty
- Report bugs at https://github.com/mdlincoln/salty/issues
Access the original source vector for a given shaker function
Description
Access the original source vector for a given shaker function
Usage
inspect_shaker(f)
Arguments
f | 
 A shaker function  | 
Value
A character vector
Examples
inspect_shaker(shaker$punctuation)
Sample a proportion of indices of a vector
Description
Sample a proportion of indices of a vector
Usage
p_indices(x, p)
Arguments
x | 
 A vector  | 
p | 
 A numeric probability between 0 and 1  | 
Value
An integer vector of indices.
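Sampling a proportion of indices can be sketched in base R. The helper name sample_proportion_indices and the decision to round the sample size are assumptions made for illustration; this is not the package's implementation of p_indices.

```r
# Hypothetical helper, not part of salty: pick a proportion p of the
# positions in x, without replacement.
sample_proportion_indices <- function(x, p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  # Assumption: the number of sampled indices is rounded to a whole number.
  n_selected <- round(length(x) * p)
  sample.int(length(x), size = n_selected)
}

set.seed(1)
idx <- sample_proportion_indices(letters, p = 0.5)
length(idx)  # 13, half of the 26 letters
```

The returned positions can then be used to subset or overwrite the chosen values, for example x[idx].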
Salt vectors with common data problems
Description
These are easy-to-use wrapper functions that call either salt_insert (for including new characters) or salt_replace (for salting that requires replacement of specific characters) with sane defaults.
Usage
salt_punctuation(x, p = 0.2, n = 1)
salt_letters(x, p = 0.2, n = 1)
salt_whitespace(x, p = 0.2, n = 1)
salt_digits(x, p = 0.2, n = 1)
salt_ocr(x, p = 0.2, rep_p = 0.1)
salt_capitalization(x, p = 0.1, rep_p = 0.1)
salt_decimal_commas(x, p = 0.1, rep_p = 0.1)
Arguments
x | 
 A vector. This will always be coerced to character during salting.  | 
p | 
 A number between 0 and 1. Proportion of values in x that should be salted.  | 
n | 
 A positive integer. Number of times to add new values to each selected value in x.  | 
rep_p | 
 A number between 0 and 1. Probability that a given match should be replaced in one of the selected values.  | 
Details
For more fine-grained control over how characters are added, see the documentation for salt_insert, salt_substitute, salt_replace, and salt_delete.
Functions
- salt_punctuation(): Punctuation characters
- salt_letters(): Upper- and lower-case letters
- salt_whitespace(): Spaces
- salt_digits(): 0-9
- salt_ocr(): Replace some substrings with common OCR problems
- salt_capitalization(): Flip capitalization of letters
- salt_decimal_commas(): Flip decimals to commas and vice versa
Delete some characters from some values
Description
Delete some characters from some values
Usage
salt_delete(x, p = 0.2, n = 1)
Arguments
x | 
 A vector. This will always be coerced to character during salting.  | 
p | 
 A number between 0 and 1. Proportion of values in x that should be salted.  | 
n | 
 A positive integer. Number of deletions to apply to each selected value in x.  | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
salt_delete(x, p = 0.5, n = 5)
salt_empty(x, p = 0.5)
salt_na(x, p = 0.5)
Insert new characters into some values in a vector
Description
Inserts a selection of characters into a percentage of values in the supplied vector.
Usage
salt_insert(x, insertions, p = 0.2, n = 1)
Arguments
x | 
 A vector. This will always be coerced to character during salting.  | 
insertions | 
 A shaker function, or a character vector.  | 
p | 
 A number between 0 and 1. Proportion of values in x that should be salted.  | 
n | 
 A positive integer. Number of times to add new values from insertions into each selected value in x.  | 
Value
A character vector the same length as x
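The way p and n interact can be sketched in base R: p chooses which values are touched, and n controls how many characters are added to each of them. The helper insert_chars below is hypothetical and is not the package's implementation of salt_insert; it only illustrates that insertion keeps every original character.

```r
# Hypothetical helper, not part of salty: insert n characters drawn from
# `insertions` at random positions in a single string.
insert_chars <- function(value, insertions, n = 1) {
  chars <- strsplit(value, "", fixed = TRUE)[[1]]
  for (i in seq_len(n)) {
    # Position 0 means "before the first character".
    pos <- sample.int(length(chars) + 1, size = 1) - 1
    chars <- append(chars, sample(insertions, size = 1), after = pos)
  }
  paste(chars, collapse = "")
}

set.seed(42)
salted <- insert_chars("salty", insertions = c("!", "?", "#"), n = 3)
nchar(salted)  # 8: the five original characters plus three insertions

# Apply the helper to half of the values in a vector (p = 0.5, n = 2).
x <- c("alpha", "beta", "gamma", "delta")
idx <- sample.int(length(x), size = round(length(x) * 0.5))
x[idx] <- vapply(x[idx], insert_chars, character(1),
                 insertions = "*", n = 2, USE.NAMES = FALSE)
```

Removing the inserted characters from salted gives back the original string, which is the property that separates insertion from substitution.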
Remove entire values from a vector
Description
Remove entire values from a vector
Usage
salt_na(x, p = 0.2)
salt_empty(x, p = 0.2)
Arguments
x | 
 A vector  | 
p | 
 A number between 0 and 1. Proportion of values to edit.  | 
Value
A vector the same length as x
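Blanking out a proportion of values can be sketched in base R. The helper blank_some_values and its blank argument are hypothetical names used for illustration, and rounding the number of edited values is an assumption rather than documented behaviour of salt_na or salt_empty.

```r
# Hypothetical helper, not part of salty: overwrite a proportion p of the
# values in x with a blank marker (NA by default, or "" for empty strings).
blank_some_values <- function(x, p, blank = NA) {
  idx <- sample.int(length(x), size = round(length(x) * p))
  x[idx] <- blank
  x
}

set.seed(7)
x <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
with_na <- blank_some_values(x, p = 0.3)
with_empty <- blank_some_values(x, p = 0.3, blank = "")

sum(is.na(with_na))    # 3 of the 10 values are now NA
sum(with_empty == "")  # 3 of the 10 values are now empty strings
```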
Replace certain patterns in some values in a vector
Description
Replaces matched patterns in some values of x. Pair salt_replace with the named vectors in replacement_shaker, or supply your own named vector of replacements. The convenience functions salt_ocr and salt_capitalization are light wrappers around salt_replace.
Usage
salt_replace(x, replacements, p = 0.1, rep_p = 0.5)
Arguments
x | 
 A vector. This will always be coerced to character during salting.  | 
replacements | 
 A replacement_shaker function, or a named character vector of patterns and replacements.  | 
p | 
 A number between 0 and 1. Proportion of values in x that should be salted.  | 
rep_p | 
 A number between 0 and 1. Probability that a given match should be replaced in one of the selected values.  | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
salt_replace(x, replacement_shaker$capitalization, p = 0.5, rep_p = 0.2)
salt_ocr(x, p = 1, rep_p = 0.5)
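The role of rep_p can be sketched in base R: once a value has been selected, each match of a pattern is replaced independently with probability rep_p. The helper replace_some_matches is hypothetical and handles one literal pattern in one string; it is not the package's implementation of salt_replace.

```r
# Hypothetical helper, not part of salty: replace each literal match of
# `pattern` in a single string with probability rep_p.
replace_some_matches <- function(value, pattern, replacement, rep_p) {
  m <- gregexpr(pattern, value, fixed = TRUE)
  found <- regmatches(value, m)[[1]]
  if (length(found) == 0) {
    return(value)  # nothing matched, so nothing can be replaced
  }
  # Decide independently for every match whether it gets replaced.
  hit <- runif(length(found)) < rep_p
  found[hit] <- replacement
  regmatches(value, m) <- list(found)
  value
}

replace_some_matches("banana", "a", "4", rep_p = 1)  # "b4n4n4"
replace_some_matches("banana", "a", "4", rep_p = 0)  # "banana"
```

Values of rep_p between 0 and 1 leave a random mix of replaced and untouched matches, which is what makes the result look like sporadic OCR or capitalization errors.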
Substitute certain characters in a vector
Description
Substitute certain characters in a vector
Usage
salt_substitute(x, substitutions, p = 0.2, n = 1)
Arguments
x | 
 A vector. This will always be coerced to character during salting.  | 
substitutions | 
 Values to be substituted into x.  | 
p | 
 A number between 0 and 1. Proportion of values in x that should be salted.  | 
n | 
 A positive integer. Number of times to substitute new values from substitutions into each selected value in x.  | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
salt_substitute(x, shaker$digits, p = 0.5, n = 5)
Randomly swap out entire values in a vector
Description
Because swaps can be provided by either a character vector or a function
that returns a character vector, salt_swap can be fruitfully used in
conjunction with the charlatan package to intersperse real data with
simulated data.
Usage
salt_swap(x, swaps, p = 0.2)
Arguments
x | 
 A vector. This will always be coerced to character during salting.  | 
swaps | 
 Values to be swapped out  | 
p | 
 A number between 0 and 1. Proportion of values in x that should be salted.  | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
new_values <- c("foo", "bar", "baz")
salt_swap(x, swaps = new_values, p = 0.5)
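Accepting either a character vector or a generating function can be sketched in base R. The helper swap_some_values and the generator fake_names are hypothetical, and the convention that the function is called with the number of values needed is an assumption; the calling convention salt_swap actually uses is not stated here.

```r
# Hypothetical helper, not part of salty: overwrite a proportion p of the
# values in x with values taken from `swaps`.
swap_some_values <- function(x, swaps, p) {
  x <- as.character(x)
  idx <- sample.int(length(x), size = round(length(x) * p))
  new_values <- if (is.function(swaps)) {
    # Assumption: a generator takes the number of values to produce.
    swaps(length(idx))
  } else {
    sample(swaps, size = length(idx), replace = TRUE)
  }
  x[idx] <- new_values
  x
}

set.seed(3)
x <- c("Ann", "Bob", "Cy", "Dee")
fake_names <- function(n) paste0("person_", seq_len(n))
swapped <- swap_some_values(x, swaps = fake_names, p = 0.5)
sum(swapped != x)  # 2 of the 4 values were swapped
```

A generator from a data-simulation package could stand in for fake_names, provided it follows the same one-argument convention.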
salty: Turn Clean Data Into Messy Data
Description
Insert, delete, replace, and substitute bits of your data with messy values.
Details
Convenient wrappers such as salt_punctuation are provided for quick access
to this package's functionality with simple defaults. For more fine-grained
control, use one of the underlying salt_ functions:
- salt_insert will insert new characters into some of the values of x. All the original characters of the original values will be maintained.
- salt_substitute will substitute some characters in some of the values of x in place of some of the original characters.
- salt_replace will replace some characters in some of the values of x. Unlike salt_substitute, salt_replace does conditional replacement dependent on the original values of x, such as changing capitalization or simulating OCR errors based on certain character combinations.
- salt_delete will remove some characters in the values of x.
- salt_na and salt_empty will replace some values of x with NA or with empty strings.
- salt_swap replaces entire values of x with new strings.
Get a set of values to use in salt_ functions
Description
shaker contains various character sets to be added to your data using salt_insert and salt_substitute. replacement_shaker is for salt_replace, and contains pairlists that replace matched patterns in your data.
Usage
shaker
replacement_shaker
available_shakers()
Format
An object of class list of length 6.
An object of class list of length 3.
Value
A sampling function that will be called by salt_insert, salt_substitute, or salt_replace.
Examples
salt_insert(letters, shaker$punctuation)
available_shakers()
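A sampling function that remembers its source vector can be sketched with a closure in base R. The names make_shaker and inspect_source are hypothetical, and storing the source vector in the function's enclosing environment is an assumption for illustration; how shaker and inspect_shaker are actually built is not documented here.

```r
# Hypothetical factory, not part of salty: returns a function that samples
# n values, with replacement, from a fixed source vector.
make_shaker <- function(values) {
  force(values)  # evaluate now so the closure keeps its own copy
  function(n) sample(values, size = n, replace = TRUE)
}

# Hypothetical accessor: read the source vector back out of the closure.
inspect_source <- function(f) environment(f)$values

vowels <- make_shaker(c("a", "e", "i", "o", "u"))
set.seed(11)
drawn <- vowels(4)       # four vowels, repeats allowed
inspect_source(vowels)   # "a" "e" "i" "o" "u"
```

Passing functions rather than raw vectors lets a salting routine ask for exactly as many characters as it needs on each call.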