Type: | Package |
Title: | Automates the Creation of New Statistical Analysis Projects |
Version: | 0.11.0 |
Date: | 2024-07-01 |
Description: | Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing. |
License: | GPL-3 | file LICENSE |
Language: | en-US |
LazyLoad: | yes |
Encoding: | UTF-8 |
Depends: | R (≥ 2.7), digest, tibble |
Imports: | methods |
Suggests: | foreign, feather, reshape, plyr, formatR, qs, stringr, ggplot2, lubridate, log4r (≥ 0.1-5), DBI, RMySQL, RSQLite, gdata, RODBC, RJDBC, readxl, xlsx, tuneR, pixmap, data.table, RPostgreSQL, GetoptLong, whisker, testthat (≥ 3.0.0), reticulate |
URL: | http://projecttemplate.net |
BugReports: | https://github.com/KentonWhite/ProjectTemplate/issues |
Collate: | 'ProjectTemplate-package.R' 'add.config.R' 'preinstalled.readers.R' 'add.extension.R' 'addins.R' 'arff.reader.R' 'get.project.R' 'cache.R' 'cache.name.R' 'cache.project.R' 'clean.variable.name.R' 'clear.R' 'clear.cache.R' 'translate.dcf.R' 'config.R' 'create.project.R' 'create.project.rstudio.R' 'create.template.R' 'csv.reader.R' 'csv2.reader.R' 'db.reader.R' 'dbf.reader.R' 'epiinfo.reader.R' 'feather.reader.R' 'file.reader.R' 'list.data.R' 'load.project.R' 'migrate.project.R' 'migrate.template.R' 'mp3.reader.R' 'mtp.reader.R' 'octave.reader.R' 'ppm.reader.R' 'project.config.R' 'r.reader.R' 'rdata.reader.R' 'rds.reader.R' 'reload.project.R' 'require.package.R' 'run.project.R' 'show.project.R' 'spss.reader.R' 'sql.reader.R' 'stata.reader.R' 'stopifnotproject.R' 'stub.tests.R' 'systat.reader.R' 'test.project.R' 'tsv.reader.R' 'url.reader.R' 'wsv.reader.R' 'xls.reader.R' 'xlsx.reader.R' 'xport.reader.R' |
RoxygenNote: | 7.3.1 |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-07-01 18:05:05 UTC; kwhite |
Author: | Aleksandar Blagotic [ctb], Diego Valle-Jones [ctb], Jeffrey Breen [ctb], Joakim Lundborg [ctb], John Myles White [aut, cph], Josh Bode [ctb], Kenton White [ctb, cre], Kirill Mueller [ctb], Matteo Redaelli [ctb], Noah Lorang [ctb], Patrick Schalk [ctb], Dominik Schneider [ctb], Gerold Hepp [ctb], Zunaira Jamil [ctb], Glen Falk [ctb] |
Maintainer: | Kenton White <jkentonwhite@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-07-01 18:30:06 UTC |
ProjectTemplate: Automates the Creation of New Statistical Analysis Projects
Description
Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.
Author(s)
Maintainer: Kenton White jkentonwhite@gmail.com [contributor]
Authors:
John Myles White [copyright holder]
Other contributors:
Aleksandar Blagotic [contributor]
Diego Valle-Jones [contributor]
Jeffrey Breen [contributor]
Joakim Lundborg [contributor]
Josh Bode [contributor]
Kirill Mueller [contributor]
Matteo Redaelli [contributor]
Noah Lorang [contributor]
Patrick Schalk [contributor]
Dominik Schneider [contributor]
Gerold Hepp [contributor]
Zunaira Jamil [contributor]
Glen Falk [contributor]
See Also
Useful links:
http://projecttemplate.net
Report bugs at https://github.com/KentonWhite/ProjectTemplate/issues
Associate a reader function with an extension.
Description
This function will associate an extension with a custom reader function.
Usage
.add.extension(extension, reader)
Arguments
extension |
The extension of the new data file. |
reader |
The function to use when reading the data file. It should accept three arguments: data.file, filename and variable.name. |
Value
No value is returned; this function is called for its side effects.
Warning
This interface should not be considered as stable and is likely to be replaced by a different mechanism in a forthcoming version of this package.
Examples
## Not run: .add.extension('foo', foo.reader)
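A minimal sketch of what a custom reader could look like, assuming the three-argument signature shared by the preinstalled readers (data.file, filename, variable.name). The '.foo' format and the body of foo.reader are hypothetical. The registration call stays commented out because it needs ProjectTemplate to be loaded.

```r
# Hypothetical reader for a made-up '.foo' format: one value per line.
foo.reader <- function(data.file, filename, variable.name) {
  # 'filename' is the path to read; 'variable.name' is the global binding to create.
  values <- readLines(filename)
  assign(variable.name, values, envir = globalenv())
}

# Try the reader on a temporary file, without ProjectTemplate.
path <- tempfile(fileext = ".foo")
writeLines(c("alpha", "beta"), path)
foo.reader(basename(path), path, "foo.demo")
print(foo.demo)

# Registration, as in the example above (not run here):
# .add.extension('foo', foo.reader)
```

Calling the reader directly like this is only a way to test it in isolation; inside a project the readers are not meant to be called by hand.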
Attach a package or add a namespace
Description
Internal method to attach a package or only add the namespace.
Usage
.attach.or.add.namespace(package.name, attach)
Arguments
package.name |
name of the package to load, as a character vector |
attach |
boolean indicating whether to attach the package in the global namespace |
Value
Boolean indicating whether the package was successfully loaded
Construct the file names for the cache and hash
Description
Construct the file names for the cache and hash
Usage
.cache.filename(variable, cache_format)
Arguments
variable |
Variable name for which to construct file names |
cache_format |
|
Details
The returned object is a list with two fields:
- data: The path to the file in which the variable contents will be saved;
- hash: The path to the file in which the cache metadata will be stored.
Value
A list with file names
Get configured cache file format strategy
Description
Get configured cache file format strategy
Usage
.cache.format()
Value
A named object of mode expression.
Calculate the hash of the data stored in a variable
Description
Calculate the hash of the data stored in a variable
Usage
.cache.hash(variables, env = .TargetEnv)
Arguments
variables |
character vector of variable names |
env |
environment from which to load the variable |
Details
The hashes are calculated using the digest::digest function.
Value
data.frame with the variable names and the corresponding hashes
Print the current cache status
Description
Print the current cache status
Usage
.cache.status()
Value
No value is returned; this function is called for its side effects.
List all cached variables
Description
List all variables for which files are available in the cache. The info is purely based on the files in the cache directory. There is no guarantee the variable can actually be loaded from the cache.
Usage
.cached.variables()
Value
Character vector of cached variables
Compare the project version with the current ProjectTemplate version
Description
Compare the project version with the current ProjectTemplate version
Usage
.check.version(config, warn.migrate = TRUE)
Arguments
config |
Project configuration |
warn.migrate |
Logical indicating whether a warning should be raised if the project version is older than the installed version of ProjectTemplate. |
Value
0 if the numbers are equal, -1 if b is later and 1 if a is later (analogous to the C function strcmp).
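The same 0 / -1 / 1 convention is used by utils::compareVersion in base R, so the return value can be illustrated without ProjectTemplate. The version strings below are made up for the example, and this is not necessarily how .check.version is implemented.

```r
# Illustrative version strings (assumptions, not read from a real project).
project.version   <- "0.9.2"
installed.version <- "0.11.0"

# compareVersion(a, b): 0 if equal, -1 if b is later, 1 if a is later.
# Components are compared numerically, so 9 < 11 here.
older <- utils::compareVersion(project.version, installed.version)
same  <- utils::compareVersion(installed.version, installed.version)
newer <- utils::compareVersion(installed.version, project.version)

print(c(older = older, same = same, newer = newer))
```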
Convert one or more data sets to data.tables
Description
Converts all base::data.frames referred to in the input to data.tables. The resulting data set is stored in the .TargetEnv.
Usage
.convert.to.data.table(data.sets)
Arguments
data.sets |
A character vector of variable names. |
Value
No value is returned; this function is called for its side effects.
Convert one or more data sets to tibbles
Description
Converts all base::data.frames referred to in the input to tibbles. The resulting data set is stored in the .TargetEnv.
Usage
.convert.to.tibble(data.sets)
Arguments
data.sets |
A character vector of variable names. |
Value
No value is returned; this function is called for its side effects.
Create a data.frame with the cache metadata
Description
Create a data.frame with the cache metadata
Usage
.create.cache.hash(variable, depends, CODE)
Arguments
variable |
Name of the variable to be cached |
depends |
Vector of variable names of dependencies for the variable to be cached, optional. |
CODE |
Code block to generate |
Details
The hashes for the various objects are calculated using the .cache.hash function.
Value
data.frame containing the variable name and its dependencies, with the corresponding hashes appended.
Create a project structure
Description
.create.project.existing creates a project directory structure inside an existing directory with the default files from a given template. .create.project.new first creates a new directory and then passes further control to .create.project.existing. In case the project creation fails, the newly created directory is cleaned up.
Usage
.create.project.existing(
  project.name,
  merge.strategy,
  template,
  rstudio.project
)
.create.project.new(project.name, template, rstudio.project)
Arguments
project.name |
Character vector with the name of the project directory |
merge.strategy |
Character vector determining whether the directory should be empty or is allowed to contain non-conflicting files |
template |
Name of the template from which the project should be created |
rstudio.project |
Logical indicating whether an |
Value
No value is returned; this function is called for its side effects.
See Also
create.project, create.template
Check if a directory is empty
Description
Checks if the directory listing by .list.files.and.dirs is empty.
Usage
.dir.empty(path)
Arguments
path |
Character vector containing the path to the directory to check. |
Value
Logical indicating whether the passed directory was empty.
Run code and assign the results to variable
Description
Run code and assign the results to variable
Usage
.evaluate.code(variable, CODE)
Arguments
variable |
variable name in which to store the result of |
CODE |
code block that returns a result which can be stored in a variable |
Details
No error handling is done on the executed code, nor is the
Get the location of a template from its name
Description
Checks the configured option('ProjectTemplate.templatedir') for the template. If no matching template was found the system templates are checked, and finally the current directory is checked. If no template was found with the given name an error is raised.
Usage
.get.template(template)
Arguments
template |
Character vector containing the name of the template |
Value
Character vector containing the location of the template. If no template was found by the given name an error is raised.
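The lookup order described above (configured directory first, later locations as fallbacks, error if nothing matches) can be sketched as a loop over candidate directories. The helper find_template and its search.dirs argument are hypothetical and only illustrate the ordering; they are not the package's internals.

```r
# Hypothetical helper: return the first directory in 'search.dirs'
# that contains a sub-directory named after the template.
find_template <- function(template, search.dirs) {
  for (dir in search.dirs) {
    candidate <- file.path(dir, template)
    if (dir.exists(candidate)) return(candidate)
  }
  stop("No template found with name '", template, "'")
}

# Two throwaway directories stand in for the configured template
# directory and a fallback location.
configured <- tempfile("configured-"); dir.create(configured)
fallback   <- tempfile("fallback-");   dir.create(fallback)
dir.create(file.path(configured, "report"))
dir.create(file.path(fallback, "report"))
dir.create(file.path(fallback, "slides"))

search.dirs <- c(configured, fallback)
found_first <- find_template("report", search.dirs)  # configured copy wins
found_later <- find_template("slides", search.dirs)  # only in the fallback
not_found   <- try(find_template("poster", search.dirs), silent = TRUE)
```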
Check if the project was loaded
Description
Currently does a very basic check to see if the variable project.info exists in the .TargetEnv. No check is performed on the contents of the variable.
Usage
.has.project()
Value
Logical indicating whether the project was loaded.
Initialize the logger for the project
Description
Creates a log4r::logger and provides a default log file log/project.log.
Usage
.init.logger(config, my.project.info)
Arguments
config |
Named list containing the project configuration |
my.project.info |
Named list containing the project information |
Value
Returns my.project.info amended with the new information.
Test whether a given path is a ProjectTemplate project
Description
Test whether a given path is a ProjectTemplate project
Usage
.is.ProjectTemplate(path = getwd())
Arguments
path |
Directory to check, defaults to the current working directory. |
Value
Logical indicating whether the given path is a valid project.
Check whether the cache is empty
Description
Check whether the cache is empty
Usage
.is.cache.empty()
Value
Logical indicating whether the cache is empty
Check whether variables are cached
Description
Check whether variables are cached
Usage
.is.cached(varnames)
Arguments
varnames |
Character vector of variable names |
Value
Logical vector indicating whether the variable is in the cache.
Check if path is an existing directory
Description
Checks if a given path exists, and if so if it is a directory.
Usage
.is.dir(path)
Arguments
path |
Character vector containing the path to the directory to check. |
Value
Logical indicating a valid directory was passed.
Build the list of data available for loading into memory
Description
This function produces a data.frame of all data files in the project, with meta data on if and how the file will be loaded by load.project.
Usage
.list.data(config)
Arguments
config |
List containing the configuration to use. |
Details
The returned data.frame contains the following variables, with one observation per file in data/:
filename | Character variable containing the filename relative to the data/ directory. |
varname | Character variable containing the name of the variable into which the file will be imported. * |
is_ignored | Logical variable that indicates whether the file is ignored through the data_ignore option in the configuration. |
is_directory | Logical variable that indicates whether the file is a directory. |
is_cached | Logical variable that indicates whether the file is already available in the cache/ directory. |
cached_only | Logical variable that indicates whether the variable is only available in the cache/ directory. This occurs when calling the cache function with a code fragment in a munge script. |
reader | Character variable containing the name of the reader function that will be used to load the data. Contains a character(0) if no suitable reader was found. |
* Note that some readers return more than one variable, usually with the listed variable name as prefix. This is true, for example, for the xls.reader and xlsx.reader.
Value
A data.frame listing the available data, with relevant meta data
List all files and directories, excluding .. and .
Description
Creates a directory listing of a given path, including hidden files and subdirectories, but excluding the .. and . aliases.
Usage
.list.files.and.dirs(path)
Arguments
path |
Character vector indicating the path to the parent folder of which the contents should be listed. |
Value
Directory listing of path
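A listing with these properties can be reproduced with base R's list.files, which has an all.files switch for hidden entries and a no.. switch that drops the . and .. aliases. The helper names list_all and dir_is_empty are hypothetical stand-ins for .list.files.and.dirs and .dir.empty, shown only as a sketch.

```r
# Hypothetical stand-ins for the two internal helpers.
list_all <- function(path) {
  # all.files = TRUE includes hidden entries; no.. = TRUE drops "." and "..".
  list.files(path, all.files = TRUE, no.. = TRUE)
}
dir_is_empty <- function(path) length(list_all(path)) == 0

demo.dir <- tempfile("pt-demo-")
dir.create(demo.dir)
empty_before <- dir_is_empty(demo.dir)

# Add a hidden file and a sub-directory, then list again.
file.create(file.path(demo.dir, ".hidden"))
dir.create(file.path(demo.dir, "sub"))
listing <- list_all(demo.dir)
empty_after <- dir_is_empty(demo.dir)
print(listing)

unlink(demo.dir, recursive = TRUE)
```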
Load the data from the cache and data directories
Description
Gets the list of available variables in cache/ and data/ and loads the data in memory. Data from the cache is loaded first, then in alphabetical order.
Usage
.load.data(config, my.project.info)
Arguments
config |
Named list containing the project configuration |
my.project.info |
Named list containing the project information |
Value
Returns my.project.info amended with the new information.
Load the helper functions
Description
Sources all helper scripts in lib. If lib/globals.R exists, it is loaded first; all other scripts are sourced in alphabetical order.
Usage
.load.helpers(config, my.project.info)
Arguments
config |
Named list containing the project configuration |
my.project.info |
Named list containing the project information |
Value
Returns my.project.info amended with the new information.
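The sourcing order for helper scripts (globals.R first, everything else alphabetically) can be sketched on plain file names. order_helpers is a hypothetical function used only to illustrate the ordering rule; it does not source anything.

```r
# Hypothetical helper: put globals.R first, keep the rest alphabetical.
order_helpers <- function(files) {
  files <- sort(files)
  c(files[files == "globals.R"], files[files != "globals.R"])
}

helper.order <- order_helpers(c("plots.R", "globals.R", "clean.R"))
print(helper.order)

# Without a globals.R the result is simply the sorted input.
no.globals <- order_helpers(c("plots.R", "clean.R"))
```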
Load the libraries listed in the configuration into memory
Description
Load the libraries listed in the libraries entry in global.dcf and add the library names to the project.info.
Usage
.load.libraries(config, my.project.info)
Arguments
config |
Named list containing the project configuration |
my.project.info |
Named list containing the project information |
Value
Returns my.project.info amended with the new information.
Source all munge scripts
Description
Sources all munge scripts in the munge directory in alphabetical order.
Usage
.munge.data(config, my.project.info)
Arguments
config |
Named list containing the project configuration |
my.project.info |
Named list containing the project information |
Value
Returns my.project.info amended with the new information.
Get the current ProjectTemplate version
Description
Reads the installed version of ProjectTemplate from the DESCRIPTION file.
Usage
.package.version()
Value
Version as a character vector.
Match readers to the extensions of the data files
Description
Match readers to the extensions of the data files
Usage
.parse.extensions(data.files, config)
Arguments
data.files |
a vector of paths to data files |
Value
A list of readers and varnames
Prepare a regular expression for matching files to be ignored
Description
Constructs a single regular expression for matching file names in data that should not be imported. It can detect literal names, globs with wildcards and regular expressions.
Usage
.prepare.data.ignore.regex(ignore_files)
Arguments
ignore_files |
A comma separated character vector that lists all patterns to be matched for ignoring |
Value
A chained regular expression that matches all patterns in the ignore_files variable.
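One way to build such a chained expression in base R is to convert each entry separately and join the results with |. The sketch below assumes that regular expressions are written between slashes (for example /^raw_[0-9]+/) and that everything else is a literal name or a glob; that convention and the helper build_ignore_regex are assumptions made for illustration, not a description of the package's code.

```r
# Hypothetical helper: turn "a.txt, *.tmp, /regex/" into one regular expression.
build_ignore_regex <- function(ignore_files) {
  patterns <- trimws(strsplit(ignore_files, ",", fixed = TRUE)[[1]])
  patterns <- patterns[nzchar(patterns)]

  # Assumed convention: entries wrapped in slashes are already regexes.
  is_regex <- grepl("^/.*/$", patterns)

  converted <- patterns
  converted[is_regex]  <- sub("^/(.*)/$", "\\1", patterns[is_regex])
  # Literal names and globs are translated by utils::glob2rx.
  converted[!is_regex] <- utils::glob2rx(patterns[!is_regex])

  paste0("(", converted, ")", collapse = "|")
}

ignore.rx <- build_ignore_regex("notes.txt, *.tmp, /^raw_[0-9]+/")
candidates <- c("notes.txt", "scratch.tmp", "raw_01.csv", "sales.csv")
ignored <- grepl(ignore.rx, candidates)
print(data.frame(file = candidates, ignored = ignored))
```

Wrapping each converted pattern in parentheses keeps the anchors of one entry from leaking into the next when they are joined.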
Make sure a required directory exists before usage
Description
Checks if the requested directory exists, and if not creates the directory. In the latter case a warning is raised.
Usage
.provide.directory(name)
Arguments
name |
Character vector containing the name of the required directory. |
Value
No value is returned; this function is called for its side effects.
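The check-then-create behaviour, including the warning when the directory has to be created, can be sketched in a few lines of base R. provide_directory is a hypothetical name and the warning text is made up for the example.

```r
# Hypothetical helper mirroring the described behaviour.
provide_directory <- function(name) {
  if (!dir.exists(name)) {
    # Create the directory first, then tell the user it was missing.
    dir.create(name, recursive = TRUE)
    warning("Creating missing directory ", name)
  }
  invisible(NULL)
}

needed.dir <- file.path(tempfile("pt-provide-"), "cache")

# First call creates the directory and warns; capture the warning text.
first.warning <- tryCatch(
  { provide_directory(needed.dir); NA_character_ },
  warning = function(w) conditionMessage(w)
)

# Second call finds the directory and stays silent.
second.warning <- tryCatch(
  { provide_directory(needed.dir); NA_character_ },
  warning = function(w) conditionMessage(w)
)
```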
Stop silently
Description
Temporarily disable option(show.error.messages) and stop execution.
Usage
.quietstop()
Value
No value is returned; this function is called for its side effects.
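The idea of stopping without printing an error message can be sketched with base R's show.error.messages option. quiet_stop is a hypothetical name; the sketch restores the option on exit so the setting is only changed temporarily.

```r
# Hypothetical sketch: stop execution without printing an error message.
quiet_stop <- function() {
  old <- options(show.error.messages = FALSE)
  on.exit(options(old))  # restore the previous setting even though we error
  stop()
}

option.before <- getOption("show.error.messages")
result <- try(quiet_stop(), silent = TRUE)
option.after <- getOption("show.error.messages")
```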
Read metadata for a variable in the cache
Description
Read metadata for a variable in the cache
Usage
.read.cache.info(variable)
Arguments
variable |
Variable name for which to look up the metadata |
Details
The returned object is a list with two fields:
- in.cache: Logical indicating whether the requested variable was found in the cache;
- hash: A data.frame as was created by .create.cache.hash.
Value
list with metadata, see Details for more info.
Remove variables to keep from a list of candidates for removal
Description
Remove variables to keep from a list of candidates for removal
Usage
.remove.sticky.vars(names, keep)
Arguments
names |
character vector of variable names that are candidate for removal |
keep |
character vector of variable names that should not be removed |
Details
If the sticky_variables option is part of the config variable, the config variable itself is added to the list of variables to keep. Also all variables listed in config$sticky_variables in a comma separated list are added to keep.
Value
A character vector containing the variables to remove.
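The rule in Details amounts to a set difference after extending keep. The sketch below passes the configuration in explicitly, which is an assumption made to keep the example self-contained; remove_sticky is a hypothetical name.

```r
# Hypothetical sketch of the sticky-variable rule.
remove_sticky <- function(names, keep, config = list()) {
  if (!is.null(config$sticky_variables)) {
    # sticky_variables is a comma separated list of variable names.
    sticky <- trimws(strsplit(config$sticky_variables, ",", fixed = TRUE)[[1]])
    keep <- c(keep, "config", sticky)
  }
  setdiff(names, keep)
}

to.remove <- remove_sticky(
  names  = c("a", "b", "config", "big"),
  keep   = "a",
  config = list(sticky_variables = "big, model")
)
print(to.remove)
```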
Require internal package
Description
Internal method to require a package that is necessary for the internal functioning of ProjectTemplate. Never attaches the package unless configured to do so in global.dcf (which throws a warning).
Usage
.require.package(package.name)
Arguments
package.name |
name of the package to load, as a character vector |
Value
No value is returned; this function is called for its side effects.
Return an RStudio project file as character vector
Description
Return an RStudio project file as character vector
Usage
.rstudioprojectfile()
Value
Character vector with the contents of an empty RStudio project file
Raise an error if given path is not a valid project
Description
Function to stop processing if the path is not a ProjectTemplate directory. Returns the project name if it is a ProjectTemplate directory.
Usage
.stopifnotproject(additional_message = "", path = getwd())
Arguments
additional_message |
Optional message to show if the given path is not a valid project |
path |
Path to check if it is a valid project |
Value
Project name if it is a valid Project.
Raise an error if given path is a valid project
Description
Function to stop processing if the path is a Project Template.
Usage
.stopifproject(additional_message = "", path = getwd())
Arguments
additional_message |
Optional message to show if the given path is not a valid project |
path |
Path to check if it is a valid project |
Value
No value is returned; this function is called for its side effects.
Unload the project variables keeping the data
Description
Removes the config, logger and project.info variables from memory, leaving all data variables in place.
Usage
.unload.project()
Value
No value is returned; this function is called for its side effects.
Compare sets of variable names
Description
Compare the variables (excluding functions) in the global env with a passed in string of names and return the set difference.
Usage
.var.diff.from(given.var.list = "", env = .TargetEnv)
Arguments
given.var.list |
Character vector of variable names |
env |
Environment in which to compare the sets of variables |
Write a variable and its metadata to cache
Description
Write a variable and its metadata to cache
Usage
.write.cache(cache.hash, ...)
Arguments
cache.hash |
a |
... |
extra parameters passed to |
Details
cache.hash is a data frame with two columns: variable and hash.
Row name VAR is the name of the variable to save.
Row name CODE is the hash value of the code to compute variable.
Row name DEPENDS.* are the dependent variables that CODE depends on.
The helper function .create.cache.hash creates a suitable data frame.
Value
No value is returned; this function is called for its side effects.
Add project specific config to the global config
Description
Enables project specific configuration to be added to the global config object. The
allowable format is key value pairs which are appended to the end of the config
object, which is accessible from the global environment.
Usage
add.config(..., apply.override = FALSE)
Arguments
... |
A series of key-value pairs containing the configuration. The key is the
name that gets added to the config object. These can be overridden at load
time through the |
apply.override |
A boolean indicating whether overrides should be applied. This
can be used to add a setting disregarding arguments to |
Details
Once defined, the value can be accessed from any ProjectTemplate script by referencing config$my_project_var.
Examples
library('ProjectTemplate')
## Not run:
add.config(
  keep_bigdata = TRUE,  # Whether to keep the big data file in memory
  parse = 7             # Number of fields to parse
)
if (config$keep_bigdata) ...
## End(Not run)
Cache a data set for faster loading.
Description
This function will store a copy of the named data set in the cache directory. This cached copy of the data set will then be given precedence at load time when calling load.project. Cached data sets are stored as .RData or optionally as .qs files.
Usage
cache(variable = NULL, CODE = NULL, depends = NULL, ...)
Arguments
variable |
A character string containing the name of the variable to be saved. If the CODE parameter is defined, it is evaluated and saved, otherwise the variable with that name in the global environment is used. |
CODE |
A sequence of R statements enclosed in |
depends |
A character vector of other global environment objects that the CODE depends upon. Caching will be forced if those objects have changed since last caching |
... |
Additional arguments passed on to |
Details
Usually you will want to cache datasets during munging. This can be the raw data just loaded, or it can be the result of further processing during munge. Either way, it can take a while to cache large variables, so cache will only cache when it needs to.
The clear.cache("variable") command can be run to flush individual items from the cache.
Calling cache() with no arguments returns the current status of the cache.
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run:
create.project('tmp-project')
setwd('tmp-project')
dataset1 <- 1:5
cache('dataset1')
setwd('..')
unlink('tmp-project', recursive = TRUE)
## End(Not run)
Translate a variable name into a file name for caching.
Description
This function will translate a variable name into a form that is suitable as a filename on most OS's.
Usage
cache.name(data.filename)
Arguments
data.filename |
The variable name to be translated into a filename. |
Value
A translated variable name.
Examples
library('ProjectTemplate')
## Not run: cache.name('example.1')
Cache a project's data sets in binary format.
Description
This function will cache all of the data sets that were loaded by the load.project function in a binary format that is easier to load quickly. This is particularly useful for data sets that you've modified during a slow munging process that does not need to be repeated.
Usage
cache.project()
Value
No value is returned; this function is called for its side effects.
See Also
create.project, load.project, get.project, show.project
Examples
library('ProjectTemplate')
## Not run:
load.project()
cache.project()
## End(Not run)
Translate a file name into a valid R variable name.
Description
This function will translate a file name into a name that is a valid variable name in R. Non-alphabetic characters on the boundaries of the file name will be stripped; non-alphabetic characters inside of the file name will be replaced with dots.
Usage
clean.variable.name(variable.name, config = .load.config())
Arguments
variable.name |
A character vector containing a variable's proposed name that should be standardized. |
config |
A list of configuration variables. Defaults to those loaded by load.project |
Value
A translated variable name.
Examples
library('ProjectTemplate')
## Not run: clean.variable.name('example_1')
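The translation rule can be approximated in base R with a few substitutions. This is a rough sketch, not the package's implementation: it assumes that letters and digits are both kept (the description only mentions alphabetic characters), and clean_name_sketch is a hypothetical name.

```r
# Hypothetical approximation of the described rule.
clean_name_sketch <- function(variable.name) {
  # Strip unwanted characters from both ends of the name.
  x <- gsub("^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$", "", variable.name)
  # Replace each run of unwanted characters inside the name with one dot.
  x <- gsub("[^a-zA-Z0-9]+", ".", x)
  # Let base R make sure the result is a syntactically valid name.
  make.names(x)
}

cleaned <- clean_name_sketch(c("example_1", "_my-data file_2024_"))
print(cleaned)
```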
Clear objects from the global environment
Description
This function removes specific (or all by default) named objects from the global environment. If used within a ProjectTemplate project, then any variables defined in the config$sticky_variables will remain.
Usage
clear(..., keep = c(), force = FALSE)
Arguments
... |
A sequence of character strings of the objects to
be removed from the global environment. If none given, then all items except
those in |
keep |
A character vector of variables that should remain in the global environment |
force |
If |
Value
The variables kept and removed are reported.
Examples
library('ProjectTemplate')
## Not run:
clear("x", "y", "z")
clear(keep="a")
clear()
## End(Not run)
Clear data sets from the cache
Description
This function removes specific (or all by default) named data sets from the cache directory. This will force that data to be read in from the data directory next time load.project is called.
Usage
clear.cache(...)
Arguments
... |
A sequence of character strings of the variables to be removed from the cache. If none given, then all items in the cache will be removed. |
Value
Success or failure is reported.
Examples
library('ProjectTemplate')
## Not run:
clear.cache("x", "y", "z")
## End(Not run)
Create a new project.
Description
This function will create all of the scaffolding for a new project. It will set up all of the relevant directories and their initial contents. For those who only want the minimal functionality, the template argument can be set to minimal to create a subset of ProjectTemplate's default directories. For those who want to dump all of ProjectTemplate's functionality into a directory for extensive customization, the dump argument can be set to TRUE.
Usage
create.project(
  project.name = "new-project",
  template = "full",
  dump = FALSE,
  merge.strategy = c("require.empty", "allow.non.conflict"),
  rstudio.project = FALSE
)
Arguments
project.name |
A character vector containing the name for this new project. Must be a valid directory name for your file system. |
template |
A character vector containing the name of the template to
use for this project. By default a |
dump |
A boolean value indicating whether the entire functionality of ProjectTemplate should be written out to flat files in the current project. |
merge.strategy |
What should happen if the target directory exists and
is not empty?
If |
rstudio.project |
A boolean value indicating whether the project should
also be an 'RStudio Project'. Defaults to |
Details
If the target directory does not exist, it is created. Otherwise, it can only contain files and directories allowed by the merge strategy.
Value
No value is returned; this function is called for its side effects.
See Also
load.project, get.project, cache.project, show.project
Examples
library('ProjectTemplate')
## Not run: create.project('MyProject')
Create a new template
Description
This function writes a skeleton directory structure for creating your own custom templates.
Usage
create.template(target, source = "minimal")
Arguments
target |
Name of the new template. It is created under the directory
specified by |
source |
Name of an existing template to copy, defaults to the built in 'minimal' template. |
Show information about the current project.
Description
This function will return all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.
Usage
get.project()
Details
In previous releases this information has been available through the global variable project.info. Using this variable is now deprecated and will result in a warning.
Value
A named list.
See Also
create.project, load.project, cache.project, show.project
Examples
library('ProjectTemplate')
## Not run:
load.project()
get.project()
## End(Not run)
Listing the data for the current project
Description
This function produces a data.frame of all data files in the project, with meta data on if and how the file will be loaded by load.project.
Usage
list.data(...)
Arguments
... |
Named arguments to override configuration from
|
Details
The returned data.frame contains the following variables, with one observation per file in data/:
filename | Character variable containing the filename relative to the data/ directory. |
varname | Character variable containing the name of the variable into which the file will be imported. * |
is_ignored | Logical variable that indicates whether the file is ignored through the data_ignore option in the configuration. |
is_directory | Logical variable that indicates whether the file is a directory. |
is_cached | Logical variable that indicates whether the file is already available in the cache/ directory. |
cached_only | Logical variable that indicates whether the variable is only available in the cache/ directory. This occurs when calling the cache function with a code fragment in a munge script. |
reader | Character variable containing the name of the reader function that will be used to load the data. Contains a character(0) if no suitable reader was found. |
* Note that some readers return more than one variable, usually with the listed variable name as prefix. This is true, for example, for the xls.reader and xlsx.reader.
Value
A data.frame listing the available data, with relevant meta data
See Also
load.project, show.project, project.config
Examples
library('ProjectTemplate')
## Not run: list.data()
Automatically load data and packages for a project.
Description
This function automatically loads all of the data and packages used by the project from which it is called. The behavior can be controlled by adjusting the project.config configuration.
Usage
load.project(...)
Arguments
... |
Named arguments to override configuration from |
Details
... can take an argument override.config or a single named list for backward compatibility. This cannot be mixed with the new style override. When a named argument override.config is present it takes precedence over the other options. If any of the provided arguments is unnamed an error is raised.
Value
No value is returned; this function is called for its side effects.
See Also
create.project, get.project, cache.project, show.project, project.config
Examples
library('ProjectTemplate')
## Not run: load.project()
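The rules in Details for the ... argument can be sketched as a small argument-resolution function. resolve_overrides is hypothetical, and treating the legacy form as one unnamed argument holding a named list is an assumption; the real parsing inside load.project may differ.

```r
# Hypothetical sketch of how override arguments could be resolved.
resolve_overrides <- function(...) {
  args <- list(...)
  if (length(args) == 0) return(list())

  # Legacy style (assumed): one unnamed argument that is itself a named list.
  if (length(args) == 1 && is.null(names(args)) && is.list(args[[1]])) {
    args <- args[[1]]
  }

  # Every remaining argument has to be named.
  arg.names <- names(args)
  if (is.null(arg.names) || any(!nzchar(arg.names))) {
    stop("all override arguments must be named")
  }

  # A named override.config argument takes precedence over the others.
  if ("override.config" %in% arg.names) {
    return(args[["override.config"]])
  }
  args
}

new.style  <- resolve_overrides(data_ignore = "*.tmp")
legacy     <- resolve_overrides(list(data_ignore = "*.tmp"))
precedence <- resolve_overrides(override.config = list(a = 1), b = 2)
unnamed    <- try(resolve_overrides(1, b = 2), silent = TRUE)
```

The data_ignore option is used here only because it is mentioned elsewhere in this manual; any configuration key would do.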
Load Project
Description
Call this function as an addin to load the library and run 'load.project()'
Usage
loadproject_addin()
Migrates a project from a previous version of ProjectTemplate
Description
This function automatically performs all necessary steps to migrate an existing project so that it is compatible with this version of ProjectTemplate.
Usage
migrate.project()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: migrate.project()
Migrate a template to a new version of ProjectTemplate
Description
This function updates a skeleton project to the current version of ProjectTemplate.
Usage
migrate.template(template)
Arguments
template |
Name of the template to upgrade. |
Automatically read data into memory
Description
The preinstalled readers are automatically loaded in the list preinstalled.readers.
The reader functions will load a data set stored in the data directory into the specified global variable binding. These functions are not meant to be called directly.
Usage
preinstalled.readers
arff.reader(data.file, filename, variable.name)
csv.reader(data.file, filename, variable.name)
csv2.reader(data.file, filename, variable.name)
db.reader(data.file, filename, variable.name)
dbf.reader(data.file, filename, variable.name)
epiinfo.reader(data.file, filename, variable.name)
feather.reader(data.file, filename, variable.name)
file.reader(data.file, filename, variable.name)
mp3.reader(data.file, filename, variable.name)
mtp.reader(data.file, filename, variable.name)
octave.reader(data.file, filename, variable.name)
ppm.reader(data.file, filename, variable.name)
r.reader(data.file, filename, variable.name)
rdata.reader(data.file, filename, variable.name)
rds.reader(data.file, filename, variable.name)
spss.reader(data.file, filename, variable.name)
sql.reader(data.file, filename, variable.name)
stata.reader(data.file, filename, variable.name)
systat.reader(data.file, filename, variable.name)
tsv.reader(data.file, filename, variable.name)
url.reader(data.file, filename, variable.name)
wsv.reader(data.file, filename, variable.name)
xls.reader(data.file, filename, workbook.name)
xlsx.reader(data.file, filename, workbook.name)
xport.reader(data.file, filename, variable.name)
Arguments
data.file |
The name of the data file to be read. |
filename |
The path to the data set to be loaded. |
variable.name |
The name of the variable to be assigned in the global environment. |
Format
An object of class list of length 55.
Details
Some file formats can contain more than one dataset. In this case all datasets are loaded
into separate variables in the format <variable.name>.<subset.name>, where the
subset.name is determined by the reader automatically.
The sql.reader function will load data from a SQL database based on configuration
information found in the specified .sql file. The .sql file must specify
a database to be accessed. All tables from the database, one specific table,
or one specific query against any set of tables may be executed to generate
a data set.
Queries support string interpolation using mustache syntax (http://mustache.github.io), which allows code snippets to be executed inside the query. This is used to create queries that depend on data from other sources. Code is delimited by {{...}}.
Example: query: SELECT * FROM my_table WHERE id IN ({{ids}}). Here ids is a vector previously loaded into the global environment through ProjectTemplate.
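The effect of the interpolation can be pictured with a small base R sketch. It only imitates the substitution with sub() and is not how the package renders templates; the values in ids are made up, and the way the package itself formats a vector inside a query is not specified here.

```r
# Hypothetical vector, standing in for data already loaded into the
# global environment by ProjectTemplate.
ids <- c(3, 7, 42)

# Query template as it might appear in the query field of a .sql file.
template <- "SELECT * FROM my_table WHERE id IN ({{ids}})"

# Imitate the interpolation: replace the {{ids}} placeholder with the
# comma-separated values of the vector.
query <- sub("{{ids}}", paste(ids, collapse = ", "), template, fixed = TRUE)

print(query)
```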
Examples of the DCF format and settings used in a .sql file are shown below:

Example 1
  type: mysql
  user: sample_user
  password: sample_password
  host: localhost
  dbname: sample_database
  table: sample_table

Example 2
  type: mysql
  user: sample_user
  password: sample_password
  host: localhost
  port: 3306
  socket: /Applications/MAMP/tmp/mysql/mysql.sock
  dbname: sample_database
  table: sample_table

Example 3
  type: sqlite
  dbname: /path/to/sample_database
  table: sample_table

Example 4
  type: sqlite
  dbname: /path/to/sample_database
  query: SELECT * FROM users WHERE user_active == 1

Example 5
  type: sqlite
  dbname: /path/to/sample_database
  table: *

Example 6
  type: postgres
  user: sample_user
  password: sample_password
  host: localhost
  dbname: sample_database
  table: sample_table

Example 7
  type: odbc
  dsn: sample_dsn
  user: sample_user
  password: sample_password
  dbname: sample_database
  query: SELECT * FROM sample_table

Example 8
  type: oracle
  user: sample_user
  password: sample_password
  dbname: sample_database
  table: sample_table

Example 9
  type: jdbc
  class: oracle.jdbc.OracleDriver
  classpath: /path/to/ojdbc5.jar (or set in CLASSPATH)
  user: scott
  password: tiger
  url: jdbc:oracle:thin:@myhost:1521:orcl
  query: select * from emp

Example 10
  type: heroku
  classpath: /path/to/jdbc4.jar (or set in CLASSPATH)
  user: scott
  password: tiger
  host: heroku.postgres.url
  port: 1234
  dbname: herokudb
  query: select * from emp

Example 11
In this example RSQLite::initExtension() is automatically called on the established connection.
Liam Healy has written extension-functions.c, which is available on http://www.sqlite.org/contrib. It provides mathematical and string extension functions for SQL queries using the loadable extensions mechanism.
  type: sqlite
  dbname: /path/to/sample_database
  plugin: extension
  query: SELECT *,STDEV(value1) FROM example_table
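A .sql file is plain DCF, so its fields can be checked with base R before the project is loaded. The sketch below writes the settings from Example 3 to a temporary file and reads them back with read.dcf(); the file name sample_table.sql and the temporary location are assumptions for illustration only, and in a real project the file would sit in the data directory.

```r
# Write the settings from Example 3 to a temporary .sql file.
# The file name is hypothetical.
sql_file <- file.path(tempdir(), "sample_table.sql")
writeLines(
  c("type: sqlite",
    "dbname: /path/to/sample_database",
    "table: sample_table"),
  sql_file
)

# read.dcf() returns a character matrix with one column per field.
settings <- read.dcf(sql_file)

# Convert the single record into a named list for easier inspection.
settings_list <- as.list(settings[1, ])
str(settings_list)
```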
Value
No value is returned; the reader functions are called for their side effects.
Functions
- arff.reader(): Read the Weka file format from files with the .arff extension.
- csv.reader(): Read a comma separated values file with the .csv extension.
- csv2.reader(): Read a semicolon separated values file with the .csv2 extension.
  In May 2018, the default behavior of the reader for .csv2 files changed to use R's read.csv2(), where the field separator is assumed to be ';' and the decimal separator to be ','.
- db.reader(): Read a SQLite3 database with a .db file extension.
  If you want to specify a single table or query to execute against the database, move it elsewhere and use a .sql file interpreted by sql.reader.
- dbf.reader(): Read an XBASE file with a .dbf file extension.
- epiinfo.reader(): Read an Epi Info file with a .rec file extension.
- feather.reader(): Read a feather file in Apache Arrow format with a .feather file extension.
- file.reader(): Read an arbitrary file described in a .file file.
  A .file file must contain DCF that specifies the path to the data set and which extension should be used from the dispatch table to load the data set.
  Examples of the DCF format and settings used in a .file file are shown below:
    path: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
    extension: csv
- mp3.reader(): Read an MP3 file with a .mp3 file extension.
  This function will load the specified MP3 file into memory using the tuneR package. This is useful for working with music files as a data set.
- mtp.reader(): Read a Minitab Portable Worksheet with a .mtp3 file extension.
- octave.reader(): Read an Octave file with a .m file extension.
  This function will load the specified Octave file into memory using the foreign::read.octave function.
- ppm.reader(): Read a PPM file with a .ppm file extension.
  Data is loaded using the pixmap::read.pnm function.
- r.reader(): Read an R source file with a .R file extension.
  This function will call source on the specified R file, executing the code inside of it as a way of generating data sets dynamically, as in many Monte Carlo applications.
- rdata.reader(): Read an RData file with a .rdata or .rda file extension.
  This function will load the specified RData file into memory using the load function. This may generate many data sets simultaneously.
- rds.reader(): Read the RDS file format from files with the .rds extension.
- spss.reader(): Read an SPSS file with a .sav file extension.
  This function will load the specified SPSS file into memory. It will convert the resulting list object into a data frame before inserting the data set into the global environment.
- sql.reader(): Read a database described in a .sql file.
- stata.reader(): Read a Stata file with a .stata file extension.
- systat.reader(): Read a Systat file with a .sys or .syd file extension.
- tsv.reader(): Read a tab separated values file with the .tsv or .tab file extensions.
- url.reader(): Read a remote file described in a .url file.
  This function will load data from a remote source accessible through HTTP or FTP based on configuration information found in the specified .url file. The .url file must specify the URL of the remote data source and the type of data that is available remotely. Only one data source per .url file is supported currently.
  Examples of the DCF format and settings used in a .url file are shown below:
    Example 1
    url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
    separator: ,
- wsv.reader(): Read a whitespace separated values file with the .wsv or .txt file extensions.
- xls.reader(): Read an Excel file with a .xls file extension.
  This function will load the specified Excel file into memory using the readxl package.
- xlsx.reader(): Read an Excel 2007 file with a .xlsx file extension.
  This function will load the specified Excel file into memory using the readxl package.
- xport.reader(): Read an XPort file with a .xport file extension.
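The separator conventions noted for csv2.reader() can be checked directly with base R's read.csv2(), which the list above names as the function used for .csv2 files. This is a small sketch with made-up data in a temporary file; it calls read.csv2() itself rather than the reader, which is not meant to be called directly.

```r
# Made-up data: ';' separates fields and ',' is the decimal mark.
csv2_file <- tempfile(fileext = ".csv2")
writeLines(c("name;value", "a;1,5", "b;2,25"), csv2_file)

# read.csv2() defaults to sep = ";" and dec = ",", so the value column
# is parsed as numeric.
df <- read.csv2(csv2_file)

print(df$value)
```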
ProjectTemplate Configuration file
Description
Every ProjectTemplate project has a configuration file found at
config/global.dcf that contains various options that can be tweaked
to control runtime behavior. The valid options are shown below, and must
be encoded using the DCF format.
Usage
project.config()
Details
Calling the project.config() function will display the current project
configuration.
The options that can be configured in config/global.dcf are
shown below:
data_loading | This can be set to TRUE or FALSE. If data_loading is on, the system will load data from both the cache and data directories with cache taking precedence in the case of name conflict. |
data_loading_header | This can be set to TRUE or FALSE. If data_loading_header is on, the system will load text data files, such as CSV, TSV, or XLSX, treating the first row as header. |
data_ignore | A comma separated list of files to be ignored when importing
from the data/ directory. Regular expressions can be used but should be delimited
(on both sides) by /. Note that filenames and filepaths should never begin with
a /. Entire directories under data/ can be ignored by adding a trailing /. |
cache_loading | This can be set to TRUE or FALSE. If cache_loading is on, the system will load data from the cache directory before any attempt to load from the data directory. |
recursive_loading | This can be set to TRUE or FALSE. If recursive_loading is on, the system will load data from the data directory and all its sub directories recursively. |
munging | This can be set to TRUE or FALSE. If munging is on, the system will execute the files in the munge directory sequentially using the order implied by the sort() function. If munging is FALSE, none of the files in the munge directory will be executed. |
logging | This can be set to TRUE or FALSE. If logging is on, a logger object using the log4r package is automatically created when you run load.project(). This logger will write to the logs directory. |
logging_level | The value of logging_level is passed to a logger object using the log4r package during logging when you run load.project(). |
load_libraries | This can be set to TRUE or FALSE. If load_libraries is on, the system will load all of the R packages listed in the libraries field described below. |
libraries | This is a comma separated list of all the R packages that the user wants to automatically load when load.project() is called. These packages must already be installed before calling load.project(). |
as_factors | This can be set to TRUE or FALSE. If as_factors is on, the system will convert every character vector into a factor when creating data frames; most importantly, this automatic conversion occurs when reading in data automatically. If FALSE, character vectors will remain character vectors. |
tables_type | This is the format for default tables. Values can be 'tibble' (default), 'data_table', or 'data_frame' |
attach_internal_libraries | This can be set to TRUE or FALSE. If attach_internal_libraries is on, then every time a new package is loaded into memory during load.project() a warning will be displayed informing the user that this has happened. |
cache_loaded_data | This can be set to TRUE or FALSE. If cache_loaded_data is on, then data loaded from the data directory during load.project() will be automatically cached (so it won't need to be reloaded next time load.project() is called). |
sticky_variables | This is a comma separated list of any project-specific
variables that should remain in the global environment after a clear() command.
This can be used to clear the global environment, but keep any large datasets in
place so they are not unnecessarily re-generated during load.project().
Note that this will be overridden if the force=TRUE parameter is passed
to clear(). |
underscore_variables | This can be set to TRUE to use
underscores ('_') in variable names or FALSE to replace underscores
('_') with dots ('.'). The default is TRUE . When migrating old
projects, underscore_variables is set to FALSE . |
cache_file_format | The default file format for cached data is 'RData'. This can be set to 'qs' in order to benefit from the quick serialization of R objects provided by qs. |
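The data_ignore row above distinguishes three kinds of entry: plain file names, regular expressions delimited by /, and directories marked by a trailing /. The base R sketch below classifies a hypothetical data_ignore value along those lines. The helper classify_entry() and the sample value are illustrative only and are not part of ProjectTemplate; a regular expression that itself contains a comma would not survive this simple split.

```r
# Hypothetical value of the data_ignore option in config/global.dcf.
data_ignore <- "Thumbs.db, /\\.tmp$/, raw/"

# Split the comma separated list and trim surrounding whitespace.
entries <- trimws(strsplit(data_ignore, ",", fixed = TRUE)[[1]])

# Illustrative helper: decide which kind of rule an entry represents.
classify_entry <- function(entry) {
  if (grepl("^/.*/$", entry)) {
    "regex"       # delimited on both sides by /
  } else if (grepl("/$", entry)) {
    "directory"   # trailing / only
  } else {
    "file"        # plain file name or path
  }
}

kinds <- vapply(entries, classify_entry, character(1), USE.NAMES = FALSE)
print(data.frame(entry = entries, kind = kinds))
```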
If the config/global.dcf file is missing some items (for example because it was
created under an old version of ProjectTemplate), then the following
configuration is used for any missing items during load.project():
data_loading | TRUE |
data_loading_header | TRUE |
data_ignore | |
cache_loading | TRUE |
recursive_loading | FALSE |
munging | TRUE |
logging | FALSE |
logging_level | INFO |
load_libraries | FALSE |
libraries | reshape2, plyr, tidyverse, stringr, lubridate |
as_factors | FALSE |
tables_type | tibble |
attach_internal_libraries | TRUE |
cache_loaded_data | FALSE |
sticky_variables | NONE |
underscore_variables | FALSE |
cache_file_format | RData |
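The fallback rule, where documented defaults fill in only the items a project's configuration lacks, can be pictured with base R's modifyList(). This is a sketch of the rule and not the package's internal code; the defaults list reproduces a few rows of the table above, and project_settings is a hypothetical incomplete configuration.

```r
# A few of the fallback values from the table above.
defaults <- list(
  data_loading      = TRUE,
  munging           = TRUE,
  logging           = FALSE,
  cache_loaded_data = FALSE
)

# Hypothetical settings read from an older, incomplete config file.
project_settings <- list(munging = FALSE)

# Items present in the project win; missing items fall back to the defaults.
effective <- modifyList(defaults, project_settings)

str(effective)
```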
When a new project is created using create.project(), the following values are pre-populated:
version | 0.11.0 |
data_loading | TRUE |
data_loading_header | TRUE |
data_ignore | |
cache_loading | TRUE |
recursive_loading | FALSE |
munging | TRUE |
logging | FALSE |
logging_level | INFO |
load_libraries | FALSE |
libraries | reshape2, plyr, tidyverse, stringr, lubridate |
as_factors | FALSE |
tables_type | tibble |
attach_internal_libraries | FALSE |
cache_loaded_data | TRUE |
sticky_variables | NONE |
underscore_variables | TRUE |
cache_file_format | RData |
Value
The current project configuration is displayed.
Reload or reset a project
Description
This function will clear the global environment and reload a project. This is
useful when you've updated your data sets or changed your preprocessing scripts.
Any variables listed in the sticky_variables configuration parameter in
project.config will remain both in memory and (if present) in the cache by
default. If the reset parameter is TRUE, then all variables are cleared from
both the global environment and the cache.
Usage
reload.project(..., reset = FALSE)
Arguments
... |
Optional parameters passed to |
reset |
A boolean value, which if set |
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: load.project()
reload.project()
## End(Not run)
Reload Project
Description
Call this function as an addin to load the library and run 'reload.project()'
Usage
reloadproject_addin()
Require a package for use in the project
Description
This function will require the given package. If the package is not installed, it will stop execution and print a message to the user instructing them which package to install and which function caused the error.
Usage
require.package(package.name, attach = TRUE)
Arguments
package.name |
A character vector containing the package name. Must be a valid package name installed on the system. |
attach |
Should the package be attached to the search path (as with
|
Details
The function .require.package is called by internal code. It will
attach the package to the search path (with a warning) only if the
compatibility configuration attach_internal_libraries is set to
TRUE. Normally, packages used for loading data are not
needed on the search path, but not loading them might break existing code.
In a forthcoming version this compatibility setting will be removed,
and no packages will be attached to the search path by internal code.
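The difference between checking for a package and attaching it can be sketched in base R. The helper require_or_stop() below is hypothetical and is not the package's implementation: it stops with an installation hint when the package is missing and, when attach = FALSE, loads the namespace without putting it on the search path.

```r
# Hypothetical helper illustrating the described behaviour with base R only.
require_or_stop <- function(package.name, attach = TRUE) {
  if (!requireNamespace(package.name, quietly = TRUE)) {
    stop(sprintf("Package '%s' is required; install it with install.packages('%s')",
                 package.name, package.name),
         call. = FALSE)
  }
  if (attach) {
    # Attach to the search path, as library() would.
    library(package.name, character.only = TRUE)
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this succeeds without attaching anything new.
ok <- require_or_stop("stats", attach = FALSE)

# A missing package stops with a message; try() captures the error here.
missing_result <- try(require_or_stop("notARealPackage123"), silent = TRUE)
```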
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: require.package('PackageName')
Run all of the analyses in the src directory.
Description
This function will run each of the analyses in the src directory in
separate processes. At present, this is done serially, but
future versions of this function will provide a means of running
the analyses in parallel.
Usage
run.project()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: run.project()
Show information about the current project.
Description
This function will show the user all of the information that
ProjectTemplate has about the current project. This information is
gathered when load.project is called. At present,
ProjectTemplate keeps a record of the project's configuration settings,
all packages that were loaded automatically and all of the data sets that
were loaded automatically. The information about autoloaded data sets
is used by the cache.project function.
Usage
show.project()
Value
No value is returned; this function is called for its side effects.
See Also
create.project, load.project, get.project, cache.project
Examples
library('ProjectTemplate')
## Not run: load.project()
show.project()
## End(Not run)
Generate unit tests for your helper functions.
Description
This function will parse all of the functions defined in files inside
of the lib directory and will generate a trivial unit test for
each function. The resulting tests are stored in the file
tests/autogenerated.R. Every test is expected to fail by default,
so you should edit them before calling test.project.
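Editing a stub usually means replacing its failing placeholder with a real expectation. The sketch below shows what an edited test might look like, assuming a hypothetical helper add_one() of the kind that could live in the lib directory and assuming the testthat package is installed; the exact contents of the generated stubs are not reproduced here.

```r
# Hypothetical helper function, standing in for code in the lib directory.
add_one <- function(x) {
  x + 1
}

# An edited test with a real expectation, run only if testthat is available.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add_one increments its argument", {
    testthat::expect_equal(add_one(1), 2)
    testthat::expect_equal(add_one(c(0, 10)), c(1, 11))
  })
}
```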
Usage
stub.tests()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: stub.tests()
Run all unit tests for this project.
Description
This function will run all of the testthat style unit tests
for the current project that are defined inside of the tests
directory. The tests will be run in the order defined by the filenames
for the tests: it is recommended that each test begin with a number
specifying its position in the sequence.
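When numbering test files, zero-padding the prefix keeps character ordering in line with numeric ordering. The sketch below uses hypothetical file names and sort() with method = "radix", which compares strings byte by byte; whether test.project() uses exactly this collation is not stated here, so treat it as an illustration of the pitfall rather than of the package's behaviour.

```r
# Hypothetical test file names without and with zero-padded prefixes.
unpadded <- c("1_load.R", "2_clean.R", "10_models.R")
padded   <- c("01_load.R", "02_clean.R", "10_models.R")

# Byte-wise ordering: "10_models.R" sorts before "1_load.R" because the
# character '0' precedes '_'.
unpadded_order <- sort(unpadded, method = "radix")

# With zero-padding the character order matches the intended sequence.
padded_order <- sort(padded, method = "radix")

print(unpadded_order)
print(padded_order)
```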
Usage
test.project()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: load.project()
test.project()
## End(Not run)
Read a DCF file into an R list.
Description
This function will read a DCF file and translate the resulting data frame into a list. The DCF format is used throughout ProjectTemplate for configuration settings and ad hoc file format specifications.
Usage
translate.dcf(filename)
Arguments
filename |
A character vector specifying the DCF file to be translated. |
Details
The contents of the DCF file are stored as character strings. If the content is placed between back tick characters (`), then the content is evaluated as R code and the result is returned as a string.
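A small DCF file makes the difference between plain and back-ticked values concrete. The sketch below writes a hypothetical file with one of each, reads the raw strings with base R's read.dcf(), and calls translate.dcf() only when ProjectTemplate is installed; the field names name and today are made up for illustration.

```r
# Hypothetical DCF file: one plain value and one value wrapped in back ticks.
dcf_file <- tempfile(fileext = ".dcf")
writeLines(
  c("name: example",
    "today: `format(Sys.Date())`"),
  dcf_file
)

# Base R reads every field as a plain character string, back ticks included.
raw_fields <- read.dcf(dcf_file)
print(raw_fields[1, ])

# translate.dcf() returns a list; per the Details above, the back-ticked
# content is evaluated as R code and returned as a string.
if (requireNamespace("ProjectTemplate", quietly = TRUE)) {
  settings <- ProjectTemplate::translate.dcf(dcf_file)
  str(settings)
}
```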
Value
Returns a list containing the entries from the DCF file.
Examples
library('ProjectTemplate')
## Not run: translate.dcf(file.path('config', 'global.dcf'))