Help for package ProjectTemplate

Type:

Package

Title:

Automates the Creation of New Statistical Analysis Projects

Version:

0.11.0

Date:

2024-07-01

Description:

Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.

License:

GPL-3 | file LICENSE

Language:

en-US

LazyLoad:

yes

Encoding:

UTF-8

Depends:

R (≥ 2.7), digest, tibble

Imports:

methods

Suggests:

foreign, feather, reshape, plyr, formatR, qs, stringr, ggplot2, lubridate, log4r (≥ 0.1-5), DBI, RMySQL, RSQLite, gdata, RODBC, RJDBC, readxl, xlsx, tuneR, pixmap, data.table, RPostgreSQL, GetoptLong, whisker, testthat (≥ 3.0.0), reticulate

URL:

http://projecttemplate.net

BugReports:

https://github.com/KentonWhite/ProjectTemplate/issues

Collate:

'ProjectTemplate-package.R' 'add.config.R' 'preinstalled.readers.R' 'add.extension.R' 'addins.R' 'arff.reader.R' 'get.project.R' 'cache.R' 'cache.name.R' 'cache.project.R' 'clean.variable.name.R' 'clear.R' 'clear.cache.R' 'translate.dcf.R' 'config.R' 'create.project.R' 'create.project.rstudio.R' 'create.template.R' 'csv.reader.R' 'csv2.reader.R' 'db.reader.R' 'dbf.reader.R' 'epiinfo.reader.R' 'feather.reader.R' 'file.reader.R' 'list.data.R' 'load.project.R' 'migrate.project.R' 'migrate.template.R' 'mp3.reader.R' 'mtp.reader.R' 'octave.reader.R' 'ppm.reader.R' 'project.config.R' 'r.reader.R' 'rdata.reader.R' 'rds.reader.R' 'reload.project.R' 'require.package.R' 'run.project.R' 'show.project.R' 'spss.reader.R' 'sql.reader.R' 'stata.reader.R' 'stopifnotproject.R' 'stub.tests.R' 'systat.reader.R' 'test.project.R' 'tsv.reader.R' 'url.reader.R' 'wsv.reader.R' 'xls.reader.R' 'xlsx.reader.R' 'xport.reader.R'

RoxygenNote:

7.3.1

Config/testthat/edition:

NeedsCompilation:

Packaged:

2024-07-01 18:05:05 UTC; kwhite

Author:

Aleksandar Blagotic [ctb], Diego Valle-Jones [ctb], Jeffrey Breen [ctb], Joakim Lundborg [ctb], John Myles White [aut, cph], Josh Bode [ctb], Kenton White [ctb, cre], Kirill Mueller [ctb], Matteo Redaelli [ctb], Noah Lorang [ctb], Patrick Schalk [ctb], Dominik Schneider [ctb], Gerold Hepp [ctb], Zunaira Jamil [ctb], Glen Falk [ctb]

Maintainer:

Kenton White <jkentonwhite@gmail.com>

Repository:

CRAN

Date/Publication:

2024-07-01 18:30:06 UTC

ProjectTemplate: Automates the Creation of New Statistical Analysis Projects

Description

Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.

Author(s)

Maintainer: Kenton White jkentonwhite@gmail.com [contributor]

Authors:

John Myles White [copyright holder]

Other contributors:

Aleksandar Blagotic [contributor]
Diego Valle-Jones [contributor]
Jeffrey Breen [contributor]
Joakim Lundborg [contributor]
Josh Bode [contributor]
Kirill Mueller [contributor]
Matteo Redaelli [contributor]
Noah Lorang [contributor]
Patrick Schalk [contributor]
Dominik Schneider [contributor]
Gerold Hepp [contributor]
Zunaira Jamil [contributor]
Glen Falk [contributor]

Associate a reader function with an extension.

Description

This function will associate an extension with a custom reader function.

Usage

.add.extension(extension, reader)

Arguments

extension

The extension of the new data file.

reader

The function to use when reading the data file. It should accept three arguments: data.file, filename and variable.name (in that order). The function should read the contents of the file filename, and save it into the workspace under the name variable.name. The data.file argument is just a relative file name and can be ignored.

Value

No value is returned; this function is called for its side effects.

Warning

This interface should not be considered as stable and is likely to be replaced by a different mechanism in a forthcoming version of this package.

Examples

## Not run: .add.extension('foo', foo.reader)
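
A minimal sketch of what such a reader might look like, assuming a hypothetical '.foo' format that is plain text with one record per line. The name foo.reader comes from the example above; its body, and the use of the global environment as the workspace, are assumptions made for illustration only.

```r
# Hypothetical reader for files with a '.foo' extension (illustration only).
# It follows the documented signature: data.file, filename, variable.name.
foo.reader <- function(data.file, filename, variable.name) {
  # data.file is only the relative file name and is ignored here.
  contents <- readLines(filename, warn = FALSE)
  # Assumption: the global environment stands in for the workspace.
  assign(variable.name, contents, envir = globalenv())
  invisible(NULL)
}

# Try the reader on a throwaway file, without touching any project.
tmp <- tempfile(fileext = ".foo")
writeLines(c("alpha", "beta"), tmp)
foo.reader(basename(tmp), tmp, "foo.example")
print(foo.example)

# Registering the reader needs the package, so it is left commented out:
# .add.extension('foo', foo.reader)
```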

Attach a package or add a namespace

Description

Internal method to attach a package or only add the namespace.

Usage

.attach.or.add.namespace(package.name, attach)

Arguments

package.name

name of the package to load, as a character vector

attach

boolean indicating whether to attach the package in the global namespace

Value

Boolean indicating whether the package was successfully loaded

Construct the file names for the cache and hash

Description

Construct the file names for the cache and hash

Usage

.cache.filename(variable, cache_format)

Arguments

variable

Variable name for which to construct file names

cache_format

expression as returned by .cache.format

Details

The returned object is a list with two fields:

data: The path to the file in which the variable contents will be saved;
hash: The path to the file in which the cache metadata will be stored.

Value

A list with file names

Get configured cache file format strategy

Description

Get configured cache file format strategy

Usage

.cache.format()

Value

A named object of mode expression.

Calculate the hash of the data stored in a variable

Description

Calculate the hash of the data stored in a variable

Usage

.cache.hash(variables, env = .TargetEnv)

Arguments

variables

character vector of variable names

env

environment from which to load the variable

Details

The hashes are calculated using the digest::digest function.

Value

data.frame with the variable names and the corresponding hashes

Print the current cache status

Description

Print the current cache status

Usage

.cache.status()

Value

No value is returned; this function is called for its side effects.

List all cached variables

Description

List all variables for which files are available in the cache. The info is purely based on the files in the cache directory. There is no guarantee the variable can actually be loaded from the cache.

Usage

.cached.variables()

Value

Character vector of cached variables

Compare the project version with the current ProjectTemplate version

Description

Compare the project version with the current ProjectTemplate version

Usage

.check.version(config, warn.migrate = TRUE)

Arguments

config

Project configuration

warn.migrate

Logical indicating whether a warning should be raised if the project version is older than the installed version of ProjectTemplate.

Value

0 if the numbers are equal, -1 if b is later and 1 if a is later (analogous to the C function strcmp).
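
The return convention described above is the same one used by base R's utils::compareVersion(a, b). The sketch below only illustrates that convention with made-up version strings. It does not call .check.version, and which of the two versions plays the role of a is not stated here.

```r
# Illustrative version strings; nothing is read from a real project.
newer <- "0.11.0"
older <- "0.9.3"

res <- c(
  equal   = utils::compareVersion(newer, newer),  # 0: both versions are equal
  b_later = utils::compareVersion(older, newer),  # -1: second argument (b) is later
  a_later = utils::compareVersion(newer, older)   # 1: first argument (a) is later
)
print(res)
```

Note that the components are compared as integers, so 0.11.0 counts as later than 0.9.3.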

Convert one or more data sets to data.tables

Description

Converts all base::data.frames referred to in the input to data.tables. The resulting data set is stored in the .TargetEnv.

Usage

.convert.to.data.table(data.sets)

Arguments

data.sets

A character vector of variable names.

Value

No value is returned; this function is called for its side effects.

Convert one or more data sets to tibbles

Description

Converts all base::data.frames referred to in the input to tibbles. The resulting data set is stored in the .TargetEnv.

Usage

.convert.to.tibble(data.sets)

Arguments

data.sets

A character vector of variable names.

Value

No value is returned; this function is called for its side effects.

Create a data.frame with the cache metadata

Description

Create a data.frame with the cache metadata

Usage

.create.cache.hash(variable, depends, CODE)

Arguments

variable

Name of the variable to be cached

depends

Vector of variable names of dependencies for the variable to be cached, optional.

CODE

Code block to generate variable, registered as a dependency, optional.

Details

The hashes for the various objects are calculated using the .cache.hash function.

Value

data.frame containing the variable name and its dependencies, with the corresponding hashes appended.

Create a project structure

Description

.create.project.existing creates a project directory structure inside an existing directory with the default files from a given template.

.create.project.new first creates a new directory and then passes further control to .create.project.existing. In case the project creation fails, the newly created directory is cleaned up.

Usage

.create.project.existing(
  project.name,
  merge.strategy,
  template,
  rstudio.project
)

.create.project.new(project.name, template, rstudio.project)

Arguments

project.name

Character vector with the name of the project directory

merge.strategy

Character vector determining whether the directory should be empty or is allowed to contain non-conflicting files

template

Name of the template from which the project should be created

rstudio.project

Logical indicating whether an .Rproj file should be created

Value

No value is returned; this function is called for its side effects.

Check if a directory is empty

Description

Checks if the directory listing by .list.files.and.dirs is empty.

Usage

.dir.empty(path)

Arguments

path

Character vector containing the path to the directory to check.

Value

Logical indicating whether the passed directory was empty.

Run code and assign the results to variable

Description

Run code and assign the results to variable

Usage

.evaluate.code(variable, CODE)

Arguments

variable

variable name in which to store the result of CODE

CODE

code block that returns a result which can be stored in a variable

Details

No error handling is done on the executed code, nor is the

Get the location of a template from its name

Description

Checks the configured option('ProjectTemplate.templatedir') for the template. If no matching template was found the system templates are checked, and finally the current directory is checked. If no template was found with the given name an error is raised.

Usage

.get.template(template)

Arguments

template

Character vector containing the name of the template

Value

Character vector containing the location of the template. If no template was found by the given name an error is raised.

Check if the project was loaded

Description

Currently does a very basic check to see if the variable project.info exists in the .TargetEnv. No check is performed on the contents of the variable.

Usage

.has.project()

Value

Logical indicating whether the project was loaded.
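
A check of this kind might look like the following sketch. The function name has_project_sketch and its env argument are hypothetical; a fresh environment stands in for .TargetEnv, and only the existence of the variable is tested, not its contents.

```r
# Hypothetical stand-in for the check described above.
has_project_sketch <- function(env = globalenv()) {
  # Only the existence of the name is tested; the value is never inspected.
  exists("project.info", envir = env, inherits = FALSE)
}

demo.env <- new.env()
before <- has_project_sketch(demo.env)          # nothing loaded yet
assign("project.info", list(), envir = demo.env)
after <- has_project_sketch(demo.env)           # even an empty list counts
print(c(before = before, after = after))
```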

Initialize the logger for the project

Description

Creates a log4r::logger and provides a default log file log/project.log.

Usage

.init.logger(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.

Test whether a given path is a ProjectTemplate project

Description

Test whether a given path is a ProjectTemplate project

Usage

.is.ProjectTemplate(path = getwd())

Arguments

path

Directory to check, defaults to the current working directory.

Value

Logical indicating whether the given path is a valid project.

Check whether the cache is empty

Description

Check whether the cache is empty

Usage

.is.cache.empty()

Value

Logical indicating whether the cache is empty

Check whether variables are cached

Description

Check whether variables are cached

Usage

.is.cached(varnames)

Arguments

varnames

Character vector of variable names

Value

Logical vector indicating whether the variable is in the cache.

Check if path is an existing directory

Description

Checks if a given path exists, and if so if it is a directory.

Usage

.is.dir(path)

Arguments

path

Character vector containing the path to the directory to check.

Value

Logical indicating a valid directory was passed.

Build the list of data available for loading into memory

Description

This function produces a data.frame of all data files in the project, with meta data on if and how the file will be loaded by load.project.

Usage

.list.data(config)

Arguments

config

List containing the configuration to use.

Details

The returned data.frame contains the following variables, with one observation per file in data/:

`filename`	Character variable containing the filename relative to `data/` directory.
`varname`	Character variable containing the name of the variable into which the file will be imported. *
`is_ignored`	Logical variable that indicates whether the file is ignored through the `data_ignore` option in the configuration.
`is_directory`	Logical variable that indicates whether the file is a directory.
`is_cached`	Logical variable that indicates whether the file is already available in the `cache/` directory.
`cached_only`	Logical variable that indicates whether the variable is only available in the `cache/` directory. This occurs when calling the cache function with a code fragment in a munge script.
`reader`	Character variable containing the name of the reader function that will be used to load the data. Contains a `character(0)` if no suitable reader was found.

* Note that some readers return more than one variable, usually with the listed variable name as prefix. This is true, for example, for the xls.reader and xlsx.reader.

Value

A data.frame listing the available data, with relevant meta data

List all files and directories, excluding .. and .

Description

Creates a directory listing of a given path, including hidden files and subdirectories, but excluding the .. and . aliases.

Usage

.list.files.and.dirs(path)

Arguments

path

Character vector indicating the path to the parent folder of which the contents should be listed.

Value

Directory listing of path
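
A listing with these properties can be produced with base R's list.files. The sketch below is an assumption about one way to do it, not the package's implementation; the helper names list_files_and_dirs and dir_empty are hypothetical. It also shows how an emptiness check can be built on top of the listing.

```r
# Hypothetical helpers built on base R only.
list_files_and_dirs <- function(path) {
  # all.files = TRUE includes hidden entries; no.. = TRUE drops "." and "..".
  list.files(path, all.files = TRUE, no.. = TRUE)
}

dir_empty <- function(path) {
  length(list_files_and_dirs(path)) == 0
}

# Demonstrate on a temporary directory.
listing.demo <- tempfile("listing-demo")
dir.create(listing.demo)
empty.before <- dir_empty(listing.demo)

file.create(file.path(listing.demo, ".hidden"))
dir.create(file.path(listing.demo, "sub"))
listing <- list_files_and_dirs(listing.demo)
print(listing)

unlink(listing.demo, recursive = TRUE)
```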

Load the data from the cache and data directories

Description

Gets the list of available variables in cache/ and data/ and loads the data in memory. Data from the cache is loaded first, then in alphabetical order.

Usage

.load.data(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.

Load the helper functions

Description

Sources all helper scripts in lib. If lib/globals.R exists it is loaded first; all other scripts are then sourced in alphabetical order.

Usage

.load.helpers(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.
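
The sourcing order described above can be sketched with base R as follows. The helper name helper_order is hypothetical, and restricting the listing to files ending in .R or .r is an assumption; the sketch only computes the order and does not source anything.

```r
# Hypothetical helper: globals.R first, everything else alphabetically.
helper_order <- function(lib.dir) {
  scripts <- sort(list.files(lib.dir, pattern = "\\.[rR]$"))
  c(intersect("globals.R", scripts), setdiff(scripts, "globals.R"))
}

# Demonstrate on a temporary directory with empty script files.
lib.demo <- tempfile("lib-demo")
dir.create(lib.demo)
file.create(file.path(lib.demo, c("utils.R", "globals.R", "a_helpers.R")))

ordered <- helper_order(lib.demo)
print(ordered)
# Sourcing would then be: for (f in ordered) source(file.path(lib.demo, f))

unlink(lib.demo, recursive = TRUE)
```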

Load the libraries listed in the configuration into memory

Description

Load the libraries listed in the libraries entry in global.dcf and add the library names to the project.info.

Usage

.load.libraries(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.

Source all munge scripts

Description

Sources all munge scripts in the munge directory in alphabetical order.

Usage

.munge.data(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.

Get the current ProjectTemplate version

Description

Reads the installed version of ProjectTemplate from the DESCRIPTION file.

Usage

.package.version()

Value

Version as a character vector.

Match readers to the extensions of the data files

Description

Match readers to the extensions of the data files

Usage

.parse.extensions(data.files, config)

Arguments

data.files

a vector of paths to data files

config

List containing the configuration to use.

Value

A list of readers and varnames

Prepare a regular expression for matching files to be ignored

Description

Constructs a single regular expression for matching file names in data that should not be imported. It can detect literal names, globs with wildcards and regular expressions.

Usage

.prepare.data.ignore.regex(ignore_files)

Arguments

ignore_files

A comma separated character vector that lists all patterns to be matched for ignoring

Value

A chained regular expression that matches all patterns in the ignore_files variable.
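
One way to build such a chained expression is sketched below. The helper name prepare_ignore_regex is hypothetical, and the convention that regular expressions are marked by enclosing slashes is an assumption that this section does not state. Literal names and globs are both translated with utils::glob2rx.

```r
# Hypothetical helper; not the package's implementation.
prepare_ignore_regex <- function(ignore_files) {
  patterns <- trimws(strsplit(ignore_files, ",", fixed = TRUE)[[1]])
  patterns <- patterns[nzchar(patterns)]
  if (length(patterns) == 0) return("^$")  # matches no real file name

  # Assumption: a pattern written as /.../ is already a regular expression.
  is.regex <- grepl("^/.*/$", patterns)
  patterns[is.regex] <- sub("^/(.*)/$", "\\1", patterns[is.regex])

  # Literal names and globs are converted, which also escapes the dots.
  patterns[!is.regex] <- utils::glob2rx(patterns[!is.regex])

  paste0("(", patterns, ")", collapse = "|")
}

rx <- prepare_ignore_regex("Thumbs.db, *.bak, /^tmp_[0-9]+/")
files <- c("Thumbs.db", "old.bak", "tmp_12.csv", "data.csv")
ignored <- grepl(rx, files)
print(data.frame(files, ignored))
```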

Make sure a required directory exists before usage

Description

Checks if the requested directory exists, and if not creates the directory. In the latter case a warning is raised.

Usage

.provide.directory(name)

Arguments

name

Character vector containing the name of the required directory.

Value

No value is returned; this function is called for its side effects.
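
The behaviour described above could be sketched like this. The helper name provide_directory and the warning text are hypothetical; the demonstration muffles the warning so that it can show both the warning and the created directory.

```r
# Hypothetical helper: create the directory when missing and warn about it.
provide_directory <- function(name) {
  if (!dir.exists(name)) {
    warning("Creating missing directory ", name)
    dir.create(name, recursive = TRUE)
  }
  invisible(NULL)
}

provide.demo <- file.path(tempfile("provide-demo"), "cache")

# Record whether a warning was raised, then let execution continue.
warned <- FALSE
withCallingHandlers(
  provide_directory(provide.demo),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
created <- dir.exists(provide.demo)
print(c(warned = warned, created = created))

unlink(dirname(provide.demo), recursive = TRUE)
```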

Stop silently

Description

Temporarily disable option(show.error.messages) and stop execution.

Usage

.quietstop()

Value

No value is returned; this function is called for its side effects.
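
A silent stop of this kind can be sketched in base R as shown below. The helper name quiet_stop is hypothetical, and restoring the option with on.exit is an assumption about how the change is kept temporary.

```r
# Hypothetical helper: stop without printing an error message.
quiet_stop <- function() {
  old <- options(show.error.messages = FALSE)
  on.exit(options(old))  # restore the previous setting when the function exits
  stop()
}

option.before <- getOption("show.error.messages")
result <- tryCatch(quiet_stop(), error = function(e) "stopped")
option.after <- getOption("show.error.messages")
print(result)
```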

Read metadata for a variable in the cache

Description

Read metadata for a variable in the cache

Usage

.read.cache.info(variable)

Arguments

variable

Variable name for which to look up the metadata

Details

The returned object is a list with two fields:

in.cache: Logical indicating whether the requested variable was found in the cache
hash: A data.frame as was created by .create.cache.hash

Value

list with metadata, see Details for more info.

Remove variables to keep from a list of candidates for removal

Description

Remove variables to keep from a list of candidates for removal

Usage

.remove.sticky.vars(names, keep)

Arguments

names

character vector of variable names that are candidate for removal

keep

character vector of variable names that should not be removed

Details

If the sticky_variables option is part of the config variable the config variable itself is added to the list of variables to keep. Also all variables listed in config$sticky_variables in a comma separated list are added to keep.

Value

A character vector containing the variables to remove.
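
The filtering described in Details can be sketched as a set difference. In this sketch the helper name remove_sticky_vars and its explicit config argument are hypothetical; the configuration is passed in directly so that the example is self-contained.

```r
# Hypothetical helper; config is passed explicitly for illustration.
remove_sticky_vars <- function(names, keep, config = NULL) {
  if (!is.null(config$sticky_variables)) {
    sticky <- trimws(strsplit(config$sticky_variables, ",", fixed = TRUE)[[1]])
    # The config variable itself is kept, together with the listed variables.
    keep <- c(keep, "config", sticky)
  }
  setdiff(names, keep)
}

to.remove <- remove_sticky_vars(
  names  = c("a", "b", "config", "big"),
  keep   = "a",
  config = list(sticky_variables = "big, other")
)
print(to.remove)
```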

Require internal package

Description

Internal method to require a package that is necessary for the internal functioning of ProjectTemplate. Never attaches the package unless configured to do so in global.dcf (which throws a warning).

Usage

.require.package(package.name)

Arguments

package.name

name of the package to load, as a character vector

Value

No value is returned; this function is called for its side effects.

Return an RStudio project file as character vector

Description

Return an RStudio project file as character vector

Usage

.rstudioprojectfile()

Value

Character vector with the contents of an empty RStudio project file

Raise an error if given path is not a valid project

Description

Stops processing if the path is not a ProjectTemplate project; returns the project name if it is a ProjectTemplate directory.

Usage

.stopifnotproject(additional_message = "", path = getwd())

Arguments

additional_message

Optional message to show if the given path is not a valid project

path

Path to check if it is a valid project

Value

Project name if it is a valid Project.

Raise an error if given path is a valid project

Description

Function to stop processing if the path is a Project Template.

Usage

.stopifproject(additional_message = "", path = getwd())

Arguments

additional_message

Optional message to show if the given path is a valid project

path

Path to check if it is a valid project

Value

No value is returned; this function is called for its side effects.

Unload the project variables keeping the data

Description

Removes the config, logger and project.info variables from memory, leaving all data variables in place.

Usage

.unload.project()

Value

No value is returned; this function is called for its side effects.

Compare sets of variable names

Description

Compare the variables (excluding functions) in the global env with a passed in string of names and return the set difference.

Usage

.var.diff.from(given.var.list = "", env = .TargetEnv)

Arguments

given.var.list

Character vector of variable names

env

Environment in which to compare the sets of variables

Write a variable and its metadata to cache

Description

Write a variable and its metadata to cache

Usage

.write.cache(cache.hash, ...)

Arguments

cache.hash

a data.frame with metadata about the variable, see details for more information.

...

extra parameters passed to save.

Details

cache.hash is a data frame with two columns: variable and hash.
Row name VAR is the name of the variable to save.
Row name CODE is the hash value of the code to compute variable.
Row names DEPENDS.* are the dependent variables that CODE depends on.
The helper function .create.cache.hash creates a suitable data.frame.

Value

No value is returned; this function is called for its side effects.

Add project specific config to the global config

Description

Enables project specific configuration to be added to the global config object. The allowable format is key value pairs which are appended to the end of the config object, which is accessible from the global environment.

Usage

add.config(..., apply.override = FALSE)

Arguments

...

A series of key-value pairs containing the configuration. The key is the name that gets added to the config object. These can be overridden at load time through the ... argument to load.project.

apply.override

A boolean indicating whether overrides should be applied. This can be used to add a setting disregarding arguments to load.project

Details

Once defined, the value can be accessed from any ProjectTemplate script by referencing config$my_project_var.

Examples

library('ProjectTemplate')
## Not run: 
add.config(
    keep_bigdata=TRUE,     # Whether to keep the big data file in memory
    parse=7                # number of fields to parse
)

if (config$keep_bigdata) message('Keeping the big data file in memory')

## End(Not run)

Cache a data set for faster loading.

Description

This function will store a copy of the named data set in the cache directory. This cached copy of the data set will then be given precedence at load time when calling load.project. Cached data sets are stored as .RData or optionally as .qs files.

Usage

cache(variable = NULL, CODE = NULL, depends = NULL, ...)

Arguments

variable

A character string containing the name of the variable to be saved. If the CODE parameter is defined, it is evaluated and saved, otherwise the variable with that name in the global environment is used.

CODE

A sequence of R statements enclosed in {..} which produce the object to be cached. Requires suggested package formatR

depends

A character vector of other global environment objects that the CODE depends upon. Caching will be forced if those objects have changed since last caching

...

Additional arguments passed on to save or optionally to qsave. See project.config for further information.

Details

Usually you will want to cache datasets during munging. This can be the raw data just loaded, or it can be the result of further processing during munge. Either way, it can take a while to cache large variables, so cache will only cache when it needs to. The clear.cache("variable") command can be run to flush individual items from the cache.

Calling cache() with no arguments returns the current status of the cache.

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')
## Not run: 
create.project('tmp-project')

setwd('tmp-project')

dataset1 <- 1:5
cache('dataset1')

setwd('..')
unlink('tmp-project', recursive = TRUE)  # recursive = TRUE is needed to delete a directory

## End(Not run)

Translate a variable name into a file name for caching.

Description

This function will translate a variable name into a form that is suitable as a filename on most OS's.

Usage

cache.name(data.filename)

Arguments

data.filename

The variable name to be translated into a filename.

Value

A translated variable name.

Examples

library('ProjectTemplate')

## Not run: cache.name('example.1')

Cache a project's data sets in binary format.

Description

This function will cache all of the data sets that were loaded by the load.project function in a binary format that is easier to load quickly. This is particularly useful for data sets that you've modified during a slow munging process that does not need to be repeated.

Usage

cache.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')
## Not run: 
load.project()

cache.project()

## End(Not run)

Translate a file name into a valid R variable name.

Description

This function will translate a file name into a name that is a valid variable name in R. Non-alphabetic characters on the boundaries of the file name will be stripped; non-alphabetic characters inside of the file name will be replaced with dots.

Usage

clean.variable.name(variable.name, config = .load.config())

Arguments

variable.name

A character vector containing a variable's proposed name that should be standardized.

config

A list of configuration variables. Defaults to those loaded by load.project

Value

A translated variable name.

Examples

library('ProjectTemplate')

## Not run: clean.variable.name('example_1')
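
The translation described above can be approximated in base R as follows. This is a simplified sketch, not the package's implementation: the helper name clean_name_sketch is hypothetical, digits are treated like letters, configuration options are ignored, and make.names is applied at the end as an extra safeguard.

```r
# Hypothetical, simplified version of the translation rules.
clean_name_sketch <- function(variable.name) {
  # Strip disallowed characters from both ends of the name.
  x <- gsub("^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", variable.name)
  # Replace runs of disallowed characters inside the name with a dot.
  x <- gsub("[^A-Za-z0-9]+", ".", x)
  # Guarantee a syntactically valid R name.
  make.names(x)
}

cleaned <- clean_name_sketch(c("example_1", "_my data-file_", "2019 sales"))
print(cleaned)
```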

Clear objects from the global environment

Description

This function removes specific (or all by default) named objects from the global environment. If used within a ProjectTemplate project, then any variables defined in the config$sticky_variables will remain.

Usage

clear(..., keep = c(), force = FALSE)

Arguments

...

A sequence of character strings of the objects to be removed from the global environment. If none given, then all items except those in keep will be deleted. This includes items beginning with .

keep

A character vector of variables that should remain in the global environment

force

If TRUE, then variables will be deleted even if specified in keep or config$sticky_variables

Value

The variables kept and removed are reported

Examples

library('ProjectTemplate')
## Not run: 
clear("x", "y", "z")
clear(keep="a")
clear()

## End(Not run)

Clear data sets from the cache

Description

This function removes specific (or all by default) named data sets from the cache directory. This will force that data to be read in from the data directory next time load.project is called.

Usage

clear.cache(...)

Arguments

...

A sequence of character strings of the variables to be removed from the cache. If none given, then all items in the cache will be removed.

Value

Success or failure is reported

Examples

library('ProjectTemplate')
## Not run: 
clear.cache("x", "y", "z")

## End(Not run)

Create a new project.

Description

This function will create all of the scaffolding for a new project. It will set up all of the relevant directories and their initial contents. For those who only want the minimal functionality, the template argument can be set to minimal to create a subset of ProjectTemplate's default directories. For those who want to dump all of ProjectTemplate's functionality into a directory for extensive customization, the dump argument can be set to TRUE.

Usage

create.project(
  project.name = "new-project",
  template = "full",
  dump = FALSE,
  merge.strategy = c("require.empty", "allow.non.conflict"),
  rstudio.project = FALSE
)

Arguments

project.name

A character vector containing the name for this new project. Must be a valid directory name for your file system.

template

A character vector containing the name of the template to use for this project. By default a full and minimal template are provided, but custom templates can be created using create.template.

dump

A boolean value indicating whether the entire functionality of ProjectTemplate should be written out to flat files in the current project.

merge.strategy

What should happen if the target directory exists and is not empty? If "require.empty", the target directory must be empty; if "allow.non.conflict", the method succeeds if no files or directories with the same name exist in the target directory.

rstudio.project

A boolean value indicating whether the project should also be an 'RStudio Project'. Defaults to FALSE. If TRUE, then a 'projectname.Rproj' with usable defaults is added to the ProjectTemplate directory.

Details

If the target directory does not exist, it is created. Otherwise, it can only contain files and directories allowed by the merge strategy.

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: create.project('MyProject')

Create a new template

Description

This function writes a skeleton directory structure for creating your own custom templates.

Usage

create.template(target, source = "minimal")

Arguments

target

Name of the new template. It is created under the directory specified by options('ProjectTemplate.templatedir'), or, when missing, in the current directory.

source

Name of an existing template to copy, defaults to the built in 'minimal' template.

Show information about the current project.

Description

This function will return all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.

Usage

get.project()

Details

In previous releases this information has been available through the global variable project.info. Using this variable is now deprecated and will result in a warning.

Value

A named list.

Examples

library('ProjectTemplate')

## Not run: 
load.project()

get.project()

## End(Not run)

Listing the data for the current project

Description

This function produces a data.frame of all data files in the project, with meta data on if and how the file will be loaded by load.project.

Usage

list.data(...)

Arguments

...

Named arguments to override configuration from config/global.dcf and lib/globals.R.

Details

The returned data.frame contains the following variables, with one observation per file in data/:

`filename`	Character variable containing the filename relative to `data/` directory.
`varname`	Character variable containing the name of the variable into which the file will be imported. *
`is_ignored`	Logical variable that indicates whether the file is ignored through the `data_ignore` option in the configuration.
`is_directory`	Logical variable that indicates whether the file is a directory.
`is_cached`	Logical variable that indicates whether the file is already available in the `cache/` directory.
`cached_only`	Logical variable that indicates whether the variable is only available in the `cache/` directory. This occurs when calling the cache function with a code fragment in a munge script.
`reader`	Character variable containing the name of the reader function that will be used to load the data. Contains a `character(0)` if no suitable reader was found.

* Note that some readers return more than one variable, usually with the listed variable name as prefix. This is true, for example, for the xls.reader and xlsx.reader.

Value

A data.frame listing the available data, with relevant meta data

Examples

library('ProjectTemplate')

## Not run: list.data()

Automatically load data and packages for a project.

Description

This function automatically loads all of the data and packages used by the project from which it is called. The behavior can be controlled by adjusting the project.config configuration.

Usage

load.project(...)

Arguments

...

Named arguments to override configuration from config/global.dcf and lib/globals.R.

Details

... can take an argument override.config or a single named list for backward compatibility. This cannot be mixed with the new style override. When a named argument override.config is present it takes precedence over the other options. If any of the provided arguments is unnamed an error is raised.
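
The argument rules above can be sketched as a small stand-alone function. This is an illustration of the described precedence, not the package's code: the helper name collect_overrides, the option name some_option and the error message are all hypothetical.

```r
# Hypothetical helper that resolves overrides passed through '...'.
collect_overrides <- function(...) {
  args <- list(...)
  if (length(args) == 0) return(list())

  # A named override.config argument takes precedence over everything else.
  if ("override.config" %in% names(args)) return(args[["override.config"]])

  # Backward compatibility: a single list passed as the only argument.
  if (length(args) == 1 && is.null(names(args)) && is.list(args[[1]])) {
    return(args[[1]])
  }

  # New style: every argument must be named.
  if (is.null(names(args)) || any(names(args) == "")) {
    stop("All arguments must be named")
  }
  args
}

new.style <- collect_overrides(some_option = FALSE)
old.style <- collect_overrides(list(some_option = FALSE))
precedence <- collect_overrides(override.config = list(a = 1), b = 2)
unnamed <- tryCatch(collect_overrides(1, b = 2), error = function(e) e)
```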

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: load.project()
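The two override styles described under Details might be used as in the sketch below. The wrapper names are hypothetical and the chosen options (munging, cache_loading) are just examples taken from the configuration items; the functions do nothing until they are called from inside a ProjectTemplate project directory.

```r
# Hypothetical wrappers around load.project(); defining them has no side effects.

# New style: each named argument overrides one configuration item.
load_without_munging <- function() {
  ProjectTemplate::load.project(munging = FALSE, cache_loading = TRUE)
}

# Backward-compatible style: a single override.config list.
load_without_munging_legacy <- function() {
  ProjectTemplate::load.project(override.config = list(munging = FALSE))
}
```

The two styles should not be combined in one call, since override.config takes precedence over the other options.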

Load Project

Description

Call this function as an addin to load the library and run 'load.project()'.

Usage

loadproject_addin()

Migrates a project from a previous version of ProjectTemplate

Description

This function automatically performs all necessary steps to migrate an existing project so that it is compatible with this version of ProjectTemplate.

Usage

migrate.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: migrate.project()

Migrate a template to a new version of ProjectTemplate

Description

This function updates a skeleton project to the current version of ProjectTemplate.

Usage

migrate.template(template)

Arguments

template

Name of the template to upgrade.

Automatically read data into memory

Description

The preinstalled readers are automatically loaded in the list preinstalled.readers. The reader functions will load a data set stored in the data directory into the specified global variable binding. These functions are not meant to be called directly.

Usage

preinstalled.readers

arff.reader(data.file, filename, variable.name)

csv.reader(data.file, filename, variable.name)

csv2.reader(data.file, filename, variable.name)

db.reader(data.file, filename, variable.name)

dbf.reader(data.file, filename, variable.name)

epiinfo.reader(data.file, filename, variable.name)

feather.reader(data.file, filename, variable.name)

file.reader(data.file, filename, variable.name)

mp3.reader(data.file, filename, variable.name)

mtp.reader(data.file, filename, variable.name)

octave.reader(data.file, filename, variable.name)

ppm.reader(data.file, filename, variable.name)

r.reader(data.file, filename, variable.name)

rdata.reader(data.file, filename, variable.name)

rds.reader(data.file, filename, variable.name)

spss.reader(data.file, filename, variable.name)

sql.reader(data.file, filename, variable.name)

stata.reader(data.file, filename, variable.name)

systat.reader(data.file, filename, variable.name)

tsv.reader(data.file, filename, variable.name)

url.reader(data.file, filename, variable.name)

wsv.reader(data.file, filename, variable.name)

xls.reader(data.file, filename, workbook.name)

xlsx.reader(data.file, filename, workbook.name)

xport.reader(data.file, filename, variable.name)

Arguments

data.file

The name of the data file to be read.

filename

The path to the data set to be loaded.

variable.name

The name of the variable to which the data will be assigned in the global environment.

Format

An object of class list of length 55.

Details

Some file formats can contain more than one dataset. In this case all datasets are loaded into separate variables in the format <variable.name>.<subset.name>, where the subset.name is determined by the reader automatically.

The sql.reader function will load data from a SQL database based on configuration information found in the specified .sql file. The .sql file must specify a database to be accessed. The data set can be generated from all tables in the database, from one specific table, or from one specific query against any set of tables.

Queries support string interpolation using mustache syntax (http://mustache.github.io), which executes code snippets delimited by {{...}}. This is used to create queries that depend on data from other sources.

Example: query: SELECT * FROM my_table WHERE id IN ({{ids}}). Here ids is a vector previously loaded into the global environment through ProjectTemplate.
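To make the idea of interpolation concrete, the following base R sketch substitutes a vector into the query template from the example. It only illustrates the substitution step and is not the package's own rendering code; the values in ids are invented, and the exact way the package formats a vector may differ.

```r
# Hypothetical vector that an earlier data file would have put in the
# global environment.
ids <- c(3, 7, 12)

query_template <- "SELECT * FROM my_table WHERE id IN ({{ids}})"

# Replace the {{ids}} placeholder with a comma separated list of values.
rendered <- sub("{{ids}}", paste(ids, collapse = ", "), query_template,
                fixed = TRUE)
print(rendered)
```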

Examples of the DCF format and settings used in a .sql file are shown below:

Example 1
type: mysql
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table

Example 2
type: mysql
user: sample_user
password: sample_password
host: localhost
port: 3306
socket: /Applications/MAMP/tmp/mysql/mysql.sock
dbname: sample_database
table: sample_table

Example 3
type: sqlite
dbname: /path/to/sample_database
table: sample_table

Example 4
type: sqlite
dbname: /path/to/sample_database
query: SELECT * FROM users WHERE user_active == 1

Example 5
type: sqlite
dbname: /path/to/sample_database
table: *

Example 6
type: postgres
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table

Example 7
type: odbc
dsn: sample_dsn
user: sample_user
password: sample_password
dbname: sample_database
query: SELECT * FROM sample_table

Example 8
type: oracle
user: sample_user
password: sample_password
dbname: sample_database
table: sample_table

Example 9
type: jdbc
class: oracle.jdbc.OracleDriver
classpath: /path/to/ojdbc5.jar (or set in CLASSPATH)
user: scott
password: tiger
url: jdbc:oracle:thin:@myhost:1521:orcl
query: select * from emp

Example 10
type: heroku
classpath: /path/to/jdbc4.jar (or set in CLASSPATH)
user: scott
password: tiger
host: heroku.postgres.url
port: 1234
dbname: herokudb
query: select * from emp

Example 11
In this example RSQLite::initExtension() is automatically called on the established connection.

Liam Healy has written extension-functions.c, which is available on http://www.sqlite.org/contrib. It provides mathematical and string extension functions for SQL queries using the loadable extensions mechanism.

type: sqlite
dbname: /path/to/sample_database
plugin: extension
query: SELECT *,STDEV(value1) FROM example_table
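Because a .sql file is plain DCF, its settings can be inspected with base R's read.dcf(). The sketch below writes the settings from Example 4 to a temporary file and reads them back; it only illustrates the file format and does not connect to any database.

```r
# Write a DCF description like Example 4 to a temporary .sql file.
sql_file <- tempfile(fileext = ".sql")
writeLines(c(
  "type: sqlite",
  "dbname: /path/to/sample_database",
  "query: SELECT * FROM users WHERE user_active == 1"
), sql_file)

# read.dcf() returns a character matrix with one column per field.
settings <- read.dcf(sql_file)
print(colnames(settings))
print(settings[1, "query"])

unlink(sql_file)
```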

Value

No value is returned; the reader functions are called for their side effects.

Functions

arff.reader(): Read the Weka file format from files with the .arff extension.
csv.reader(): Read a comma separated values file with the .csv extension.
csv2.reader(): Read a semicolon separated values file with the .csv2 extension.

In May 2018, the default behavior of the reader for .csv2 files changed to use R's read.csv2(), where the field separator is assumed to be ';' and the decimal separator to be ','.
db.reader(): Read a SQLite3 database with a .db file extension.

If you want to specify a single table or query to execute against the database, move it elsewhere and use a .sql file interpreted by sql.reader.
dbf.reader(): Read an XBASE file with a .dbf file extension.
epiinfo.reader(): Read an Epi Info file with a .rec file extension.
feather.reader(): Read a feather file in Apache Arrow format with a .feather file extension.
file.reader(): Read an arbitrary file described in a .file file.

A .file file must contain DCF that specifies the path to the data set and which extension should be used from the dispatch table to load the data set.

Examples of the DCF format and settings used in a .file file are shown below:

path: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
extension: csv
mp3.reader(): Read an MP3 file with a .mp3 file extension.

This function will load the specified MP3 file into memory using the tuneR package. This is useful for working with music files as a data set.
mtp.reader(): Read a Minitab Portable Worksheet with a .mtp3 file extension.
octave.reader(): Read an Octave file with a .m file extension.

This function will load the specified Octave file into memory using the foreign::read.octave function.
ppm.reader(): Read a PPM file with a .ppm file extension.

Data is loaded using the pixmap::read.pnm function.
r.reader(): Read an R source file with a .R file extension.

This function will call source on the specified R file, executing the code inside of it as a way of generating data sets dynamically, as in many Monte Carlo applications.
rdata.reader(): Read an RData file with a .rdata or .rda file extension.

This function will load the specified RData file into memory using the load function. This may generate many data sets simultaneously.
rds.reader(): Read the RDS file format from files with the .rds extension.
spss.reader(): Read an SPSS file with a .sav file extension.

This function will load the specified SPSS file into memory. It will convert the resulting list object into a data frame before inserting the data set into the global environment.
sql.reader(): Read a database described in a .sql file.
stata.reader(): Read a Stata file with a .stata file extension.
systat.reader(): Read a Systat file with a .sys or .syd file extension.
tsv.reader(): Read a tab separated values file with the .tsv or .tab file extensions.
url.reader(): Read a remote file described in a .url file.

This function will load data from a remote source accessible through HTTP or FTP based on configuration information found in the specified .url file. The .url file must specify the URL of the remote data source and the type of data that is available remotely. Only one data source per .url file is supported currently.

Examples of the DCF format and settings used in a .url file are shown below:

Example 1
url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
separator: ,
wsv.reader(): Read a whitespace separated values file with the .wsv or .txt file extensions.
xls.reader(): Read an Excel file with a .xls file extension.

This function will load the specified Excel file into memory using the readxl package.
xlsx.reader(): Read an Excel 2007 file with a .xlsx file extension.

This function will load the specified Excel file into memory using the readxl package.
xport.reader(): Read an XPort file with a .xport file extension.
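
As a small illustration of the .csv2 convention mentioned above, the following base R sketch parses semicolon separated text with read.csv2(). The sample values are invented, and the snippet calls read.csv2() directly rather than going through csv2.reader.

```r
# Semicolons separate fields and commas mark the decimal point.
csv2_text <- "city;temperature\nLyon;21,5\nGraz;18,25"

weather <- read.csv2(text = csv2_text)
print(weather)

# The decimal commas are converted, so the column is numeric.
print(class(weather$temperature))
```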

ProjectTemplate Configuration file

Description

Every ProjectTemplate project has a configuration file found at config/global.dcf that contains various options that can be tweaked to control runtime behavior. The valid options are shown below, and must be encoded using the DCF format.

Usage

project.config()

Details

Calling the project.config() function will display the current project configuration.

The options that can be configured in config/global.dcf are shown below:

`data_loading`	This can be set to TRUE or FALSE. If data_loading is on, the system will load data from both the cache and data directories with cache taking precedence in the case of name conflict.
`data_loading_header`	This can be set to TRUE or FALSE. If data_loading_header is on, the system will load text data files, such as CSV, TSV, or XLSX, treating the first row as header.
`data_ignore`	A comma separated list of files to be ignored when importing from the `data/` directory. Regular expressions can be used but should be delimited (on both sides) by `/`. Note that filenames and filepaths should never begin with a `/`. Entire directories under `data/` can be ignored by adding a trailing `/`.
`cache_loading`	This can be set to TRUE or FALSE. If cache_loading is on, the system will load data from the cache directory before any attempt to load from the data directory.
`recursive_loading`	This can be set to TRUE or FALSE. If recursive_loading is on, the system will load data from the data directory and all its sub directories recursively.
`munging`	This can be set to TRUE or FALSE. If munging is on, the system will execute the files in the munge directory sequentially using the order implied by the sort() function. If munging is FALSE, none of the files in the munge directory will be executed.
`logging`	This can be set to TRUE or FALSE. If logging is on, a logger object using the log4r package is automatically created when you run load.project(). This logger will write to the logs directory.
`logging_level`	The value of logging_level is passed to a logger object using the log4r package during logging when you run load.project().
`load_libraries`	This can be set to TRUE or FALSE. If load_libraries is on, the system will load all of the R packages listed in the libraries field described below.
`libraries`	This is a comma separated list of all the R packages that the user wants to automatically load when load.project() is called. These packages must already be installed before calling load.project().
`as_factors`	This can be set to TRUE or FALSE. If as_factors is on, the system will convert every character vector into a factor when creating data frames; most importantly, this automatic conversion occurs when reading in data automatically. If FALSE, character vectors will remain character vectors.
`tables_type`	This is the format for default tables. Values can be 'tibble' (default), 'data_table', or 'data_frame'
`attach_internal_libraries`	This can be set to TRUE or FALSE. If attach_internal_libraries is on, then every time a new package is loaded into memory during load.project() a warning will be displayed informing the user that this has happened.
`cache_loaded_data`	This can be set to TRUE or FALSE. If cache_loaded_data is on, then data loaded from the data directory during load.project() will be automatically cached (so it won't need to be reloaded next time load.project() is called).
`sticky_variables`	This is a comma separated list of any project-specific variables that should remain in the global environment after a `clear()` command. This can be used to clear the global environment, but keep any large datasets in place so they are not unnecessarily re-generated during `load.project()`. Note that this will be overridden if the `force=TRUE` parameter is passed to `clear()`.
`underscore_variables`	This can be set to `TRUE` to use underscores ('_') in variable names or `FALSE` to replace underscores ('_') with dots ('.'). The default is `TRUE`. When migrating old projects, `underscore_variables` is set to `FALSE`.
`cache_file_format`	The default file format for cached data is 'RData'. This can be set to 'qs' in order to benefit from the quick serialization of R objects provided by qs.
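
Since munge scripts run in the order given by sort(), numeric prefixes in file names are compared as text, not as numbers. The file names below are hypothetical; the sketch shows why zero-padded prefixes are a safer choice.

```r
# Without padding, "10-merge.R" sorts ahead of "2-clean.R"
# because the character "1" comes before "2".
unpadded <- sort(c("1-load.R", "2-clean.R", "10-merge.R"))
print(unpadded)

# With zero-padded prefixes the text order matches the intended order.
padded <- sort(c("01-load.R", "02-clean.R", "10-merge.R"))
print(padded)
```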

If config/global.dcf is missing some items (for example because it was created under an old version of ProjectTemplate), then the following configuration is used for any missing items during load.project():

`data_loading`	`TRUE`
`data_loading_header`	`TRUE`
`data_ignore`
`cache_loading`	`TRUE`
`recursive_loading`	`FALSE`
`munging`	`TRUE`
`logging`	`FALSE`
`logging_level`	`INFO`
`load_libraries`	`FALSE`
`libraries`	`reshape2, plyr, tidyverse, stringr, lubridate`
`as_factors`	`FALSE`
`tables_type`	`tibble`
`attach_internal_libraries`	`TRUE`
`cache_loaded_data`	`FALSE`
`sticky_variables`	`NONE`
`underscore_variables`	`FALSE`
`cache_file_format`	`RData`

When a new project is created using create.project(), the following values are pre-populated:

`version`	`0.11.0`
`data_loading`	`TRUE`
`data_loading_header`	`TRUE`
`data_ignore`
`cache_loading`	`TRUE`
`recursive_loading`	`FALSE`
`munging`	`TRUE`
`logging`	`FALSE`
`logging_level`	`INFO`
`load_libraries`	`FALSE`
`libraries`	`reshape2, plyr, tidyverse, stringr, lubridate`
`as_factors`	`FALSE`
`tables_type`	`tibble`
`attach_internal_libraries`	`FALSE`
`cache_loaded_data`	`TRUE`
`sticky_variables`	`NONE`
`underscore_variables`	`TRUE`
`cache_file_format`	`RData`
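
The fallback for missing items can be pictured as merging the entries found in the file over a list of defaults. The following is only an illustration of that idea using base R and a temporary file with made-up contents; it is not the package's own implementation, and only a few of the defaults for missing items are included.

```r
# A subset of the defaults used for missing items (values kept as strings,
# as they would appear in a DCF file).
defaults <- list(
  data_loading         = "TRUE",
  munging              = "TRUE",
  cache_loaded_data    = "FALSE",
  underscore_variables = "FALSE"
)

# Hypothetical configuration file from an older project with few entries.
config_file <- tempfile(fileext = ".dcf")
writeLines(c("version: 0.8", "munging: FALSE"), config_file)

# Entries present in the file win; everything else falls back to the defaults.
from_file <- as.list(read.dcf(config_file)[1, ])
config <- utils::modifyList(defaults, from_file)
str(config)

unlink(config_file)
```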

Value

The current project configuration is displayed.

Reload or reset a project

Description

This function will clear the global environment and reload a project. This is useful when you've updated your data sets or changed your preprocessing scripts. By default, any variables listed in the sticky_variables configuration parameter in project.config will remain both in memory and (if present) in the cache. If the reset parameter is TRUE, then all variables are cleared from both the global environment and the cache.

Usage

reload.project(..., reset = FALSE)

Arguments

...

Optional parameters passed to load.project

reset

A boolean value which, if set to TRUE, clears the cache and everything in the global environment, including any sticky_variables.

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: load.project()

reload.project()
## End(Not run)

Reload Project

Description

Call this function as an addin to load the library and run 'reload.project()'.

Usage

reloadproject_addin()

Require a package for use in the project

Description

This function will require the given package. If the package is not installed, it will stop execution and print a message telling the user which package to install and which function caused the error.

Usage

require.package(package.name, attach = TRUE)

Arguments

package.name

A character vector containing the package name. Must be a valid package name installed on the system.

attach

Should the package be attached to the search path (as with library) or not (as with loadNamespace)? Defaults to TRUE. (Internal code will use FALSE by default unless a compatibility switch is set, see below.)

Details

The function .require.package is called by internal code. It will attach the package to the search path (with a warning) only if the compatibility configuration attach_internal_libraries is set to TRUE. Normally, packages used for loading data are not needed on the search path, but not loading them might break existing code. In a forthcoming version this compatibility setting will be removed, and no packages will be attached to the search path by internal code.

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: require.package('PackageName')
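
The difference between attaching a package and only loading its namespace, which the attach argument controls, can be seen with base R alone. The sketch below uses the tools package, which ships with R, as a stand-in; it does not call require.package itself.

```r
# Load a namespace without attaching it (comparable to attach = FALSE).
loaded <- requireNamespace("tools", quietly = TRUE)

# Functions remain reachable through `::` even when nothing is attached.
ext <- tools::file_ext("data/sample.csv")
print(ext)

# Attaching puts the package on the search path (comparable to attach = TRUE).
library(tools)
attached <- "package:tools" %in% search()
print(attached)
```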

Run all of the analyses in the `src` directory.

Description

This function will run each of the analyses in the src directory in separate processes. At present, this is done serially, but future versions of this function will provide a means of running the analyses in parallel.

Usage

run.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: run.project()

Show information about the current project.

Description

This function will show the user all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.

Usage

show.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: load.project()

show.project()
## End(Not run)

Generate unit tests for your helper functions.

Description

This function will parse all of the functions defined in files inside of the lib directory and will generate a trivial unit test for each function. The resulting tests are stored in the file tests/autogenerated.R. Every test is expected to fail by default, so you should edit them before calling test.project.

Usage

stub.tests()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: stub.tests()

Run all unit tests for this project.

Description

This function will run all of the testthat style unit tests for the current project that are defined inside of the tests directory. The tests will be run in the order defined by the filenames for the tests: it is recommended that each test begin with a number specifying its position in the sequence.

Usage

test.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: load.project()

test.project()
## End(Not run)

Read a DCF file into an R list.

Description

This function will read a DCF file and translate the resulting data frame into a list. The DCF format is used throughout ProjectTemplate for configuration settings and ad hoc file format specifications.

Usage

translate.dcf(filename)

Arguments

filename

A character vector specifying the DCF file to be translated.

Details

The contents of the DCF file are stored as character strings. If the content is placed between backtick characters (`), it is evaluated as R code and the result is returned as a string.

Value

Returns a list containing the entries from the DCF file.

Examples

library('ProjectTemplate')

## Not run: translate.dcf(file.path('config', 'global.dcf'))
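
The backtick rule described under Details can be mimicked with base R, as in the sketch below. The helper evaluate_backticks and the field names are hypothetical; this illustrates the behaviour on a temporary file and is not the code used by translate.dcf.

```r
# Hypothetical DCF file with one plain entry and one backtick-delimited entry.
dcf_file <- tempfile(fileext = ".dcf")
writeLines(c("static: hello", "computed: `1 + 2`"), dcf_file)

entries <- as.list(read.dcf(dcf_file)[1, ])

# Evaluate a value as R code only when it is wrapped in backticks,
# and return the result as a string.
evaluate_backticks <- function(value) {
  if (grepl("^`.*`$", value)) {
    code <- sub("^`(.*)`$", "\\1", value)
    as.character(eval(parse(text = code)))
  } else {
    value
  }
}

result <- lapply(entries, evaluate_backticks)
str(result)

unlink(dcf_file)
```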
