Type: Package
Title: Automates the Creation of New Statistical Analysis Projects
Version: 0.11.0
Date: 2024-07-01
Description: Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.
License: GPL-3 | file LICENSE
Language: en-US
LazyLoad: yes
Encoding: UTF-8
Depends: R (≥ 2.7), digest, tibble
Imports: methods
Suggests: foreign, feather, reshape, plyr, formatR, qs, stringr, ggplot2, lubridate, log4r (≥ 0.1-5), DBI, RMySQL, RSQLite, gdata, RODBC, RJDBC, readxl, xlsx, tuneR, pixmap, data.table, RPostgreSQL, GetoptLong, whisker, testthat (≥ 3.0.0), reticulate
URL: http://projecttemplate.net
BugReports: https://github.com/KentonWhite/ProjectTemplate/issues
Collate: 'ProjectTemplate-package.R' 'add.config.R' 'preinstalled.readers.R' 'add.extension.R' 'addins.R' 'arff.reader.R' 'get.project.R' 'cache.R' 'cache.name.R' 'cache.project.R' 'clean.variable.name.R' 'clear.R' 'clear.cache.R' 'translate.dcf.R' 'config.R' 'create.project.R' 'create.project.rstudio.R' 'create.template.R' 'csv.reader.R' 'csv2.reader.R' 'db.reader.R' 'dbf.reader.R' 'epiinfo.reader.R' 'feather.reader.R' 'file.reader.R' 'list.data.R' 'load.project.R' 'migrate.project.R' 'migrate.template.R' 'mp3.reader.R' 'mtp.reader.R' 'octave.reader.R' 'ppm.reader.R' 'project.config.R' 'r.reader.R' 'rdata.reader.R' 'rds.reader.R' 'reload.project.R' 'require.package.R' 'run.project.R' 'show.project.R' 'spss.reader.R' 'sql.reader.R' 'stata.reader.R' 'stopifnotproject.R' 'stub.tests.R' 'systat.reader.R' 'test.project.R' 'tsv.reader.R' 'url.reader.R' 'wsv.reader.R' 'xls.reader.R' 'xlsx.reader.R' 'xport.reader.R'
RoxygenNote: 7.3.1
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2024-07-01 18:05:05 UTC; kwhite
Author: Aleksandar Blagotic [ctb], Diego Valle-Jones [ctb], Jeffrey Breen [ctb], Joakim Lundborg [ctb], John Myles White [aut, cph], Josh Bode [ctb], Kenton White [ctb, cre], Kirill Mueller [ctb], Matteo Redaelli [ctb], Noah Lorang [ctb], Patrick Schalk [ctb], Dominik Schneider [ctb], Gerold Hepp [ctb], Zunaira Jamil [ctb], Glen Falk [ctb]
Maintainer: Kenton White <jkentonwhite@gmail.com>
Repository: CRAN
Date/Publication: 2024-07-01 18:30:06 UTC

ProjectTemplate: Automates the Creation of New Statistical Analysis Projects

Description

Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.

Author(s)

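Maintainer: Kenton White <jkentonwhite@gmail.com> [contributor]

Authors:

John Myles White [aut, cph]

Other contributors:

Aleksandar Blagotic, Diego Valle-Jones, Jeffrey Breen, Joakim Lundborg, Josh Bode, Kirill Mueller, Matteo Redaelli, Noah Lorang, Patrick Schalk, Dominik Schneider, Gerold Hepp, Zunaira Jamil, Glen Falk [contributor]

See Also

Useful links:

http://projecttemplate.net
Report bugs at https://github.com/KentonWhite/ProjectTemplate/issues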


Associate a reader function with an extension.

Description

This function will associate an extension with a custom reader function.

Usage

.add.extension(extension, reader)

Arguments

extension

The extension of the new data file.

reader

The function to use when reading the data file. It should accept three arguments: data.file, filename and variable.name (in that order). The function should read the contents of the file filename, and save it into the workspace under the name variable.name. The data.file argument is just a relative file name and can be ignored.

Value

No value is returned; this function is called for its side effects.

Warning

This interface should not be considered stable and is likely to be replaced by a different mechanism in a forthcoming version of this package.

See Also

preinstalled.readers

Examples

## Not run: .add.extension('foo', foo.reader)
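The example above registers a foo.reader that is never defined. A minimal sketch of such a reader follows, assuming (hypothetically) that '.foo' files are whitespace-separated text with a header row. It uses the three-argument signature described under Arguments and stores its result in the global environment; foo.reader and the file layout are illustrative only.

```r
# Hypothetical reader for '.foo' files, assumed to be whitespace-separated
# text with a header row. Signature: (data.file, filename, variable.name).
foo.reader <- function(data.file, filename, variable.name) {
  # data.file is only a relative file name and is not needed here
  contents <- read.table(filename, header = TRUE)
  # Store the data in the workspace under the requested name
  assign(variable.name, contents, envir = .GlobalEnv)
}

# Standalone check with a temporary file
tmp <- tempfile(fileext = ".foo")
writeLines(c("x y", "1 2", "3 4"), tmp)
foo.reader("example.foo", tmp, "example.foo")
print(nrow(example.foo))

# Registering it requires ProjectTemplate (unstable internal interface):
# .add.extension('foo', foo.reader)
```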

Attach a package or add a namespace

Description

Internal method to attach a package or only add the namespace.

Usage

.attach.or.add.namespace(package.name, attach)

Arguments

package.name

name of the package to load, as a character vector

attach

Logical indicating whether to attach the package to the search path; if FALSE, only its namespace is loaded

Value

Boolean indicating whether the package was successfully loaded
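The difference between the two modes can be sketched in base R, assuming require() for attaching and requireNamespace() for namespace-only loading. This is an illustration under that assumption, not the package's own code; the function name is hypothetical.

```r
# Hypothetical sketch: attach the package, or only load its namespace.
attach.or.add.namespace.sketch <- function(package.name, attach) {
  if (attach) {
    # Loads the namespace and attaches the package to the search path
    suppressPackageStartupMessages(
      require(package.name, character.only = TRUE, quietly = TRUE)
    )
  } else {
    # Loads the namespace only; nothing is added to the search path
    requireNamespace(package.name, quietly = TRUE)
  }
}

attach.or.add.namespace.sketch("tools", attach = FALSE)
```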


Construct the file names for the cache and hash

Description

Construct the file names for the cache and hash

Usage

.cache.filename(variable, cache_format)

Arguments

variable

Variable name for which to construct file names

cache_format

expression as returned by .cache.format

Details

The returned object is a list with two fields:

Value

A list with file names


Get configured cache file format strategy

Description

Get configured cache file format strategy

Usage

.cache.format()

Value

A named object of mode expression.


Calculate the hash of the data stored in a variable

Description

Calculate the hash of the data stored in a variable

Usage

.cache.hash(variables, env = .TargetEnv)

Arguments

variables

character vector of variable names

env

environment from which to load the variable

Details

The hashes are calculated using the digest::digest function.

Value

data.frame with the variable names and the corresponding hashes
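A minimal sketch of the returned structure: one hash per variable, collected into a data.frame. To stay within base R it swaps in tools::md5sum() over a serialized copy of each object instead of digest::digest, which the package itself uses, and it defaults to the global environment rather than .TargetEnv. The function name is hypothetical.

```r
# Hypothetical sketch: hash each named variable and return a data.frame.
# Assumption: tools::md5sum on a saveRDS() copy stands in for digest::digest.
cache.hash.sketch <- function(variables, env = globalenv()) {
  hashes <- vapply(variables, function(v) {
    tmp <- tempfile()
    on.exit(unlink(tmp))
    saveRDS(get(v, envir = env), tmp, compress = FALSE)
    unname(tools::md5sum(tmp))
  }, character(1))
  data.frame(variable = variables, hash = unname(hashes),
             stringsAsFactors = FALSE)
}
```

Identical objects produce identical hashes, so a changed hash signals that a cached copy is stale.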


Print the current cache status

Description

Print the current cache status

Usage

.cache.status()

Value

No value is returned; this function is called for its side effects.


List all cached variables

Description

List all variables for which files are available in the cache. The info is purely based on the files in the cache directory. There is no guarantee the variable can actually be loaded from the cache.

Usage

.cached.variables()

Value

Character vector of cached variables


Compare the project version with the current ProjectTemplate version

Description

Compare the project version with the current ProjectTemplate version

Usage

.check.version(config, warn.migrate = TRUE)

Arguments

config

Project configuration

warn.migrate

Logical indicating whether a warning should be raised if the project version is older than the installed version of ProjectTemplate.

Value

0 if the two versions are equal, -1 if the installed ProjectTemplate version is later than the project version, and 1 if the project version is later (analogous to the C function strcmp).
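Base R's utils::compareVersion() uses the same strcmp-style convention, so a version check along these lines is a plausible sketch; whether the package calls compareVersion() internally is an assumption, and both version strings below are hypothetical.

```r
# utils::compareVersion(a, b): 1 if a is later, 0 if equal, -1 if b is later
project.version   <- "0.10.0"  # hypothetical value of config$version
installed.version <- "0.11.0"  # hypothetical installed ProjectTemplate version
cmp <- utils::compareVersion(project.version, installed.version)
if (cmp < 0) {
  message("Project was created with an older ProjectTemplate; ",
          "consider migrate.project()")
}
```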


Convert one or more data sets to data.tables

Description

Converts all base::data.frames referred to in the input to data.tables. The resulting data set is stored in the .TargetEnv.

Usage

.convert.to.data.table(data.sets)

Arguments

data.sets

A character vector of variable names.

Value

No value is returned; this function is called for its side effects.


Convert one or more data sets to tibbles

Description

Converts all base::data.frames referred to in the input to tibbles. The resulting data set is stored in the .TargetEnv.

Usage

.convert.to.tibble(data.sets)

Arguments

data.sets

A character vector of variable names.

Value

No value is returned; this function is called for its side effects.


Create a data.frame with the cache metadata

Description

Create a data.frame with the cache metadata

Usage

.create.cache.hash(variable, depends, CODE)

Arguments

variable

Name of the variable to be cached

depends

Vector of variable names of dependencies for the variable to be cached, optional.

CODE

Code block to generate variable, registered as a dependency, optional.

Details

The hashes for the various objects are calculated using the .cache.hash function.

Value

data.frame containing the variable name and its dependencies, with the corresponding hashes appended.

See Also

.cache.hash


Create a project structure

Description

.create.project.existing creates a project directory structure inside an existing directory with the default files from a given template.

.create.project.new first creates a new directory and then passes further control to .create.project.existing. In case the project creation fails, the newly created directory is cleaned up.

Usage

.create.project.existing(
  project.name,
  merge.strategy,
  template,
  rstudio.project
)

.create.project.new(project.name, template, rstudio.project)

Arguments

project.name

Character vector with the name of the project directory

merge.strategy

Character vector determining whether the directory should be empty or is allowed to contain non-conflicting files

template

Name of the template from which the project should be created

rstudio.project

Logical indicating whether an .Rproj file should be created

Value

No value is returned; this function is called for its side effects.

See Also

create.project, create.template


Check if a directory is empty

Description

Checks whether the directory listing produced by .list.files.and.dirs is empty.

Usage

.dir.empty(path)

Arguments

path

Character vector containing the path to the directory to check.

Value

Logical indicating whether the passed directory was empty.


Run code and assign the results to variable

Description

Run code and assign the results to variable

Usage

.evaluate.code(variable, CODE)

Arguments

variable

variable name in which to store the result of CODE

CODE

code block that returns a result which can be stored in a variable

Details

No error handling is done on the executed code, nor is the


Get the location of a template from its name

Description

Checks the directory configured in options('ProjectTemplate.templatedir') for the template. If no matching template is found there, the system templates are checked, and finally the current directory. If no template with the given name is found, an error is raised.

Usage

.get.template(template)

Arguments

template

Character vector containing the name of the template

Value

Character vector containing the location of the template. If no template was found by the given name an error is raised.
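The search order can be sketched as "return the first candidate directory that contains the template". This is a hypothetical helper: the list of search directories is passed in explicitly, and the actual location of the system templates is not assumed here.

```r
# Hypothetical sketch: return the first directory among 'search.dirs'
# that contains a sub-directory named 'template'; otherwise raise an error.
find.template.sketch <- function(template, search.dirs) {
  candidates <- file.path(search.dirs, template)
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) {
    stop("No template named '", template, "' was found")
  }
  found[[1]]
}

# Usage (order as described above; the system location is a placeholder):
# find.template.sketch("mytemplate",
#   c(getOption("ProjectTemplate.templatedir"), system.templates, getwd()))
```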


Check if the project was loaded

Description

Currently does a very basic check to see if the variable project.info exists in the .TargetEnv. No check is performed on the contents of the variable.

Usage

.has.project()

Value

Logical indicating whether the project was loaded.


Initialize the logger for the project

Description

Creates a log4r::logger and provides a default log file log/project.log.

Usage

.init.logger(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.


Test whether a given path is a ProjectTemplate project

Description

Test whether a given path is a ProjectTemplate project

Usage

.is.ProjectTemplate(path = getwd())

Arguments

path

Directory to check, defaults to the current working directory.

Value

Logical indicating whether the given path is a valid project.


Check whether the cache is empty

Description

Check whether the cache is empty

Usage

.is.cache.empty()

Value

Logical indicating whether the cache is empty


Check whether variables are cached

Description

Check whether variables are cached

Usage

.is.cached(varnames)

Arguments

varnames

Character vector of variable names

Value

Logical vector indicating whether the variable is in the cache.


Check if path is an existing directory

Description

Checks whether a given path exists and, if so, whether it is a directory.

Usage

.is.dir(path)

Arguments

path

Character vector containing the path to the directory to check.

Value

Logical indicating whether a valid directory was passed.


Build the list of data available for loading into memory

Description

This function produces a data.frame of all data files in the project, with metadata on whether and how each file will be loaded by load.project.

Usage

.list.data(config)

Arguments

config

List containing the configuration to use.

Details

The returned data.frame contains the following variables, with one observation per file in data/:

filename Character variable containing the filename relative to data/ directory.
varname Character variable containing the name of the variable into which the file will be imported. *
is_ignored Logical variable that indicates whether the file is ignored through the data_ignore option in the configuration.
is_directory Logical variable that indicates whether the file is a directory.
is_cached Logical variable that indicates whether the file is already available in the cache/ directory.
cached_only Logical variable that indicates whether the variable is only available in the cache/ directory. This occurs when calling the cache function with a code fragment in a munge script.
reader Character variable containing the name of the reader function that will be used to load the data. Contains a character(0) if no suitable reader was found.

* Note that some readers return more than one variable, usually with the listed variable name as a prefix. This is the case, for example, for the xls.reader and xlsx.reader.

Value

A data.frame listing the available data, with relevant metadata


List all files and directories, excluding .. and .

Description

Creates a directory listing of a given path, including hidden files and subdirectories, but excluding the .. and . aliases.

Usage

.list.files.and.dirs(path)

Arguments

path

Character vector indicating the path to the parent folder of which the contents should be listed.

Value

Directory listing of path
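Both this listing and the emptiness check in .dir.empty can be sketched with base R's list.files(), whose all.files argument includes hidden files and whose no.. argument drops the "." and ".." aliases. The function names are hypothetical; this is not the package's own code.

```r
# Hypothetical sketch: list hidden files and subdirectories, without "." and ".."
list.files.and.dirs.sketch <- function(path) {
  list.files(path, all.files = TRUE, no.. = TRUE)
}

# A directory counts as empty when that listing has no entries
dir.empty.sketch <- function(path) {
  length(list.files.and.dirs.sketch(path)) == 0
}
```

Without recursive = TRUE, list.files() already returns subdirectory names, so no extra argument is needed to include them.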


Load the data from the cache and data directories

Description

Gets the list of available variables in cache/ and data/ and loads the data into memory. Data from the cache is loaded first, followed by the remaining data files in alphabetical order.

Usage

.load.data(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.


Load the helper functions

Description

Sources all helper scripts in lib. If lib/globals.R exists this is loaded first, all other scripts are sourced in alphabetical order.

Usage

.load.helpers(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.
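The loading order described above (lib/globals.R first, then the other scripts alphabetically) can be sketched as follows. The sketch only computes the order and assumes helper scripts are the files ending in .R; the function name is hypothetical.

```r
# Hypothetical sketch: globals.R first, then the other .R scripts in
# alphabetical order. Each file would then be passed to source().
helper.load.order <- function(lib.dir) {
  scripts <- sort(list.files(lib.dir, pattern = "\\.[Rr]$"))
  c(scripts[scripts == "globals.R"], scripts[scripts != "globals.R"])
}
```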


Load the libraries listed in the configuration into memory

Description

Load the libraries listed in the libraries entry in global.dcf and add the library names to the project.info.

Usage

.load.libraries(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.


Source all munge scripts

Description

Sources all munge scripts in the munge directory in alphabetical order.

Usage

.munge.data(config, my.project.info)

Arguments

config

Named list containing the project configuration

my.project.info

Named list containing the project information

Value

Returns my.project.info amended with the new information.


Get the current ProjectTemplate version

Description

Reads the installed version of ProjectTemplate from the DESCRIPTION file.

Usage

.package.version()

Value

Version as a character vector.


Match readers to the extensions of the data files

Description

Match readers to the extensions of the data files

Usage

.parse.extensions(data.files, config)

Arguments

data.files

a vector of paths to data files

Value

A list of readers and varnames


Prepare a regular expression for matching files to be ignored

Description

Constructs a single regular expression for matching file names in data that should not be imported. It can detect literal names, globs with wildcards and regular expressions.

Usage

.prepare.data.ignore.regex(ignore_files)

Arguments

ignore_files

A comma-separated character string that lists all patterns to be matched for ignoring

Value

A chained regular expression that matches all patterns in the ignore_files variable.
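A minimal sketch of how literal names, globs and regular expressions could be chained into one pattern. It assumes regular expressions in data_ignore are delimited by slashes (e.g. /^tmp_/) and that patterns contain no commas. Literal names and globs both go through utils::glob2rx(). The function name is hypothetical.

```r
# Hypothetical sketch: combine comma-separated ignore patterns into one regex.
# Assumption: regular expressions are written between slashes, e.g. /^tmp_/.
prepare.ignore.regex.sketch <- function(ignore_files) {
  patterns <- trimws(strsplit(ignore_files, ",")[[1]])
  patterns <- patterns[patterns != ""]
  if (length(patterns) == 0) return(NULL)  # nothing to ignore
  is.regex <- grepl("^/.*/$", patterns)
  converted <- ifelse(
    is.regex,
    sub("^/(.*)/$", "\\1", patterns),           # strip the delimiting slashes
    vapply(patterns, utils::glob2rx, character(1),
           USE.NAMES = FALSE)                   # literal names and globs
  )
  paste0("(", converted, ")", collapse = "|")
}

rx <- prepare.ignore.regex.sketch("notes.txt, *.bak, /^tmp_/")
grepl(rx, c("notes.txt", "old.bak", "tmp_data.csv", "data.csv"))
```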


Make sure a required directory exists before usage

Description

Checks if the requested directory exists, and if not creates the directory. In the latter case a warning is raised.

Usage

.provide.directory(name)

Arguments

name

Character vector containing the name of the required directory.

Value

No value is returned; this function is called for its side effects.


Stop silently

Description

Temporarily disables options(show.error.messages) and stops execution.

Usage

.quietstop()

Value

No value is returned; this function is called for its side effects.


Read metadata for a variable in the cache

Description

Read metadata for a variable in the cache

Usage

.read.cache.info(variable)

Arguments

variable

Variable name for which to look up the metadata

Details

The returned object is a list with two fields:

Value

list with metadata, see Details for more info.


Remove variables to keep from a list of candidates for removal

Description

Remove variables to keep from a list of candidates for removal

Usage

.remove.sticky.vars(names, keep)

Arguments

names

character vector of variable names that are candidates for removal

keep

character vector of variable names that should not be removed

Details

If the sticky_variables option is part of the config variable, the config variable itself is added to the list of variables to keep. In addition, all variables listed in config$sticky_variables (a comma-separated list) are added to keep.

Value

A character vector containing the variables to remove.
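The rule in Details can be sketched as a set difference. For a self-contained example, this hypothetical version takes the configuration as an explicit argument instead of reading it from the project environment.

```r
# Hypothetical sketch: drop 'keep' (plus config and sticky variables)
# from the removal candidates.
remove.sticky.vars.sketch <- function(names, keep, config = list()) {
  if (!is.null(config$sticky_variables)) {
    sticky <- trimws(strsplit(config$sticky_variables, ",")[[1]])
    keep <- c(keep, "config", sticky)
  }
  setdiff(names, keep)
}

remove.sticky.vars.sketch(c("a", "b", "config", "c"), keep = "a",
                          config = list(sticky_variables = "b"))
```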


Require internal package

Description

Internal method to require a package that is necessary for the internal functioning of ProjectTemplate. Never attaches the package unless configured to do so in global.dcf (which throws a warning).

Usage

.require.package(package.name)

Arguments

package.name

name of the package to load, as a character vector

Value

No value is returned; this function is called for its side effects.


Return an RStudio project file as character vector

Description

Return an RStudio project file as character vector

Usage

.rstudioprojectfile()

Value

Character vector with the contents of an empty RStudio project file


Raise an error if given path is not a valid project

Description

Stops processing if the path is not a ProjectTemplate project; otherwise returns the project name.

Usage

.stopifnotproject(additional_message = "", path = getwd())

Arguments

additional_message

Optional message to show if the given path is not a valid project

path

Path to check if it is a valid project

Value

The project name, if the path is a valid project.


Raise an error if given path is a valid project

Description

Stops processing if the path is a ProjectTemplate project.

Usage

.stopifproject(additional_message = "", path = getwd())

Arguments

additional_message

Optional message to show if the given path is a valid project

path

Path to check if it is a valid project

Value

No value is returned; this function is called for its side effects.


Unload the project variables keeping the data

Description

Removes the config, logger and project.info variables from memory, leaving all data variables in place.

Usage

.unload.project()

Value

No value is returned; this function is called for its side effects.


Compare sets of variable names

Description

Compares the variables (excluding functions) in the given environment with a character vector of variable names and returns the set difference.

Usage

.var.diff.from(given.var.list = "", env = .TargetEnv)

Arguments

given.var.list

Character vector of variable names

env

Environment in which to compare the sets of variables
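A minimal sketch, assuming the set difference is "variables in the environment that are not in given.var.list" and that functions are excluded before comparing. The function name is hypothetical, and it defaults to the global environment instead of .TargetEnv.

```r
# Hypothetical sketch: non-function variables in 'env' not listed in
# 'given.var.list'.
var.diff.from.sketch <- function(given.var.list = "", env = globalenv()) {
  vars <- ls(env)
  is.fun <- vapply(vars, function(v) is.function(get(v, envir = env)),
                   logical(1))
  setdiff(vars[!is.fun], given.var.list)
}
```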


Write a variable and its metadata to cache

Description

Write a variable and its metadata to cache

Usage

.write.cache(cache.hash, ...)

Arguments

cache.hash

a data.frame with metadata about the variable, see details for more information.

...

extra parameters passed to save.

Details

cache.hash is a data frame with two columns: variable and hash.
The row named VAR holds the name of the variable to save.
The row named CODE holds the hash value of the code that computes the variable.
The rows named DEPENDS.* hold the variables that CODE depends on.
The helper function .create.cache.hash creates a suitable data frame.

Value

No value is returned; this function is called for its side effects.
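The layout described in Details can be illustrated with a hand-built data frame. The values in the variable column for the CODE row and all hash strings are placeholders, not real output of .create.cache.hash; the variable names are hypothetical.

```r
# Illustrative cache.hash layout; hash strings are placeholders.
cache.hash <- data.frame(
  variable  = c("summary.table", "CODE", "raw.data"),
  hash      = c("hash-of-value", "hash-of-code", "hash-of-raw.data"),
  row.names = c("VAR", "CODE", "DEPENDS.raw.data"),
  stringsAsFactors = FALSE
)

# The variable to save is stored in the VAR row
var.name <- cache.hash["VAR", "variable"]
# Dependencies are the rows whose names start with "DEPENDS."
deps <- cache.hash[startsWith(rownames(cache.hash), "DEPENDS."), "variable"]
```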


Add project specific config to the global config

Description

Enables project-specific configuration to be added to the global config object. The allowed format is key-value pairs, which are appended to the end of the config object; the config object is accessible from the global environment.

Usage

add.config(..., apply.override = FALSE)

Arguments

...

A series of key-value pairs containing the configuration. The key is the name that gets added to the config object. These can be overridden at load time through the ... argument to load.project.

apply.override

A boolean indicating whether overrides passed to load.project should be applied. With the default FALSE, the setting is added regardless of any arguments to load.project.

Details

Once defined, the value can be accessed from any ProjectTemplate script by referencing config$my_project_var.

Examples

library('ProjectTemplate')
## Not run: 
add.config(
    keep_bigdata=TRUE,     # Whether to keep the big data file in memory
    parse=7                # number of fields to parse
)

if (config$keep_bigdata) ...

## End(Not run)

Cache a data set for faster loading.

Description

This function will store a copy of the named data set in the cache directory. This cached copy of the data set will then be given precedence at load time when calling load.project. Cached data sets are stored as .RData or optionally as .qs files.

Usage

cache(variable = NULL, CODE = NULL, depends = NULL, ...)

Arguments

variable

A character string containing the name of the variable to be saved. If the CODE parameter is defined, it is evaluated and saved, otherwise the variable with that name in the global environment is used.

CODE

A sequence of R statements enclosed in {..} which produce the object to be cached. Requires the suggested package formatR.

depends

A character vector of other global environment objects that the CODE depends upon. Caching will be forced if those objects have changed since the last caching.

...

Additional arguments passed on to save or optionally to qsave. See project.config for further information.

Details

Usually you will want to cache datasets during munging. This can be the raw data just loaded, or it can be the result of further processing during munge. Either way, it can take a while to cache large variables, so cache will only cache when it needs to. The clear.cache("variable") command can be run to flush individual items from the cache.

Calling cache() with no arguments returns the current status of the cache.

Value

No value is returned; this function is called for its side effects.

See Also

qsave, project.config

Examples

library('ProjectTemplate')
## Not run: create.project('tmp-project')

setwd('tmp-project')

dataset1 <- 1:5
cache('dataset1')

setwd('..')
unlink('tmp-project', recursive = TRUE)
## End(Not run)


Translate a variable name into a file name for caching.

Description

This function will translate a variable name into a form that is suitable as a filename on most operating systems.

Usage

cache.name(data.filename)

Arguments

data.filename

The variable name to be translated into a filename.

Value

A translated variable name.

Examples

library('ProjectTemplate')

## Not run: cache.name('example.1')

Cache a project's data sets in binary format.

Description

This function will cache all of the data sets that were loaded by the load.project function in a binary format that is easier to load quickly. This is particularly useful for data sets that you've modified during a slow munging process that does not need to be repeated.

Usage

cache.project()

Value

No value is returned; this function is called for its side effects.

See Also

create.project, load.project, get.project, show.project

Examples

library('ProjectTemplate')
## Not run: load.project()

cache.project()
## End(Not run)

Translate a file name into a valid R variable name.

Description

This function will translate a file name into a name that is a valid variable name in R. Non-alphabetic characters on the boundaries of the file name will be stripped; non-alphabetic characters inside of the file name will be replaced with dots.

Usage

clean.variable.name(variable.name, config = .load.config())

Arguments

variable.name

A character vector containing a variable's proposed name that should be standardized.

config

A list of configuration variables. Defaults to those loaded by load.project.

Value

A translated variable name.

Examples

library('ProjectTemplate')

## Not run: clean.variable.name('example_1')

Clear objects from the global environment

Description

This function removes specific (or, by default, all) named objects from the global environment. If used within a ProjectTemplate project, any variables listed in config$sticky_variables will remain.

Usage

clear(..., keep = c(), force = FALSE)

Arguments

...

A sequence of character strings naming the objects to be removed from the global environment. If none are given, all objects except those in keep are deleted, including objects whose names begin with a dot (.).

keep

A character vector of variables that should remain in the global environment

force

If TRUE, variables are deleted even if specified in keep or config$sticky_variables.

Value

The variables kept and removed are reported

Examples

library('ProjectTemplate')
## Not run: 
clear("x", "y", "z")
clear(keep="a")
clear()

## End(Not run)

Clear data sets from the cache

Description

This function removes specific (or, by default, all) named data sets from the cache directory. This forces the data to be read in from the data directory the next time load.project is called.

Usage

clear.cache(...)

Arguments

...

A sequence of character strings of the variables to be removed from the cache. If none given, then all items in the cache will be removed.

Value

Success or failure is reported

Examples

library('ProjectTemplate')
## Not run: 
clear.cache("x", "y", "z")

## End(Not run)

Create a new project.

Description

This function will create all of the scaffolding for a new project. It will set up all of the relevant directories and their initial contents. For those who only want the minimal functionality, the template argument can be set to minimal to create a subset of ProjectTemplate's default directories. For those who want to dump all of ProjectTemplate's functionality into a directory for extensive customization, the dump argument can be set to TRUE.

Usage

create.project(
  project.name = "new-project",
  template = "full",
  dump = FALSE,
  merge.strategy = c("require.empty", "allow.non.conflict"),
  rstudio.project = FALSE
)

Arguments

project.name

A character vector containing the name for this new project. Must be a valid directory name for your file system.

template

A character vector containing the name of the template to use for this project. By default a full and minimal template are provided, but custom templates can be created using create.template.

dump

A boolean value indicating whether the entire functionality of ProjectTemplate should be written out to flat files in the current project.

merge.strategy

What should happen if the target directory exists and is not empty? If "require.empty", the target directory must be empty; if "allow.non.conflict", the method succeeds as long as no files or directories with the same names as those in the template already exist in the target directory.

rstudio.project

A boolean value indicating whether the project should also be an 'RStudio Project'. Defaults to FALSE. If TRUE, then a 'projectname.Rproj' with usable defaults is added to the ProjectTemplate directory.

Details

If the target directory does not exist, it is created. Otherwise, it can only contain files and directories allowed by the merge strategy.
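The two merge strategies can be sketched as a pre-check on the target directory. This hypothetical helper only decides whether creation may proceed; the template entries are passed in explicitly and hidden files count as existing content.

```r
# Hypothetical pre-check for the two merge strategies.
can.create.in <- function(target, template.entries,
                          merge.strategy = c("require.empty",
                                             "allow.non.conflict")) {
  merge.strategy <- match.arg(merge.strategy)
  # A missing target directory is simply created
  if (!dir.exists(target)) return(TRUE)
  existing <- list.files(target, all.files = TRUE, no.. = TRUE)
  if (merge.strategy == "require.empty") {
    return(length(existing) == 0)
  }
  # allow.non.conflict: fail only if a template entry already exists
  length(intersect(existing, template.entries)) == 0
}
```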

Value

No value is returned; this function is called for its side effects.

See Also

load.project, get.project, cache.project, show.project

Examples

library('ProjectTemplate')

## Not run: create.project('MyProject')

Create a new template

Description

This function writes a skeleton directory structure for creating your own custom templates.

Usage

create.template(target, source = "minimal")

Arguments

target

Name of the new template. It is created under the directory specified by options('ProjectTemplate.templatedir') or, if that option is not set, in the current directory.

source

Name of an existing template to copy; defaults to the built-in 'minimal' template.


Show information about the current project.

Description

This function will return all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.

Usage

get.project()

Details

In previous releases this information was available through the global variable project.info. Using this variable is now deprecated and results in a warning.

Value

A named list.

See Also

create.project, load.project, cache.project, show.project

Examples

library('ProjectTemplate')

## Not run: load.project()

get.project()
## End(Not run)

Listing the data for the current project

Description

This function produces a data.frame of all data files in the project, with metadata on whether and how each file will be loaded by load.project.

Usage

list.data(...)

Arguments

...

Named arguments to override configuration from config/global.dcf and lib/globals.R.

Details

The returned data.frame contains the following variables, with one observation per file in data/:

filename Character variable containing the filename relative to data/ directory.
varname Character variable containing the name of the variable into which the file will be imported. *
is_ignored Logical variable that indicates whether the file is ignored through the data_ignore option in the configuration.
is_directory Logical variable that indicates whether the file is a directory.
is_cached Logical variable that indicates whether the file is already available in the cache/ directory.
cached_only Logical variable that indicates whether the variable is only available in the cache/ directory. This occurs when calling the cache function with a code fragment in a munge script.
reader Character variable containing the name of the reader function that will be used to load the data. Contains a character(0) if no suitable reader was found.

* Note that some readers return more than one variable, usually with the listed variable name as a prefix. This is the case, for example, for the xls.reader and xlsx.reader.

Value

A data.frame listing the available data, with relevant metadata

See Also

load.project, show.project, project.config

Examples

library('ProjectTemplate')

## Not run: list.data()

Automatically load data and packages for a project.

Description

This function automatically loads all of the data and packages used by the project from which it is called. Its behavior can be controlled through the project.config configuration.

Usage

load.project(...)

Arguments

...

Named arguments to override configuration from config/global.dcf and lib/globals.R.

Details

... can take an argument override.config or a single named list for backward compatibility. This cannot be mixed with the new style override. When a named argument override.config is present it takes precedence over the other options. If any of the provided arguments is unnamed an error is raised.
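A minimal sketch of the override rules above, using utils::modifyList() to merge named overrides into a configuration list. It covers only named arguments and override.config; the backward-compatible single-list form is not modelled. The function name is hypothetical, and the keys munging and data_loading are illustrative.

```r
# Hypothetical sketch of applying load.project-style overrides to a config list.
apply.overrides.sketch <- function(config, ...) {
  overrides <- list(...)
  # Every override must be named
  if (length(overrides) > 0 &&
      (is.null(names(overrides)) || any(names(overrides) == ""))) {
    stop("All override arguments must be named")
  }
  # A named override.config takes precedence over the other arguments
  if (!is.null(overrides$override.config)) {
    overrides <- overrides$override.config
  }
  utils::modifyList(config, overrides)
}

cfg <- list(munging = TRUE, data_loading = TRUE)
str(apply.overrides.sketch(cfg, munging = FALSE))
```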

Value

No value is returned; this function is called for its side effects.

See Also

create.project, get.project, cache.project, show.project, project.config

Examples

library('ProjectTemplate')

## Not run: load.project()

Load Project

Description

Run this function as an RStudio addin to load the ProjectTemplate package and call 'load.project()'.

Usage

loadproject_addin()

Migrates a project from a previous version of ProjectTemplate

Description

This function automatically performs all necessary steps to migrate an existing project so that it is compatible with this version of ProjectTemplate.

Usage

migrate.project()

Value

No value is returned; this function is called for its side effects.

See Also

create.project

Examples

library('ProjectTemplate')

## Not run: migrate.project()

Migrate a template to a new version of ProjectTemplate

Description

This function updates a skeleton project to the current version of ProjectTemplate.

Usage

migrate.template(template)

Arguments

template

Name of the template to upgrade.


Automatically read data into memory

Description

The preinstalled readers are stored in the list preinstalled.readers. Each reader function loads a data set stored in the data directory into the specified variable in the global environment. These functions are not meant to be called directly.

Usage

preinstalled.readers

arff.reader(data.file, filename, variable.name)

csv.reader(data.file, filename, variable.name)

csv2.reader(data.file, filename, variable.name)

db.reader(data.file, filename, variable.name)

dbf.reader(data.file, filename, variable.name)

epiinfo.reader(data.file, filename, variable.name)

feather.reader(data.file, filename, variable.name)

file.reader(data.file, filename, variable.name)

mp3.reader(data.file, filename, variable.name)

mtp.reader(data.file, filename, variable.name)

octave.reader(data.file, filename, variable.name)

ppm.reader(data.file, filename, variable.name)

r.reader(data.file, filename, variable.name)

rdata.reader(data.file, filename, variable.name)

rds.reader(data.file, filename, variable.name)

spss.reader(data.file, filename, variable.name)

sql.reader(data.file, filename, variable.name)

stata.reader(data.file, filename, variable.name)

systat.reader(data.file, filename, variable.name)

tsv.reader(data.file, filename, variable.name)

url.reader(data.file, filename, variable.name)

wsv.reader(data.file, filename, variable.name)

xls.reader(data.file, filename, workbook.name)

xlsx.reader(data.file, filename, workbook.name)

xport.reader(data.file, filename, variable.name)

Arguments

data.file

The name of the data file to be read.

filename

The path to the data set to be loaded.

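variable.name

The name of the variable to be assigned in the global environment.

workbook.name

Used by xls.reader and xlsx.reader in place of variable.name. The variables created for the individual worksheets use this name as a prefix.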
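A reader is just a function with the (data.file, filename, variable.name) signature shown above. The sketch below is a hypothetical reader for a made-up .csvx extension. It assumes that assigning into .GlobalEnv is an acceptable stand-in for the target environment ProjectTemplate uses internally. Registering such a reader would go through .add.extension (see See Also). Here that function is assumed to take the extension and the reader function.

```r
# Hypothetical reader for a made-up ".csvx" extension (comma-separated text).
csvx.reader <- function(data.file, filename, variable.name) {
  # filename is the path to the file; data.file is its name within data/
  data <- utils::read.csv(filename, stringsAsFactors = FALSE)
  # Bind the result to variable.name; .GlobalEnv is assumed as the target here
  assign(variable.name, data, envir = .GlobalEnv)
  invisible(NULL)
}

# In a project it might be registered with (assumed signature):
# .add.extension('csvx', csvx.reader)
```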

Format

An object of class list of length 55.

Details

Some file formats can contain more than one dataset. In this case all datasets are loaded into separate variables in the format <variable.name>.<subset.name>, where the subset.name is determined by the reader automatically.
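The naming rule for multi-dataset files can be sketched with a hypothetical helper, assign_subsets(). It takes a named list of tables (for example, one per worksheet) and binds each one as <variable.name>.<subset.name>. This only illustrates the naming convention. The actual readers decide the subset names themselves and may clean them, which this sketch does not do.

```r
# Hypothetical helper: bind each named subset as <variable.name>.<subset.name>
assign_subsets <- function(subsets, variable.name, envir = .GlobalEnv) {
  created <- paste(variable.name, names(subsets), sep = ".")
  for (i in seq_along(subsets)) {
    assign(created[i], subsets[[i]], envir = envir)
  }
  invisible(created)
}

# Usage: two tables from one file named "budget"
e <- new.env()
assign_subsets(list(sales = data.frame(a = 1), costs = data.frame(b = 2)), "budget", e)
ls(e)
```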

The sql.reader function will load data from a SQL database based on configuration information found in the specified .sql file. The .sql file must specify a database to be accessed. A data set can be generated from all tables in the database, from one specific table, or from one specific query against any set of tables.

Queries support string interpolation. Code snippets delimited by {{...}} are executed using mustache syntax (http://mustache.github.io). This makes it possible to write queries that depend on data from other sources.

Example:

query: SELECT * FROM my_table WHERE id IN ({{ids}})

Here, ids is a vector previously loaded into the global environment through ProjectTemplate.
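The interpolation step can be approximated in base R. The hypothetical interpolate_query() below evaluates each {{...}} snippet in a given environment and joins vector results with a comma and a space. The separator and the lack of quoting for character values are assumptions of this sketch. ProjectTemplate lists the whisker package in Suggests, and its own rendering may differ.

```r
# Hypothetical sketch of {{...}} interpolation; not ProjectTemplate's implementation.
interpolate_query <- function(query, env = globalenv()) {
  pattern <- "\\{\\{\\s*([^}]+?)\\s*\\}\\}"
  snippets <- regmatches(query, gregexpr(pattern, query, perl = TRUE))[[1]]
  for (s in snippets) {
    # Strip the delimiters and evaluate the enclosed R code
    code <- gsub("^\\{\\{\\s*|\\s*\\}\\}$", "", s, perl = TRUE)
    value <- eval(parse(text = code), envir = env)
    # Join vector results so they fit inside an SQL IN (...) list
    query <- sub(s, paste(value, collapse = ", "), query, fixed = TRUE)
  }
  query
}

# Usage
e <- new.env()
assign("ids", c(1L, 2L, 3L), envir = e)
interpolate_query("SELECT * FROM my_table WHERE id IN ({{ids}})", e)
# "SELECT * FROM my_table WHERE id IN (1, 2, 3)"
```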

Examples of the DCF format and settings used in a .sql file are shown below:

Example 1

type: mysql
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table

Example 2

type: mysql
user: sample_user
password: sample_password
host: localhost
port: 3306
socket: /Applications/MAMP/tmp/mysql/mysql.sock
dbname: sample_database
table: sample_table

Example 3

type: sqlite
dbname: /path/to/sample_database
table: sample_table

Example 4

type: sqlite
dbname: /path/to/sample_database
query: SELECT * FROM users WHERE user_active == 1

Example 5

type: sqlite
dbname: /path/to/sample_database
table: *

Example 6

type: postgres
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table

Example 7

type: odbc
dsn: sample_dsn
user: sample_user
password: sample_password
dbname: sample_database
query: SELECT * FROM sample_table

Example 8

type: oracle
user: sample_user
password: sample_password
dbname: sample_database
table: sample_table

Example 9

type: jdbc
class: oracle.jdbc.OracleDriver
classpath: /path/to/ojdbc5.jar (or set in CLASSPATH)
user: scott
password: tiger
url: jdbc:oracle:thin:@myhost:1521:orcl
query: select * from emp

Example 10

type: heroku
classpath: /path/to/jdbc4.jar (or set in CLASSPATH)
user: scott
password: tiger
host: heroku.postgres.url
port: 1234
dbname: herokudb
query: select * from emp

Example 11

In this example, RSQLite::initExtension() is automatically called on the established connection.

Liam Healy has written extension-functions.c, which is available at http://www.sqlite.org/contrib. It provides mathematical and string extension functions for SQL queries using the loadable extensions mechanism.

type: sqlite
dbname: /path/to/sample_database
plugin: extension
query: SELECT *,STDEV(value1) FROM example_table
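The fields of the .sql file can distinguish the three ways of selecting data described above. The hypothetical sql_mode_sketch() below reads a .sql file with base R's read.dcf() and reports which mode it specifies. Giving query precedence over table is an assumption of the sketch, not documented behavior.

```r
# Hypothetical helper: classify a .sql file by the selection mode it specifies
sql_mode_sketch <- function(sql.file) {
  m <- read.dcf(sql.file)
  spec <- setNames(as.list(m[1, ]), colnames(m))
  # [[ ]] is used for exact name matching (no partial matching as with $)
  if (!is.null(spec[["query"]])) {
    "query"
  } else if (identical(spec[["table"]], "*")) {
    "all tables"
  } else if (!is.null(spec[["table"]])) {
    "single table"
  } else {
    stop("The .sql file must specify either 'table' or 'query'.", call. = FALSE)
  }
}

# Usage with Example 5 from above
sql_file <- tempfile(fileext = ".sql")
writeLines(c("type: sqlite", "dbname: /path/to/sample_database", "table: *"), sql_file)
sql_mode_sketch(sql_file)
```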

Value

No value is returned; the reader functions are called for their side effects.


See Also

.add.extension


ProjectTemplate Configuration file

Description

Every ProjectTemplate project has a configuration file found at config/global.dcf that contains various options that can be tweaked to control runtime behavior. The valid options are shown below, and must be encoded using the DCF format.

Usage

project.config()

Details

Calling the project.config() function will display the current project configuration.

The options that can be configured in config/global.dcf are shown below:

data_loading This can be set to TRUE or FALSE. If data_loading is on, the system will load data from both the cache and data directories, with cache taking precedence in the case of a name conflict.
data_loading_header This can be set to TRUE or FALSE. If data_loading_header is on, the system will load text data files, such as CSV, TSV, or XLSX, treating the first row as header.
data_ignore A comma separated list of files to be ignored when importing from the data/ directory. Regular expressions can be used but should be delimited (on both sides) by /. Note that filenames and filepaths should never begin with a /. Entire directories under data/ can be ignored by adding a trailing /.
cache_loading This can be set to TRUE or FALSE. If cache_loading is on, the system will load data from the cache directory before any attempt to load from the data directory.
recursive_loading This can be set to TRUE or FALSE. If recursive_loading is on, the system will load data from the data directory and all its subdirectories recursively.
munging This can be set to TRUE or FALSE. If munging is on, the system will execute the files in the munge directory sequentially using the order implied by the sort() function. If munging is FALSE, none of the files in the munge directory will be executed.
logging This can be set to TRUE or FALSE. If logging is on, a logger object using the log4r package is automatically created when you run load.project(). This logger will write to the logs directory.
logging_level The value of logging_level is passed to the log4r logger object that is created when you run load.project().
load_libraries This can be set to TRUE or FALSE. If load_libraries is on, the system will load all of the R packages listed in the libraries field described below.
libraries This is a comma separated list of all the R packages that the user wants to automatically load when load.project() is called. These packages must already be installed before calling load.project().
as_factors This can be set to TRUE or FALSE. If as_factors is on, the system will convert every character vector into a factor when creating data frames; most importantly, this automatic conversion occurs when reading in data automatically. If FALSE, character vectors will remain character vectors.
tables_type This is the format for default tables. Values can be 'tibble' (default), 'data_table', or 'data_frame'
attach_internal_libraries This can be set to TRUE or FALSE. If attach_internal_libraries is on, then every time a new package is loaded into memory during load.project(), a warning will be displayed to inform the user.
cache_loaded_data This can be set to TRUE or FALSE. If cache_loaded_data is on, then data loaded from the data directory during load.project() will be automatically cached (so it won't need to be reloaded next time load.project() is called).
sticky_variables This is a comma separated list of any project-specific variables that should remain in the global environment after a clear() command. This can be used to clear the global environment while keeping any large datasets in place, so they are not unnecessarily re-generated during load.project(). Note that this setting is overridden if force = TRUE is passed to clear().
underscore_variables This can be set to TRUE to use underscores ('_') in variable names or FALSE to replace underscores ('_') with dots ('.'). The default is TRUE. When migrating old projects, underscore_variables is set to FALSE.
cache_file_format The default file format for cached data is 'RData'. This can be set to 'qs' in order to benefit from the quick serialization of R objects provided by qs.
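The hypothetical is_ignored() below approximates the data_ignore rules. Entries wrapped in / on both sides are treated as regular expressions. Entries with a trailing / ignore a whole directory. Any other entry must match the path relative to data/ exactly. The sketch splits naively on commas, so a regular expression that itself contains a comma would not survive. ProjectTemplate's own parser may behave differently.

```r
# Hypothetical approximation of the documented data_ignore rules.
# path is relative to data/, e.g. "raw/file.csv"
is_ignored <- function(path, data_ignore) {
  entries <- trimws(strsplit(data_ignore, ",", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]
  for (entry in entries) {
    n <- nchar(entry)
    if (n > 2 && startsWith(entry, "/") && endsWith(entry, "/")) {
      # Regular expression delimited by / on both sides
      if (grepl(substr(entry, 2, n - 1), path)) return(TRUE)
    } else if (endsWith(entry, "/")) {
      # Trailing /: ignore everything under this directory
      if (startsWith(path, entry)) return(TRUE)
    } else if (identical(path, entry)) {
      # Plain file name or path: exact match
      return(TRUE)
    }
  }
  FALSE
}

# Usage
is_ignored("old.bak", "notes.txt, /\\.bak$/")
```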

If config/global.dcf is missing some items (for example, because it was created under an older version of ProjectTemplate), then the following configuration is used for any missing items during load.project():

data_loading TRUE
data_loading_header TRUE
data_ignore
cache_loading TRUE
recursive_loading FALSE
munging TRUE
logging FALSE
logging_level INFO
load_libraries FALSE
libraries reshape2, plyr, tidyverse, stringr, lubridate
as_factors FALSE
tables_type tibble
attach_internal_libraries TRUE
cache_loaded_data FALSE
sticky_variables NONE
underscore_variables FALSE
cache_file_format RData

When a new project is created using create.project(), the following values are pre-populated:

version 0.11.0
data_loading TRUE
data_loading_header TRUE
data_ignore
cache_loading TRUE
recursive_loading FALSE
munging TRUE
logging FALSE
logging_level INFO
load_libraries FALSE
libraries reshape2, plyr, tidyverse, stringr, lubridate
as_factors FALSE
tables_type tibble
attach_internal_libraries FALSE
cache_loaded_data TRUE
sticky_variables NONE
underscore_variables TRUE
cache_file_format RData
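Because global.dcf is plain DCF, base R's read.dcf() can read it. The sketch below fills missing items from a subset of the documented fallback values using modifyList(). The helper read_config_sketch() is hypothetical, not ProjectTemplate's own loader. Values are kept as strings, as they appear in the file.

```r
# A subset of the documented fallback values, stored as strings as in a DCF file
defaults <- list(data_loading = "TRUE", munging = "TRUE", logging = "FALSE",
                 cache_loaded_data = "FALSE", underscore_variables = "FALSE")

# Hypothetical helper: read a global.dcf-style file and fill in missing items
read_config_sketch <- function(dcf.file, defaults) {
  m <- read.dcf(dcf.file)
  config <- setNames(as.list(m[1, ]), colnames(m))
  # Values present in the file win; anything missing comes from defaults
  modifyList(defaults, config)
}

# Usage: an old-style file that only sets two options
cfg_file <- tempfile(fileext = ".dcf")
writeLines(c("version: 0.11.0", "logging: TRUE"), cfg_file)
cfg <- read_config_sketch(cfg_file, defaults)
```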

Value

The current project configuration is displayed.

See Also

load.project


Reload or reset a project

Description

This function will clear the global environment and reload a project. This is useful when you've updated your data sets or changed your preprocessing scripts. By default, any variables listed in the sticky_variables configuration option (see project.config) remain both in memory and (if present) in the cache. If the reset parameter is TRUE, then all variables are cleared from both the global environment and the cache.

Usage

reload.project(..., reset = FALSE)

Arguments

...

Optional parameters passed to load.project

reset

A logical value. If TRUE, the cache and everything in the global environment, including any sticky_variables, are cleared.
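The hypothetical clear_except() below sketches how sticky_variables and reset affect the global environment. It removes every visible variable except the sticky ones, or all of them when reset = TRUE. It does not touch the cache/ directory, which reload.project() also handles.

```r
# Hypothetical helper approximating the documented behaviour; not ProjectTemplate's code.
clear_except <- function(sticky = character(0), reset = FALSE, envir = .GlobalEnv) {
  # With reset = TRUE, sticky variables are not protected
  keep <- if (reset) character(0) else sticky
  to_remove <- setdiff(ls(envir), keep)
  rm(list = to_remove, envir = envir)
  invisible(to_remove)
}

# Usage on a scratch environment instead of the global one
e <- new.env()
assign("big_data", 1:10, envir = e)
assign("scratch", 2, envir = e)
clear_except("big_data", envir = e)
ls(e)
```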

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: load.project()

reload.project()
## End(Not run)

Reload Project

Description

Call this function as an addin to load the library and run 'reload.project()'.

Usage

reloadproject_addin()

Require a package for use in the project

Description

This function will require the given package. If the package is not installed, it will stop execution and print a message instructing the user which package to install and which function caused the error.

Usage

require.package(package.name, attach = TRUE)

Arguments

package.name

A character vector containing the package name. Must be a valid package name installed on the system.

attach

Should the package be attached to the search path (as with library) or not (as with loadNamespace)? Defaults to TRUE. (Internal code will use FALSE by default unless a compatibility switch is set, see below.)

Details

The function .require.package is called by internal code. It will attach the package to the search path (with a warning) only if the compatibility configuration attach_internal_libraries is set to TRUE. Normally, packages used for loading data are not needed on the search path, but not loading them might break existing code. In a forthcoming version this compatibility setting will be removed, and no packages will be attached to the search path by internal code.
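The following is a minimal sketch of the check described above, using base R's requireNamespace() and library(). The name require_package_sketch() is hypothetical. Unlike require.package, it does not report which function triggered the error.

```r
# Hypothetical sketch; not ProjectTemplate's implementation.
require_package_sketch <- function(package.name, attach = TRUE) {
  # requireNamespace() returns FALSE instead of failing when the package is missing
  if (!requireNamespace(package.name, quietly = TRUE)) {
    stop(sprintf("Package '%s' is required. Please install it with install.packages('%s').",
                 package.name, package.name), call. = FALSE)
  }
  # Attach to the search path only when requested (like library vs loadNamespace)
  if (attach) {
    library(package.name, character.only = TRUE)
  }
  invisible(NULL)
}

# Usage
require_package_sketch("stats")
```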

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: require.package('PackageName')

Run all of the analyses in the src directory.

Description

This function will run each of the analyses in the src directory in separate processes. At present, this is done serially, but future versions of this function will provide a means of running the analyses in parallel.
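Running each analysis in its own process can be sketched with Rscript and system2(). The hypothetical run_src_scripts() below runs every .R file in a directory serially, in sort() order, from the current working directory. That working directory and the file ordering are assumptions of the sketch, not documented behavior of run.project.

```r
# Hypothetical sketch: run each R script in a separate Rscript process, one after another
run_src_scripts <- function(src.dir = "src") {
  scripts <- sort(list.files(src.dir, pattern = "\\.[Rr]$", full.names = TRUE))
  rscript <- file.path(R.home("bin"), "Rscript")
  # Collect the exit status of each child process (0 means success)
  status <- vapply(scripts, function(script) {
    as.integer(system2(rscript, shQuote(script), stdout = FALSE, stderr = FALSE))
  }, integer(1))
  invisible(status)
}

# Usage: a scratch src/ directory with one script that writes a file
proj <- tempfile()
dir.create(file.path(proj, "src"), recursive = TRUE)
out_file <- file.path(proj, "out.txt")
writeLines(sprintf("writeLines('ok', %s)", deparse(out_file)), file.path(proj, "src", "01-a.R"))
status <- run_src_scripts(file.path(proj, "src"))
```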

Usage

run.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: run.project()

Show information about the current project.

Description

This function will show the user all of the information that ProjectTemplate has about the current project. This information is gathered when load.project is called. At present, ProjectTemplate keeps a record of the project's configuration settings, all packages that were loaded automatically and all of the data sets that were loaded automatically. The information about autoloaded data sets is used by the cache.project function.

Usage

show.project()

Value

No value is returned; this function is called for its side effects.

See Also

create.project, load.project, get.project, cache.project

Examples

library('ProjectTemplate')

## Not run: load.project()

show.project()
## End(Not run)

Generate unit tests for your helper functions.

Description

This function will parse all of the functions defined in files inside of the lib directory and will generate a trivial unit test for each function. The resulting tests are stored in the file tests/autogenerated.R. Every test is expected to fail by default, so you should edit them before calling test.project.
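The parsing step can be sketched with base R's parse(). The hypothetical stub_tests_sketch() finds top-level name <- function(...) definitions in a file and emits one deliberately failing testthat-style stub per function. The stub text is invented for illustration and need not match what stub.tests writes to tests/autogenerated.R.

```r
# Hypothetical sketch of stub generation; not ProjectTemplate's implementation.
stub_tests_sketch <- function(lib.file) {
  exprs <- parse(lib.file, keep.source = FALSE)
  fn_names <- character(0)
  for (e in exprs) {
    # Keep top-level assignments of the form name <- function(...) or name = function(...)
    if (is.call(e) &&
        (identical(e[[1]], as.name("<-")) || identical(e[[1]], as.name("="))) &&
        is.name(e[[2]]) && is.call(e[[3]]) &&
        identical(e[[3]][[1]], as.name("function"))) {
      fn_names <- c(fn_names, as.character(e[[2]]))
    }
  }
  # One deliberately failing test per function, to be edited by hand
  sprintf("test_that('%s works', {\n  expect_true(FALSE)\n})", fn_names)
}

# Usage
lib_file <- tempfile(fileext = ".R")
writeLines(c("add <- function(a, b) a + b", "x <- 5", "sq = function(x) x^2"), lib_file)
stubs <- stub_tests_sketch(lib_file)
```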

Usage

stub.tests()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: stub.tests()

Run all unit tests for this project.

Description

This function will run all of the testthat style unit tests for the current project that are defined inside of the tests directory. The tests will be run in the order defined by the filenames for the tests: it is recommended that each test file name begin with a number specifying its position in the sequence.
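Because file names are compared as strings, an unpadded numeric prefix can produce a surprising order. The snippet below shows why zero-padded prefixes are safer. The same consideration applies to munge scripts, which are executed in the order given by sort(). The file names are hypothetical.

```r
# Unpadded prefixes: "10-..." sorts before "2-..." because comparison is character by character
test_files <- c("2-clean.R", "10-merge.R", "01-load.R")
sort(test_files)
# "01-load.R" "10-merge.R" "2-clean.R"

# Zero-padded prefixes keep the intended numeric order
padded <- sprintf("%02d-%s", c(1L, 2L, 10L), c("load.R", "clean.R", "merge.R"))
sort(padded)
# "01-load.R" "02-clean.R" "10-merge.R"
```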

Usage

test.project()

Value

No value is returned; this function is called for its side effects.

Examples

library('ProjectTemplate')

## Not run: load.project()

test.project()
## End(Not run)

Read a DCF file into an R list.

Description

This function will read a DCF file and translate the resulting data frame into a list. The DCF format is used throughout ProjectTemplate for configuration settings and ad hoc file format specifications.

Usage

translate.dcf(filename)

Arguments

filename

A character vector specifying the DCF file to be translated.

Details

The contents of the DCF file are stored as character strings. If a value is enclosed in back ticks (`...`), it is evaluated as R code and the result is returned as a string.
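The back tick rule can be sketched on top of base R's read.dcf(). The hypothetical translate_dcf_sketch() evaluates back-ticked values in the global environment and converts the result with as.character(). This illustrates the idea and is not the package's implementation.

```r
# Hypothetical sketch of back tick evaluation in DCF values
translate_dcf_sketch <- function(filename) {
  m <- read.dcf(filename)
  settings <- setNames(as.list(m[1, ]), colnames(m))
  lapply(settings, function(value) {
    if (!is.na(value) && grepl("^`.*`$", value)) {
      # Strip the back ticks and evaluate the enclosed R code
      code <- substr(value, 2, nchar(value) - 1)
      as.character(eval(parse(text = code), envir = globalenv()))
    } else {
      value
    }
  })
}

# Usage
dcf_file <- tempfile(fileext = ".dcf")
writeLines(c("data_dir: `file.path('a', 'b')`", "name: plain"), dcf_file)
settings <- translate_dcf_sketch(dcf_file)
```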

Value

Returns a list containing the entries from the DCF file.

Examples

library('ProjectTemplate')

## Not run: translate.dcf(file.path('config', 'global.dcf'))
