Help for package logib

Type:

Package

Title:

Salary Analysis by the Swiss Federal Office for Gender Equality

Version:

0.2.0

Description:

Implementation of the Swiss Confederation's standard analysis model for salary analyses <https://www.ebg.admin.ch/en/equal-pay-analysis-with-logib> in R. The analysis is run at company level and the model is intended for medium-sized and large companies. It can technically be used with 50 or more employees (apprentices, trainees/interns and expats are not included in the analysis). Employers with at least 100 employees are required by the Gender Equality Act to conduct an equal pay analysis. This package allows users to run the equal salary analysis in R, providing additional transparency with respect to the methodology and simple automation possibilities.

License:

GPL (≥ 3)

Depends:

R (≥ 3.1)

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.2

Imports:

lubridate, readxl, stats, utils

Suggests:

testthat

URL:

https://github.com/admin-ebg/logib

BugReports:

https://github.com/admin-ebg/logib/issues

NeedsCompilation:

Packaged:

2024-12-19 17:03:56 UTC; marcc

Author:

Marc Stöckli [aut, cre], Jonathan Chassot [aut], Jeremy Kolly [ctb], Federal Office for Gender Equality of Switzerland [cph, fnd]

Maintainer:

Marc Stöckli <marc.stoeckli@ebg.admin.ch>

Repository:

CRAN

Date/Publication:

2024-12-20 09:10:02 UTC

Column names

Description

List of column names used in the code, taken from the datalist and export files in all four languages (de, fr, it, en).

Usage

all_column_names

Format

An object of class list of length 3.

Run a Salary Analysis

Description

Runs a salary analysis according to the Swiss standard analysis model

Usage

analysis(
  data,
  reference_month,
  reference_year,
  usual_weekly_hours = NULL,
  female_spec = "F",
  male_spec = "M",
  age_spec = NULL,
  entry_date_spec = NULL
)

Arguments

data

a data.frame of employees as produced by read_data

reference_month

an integer representing the reference month, i.e. the month for which we analyze the salaries

reference_year

an integer representing the reference year, i.e. the year for which we analyze the salaries

usual_weekly_hours

an optional numeric representing the usual weekly working hours (missing values in weekly_hours are replaced by usual_weekly_hours; if NULL, the missing values are not replaced)

female_spec

an optional string or numeric representing the way women are encoded in the data

male_spec

an optional string or numeric representing the way men are encoded in the data

age_spec

an optional string to specify the way age is encoded in the data (NULL will try to automatically infer the age format, "age" implies that the age is specified as the age of a person, "birthyear" implies that the age is specified as the year of birth of a person, and "birthdate" implies that the age is specified as the date of birth of a person)

entry_date_spec

an optional string to specify the way entry_date is encoded in the data (NULL will try to automatically infer the format, "years" implies that the entry_date is specified as the number of years for which the person has been in the company, "entry_year" implies that the entry_date is specified as the year of the entry date of the person, and "entry_date" implies that the entry_date is specified as the date of entry of the person)

Value

object of type analysis_model with the following elements

params:

The set of original parameters passed to the function

data_original:

The original data passed by the user in the data parameter

data_clean:

The cleaned up data which was used for the analysis

data_errors:

The list of errors which were found upon checking the data

results:

The result of the standard analysis model

Examples

results <- analysis(data = datalist_example, reference_month = 1,
   reference_year = 2019, usual_weekly_hours = 40, female_spec = "F",
   male_spec = "M", age_spec = "age")

Build column name mappings

Description

build_custom_mapping creates a vector of column name mappings for the user to read her or his custom dataframe

Usage

build_custom_mapping(data, language = "de", prompt_mapping = TRUE)

Arguments

data

the custom dataframe for which the user wants to build a custom mapping

language

a character string representing the language in which the columns will be displayed during the mapping prompt ("de" or "fr" or "it" or "en")

prompt_mapping

a boolean indicating whether the function prompts the user for the exact mapping of their dataframe or whether the columns are mapped automatically by order

Details

Builds a mapping from the custom column names of a given data.frame to the variable names used in the standard analysis model. If prompt_mapping is set to TRUE, the function prompts the mapping for each column of the data.frame. If prompt_mapping is set to FALSE, the mapping is built using the order of the columns of the given data.frame.

Value

A named vector of characters, where the name indicates the column name in the original data.frame and the value indicates the column name as used by the standard analysis model.
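
As a rough illustration of the returned structure (not the package's internal code), a mapping by column order could be sketched in base R as follows. The custom column names and the helper map_by_order are hypothetical; the target names are the first three variables listed for datalist_example.

```r
# Hypothetical custom data with non-standard column names
custom <- data.frame(
  id     = c("A1", "A2"),
  alter  = c(34, 51),
  gender = c("F", "M"),
  stringsAsFactors = FALSE
)

# First three standard variable names (as listed for datalist_example)
standard_names <- c("personal_number", "age", "sex")

# Sketch of a mapping by column order: the names are the original columns,
# the values are the names expected by the standard analysis model
map_by_order <- function(data, target_names) {
  stopifnot(ncol(data) == length(target_names))
  stats::setNames(target_names, names(data))
}

mapping <- map_by_order(custom, standard_names)

# Apply the mapping by renaming the columns of a copy
renamed <- custom
names(renamed) <- unname(mapping[names(custom)])
```

Keeping the original names as the vector names makes it possible to look up the standard name of any custom column directly, e.g. mapping[["alter"]].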

Builds a dataframe of errors

Description

build_errors builds a dataframe of errors as used by the function check_data.

Usage

build_errors(rows, pers_id, vals, column, description, importance)

Arguments

rows

a vector of numbers representing the rows which contain an error

pers_id

a vector of strings with the personal IDs of the rows which contain an error

vals

a vector of the erroneous values

column

the name of the column containing the error

description

the description of the error occurring

importance

the importance of the error occurring

Value

a dataframe of errors with the columns column, description, importance

Check a dataframe

Description

check_data checks a dataframe (as produced by read_data).

Usage

check_data(data)

Arguments

data

data.frame to be checked

Details

This function checks a dataframe (as produced by read_data) for correctness and consistency.

Value

a data.frame with information concerning each incorrect data point in the data data.frame

Compute age values

Description

Computes the age given a birth year or a birth date

Usage

compute_age(x, age_spec = NULL, reference_year = NULL)

Arguments

x

a string or number vector to be transformed

age_spec

a string indicating the age specification, can be one of NULL, "age", "birthyear", or "date_of_birth". If this parameter is set to NULL, the function automatically tries to infer the specification

reference_year

a number indicating the reference year in order to compute the age from a birth year or birth date. If age_spec is "age", this parameter can be ignored.

Value

a numeric vector of ages
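
The conversion from a birth year or birth date to an age can be sketched as below. This is a minimal illustration, assuming the age is simply the reference year minus the year of birth; the helper age_from_birth and its is_date argument are hypothetical and do not reproduce the package's automatic format inference.

```r
# Hypothetical helper, not the package's implementation.
# Assumption: age = reference year - year of birth, whichever way the
# birth information is supplied.
age_from_birth <- function(x, reference_year, is_date = FALSE) {
  birth_year <- if (is_date) {
    # Extract the year from a date (ISO strings or Date objects)
    as.integer(format(as.Date(x), "%Y"))
  } else {
    as.integer(x)
  }
  reference_year - birth_year
}

ages_from_years <- age_from_birth(c(1980, 1995), reference_year = 2019)
ages_from_dates <- age_from_birth(c("1980-06-15", "1995-01-02"),
                                  reference_year = 2019, is_date = TRUE)
```

Both calls yield the ages 39 and 24 for the reference year 2019.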

Compute years_of_service value

Description

Computes the years of service given an entry date or entry year

Usage

compute_years_of_service(
  x,
  entry_date_spec = NULL,
  reference_year = NULL,
  reference_month = NULL
)

Arguments

x

a string or number vector to be transformed

entry_date_spec

a string indicating the entry_date specification, can be one of NULL, "years", "entry_year", or "entry_date". If this parameter is set to NULL, the function automatically tries to infer the specification

reference_year

a number indicating the reference year in order to compute the years of service from an entry date. If entry_date_spec is "years", this parameter can be ignored.

reference_month

a number indicating the reference month in order to compute the years of service from an entry date. If entry_date_spec is "years" or "entry_year", this parameter can be ignored.

Value

a numeric vector of years of service

Example datalist

Description

Fictional dataset containing the necessary information to run an equal pay analysis.

Usage

datalist_example

Format

A data frame with 318 rows and 23 variables:

personal_number: personal number of the employee, alphanumeric

age: age, in years

sex: sex, 1 = male, 2 = female

years_of_service: years of service, in years

training: training code, 1-8

professional_function: function / job

level_of_requirements: level of requirements code, 1-4

professional_position: professional position / hierarchy code, 1-5

activity_rate: activity rate, in percent

paid_hours: paid hours, in hours

basic_wage: basic wage, in CHF

allowances: allowances, in CHF

monthly_wage_13: 13th monthly wage, in CHF

special_payments: special payments, in CHF

weekly_hours: weekly contractual hours, in hours

annual_hours: annual contractual hours, in hours

population: analysis population code, 1-5

comments: comments for the employee

supplement1: additional remarks (1 of 5)

supplement2: additional remarks (2 of 5)

supplement3: additional remarks (3 of 5)

supplement4: additional remarks (4 of 5)

supplement5: additional remarks (5 of 5)

Download official Excel datalists

Description

Downloads an empty version of the latest official Excel datalist in the specified language to the given path.

Usage

download_datalist(file, language = "de")

Arguments

file

a character string representing the file path to which the downloaded datalist will be saved.

language

a character string representing the language of the datalist to be downloaded ("de" or "fr" or "it" or "en").

Value

None

Download official filled-in sample Excel datalists

Description

Downloads a filled-in version of the latest official Excel datalist in the specified language to the given path.

Usage

download_example_datalist(file, language = "de")

Arguments

file

a character string representing the file path to which the downloaded datalist will be saved.

language

a character string representing the language of the datalist to be downloaded ("de" or "fr" or "it" or "en").

Value

None

Check if the interval between two dates contains February 29

Description

Check if the interval between two dates contains February 29

Usage

feb29_between(date1, date2)

Arguments

date1

the first date by chronological order

date2

the second date by chronological order

Value

a boolean
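
One way such a check could be written in base R is sketched below. The helpers is_leap_year and contains_feb29 are hypothetical, and the sketch assumes single dates with date1 not later than date2 and both endpoints counted as part of the interval; the package's own function may treat these cases differently.

```r
# Hypothetical base R sketch, not the package's implementation.
# Assumptions: single dates (Date objects or ISO strings), date1 <= date2,
# and both endpoints belong to the interval.
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

contains_feb29 <- function(date1, date2) {
  date1 <- as.Date(date1)
  date2 <- as.Date(date2)
  # All calendar years touched by the interval
  years <- as.integer(format(date1, "%Y")):as.integer(format(date2, "%Y"))
  leap_years <- years[is_leap_year(years)]
  if (length(leap_years) == 0) {
    return(FALSE)
  }
  # Test whether any February 29 lies inside the interval
  feb29 <- as.Date(paste0(leap_years, "-02-29"))
  any(feb29 >= date1 & feb29 <= date2)
}

spans_leap_day <- contains_feb29("2019-12-01", "2020-03-15")  # TRUE
misses_leap_day <- contains_feb29("2020-03-01", "2021-02-28") # FALSE
```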

Kennedy Estimator

Description

Computes the consistent and almost unbiased estimator for dummy variables in semi-logarithmic regressions proposed by Kennedy, P.E. (1981). Estimation with correctly interpreted dummy variables in semi-logarithmic equations. American Economic Review, 71, 801.

Usage

get_kennedy_estimator(coefficient, variance)

Arguments

coefficient

numeric value of the estimated coefficient for a dummy variable in a semi-logarithmic regression

variance

numeric value of the variance of this estimated coefficient

Details

Given a semi-logarithmic regression with a dummy variable and its estimated coefficient c with a variance v, the consistent and almost unbiased estimator proposed by Kennedy is computed as k = exp(c) / exp(v / 2) - 1

Value

a numeric value representing the so-called "Kennedy estimator"
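
The formula given in the Details can be written out directly in R. The helper name kennedy_sketch and the input values are illustrative assumptions only, not results from a real analysis.

```r
# Hypothetical helper mirroring the formula k = exp(c) / exp(v / 2) - 1
kennedy_sketch <- function(coefficient, variance) {
  exp(coefficient) / exp(variance / 2) - 1
}

# Illustrative values: a coefficient of -0.05 with a standard error of 0.01,
# so the variance is 0.01^2
k <- kennedy_sketch(coefficient = -0.05, variance = 0.01^2)
k
```

With these inputs k is roughly -0.049, i.e. slightly closer to zero than the raw coefficient of -0.05 would suggest.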

Prepares a dataframe for the analysis

Description

Prepares a dataframe for the analysis in three steps:

1. Check whether sex, age, and entry_date have the correct format and whether their specifications are plausible
2. Build the dataframe used for the analysis
3. Check each row of the dataframe for correctness and plausibility

Usage

prepare_data(
  data,
  reference_month,
  reference_year,
  usual_weekly_hours,
  female_spec = "F",
  male_spec = "M",
  age_spec = NULL,
  entry_date_spec = NULL
)

Arguments

data

a dataframe object as produced by read_data which is to be used in the analysis

reference_month

a number indicating the reference month of the analysis

reference_year

a number indicating the reference year of the analysis

usual_weekly_hours

an optional numeric representing the usual weekly working hours

female_spec

a string or number indicating the way females are specified in the dataset

male_spec

a string or number indicating the way males are specified in the dataset

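age_spec

an optional string specifying how age is encoded in the data (see the age_spec argument of analysis)

entry_date_spec

an optional string specifying how entry_date is encoded in the data (see the entry_date_spec argument of analysis)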

Value

a data.frame which has no incorrect rows left and can be used to estimate the standard analysis model

Create the dataframe object used for the standard analysis model

Description

Reads either a custom dataframe object or an official Excel file (datalist or data export) and transforms it to a dataframe object which can be used for the standard analysis model

Usage

read_data(
  data_path = NULL,
  custom_data = NULL,
  prompt_mapping = TRUE,
  language = "de"
)

Arguments

data_path

a string indicating the path of an official Excel file. If this parameter is set to NULL, the function reads the dataframe object provided in the parameter custom_data instead

custom_data

a dataframe which was imported by the user beforehand. If this parameter is set to NULL, the function imports the data from the path provided in the parameter data_path instead

prompt_mapping

a boolean indicating whether the function prompts the user for the exact mapping of their dataframe or whether the columns are mapped automatically by order. This parameter is only relevant when custom_data is not set to NULL

language

a character string representing the language in which the columns will be displayed during the mapping prompt ("de" or "fr" or "it" or "en"). This parameter is only relevant when custom_data is not set to NULL

Details

Exactly one of data_path or custom_data must be NULL.

Value

a dataframe which can be used to compute the standard analysis model
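
The constraint stated in the Details (exactly one of the two inputs is NULL, so exactly one is supplied) can be expressed with xor. The function exactly_one_supplied and the file name are hypothetical and only illustrate the rule; they are not part of the package.

```r
# Hypothetical check, not the package's code: exactly one of the two
# inputs may be NULL, so exactly one of them must be supplied.
exactly_one_supplied <- function(data_path = NULL, custom_data = NULL) {
  xor(is.null(data_path), is.null(custom_data))
}

only_path <- exactly_one_supplied(data_path = "datalist.xlsx")           # TRUE
neither   <- exactly_one_supplied()                                      # FALSE
both      <- exactly_one_supplied("datalist.xlsx", data.frame(age = 40)) # FALSE
```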

Read official datalist or data_export Excel file

Description

Reads an official datalist or data_export file into a dataframe object.

Usage

read_official_excel(path)

Arguments

path

a character string indicating the path of the Excel file to be read

Value

a dataframe with the contents of the datalist or data_export

Standard Analysis Model

Description

Estimates the Swiss Confederation standard analysis model (a linear regression) for salary equality between women and men.

Usage

run_standard_analysis_model(data, sex_neutral = FALSE)

Arguments

data

data.frame as produced by prepare_data

sex_neutral

boolean indicating whether the linear regression is to be run using the sex_neutral model or the standard one.

Details

The standard analysis model's formula is the following:

log(standardized_salary) ~ years_of_training + years_of_service + years_of_earning + years_of_earning^2 + level_of_requirements + professional_position + sex

The sex_neutral parameter can be used to run the sex neutral model, i.e. a linear regression without the sex coefficient.

Value

an object of class "lm"
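
To make the formula concrete, the sketch below fits the same specification with lm on synthetic data. The data, the coefficients used to simulate it, and the treatment of level_of_requirements and professional_position as factors are assumptions made for illustration; the package's own preparation steps are not reproduced. Note that inside an R formula the squared term has to be wrapped in I() to be read as arithmetic squaring.

```r
# Synthetic, purely illustrative data; not the package's example dataset
set.seed(1)
n <- 200
toy <- data.frame(
  years_of_training     = sample(7:17, n, replace = TRUE),
  years_of_service      = sample(0:30, n, replace = TRUE),
  years_of_earning      = sample(0:40, n, replace = TRUE),
  level_of_requirements = factor(sample(1:4, n, replace = TRUE)),
  professional_position = factor(sample(1:5, n, replace = TRUE)),
  sex                   = factor(sample(c("M", "F"), n, replace = TRUE),
                                 levels = c("M", "F"))
)
# Simulated salaries (made-up coefficients, no sex effect)
toy$standardized_salary <- exp(
  8 + 0.03 * toy$years_of_training + 0.005 * toy$years_of_service +
    0.02 * toy$years_of_earning - 0.0003 * toy$years_of_earning^2 +
    rnorm(n, sd = 0.1)
)

# Model with the sex term; I() protects the squared term in the formula
full_model <- lm(
  log(standardized_salary) ~ years_of_training + years_of_service +
    years_of_earning + I(years_of_earning^2) + level_of_requirements +
    professional_position + sex,
  data = toy
)

# Sex-neutral variant: the same regression without the sex term
neutral_model <- update(full_model, . ~ . - sex)
```

The coefficient named sexF in coef(full_model) and its variance from vcov(full_model) are the kind of inputs a Kennedy-type correction takes.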

Summary of the Salary Analysis

Description

Summary of an estimated salary analysis object of class analysis_model

Usage

## S3 method for class 'analysis_model'
summary(object, ...)

Arguments

object

estimated salary analysis object of class analysis_model

...

further arguments passed to or from other methods

Details

summary.analysis_model provides a short summary of the wage analysis according to the Standard Analysis Model. The summary describes the number of records used for the analysis, the Kennedy estimate of the wage difference under otherwise equal circumstances and the summary of the linear regression.

Value

Nothing

Examples

# Estimate standard analysis model
results <- analysis(data = datalist_example, reference_month = 1,
   reference_year = 2019, usual_weekly_hours = 40, female_spec = "F",
   male_spec = "M", age_spec = "age")

# Show summary of the salary analysis
summary(results)

Transform a data.frame according to the requirements of the model

Description

Transforms specific columns of a data.frame to match the requirements of the standard analysis model.

Usage

transform_data(
  data,
  reference_year,
  usual_weekly_hours,
  female_spec = "F",
  male_spec = "M",
  age_spec = NULL,
  entry_date_spec = NULL
)

Arguments

data

a dataframe object as produced by read_data which is to be transformed

reference_year

a number indicating the reference year of the analysis

usual_weekly_hours

an optional numeric representing the usual weekly working hours

female_spec

a string or number indicating the way females are specified in the dataset.

male_spec

a string or number indicating the way males are specified in the dataset

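age_spec

an optional string specifying how age is encoded in the data (see the age_spec argument of analysis)

entry_date_spec

an optional string specifying how entry_date is encoded in the data (see the entry_date_spec argument of analysis)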

Check if the interval between two dates is less than a year

Description

Check if the interval between two dates is less than a year

Usage

within_a_year(date1, date2)

Arguments

date1

the first date by chronological order

date2

the second date by chronological order

Value

a boolean

Time difference between two dates in fractional year terms

Description

Computes the time difference between date1 and date2 in fractional year terms. This is equivalent to the YEARFRAC() function from Excel with the "Actual/Actual" basis.

Usage

yearfrac(date1, date2)

Arguments

date1

the first date

date2

the second date

Value

fractional years between date1 and date2