Type: | Package |
Title: | Salary Analysis by the Swiss Federal Office for Gender Equality |
Version: | 0.2.0 |
Description: | Implementation of the Swiss Confederation's standard analysis model for salary analyses <https://www.ebg.admin.ch/en/equal-pay-analysis-with-logib> in R. The analysis is run at company level and the model is intended for medium-sized and large companies. It can technically be used with 50 or more employees (apprentices, trainees/interns and expats are not included in the analysis). Employers with at least 100 employees are required by the Gender Equality Act to conduct an equal pay analysis. This package allows users to run the equal salary analysis in R, providing additional transparency with respect to the methodology and simple automation possibilities. |
License: | GPL (≥ 3) |
Depends: | R (≥ 3.1) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | lubridate, readxl, stats, utils |
Suggests: | testthat |
URL: | https://github.com/admin-ebg/logib |
BugReports: | https://github.com/admin-ebg/logib/issues |
NeedsCompilation: | no |
Packaged: | 2024-12-19 17:03:56 UTC; marcc |
Author: | Marc Stöckli [aut, cre], Jonathan Chassot [aut], Jeremy Kolly [ctb], Federal Office for Gender Equality of Switzerland [cph, fnd] |
Maintainer: | Marc Stöckli <marc.stoeckli@ebg.admin.ch> |
Repository: | CRAN |
Date/Publication: | 2024-12-20 09:10:02 UTC |
Column names
Description
List of column names used in the code, taken from the datalist and export files in all four languages (de, fr, it, en)
Usage
all_column_names
Format
An object of class list of length 3.
Run a Salary Analysis
Description
Runs a salary analysis according to the Swiss standard analysis model
Usage
analysis(
data,
reference_month,
reference_year,
usual_weekly_hours = NULL,
female_spec = "F",
male_spec = "M",
age_spec = NULL,
entry_date_spec = NULL
)
Arguments
data |
a data.frame of employees as produced by |
reference_month |
an integer representing the reference month, i.e. the month for which we analyze the salaries |
reference_year |
an integer representing the reference year, i.e. the year for which we analyze the salaries |
usual_weekly_hours |
an optional numeric representing the usual weekly
working hours (missing values in |
female_spec |
an optional string or numeric representing the way women
are encoded in the |
male_spec |
an optional string or numeric representing the way men are
encoded in the |
age_spec |
an optional string to specify the way |
entry_date_spec |
an optional string to specify the way
|
Value
An object of type analysis_model with the following elements:
params: |
The set of original parameters passed to the function |
data_original: |
The original data passed by the user in the
|
data_clean: |
The cleaned up data which was used for the analysis |
data_errors: |
The list of errors which were found upon checking the data |
results: |
The result of the standard analysis model |
Examples
results <- analysis(data = datalist_example, reference_month = 1,
reference_year = 2019, usual_weekly_hours = 40, female_spec = "F",
male_spec = "M", age_spec = "age")
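The elements listed under Value can be inspected with ordinary list access. The following is a minimal sketch that reuses the call from the example above. It assumes the logib package is installed; the requireNamespace() guard is an addition that makes the snippet do nothing otherwise.

```r
# Assumes the logib package is installed; the block is skipped otherwise
if (requireNamespace("logib", quietly = TRUE)) {
  library(logib)

  results <- analysis(data = datalist_example, reference_month = 1,
                      reference_year = 2019, usual_weekly_hours = 40,
                      female_spec = "F", male_spec = "M", age_spec = "age")

  # Elements documented under Value
  str(results$params)                 # parameters passed to analysis()
  print(nrow(results$data_original))  # rows supplied by the user
  print(nrow(results$data_clean))     # rows kept for the analysis
  print(head(results$data_errors))    # problems found while checking the data
}
```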
Build column name mappings
Description
build_custom_mapping creates a vector of column name mappings that lets users read their custom dataframe
Usage
build_custom_mapping(data, language = "de", prompt_mapping = TRUE)
Arguments
data |
the custom dataframe for which the user wants to build a custom mapping |
language |
a character string representing the language in which the
columns will be displayed during the mapping prompt ( |
prompt_mapping |
a boolean indicating whether the function prompts the user for the exact mapping of their dataframe or whether the columns are mapped automatically by order |
Details
Builds a mapping from the custom column names of a given data.frame to the variable names used in the standard analysis model. If prompt_mapping is set to TRUE, the function prompts the mapping for each column of the data.frame. If prompt_mapping is set to FALSE, the mapping is built using the order of the columns of the given data.frame.
Value
A named vector of characters, where the name indicates the column name in the original data.frame and the value indicates the column name as used by the standard analysis model.
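To make the shape of such a mapping concrete, here is a minimal base R sketch of a named character vector and one way it could be applied to rename columns. The data frame custom and its column names are hypothetical, and the renaming step is an illustration, not the package's own code; the target names are taken from the variables documented for datalist_example.

```r
# Hypothetical custom data frame with company-specific column names
custom <- data.frame(
  emp_id = c("A1", "A2"),
  gender = c("F", "M"),
  yrs    = c(3, 10)
)

# Named character vector: the names are the original column names,
# the values are the names used by the standard analysis model
mapping <- c(
  emp_id = "personal_number",
  gender = "sex",
  yrs    = "years_of_service"
)

# Rename by looking up each original column name in the mapping
renamed <- custom
names(renamed) <- unname(mapping[names(custom)])
names(renamed)
```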
Builds a dataframe of errors
Description
build_errors builds a dataframe of errors as used by the function check_data.
Usage
build_errors(rows, pers_id, vals, column, description, importance)
Arguments
rows |
a vector of numbers representing the rows which contain an error |
pers_id |
a vector of strings with the personal IDs of the rows which contain an error |
vals |
a vector of the erroneous values |
column |
the name of the column containing the error |
description |
the description of the error occurring |
importance |
the importance of the error occurring |
Value
a dataframe of errors with the columns column, description, importance
Check a dataframe
Description
check_data checks a dataframe (as produced by read_data).
Usage
check_data(data)
Arguments
data |
data.frame to be checked |
Details
This function checks a dataframe (as produced by read_data) for correctness and consistency.
Value
a data.frame with information concerning each incorrect data point in the input data.frame (data)
Compute age values
Description
Computes the age given a birth year or a birth date
Usage
compute_age(x, age_spec = NULL, reference_year = NULL)
Arguments
x |
a string or number vector to be transformed |
age_spec |
a string indicating the age specification, can be one of
|
reference_year |
a number indicating the reference year in order to
compute the age from a birth year or birth date. If |
Value
a numeric vector of ages
Compute years_of_service value
Description
Computes the years of service given an entry date or entry year
Usage
compute_years_of_service(
x,
entry_date_spec = NULL,
reference_year = NULL,
reference_month = NULL
)
Arguments
x |
a string or number vector to be transformed |
entry_date_spec |
a string indicating the entry_date specification, can
be one of |
reference_year |
a number indicating the reference year in order to
compute the years of service from an entry date. If |
reference_month |
a number indicating the reference month in order to
compute the years of service from an entry date. If |
Value
a numeric vector of years of service
Example datalist
Description
Fictional dataset containing the necessary information to run an equal pay analysis.
Usage
datalist_example
Format
A data frame with 318 rows and 23 variables:
- personal_number: personal number of the employee, alphanumeric
- age: age, in years
- sex: sex, 1 = male, 2 = female
- years_of_service: years of service, in years
- training: training code, 1-8
- professional_function: function / job
- level_of_requirements: level of requirements code, 1-4
- professional_position: professional position / hierarchy code, 1-5
- activity_rate: activity rate, in percent
- paid_hours: paid hours, in hours
- basic_wage: basic wage, in CHF
- allowances: allowances, in CHF
- monthly_wage_13: 13th monthly wage, in CHF
- special_payments: special payments, in CHF
- weekly_hours: weekly contractual hours, in hours
- annual_hours: annual contractual hours, in hours
- population: analysis population code, 1-5
- comments: comments for the employee
- supplement1: additional remarks (1 of 5)
- supplement2: additional remarks (2 of 5)
- supplement3: additional remarks (3 of 5)
- supplement4: additional remarks (4 of 5)
- supplement5: additional remarks (5 of 5)
Download official Excel datalists
Description
Downloads an empty version of the latest official Excel datalist in the specified language to the given path.
Usage
download_datalist(file, language = "de")
Arguments
file |
a character string representing the file path to which the downloaded datalist will be saved. |
language |
a character string representing the language of the datalist
to be downloaded ( |
Value
None
Download official filled-in sample Excel datalists
Description
Downloads a filled-in version of the latest official Excel datalist in the specified language to the given path.
Usage
download_example_datalist(file, language = "de")
Arguments
file |
a character string representing the file path to which the downloaded datalist will be saved. |
language |
a character string representing the language of the datalist
to be downloaded ( |
Value
None
Check if the interval between two dates contains February 29
Description
Checks whether the interval between two dates contains February 29
Usage
feb29_between(date1, date2)
Arguments
date1 |
the first date by chronological order |
date2 |
the second date by chronological order |
Value
a boolean
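One way such a check could work is sketched below in base R. The names is_leap_year and feb29_between_sketch are hypothetical, and treating both end points as part of the interval is an assumption; the package function may handle boundaries differently.

```r
# Gregorian leap year rule (hypothetical helper)
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

# Illustrative sketch, not the package implementation.
# Assumes date1 <= date2 and an interval that includes both end points.
feb29_between_sketch <- function(date1, date2) {
  date1 <- as.Date(date1)
  date2 <- as.Date(date2)
  years <- as.integer(format(date1, "%Y")):as.integer(format(date2, "%Y"))
  leap_years <- years[is_leap_year(years)]
  if (length(leap_years) == 0) {
    return(FALSE)
  }
  # Every February 29 in the years touched by the interval
  leap_days <- as.Date(paste0(leap_years, "-02-29"))
  any(leap_days >= date1 & leap_days <= date2)
}

feb29_between_sketch("2020-01-01", "2020-12-31")
feb29_between_sketch("2020-03-01", "2021-02-28")
```

The first call covers 29 February 2020; the second starts after it and ends before the next leap day.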
Kennedy Estimator
Description
Computes the consistent and almost unbiased estimator for dummy variables in semi-logarithmic regressions proposed by Kennedy, P.E. (1981). Estimation with correctly interpreted dummy variables in semi-logarithmic equations. American Economic Review, 71, 801.
Usage
get_kennedy_estimator(coefficient, variance)
Arguments
coefficient |
numeric value of the estimated coefficient for a dummy variable in a semi-logarithmic regression |
variance |
numeric value of the variance of this estimated coefficient |
Details
Given a semi-logarithmic regression with a dummy variable and its estimated coefficient c with a variance v, the consistent and almost unbiased estimator proposed by Kennedy is computed as
k = exp(c) / exp(v / 2) - 1
Value
a numeric value representing the so-called "Kennedy estimator"
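The formula in Details translates directly into base R. The sketch below re-implements it under the hypothetical name kennedy_sketch for illustration only; the input values (a coefficient of -0.05 with a standard error of 0.01) are assumed, not taken from any real analysis.

```r
# Illustrative re-implementation of k = exp(c) / exp(v / 2) - 1;
# kennedy_sketch is a hypothetical name, not the package function
kennedy_sketch <- function(coefficient, variance) {
  exp(coefficient) / exp(variance / 2) - 1
}

# Assumed example values: the variance is the squared standard error
k <- kennedy_sketch(coefficient = -0.05, variance = 0.01^2)
k
```

Because the variance term is subtracted inside the exponent, the result lies slightly below the naive transformation exp(c) - 1.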
Prepares a dataframe for the analysis
Description
Prepares a dataframe for the analysis in three steps:
- Checks whether sex, age, and entry_date have the correct format and whether their specifications are plausible
- Builds the dataframe used for the analysis
- Checks each row of the dataframe for correctness and plausibility
Usage
prepare_data(
data,
reference_month,
reference_year,
usual_weekly_hours,
female_spec = "F",
male_spec = "M",
age_spec = NULL,
entry_date_spec = NULL
)
Arguments
data |
a dataframe object as produced by |
reference_month |
a number indicating the reference month of the analysis |
reference_year |
a number indicating the reference year of the analysis |
usual_weekly_hours |
an optional numeric representing the usual weekly working hours |
female_spec |
a string or number indicating the way females are specified in the dataset |
male_spec |
a string or number indicating the way males are specified in the dataset |
age_spec |
a string indicating the age specification, can be one of
|
entry_date_spec |
a string indicating the entry_date specification, can
be one of |
Value
a data.frame which has no incorrect rows left and can be used to estimate the standard analysis model
Create the dataframe object used for the standard analysis model
Description
Reads either a custom dataframe object or an official Excel file (datalist or data export) and transforms it to a dataframe object which can be used for the standard analysis model
Usage
read_data(
data_path = NULL,
custom_data = NULL,
prompt_mapping = TRUE,
language = "de"
)
Arguments
data_path |
a string indicating the path for an official Excel file,
if this parameter is set to |
custom_data |
a dataframe which was imported by the user beforehand,
if this parameter is set to |
prompt_mapping |
a boolean indicating whether the function prompts the
user for the exact mapping of their dataframe or whether the columns are
mapped automatically by order. This parameter is only relevant when
|
language |
a character string representing the language in which the
columns will be displayed during the mapping prompt ( |
Details
Exactly one of data_path or custom_data must be NULL.
Value
a dataframe which can be used to compute the standard analysis model
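The constraint stated in Details (exactly one of data_path or custom_data is NULL) can be expressed with xor(). The helper check_inputs_sketch below is hypothetical and only illustrates the rule; how read_data validates its arguments internally is not documented here.

```r
# Hypothetical helper illustrating the "exactly one is NULL" rule
check_inputs_sketch <- function(data_path = NULL, custom_data = NULL) {
  # xor() is TRUE only when exactly one of the two arguments is NULL
  if (!xor(is.null(data_path), is.null(custom_data))) {
    stop("Exactly one of data_path or custom_data must be NULL.")
  }
  invisible(TRUE)
}

# Valid: only a custom data frame is supplied
check_inputs_sketch(custom_data = data.frame(x = 1))
```

Supplying both arguments, or neither, would raise the error.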
Read official datalist or data_export Excel file
Description
Reads an official datalist or data_export file into a dataframe object.
Usage
read_official_excel(path)
Arguments
path |
a character string indicating the path of the Excel file to be read |
Value
a dataframe with the contents of the datalist or data_export
Standard Analysis Model
Description
Estimates the Swiss Confederation standard analysis model (a linear regression) for salary equality between women and men.
Usage
run_standard_analysis_model(data, sex_neutral = FALSE)
Arguments
data |
data.frame as produced by |
sex_neutral |
boolean indicating whether the linear regression is to be run using the sex_neutral model or the standard one. |
Details
The standard analysis model's formula is the following:
log(standardized_salary) ~ years_of_training + years_of_service +
years_of_earning + years_of_earning^2 + level_of_requirements + professional_position +
sex
The sex_neutral parameter can be used to run the sex neutral model, i.e. a linear regression without the sex coefficient.
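The formula above could be fitted with lm() roughly as follows. This is a sketch on simulated data, not the package's code: the data frame sim and all of its values are synthetic, coding level_of_requirements and professional_position as factors is an assumption, and the squared term is wrapped in I() because a bare ^ inside an R formula is read as a formula operator rather than as arithmetic.

```r
set.seed(1)
n <- 200

# Synthetic data with the variable names used in the formula (values are made up)
sim <- data.frame(
  years_of_training     = sample(7:17, n, replace = TRUE),
  years_of_service      = sample(0:30, n, replace = TRUE),
  years_of_earning      = sample(0:40, n, replace = TRUE),
  level_of_requirements = factor(sample(1:4, n, replace = TRUE)),
  professional_position = factor(sample(1:5, n, replace = TRUE)),
  sex = factor(sample(c("M", "F"), n, replace = TRUE), levels = c("M", "F"))
)
sim$standardized_salary <-
  exp(8.5 + 0.03 * sim$years_of_training + rnorm(n, sd = 0.1))

# Model including the sex dummy
fit <- lm(
  log(standardized_salary) ~ years_of_training + years_of_service +
    years_of_earning + I(years_of_earning^2) + level_of_requirements +
    professional_position + sex,
  data = sim
)

# Sex neutral variant: the same regression without the sex term
fit_neutral <- update(fit, . ~ . - sex)

names(coef(fit))
```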
Value
Summary of the Salary Analysis
Description
Summary of an estimated salary analysis object of class analysis_model
Usage
## S3 method for class 'analysis_model'
summary(object, ...)
Arguments
object |
estimated salary analysis object of class
|
... |
further arguments passed to or from other methods |
Details
summary.analysis_model provides a short summary of the wage analysis according to the Standard Analysis Model. The summary describes the number of records used for the analysis, the Kennedy estimate of the wage difference under otherwise equal circumstances and the summary of the linear regression.
Value
Nothing
Examples
# Estimate standard analysis model
results <- analysis(data = datalist_example, reference_month = 1,
reference_year = 2019, usual_weekly_hours = 40, female_spec = "F",
male_spec = "M", age_spec = "age")
# Show summary of the salary analysis
summary(results)
Transform a data.frame according to the requirements of the model
Description
Transforms specific columns of a data.frame to match the requirements of the standard analysis model.
Usage
transform_data(
data,
reference_year,
usual_weekly_hours,
female_spec = "F",
male_spec = "M",
age_spec = NULL,
entry_date_spec = NULL
)
Arguments
data |
a dataframe object as produced by |
reference_year |
a number indicating the reference year of the analysis |
usual_weekly_hours |
an optional numeric representing the usual weekly working hours |
female_spec |
a string or number indicating the way females are specified in the dataset. |
male_spec |
a string or number indicating the way males are specified in the dataset |
age_spec |
a string indicating the age specification, can be one of
|
entry_date_spec |
a string indicating the entry_date specification, can
be one of |
Check if the interval between two dates is less than a year
Description
Checks whether the interval between two dates is less than a year
Usage
within_a_year(date1, date2)
Arguments
date1 |
the first date by chronological order |
date2 |
the second date by chronological order |
Value
a boolean
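A base R sketch of such a test is shown below. The name within_a_year_sketch is hypothetical, and the strict comparison against the first anniversary of date1 is an assumption about the boundary case; the package function may treat an interval of exactly one year differently.

```r
# Illustrative sketch, not the package implementation.
# Assumes date1 <= date2; "less than a year" is read as
# "date2 falls before the first anniversary of date1".
within_a_year_sketch <- function(date1, date2) {
  date1 <- as.Date(date1)
  date2 <- as.Date(date2)
  anniversary <- seq(date1, by = "1 year", length.out = 2)[2]
  date2 < anniversary
}

within_a_year_sketch("2019-01-01", "2019-12-31")
within_a_year_sketch("2019-01-01", "2020-01-01")
```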
Time difference between two dates in fractional year terms
Description
Computes the time difference between date1 and date2 in fractional year terms. This is equivalent to the YEARFRAC() method from Excel, with the parameter "Actual/Actual".
Usage
yearfrac(date1, date2)
Arguments
date1 |
the first date |
date2 |
the second date |
Value
fractional years between date1 and date2