Help for package speech

Type:

Package

Title:

Legislative Speeches

Version:

0.1.5

Description:

Converts the floor speeches of Uruguayan legislators, extracted from the parliamentary minutes, to a tidy data.frame in which each observation is the intervention of a single legislator.

License:

GPL-3

Encoding:

UTF-8

Depends:

R (≥ 3.6.0)

URL:

https://github.com/Nicolas-Schmidt/speech

Imports:

dplyr, lubridate, magrittr, purrr, stringr, tibble, tm, tidyr, pdftools, rvest

RoxygenNote:

7.1.2

NeedsCompilation:

Packaged:

2022-10-03 17:12:15 UTC; Nicolás Schmidt

Author:

Nicolas Schmidt [aut, cre], Diego Lujan [aut], Juan Andres Moraes [aut], Elina Gomez [ctb]

Maintainer:

Nicolas Schmidt <nschmidt@cienciassociales.edu.uy>

Repository:

CRAN

Date/Publication:

2022-10-03 17:30:02 UTC

Transform speeches in pdf to data.frame

Description

Extracts the individual speeches of each legislator in a document and returns them as a data.frame.

Usage

speech_build(
  file,
  add.error.sir = NULL,
  rm.error.leg = NULL,
  compiler = FALSE,
  quality = FALSE,
  param = list(char = 6500, drop.page = 2)
)

Arguments

file

list or character vector specifying the path or URL to a PDF file. It can be one or more files.

add.error.sir

character vector. Alternative spellings of the term that orders the speeches (sir, "SENOR"), for cases in which it is miswritten in the document. By default it is NULL.

rm.error.leg

character vector. Additional legislators' names to be removed. By default it is NULL. "PRESIDENTE", "SECRETARIO", "SUBSECRETARIO", and "MINISTRO" are removed by default.

compiler

logical. If TRUE, the data frame is compiled, that is, all the speeches of each legislator in each document are joined together. Compilation should be carried out after the conversion from PDF to data frame has been checked and any corrections have been made, so it must be requested explicitly. By default it is FALSE.

quality

logical. If TRUE, two indicators of the quality of the conversion are added, which depend on the quality of the document:

index_1: Proportion of the text recovered relative to the text that the original document is expected to contain, as set by param (by default, param = list(char = 6500, drop.page = 2)).
index_2: Proportion of the final text relative to the recovered text. It is the proportion of the document that consists only of interventions by legislators.

param

list of length 2 with the values for "characters per page" (char) and "pages dropped from the evaluation" (drop.page), respectively. The default values come from the median number of characters in the 8500 documents that make up the speech datasets.
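
The arithmetic behind the two quality indicators can be sketched as follows. This is only one possible reading of the definitions above: the formula for the expected number of characters, and every number other than the param defaults, are assumptions made for illustration, and the package may compute the indicators differently.

```r
# Assumed reading of the two indicators; not taken from the package source.
char      <- 6500   # expected characters per page (default in `param`)
drop_page <- 2      # pages left out of the evaluation (default in `param`)

n_pages         <- 12      # hypothetical number of pages in the PDF
recovered_chars <- 58500   # hypothetical characters extracted from the PDF
final_chars     <- 46800   # hypothetical characters kept as interventions

# Assumption: the expected text is `char` characters for each evaluated page
expected_chars <- char * (n_pages - drop_page)

index_1 <- recovered_chars / expected_chars   # share of expected text recovered
index_2 <- final_chars / recovered_chars      # share of recovered text kept

round(c(index_1 = index_1, index_2 = index_2), 2)
```

Under this reading, a low index_1 would point to a poor extraction (for example, a badly scanned PDF), while a low index_2 would point to a document with little text attributed to legislators.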

Details

This function converts PDF documents to a data.frame. The conversion works by locating the interventions of legislators from the word "SENOR". As the quality of PDF files is not always good, it is recommended to verify that no legislator is omitted in the construction of the data.frame. The argument add.error.sir should be used to correct miswritten forms of the word "SENOR". The function already includes a long list of ways in which the word "SENOR" may be written in a document, but it cannot cover every possible case. When the PDF document is a scan processed with OCR, the result should be checked with greater caution to ensure that the conversion was performed correctly.
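
The idea of splitting a document at the word "SENOR" can be illustrated with a minimal base R sketch. It is not the package's implementation: the function split_interventions, its arguments, and the sample text are hypothetical and only show why a miswritten marker causes an intervention to be missed unless it is declared.

```r
# Hypothetical helper: split raw text into interventions at a marker word.
split_interventions <- function(text, marker = "SENOR", extra_markers = NULL) {
  # Rewrite declared misspellings (e.g. OCR errors) to the canonical marker
  for (m in extra_markers) {
    text <- gsub(m, marker, text, fixed = TRUE)
  }
  # Locate every occurrence of the marker
  starts <- as.integer(gregexpr(marker, text, fixed = TRUE)[[1]])
  if (starts[1] == -1L) return(character(0))
  # Each intervention runs from one marker to just before the next one
  ends <- c(starts[-1] - 1L, nchar(text))
  trimws(substring(text, starts, ends))
}

# Made-up fragment with one OCR-damaged marker
text <- "Acta. SENOR PEREZ.- Pido la palabra. SEf'IOR GOMEZ.- Apoyado."

without_fix <- split_interventions(text)
with_fix    <- split_interventions(text, extra_markers = "SEf'IOR")

# The name following the marker identifies the legislator
legislator <- sub("^SENOR ([A-Z]+).*$", "\\1", with_fix)
```

Without the extra marker, the second intervention is absorbed into the first; once the misspelling is declared, both legislators are recovered.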

Value

data.frame of class puy with the following variables:

legislator: name of the legislator
speech: speeches by the legislator
date: session date
id: file name
legislature: legislature id (period of government)
sex: sex
chamber: chamber to which the document belongs. It can be: Chamber of Representatives, Senate, General Assembly or Permanent Commission.

If quality is TRUE, the following are added:

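index_1: proportion of the text recovered relative to the original document
index_2: proportion of the final text relative to the recovered text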

Examples


# url <- speech::speech_url(chamber = "C", from = "17-09-2019", to = "17-09-2019")
# out <- speech_build(file = url)

# out <- speech_build(file = url, compiler = FALSE,
#                     quality = TRUE,
#                     add.error.sir = c("SEf'IOR"),
#                     rm.error.leg = c("PRtSIDENTE", "SUB", "PRfSlENTE"),
#                     param = list(char = 6000, drop.page = 3))

# out <- list.files(pattern = "\\.pdf$") %>% speech_build()

# out <- list.files(pattern = "\\.pdf$") %>%
#     speech_build(., compiler = TRUE, param = list(char = 4500, drop.page = 3))

Check the names of legislators

Description

Checks that the names of the legislators are correctly written before compiling the documents with speech_build.

Usage

speech_check(tidy_speech, initial, expand = FALSE)

Arguments

tidy_speech

data.frame.

initial

character vector. Initials of the legislators' names to check. If no initial is given, all names are checked.

expand

logical. If TRUE, the legislature to which the name of the legislator belongs is shown. By default it is FALSE.

Value

list with a data.frame for each initial of legislators' names.
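
The kind of check described here can be sketched in base R: grouping the distinct names by their initial places near-duplicates side by side. The names below are made up, and this sketch does not reproduce the output format of speech_check.

```r
# Hypothetical legislators' names as they might come out of a conversion
legislators <- c("ABDALA", "LAZO", "GOI", "ABDALLA", "GONI", "ABDALA")

# Distinct names, sorted, then grouped by their first letter
distinct   <- sort(unique(legislators))
by_initial <- split(distinct, substr(distinct, 1, 1))

# Keep only the initials of interest
checked <- by_initial[c("A", "G")]
checked
```

Listing "ABDALA" next to "ABDALLA", or "GOI" next to "GONI", makes it easier to decide which entries are misspellings of the same legislator.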

Examples


# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_check(out, initial = c("A", "M"), expand = FALSE)

Rename legislators

Description

Modifies a legislator's name prior to compiling the data.

Usage

speech_legis_replace(tidy_speech, old, new, id = NULL)

Arguments

tidy_speech

data.frame of class puy.

old

old legislator's name.

new

new legislator's name.

id

id of the floor speech. By default it is NULL.

Value

data.frame.

Examples


# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_check(out, "G")
# out <- speech_legis_replace(out, old = "GOI",  new = "GONI")

Speech recompiler

Description

Recompiles the speech datasets, or a data.frame built with speech_build to which the political party variable was added.

Usage

speech_recompiler(
  tidy_speech,
  compiler_by = c("legislator", "legislature", "chamber", "date", "id", "sex")
)

Arguments

tidy_speech

data.frame.

compiler_by

character vector. Variables by which to recompile the data frame.

Details

The default compilation is that of speech_build(., compiler = TRUE). This function recompiles the data at different levels of aggregation: chamber, legislature or other variables.
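
Compiling at different levels of aggregation can be pictured with base R's aggregate(). This is a sketch of the idea only, using a made-up data frame, and it is not how speech_recompiler is implemented.

```r
# Made-up uncompiled data: one row per intervention
tidy <- data.frame(
  legislator = c("ABDALA", "LAZO", "ABDALA"),
  chamber    = "Senate",
  id         = "doc_1",
  speech     = c("First remark.", "A question.", "Second remark."),
  stringsAsFactors = FALSE
)

# One row per legislator and document: interventions pasted in order
by_legislator <- aggregate(speech ~ legislator + id, data = tidy,
                           FUN = paste, collapse = " ")

# A coarser level: one row per chamber
by_chamber <- aggregate(speech ~ chamber, data = tidy,
                        FUN = paste, collapse = " ")
```

The fewer grouping variables are used, the more text ends up in each row.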

Value

data.frame.

Examples


# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# out2 <- speech_recompiler(out)
# out2 <- speech_recompiler(out, compiler_by = c("legislator", "legislature", "chamber"))

Detects roll-call

Description

Detects roll-call votes in floor speeches and converts them to a dataset.

The summary method returns a summary of a roll-call vote object.

Usage

speech_rollcall(file, add.error.sir = NULL, rm.error.leg = NULL)

## S3 method for class 'nominal'
summary(object, ...)

Arguments

file

list or character vector specifying the path or URL to a PDF file. It can be one or more files.

add.error.sir

character vector. Alternative spellings of the term that orders the speeches (sir, "SENOR"), for cases in which it is miswritten in the document. By default it is NULL.

rm.error.leg

character vector. Additional legislators' names to be removed. By default it is NULL. "PRESIDENTE", "SECRETARIO", "SUBSECRETARIO", and "MINISTRO" are removed by default.

object

an object of class nominal, the output of speech_rollcall.

...

additional parameter.

Details

This function detects roll-call votes in floor speeches. It only detects votes in which the vote can be affirmative or negative. This leaves out some roll-call votes, such as those for the allocation of positions in the chamber.

Value

speech_rollcall returns a data.frame with the following variables:

legislator: Name of the legislator
vote: Vote, 1 = affirmative, 0 = negative
argument: 1 if the legislator justifies the vote, otherwise 0
speech: Speech
chamber: Chamber
date: Date
legislature: Legislature
rollcall: Number of roll-call in session
id: Id
sex: Sex of legislator

summary returns a data.frame with the following variables:

Chamber: Chamber
Date: Date
Legislators: Number of legislators in the voting
Affirmative: Number of affirmative votes
Negative: Number of negative votes
prop_AF: Proportion of affirmative votes
prop_NG: Proportion of negative votes
prop_women: Proportion of women in the voting
prop_arg: Proportion of legislators justifying the vote
rc: Number of roll-call in session
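
The summary variables listed above are simple counts and proportions over the individual votes. A minimal sketch with made-up data follows; the coding of sex as "F"/"M" is an assumption made for the example and may not match the package.

```r
# Made-up roll-call: one row per legislator
votes <- data.frame(
  legislator = c("A", "B", "C", "D"),
  vote       = c(1, 1, 0, 1),          # 1 = affirmative, 0 = negative
  argument   = c(1, 0, 0, 1),          # 1 = the vote was justified
  sex        = c("F", "M", "M", "F"),  # assumed coding, for illustration only
  stringsAsFactors = FALSE
)

rc_summary <- data.frame(
  Legislators = nrow(votes),
  Affirmative = sum(votes$vote == 1),
  Negative    = sum(votes$vote == 0),
  prop_AF     = mean(votes$vote == 1),
  prop_NG     = mean(votes$vote == 0),
  prop_women  = mean(votes$sex == "F"),
  prop_arg    = mean(votes$argument == 1)
)
rc_summary
```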

Examples


# url <- speech::speech_url(chamber = "D", from = "14-04-2004", to = "14-04-2004")
# out <- speech_rollcall(file = url)
# summary(out)

Speech uncompiler

Description

Undoes the compilation of a floor speech.

Usage

speech_uncompiler(tidy_speech)

Arguments

tidy_speech

data.frame.

Value

data.frame.

Examples


# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url, compiler = TRUE)
# out2 <- speech_uncompiler(out)

url vectors

Description

Creates a vector of URLs to download for a period within a legislature.

Usage

speech_url(chamber, from, to, days = NULL)

Arguments

chamber

chamber:

S: Camara de Senadores
D: Camara de Representantes (Diputados)
A: Asamblea General
C: Comision Permanente

from

character vector. Date in DD-MM-YYYY format

to

character vector. Date in DD-MM-YYYY format

days

character vector. Date in DD-MM-YYYY format.
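
The DD-MM-YYYY format expected by from, to and days corresponds to the pattern "%d-%m-%Y" in base R. The sketch below only shows how such strings parse and how a period expands into days; it makes no claim about how speech_url handles dates internally.

```r
# Parse dates written as DD-MM-YYYY
from <- as.Date("15-02-2015", format = "%d-%m-%Y")
to   <- as.Date("15-03-2015", format = "%d-%m-%Y")

# Every day in the period, written back in the same format
days     <- seq(from, to, by = "day")
days_chr <- format(days, "%d-%m-%Y")

# A date written in another layout does not parse with this pattern
bad <- as.Date("2015/02/15", format = "%d-%m-%Y")
is.na(bad)
```

Checking that a string parses with this pattern before calling speech_url is a cheap way to catch dates typed in another order.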

Value

character vector

Examples

# speech_url(chamber = "D",
#            from    = "15-02-2015",
#            to      = "15-03-2015")
#
# speech_url(chamber = "D",
#            from    = "15-02-2015",
#            to      = "15-02-2015")
#
# speech_url(chamber = "D",
#            days   = "15-02-2015")
#
# speech_url(chamber = "D",
#            days    = c("12-06-2002", "14-04-2004"))
#

View control speech

Description

Shows the legislators' names with problems prior to compiling the data.

Usage

speech_view(tidy_speech, legis = character(), view = FALSE)

Arguments

tidy_speech

data.frame of class puy.

legis

name of the legislator.

view

logical. If TRUE, View displays the datasets containing the interventions of the legislators given in legis. By default it is FALSE.

Value

data.frame.

Examples


# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_view(tidy_speech = out, legis = c("ABDALA", "LAZO"), view = FALSE)

Number of words

Description

Word count.

Usage

speech_word_count(
  string,
  rm.name = FALSE,
  exclude = NULL,
  min.char = 0L,
  rm.long = Inf,
  rm.num = FALSE,
  replace.punct = ""
)

Arguments

string

character vector of length one or greater.

rm.name

logical. If TRUE, removes the word 'SENOR' and the name of the legislator. By default it is FALSE.

exclude

words that are to be excluded from counting.

min.char

integer. Character-length threshold used to exclude short words from the count.

rm.long

integer. Number of characters from which long words are excluded from the count.

rm.num

logical. Indicates whether numbers are excluded from the count.

replace.punct

character. String that replaces punctuation before counting. By default it is "".
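
The effect of these arguments can be illustrated with a small base R word counter. The function count_words is hypothetical and is not the package's code: the exact boundary used for rm.long, the treatment of NA, and the definition of "number" are assumptions made for the sketch.

```r
# Hypothetical word counter mirroring some of the arguments above
count_words <- function(string, rm.long = Inf, rm.num = FALSE,
                        replace.punct = "") {
  vapply(string, function(s) {
    if (is.na(s)) return(NA_integer_)   # assumption: NA in, NA out
    # Replace punctuation, then split on runs of whitespace
    s <- gsub("[[:punct:]]", replace.punct, s)
    words <- strsplit(trimws(s), "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]
    # Assumption: a "number" is a token made only of digits
    if (rm.num) words <- words[!grepl("^[0-9]+$", words)]
    # Assumption: words of rm.long characters or more are dropped
    words <- words[nchar(words) < rm.long]
    length(words)
  }, integer(1), USE.NAMES = FALSE)
}

n_plain  <- count_words("Hello.world!")                       # punctuation deleted
n_spaced <- count_words("Hello.world!", replace.punct = " ")  # punctuation to space
n_long   <- count_words("Hello.world!, HelloHelloHelloHelloHelloHello",
                        replace.punct = " ", rm.long = 20)
```

Deleting punctuation glues "Hello.world!" into a single token, while replacing it with a space yields two words, which is why the choice of replace.punct changes the count.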

Value

integer.

Examples

vec <- "Hello world!"
speech_word_count(vec)

vec2 <- "Hello.world!"
speech_word_count(vec2)
speech_word_count(vec2, replace.punct = " ")

vec3 <- "Hello.world!, HelloHelloHelloHelloHelloHello"
speech_word_count(vec3, replace.punct = " ", rm.long = 20)

speech_word_count("R version", min.char = 1)

r <- "R version 3.5.2 (2018-12-20) -- 'Eggshell Igloo'"
speech_word_count(r, rm.num = TRUE)

speech_word_count(NA)


# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url, compiler = TRUE)
# out$word <- speech_word_count(out$speech, rm.name = TRUE)
# out$word2 <- speech_word_count(out$speech)