Type: | Package |
Title: | Legislative Speeches |
Version: | 0.1.5 |
Description: | Converts the floor speeches of Uruguayan legislators, extracted from the parliamentary minutes, to a tidy data.frame in which each observation is the intervention of a single legislator. |
License: | GPL-3 |
Encoding: | UTF-8 |
Depends: | R (≥ 3.6.0) |
URL: | https://github.com/Nicolas-Schmidt/speech |
Imports: | dplyr, lubridate, magrittr, purrr, stringr, tibble, tm, tidyr, pdftools, rvest |
RoxygenNote: | 7.1.2 |
NeedsCompilation: | no |
Packaged: | 2022-10-03 17:12:15 UTC; Nicolás Schmidt |
Author: | Nicolas Schmidt |
Maintainer: | Nicolas Schmidt <nschmidt@cienciassociales.edu.uy> |
Repository: | CRAN |
Date/Publication: | 2022-10-03 17:30:02 UTC |
Transform speeches in PDF to data.frame
Description
Extracts the individual speeches of each legislator in a document and returns them as a data.frame.
Usage
speech_build(
file,
add.error.sir = NULL,
rm.error.leg = NULL,
compiler = FALSE,
quality = FALSE,
param = list(char = 6500, drop.page = 2)
)
Arguments
file | list or character vector specifying the path or URL to a PDF file. It can be one or more files. |
add.error.sir | character vector. Additional ways in which the term that orders the speeches ("sir") could be miswritten. By default it is NULL. |
rm.error.leg | character vector. Legislators' names to be eliminated. By default it is NULL. |
compiler | logical. Once the conversion from PDF to data frame has been checked, the data frame must be compiled. Compiling unites all the speeches of each legislator in each document. Because this operation must be carried out after making corrections, it has to be requested explicitly. By default it is FALSE. |
quality | logical. If TRUE, the variables index_1 and index_2 are added to the output. By default it is FALSE. |
param | list of length 2 with the magnitudes for the arguments "characters per page" and "drop page non evaluate" (char and drop.page), respectively. The default values are the median characters of 8500 documents that make up the speech datasets. |
Details
This function converts PDF documents to a data.frame. The conversion works
by locating the interventions of legislators from the word "SENOR". Because the
quality of PDF files is not always good, verify that no legislator is omitted
when the data.frame is built. Use the argument add.error.sir to correct
misspellings of the word "SENOR". The function already includes a long list of
ways in which the word "SENOR" may be written in a document, but it does not
cover every possible future problem. When the PDF document is a scan that was
processed with OCR, check the result with greater caution to ensure that the
operation was performed correctly.
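The marker-based splitting described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the variant list, the toy text and the assumption that each intervention looks like "SENOR NAME.- text" are all hypothetical.

```r
# Illustrative sketch only; speech_build() may work differently internally.
# Hypothetical misspellings of the marker, as add.error.sir might supply.
variants <- c("SEf'IOR", "SEIIOR")

# Toy page text (assumed layout: "SENOR NAME.- speech"), one marker misread.
page <- "SENOR ABDALA.- Pido la palabra. SEf'IOR LAZO.- Apoyado. SENOR ABDALA.- Gracias."

# Normalise every known misspelling to the canonical marker (literal match).
for (v in variants) page <- gsub(v, "SENOR", page, fixed = TRUE)

# Split at the marker and drop empty pieces.
pieces <- trimws(strsplit(page, "SENOR", fixed = TRUE)[[1]])
pieces <- pieces[nzchar(pieces)]

# Name is the text before ".-"; the speech is what follows it.
interventions <- data.frame(
  legislator = sub("\\.-.*$", "", pieces),
  speech     = sub("^[^.]*\\.-\\s*", "", pieces),
  stringsAsFactors = FALSE
)
print(interventions)
```

A marker that is not in the variant list is never split on, so its intervention is absorbed into the previous one. That is the kind of omission the text above recommends checking for.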
Value
data.frame of class puy with the following variables:
legislator: name of the legislators
speech: speeches by legislators
date: session date
id: file name
legislature: legislature id (period of government)
sex: sex
chamber: chamber to which the document belongs. It can be: Chamber of Representatives, Senate, General Assembly or Permanent Commission.
If quality is TRUE, the following are added:
index_1: index_1
index_2: index_2
Examples
# url <- speech::speech_url(chamber = "C", from = "17-09-2019", to = "17-09-2019")
# out <- speech_build(file = url)
# out <- speech_build(file = url, compiler = FALSE,
# quality = TRUE,
# add.error.sir = c("SEf'IOR"),
# rm.error.leg = c("PRtSIDENTE", "SUB", "PRfSlENTE"),
# param = list(char = 6000, drop.page = 3))
# out <- list.files(pattern = "\\.pdf$") %>% speech_build()
# out <- list.files(pattern = "\\.pdf$") %>%
# speech_build(., compiler = TRUE, param = list(char = 4500, drop.page = 3))
Check the names of legislators
Description
Checks that the names of the legislators are correctly written
before compiling the documents in speech_build.
Usage
speech_check(tidy_speech, initial, expand = FALSE)
Arguments
tidy_speech | data.frame. |
initial | character vector. Initials of the legislators' names. If no initial is entered, all will be checked. |
expand | logical. By default it is FALSE. |
Value
list with a data.frame for each initial of legislators' names.
Examples
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_check(out, initial = c("A", "M"), expand = FALSE)
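As a rough picture of the returned structure (a list with one data.frame per initial), here is a base R sketch on toy data. The names, the misspelling and the grouping logic are illustrative assumptions, not the implementation of speech_check.

```r
# Illustrative sketch only; not the code behind speech_check().
toy <- data.frame(
  legislator = c("ABDALA", "AVDALA", "MUJICA", "LAZO"),  # "AVDALA" is a made-up OCR error
  speech     = c("uno", "dos", "tres", "cuatro"),
  stringsAsFactors = FALSE
)

initials <- c("A", "M")

# Keep only names starting with the requested initials.
first_letter <- substr(toy$legislator, 1, 1)
kept <- toy[first_letter %in% initials, , drop = FALSE]

# One data.frame per initial, as a named list.
by_initial <- split(kept, substr(kept$legislator, 1, 1))

# Listing the distinct names under one initial makes near-duplicates visible.
print(unique(by_initial$A$legislator))
```

Grouping by initial puts near-duplicates such as "ABDALA" and "AVDALA" side by side, so they can be fixed before compiling.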
Rename legislators
Description
Allows modifying the legislators' names prior to compiling the data.
Usage
speech_legis_replace(tidy_speech, old, new, id = NULL)
Arguments
tidy_speech | data.frame of class puy. |
old | old legislator's name. |
new | new legislator's name. |
id | id of the floor speech. |
Value
data.frame.
Examples
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_check(out, "G")
# out <- speech_legis_replace(out, old = "GOI", new = "GONI")
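The renaming step can be pictured with a small base R helper. The function rename_legislator and the toy ids are hypothetical, and the way id restricts the replacement is an assumption about what the id argument is for.

```r
# Illustrative sketch only; rename_legislator() is a hypothetical helper,
# not part of the speech package.
rename_legislator <- function(df, old, new, id = NULL) {
  hit <- df$legislator == old
  # Assumption: when id is given, only rows from those documents change.
  if (!is.null(id)) hit <- hit & df$id %in% id
  df$legislator[hit] <- new
  df
}

toy <- data.frame(
  legislator = c("GOI", "GONI", "GOI"),
  id         = c("doc1.pdf", "doc1.pdf", "doc2.pdf"),
  stringsAsFactors = FALSE
)

all_docs <- rename_legislator(toy, old = "GOI", new = "GONI")
one_doc  <- rename_legislator(toy, old = "GOI", new = "GONI", id = "doc2.pdf")
print(one_doc)
```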
Speech recompiler
Description
Recompiles the speech datasets, or a data.frame built with
speech_build
to which the political party variable was added.
Usage
speech_recompiler(
tidy_speech,
compiler_by = c("legislator", "legislature", "chamber", "date", "id", "sex")
)
Arguments
tidy_speech | data.frame. |
compiler_by | character vector. Variables for which you may want to recompile the data frame. |
Details
The default compilation is that of speech_build(., compiler = TRUE). This function recompiles the data at different levels of aggregation: chamber, legislature or other variables.
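Compiling at different levels of aggregation can be sketched with base R's aggregate(). The toy data and the choice of a single space as separator are assumptions; the package may join speeches differently.

```r
# Illustrative sketch only; not the code behind speech_recompiler().
toy <- data.frame(
  legislator = c("ABDALA", "LAZO", "ABDALA"),
  chamber    = "Senate",
  id         = "doc1.pdf",
  speech     = c("Pido la palabra.", "Apoyado.", "Gracias."),
  stringsAsFactors = FALSE
)

# Unite the speeches of each legislator within each document.
by_legislator <- aggregate(speech ~ legislator + id, data = toy,
                           FUN = paste, collapse = " ")

# A coarser level of aggregation: one row per chamber.
by_chamber <- aggregate(speech ~ chamber, data = toy,
                        FUN = paste, collapse = " ")

print(by_legislator)
```

Fewer grouping variables give fewer, longer rows, which is the effect a shorter compiler_by vector would be expected to have.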
Value
data.frame.
Examples
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# out2 <- speech_recompiler(out)
# out2 <- speech_recompiler(out, compiler_by = c("legislator", "legislature", "chamber"))
Detects roll-call
Description
Detects roll-call votes in floor speeches and converts them to a dataset.
The summary method returns a summary of a roll-call vote object.
Usage
speech_rollcall(file, add.error.sir = NULL, rm.error.leg = NULL)
## S3 method for class 'nominal'
summary(object, ...)
Arguments
file | list or character vector specifying the path or URL to a PDF file. It can be one or more files. |
add.error.sir | character vector. Additional ways in which the term that orders the speeches ("sir") could be miswritten. By default it is NULL. |
rm.error.leg | character vector. Legislators' names to be eliminated. By default it is NULL. |
object | an object of class nominal. |
... | additional parameters. |
Details
This function detects roll-call votes in floor speeches. It only detects votes that can be affirmative or negative. This leaves out some roll-call votes, such as those for the allocation of positions in the chamber.
Value
data.frame with the following variables:
legislator: Name of the legislator
vote: Vote, 1 = affirmative, 0 = negative
argument: 1 if the legislator justifies the vote, otherwise 0
speech: Speech
chamber: Chamber
date: Date
legislature: Legislature
rollcall: Number of roll-call in session
id: Id
sex: Sex of legislator
The summary method returns a data.frame with the following variables:
Chamber: Chamber
Date: Date
Legislators: Number of legislators in the voting
Affirmative: Number of affirmative votes
Negative: Number of negative votes
prop_AF: Proportion of affirmative votes
prop_NG: Proportion of negative votes
prop_women: Proportion of women in the voting
prop_arg: Proportion of legislators justifying the vote
rc: Number of roll-call in session
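The summary variables listed above are simple counts and proportions. A base R sketch on toy data shows how they could be derived. The "F"/"M" coding of sex and the toy values are assumptions; the package's own coding may differ.

```r
# Illustrative sketch only; not the summary() method of the package.
votes <- data.frame(
  legislator = c("A", "B", "C", "D"),
  vote       = c(1, 1, 0, 1),           # 1 = affirmative, 0 = negative
  argument   = c(0, 1, 1, 0),           # 1 = the vote was justified
  sex        = c("F", "M", "M", "F"),   # hypothetical coding
  stringsAsFactors = FALSE
)

roll_summary <- data.frame(
  Legislators = nrow(votes),
  Affirmative = sum(votes$vote == 1),
  Negative    = sum(votes$vote == 0),
  prop_AF     = mean(votes$vote == 1),
  prop_NG     = mean(votes$vote == 0),
  prop_women  = mean(votes$sex == "F"),
  prop_arg    = mean(votes$argument == 1)
)
print(roll_summary)
```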
Examples
# url <- speech::speech_url(chamber = "D", from = "14-04-2004", to = "14-04-2004")
# out <- speech_rollcall(file = url)
# summary(out)
Speech uncompiler
Description
Undoes the compilation of a floor speech.
Usage
speech_uncompiler(tidy_speech)
Arguments
tidy_speech | data.frame. |
Value
data.frame.
Examples
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url, compiler = TRUE)
# out2 <- speech_uncompiler(out)
URL vectors
Description
Creates a vector of URLs to download for a period within a legislature.
Usage
speech_url(chamber, from, to, days = NULL)
Arguments
chamber | chamber: |
from | character vector. Date in DD-MM-YYYY format. |
to | character vector. Date in DD-MM-YYYY format. |
days | character vector. Date in DD-MM-YYYY format. |
Value
character vector
Examples
# speech_url(chamber = "D",
# from = "15-02-2015",
# to = "15-03-2015")
#
# speech_url(chamber = "D",
# from = "15-02-2015",
# to = "15-02-2015")
#
# speech_url(chamber = "D",
# days = "15-02-2015")
#
# speech_url(chamber = "D",
# days = c("12-06-2002", "14-04-2004"))
#
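The from, to and days arguments use the DD-MM-YYYY format. The base R sketch below shows how such strings can be parsed and expanded into a daily sequence. It only illustrates the date format; it makes no claim about how speech_url builds its URLs.

```r
# Illustrative sketch only; it does not reproduce speech_url().
from <- as.Date("15-02-2015", format = "%d-%m-%Y")
to   <- as.Date("15-03-2015", format = "%d-%m-%Y")

# Every calendar day in the period, both ends included.
days <- seq(from, to, by = "day")

# Back to the DD-MM-YYYY strings the arguments expect.
days_chr <- format(days, "%d-%m-%Y")
print(head(days_chr, 3))
```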
View control speech
Description
Shows the legislators' names with problems prior to compiling the data.
Usage
speech_view(tidy_speech, legis = character(), view = FALSE)
Arguments
tidy_speech | data.frame of class puy. |
legis | name of the legislator. |
view | logical. By default it is FALSE. |
Value
data.frame.
Examples
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_view(tidy_speech = out, legis = c("ABDALA", "LAZO"), view = FALSE)
Number of words
Description
Word count.
Usage
speech_word_count(
string,
rm.name = FALSE,
exclude = NULL,
min.char = 0L,
rm.long = Inf,
rm.num = FALSE,
replace.punct = ""
)
Arguments
string | character vector of length one or more. |
rm.name | logical. By default it is FALSE. |
exclude | words that are to be excluded from the count. |
min.char | integer that determines the words that have less than a certain number of characters. |
rm.long | integer. Number of characters from which words are removed from the count. |
rm.num | logical. Indicates whether numbers are removed from the count. |
replace.punct | character string that replaces punctuation. By default it is "". |
Value
integer.
Examples
vec <- "Hello world!"
speech_word_count(vec)
vec2 <- "Hello.world!"
speech_word_count(vec2)
speech_word_count(vec2, replace.punct = " ")
vec3 <- "Hello.world!, HelloHelloHelloHelloHelloHello"
speech_word_count(vec3, replace.punct = " ", rm.long = 20)
speech_word_count("R version", min.char = 1)
r <- "R version 3.5.2 (2018-12-20) -- 'Eggshell Igloo'"
speech_word_count(r, rm.num = TRUE)
speech_word_count(NA)
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url, compiler = TRUE)
# out$word <- speech_word_count(out$speech, rm.name = TRUE)
# out$word2 <- speech_word_count(out$speech)
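The effect of replace.punct in the examples above can be illustrated with a small base R counter. The helper count_words is hypothetical and far simpler than speech_word_count. It only shows why replacing punctuation with "" or with " " changes the count.

```r
# Illustrative sketch only; count_words() is a hypothetical helper,
# not the implementation of speech_word_count().
count_words <- function(x, replace.punct = "") {
  # Swap every punctuation character for the replacement string.
  x <- gsub("[[:punct:]]", replace.punct, x)
  # Count whitespace-separated tokens.
  lengths(strsplit(trimws(x), "\\s+"))
}

glued <- count_words("Hello.world!")                       # punctuation removed
split <- count_words("Hello.world!", replace.punct = " ")  # punctuation becomes a space
plain <- count_words("Hello world!")
print(c(glued = glued, split = split, plain = plain))
```

With the default "", the dot disappears and "Hello.world!" collapses into a single token. With " ", the same string yields two tokens.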