Title: An R Wrapper for Jagger
Version: 0.0.2
Description: A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <doi:10.48550/arXiv.2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.
License: GPL-2
Depends: R (≥ 4.0)
Encoding: UTF-8
SystemRequirements: C++17
RoxygenNote: 7.2.3
URL: https://shusei-e.github.io/RcppJagger/, https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/index.en.html
BugReports: https://github.com/Shusei-E/RcppJagger/issues
LinkingTo: Rcpp
Imports: cli (≥ 3.6.1), purrr (≥ 1.0.0), Rcpp (≥ 1.0.7), rlang (≥ 1.1.0)
Suggests: dplyr (≥ 1.1.0), testthat (≥ 3.1.5), tibble
Config/testthat/edition: 3
LazyData: TRUE
NeedsCompilation: yes
Packaged: 2023-06-08 19:03:16 UTC; shusei
Author: Shusei Eshima [aut, cre], Naoki Yoshinaga [ctb]
Maintainer: Shusei Eshima <shuseieshima@gmail.com>
Repository: CRAN
Date/Publication: 2023-06-08 22:22:56 UTC

An R wrapper for Jagger's lemmatizer

Description

An R wrapper for Jagger's lemmatizer

Usage

lemmatize(input, model_path = NULL, keep = NULL, concat = TRUE)

Arguments

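input

a character vector of Japanese text to lemmatize.

model_path

the path to a Jagger model. Default is NULL, which uses the package's default model.

keep

a character vector of POS tags to keep; tokens with other tags are dropped. Default is NULL, which keeps all tokens.

concat

logical. If TRUE, the result for each input is returned as a single concatenated string; if FALSE, the results are returned as a list. Default is TRUE.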

Value

a vector (if concat = TRUE) or a list (if concat = FALSE).

Examples

 data(sentence_example)
 res_lemmatize <- lemmatize(sentence_example$text)
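The example above uses the default concat = TRUE, which yields concatenated strings. The minimal sketch below, which assumes the package's default model, sets concat = FALSE so that the lemmas come back as a list instead.

```r
library(RcppJagger)

data(sentence_example)

# concat = FALSE returns a list rather than concatenated strings
res_lemma_list <- lemmatize(sentence_example$text, concat = FALSE)
str(res_lemma_list)
```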

Lemmatize (a vector input)

Description

Lemmatize (a vector input)

Usage

lemmatize_cpp_vec(inputs, model_path, keep_vec, keep_all)

An R wrapper for Jagger's lemmatizer (a tibble input)

Description

An R wrapper for Jagger's lemmatizer (a tibble input)

Usage

lemmatize_tbl(tbl, column, model_path = NULL, keep = NULL)

Arguments

tbl

a tibble object.

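column

the name of the tibble column that contains the text to lemmatize.

model_path

the path to a Jagger model. Default is NULL, which uses the package's default model.

keep

a character vector of POS tags to keep; tokens with other tags are dropped. Default is NULL, which keeps all tokens.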

Value

a tibble.

Examples

 data(sentence_example)
 res_lemmatize <- lemmatize_tbl(tibble::as_tibble(sentence_example), "text")
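lemmatize_tbl() is not limited to sentence_example. The sketch below builds a small tibble by hand and uses keep to filter the output. The two sentences are illustrative only. The tag "名詞" (noun) is an assumption about the default model's POS tag set, so check the tags your model actually produces.

```r
library(RcppJagger)

# A small hand-made tibble; the sentences are illustrative only
docs <- tibble::tibble(
  text = c("今日は良い天気です。", "私は本を読みました。")
)

# Keep only tokens tagged "名詞" (noun); this tag name is an assumption
res_nouns <- lemmatize_tbl(docs, "text", keep = c("名詞"))
print(res_nouns)
```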

An R wrapper for Jagger's POS tagger

Description

An R wrapper for Jagger's POS tagger

Usage

pos(input, model_path = NULL, keep = NULL, format = c("list", "data.frame"))

Arguments

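input

a character vector of Japanese text to tag.

model_path

the path to a Jagger model. Default is NULL, which uses the package's default model.

keep

a character vector of POS tags to keep; tokens with other tags are dropped. Default is NULL, which keeps all tokens.

format

the output format, either "list" or "data.frame". Default is "list".

Value

the POS-tagging results, as a list by default or in the format set by format.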

Examples

 data(sentence_example)
 res_pos <- pos(sentence_example$text)
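The format argument accepts "data.frame" as an alternative to the default list. The minimal sketch below requests that format. The exact shape of the returned object is not specified above, so the sketch only inspects it with str().

```r
library(RcppJagger)

data(sentence_example)

# Request data-frame output instead of the default list
res_pos_df <- pos(sentence_example$text, format = "data.frame")
str(res_pos_df)
```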

POS tagging in C++

Description

POS tagging in C++

Usage

pos_cpp_vec(inputs, model_path, keep_vec, keep_all)

An R wrapper for Jagger's POS tagger (only returning POS)

Description

An R wrapper for Jagger's POS tagger (only returning POS)

Usage

pos_simple(
  input,
  model_path = NULL,
  keep = NULL,
  format = c("list", "data.frame")
)

Arguments

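input

a character vector of Japanese text to tag.

model_path

the path to a Jagger model. Default is NULL, which uses the package's default model.

keep

a character vector of POS tags to keep; tokens with other tags are dropped. Default is NULL, which keeps all tokens.

format

the output format, either "list" or "data.frame". Default is "list".

Value

the tokens and their POS tags, as a list by default or in the format set by format.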

Examples

 data(sentence_example)
 res_pos <- pos_simple(sentence_example$text)
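pos_simple() also accepts keep. The sketch below restricts the output to verbs. The tag "動詞" (verb) is an assumption about the default model's POS tag set and may differ for other models.

```r
library(RcppJagger)

data(sentence_example)

# Keep only tokens tagged "動詞" (verb); this tag name is an assumption
res_verbs <- pos_simple(sentence_example$text, keep = c("動詞"))
str(res_verbs)
```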

POS tagging in C++ (only token and pos)

Description

POS tagging in C++ (only token and pos)

Usage

pos_simple_cpp_vec(inputs, model_path, keep_vec, keep_all)

An example sentence

Description

An example sentence

Usage

sentence_example

Format

A data.frame with a single row and a single column:

text

a sentence in Japanese

Source

Aozora Bunko: https://www.aozora.gr.jp/


An R wrapper for Jagger's tokenizer

Description

An R wrapper for Jagger's tokenizer

Usage

tokenize(input, model_path = NULL, keep = NULL, concat = TRUE)

Arguments

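input

a character vector of Japanese text to tokenize.

model_path

the path to a Jagger model. Default is NULL, which uses the package's default model.

keep

a character vector of POS tags to keep; tokens with other tags are dropped. Default is NULL, which keeps all tokens.

concat

logical. If TRUE, the tokens for each input are returned as a single concatenated string; if FALSE, the results are returned as a list. Default is TRUE.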

Value

a vector (if concat = TRUE) or a list (if concat = FALSE).

Examples

 data(sentence_example)
 res_tokenize <- tokenize(sentence_example$text)
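The two output modes described under Value can be compared side by side. The sketch below assumes the package's default model and calls tokenize() once with the default concat = TRUE and once with concat = FALSE.

```r
library(RcppJagger)

data(sentence_example)

# Default (concat = TRUE): a vector of concatenated strings
res_str <- tokenize(sentence_example$text)

# concat = FALSE: the tokens are returned as a list instead
res_list <- tokenize(sentence_example$text, concat = FALSE)

str(res_str)
str(res_list)
```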

Tokenizer (a vector input)

Description

Tokenizer (a vector input)

Usage

tokenize_cpp_vec(inputs, model_path, keep_vec, keep_all)

An R wrapper for Jagger's tokenizer (a tibble input)

Description

An R wrapper for Jagger's tokenizer (a tibble input)

Usage

tokenize_tbl(tbl, column, model_path = NULL, keep = NULL)

Arguments

tbl

a tibble.

column

a column name of the tibble to tokenize.

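model_path

the path to a Jagger model. Default is NULL, which uses the package's default model.

keep

a character vector of POS tags to keep; tokens with other tags are dropped. Default is NULL, which keeps all tokens.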

Value

a tibble.

Examples

 data(sentence_example)
 res_tokenize <- tokenize_tbl(tibble::as_tibble(sentence_example), "text")
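keep can list several tags at once. The sketch below keeps nouns and verbs when tokenizing the example tibble. The tags "名詞" and "動詞" are assumptions about the default model's POS tag set.

```r
library(RcppJagger)

data(sentence_example)
tbl <- tibble::as_tibble(sentence_example)

# Keep nouns and verbs only; these tag names are assumptions about the
# default model's POS tag set
res_nv <- tokenize_tbl(tbl, "text", keep = c("名詞", "動詞"))
print(res_nv)
```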
