Type: | Package |
Title: | Handling Methods for Naver News Text Crawling |
Version: | 0.8.4 |
Description: | Provides functions to collect Korean text samples from news articles on Naver, a popular news portal service in Korea (https://news.naver.com/). |
License: | MIT + file LICENSE |
URL: | https://forkonlp.github.io/N2H4/, https://github.com/forkonlp/N2H4 |
BugReports: | https://github.com/forkonlp/N2H4/issues |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 3.5.0) |
Encoding: | UTF-8 |
Suggests: | testthat, devtools, usethis |
Imports: | rvest, jsonlite, tibble, lifecycle, httr2, magrittr |
NeedsCompilation: | no |
Packaged: | 2024-02-25 16:04:32 UTC; mrchypark |
Author: | Chanyub Park |
Maintainer: | Chanyub Park <mrchypark@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-02-25 16:20:02 UTC |
N2H4: Handling Methods for Naver News Text Crawling
Description
Provides functions to collect Korean text samples from news articles on Naver, a popular news portal service in Korea (https://news.naver.com/).
Author(s)
Maintainer: Chanyub Park mrchypark@gmail.com (ORCID)
See Also
Useful links:
Report bugs at https://github.com/forkonlp/N2H4/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling 'rhs(lhs)'.
Get All Comment
Description
Get all comments from the given Naver news article URL.
Usage
getAllComment(turl)
Arguments
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/023/0003712918>. A news article URL that is not on the Naver.com domain will generate an error. |
Details
Works just like getComment, except that it finds and extracts all comments from the given URL.
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
getAllComment("https://n.news.naver.com/mnews/article/214/0001195110")
## End(Not run)
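Because a URL outside the Naver domain generates an error, it can help to screen URLs before calling getAllComment. The following is a minimal base R sketch. Note that is_naver_url is a hypothetical helper, not part of N2H4, and its pattern is an assumption about what counts as a Naver URL rather than the package's own check.

```r
# Hypothetical helper (not part of N2H4): TRUE when the host is naver.com
# or one of its subdomains. The pattern is an assumption for illustration.
is_naver_url <- function(turl) {
  grepl("^https?://([A-Za-z0-9-]+\\.)*naver\\.com(/|\\?|$)", turl)
}

urls <- c(
  "https://n.news.naver.com/mnews/article/214/0001195110",
  "https://example.com/mnews/article/214/0001195110"
)

# Keep only the URLs that pass the screen before handing them to getAllComment().
naver_urls <- urls[is_naver_url(urls)]
print(naver_urls)
```

Each element of naver_urls could then be passed to getAllComment, which requires network access.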
Get All Comment History
Description
Get All Comment History
Usage
getAllCommentHistory(turl, commentNo)
Arguments
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/001/0009205077?sid=102>. A news article URL that is not on the Naver.com domain will generate an error. |
commentNo |
Parent Comment No. |
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
turl <- "https://n.news.naver.com/mnews/article/214/0001195110?sid=103"
cno <- getComment(turl)
getAllCommentHistory(turl, cno$commentNo[1])
## End(Not run)
News Category
Description
News Category
Usage
getCategory(fresh = FALSE)
Arguments
fresh |
logical. If TRUE, get the data online. Default is FALSE, which uses the cached built-in data. |
Get Comment
Description
Get Naver news comments. To get only the comment data, use a command like the following: getComment(url)$result$commentList[[1]]
Usage
getComment(turl, count = 10, type = c("df", "list"))
Arguments
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/023/0003712918>. |
count |
Number of comments to get. Default is 10. Use "all" to get all comments. |
type |
Return type, either "df" or "list". Default is "df". Note that "df" returns only part of the data, not all of it. |
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
getComment("https://n.news.naver.com/mnews/article/421/0002484966?sid=100")
## End(Not run)
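The type = c("df", "list") default in Usage follows a common R idiom in which the first element of the vector acts as the default. A minimal sketch of that idiom with base R's match.arg is shown below. Note that resolve_type is a hypothetical stand-in, and whether getComment resolves type this way internally is an assumption.

```r
# Hypothetical stand-in for a function with a vector-valued default,
# mirroring the `type = c("df", "list")` signature shown in Usage.
resolve_type <- function(type = c("df", "list")) {
  # match.arg() picks the first element when `type` is not supplied
  # and raises an error for a value outside the allowed set.
  match.arg(type)
}

print(resolve_type())        # type omitted: falls back to the first choice
print(resolve_type("list"))  # explicit choice
```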
Get Comment History
Description
Get Naver news comments from user histories.
Usage
getCommentHistory(turl, commentNo, count = 10, type = c("df", "list"))
Arguments
turl |
character. News article URL on 'Naver', such as <https://n.news.naver.com/mnews/article/001/0009205077?sid=102>. A news article URL that is not on the Naver.com domain will generate an error. |
commentNo |
Parent Comment No. |
count |
Number of comments to get. Default is 10. Use "all" to get all comments. |
type |
Return type, either "df" or "list". Default is "df". Note that "df" returns only part of the data, not all of it. |
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
cno <- getComment("https://n.news.naver.com/mnews/article/421/0002484966?sid=100")
getCommentHistory("https://n.news.naver.com/mnews/article/421/0002484966?sid=100",
cno$commentNo[1])
## End(Not run)
Get Content
Description
Get Naver news content from article links.
Usage
getContent(
turl,
col = c("url", "original_url", "section", "datetime", "edittime", "press", "title",
"body")
)
Arguments
turl |
Naver news article link. |
col |
Columns to get from the news article. Default is all. |
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
getContent("https://n.news.naver.com/mnews/article/214/0001195110?sid=103")
## End(Not run)
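The col argument narrows the result to a subset of the columns listed in Usage. The sketch below illustrates that idea on a placeholder one-row data frame, without any network access. Note that select_cols and mock are hypothetical names, the values are dummies rather than real article data, and the use of match.arg is an assumption, not a description of how getContent is implemented.

```r
# Column names copied from the Usage section of getContent.
all_cols <- c("url", "original_url", "section", "datetime",
              "edittime", "press", "title", "body")

# Placeholder one-row result; every value is a dummy NA, not real article data.
mock <- as.data.frame(
  as.list(setNames(rep(NA_character_, length(all_cols)), all_cols)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the requested columns, in the requested order.
select_cols <- function(df, col = all_cols) {
  col <- match.arg(col, choices = all_cols, several.ok = TRUE)
  df[, col, drop = FALSE]
}

print(names(select_cols(mock, c("title", "body"))))
print(ncol(select_cols(mock)))  # default keeps every column
```

With the package itself, the equivalent request would be getContent(turl, col = c("title", "body")).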
Get News Main Categories
Description
Get the latest Naver news main category names and ids.
Usage
getMainCategory()
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
getMainCategory()
## End(Not run)
Get Max Page Number
Description
Get Max Page Number
Usage
getMaxPageNum(turl, max = 100)
Arguments
turl |
Target URL that includes sid1, sid2, and date, such as <http://news.naver.com/main/list.nhn?sid2=265&sid1=100&mid=shm&mode=LS2D&date=20161102>. |
max |
numeric. Maximum page number to try, which is also used as the trial interval. Default is 100. |
Value
a numeric value
Examples
## Not run:
getMaxPageNum("https://news.naver.com/main/list.naver?mode=LS2D&mid=shm&sid1=103&sid2=376")
## End(Not run)
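Since the target URL is expected to carry sid1, sid2, and date, it can be convenient to assemble it from those parts. A minimal base R sketch follows. Note that build_list_url is a hypothetical helper, not part of N2H4, and the URL template is copied from the example above with the date field appended as an assumption.

```r
# Hypothetical helper (not part of N2H4): assemble a list-page URL from
# the query fields named in the turl description.
build_list_url <- function(sid1, sid2, date) {
  sprintf(
    "https://news.naver.com/main/list.naver?mode=LS2D&mid=shm&sid1=%s&sid2=%s&date=%s",
    sid1, sid2, date
  )
}

# Values taken from the turl description: sid1 = 100, sid2 = 265, date = 20161102.
turl <- build_list_url(sid1 = 100, sid2 = 265, date = "20161102")
print(turl)
```

The resulting turl could then be passed to getMaxPageNum, which requires network access.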
Get News Sub Categories
Description
Get the latest Naver news sub category names and URLs.
Usage
getSubCategory(sid1 = 100)
Arguments
sid1 |
Main category id in the Naver news URL. Only one value is allowed. Default is 100, which means Politics. |
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
getSubCategory(100)
getSubCategory(sid1 = 103)
## End(Not run)
Get Url List By Category
Description
Get Naver news titles and links from the target URL.
Usage
getUrlList(turl, col = c("titles", "links"))
Arguments
turl |
Target Naver news URL. |
col |
Columns to get from the news list. Default is all. |
Value
a [tibble][tibble::tibble-package]
Examples
## Not run:
getUrlList("https://news.naver.com/main/list.naver?mode=LS2D&mid=shm&sid1=103&sid2=376")
## End(Not run)