Package: bdpar 3.1.0

bdpar: Big Data Preprocessing Architecture

Provides a tool to easily build customized data flows to pre-process large volumes of information from different sources. To this end, 'bdpar' allows users to (i) easily use and create new functionalities and (ii) develop new data source extractors according to their needs. Additionally, the package provides a predefined default data flow to extract and pre-process the most relevant information (tokens, dates, ...) from some textual sources (SMS, email, YouTube comments).

Authors: Miguel Ferreiro-Díaz [aut, cre], David Ruano-Ordás [aut, ctr], Tomás R. Cotos-Yañez [aut, ctr], José Ramón Méndez Reboredo [aut, ctr], University of Vigo [cph]

bdpar_3.1.0.tar.gz
bdpar_3.1.0.zip (r-4.5), bdpar_3.1.0.zip (r-4.4), bdpar_3.1.0.zip (r-4.3)
bdpar_3.1.0.tgz (r-4.4-any), bdpar_3.1.0.tgz (r-4.3-any)
bdpar_3.1.0.tar.gz (r-4.5-noble), bdpar_3.1.0.tar.gz (r-4.4-noble)
bdpar_3.1.0.tgz (r-4.4-emscripten), bdpar_3.1.0.tgz (r-4.3-emscripten)
bdpar.pdf | bdpar.html
bdpar/json (API)
NEWS

# Install 'bdpar' in R:
install.packages('bdpar', repos = c('https://miferreiro.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/miferreiro/bdpar/issues

Datasets:
  • bdparData - Example of the content of the files to be preprocessed.
  • emojisData - Emojis codes and descriptions data.

Topics: custom-flow, custom-pipes, preprocessing, r6

34 exports, 8 stars, 1.41 score, 7 dependencies, 15 scripts, 383 downloads

Last updated 9 months ago from: e92df857b0. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Sep 10 2024
R-4.5-win        NOTE    Sep 10 2024
R-4.5-linux      NOTE    Sep 10 2024
R-4.4-win        NOTE    Sep 10 2024
R-4.4-mac        NOTE    Sep 10 2024
R-4.3-win        NOTE    Sep 10 2024
R-4.3-mac        NOTE    Sep 10 2024

Exports: %>|%, AbbreviationPipe, Bdpar, bdpar.log, bdpar.Options, Connections, ContractionPipe, DefaultPipeline, DynamicPipeline, ExtractorEml, ExtractorFactory, ExtractorSms, ExtractorYtbid, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GenericPipe, GenericPipeline, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, ResourceHandler, runPipeline, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe

Dependencies: data.table, digest, jsonlite, R6, rlist, XML, yaml

A Brief Introduction to bdpar

Rendered from bdpar.Rmd using knitr::rmarkdown on Sep 10 2024.

Last update: 2023-12-12
Started: 2019-07-09

Basic example using bdpar package

Rendered from bdparExample.Rmd using knitr::rmarkdown on Sep 10 2024.

Last update: 2020-11-25
Started: 2020-02-20

Image processing example using bdpar package

Rendered from bdparExampleImage.Rmd using knitr::rmarkdown on Sep 10 2024.

Last update: 2020-11-25
Started: 2020-11-25

Readme and manuals

Help Manual

Help page | Topics
Class to find and/or replace the abbreviations on the data field of an Instance | AbbreviationPipe
Class to manage the preprocess of the files throughout the flow of pipes | Bdpar
Write messages to the log at a given priority level using the custom bdpar log | bdpar.log
Object to handle the keys/attributes/options common to all pipeline flows | bdpar.Options
Example of the content of the files to be preprocessed. | bdparData
Class to manage the connections with YouTube | Connections
Class to find and/or replace the contractions on the data field of an Instance | ContractionPipe
Class implementing a default pipelining process. | DefaultPipeline
Class implementing a dynamic pipelining process | DynamicPipeline
Emojis codes and descriptions data. | emojisData
Class to handle email files with eml extension | ExtractorEml
Class to handle the creation of Instance types | ExtractorFactory
Class to handle SMS files with tsms extension | ExtractorSms
Class to handle comments of YouTube files with ytbid extension | ExtractorYtbid
Class to obtain the source field of an Instance | File2Pipe
Class to find and/or replace the emoji on the data field of an Instance | FindEmojiPipe
Class to find and/or remove the emoticons on the data field of an Instance | FindEmoticonPipe
Class to find and/or remove the hashtags on the data field of an Instance | FindHashtagPipe
Class to find and/or remove the URLs on the data field of an Instance | FindUrlPipe
Class to find and/or remove the users on the data field of an Instance | FindUserNamePipe
Abstract super class that handles the management of the Pipes | GenericPipe
Abstract super class implementing the pipelining process | GenericPipeline
Class to obtain the date field of an Instance | GuessDatePipe
Class to guess the language of an Instance | GuessLanguagePipe
Abstract super class that handles the management of the Instances | Instance
Class to find and/or remove the interjections on the data field of an Instance | InterjectionPipe
Class to obtain the length of the data field of an Instance | MeasureLengthPipe
bdpar customized forward-pipe operator | %>|%, operator-pipe
Class that handles different types of resources | ResourceHandler
Initiates the pipelining process | runPipeline
Class to find and/or replace the slangs on the data field of an Instance | SlangPipe
Class to find and/or remove the stop words on the data field of an Instance | StopWordPipe
Class to get the file's extension field of an Instance | StoreFileExtPipe
Class to get the target field of the Instance | TargetAssigningPipe
Class to handle a CSV with the properties field of the preprocessed Instance | TeeCSVPipe
Class to convert the data field of an Instance to lower case | ToLowerCasePipe