Title: | Big Data Preprocessing Architecture |
---|---|
Description: | Provide a tool to easily build customized data flows to pre-process large volumes of information from different sources. To this end, 'bdpar' allows to (i) easily use and create new functionalities and (ii) develop new data source extractors according to the user needs. Additionally, the package provides by default a predefined data flow to extract and pre-process the most relevant information (tokens, dates, ... ) from some textual sources (SMS, Email, YouTube comments). |
Authors: | Miguel Ferreiro-Díaz [aut, cre], David Ruano-Ordás [aut, ctr], Tomás R. Cotos-Yañez [aut, ctr], José Ramón Méndez Reboredo [aut, ctr], University of Vigo [cph] |
Maintainer: | Miguel Ferreiro-Díaz <[email protected]> |
License: | GPL-3 |
Version: | 3.1.0 |
Built: | 2024-11-09 03:52:49 UTC |
Source: | https://github.com/miferreiro/bdpar |
AbbreviationPipe class is responsible for detecting the existing abbreviations in the data field of each Instance.
Identified abbreviations are stored inside the abbreviation field of the Instance class. Moreover, if needed, it can replace the abbreviations inline.
AbbreviationPipe class requires the resource files (in JSON format) containing the correspondence between abbreviations and their meanings. To this end, the language of the text indicated in propertyLanguageName should be contained in the resource file name (i.e., abbrev.xxx.json, where xxx is the value defined in propertyLanguageName). The location of the resources should be defined in the "resources.abbreviations.path" field of the bdpar.Options variable.
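A minimal base-R sketch of the naming convention, assuming the language property holds a short code such as "en". The helper abbrevResourceFile is hypothetical; it only shows how the documented abbrev.xxx.json name is composed from the resources path.

```r
# Hypothetical helper (not part of bdpar): builds the resource file name that
# AbbreviationPipe is documented to look for, "abbrev.<language>.json".
abbrevResourceFile <- function(resourcesPath, language) {
  file.path(resourcesPath, paste0("abbrev.", language, ".json"))
}

# Assumed example values: a temporary folder and the language code "en".
resourcesPath <- tempdir()
enFile <- abbrevResourceFile(resourcesPath, "en")
print(basename(enFile))  # "abbrev.en.json"
# The pipe can only use this resource if the file actually exists there.
print(file.exists(enFile))
```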
AbbreviationPipe will automatically invalidate the Instance whenever the obtained data is empty.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> AbbreviationPipe
new()
Creates an AbbreviationPipe object.
AbbreviationPipe$new( propertyName = "abbreviation", propertyLanguageName = "language", alwaysBeforeDeps = list("GuessLanguagePipe"), notAfterDeps = list(), replaceAbbreviations = TRUE, resourcesAbbreviationsPath = NULL )
propertyName
A character value. Name of the property associated with the GenericPipe.
propertyLanguageName
A character value. Name of the language property.
alwaysBeforeDeps
A list value. The alwaysBefore dependencies (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The notAfter dependencies (GenericPipes that cannot be executed after this one).
replaceAbbreviations
A logical value. Indicates whether the abbreviations are replaced.
resourcesAbbreviationsPath
A character value. Path of the resource files (in JSON format) containing the correspondence between abbreviations and their meanings.
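The two dependency arguments can be read as ordering rules over a flow of pipes. The sketch below is a hypothetical base-R model, not bdpar's implementation: pipes are plain lists with name, alwaysBeforeDeps and notAfterDeps fields, and notAfterDeps is interpreted as "these pipes may not run later in the same flow".

```r
# Hypothetical check of the two ordering rules (not bdpar's implementation):
# every alwaysBeforeDeps entry must already have run, and a pipe must not run
# after a pipe that lists it in notAfterDeps.
runFlow <- function(pipes) {
  flow <- character(0)    # names of the pipes already executed
  banned <- character(0)  # names forbidden by earlier notAfterDeps
  for (p in pipes) {
    missingDeps <- setdiff(unlist(p$alwaysBeforeDeps), flow)
    if (length(missingDeps) > 0) {
      stop(p$name, " must run after: ", paste(missingDeps, collapse = ", "))
    }
    if (p$name %in% banned) {
      stop(p$name, " cannot be executed after a pipe that bans it")
    }
    flow <- c(flow, p$name)
    banned <- union(banned, unlist(p$notAfterDeps))
  }
  flow
}

pipes <- list(
  list(name = "GuessLanguagePipe", alwaysBeforeDeps = list(), notAfterDeps = list()),
  list(name = "AbbreviationPipe", alwaysBeforeDeps = list("GuessLanguagePipe"),
       notAfterDeps = list())
)
print(runFlow(pipes))  # "GuessLanguagePipe" "AbbreviationPipe"
# Reversing the order fails: AbbreviationPipe needs GuessLanguagePipe first.
```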
pipe()
Preprocesses the Instance to obtain/replace the abbreviations. The abbreviations found in the data are added to the list of properties of the Instance.
AbbreviationPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
findAbbreviation()
Checks if the abbreviation is in the data.
AbbreviationPipe$findAbbreviation(data, abbreviation)
A logical value indicating whether the abbreviation is in the data.
replaceAbbreviation()
Replaces the abbreviation in the data with the extendedAbbreviation.
AbbreviationPipe$replaceAbbreviation(abbreviation, extendedAbbreviation, data)
The data with the abbreviations replaced.
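One plausible way to implement whole-word detection and replacement is with PCRE lookarounds, so that abbreviations containing dots (such as "e.g.") still match. The helpers escapeRegex, findTerm and replaceTerm are hypothetical illustrations, not the methods bdpar actually uses.

```r
# Escapes regex metacharacters so the term is matched literally.
escapeRegex <- function(x) gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)

# The term must not be preceded or followed by a word character.
wordPattern <- function(term) {
  paste0("(?<!\\w)", escapeRegex(term), "(?!\\w)")
}

findTerm <- function(data, term) {
  grepl(wordPattern(term), data, perl = TRUE)
}

# extendedTerm is assumed not to contain backslashes, which gsub() would
# interpret as escape sequences in the replacement.
replaceTerm <- function(term, extendedTerm, data) {
  gsub(wordPattern(term), extendedTerm, data, perl = TRUE)
}

text <- "Bring snacks, e.g. fruit, to the meeting."
print(findTerm(text, "e.g."))     # TRUE
print(findTerm("regular", "eg"))  # FALSE: "eg" is inside a word
print(replaceTerm("e.g.", "for example", text))
# "Bring snacks, for example fruit, to the meeting."
```

Word boundaries (\b) are avoided on purpose: a term ending in "." followed by a space has no \b after it, whereas the (?!\w) lookahead still matches.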
getPropertyLanguageName()
Gets the name of the language property.
AbbreviationPipe$getPropertyLanguageName()
Value of the name of the language property.
getResourcesAbbreviationsPath()
Gets the path of the abbreviations resources.
AbbreviationPipe$getResourcesAbbreviationsPath()
Value of the path of the abbreviations resources.
setResourcesAbbreviationsPath()
Sets the path of the abbreviations resources.
AbbreviationPipe$setResourcesAbbreviationsPath(path)
path
A character value. The new value of the path of the abbreviations resources.
clone()
The objects of this class are cloneable with this method.
AbbreviationPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe
Bdpar class provides the static variables required to perform the whole data flow process. To this end, Bdpar is in charge of (i) initializing the objects that handle the connections to APIs (Connections) and the JSON resources (ResourceHandler) and (ii) executing the flow of pipes (inherited from the GenericPipeline class) passed as argument.
If some pipe defined in the workflow needs some type of configuration, it can be defined through the bdpar.Options variable, which has different methods to support the functionality of the different pipes.
(Connections) object that handles the connections with YouTube and Twitter.
(ResourceHandler) object that handles the json resources files.
new()
Creates a Bdpar object. Initializes the static variables: connections and resourceHandler.
Bdpar$new()
execute()
Preprocesses files through the indicated flow of pipes.
Bdpar$execute( path, extractors = ExtractorFactory$new(), pipeline = DefaultPipeline$new(), cache = TRUE, verbose = FALSE, summary = FALSE )
path
A character value. The path where the files to be processed are located.
extractors
An ExtractorFactory value. Class which implements the createInstance method to choose which type of Instance is created.
pipeline
A GenericPipeline value. Subclass of GenericPipeline, which implements the execute method. By default, it is the DefaultPipeline pipeline.
cache
(logical) Flag indicating whether the status of the instances will be stored after each pipe. This avoids repeating previously executed tasks if the order and configuration of the pipes and pipeline are the same as those stored in the cache.
verbose
(logical) Flag indicating whether to print messages, warnings and errors.
summary
(logical) Flag indicating whether a summary of the pipeline execution is provided.
To parallelize, indicate the number of cores to be used through bdpar.Options$set("numCores", numCores).
The list of Instances that have been preprocessed.
clone()
The objects of this class are cloneable with this method.
Bdpar$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.Options, Connections, DefaultPipeline, DynamicPipeline, GenericPipeline, Instance, ExtractorFactory, ResourceHandler, runPipeline
## Not run:
# If any configuration is needed, set it through:
# bdpar.Options$set(key, value)
# If the key is not initialized, add it through:
# bdpar.Options$add(key, value)
# To parallelize, indicate the number of cores through:
# bdpar.Options$set("numCores", numCores)
# To change the behavior of the log, use:
# bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)
# Folder with the files to preprocess
path <- system.file("example", package = "bdpar")
# Object that decides how the instances are created
extractors <- ExtractorFactory$new()
# Object that defines the flow of pipes
pipeline <- DefaultPipeline$new()
objectBdpar <- Bdpar$new()
# Start file preprocessing...
objectBdpar$execute(path = path,
                    extractors = extractors,
                    pipeline = pipeline,
                    cache = FALSE,
                    verbose = FALSE,
                    summary = TRUE)
## End(Not run)
bdpar.log is responsible for managing the messages shown in the log.
bdpar.log(message, level = "INFO", className = NULL, methodName = NULL)
message
A string to be printed to the log with the corresponding priority level.
level
The desired priority level (DEBUG, INFO, WARN, ERROR and FATAL). At the FATAL level, the stop function is called. At the WARN level, the message is issued as a warning.
className
A string indicating the class from which the log is called. If NULL, this field is not shown in the log.
methodName
A string indicating the method from which the log is called. If NULL, this field is not shown in the log.
The output format is as follows:
[currentTime][className][methodName][level] message
The type of message changes according to the indicated level:
- The DEBUG, INFO and ERROR levels return a text using the message function.
- The WARN level returns a text using the warning function.
- The FATAL level returns a text using the stop function.
When running in parallel (multithreading), the log is only written to file.
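The format and level dispatch described above can be modelled in a few lines of base R. This is a hypothetical sketch, not bdpar.log itself: the timestamp format is an assumption, and the threshold filtering mirrors the threshold argument of configureLog.

```r
# Hypothetical re-implementation of the documented log line layout and level
# dispatch (not bdpar's code). The timestamp format is an assumption.
logLevels <- c("DEBUG", "INFO", "WARN", "ERROR", "FATAL")

formatLog <- function(message, level = "INFO", className = NULL,
                      methodName = NULL, time = Sys.time()) {
  # NULL className/methodName simply drop out of c(), so they are not shown.
  fields <- c(format(time, "%Y-%m-%d %H:%M:%S"), className, methodName, level)
  paste0(paste0("[", fields, "]", collapse = ""), " ", message)
}

sketchLog <- function(message, level = "INFO", className = NULL,
                      methodName = NULL, threshold = "INFO") {
  # Messages below the threshold are discarded.
  if (match(level, logLevels) < match(threshold, logLevels)) {
    return(invisible(NULL))
  }
  text <- formatLog(message, level, className, methodName)
  switch(level,
         WARN  = warning(text, call. = FALSE),
         FATAL = stop(text, call. = FALSE),
         base::message(text))  # DEBUG, INFO and ERROR use message()
  invisible(text)
}

sketchLog("Message example", "INFO", "Class name example", "Method name example")
sketchLog("Hidden", "DEBUG")  # below the default "INFO" threshold: nothing printed
```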
## Not run:
# First step, configure the behavior of the log
bdpar.Options$configureLog(console = TRUE, threshold = "DEBUG", file = NULL)
message <- "Message example"
className <- "Class name example"
methodName <- "Method name example"
bdpar.log(message = message, level = "DEBUG", className = NULL, methodName = NULL)
bdpar.log(message = message, level = "INFO", className = className, methodName = methodName)
bdpar.log(message = message, level = "WARN", className = className, methodName = NULL)
bdpar.log(message = message, level = "ERROR", className = NULL, methodName = NULL)
bdpar.log(message = message, level = "FATAL", className = NULL, methodName = methodName)
## End(Not run)
This class provides the methods needed to manage the list of keys or options used along the pipe flow, both those provided by default by the library and those implemented by the user.
bdpar.Options
By default, the application initializes the object named bdpar.Options of type BdparOptions, which is in charge of initializing the options used in the defined pipes.
The default fields of bdpar.Options are initialized, if needed, as shown below:
[eml]
- bdpar.Options$set("extractorEML.mpaPartSelected", <<PartSelectedOnMPAlternative>>)
[resources]
- bdpar.Options$set("resources.abbreviations.path", <<abbreviation.path>>)
- bdpar.Options$set("resources.contractions.path", <<contractions.path>>)
- bdpar.Options$set("resources.interjections.path", <<interjections.path>>)
- bdpar.Options$set("resources.slangs.path", <<slangs.path>>)
- bdpar.Options$set("resources.stopwords.path", <<stopwords.path>>)
[teeCSVPipe]
- bdpar.Options$set("teeCSVPipe.output.path", <<output.path>>)
[youtube]
- bdpar.Options$set("youtube.app.id", <<app_id>>)
- bdpar.Options$set("youtube.app.password", <<app_password>>)
- bdpar.Options$set("cache.youtube.path", <<cache.path>>)
[cache]
- bdpar.Options$set("cache", <<status_cache>>)
- bdpar.Options$set("cache.folder", <<cache.path>>)
[parallel]
- bdpar.Options$set("numCores", <<num_cores>>)
[verbose]
- bdpar.Options$set("verbose", <<status_verbose>>)
If the bdpar cache is configured through the "cache" and "cache.folder" options, the status of the instances will be stored after each pipe. This avoids repeating previously executed tasks if the order and configuration of the pipes and pipeline are the same as those stored in the cache.
To remove the cache, use the cleanCache method.
The parallelization of instances is configured through the "numCores" option, which indicates the number of cores used in the processing.
When running in parallel, only file logging works, so that the information produced by all the cores can be collected.
The bdpar log is configured through the configureLog function. This system manages both where the messages are displayed and the priority level of each message, showing only the messages whose level is at or above the one indicated in the threshold variable.
To deactivate the bdpar log, use the disableLog method of bdpar.Options.
Obtains a specific option.
get(key)
The value of the specific option.
(character) The name of the option to obtain.
Adds an option to the list of options.
add(key, value)
(character) The name of the new option.
(Object) The value of the new option.
Modifies the value of an option.
set(key, value)
(character) The name of the option to modify.
(Object) The new value of the option.
Removes a specific option.
remove(key)
(character) The name of the option to remove.
Gets the list of options.
getAll()
Value of the options.
Resets the option list to the initial state.
reset()
Checks for the existence of a specific option.
isSpecificProperty(key)
A logical value indicating whether the specific option exists in the list of options.
(character) The key of the option to check.
Cleans the cache of executed pipelines. Deletes all files and directories located in the path defined in the "cache.folder" option.
cleanCache()
Configures the bdpar log. When running in parallel, only file logging works.
configureLog(console = TRUE, threshold = "INFO", file = NULL)
(logical) Whether to show the log on the console.
(character) The logging threshold level. Messages with a lower priority level will be discarded.
(character) The file to write messages to. If NULL, file logging is not enabled.
Deactivates the bdpar log.
disableLog()
Prints the bdpar log configuration.
getLogConfiguration()
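A hypothetical base-R stand-in for the option methods listed above, useful to see the intended key/value semantics. The distinction that set() requires an already initialized key while add() creates it is an assumption drawn from the comments in the Bdpar example, not a documented guarantee.

```r
# Hypothetical stand-in for an options object, built on an environment.
# Assumption: set() only modifies existing keys and add() creates new ones;
# bdpar's actual checks may differ.
makeOptions <- function() {
  store <- new.env(parent = emptyenv())
  has <- function(key) exists(key, envir = store, inherits = FALSE)
  list(
    add = function(key, value) assign(key, value, envir = store),
    set = function(key, value) {
      if (!has(key)) stop("Option not initialized: ", key, "; use add()")
      assign(key, value, envir = store)
    },
    get = function(key) get(key, envir = store, inherits = FALSE),
    remove = function(key) rm(list = key, envir = store),
    isSpecificProperty = has,
    getAll = function() as.list(store)
  )
}

opts <- makeOptions()
opts$add("numCores", 1)
opts$set("numCores", 2)
print(opts$get("numCores"))                # 2
print(opts$isSpecificProperty("verbose"))  # FALSE
```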
AbbreviationPipe, bdpar.log, Connections, ContractionPipe, ExtractorEml, ExtractorYtbid, GuessLanguagePipe, Instance, SlangPipe, StopWordPipe, TeeCSVPipe, %>|%
A manually collected dataset containing e-mails and SMS messages from the nutrition and health domain, classified as spam and non-spam (with a ratio of 50%). The dataset contains two variables: (i) path, which indicates the location of the target file, and (ii) source, which contains the raw text of each file.
data(bdparData)
A data frame with 20 rows and 2 variables:
path
File path.
source
File content.
The Connections class establishes the connections with the YouTube API and controls the number of requests made to it.
The YouTube keys must be indicated through the following fields of the bdpar.Options variable:
[youtube]
- bdpar.Options$set("youtube.app.id", <<app_id>>)
- bdpar.Options$set("youtube.app.password", <<app_password>>)
Fields of unused connections are automatically ignored by the platform.
new()
Creates a Connections object.
Connections$new()
startConnectionWithYoutube()
Function able to establish the connection with YouTube.
Connections$startConnectionWithYoutube()
addNumRequestToYoutube()
Increments by one the number of requests made to YouTube.
Connections$addNumRequestToYoutube()
checkRequestToYoutube()
Handles the connection with YouTube.
Connections$checkRequestToYoutube()
getNumRequestMaxToYoutube()
Gets the maximum number of requests allowed by the YouTube API.
Connections$getNumRequestMaxToYoutube()
Value of the maximum number of requests to YouTube.
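The request-counting idea behind addNumRequestToYoutube() and getNumRequestMaxToYoutube() can be sketched as a closure. The counter below is hypothetical and its limit of 3 is arbitrary; in bdpar the real maximum is the one returned by getNumRequestMaxToYoutube().

```r
# Hypothetical request counter mirroring addNumRequestToYoutube() and a quota
# check; the limit of 3 is an arbitrary assumption, not YouTube's real quota.
makeRequestCounter <- function(maxRequests) {
  numRequests <- 0
  list(
    add = function() numRequests <<- numRequests + 1,
    canRequest = function() numRequests < maxRequests,
    count = function() numRequests
  )
}

counter <- makeRequestCounter(maxRequests = 3)
sent <- 0
for (i in 1:5) {
  if (counter$canRequest()) {
    counter$add()  # a real client would call the API here
    sent <- sent + 1
  }
}
print(sent)  # 3
```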
clone()
The objects of this class are cloneable with this method.
Connections$clone(deep = FALSE)
deep
Whether to make a deep clone.
ContractionPipe class is responsible for detecting the existing contractions in the data field of each Instance.
Identified contractions are stored inside the contraction field of the Instance class. Moreover, if needed, it can replace the contractions inline.
ContractionPipe class requires the resource files (in JSON format) containing the correspondence between contractions and their meanings. To this end, the language of the text indicated in propertyLanguageName should be contained in the resource file name (i.e., contr.xxx.json, where xxx is the value defined in propertyLanguageName). The location of the resources should be defined in the "resources.contractions.path" field of the bdpar.Options variable.
ContractionPipe will automatically invalidate the Instance whenever the obtained data is empty.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> ContractionPipe
new()
Creates a ContractionPipe object.
ContractionPipe$new( propertyName = "contractions", propertyLanguageName = "language", alwaysBeforeDeps = list("GuessLanguagePipe"), notAfterDeps = list(), replaceContractions = TRUE, resourcesContractionsPath = NULL )
propertyName
A character value. Name of the property associated with the GenericPipe.
propertyLanguageName
A character value. Name of the language property.
alwaysBeforeDeps
A list value. The alwaysBefore dependencies (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The notAfter dependencies (GenericPipes that cannot be executed after this one).
replaceContractions
A logical value. Indicates whether the contractions are replaced.
resourcesContractionsPath
A character value. Path of the resource files (in JSON format) containing the correspondence between contractions and their meanings.
pipe()
Preprocesses the Instance to obtain/replace the contractions. The contractions found in the data are added to the list of properties of the Instance.
ContractionPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
findContraction()
Checks if the contraction is in the data.
ContractionPipe$findContraction(data, contraction)
A logical value indicating whether the contraction is in the data.
replaceContraction()
Replaces the contraction in the data with the extendedContraction.
ContractionPipe$replaceContraction(contraction, extendedContraction, data)
The data with the contractions replaced.
getPropertyLanguageName()
Gets the name of the language property.
ContractionPipe$getPropertyLanguageName()
Value of the name of the language property.
getResourcesContractionsPath()
Gets the path of the contractions resources.
ContractionPipe$getResourcesContractionsPath()
Value of the path of the contractions resources.
setResourcesContractionsPath()
Sets the path of the contractions resources.
ContractionPipe$setResourcesContractionsPath(path)
path
A character value. The new value of the path of the contractions resources.
clone()
The objects of this class are cloneable with this method.
ContractionPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, bdpar.Options, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe
This DefaultPipeline class inherits from the GenericPipeline class. It includes the execute method, which provides a default pipelining implementation.
The default flow is:
instance %>|%
  TargetAssigningPipe$new() %>|%
  StoreFileExtPipe$new() %>|%
  GuessDatePipe$new() %>|%
  File2Pipe$new() %>|%
  MeasureLengthPipe$new(propertyName = "length_before_cleaning_text") %>|%
  FindUserNamePipe$new() %>|%
  FindHashtagPipe$new() %>|%
  FindUrlPipe$new() %>|%
  FindEmoticonPipe$new() %>|%
  FindEmojiPipe$new() %>|%
  GuessLanguagePipe$new() %>|%
  ContractionPipe$new() %>|%
  AbbreviationPipe$new() %>|%
  SlangPipe$new() %>|%
  ToLowerCasePipe$new() %>|%
  InterjectionPipe$new() %>|%
  StopWordPipe$new() %>|%
  MeasureLengthPipe$new(propertyName = "length_after_cleaning_text") %>|%
  TeeCSVPipe$new()
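To make the chaining concrete, the following base-R sketch assumes that %>|% skips the remaining pipes once an instance has been invalidated (as several pipes here invalidate instances with empty data). The operator is defined under the hypothetical name %ifValid% to avoid masking bdpar's own operator, and instances and pipes are plain lists and functions rather than Instance and GenericPipe objects.

```r
# Hypothetical operator standing in for %>|%: the right-hand pipe runs only
# while the instance is still valid.
`%ifValid%` <- function(instance, pipe) {
  if (!isTRUE(instance$valid)) return(instance)
  pipe(instance)
}

toLower <- function(inst) { inst$data <- tolower(inst$data); inst }
dropIfEmpty <- function(inst) {
  if (!nzchar(trimws(inst$data))) inst$valid <- FALSE
  inst
}
markSeen <- function(inst) { inst$seen <- TRUE; inst }

ok <- list(data = "Hello World", valid = TRUE) %ifValid%
  toLower %ifValid% dropIfEmpty %ifValid% markSeen
empty <- list(data = "   ", valid = TRUE) %ifValid%
  toLower %ifValid% dropIfEmpty %ifValid% markSeen

print(ok$data)              # "hello world"
print(is.null(empty$seen))  # TRUE: markSeen was skipped
```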
This class inherits from GenericPipeline and implements the execute abstract function.
bdpar::GenericPipeline -> DefaultPipeline
new()
Creates a DefaultPipeline object.
DefaultPipeline$new()
execute()
Function that implements the flow of the GenericPipes.
DefaultPipeline$execute(instance)
The preprocessed Instance.
get()
Gets a list containing the set of GenericPipes of the pipeline.
DefaultPipeline$get()
The set of GenericPipes contained in the pipeline.
print()
Prints the pipeline representation (overrides the print function).
DefaultPipeline$print(...)
...
Further arguments passed to or from other methods.
toString()
Returns a character representing the pipeline.
DefaultPipeline$toString()
DefaultPipeline character representation.
clone()
The objects of this class are cloneable with this method.
DefaultPipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.log, Instance, DynamicPipeline, GenericPipeline, GenericPipe, %>|%
This DynamicPipeline class inherits from the GenericPipeline class. It includes the execute method, which provides a dynamic pipelining implementation.
This class inherits from GenericPipeline and implements the execute abstract function.
bdpar::GenericPipeline -> DynamicPipeline
new()
Creates a DynamicPipeline object.
DynamicPipeline$new(pipeline = NULL)
pipeline
A list of GenericPipe objects. Initializes the flow of GenericPipes.
add()
Adds a GenericPipe or a list of GenericPipes to the pipeline.
DynamicPipeline$add(pipe, pos = NULL)
pipe
A GenericPipe object or a list of GenericPipe objects.
pos
A (numeric) value. The position at which to add the pipe(s). If it is NULL, the GenericPipe is appended to the pipeline.
removeByPos()
Removes GenericPipes by their position in the pipeline.
DynamicPipeline$removeByPos(pos)
pos
A (numeric) value. The position to remove.
removeByPipe()
Removes GenericPipes by their name in the pipeline.
DynamicPipeline$removeByPipe(pipe.name)
pipe.name
A (character) value. The name of the GenericPipes to remove.
removeAll()
Removes all GenericPipes included in the pipeline.
DynamicPipeline$removeAll()
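The list-editing methods of DynamicPipeline can be modelled on a character vector of pipe class names. This is a hypothetical sketch: it assumes pos is the position the inserted pipe(s) will occupy, and it does not build real GenericPipe objects.

```r
# Hypothetical model of DynamicPipeline's list editing (not bdpar's code).
# Pipes are represented by their class names.
makeDynamicPipeline <- function(pipeline = character(0)) {
  pipes <- pipeline
  list(
    add = function(pipe, pos = NULL) {
      pipes <<- if (is.null(pos)) c(pipes, pipe) else append(pipes, pipe, after = pos - 1)
    },
    removeByPos = function(pos) pipes <<- pipes[-pos],
    removeByPipe = function(pipe.name) pipes <<- pipes[!pipes %in% pipe.name],
    removeAll = function() pipes <<- character(0),
    get = function() pipes
  )
}

p <- makeDynamicPipeline(c("File2Pipe", "ToLowerCasePipe"))
p$add("GuessLanguagePipe", pos = 2)        # becomes the second pipe
p$add(c("StopWordPipe", "TeeCSVPipe"))     # appended at the end
p$removeByPipe("ToLowerCasePipe")
p$removeByPos(1)                           # drops "File2Pipe"
print(p$get())  # "GuessLanguagePipe" "StopWordPipe" "TeeCSVPipe"
```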
execute()
Function that implements the flow of the GenericPipes.
DynamicPipeline$execute(instance)
instance
An (Instance) value. The Instance that is going to be processed.
get()
Gets a list containing the set of GenericPipes of the pipeline.
DynamicPipeline$get()
The set of GenericPipes contained in the pipeline.
print()
Prints the pipeline representation (overrides the print function).
DynamicPipeline$print(...)
...
Further arguments passed to or from other methods.
toString()
Returns a character representing the pipeline.
DynamicPipeline$toString()
DynamicPipeline character representation.
clone()
The objects of this class are cloneable with this method.
DynamicPipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.log, Instance, DefaultPipeline, GenericPipeline, GenericPipe, %>|%
This data comes from Unicode.org, <http://unicode.org/emoji/charts/full-emoji-list.html>. It contains the codes and descriptions of emojis.
data(emojisData)
A data frame with 2623 rows and 2 variables:
Emoji code
Emoji description.
This class inherits from the Instance class and implements the functions that extract the text and the date from an eml type file.
When the email is multipart, the part to choose is indicated through the "extractorEML.mpaPartSelected" field of the bdpar.Options variable.
Using this class requires Python to be installed.
This class inherits from Instance and implements the obtainSource and obtainDate abstract functions.
bdpar::Instance -> ExtractorEml
bdpar::Instance$addBanPipes()
bdpar::Instance$addFlowPipes()
bdpar::Instance$addProperties()
bdpar::Instance$checkCompatibility()
bdpar::Instance$getBanPipes()
bdpar::Instance$getData()
bdpar::Instance$getDate()
bdpar::Instance$getFlowPipes()
bdpar::Instance$getNamesOfProperties()
bdpar::Instance$getPath()
bdpar::Instance$getProperties()
bdpar::Instance$getSource()
bdpar::Instance$getSpecificProperty()
bdpar::Instance$invalidate()
bdpar::Instance$isInstanceValid()
bdpar::Instance$isSpecificProperty()
bdpar::Instance$setData()
bdpar::Instance$setDate()
bdpar::Instance$setProperties()
bdpar::Instance$setSource()
bdpar::Instance$setSpecificProperty()
new()
Creates an ExtractorEml object.
ExtractorEml$new(path, PartSelectedOnMPAlternative = NULL)
path
A character value. Path of the eml file.
PartSelectedOnMPAlternative
A character value. Configuration to read the eml files. If it is NULL, checks whether it is defined in the "extractorEML.mpaPartSelected" field of the bdpar.Options variable.
obtainDate()
Obtains the date of the eml file. Calls the function read_emails and obtains the date of the file indicated in the path and then transforms it into the generic date format, that is "%a %b %d %H:%M:%S %Z %Y" (Example: "Thu May 02 06:52:36 UTC 2013").
ExtractorEml$obtainDate()
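The generic date format can be reproduced with base R's strptime() and format(). The input Date header below is an assumed RFC 2822-style example; what read_emails actually returns is not specified here. LC_TIME is set to "C" so that day and month names are in English.

```r
# Formats a timestamp with the generic date pattern quoted above. %a and %b
# depend on the locale, so LC_TIME is switched to "C" and restored afterwards.
oldLocale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Assumption: an RFC 2822-style Date header; read_emails' raw output may differ.
header <- "Thu, 2 May 2013 06:52:36 +0000"
parsed <- as.POSIXct(strptime(header, "%a, %d %b %Y %H:%M:%S %z", tz = "UTC"), tz = "UTC")
formatted <- format(parsed, "%a %b %d %H:%M:%S %Z %Y")

invisible(Sys.setlocale("LC_TIME", oldLocale))
print(formatted)  # "Thu May 02 06:52:36 UTC 2013"
```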
obtainSource()
Obtains the source of the eml file. Calls the function read_emails and obtains the source of the file indicated in the path. In addition, it initializes the data with the initial source.
ExtractorEml$obtainSource()
getPartSelectedOnMPAlternative()
Gets the PartSelectedOnMPAlternative variable.
ExtractorEml$getPartSelectedOnMPAlternative()
Value of the PartSelectedOnMPAlternative variable.
setPartSelectedOnMPAlternative()
Sets the PartSelectedOnMPAlternative variable.
ExtractorEml$setPartSelectedOnMPAlternative(PartSelectedOnMPAlternative)
PartSelectedOnMPAlternative
A character value. The new value of the PartSelectedOnMPAlternative variable.
toString()
Returns a character representing the instance.
ExtractorEml$toString()
Instance character representation.
clone()
The objects of this class are cloneable with this method.
ExtractorEml$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.Options, ExtractorSms, ExtractorYtbid, Instance
ExtractorFactory class builds the appropriate Instance object according to the file extension. If the extension is not registered, the default extractor will be used if it has been previously configured.
new()
Creates an ExtractorFactory object.
ExtractorFactory$new()
registerExtractor()
Adds an extractor to the list of extensions. If the extension is an empty string (""), the indicated extractor will be the default when no extractor is associated with an extension.
ExtractorFactory$registerExtractor(extensions, extractor)
extensions
A character array. The names of the extensions.
extractor
An Object value. The extractor of the new extension.
setExtractor()
Modifies the extractor of an extension.
ExtractorFactory$setExtractor(extension, extractor)
extension
A character value. The name of the extension.
extractor
An Object value. The new extractor.
setDefaultExtractor()
Modifies the default extractor. Assign the NULL value to disable the default extractor.
ExtractorFactory$setDefaultExtractor(defaultExtractor)
defaultExtractor
An Object value. The value of the default extractor.
removeExtractor()
Removes a specific extractor through its extension.
ExtractorFactory$removeExtractor(extension)
extension
A character value. The name of the extension to remove.
getAllExtractors()
Gets the list of extractors.
ExtractorFactory$getAllExtractors()
Value of extractors.
getDefaultExtractor()
Gets the default extractor.
ExtractorFactory$getDefaultExtractor()
Value of the default extractor.
isSpecificExtractor()
Checks whether an extractor exists for a specific extension.
ExtractorFactory$isSpecificExtractor(extension)
extension
A character value. The name of the extension to check.
Value of extractors.
createInstance()
Builds the Instance object according to the file extension. If the extension is not registered, the default extractor will be used if it has been previously configured.
ExtractorFactory$createInstance(path)
The Instance object corresponding to the file extension.
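The lookup order described above (registered extension first, then the default extractor) can be sketched as follows. makeExtractorFactory is hypothetical and uses plain functions in place of extractor classes such as ExtractorEml.

```r
# Hypothetical extension-based registry (not bdpar's implementation). Plain
# functions stand in for extractor classes; "" registers the default extractor.
makeExtractorFactory <- function() {
  extractors <- list()
  default <- NULL
  list(
    registerExtractor = function(extensions, extractor) {
      for (ext in extensions) {
        if (identical(ext, "")) {
          default <<- extractor
        } else {
          extractors[[ext]] <<- extractor
        }
      }
    },
    createInstance = function(path) {
      ext <- tools::file_ext(path)
      if (ext %in% names(extractors)) return(extractors[[ext]](path))
      if (!is.null(default)) return(default(path))
      stop("No extractor registered for extension '", ext, "'")
    }
  )
}

factory <- makeExtractorFactory()
factory$registerExtractor("eml", function(path) list(path = path, type = "eml"))
factory$registerExtractor("tsms", function(path) list(path = path, type = "sms"))
print(factory$createInstance("mails/1.eml")$type)  # "eml"
factory$registerExtractor("", function(path) list(path = path, type = "default"))
print(factory$createInstance("notes.txt")$type)    # "default"
```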
reset()
Resets the list of extractors to the default state.
ExtractorFactory$reset()
print()
Prints pipeline representation. (Override print function)
ExtractorFactory$print(...)
...
Further arguments passed to or from other methods.
clone()
The objects of this class are cloneable with this method.
ExtractorFactory$clone(deep = FALSE)
deep
Whether to make a deep clone.
ExtractorEml, ExtractorSms, Instance
This class inherits from the Instance class and implements the functions that extract the text and the date of a tsms type file.
Because the creation date of the message cannot be extracted from the text of an SMS, the date is initialized to empty.
This class inherits from Instance and implements the obtainSource and obtainDate abstract functions.
bdpar::Instance -> ExtractorSms
bdpar::Instance$addBanPipes()
bdpar::Instance$addFlowPipes()
bdpar::Instance$addProperties()
bdpar::Instance$checkCompatibility()
bdpar::Instance$getBanPipes()
bdpar::Instance$getData()
bdpar::Instance$getDate()
bdpar::Instance$getFlowPipes()
bdpar::Instance$getNamesOfProperties()
bdpar::Instance$getPath()
bdpar::Instance$getProperties()
bdpar::Instance$getSource()
bdpar::Instance$getSpecificProperty()
bdpar::Instance$invalidate()
bdpar::Instance$isInstanceValid()
bdpar::Instance$isSpecificProperty()
bdpar::Instance$setData()
bdpar::Instance$setDate()
bdpar::Instance$setProperties()
bdpar::Instance$setSource()
bdpar::Instance$setSpecificProperty()
new()
Creates an ExtractorSms object.
ExtractorSms$new(path)
path
A character value. Path of the tsms file.
obtainDate()
Obtains the date of the SMS file.
ExtractorSms$obtainDate()
obtainSource()
Obtains the source of the SMS file. Reads the file indicated in the path. In addition, it initializes the data field with the initial source.
ExtractorSms$obtainSource()
toString()
Returns a character representing the instance.
ExtractorSms$toString()
Instance character representation.
clone()
The objects of this class are cloneable with this method.
ExtractorSms$clone(deep = FALSE)
deep
Whether to make a deep clone.
ExtractorEml, ExtractorYtbid, Instance
This class inherits from the Instance class and implements the functions that extract the text and the date of a ytbid type file.
The YouTube connection is handled through the Connections class, which loads the YouTube API credentials from the bdpar.Options object.
Additionally, to increase the processing speed, each YouTube query is stored in a cache to avoid executing duplicated queries. To enable this option, the cache location should be defined in the "cache.youtube.path" field of the bdpar.Options variable. This variable must be the path where the comments are stored, and it must contain two folders named "_spam_" and "_ham_".
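A short sketch of preparing such a cache folder in base R. The location under tempdir() is an assumption for illustration; the bdpar.Options call is left as a comment because it needs the package loaded.

```r
# Prepares a YouTube comment cache folder with the two required subfolders.
# The location (a temporary directory) is an assumption for illustration.
cachePath <- file.path(tempdir(), "youtube_cache")
for (sub in c("_spam_", "_ham_")) {
  dir.create(file.path(cachePath, sub), recursive = TRUE, showWarnings = FALSE)
}
print(list.files(cachePath))  # "_ham_" "_spam_"

# With bdpar loaded, the path would then be registered as documented above:
# bdpar.Options$set("cache.youtube.path", cachePath)
```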
This class inherits from Instance and implements the obtainSource and obtainDate abstract functions.
bdpar::Instance -> ExtractorYtbid
bdpar::Instance$addBanPipes()
bdpar::Instance$addFlowPipes()
bdpar::Instance$addProperties()
bdpar::Instance$checkCompatibility()
bdpar::Instance$getBanPipes()
bdpar::Instance$getData()
bdpar::Instance$getDate()
bdpar::Instance$getFlowPipes()
bdpar::Instance$getNamesOfProperties()
bdpar::Instance$getPath()
bdpar::Instance$getProperties()
bdpar::Instance$getSource()
bdpar::Instance$getSpecificProperty()
bdpar::Instance$invalidate()
bdpar::Instance$isInstanceValid()
bdpar::Instance$isSpecificProperty()
bdpar::Instance$setData()
bdpar::Instance$setDate()
bdpar::Instance$setProperties()
bdpar::Instance$setSource()
bdpar::Instance$setSpecificProperty()
new()
Creates an ExtractorYtbid
object.
ExtractorYtbid$new(path, cachePath = NULL)
path
A character
value. Path of the ytbid file.
cachePath
A character
value. Path of the cache
location. If it is NULL, checks whether it is defined in the
"cache.youtube.path" field of the bdpar.Options
variable.
obtainId()
Obtains the ID of the specific YouTube comment by reading it from the file indicated in the path variable.
ExtractorYtbid$obtainId()
getId()
Gets the ID of a specific YouTube comment.
ExtractorYtbid$getId()
Value of the YouTube comment ID.
obtainDate()
Obtains the date from a specific comment ID. If the comment has been previously cached, the comment date is loaded from the cache path. Otherwise, the request is performed using the YouTube API and the date is then formatted to the established standard.
ExtractorYtbid$obtainDate()
obtainSource()
Obtains the source from a specific comment ID. If the comment has previously been cached, the source is loaded from the cache path. Otherwise, the request is performed using the YouTube API.
ExtractorYtbid$obtainSource()
toString()
Returns a character
representing the instance
ExtractorYtbid$toString()
Instance
character
representation
clone()
The objects of this class are cloneable with this method.
ExtractorYtbid$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.Options
, Connections
,
ExtractorEml
, ExtractorSms
,
Instance
Obtains the source using the method implemented by the
subclass of Instance
.
File2Pipe
will automatically invalidate the
Instance
whenever the obtained source is empty or not in UTF-8 format.
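The invalidation rule above (empty or non-UTF-8 source) can be expressed with base R's `validUTF8()`. A minimal sketch; `is_valid_source()` is a hypothetical name and bdpar may perform the check differently.

```r
# Hypothetical check mirroring the rule described above (not bdpar's code):
# a source is kept only when it is a single, non-missing, non-empty,
# valid UTF-8 string.
is_valid_source <- function(source) {
  is.character(source) && length(source) == 1 && !is.na(source) &&
    nzchar(source) && validUTF8(source)
}

is_valid_source("Hello")   # TRUE
is_valid_source("")        # FALSE: empty source
is_valid_source("\xff")    # FALSE: byte 0xFF never appears in valid UTF-8
```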
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> File2Pipe
new()
Creates a File2Pipe
object.
File2Pipe$new( propertyName = "source", alwaysBeforeDeps = list("TargetAssigningPipe"), notAfterDeps = list() )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
pipe()
Preprocesses the Instance
to obtain the
source.
File2Pipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
clone()
The objects of this class are cloneable with this method.
File2Pipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
FindEmojiPipe
, FindEmoticonPipe
,
FindHashtagPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class is responsible for detecting the existing emojis in the
data field of each Instance
. Identified emojis are
stored inside the emoji field of Instance
class.
Moreover, if required, it is able to perform inline emoji replacement.
FindEmojiPipe
uses the emoji list provided by data(emojisData).
FindEmojiPipe
will automatically invalidate the
Instance
whenever the obtained data is empty.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> FindEmojiPipe
new()
Creates a FindEmojiPipe
object.
FindEmojiPipe$new( propertyName = "Emojis", alwaysBeforeDeps = list(), notAfterDeps = list(), replaceEmojis = TRUE )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
replaceEmojis
A logical
value. Indicates if the
emojis are replaced.
propertyLanguageName
A character
value. Name of the
language property.
pipe()
Preprocesses the Instance
to obtain/replace
the emojis. The emojis found in the data are added to the
list of properties of the Instance
.
FindEmojiPipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
findEmoji()
Checks if the emoji is in the data.
FindEmojiPipe$findEmoji(data, emoji)
A logical
value depending on whether the
emoji is in the data.
replaceEmoji()
Replaces the emoji in the data with the extendedEmoji.
FindEmojiPipe$replaceEmoji(emoji, extendedEmoji, data)
The data with the emojis replaced.
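The signatures of findEmoji() and replaceEmoji() suggest a literal search-and-replace. A minimal sketch in base R, assuming plain fixed-string matching; the helper names are hypothetical, emojisData is not used, and the emoji and its description are supplied by hand.

```r
# Hypothetical equivalents of findEmoji() and replaceEmoji() (not bdpar's code).
# fixed = TRUE matches the emoji literally, so nothing is treated as regex syntax.
find_emoji <- function(data, emoji) {
  grepl(emoji, data, fixed = TRUE)
}
replace_emoji <- function(emoji, extendedEmoji, data) {
  gsub(emoji, extendedEmoji, data, fixed = TRUE)
}

smile <- "\U0001F600"  # U+1F600 GRINNING FACE
text <- paste0("Nice ", smile)
find_emoji(text, smile)                      # TRUE
replace_emoji(smile, "grinning face", text)  # "Nice grinning face"
```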
clone()
The objects of this class are cloneable with this method.
FindEmojiPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmoticonPipe
,
FindHashtagPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class is responsible for detecting the existing emoticons in the
data field of each Instance
. Identified emoticons are
stored inside the emoticon field of Instance
class.
Moreover, if required, it is able to perform inline emoticon removal.
The regular expression indicated in the emoticonPattern
variable is used to identify emoticons.
FindEmoticonPipe
will automatically invalidate the
Instance
whenever the obtained data is empty.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> FindEmoticonPipe
emoticonPattern
A character
value. The regular
expression to detect emoticons.
new()
Creates a FindEmoticonPipe
object.
FindEmoticonPipe$new( propertyName = "emoticon", alwaysBeforeDeps = list(), notAfterDeps = list("FindHashtagPipe"), removeEmoticons = TRUE )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
removeEmoticons
A logical
value. Indicates if the
emoticons are removed.
propertyLanguageName
A character
value. Name of the
language property.
pipe()
Preprocesses the Instance
to obtain/remove
the emoticons. The emoticons found in the data are added to the
list of properties of the Instance
.
FindEmoticonPipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
findEmoticon()
Finds the emoticons in the data.
FindEmoticonPipe$findEmoticon(data)
data
A character
value. The text to search the
emoticons.
The list
with emoticons found.
removeEmoticon()
Removes the emoticons in the data.
FindEmoticonPipe$removeEmoticon(data)
data
A character
value. The text where emoticons
will be removed.
The data with the emoticons removed.
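Finding and removing emoticons with a regular expression can be sketched as follows. The pattern is a deliberately small, hypothetical one (eyes, optional nose, mouth) and is not the package's emoticonPattern; the helper names are also hypothetical.

```r
# Hypothetical emoticon pattern (NOT bdpar's emoticonPattern):
# eyes [:;=], optional nose [-'], then a mouth [)(DPp].
emoticon_pattern <- "[:;=][-']?[)(DPp]"

find_emoticon <- function(data) {
  regmatches(data, gregexpr(emoticon_pattern, data, perl = TRUE))[[1]]
}
remove_emoticon <- function(data) {
  out <- gsub(emoticon_pattern, "", data, perl = TRUE)
  gsub("\\s+", " ", trimws(out))  # tidy the spaces left behind
}

find_emoticon("Great :) see you ;-P")    # ":)" ";-P"
remove_emoticon("Great :) see you ;-P")  # "Great see you"
```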
clone()
The objects of this class are cloneable with this method.
FindEmoticonPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmojiPipe
,
FindHashtagPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class is responsible for detecting the existing hashtags in the
data field of each Instance
. Identified hashtags are
stored inside the hashtag field of Instance
class.
Moreover, if required, it is able to perform inline hashtag removal.
The regular expression indicated in the hashtagPattern
variable is used to identify hashtags.
FindHashtagPipe
will automatically invalidate the
Instance
whenever the obtained data is empty.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> FindHashtagPipe
hashtagPattern
A character
value. The regular
expression to detect hashtags.
new()
Creates a FindHashtagPipe
object.
FindHashtagPipe$new( propertyName = "hashtag", alwaysBeforeDeps = list(), notAfterDeps = list(), removeHashtags = TRUE )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
removeHashtags
A logical
value. Indicates if the
hashtags are removed.
propertyLanguageName
A character
value. Name of the
language property.
pipe()
Preprocesses the Instance
to obtain/remove
the hashtags. The hashtags found in the data are added to the
list of properties of the Instance
.
FindHashtagPipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
findHashtag()
Finds the hashtags in the data.
FindHashtagPipe$findHashtag(data)
data
A character
value. The text to search the
hashtags.
The list
with hashtags found.
removeHashtag()
Removes the hashtags in the data.
FindHashtagPipe$removeHashtag(data)
data
A character
value. The text where hashtags
will be removed.
The data with the hashtags removed.
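A minimal sketch of hashtag detection and removal, assuming a simple hypothetical pattern (not the package's hashtagPattern). The `\B` assertion illustrates one design choice: a "#" glued to a preceding word, as in "issue#12", is not treated as a hashtag.

```r
# Hypothetical hashtag pattern (NOT bdpar's hashtagPattern): "#" plus word
# characters; \B requires that "#" does not directly follow a word character.
hashtag_pattern <- "\\B#\\w+"

find_hashtag <- function(data) {
  regmatches(data, gregexpr(hashtag_pattern, data, perl = TRUE))[[1]]
}
remove_hashtag <- function(data) {
  out <- gsub(hashtag_pattern, "", data, perl = TRUE)
  gsub("\\s+", " ", trimws(out))
}

find_hashtag("Loving #rstats and #NLP today")    # "#rstats" "#NLP"
remove_hashtag("Loving #rstats and #NLP today")  # "Loving and today"
find_hashtag("see issue#12")                     # character(0)
```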
clone()
The objects of this class are cloneable with this method.
FindHashtagPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmojiPipe
,
FindEmoticonPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class is responsible for detecting the existing URLs in the
data field of each Instance
. Identified URLs are
stored inside the URLs field of Instance
class.
Moreover, if required, it is able to perform inline URL removal.
The regular expressions indicated in the URLPatterns
variable are used to identify URLs.
FindUrlPipe
will automatically invalidate the
Instance
whenever the obtained data is empty.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> FindUrlPipe
new()
Creates a FindUrlPipe
object.
FindUrlPipe$new( propertyName = "URLs", alwaysBeforeDeps = list(), notAfterDeps = list("FindUrlPipe"), removeUrls = TRUE, URLPatterns = list(self$URLPattern, self$EmailPattern), namesURLPatterns = list("UrlPattern", "EmailPattern") )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
removeUrls
A logical
value. Indicates if the
URLs are removed.
URLPatterns
A list
value. The regex to find URLs.
namesURLPatterns
A list
value. The names of regex.
propertyLanguageName
A character
value. Name of the
language property.
pipe()
Preprocesses the Instance
to obtain/remove
the URLs. The URLs found in the data are added to the
list of properties of the Instance
.
FindUrlPipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
findUrl()
Finds the URLs in the data.
FindUrlPipe$findUrl(pattern, data)
The list
with URLs found.
removeUrl()
Removes the URLs from the data.
FindUrlPipe$removeUrl(pattern, data)
The data with URLs removed.
putNamesURLPattern()
Assigns the names of the URL patterns to the results obtained with each pattern.
FindUrlPipe$putNamesURLPattern(resultOfURLPatterns)
resultOfURLPatterns
A list
value. The list with
URLs found.
The URLs found with the names of URL pattern.
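The interplay between URLPatterns, namesURLPatterns and putNamesURLPattern() can be sketched as applying each pattern in turn and naming the results. The two regular expressions and the helper names are hypothetical and are not the package's URLPattern and EmailPattern.

```r
# Hypothetical patterns (NOT bdpar's URLPattern/EmailPattern).
url_patterns <- list(
  "https?://[^\\s]+",
  "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
)
names_url_patterns <- list("UrlPattern", "EmailPattern")

# Apply every pattern and name each result after its pattern.
find_urls <- function(data, patterns, pattern_names) {
  found <- lapply(patterns, function(p) {
    regmatches(data, gregexpr(p, data, perl = TRUE))[[1]]
  })
  names(found) <- unlist(pattern_names)
  found
}
# Remove the matches of every pattern, one pattern after another.
remove_urls <- function(data, patterns) {
  out <- Reduce(function(text, p) gsub(p, "", text, perl = TRUE), patterns, data)
  gsub("\\s+", " ", trimws(out))
}

text <- "Docs at https://example.org/page or write to info@example.org"
find_urls(text, url_patterns, names_url_patterns)
remove_urls(text, url_patterns)  # "Docs at or write to"
```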
getURLPatterns()
Gets the URL patterns.
FindUrlPipe$getURLPatterns()
Value of URL patterns.
setURLPatterns()
Sets the URL patterns.
FindUrlPipe$setURLPatterns(URLPatterns)
URLPatterns
A list
value. The new value of
the URL patterns.
getNamesURLPatterns()
Gets the names of the URL patterns.
FindUrlPipe$getNamesURLPatterns()
Value of the names of the URL patterns.
setNamesURLPatterns()
Sets the names of the URL patterns.
FindUrlPipe$setNamesURLPatterns(namesURLPatterns)
namesURLPatterns
A list
value. The new value of
the names of the URL patterns.
clone()
The objects of this class are cloneable with this method.
FindUrlPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmojiPipe
,
FindEmoticonPipe
, FindHashtagPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class is responsible for detecting the existing user names in the
data field of each Instance
. Identified user names are
stored inside the userName field of Instance
class.
Moreover, if required, it is able to perform inline user name removal.
The regular expressions indicated in the userPattern
variable are used to identify user names.
FindUserNamePipe
will automatically invalidate the
Instance
whenever the obtained data is empty.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> FindUserNamePipe
userPattern
A character
value. The regular
expression to detect user names.
new()
Creates a FindUserNamePipe
object.
FindUserNamePipe$new( propertyName = "userName", alwaysBeforeDeps = list(), notAfterDeps = list(), removeUser = TRUE )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
removeUser
A logical
value. Indicates if the
user names are removed.
propertyLanguageName
A character
value. Name of the
language property.
pipe()
Preprocesses the Instance
to obtain/remove
the user names. The user names found in the data are added to the
list of properties of the Instance
.
FindUserNamePipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
findUserName()
Finds the user names in the data.
FindUserNamePipe$findUserName(data)
data
A character
value. The text to search the
user names.
The list
with the user names found.
removeUserName()
Removes the user names from the data.
FindUserNamePipe$removeUserName(data)
data
A character
value. The text where user names
will be removed.
The data with the user names removed.
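User-name detection has to avoid matching the "@" inside e-mail addresses. A minimal sketch, assuming a hypothetical pattern (not the package's userPattern) where `\B` rejects an "@" that directly follows a word character, so "bob@example.com" is left untouched.

```r
# Hypothetical user-name pattern (NOT bdpar's userPattern): "@" plus word
# characters, not preceded by a word character (skips e-mail addresses).
user_pattern <- "\\B@\\w+"

find_user_name <- function(data) {
  regmatches(data, gregexpr(user_pattern, data, perl = TRUE))[[1]]
}
remove_user_name <- function(data) {
  out <- gsub(user_pattern, "", data, perl = TRUE)
  gsub("\\s+", " ", trimws(out))
}

msg <- "Thanks @ana, mail me at bob@example.com"
find_user_name(msg)    # "@ana"
remove_user_name(msg)  # "Thanks , mail me at bob@example.com"
```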
clone()
The objects of this class are cloneable with this method.
FindUserNamePipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmojiPipe
,
FindEmoticonPipe
, FindHashtagPipe
,
FindUrlPipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
Provides the required methods to successfully handle each
GenericPipe
class.
new()
Creates a GenericPipe object.
GenericPipe$new(propertyName, alwaysBeforeDeps, notAfterDeps)
pipe()
Abstract method to preprocess the Instance
.
GenericPipe$pipe(instance)
The preprocessed Instance
.
getPropertyName()
Gets the name of the property.
GenericPipe$getPropertyName()
Value of the name of the property.
getAlwaysBeforeDeps()
Gets the alwaysBefore dependencies.
GenericPipe$getAlwaysBeforeDeps()
Value of the alwaysBefore dependencies.
getNotAfterDeps()
Gets the notAfter dependencies.
GenericPipe$getNotAfterDeps()
Value of the notAfter dependencies.
setPropertyName()
Changes the value of the property name.
GenericPipe$setPropertyName(propertyName)
propertyName
A character
value. The new value of the
property's name.
setAlwaysBeforeDeps()
Changes the value of the alwaysBefore dependencies.
GenericPipe$setAlwaysBeforeDeps(alwaysBeforeDeps)
alwaysBeforeDeps
A list
value. The new value of the
alwaysBefore dependencies.
setNotAfterDeps()
Changes the value of the notAfter dependencies.
GenericPipe$setNotAfterDeps(notAfterDeps)
notAfterDeps
A list
value. The new value of the
notAfter dependencies.
hash()
Generates an identifier of the pipe based on its fields.
GenericPipe$hash(algo = "md5")
algo
Algorithm to be applied. Options: "md5", "sha1", "crc32", "sha256", "sha512", "xxhash32", "xxhash64", "murmur32", "spookyhash
clone()
The objects of this class are cloneable with this method.
GenericPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, bdpar.log
,
ContractionPipe
, File2Pipe
,
FindEmojiPipe
, FindEmoticonPipe
,
FindHashtagPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
ResourceHandler
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
Abstract super class to establish the flow of Pipes.
new()
Creates a GenericPipeline
object.
GenericPipeline$new()
execute()
Function in which the flow of the
GenericPipes
is implemented.
GenericPipeline$execute(instance)
The preprocessed Instance
.
get()
Gets a list containing the set of GenericPipes
of the pipeline.
GenericPipeline$get()
The set of GenericPipes
composing the pipeline.
toString()
Returns a character
representing the pipeline.
GenericPipeline$toString()
This function provides a place to define a character
representation of the pipeline structure.
GenericPipeline
character
representation
clone()
The objects of this class are cloneable with this method.
GenericPipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
bdpar.log
, DefaultPipeline
,
DynamicPipeline
, Instance
,
GenericPipe
, %>|%
Obtains the date using the method implemented by the
subclass of Instance
.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> GuessDatePipe
new()
Creates a GuessDatePipe
object.
GuessDatePipe$new( propertyName = "date", alwaysBeforeDeps = list("TargetAssigningPipe"), notAfterDeps = list() )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
pipe()
Preprocesses the Instance
to obtain the date.
GuessDatePipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
clone()
The objects of this class are cloneable with this method.
GuessDatePipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmojiPipe
,
FindEmoticonPipe
, FindHashtagPipe
,
FindUrlPipe
, FindUserNamePipe
,
GuessLanguagePipe
, Instance
,
InterjectionPipe
, MeasureLengthPipe
,
GenericPipe
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class guesses the language of the text using the language detector of the cld2 library. It creates the language property, which indicates the language of the text.
The Pipe will invalidate the Instance
if the language of the data
cannot be detected.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> GuessLanguagePipe
new()
Creates a GuessLanguagePipe
object.
GuessLanguagePipe$new( propertyName = "language", alwaysBeforeDeps = list("StoreFileExtPipe", "TargetAssigningPipe"), notAfterDeps = list() )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
pipe()
Preprocesses the Instance
to obtain the
language of the data.
GuessLanguagePipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
getLanguage()
Guesses the language of data.
GuessLanguagePipe$getLanguage(data)
data
A character
value. The text to guess the
language.
The guessed language. Format: see ISO 639-3:2007.
clone()
The objects of this class are cloneable with this method.
GuessLanguagePipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, bdpar.Options
,
ContractionPipe
, File2Pipe
,
FindEmojiPipe
, FindEmoticonPipe
,
FindHashtagPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
Instance
, InterjectionPipe
,
MeasureLengthPipe
, GenericPipe
,
SlangPipe
, StopWordPipe
,
StoreFileExtPipe
, TargetAssigningPipe
,
TeeCSVPipe
, ToLowerCasePipe
Provides the required methods to successfully handle each
Instance
class.
new()
Creates an Instance
object.
Instance$new(path)
path
A character
value. Path of the file.
obtainDate()
Abstract function responsible for obtaining the date of the
Instance
.
Instance$obtainDate()
obtainSource()
Abstract function responsible for determining the source of
the Instance
.
Instance$obtainSource()
getDate()
Gets the date.
Instance$getDate()
Value of date.
getSource()
Gets the source.
Instance$getSource()
Value of source.
getPath()
Gets the path.
Instance$getPath()
Value of path.
getData()
Gets the data.
Instance$getData()
Value of data.
getProperties()
Gets the properties.
Instance$getProperties()
Value of properties.
setSource()
Modifies the source value.
Instance$setSource(source)
source
A character
value. The new value of source.
setData()
Modifies the data value.
Instance$setData(data)
data
A character
value. The new value of data.
setDate()
Modifies the date value.
Instance$setDate(date)
date
A character
value. The new value of date.
setProperties()
Modifies the properties value.
Instance$setProperties(properties)
properties
A list
value. The new list of properties.
addProperties()
Adds a property to the list of the properties.
Instance$addProperties(propertyValue, propertyName)
propertyValue
An Object
value. The value of the new property.
propertyName
A character
value. The name of the new
property.
getSpecificProperty()
Obtains a specific property.
Instance$getSpecificProperty(propertyName)
propertyName
A character
value. The name of the
property to obtain.
The value of the specific property.
isSpecificProperty()
Checks for the existence of a specific property.
Instance$isSpecificProperty(propertyName)
propertyName
A character
value. The name of the
property to check.
A logical value indicating whether the specific property exists in the list of properties.
setSpecificProperty()
Modifies the value of a single property.
Instance$setSpecificProperty(propertyName, propertyValue)
propertyName
A character
value. The name of the
property.
propertyValue
An Object
value. The new value of the property.
getNamesOfProperties()
Gets the names of all properties.
Instance$getNamesOfProperties()
The names of properties.
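The property methods above behave like operations on a named list keyed by property name. A minimal functional sketch; the helper names are hypothetical and do not reproduce bdpar's R6 methods, which modify the Instance in place.

```r
# Hypothetical, functional stand-ins (not bdpar's R6 methods): the properties
# of an instance are kept in a named list keyed by property name.
add_property <- function(props, propertyValue, propertyName) {
  props[[propertyName]] <- propertyValue
  props
}
is_specific_property <- function(props, propertyName) {
  propertyName %in% names(props)
}
get_specific_property <- function(props, propertyName) {
  props[[propertyName]]
}

props <- list()
props <- add_property(props, "en", "language")
props <- add_property(props, c("#rstats", "#NLP"), "hashtag")

names(props)                             # "language" "hashtag"
is_specific_property(props, "emoji")     # FALSE
get_specific_property(props, "hashtag")  # "#rstats" "#NLP"
```

One caveat of this representation: assigning NULL through `[[<-` deletes the entry instead of storing an empty value.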
isInstanceValid()
Checks if the Instance
is valid.
Instance$isInstanceValid()
Value of isValid flag.
invalidate()
Forces the invalidation of a specific Instance
.
Instance$invalidate()
getFlowPipes()
Gets the list of the flow of GenericPipe
.
Instance$getFlowPipes()
Names of the GenericPipe
used.
addFlowPipes()
Adds the name of a GenericPipe to the flow of pipes of the Instance
.
Instance$addFlowPipes(namePipe)
namePipe
A character
value. Name of the new
GenericPipe
to be added in the GenericPipeline
.
getBanPipes()
Gets an array containing all the banned
GenericPipes
.
Instance$getBanPipes()
Value of the banned GenericPipe
array.
addBanPipes()
Adds the name of the pipe to the array that keeps track
of GenericPipes
that are not allowed to run afterwards (notAfter restrictions).
Instance$addBanPipes(namePipe)
namePipe
A character
value.
GenericPipe
name to be introduced into the ban array.
checkCompatibility()
Checks compatibility between GenericPipes
.
Instance$checkCompatibility(namePipe, alwaysBefore)
namePipe
A character
value. The name of the
GenericPipe
name to check the compatibility.
alwaysBefore
A list
value.
GenericPipes
that the Instance
had to go
through.
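A plausible reading of the alwaysBefore/notAfter rules can be sketched as a pure function: a pipe may run when all its alwaysBefore dependencies already appear in the flow of pipes and it is not in the ban array. This is an assumption inferred from the descriptions above, not bdpar's implementation; `check_compatibility()` and its arguments are hypothetical.

```r
# Hypothetical compatibility rule inferred from the alwaysBefore/notAfter
# descriptions (not bdpar's code). A pipe may run when:
#  - every alwaysBefore dependency already appears in the flow of pipes, and
#  - the pipe is not in the ban list filled from earlier pipes' notAfter deps.
check_compatibility <- function(namePipe, alwaysBefore, flowPipes, banPipes) {
  missing <- setdiff(unlist(alwaysBefore), flowPipes)
  !(namePipe %in% banPipes) && length(missing) == 0
}

flow <- c("TargetAssigningPipe", "StoreFileExtPipe", "GuessLanguagePipe")
ban <- c("FindHashtagPipe")

check_compatibility("InterjectionPipe", list("GuessLanguagePipe"), flow, ban)  # TRUE
check_compatibility("FindHashtagPipe", list(), flow, ban)                      # FALSE: banned
```

In this example the ban entry corresponds to what a pipe such as FindEmoticonPipe, whose default notAfterDeps is list("FindHashtagPipe"), would contribute after running.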
toString()
Returns a character
representing the instance
Instance$toString()
Instance
character
representation
clone()
The objects of this class are cloneable with this method.
Instance$clone(deep = FALSE)
deep
Whether to make a deep clone.
ExtractorEml
, ExtractorSms
,
ExtractorYtbid
InterjectionPipe
class is responsible for detecting
the existing interjections in the data field of each Instance
.
Identified interjections are stored inside the interjection field of
Instance
class. Moreover, if needed, it is able to perform inline
interjection removal.
InterjectionPipe
class requires the resource files (in json format)
containing the list of interjections. To this end, the language of the text
indicated in the propertyLanguageName should be contained in the
resource file name (i.e. interj.xxx.json where xxx is the value defined in the
propertyLanguageName). The location of the resources should be
defined in the "resources.interjections.path" field of the
bdpar.Options variable.
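The file-naming convention above can be illustrated with a small helper that builds the expected resource path. The helper name, the folder paths and the language value "en" are hypothetical examples; the "abbrev" prefix follows the abbrev.xxx.json convention used by AbbreviationPipe.

```r
# Hypothetical helper (not part of bdpar): build the resource file name
# <prefix>.<language>.json inside the configured resources folder.
resource_file <- function(resourcesPath, language, prefix = "interj") {
  file.path(resourcesPath, paste0(prefix, ".", language, ".json"))
}

resource_file("/opt/resources/interjections", "en")
# "/opt/resources/interjections/interj.en.json"
resource_file("/opt/resources/abbreviations", "en", prefix = "abbrev")
# "/opt/resources/abbreviations/abbrev.en.json"
```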
InterjectionPipe
will automatically invalidate the
Instance
whenever the obtained data is empty.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> InterjectionPipe
new()
Creates an InterjectionPipe
object.
InterjectionPipe$new( propertyName = "interjection", propertyLanguageName = "language", alwaysBeforeDeps = list("GuessLanguagePipe"), notAfterDeps = list(), removeInterjections = TRUE, resourcesInterjectionsPath = NULL )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
propertyLanguageName
A character
value. Name of the
language property.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
removeInterjections
A logical
value. Indicates if
the interjections are removed or not.
resourcesInterjectionsPath
A character
value. Path
of resource files (in json format) containing the interjections.
pipe()
Preprocesses the Instance
to obtain/remove
the interjections. The interjections found in the data are added to the
list of properties of the Instance
.
InterjectionPipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
findInterjection()
Checks if the interjection is in the data.
InterjectionPipe$findInterjection(data, interjection)
A logical
value depending on whether the
interjection is in the data.
removeInterjection()
Removes the interjection from the data.
InterjectionPipe$removeInterjection(interjection, data)
The data with the interjections removed.
getPropertyLanguageName()
Gets the name of the language property.
InterjectionPipe$getPropertyLanguageName()
Value of the name of the language property.
getResourcesInterjectionsPath()
Gets the path of the interjection resources.
InterjectionPipe$getResourcesInterjectionsPath()
Value of the path of the interjection resources.
setResourcesInterjectionsPath()
Sets the path of the interjection resources.
InterjectionPipe$setResourcesInterjectionsPath(path)
path
A character
value. The new value of the path of
the interjection resources.
clone()
The objects of this class are cloneable with this method.
InterjectionPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, bdpar.Options
,
ContractionPipe
, File2Pipe
,
FindEmojiPipe
, FindEmoticonPipe
,
FindHashtagPipe
, FindUrlPipe
,
FindUserNamePipe
, GuessDatePipe
,
GuessLanguagePipe
, Instance
,
MeasureLengthPipe
, GenericPipe
,
ResourceHandler
, SlangPipe
,
StopWordPipe
, StoreFileExtPipe
,
TargetAssigningPipe
, TeeCSVPipe
,
ToLowerCasePipe
This class is responsible for obtaining the length of the data
field of each Instance
. It creates the length property,
which indicates the length of the text. The property's name can be customized
through the class constructor.
This class inherits from GenericPipe
and implements the
pipe
abstract function.
bdpar::GenericPipe
-> MeasureLengthPipe
new()
Creates a MeasureLengthPipe
object.
MeasureLengthPipe$new( propertyName = "length", alwaysBeforeDeps = list(), notAfterDeps = list(), nchar_conf = TRUE )
propertyName
A character
value. Name of the property
associated with the GenericPipe
.
alwaysBeforeDeps
A list
value. The dependencies
alwaysBefore (GenericPipes
that must be executed before
this one).
notAfterDeps
A list
value. The dependencies
notAfter (GenericPipes
that cannot be executed after
this one).
nchar_conf
A logical
value. Indicates whether the pipe
uses nchar (TRUE) or object.size (FALSE).
pipe()
Preprocesses the Instance
to obtain the
length of data.
MeasureLengthPipe$pipe(instance)
The Instance
with the modifications that have
occurred in the pipe.
getLength()
Obtains the length of the data.
MeasureLengthPipe$getLength(data, nchar_conf = TRUE)
The length of the data, measured with nchar or object.size
depending on nchar_conf.
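The nchar_conf switch chooses between two quite different measures: a character count and the memory footprint of the R object. A minimal sketch of that choice; `get_length()` is a hypothetical name, not bdpar's code.

```r
# Hypothetical equivalent of getLength() (not bdpar's code):
# nchar() counts characters, object.size() reports the memory used by the
# R object in bytes, which includes R's own per-object overhead.
get_length <- function(data, nchar_conf = TRUE) {
  if (nchar_conf) nchar(data) else as.numeric(object.size(data))
}

get_length("hello")                      # 5
get_length("hello", nchar_conf = FALSE)  # bytes, platform dependent
```

Because object.size() includes overhead, the two measures are not interchangeable; only the character count reflects the length of the text itself.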
clone()
The objects of this class are cloneable with this method.
MeasureLengthPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe
, ContractionPipe
,
File2Pipe
, FindEmojiPipe
,
FindEmoticonPipe
, FindHashtagPipe
,
FindUrlPipe
, FindUserNamePipe
,
GuessDatePipe
, GuessLanguagePipe
,
Instance
, InterjectionPipe
,
GenericPipe
, ResourceHandler
,
SlangPipe
, StopWordPipe
,
StoreFileExtPipe
, TargetAssigningPipe
,
TeeCSVPipe
, ToLowerCasePipe
Defines a customized forward pipe operator extending the
features of the classical %>%. Specifically, %>|% stops the pipelining
process whenever an Instance
has been invalidated. This
avoids executing the whole pipelining process for the invalidated
Instance
and therefore reduces the time and resources used to
complete the whole process.
lhs %>|% rhs
lhs | an |
rhs | a function call using the bdpar semantics. |
The Instance
modified by the methods it has traversed.
This is the %>% operator of the magrittr library, modified to
(i) stop the flow when the Instance
is invalid, (ii)
automatically call the pipe
function of the R6 objects passing
through it, (iii) check the dependencies of the Instance
and
(iv) manage the pipeline cache.
The usage structure would be as shown below:
instance %>|% pipeObject$new() %>|% pipeObject$new(<<argument1>>, <<argument2>>, ...) %>|% pipeObject$new()
The pipelining process is automatically stopped if the Instance
is invalid.
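The short-circuit behaviour can be illustrated with a few lines of base R. This is a simplified sketch, not bdpar's implementation: the instance is a plain list with a hypothetical `isValid` flag, the right-hand side is an ordinary function rather than an R6 object whose pipe method is called, and dependency checks and caching are omitted.

```r
# Simplified stand-in for an Instance (hypothetical): a list with a validity flag.
make_instance <- function(data) list(data = data, isValid = TRUE, flow = character())

# Minimal short-circuiting forward pipe: apply `rhs` only while the instance is valid.
`%>|%` <- function(lhs, rhs) {
  if (!isTRUE(lhs$isValid)) {
    return(lhs)  # stop the flow: the remaining steps are skipped
  }
  rhs(lhs)
}

to_lower <- function(inst) {
  inst$data <- tolower(inst$data)
  inst$flow <- c(inst$flow, "to_lower")
  inst
}
drop_if_empty <- function(inst) {
  inst$flow <- c(inst$flow, "drop_if_empty")
  if (!nzchar(trimws(inst$data))) inst$isValid <- FALSE
  inst
}
count_chars <- function(inst) {
  inst$length <- nchar(inst$data)
  inst$flow <- c(inst$flow, "count_chars")
  inst
}

ok <- make_instance("Hello World") %>|% to_lower %>|% drop_if_empty %>|% count_chars
bad <- make_instance("   ") %>|% to_lower %>|% drop_if_empty %>|% count_chars
ok$flow   # "to_lower" "drop_if_empty" "count_chars"
bad$flow  # "to_lower" "drop_if_empty"
```

Because the validity check happens before the right-hand side is called, no step after the invalidating one runs for that instance.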
bdpar.Options
, Instance
,
GenericPipe
Class that handles different types of resources.
It stores the resources needed by the
GenericPipes
to avoid repeatedly reading them from
the file. JSON resource files are read and stored in memory.
new()
Creates a ResourceHandler
object.
ResourceHandler$new()
isLoadResource()
Checks, from the resource path, whether the resource has already been loaded. If so, the list of the requested resource is returned. Otherwise, the resource is read and added to the list of resources, and the resource list is returned. If the resource file does not exist, NULL is returned.
ResourceHandler$isLoadResource(pathResource)
pathResource
A (character) value. The resource file path.
The resources list is returned, if they exist.
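The load-once behaviour of isLoadResource() can be sketched with an environment used as a cache keyed by file path. This is a hypothetical stand-in, not bdpar's code; readLines() replaces the JSON parsing so the sketch has no dependencies.

```r
# Hypothetical stand-in for ResourceHandler (not bdpar's code): resources are
# cached in an environment keyed by file path, so each file is read only once.
# readLines() replaces JSON parsing to keep the sketch dependency-free.
new_resource_cache <- function(reader = function(path) readLines(path, warn = FALSE)) {
  resources <- new.env(parent = emptyenv())

  load_resource <- function(pathResource) {
    if (exists(pathResource, envir = resources, inherits = FALSE)) {
      return(get(pathResource, envir = resources, inherits = FALSE))  # cache hit
    }
    if (!file.exists(pathResource)) {
      return(NULL)  # missing file: nothing is cached
    }
    value <- reader(pathResource)
    assign(pathResource, value, envir = resources)
    value
  }

  list(load = load_resource, names = function() ls(resources))
}

f <- tempfile(fileext = ".txt")
writeLines(c("oh", "wow"), f)
cache <- new_resource_cache()
cache$load(f)  # read from disk
unlink(f)
cache$load(f)  # still available: served from the cache
```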
getResources()
Gets the resources variable.
ResourceHandler$getResources()
The value of the resources variable.
setResources()
Sets the resources variable.
ResourceHandler$setResources(resources)
resources
The new value of resources.
getNamesResources()
Gets the names of the resources.
ResourceHandler$getNamesResources()
Value of names of resources.
clone()
The objects of this class are cloneable with this method.
ResourceHandler$clone(deep = FALSE)
deep
Whether to make a deep clone.
runPipeline makes it easy to initialize the pipelining preprocessing process.
runPipeline(path, extractors = ExtractorFactory$new(), pipeline = DefaultPipeline$new(), cache = TRUE, verbose = FALSE, summary = FALSE)
path |
(character) path where the files to be preprocessed are located. |
extractors |
(ExtractorFactory) object implementing the method |
pipeline |
(GenericPipeline) subclass of |
cache |
(logical) flag indicating if the status of the instances will be stored after each pipe. This avoids re-executing previously completed tasks, provided the order and configuration of the pipes and pipeline match what is stored in the cache. |
verbose |
(logical) flag indicating whether messages, warnings and errors are printed. |
summary |
(logical) flag indicating if a summary of the pipeline execution is provided or not. |
List of Instance objects that have been preprocessed.
If some pipe defined in the workflow needs any type of configuration, it can be defined through the bdpar.Options variable, which provides different methods to support the functionality of the different pipes.
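The internals of runPipeline are not documented here; as a rough mental model only, its documented flow (collect the files under `path`, create one instance per file, push each through the pipeline, optionally summarize) can be sketched in base R. `run_pipeline_sketch`, `make_instance_from_file` and `pipeline_fun` are hypothetical stand-ins for runPipeline, ExtractorFactory and GenericPipeline; caching, logging and parallelism are deliberately left out.

```r
# Rough sketch of the documented runPipeline flow (NOT bdpar's implementation).
# `make_instance_from_file` and `pipeline_fun` are hypothetical stand-ins
# for ExtractorFactory and GenericPipeline.
run_pipeline_sketch <- function(path, make_instance_from_file, pipeline_fun,
                                summary = FALSE) {
  # Every file below `path` becomes one instance
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  instances <- lapply(files, make_instance_from_file)
  # Each instance is pushed through the whole pipeline
  processed <- lapply(instances, pipeline_fun)
  if (summary) {
    valid <- vapply(processed, function(i) isTRUE(i$isValid), logical(1))
    message(sprintf("Processed %d instances: %d valid, %d invalid",
                    length(valid), sum(valid), sum(!valid)))
  }
  # Like runPipeline, return the list of preprocessed instances
  processed
}
```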
Bdpar, bdpar.Options, Connections, DefaultPipeline, DynamicPipeline, GenericPipeline, Instance, ExtractorFactory, ResourceHandler
## Not run:
# If it is necessary to set an existing configuration key, do it through:
# bdpar.Options$set(key, value)
# If the key is not initialized, do it through:
# bdpar.Options$add(key, value)
# If it is necessary to parallelize, do it through:
# bdpar.Options$set("numCores", numCores)
# If it is necessary to change the behavior of the log, do it through:
# bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)

# Folder with the files to preprocess
path <- system.file("example", package = "bdpar")
# Object which decides how the instances are created
extractors <- ExtractorFactory$new()
# Object which defines the flow of pipes
pipeline <- DefaultPipeline$new()
# Starting file preprocessing...
runPipeline(path = path,
            extractors = extractors,
            pipeline = pipeline,
            cache = FALSE,
            verbose = FALSE,
            summary = TRUE)
## End(Not run)
SlangPipe class is responsible for detecting the existing slangs in the data field of each Instance.
Identified slangs are stored inside the slang field of the Instance class. Moreover, if needed, it is able to perform inline slang replacement.
SlangPipe class requires the resource files (in json format) containing the correspondence between slangs and their meanings. To this end, the language of the text indicated in the propertyLanguageName should be contained in the resource file name (i.e. slang.xxx.json, where xxx is the value defined in the propertyLanguageName). The location of the resources should be defined in the "resources.slangs.path" field of the bdpar.Options variable.
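The naming convention above can be expressed as a small path-building helper. This is a sketch under stated assumptions: `resource_file_for` is hypothetical (not a bdpar function), and the directory names and the language value `"en"` are illustrative placeholders for whatever "resources.slangs.path" and the language property actually hold. The same helper covers the StopWordPipe convention (xxx.json, i.e. no prefix).

```r
# Hypothetical helper (not part of bdpar): build the resource file name from
# the configured resources path, a file-name prefix and the language value.
resource_file_for <- function(resources_path, prefix, language) {
  file.path(resources_path, paste0(prefix, language, ".json"))
}

# Slang resources follow slang.xxx.json (illustrative path and language)
slang_file <- resource_file_for("resources/slangs", "slang.", "en")
# Stop-word resources follow xxx.json, i.e. an empty prefix
stop_file <- resource_file_for("resources/stopwords", "", "en")
```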
SlangPipe will automatically invalidate the Instance whenever the obtained data is empty.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> SlangPipe
new()
Creates a SlangPipe object.
SlangPipe$new( propertyName = "langpropname", propertyLanguageName = "language", alwaysBeforeDeps = list("GuessLanguagePipe"), notAfterDeps = list(), replaceSlangs = TRUE, resourcesSlangsPath = NULL )
propertyName
A character value. Name of the property associated with the GenericPipe.
propertyLanguageName
A character value. Name of the language property.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
replaceSlangs
A logical value. Indicates if the slangs are replaced or not.
resourcesSlangsPath
A character value. Path of the resource files (in json format) containing the correspondence between slangs and their meanings.
pipe()
Preprocesses the Instance to obtain/replace the slangs. The slangs found in the data are added to the list of properties of the Instance.
SlangPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
findSlang()
Checks if the slang is in the data.
SlangPipe$findSlang(data, slang)
A logical value depending on whether the slang is in the data.
replaceSlang()
Replaces the slang in the data with the extendedSlang.
SlangPipe$replaceSlang(slang, extendedSlang, data)
The data with the slangs replaced.
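A plausible way to implement findSlang() and replaceSlang() is whole-word matching with Perl-compatible regular expressions. The helpers below are hypothetical and bdpar's actual matching rules may differ (for example in case handling). `\Q...\E` makes the slang match literally, and the lookarounds prevent matches inside longer words.

```r
# Hypothetical helpers sketching whole-word slang detection and replacement.
slang_pattern <- function(slang) {
  # \Q...\E: literal match; lookarounds: no match inside a longer word
  paste0("(?<!\\w)\\Q", slang, "\\E(?!\\w)")
}

find_slang <- function(data, slang) {
  grepl(slang_pattern(slang), data, perl = TRUE)
}

replace_slang <- function(slang, extendedSlang, data) {
  gsub(slang_pattern(slang), extendedSlang, data, perl = TRUE)
}

find_slang("c u later", "u")                     # TRUE
find_slang("unusual", "u")                       # FALSE
replace_slang("u", "you", "c u later, u know")   # "c you later, you know"
```

Note that gsub() treats backslashes in the replacement specially, so a meaning containing backslashes would need escaping first.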
getPropertyLanguageName()
Gets the name of the language property.
SlangPipe$getPropertyLanguageName()
The name of the language property.
getResourcesSlangsPath()
Gets the path of the slang resources.
SlangPipe$getResourcesSlangsPath()
The path of the slang resources.
setResourcesSlangsPath()
Sets the path of the slang resources.
SlangPipe$setResourcesSlangsPath(path)
path
A character value. The new value of the path of the slang resources.
clone()
The objects of this class are cloneable with this method.
SlangPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe
StopWordPipe class is responsible for detecting the existing stop words in the data field of each Instance.
Identified stop words are stored inside the stopWord field of the Instance class. Moreover, if needed, it is able to perform inline stop word removal.
StopWordPipe class requires the resource files (in json format) containing the list of stop words. To this end, the language of the text indicated in the propertyLanguageName should be contained in the resource file name (i.e. xxx.json, where xxx is the value defined in the propertyLanguageName). The location of the resources should be defined in the "resources.stopwords.path" field of the bdpar.Options variable.
StopWordPipe will automatically invalidate the Instance whenever the obtained data is empty.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> StopWordPipe
new()
Creates a StopWordPipe object.
StopWordPipe$new( propertyName = "stopWord", propertyLanguageName = "language", alwaysBeforeDeps = list("GuessLanguagePipe"), notAfterDeps = list("AbbreviationPipe"), removeStopWords = TRUE, resourcesStopWordsPath = NULL )
propertyName
A character value. Name of the property associated with the GenericPipe.
propertyLanguageName
A character value. Name of the language property.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
removeStopWords
A logical value. Indicates if the stop words are removed or not.
resourcesStopWordsPath
A character value. Path of the resource files (in json format) containing the stop words.
pipe()
Preprocesses the Instance to obtain/remove the stop words. The stop words found in the data are added to the list of properties of the Instance.
StopWordPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
findStopWord()
Checks if the stop word is in the data.
StopWordPipe$findStopWord(data, stopWord)
A logical value depending on whether the stop word is in the data.
removeStopWord()
Removes the stop word from the data.
StopWordPipe$removeStopWord(stopWord, data)
The data with the stop words removed.
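Stop-word removal can be sketched the same way, with whole-word PCRE matching followed by a check that mirrors the documented rule that empty data invalidates the Instance. The helpers are hypothetical; collapsing the whitespace left behind is a design choice of this sketch, not necessarily bdpar's behaviour, and matching here is case-sensitive.

```r
# Hypothetical helpers sketching whole-word stop-word detection and removal.
stop_word_pattern <- function(stopWord) {
  paste0("(?<!\\w)\\Q", stopWord, "\\E(?!\\w)")
}

find_stop_word <- function(data, stopWord) {
  grepl(stop_word_pattern(stopWord), data, perl = TRUE)
}

remove_stop_word <- function(stopWord, data) {
  out <- gsub(stop_word_pattern(stopWord), "", data, perl = TRUE)
  # Sketch's choice: collapse the gaps left by the removed words
  trimws(gsub("\\s+", " ", out))
}

remove_stop_word("the", "the cat and the hat")   # "cat and hat"
remove_stop_word("the", "there the")             # "there"
# Empty data afterwards would invalidate the Instance
still_valid <- nzchar(remove_stop_word("the", "the"))   # FALSE
```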
getPropertyLanguageName()
Gets the name of the language property.
StopWordPipe$getPropertyLanguageName()
The name of the language property.
getResourcesStopWordsPath()
Gets the path of the stop word resources.
StopWordPipe$getResourcesStopWordsPath()
The path of the stop word resources.
setResourcesStopWordsPath()
Sets the path of the stop word resources.
StopWordPipe$setResourcesStopWordsPath(path)
path
A character value. The new value of the path of the stop word resources.
clone()
The objects of this class are cloneable with this method.
StopWordPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe
Gets the extension of a file and stores it in the extension property of the Instance.
StoreFileExtPipe will automatically invalidate the Instance if it is not able to find the extension from the path field.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> StoreFileExtPipe
new()
Creates a StoreFileExtPipe object.
StoreFileExtPipe$new( propertyName = "extension", alwaysBeforeDeps = list(), notAfterDeps = list() )
propertyName
A character value. Name of the property associated with the GenericPipe.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
pipe()
Preprocesses the Instance to obtain the extension of the Instance.
StoreFileExtPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
obtainExtension()
Gets the extension of the path.
StoreFileExtPipe$obtainExtension(path)
path
A character value. The path of the file whose extension is obtained.
The extension of the path.
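The behaviour described for StoreFileExtPipe can be approximated with `tools::file_ext()` from R's bundled tools package. This is a sketch, not bdpar's code: `obtain_extension` and `store_file_ext` are hypothetical, bdpar may extract extensions with different rules, and instances are plain lists here.

```r
# Hypothetical helpers approximating StoreFileExtPipe on list-based instances.
obtain_extension <- function(path) tools::file_ext(path)

store_file_ext <- function(instance) {
  ext <- obtain_extension(instance$path)
  # No extension found: invalidate, as the class documentation describes
  if (!nzchar(ext)) instance$isValid <- FALSE
  instance$properties$extension <- ext
  instance
}

a <- store_file_ext(list(path = "corpus/_ham_/msg1.tsms", isValid = TRUE,
                         properties = list()))
b <- store_file_ext(list(path = "corpus/_ham_/README", isValid = TRUE,
                         properties = list()))
```

Here `a` gets the extension `"tsms"` and stays valid, while `b` has no extension and is invalidated.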
clone()
The objects of this class are cloneable with this method.
StoreFileExtPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StopWordPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe
This class searches the path of the Instance for its target.
The targets searched for are controlled through the class constructor: targetsName contains the strings searched for within the path, and targets contains the values that the property can take.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> TargetAssigningPipe
new()
Creates a TargetAssigningPipe object.
TargetAssigningPipe$new( targets = list("ham", "spam"), targetsName = list("_ham_", "_spam_"), propertyName = "target", alwaysBeforeDeps = list(), notAfterDeps = list() )
targets
A list value. The values that the target property can take.
targetsName
A list value. The names of the folders searched for within the path.
propertyName
A character value. Name of the property associated with the GenericPipe.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
pipe()
Preprocesses the Instance to obtain the target.
TargetAssigningPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
getTarget()
Gets the target from a path.
TargetAssigningPipe$getTarget(path)
path
A character value. The path to analyze.
The target of the path.
checkTarget()
Checks if the target is in the path.
TargetAssigningPipe$checkTarget(target, path)
The target if it is found in the path; otherwise "".
getTargets()
Gets the targets.
TargetAssigningPipe$getTargets()
The value of targets.
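The path-based lookup performed by getTarget() and checkTarget() can be sketched with literal substring matching, using the documented default targets and targetsName. The helpers are hypothetical. Mapping the i-th entry of targetsName to the i-th entry of targets, and returning `""` when nothing matches, are assumptions of this sketch rather than documented bdpar behaviour.

```r
# Hypothetical helpers sketching path-based target assignment.
check_target <- function(target, path) {
  # The searched string if it occurs literally in the path, else ""
  if (grepl(target, path, fixed = TRUE)) target else ""
}

get_target <- function(path,
                       targets = list("ham", "spam"),
                       targetsName = list("_ham_", "_spam_")) {
  # Assumption: targetsName[[i]] maps to targets[[i]]; first match wins
  for (i in seq_along(targetsName)) {
    if (nzchar(check_target(targetsName[[i]], path))) return(targets[[i]])
  }
  ""  # assumption of this sketch: no match yields an empty target
}
```

For example, a file stored under `corpus/_spam_/` is labelled `"spam"`. Because the first matching name wins, the order of targetsName matters if a path could contain several of them.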
clone()
The objects of this class are cloneable with this method.
TargetAssigningPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StopWordPipe, StoreFileExtPipe, TeeCSVPipe, ToLowerCasePipe
Completes a CSV file with the properties of the preprocessed Instance.
The path where the properties are saved should be defined in the "teeCSVPipe.output.path" field of the bdpar.Options variable.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> TeeCSVPipe
new()
Creates a TeeCSVPipe object.
TeeCSVPipe$new( propertyName = "", alwaysBeforeDeps = list(), notAfterDeps = list(), withData = TRUE, withSource = TRUE, outputPath = NULL )
propertyName
A character value. Name of the property associated with the GenericPipe.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
withData
A logical value. Indicates if the data is added to the CSV.
withSource
A logical value. Indicates if the source is added to the CSV.
outputPath
A character value. The path of the CSV file.
pipe()
Completes the CSV with the preprocessed Instance.
TeeCSVPipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
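As a hedged illustration of the "tee" idea (write a row and pass the instance through unchanged), the sketch below appends one CSV row per list-based instance with `utils::write.table()`. `tee_csv_sketch` is hypothetical, not bdpar's implementation. Flattening multi-valued properties with `"|"` and writing the header only when the file is created are choices of this sketch, which also assumes every instance has the same properties in the same order.

```r
# Hypothetical sketch of a CSV "tee" step on list-based instances.
tee_csv_sketch <- function(instance, outputPath, withData = TRUE, withSource = TRUE) {
  # Flatten each property to one string so it fits in a single CSV cell
  props <- lapply(instance$properties, function(p) paste(unlist(p), collapse = "|"))
  row <- c(list(path = instance$path),
           if (withData) list(data = instance$data),
           if (withSource) list(source = instance$source),
           props)
  row <- as.data.frame(row, stringsAsFactors = FALSE)
  # Header only when the file is created; later calls append rows
  has_file <- file.exists(outputPath)
  utils::write.table(row, outputPath, sep = ",", row.names = FALSE,
                     col.names = !has_file, append = has_file, qmethod = "double")
  # Like a tee, hand the instance on unchanged
  invisible(instance)
}
```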
clone()
The objects of this class are cloneable with this method.
TeeCSVPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, ToLowerCasePipe
Class to convert the data field of an Instance to lower case.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> ToLowerCasePipe
new()
Creates a ToLowerCasePipe object.
ToLowerCasePipe$new( propertyName = "", alwaysBeforeDeps = list(), notAfterDeps = list() )
propertyName
A character value. Name of the property associated with the GenericPipe.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
pipe()
Preprocesses the Instance to convert the data to lower case.
ToLowerCasePipe$pipe(instance)
The Instance with the modifications that have occurred in the pipe.
toLowerCase()
Converts the data to lower case.
ToLowerCasePipe$toLowerCase(data)
data
A character value. Text to preprocess.
The data in lower case.
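On a list-based instance, the effect of this pipe can be sketched with base R's `tolower()`. Only the data field changes; other fields such as the path, which TargetAssigningPipe inspects, are left untouched. `to_lower_case_pipe` is a hypothetical helper, not bdpar's method. Running it before dictionary-based pipes can matter when their lookups are case-sensitive.

```r
# Hypothetical helper sketching ToLowerCasePipe on a list-based instance.
to_lower_case_pipe <- function(instance) {
  # Only the data field is lower-cased; path and properties are untouched
  instance$data <- tolower(instance$data)
  instance
}

inst <- to_lower_case_pipe(list(path = "corpus/_HAM_/A.tsms",
                                data = "Hello WORLD"))
```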
clone()
The objects of this class are cloneable with this method.
ToLowerCasePipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe