HTMLExtractor

Extracts the head and body parts of an HTML page, along with the JavaScript and CSS found in its head part

Version        0.1
Bundle         tools
Categories     HTML
Authors        Sirkku Karinen (sirkku.karinen@significo.fi)
Issue tracker  View/Report issues
Requires       Python; python-lxml (DEB)
Source files   component.xml, extract.py
Usage          Example with default values

Inputs

Name       Type             Mandatory  Description
html1      HTMLFile         Mandatory  HTML page to extract parts from
html2      HTMLFile         Optional   HTML page to extract parts from
htmlArray  Array<HTMLFile>  Optional   Array of HTML pages to extract parts from

Outputs

Name    Type        Description
head    HTMLFile    Head part of the page
body    HTMLFile    Body part of the page
script  JavaScript  JavaScript from the head part, written to a separate file
style   StyleSheet  CSS from the head part, written to a separate file
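The four outputs can be illustrated with a minimal sketch. It is not the component's extract.py: it uses Python's standard-library xml.etree.ElementTree instead of lxml, so it assumes the page is well-formed XHTML without an XML namespace. It also assumes that the script and style outputs are the inline text of the <script> and <style> elements directly under <head>, concatenated. The function names split_page and _serialize are hypothetical.

```python
import xml.etree.ElementTree as ET


def _serialize(el: ET.Element) -> str:
    # Drop any text that follows the closing tag so only the element itself is emitted.
    el.tail = None
    return ET.tostring(el, encoding="unicode")


def split_page(xhtml: str) -> dict:
    """Split a page into the head, body, script and style outputs (hypothetical helper).

    Assumption: well-formed XHTML without an XML namespace, so the
    standard-library parser can read it. The real component uses lxml.
    """
    root = ET.fromstring(xhtml)
    head = root.find("head")
    body = root.find("body")
    # Assumption: only <script>/<style> elements that are direct children of
    # <head> are collected, and their inline text is joined with newlines.
    scripts = head.findall("script") if head is not None else []
    styles = head.findall("style") if head is not None else []
    return {
        "head": _serialize(head) if head is not None else "",
        "body": _serialize(body) if body is not None else "",
        "script": "\n".join(s.text or "" for s in scripts),
        "style": "\n".join(s.text or "" for s in styles),
    }
```

For a page whose head contains <script>var x = 1;</script>, the script entry holds just var x = 1;, while the body entry is the serialized <body> element.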

Parameters

Name         Type    Default  Description
extractBody  string  ""       Name of the element to extract from the body. If empty, the whole body part is extracted. Only elements that are direct children of the <body> tag are matched.
extractHead  string  ""       Name of the element to extract from the head. If empty, the whole head part is extracted. Only elements that are direct children of the <head> tag are matched.
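The direct-children rule for extractBody and extractHead can be sketched as follows. This is an illustration under assumptions, not the component's code: it uses the standard-library xml.etree.ElementTree rather than lxml, expects well-formed XHTML without a namespace, and treats the parameter value as a single tag name. The function extract_part and its arguments are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_part(xhtml: str, part: str, element: str = "") -> str:
    """Return the head or body part, or only its direct children named `element`.

    Hypothetical helper. Assumptions: well-formed XHTML without an XML
    namespace; `part` is "head" or "body"; `element` is one tag name like "h1".
    """
    root = ET.fromstring(xhtml)
    section = root.find(part)
    if section is None:
        return ""
    # An empty value (the default "") keeps the whole part; otherwise
    # findall() with a plain tag name matches direct children only.
    selected = [section] if not element else section.findall(element)
    pieces = []
    for el in selected:
        el.tail = None  # drop text that follows the closing tag
        pieces.append(ET.tostring(el, encoding="unicode"))
    return "".join(pieces)
```

With extractBody=h1, an <h1> nested inside a <div> is skipped, because only elements directly under <body> are considered.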

Test cases

Test case  Parameters  IN html1  IN html2   IN htmlArray  OUT head   OUT body   OUT script  OUT style
case1      (missing)   html1     (missing)  (missing)     (missing)  (missing)  script      style
case2      properties  html1     (missing)  (missing)     (missing)  (missing)  script      style

case2 properties:
extractBody=h1
extractHead=title


Generated 2018-12-11 07:42:07 by Anduril 2.0.0