Consider Get-PSHTMLDocument #250
Comments
Hi, I tried this with the HTML Agility Pack and ... well, it is HTML-oriented, and it's almost the same. It also works on PowerShell Core (6.2). It's available here: https://html-agility-pack.net
Here is a working example with the HTML Agility Pack and core PSHTML classes like those in #218. Get the HTML code from your favorite page, copy/paste it into an HTML file, load that text into $a, and voilà:

    PS C:\Users\Lx> $x = get-pshtmldocument -html $a
    PS C:\Users\Lx> $x
    TagName  id         Class                                                        Children
    -------  --         -----                                                        --------
                                                                                     {$null}
    #comment                                                                         {}
    html                                                                             {, }
    PS C:\Users\Lx> $x[2]
    TagName  id         Class                                                        Children
    PS C:\Users\Lx> $x[2].children[1].children
    TagName  id         Class                                                        Children
    -------  --         -----                                                        --------
    script                                                                           {}
    script                                                                           {var config = { autoCapture: { lineage: true }...
    noscript                                                                         {}
    div      headerArea uhf                                                          {headerRegion}
    link                                                                             {}
    link                                                                             {}
    script                                                                           {}
    div      page       hfeed site                                                   {single-wrapper, wrapper-footer}
    div                 a2a_kit a2a_kit_size_32 a2a_floating_style a2a_default_style {, , }
    script                                                                           {var CrayonSyntaxSettings = {"version":"_2.7.2_beta","is_admin":"0...
    script                                                                           {(function (undefined) {var _targetWindow ="prefer-popup"; window....
    script                                                                           {/*{literal}*/window.lightningjs||function(c){function g(b,d){d&&(...
    div      footerArea uhf                                                          {footerRegion}
    link                                                                             {}
    link                                                                             {}
    script                                                                           {}
    script                                                                           {//fix calendar hide when change month var string = window....
    script                                                                           {}
    script                                                                           {window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","...
    PS C:\Users\Lx>

The function itself:

    function get-pshtmldocument {
        param (
            # HTML source text to parse
            $html
        )
        begin {
            # Recursively converts an HtmlAgilityPack node into a PSHTML element object.
            function HtmlToPSHTMLClass {
                param(
                    $node
                )
                If ( $node.nodetype -ne 'Text' ) {
                    $plop = [htmlParentElement]::New()
                    $plop.SetTagName($node.Name)
                    # Copy the id and class attributes, if present
                    $plop.Id = $node.Attributes.where({$_.name -eq 'id'}).Value
                    $plop.Class = $node.Attributes.where({$_.name -eq 'class'}).Value
                    If ( $node.hasChildNodes ) {
                        foreach ( $n in $node.childnodes ) {
                            ## Some text nodes are 'empty' (whitespace only), so I skip them ... maybe a bug?
                            If ( $n.nodetype -eq 'Text' -and $n.InnerText.trim() -ne '' ) {
                                # Non-empty text is added to the parent as a plain string
                                $child = $n.InnerText
                                $plop.AddChild( $child )
                            } elseif ( $n.nodetype -ne 'Text') {
                                # Any other node is converted recursively
                                $child = HtmlToPSHTMLClass -node $n
                                $plop.AddChild( $child )
                            }
                        }
                    }
                }
                $plop
            }
        }
        process {
            # Parse the HTML text and convert each top-level node
            $document = New-Object -TypeName HtmlAgilityPack.HtmlDocument
            $document.LoadHtml($html)
            Foreach( $node in $document.DocumentNode.ChildNodes ) {
                HtmlToPSHTMLClass -node $node
            }
        }
        end {
        }
    }
A side note: the HTML Agility Pack (HAP) is MIT licensed, so we could strongly consider it.
Another side note: it looks like Justin Grote already wrote a PowerShell implementation of the Agility Pack.
It would be nice to have a function that could read an HTML page and return an object, which could then be processed further or even converted to a PSHTML PowerShell file (is that utopian?).
For that, we will need the ability to parse an HTML document.
This snippet might be an option to do so:
Once it is parsed (or while parsing), we could create the corresponding PSHTML object for each HTML element.
This assumes that the following issue is implemented and closed first -> Create core PSHTML object (PSHTML.Document) #218