{"id":386,"date":"2019-09-19T17:55:27","date_gmt":"2019-09-19T23:55:27","guid":{"rendered":"http:\/\/draith.com\/?p=386"},"modified":"2019-09-19T17:55:27","modified_gmt":"2019-09-19T23:55:27","slug":"powershell-and-parsing-html-code-in-core","status":"publish","type":"post","link":"http:\/\/draith.azurewebsites.net\/?p=386","title":{"rendered":"PowerShell and Parsing HTML Code in Core"},"content":{"rendered":"\n\n\n<p>While working on a PowerShell webservice, I came across an interesting problem that I think is only going to crop up more and more.&nbsp; This webservice takes a JSON payload, performs some simple manipulation on the data, kicks off some automation, and logs some events.&nbsp; Part of the payload, however, is HTML code.&nbsp; JSON doesn\u2019t care what is inside a string value, so when the payload came into PowerShell, the HTML arrived as an ordinary string.&nbsp; That meant that when you looked at the string, you literally saw the raw HTML markup:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-markup\">&lt;BODY>&lt;H3>OPEN Problem 256 in environment &lt;I>EPG&lt;\/I>&lt;\/H3>\n                               &lt;HR>\n                               &lt;B>1 impacted infrastructure component&lt;\/B>\n                               &lt;HR>\n                               &lt;BR>\n                               &lt;DIV>&lt;SPAN>Process&lt;\/SPAN>&lt;BR>&lt;B>&lt;SPAN style=\"FONT-SIZE: 120%; COLOR: #dc172a\">bosh-dns-health&lt;\/SPAN>&lt;\/B>&lt;BR>\n                               &lt;P style=\"MARGIN-LEFT: 1em\">&lt;B>&lt;SPAN style=\"FONT-SIZE: 110%\">Network problem&lt;\/SPAN>&lt;\/B>&lt;BR>Packet retransmission rate for process bosh-dns-health on host \n                               cloud_controller\/&lt;GUID> has increased to 18 %&lt;\/P>&lt;\/DIV>\n                               &lt;HR>\n                               Root cause\n                               &lt;HR>\n                               \n                               
&lt;DIV>&lt;SPAN>Process&lt;\/SPAN>&lt;BR>&lt;B>&lt;SPAN style=\"FONT-SIZE: 120%; COLOR: #dc172a\">bosh-dns-health&lt;\/SPAN>&lt;\/B>&lt;BR>\n                               &lt;P style=\"MARGIN-LEFT: 1em\">&lt;B>&lt;SPAN style=\"FONT-SIZE: 110%\">Network problem&lt;\/SPAN>&lt;\/B>&lt;BR>Packet retransmission rate for process bosh-dns-health on host \n                               cloud_controller\/&lt;GUID> has increased to 18 %&lt;\/P>&lt;\/DIV>\n                               &lt;HR>\n                               \n                               &lt;P>&lt;A href=\"https:\/\/redacted.somewhere.com\/e\/&lt;GUID>\/#problems\/problemdetails;pid=-&lt;GUID>\">Open in Browser&lt;\/A>&lt;\/P>&lt;\/BODY> \n<\/code><\/pre>\n\n\n\n<p>I wanted to put this data in the events I was logging, but I obviously didn\u2019t want it to look like this.&nbsp; There isn\u2019t a \u2018ConvertFrom-HTML\u2019 cmdlet in PowerShell, although that would be great.&nbsp; I could have tried to parse the text and strip the formatting myself, but that would mean escaping the right characters and accounting for every possible HTML tag.&nbsp; I found several approaches online, most of which boil down to creating an \u2018HTMLFile\u2019 COM object, writing the HTML code into it, and then reading the innerText property of the resulting document.&nbsp; This works fine in some environments, but throws a confusing error if Office isn\u2019t installed.&nbsp; As far as I can tell, that is because the IHTMLDocument2_write method comes from the Microsoft.mshtml interop assembly, which is normally installed alongside Office.&nbsp; Here is the code:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-powershell\"># $htmlcode holds the HTML string taken from the JSON payload\n$html = New-Object -ComObject \"HTMLFile\"\n$html.IHTMLDocument2_write($htmlcode)\n<\/code><\/pre>\n\n\n\n<p>And the error it throws when Office isn\u2019t installed:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code>Method invocation failed because [System.__ComObject] does not contain a method named 'IHTMLDocument2_write'.<\/code><\/pre>\n\n\n\n<p>There is also a catch: the same error is thrown if you are running PowerShell Core (PowerShell 6 and later, as opposed to Windows PowerShell, i.e. version 5.1 and earlier)!&nbsp; Keep in mind that COM objects like HTMLFile only exist on Windows, so this approach won\u2019t work at all for PowerShell on Linux or macOS.&nbsp; Fortunately, on Windows there is a quick fix:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-powershell\"># $htmlcode holds the HTML string taken from the JSON payload\n$html = New-Object -ComObject \"HTMLFile\"\ntry {\n    # Works when the Microsoft.mshtml interop assembly is available\n    $html.IHTMLDocument2_write($htmlcode)\n}\ncatch {\n    # Fallback: hand the late-bound write() method the HTML as UTF-16 bytes\n    $encoded = [System.Text.Encoding]::Unicode.GetBytes($htmlcode)\n    $html.write($encoded)\n}\n# Find the body element and keep only its plain text\n$text = ($html.all | Where-Object { $_.tagName -eq 'body' }).innerText\n<\/code><\/pre>\n\n\n\n<p>A simple try\/catch, and it works great.\u00a0 I am going to create a new repo in GitHub and turn this into a function.\u00a0 Hopefully you haven\u2019t wasted too much time trying to track this down; it took me a while, and it wasn\u2019t until I stumbled upon a similar solution on Stack Overflow (<a href=\"https:\/\/stackoverflow.com\/questions\/46307976\/unable-to-use-ihtmldocument2\">https:\/\/stackoverflow.com\/questions\/46307976\/unable-to-use-ihtmldocument2<\/a>) that I was able to wrap it up.\u00a0 Expect the cmdlet\/function soon!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Method invocation failed because [System.__ComObject] does not contain a method named 
&#8216;IHTMLDocument2_write&#8217;.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[6,16],"class_list":["post-386","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-core","tag-powershell"],"_links":{"self":[{"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=\/wp\/v2\/posts\/386","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=386"}],"version-history":[{"count":0,"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=\/wp\/v2\/posts\/386\/revisions"}],"wp:attachment":[{"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=386"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/draith.azurewebsites.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}