
Save HTML source into file ("remove html=0" not working?)

[setup]
enabled=1
default checked=0
engine type=Moja

[URL]
type=url

[some_data]
type=extract
front=<
remove html=0

[STEP1]
link type=Comment-Contextual
write file="C:\Users\x\x\test.txt" "%some_data%"


Output in attachment.

Comments

  • Sven www.GSA-Online.de
    Is your [some_data] section missing a "back=..." line?
  • What if </html> appears in the body (for example, in an article about HTML)?

    Is there a better approach to getting the page source?
  • Sven www.GSA-Online.de
    Just use back=</html> and add it again when saving:
    write file="C:\Users\x\x\test.txt" "%some_data%</html>"
  • andrzejek Polska
    edited November 2017
    1. back= does not fix it...

    2. What if the HTML looks like this:

    <html>
    <head>
    <body>
    My article is about <html> and </html> tags.
    </head>
    </body>
    </html>


    3. What's a better way to get the source of a page? It looks simple when you save pages while debugging...

  • Sven www.GSA-Online.de
    Ahh, I see what problem you have now (I just viewed the attachment). You need to add "allow html=1" here; "remove html" is not working.
  • @Sven OK, but let me ask a third time <3

    2. What if the HTML looks like this:

    <html>
    <head>
    <body>
    My article is about <html> and </html> tags.
    </head>
    </body>
    important stuff
    </html>


    My extracted data will be:

    <html>
    <head>
    <body>
    My article is about <html> and </html>

    Is that correct?


  • SOLVED
  • Sven www.GSA-Online.de
    To make it clear for everyone reading this: yes, the extraction cuts at the first occurrence of the "back" value in the source. However, a proper HTML source will not contain a second </html> (article text that displays "</html>" is normally escaped as &lt;/html&gt;).
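The first-occurrence cut Sven describes can be illustrated with a small Python sketch. It is a hypothetical model of front=/back= extraction, not GSA's implementation: extract_between, page and the last_back flag are illustrative names, and excluding both markers from the result is an assumption (Sven's advice to append </html> when saving suggests this for back=; for front= it is a guess). The sample page is the one from the question above.

```python
from typing import Optional


def extract_between(source: str, front: str, back: str,
                    last_back: bool = False) -> Optional[str]:
    """Hypothetical model of front=/back= extraction (not GSA's code).

    Returns the text after the first `front` and before a `back` marker,
    or None if either marker is missing.
    """
    start = source.find(front)
    if start == -1:
        return None
    start += len(front)  # assumption: the front marker is excluded
    if last_back:
        end = source.rfind(back, start)  # cut at the LAST occurrence of back
    else:
        end = source.find(back, start)   # cut at the FIRST occurrence (as described)
    if end == -1:
        return None
    return source[start:end]  # the back marker itself is excluded


# Sample page taken from the question in the thread.
page = (
    "<html>\n"
    "<head>\n"
    "<body>\n"
    "My article is about <html> and </html> tags.\n"
    "</head>\n"
    "</body>\n"
    "important stuff\n"
    "</html>\n"
)

first = extract_between(page, "<", "</html>")
last = extract_between(page, "<", "</html>", last_back=True)

# Re-add the markers when saving, as Sven suggests for "%some_data%</html>".
saved_first = "<" + first + "</html>"
saved_last = "<" + last + "</html>"

print(saved_first)  # stops at the </html> inside the article text
print(saved_last)   # keeps "important stuff"
```

With the first-occurrence cut, re-adding the markers reproduces exactly the truncated output andrzejek expected, and "important stuff" is lost. Cutting at the last occurrence (str.rfind) would keep it; the thread does not say whether GSA offers such an option.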