
-----------------------------------
btiffin
Wed May 07, 2008 12:52 am

[REBOL] Script to pull images out of web pages
-----------------------------------
Hello,

   This script will pull images out of a web page.  I wrote it for darkangel and thought I might as well post it.
REBOL [
    Title: "snag images"
]

;; ** change the site **
site: http://www.rebol.com

tags: copy []
page: read site

; pull out all the html tags
parse page [
    some [to "<" copy tag thru ">" (append tags tag)]
    to end
]

; look for tags with "img", pull out the filename,
;   maybe append site to get full url, load image
foreach tag tags [
    if find tag "img" [
        attempt [
            start: find/tail tag {src="}
            end: find start {"}
            file: copy/part start end
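            ; relative paths get a leading slash so they join cleanly onto site;
            ;   find/match only counts "http" at the start of the filename
            unless find/match file "http" [
                unless equal? first file #"/" [insert head file "/"]
            ]
            url: either find/match file "http" [file] [join site file]
            url: to url! url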
            img: load url
            print [url "is" img/size]
        ]
    ]
]
If anyone wants it explained, or wants info on what can be done other than just printing the url and size (width by height), just ask.  This is a quick write; a lot of the lines could be removed and expressions condensed.  It could be made a function where you pass the site instead of hardcoding it, etc.  By the way, REBOL's load function is doubleplus good.  Note: this won't handle all cases, by any means (ECMAScript-calculated names, CGI-hidden ones, etc.).
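
For what it's worth, here is a rough sketch of the "make it a function" idea, assuming REBOL 2 like the script above.  The name image-urls and its argument names are made up for illustration; they are not from the original script.  It only collects the urls from a page you have already read, so it can be tried without touching the network, and it looks for "<img" rather than just "img" to skip tags that merely mention img somewhere.

```rebol
REBOL [
    Title: "snag images, as a function (sketch)"
]

; Hypothetical helper, not part of the original script.
; Returns a block of image urls found in an HTML string,
; with relative filenames joined onto site.
image-urls: func [
    site [url!]    "base site, e.g. http://www.rebol.com"
    page [string!] "HTML source, e.g. the result of read site"
    /local urls tag start stop file
][
    urls: copy []
    parse page [
        any [
            to "<" copy tag thru ">" (
                if find tag "<img" [
                    ; attempt quietly skips img tags with no src="..." part
                    attempt [
                        start: find/tail tag {src="}
                        stop: find start {"}
                        file: copy/part start stop
                        append urls either find/match file "http" [
                            to url! file
                        ][
                            unless equal? first file #"/" [insert file "/"]
                            join site file
                        ]
                    ]
                ]
            )
        ]
        to end
    ]
    urls
]

; Usage, same output as the original script (this part does hit the network):
;   site: http://www.rebol.com
;   foreach url image-urls site read site [
;       attempt [img: load url  print [url "is" img/size]]
;   ]
```

Keeping the fetch (read, load) outside the function is just one way to split it; passing only the site and reading inside would work as well.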

Cheers
