Computer Science Canada

[REBOL] Script to pull images out of web pages

Author:  btiffin [ Wed May 07, 2008 12:52 am ]
Post subject:  [REBOL] Script to pull images out of web pages

Hello,

This script pulls images out of a web page. I wrote it for darkangel and thought I might as well post it.
code:

REBOL [
    Title: "snag images"
]

;; ** change the site **
site: http://www.rebol.com

tags: copy []
page: read site

; pull out all the html tags
parse page [
    some [to "<" copy tag thru ">" (append tags tag)]
    to end
]

; look for tags with "img", pull out the filename,
;   maybe append site to get full url, load image
foreach tag tags [
    if find tag "img" [
        attempt [
            start: find/tail tag {src="}
            end: find start {"}
            file: copy/part start end
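            ; relative name: make sure it starts with "/" before joining to site
            ; (find/match only matches "http" at the start of the name)
            unless find/match file "http" [
                unless equal? first file #"/" [insert head file "/"]
            ]
            url: either find/match file "http" [file] [join site file]
            url: to url! url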
            img: load url
            print [url "is" img/size]
        ]
    ]
]

If anyone wants it explained, or wants to know what can be done beyond printing the url and size (width by height), just ask. This was a quick write; a lot of the lines could be removed and the expressions condensed. It could also be made into a function that takes the site as an argument instead of hardcoding it. By the way, REBOL's load function is doubleplus good: here it fetches the url and decodes the image in one step. Note: this won't handle all cases, by any means; image names calculated by ECMAScript or hidden behind CGI, for example, will be missed.
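
For what it's worth, here is a rough sketch of the function form mentioned above. The name snag-images and the local word stop are made up for illustration and are not from the original script. It assumes the same things the script does: REBOL/View (so load can decode images), src attributes in double quotes, and relative names that resolve against the site root. It also narrows the tag test to "<img", which is an assumption about what you want to match.

```rebol
REBOL [
    Title: "snag images, function form"
]

; snag-images is a hypothetical name; pass the site instead of hardcoding it
snag-images: func [
    "Print the url and size of each image referenced by a page."
    site [url!] "page to scan, also used as the base for relative names"
    /local page tags tag start stop file url img
][
    tags: copy []
    page: read site

    ; pull out all the html tags
    parse page [
        some [to "<" copy tag thru ">" (append tags tag)]
        to end
    ]

    foreach tag tags [
        if find tag "<img" [
            ; attempt swallows errors: no src, unreachable url, not an image
            attempt [
                start: find/tail tag {src="}
                stop: find start {"}
                file: copy/part start stop
                url: either find/match file "http" [
                    to url! file
                ][
                    unless equal? first file #"/" [insert file "/"]
                    join site file
                ]
                img: load url
                print [url "is" img/size]
            ]
        ]
    ]
]

snag-images http://www.rebol.com
```

Because everything is declared /local, calling it for several sites in a row should not leave stray words behind in the global context.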

Cheers

