14 April 2008

XML in PHP5: the weather

My favorite weather-related Web site is Weather Underground, but its pages can be a bit heavy. Usually I just want a quick summary of current conditions and a forecast for the next day or two. Thankfully, wunderground provides this in XML format. Here's the report for Portland, Oregon, as HTML and as XML.

I thought it would be fun to write a quick PHP program to download the XML file, parse it, and present it in an easy-to-read format. I chose the SimpleXML extension for PHP5 because my XML-parsing needs for this project are modest, and the cURL extension to fetch the XML file.

<?php
$url = 'http://rss.wunderground.com/auto/rss_full'
     . '/OR/Portland.xml?units=both';

// Fetch the feed with cURL: return the body as a string, without headers.
$ch = curl_init($url);
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_HEADER, false );
$xmlstr = curl_exec($ch);
$res_info = curl_getinfo($ch);
curl_close($ch);

// Give up on anything but a 200 (a failed connection reports code 0).
if ( $res_info['http_code'] != 200 ) {
    header( 'content-type: text/plain' );
    die("couldn't open $url");
}

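$xml = new SimpleXMLElement($xmlstr);
$epoch = strtotime( (string) $xml->channel->pubDate );
$date = date( 'H:i:s l j F Y', $epoch );
$report_uri = htmlentities(
    (string) $xml->channel->item[0]->link, ENT_QUOTES, 'UTF-8' );

// Collect every <item> in the feed, escaped for safe HTML output.
$forecast_items = array();
foreach ( $xml->channel->item as $item ) {
    $desc = strip_tags( (string) $item->description );
    $forecast_items[] = array(
        // The guid comes from the feed too, so escape it before it is
        // used as an id attribute.
        'guid' => htmlentities( (string) $item->guid, ENT_QUOTES, 'UTF-8' ),
        // Decode entities the feed already encoded (e.g. &#176;) so that
        // htmlentities() does not escape them a second time.
        'desc' => htmlentities(
            html_entity_decode( $desc, ENT_QUOTES, 'UTF-8' ),
            ENT_QUOTES, 'UTF-8' ),
    );
}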

echo '<html><body><h1>Weather Underground Report</h1>',
     '<h2>Portland, Oregon: ', $date, '</h2>';
foreach ( $forecast_items as $item ) {
    $id = '';
    if ( !empty($item['guid']) ) {
        $id = ' id="' . $item['guid'] . '"';
    }
    echo "<p$id>", $item['desc'], '</p>';
}
echo '<p><a href="', $report_uri, '">Full report</a></p>',
     '</body></html>';


There's some magic in the first foreach loop. Just as you should never trust anything typed into a Web form, you should also be skeptical of content from someone else's XML document, hence the strip_tags() and htmlentities() calls. But some of the characters in the wunderground XML, like the degree symbol, are already HTML-encoded. Calling html_entity_decode() first turns them back into plain characters; otherwise htmlentities() would escape the ampersand a second time, and the temperature would show up as "75&#176;F" rather than "75°F".
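To see the order of those calls in isolation, here is a minimal sketch that pulls the cleanup into a helper. The function name clean_description() and the sample strings are made up for illustration, and UTF-8 is assumed as the feed's character set.

```php
<?php
// Hypothetical helper: the same strip / decode / re-encode sequence used in
// the first foreach loop, assuming the feed text is UTF-8.
function clean_description($raw)
{
    // Remove any markup embedded in the description text.
    $text = strip_tags($raw);
    // Turn entities the feed already encoded (e.g. "&#176;") into real
    // characters, so htmlentities() does not escape them a second time.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Escape everything again for safe output in an HTML page.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$raw = '<b>Temperature:</b> 75&#176;F';

// Without the decode step the entity is double-escaped ("75&amp;#176;F"),
// which a browser renders as the literal text "75&#176;F".
echo htmlentities(strip_tags($raw), ENT_QUOTES, 'UTF-8'), "\n";

// With it, the degree sign comes out as "&deg;", rendered as "75°F".
echo clean_description($raw), "\n";

// An entity-encoded tag survives strip_tags() and is decoded, but the final
// htmlentities() escapes it again, so it is shown as text, not parsed.
echo clean_description('&lt;script&gt;alert(1)&lt;/script&gt;'), "\n";
```

The last call shows why htmlentities() has to come last: anything html_entity_decode() turns back into markup is neutralized again before it reaches the page.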

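The rest of the code is straightforward. If you look at the raw XML, you'll find that the entire report is wrapped in a <channel> element; inside it, the report date sits in a <pubDate> element and each entry of the report in its own <item> element. SimpleXML exposes child elements as object properties, so the date is simply $xml->channel->pubDate, and the repeated <item> elements can be indexed ($xml->channel->item[0]) or walked with foreach. As its name implies, SimpleXML makes parsing XML pretty easy, and it's a great choice for small projects like this.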
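The element-to-property mapping is easiest to see on a tiny document. This sketch parses an inline RSS fragment; the fragment is invented for illustration and only mimics the <channel>/<pubDate>/<item> layout described above, not wunderground's actual feed. It also pins the time zone to UTC (an assumption) so the formatted date doesn't depend on the server's settings.

```php
<?php
// Assumption: UTC, so the formatted date below is the same on any server.
date_default_timezone_set('UTC');

// Made-up RSS fragment that only mimics the layout of a weather feed.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample weather feed</title>
    <pubDate>Mon, 14 Apr 2008 18:30:00 GMT</pubDate>
    <item>
      <title>Current Conditions</title>
      <guid>current</guid>
    </item>
    <item>
      <title>Tuesday</title>
      <guid>tuesday</guid>
    </item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($rss);

// Child elements are reached as properties; the root <rss> is $xml itself.
$pubDate = (string) $xml->channel->pubDate;

// Repeated elements such as <item> can be indexed...
$firstTitle = (string) $xml->channel->item[0]->title;

// ...or iterated with foreach.
$titles = array();
foreach ($xml->channel->item as $item) {
    $titles[] = (string) $item->title;
}

echo date('H:i:s l j F Y', strtotime($pubDate)), "\n";
echo $firstTitle, "\n";
echo implode(', ', $titles), "\n";
```

The (string) casts matter: without them each value is a SimpleXMLElement object, which prints the same but does not compare or store like a plain string.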
