Haiti RSS Feed Challenge
From CrisisCommons Wiki
(Note: the UN-SPIDER workflow for imagery, video, etc. is now a separate project, found here, and will consume this project's outputs.)
Project Overview
Various groups (including the United Nations and Pentagon) have asked CrisisCommons to collate many media feeds related to the 2010 Haiti earthquake into useful forms for decision support, aggregation, and integration with other systems. News feeds on Haiti (RSS or otherwise) will eventually be displayed on the CrisisCommons news aggregator, news.crisiscommons.com, and tool sets we have used and created will be made accessible to other disaster relief and public safety agencies.
General Project Requirements
Currently, the working requirements for this RSS Aggregation project include:
- Aggregated RSS feed of news filtered by keywords
- Aggregated RSS feed of Haiti and UN-related tweets, filtered by hashtags that we can specify
- Aggregated RSS feed on Haiti UN staff information (who is missing, who has been rescued, who has been confirmed deceased)
- RSS feed on UN-related stories from the ground (Haitian Voices) which can be used in UN media channels
- List of specific issues which require UN assistance (where possible highlighting which UN agency should respond to this issue)
A more detailed list of what we need help with can be found HERE
Project Info
Project Leads
- Penn, Chris irc:cantor twitter:@cantormath +1(225)590-5847
- Rustad, Roger irc:scubacuda, +1(949)209-9737 / AIM/Y!/Skype:RogerRustad
- Lih, Andrew Twitter/irc/Skype:fuzheado
- Sullivan, Mark
- Yamana, Sean +1(647)892-5498
- Schuback, Pascal +1(206)423-0970
- Nicholas, Chris
- Leson, Heather irc:TOheather
Chat space information
If you want to help, please hop on the various channels and begin asking questions (preferably on IRC)
- IRC - #ccrssa
- http://irc.rhok.net (web interface, flash required)
- irc.rhok.net (72.14.187.155), standard ports (possibly 7000)
- IRC tutorials (CC IRC tutorial here)
- Voice conference line:
- MAIN: 619.276.6333 PIN 227772 ("ccrssa" on keypad)
- RSS feed line 619.276.6333 PIN 2277721 ("ccrssa 1" on keypad)
- Server Side 619.276.6333 PIN 2277722 ("ccrssa 2" on keypad)
- Google Groups - Crisis Commons RSS Aggregator
- Etherpad temporary notes to be rolled into wiki (please ask before editing) Netvibes workspace.
- /ccrssa (09/01/23) Main Room
- /100130-CCRSSA (hackathon end of day notes)
- /rsscategories (09/01/23)
- /rsscategories-fr (09/01/23)
- /rhok (not ours, different project)
Project Flow
Roughly, this project falls into three categories, with varying degrees of overlap. Many of these efforts are running in parallel and are slowly starting to merge.
data input
- Volunteers helping input and categorize RSS feeds
- Volunteers helping determine necessary categories
crunching data
- Processing raw data once it's in the system (combo of human and machine tools, such as Y! Pipes)
- Currently investigating other solutions
- If a website can be converted into an RSS feed, use feed43.com. Here's a tutorial: http://docs.google.com/Doc?docid=0AWh_U6m-obibZGZybmNwdDNfMjNkMzgzOXFkOQ&hl=en
data output
- misc DIY aggregation tools
- case-by-case basis (we can help find the right combination of tools)
- keyword detection
- AJAX tools for easily sifting through mountains of data
- Drupal plugins (e.g. Managing News)
- distributing pre-made OPML files
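As a rough illustration of the pre-made OPML idea above, the sketch below builds an OPML 2.0 subscription list from a handful of feeds, using only the Python standard library. The function name `build_opml` and the example.org feed URLs are placeholders invented for this example; they are not feeds or tooling from the project.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML 2.0 document from (name, feed_url) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One <outline> per feed; type="rss" plus xmlUrl is the usual
        # convention that feed readers look for when importing.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


# Placeholder feeds, for illustration only (not real project feeds).
EXAMPLE_FEEDS = [
    ("Example relief news", "http://example.org/relief/rss.xml"),
    ("Example NGO updates", "http://example.org/ngo/feed"),
]

if __name__ == "__main__":
    print(build_opml("Haiti feeds (example)", EXAMPLE_FEEDS))
```

An organization could import the resulting file into most RSS readers to subscribe to the whole list at once.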
Online Resources Currently in Use
RSS Input Forms
RSS Input Form (volunteers, please use this one)
Spreadsheet(s) (where the data goes)
NGO/Organization Registry (for project leads ONLY, please be careful while editing)
Y! Pipes Labs
Y! Pipes Source (some of these and more are still in progress).
- http://pipes.yahoo.com/pipes/pipe.edit?_id=f534c470b38fe96e23502f5b6c1084ec
- http://pipes.yahoo.com/seanyamana/spreadsheet2rss
- http://pipes.yahoo.com/pipes/pipe.info?_id=00c136ffb313ec0cb47cb6aaca3b7de7
- http://pipes.yahoo.com/pipes/pipe.info?_id=c0ed8176269fd2bc7c5f27307df071b7
Feed43
Create RSS feeds on the fly, as an alternative to Y! Pipes
- http://feed43.com/
- Tutorial: http://docs.google.com/Doc?docid=0AWh_U6m-obibZGZybmNwdDNfMjNkMzgzOXFkOQ&hl=en&pli=1
CCRSSA - Misc / Unsorted News Sites
3rd Party RSS Solutions
The following RSS-related vendors, projects, and services could greatly extend the functionality of our project.
Open Source Licensing
- Gregarius: a web-based RSS/RDF/ATOM feed aggregator, designed to run on your web server, allowing you to access your news sources from wherever you want (ask Chris for access to his demo box, if you are interested)
- Pentaho -- an open source data integration (ETL) tool that is a possible replacement for Yahoo Pipes. Pentaho has a graphical UI for integration design similar to Yahoo Pipes, but requires a hardware install to run. Once built, a transformation can run from cron on a server without the graphical UI. Pentaho could also easily write the output of a transformation to a persistent store in a database. See an overview presentation of Pentaho.
- Managing News (MN) -- a free open source Drupal distribution, developed and designed by Development Seed. Managing News is a news and data aggregation engine with pluggable visualization and workflow tools. Features include: Aggregate RSS/Atom news, Show news as list or on a map, Search news, Republish news by bundling articles into channels, Configurable location tagging, Configurable maps. MN example: Afghanistan Election Data. MN video: a demonstration provided by Alex Barth during the November 2009 DC Drupal Meetup. Examples HERE.
- Drupal -- currently being used by UN-SPIDER (video demo here). See the UN-SPIDER architecture, and also Drupal sites assisting Haiti relief.
- Drupal pieces include:
- Workflow
- Federated Delivery
- cli scripting abilities
Proprietary Licensing
- Attensa -- Attensa has offered their services on this project. Attensa is a commercial product without source code available. From a preliminary review of their website and available documentation, Attensa could replace Yahoo Pipes in the v1 architecture. Attensa requires a hardware install, and it's unclear whether the appropriate hardware resources are available to the team for implementation. The team is currently testing this product. Charles Davidson and Jeff Nadler of Attensa gave the team a great demo via Adobe Acrobat. (Although Attensa is proprietary, under the hood it uses F/OSS components: Lucene, Java, MySQL, Apache, etc.)
Project Updates
Our most current deliverables, and our reasons for building them that way:
- a web input form for collecting RSS feeds
- a portal of RSS feeds, contributed to by citizens and utilized by relief agencies and ground workers in Haiti for the purpose of situational awareness.
- NGO and relief organizations need something light and extensible to help them search the web in crisis situations.
A brief description of how information flows through our project:
- Users enter RSS feeds using the Google Form
- Data from Google Form goes into Google doc spreadsheet
- Yahoo! Pipes digests the feeds listed in that spreadsheet and aggregates them into a final RSS feed, which can then be used in any RSS reader.
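The last step of that flow (merge several feeds, keep only keyword matches, emit one RSS feed) can be sketched in a few lines of Python. This is only an illustration of the idea, not the Yahoo! Pipes configuration the team actually uses: the function names are invented here, the feeds are passed in as already-downloaded RSS 2.0 text, and matching is a plain case-insensitive substring test.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Return title/link/description dicts for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def aggregate(feeds, keywords):
    """Merge items from several feeds, keeping those that mention any keyword."""
    wanted = [k.lower() for k in keywords]
    seen_links = set()
    merged = []
    for rss_xml in feeds:
        for item in parse_items(rss_xml):
            text = (item["title"] + " " + item["description"]).lower()
            if not any(k in text for k in wanted):
                continue  # no keyword match
            if item["link"] and item["link"] in seen_links:
                continue  # same link already collected from another feed
            seen_links.add(item["link"])
            merged.append(item)
    return merged


def to_rss(items, title="Aggregated feed"):
    """Serialize the merged items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for it in items:
        node = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(node, field).text = it[field]
    return ET.tostring(rss, encoding="unicode")
```

Calling `to_rss(aggregate(feed_texts, ["haiti"]))` would give a single feed that any RSS reader can consume; fetching the feed URLs from the spreadsheet is left out of the sketch.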
Saturday Feb 13, 2010
Project teams collaborated: Boston, Washington, Kansas City, Toronto and Chicago
Summary: New form ready, 3/4 of the RSS feeds were validated and a new server is up
Next steps:
- Server web application testing continues
- Need UX/UI Testing of (TEST SITE ONLY not to be distributed as live): http://cc.stealcode.com/
- The remainder of the RSS feeds needs to be reviewed
Sub-teams project details: Etherpad: http://etherpad.com/ccrssafeb13
1. FORM:
- Created the next iteration of the input form incorporating new categories and new tags.
- Added Kreyol translations
- This version is now live.
https://spreadsheets.google.com/ccc?key=0AjdNBdiu2rw0dHFjbl9LU2daNFVUZktZcVkza2h6RUE&hl=en
2. RSS Validation:
- Started the validation effort of the RSS feeds already gathered.
Notes below show the color coding used on the spreadsheet to track RSS feed validation effort:
- validating RSS feeds on http://spreadsheets.google.com/ccc?key=0AjdNBdiu2rw0dEtESUJJakczSTY1WWRUS2E4T0dZeFE&hl=en
Instructions:
A. Pick a color, then highlight the block of numbers in the sheet that you will work on (highlight the names of organizations *only*, to avoid confusion), and start checking any RSS feeds that are listed.
B. If none are listed, visit the site and see whether it has RSS; if so, add the feed to the spreadsheet and mark it green. If not, mark it as described below:
- mark confirmed feeds with green
- mark feeds in French with blue (until translation can confirm they are about Haiti)
- mark pages that could be turned into an RSS feed with orange (until they are converted)
- mark organizations with no page, or no page that could be turned into an RSS feed, with red
C. When you are done working, "unhighlight" the ones you were working on unless you plan on going back to them soon
Here's a tutorial: http://docs.google.com/Doc?docid=0AWh_U6m-obibZGZybmNwdDNfMjNkMzgzOXFkOQ&hl=en
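The color coding above is essentially a small decision table, and the first check (is this URL really a feed?) can be partly automated. The sketch below is a hedged illustration in Python: the function names, the `language` and `convertible_page` parameters, and the choice that a French feed is marked blue rather than green are assumptions made for this example, and the feed check only looks at the XML root element of text that has already been downloaded.

```python
import xml.etree.ElementTree as ET

# Root element names for RSS 2.0, Atom, and RSS 1.0 (RDF) documents.
FEED_ROOT_TAGS = {"rss", "feed", "RDF"}


def looks_like_feed(document_text):
    """Return True if the text parses as XML with a feed-like root element."""
    try:
        root = ET.fromstring(document_text)
    except (ET.ParseError, ValueError):
        return False  # not well-formed XML, so not a usable feed
    # Drop any "{namespace}" prefix that ElementTree puts on the tag name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOT_TAGS


def validation_color(has_feed, language="en", convertible_page=False):
    """Map one organization's status to the spreadsheet color code."""
    if has_feed:
        # Assumption: French feeds stay blue until translation confirms them.
        return "blue" if language == "fr" else "green"
    if convertible_page:
        return "orange"  # page could be turned into a feed (e.g. via Feed43)
    return "red"  # no page, or nothing that could become a feed
```

A volunteer would still need to judge whether a feed is actually about Haiti; the sketch only covers the mechanical part of the check.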
3. TECH (server/db team):
- Server is set up
- Currently installing applications and configuring the server.
- Building database and testing
- Server work will continue. The product is not live yet; we need another week of testing.
- Testing UX, server web apps, MySQL, feed storage.
- The main test-only site is http://cc.stealcode.com/
This is a test site only; it will be shut down and replaced by the CrisisCommons. This is a try-out tool.
Once the db is up and tested, others will be able to flesh out more categories
Problems:
- One form to rule the world, because of how the data flows - This may become a usability problem, as the current interface is cluttered with the two languages together. Depending on the back-end, we should consider whether and how it can be split into two.
- Tags - is this the best way to articulate tags? Suggestions are welcome and encouraged, both for the list of tags from which people may choose and for the layout/description of the tags, to keep it simple, clear and concise.
RSS Form Feed nice to haves:
- New category: Partial or full feeds
- Audiences: Regional, International, Local
Saturday Jan 30, 2010
- added ~200 more new RSS feeds
- used Xenu to find RSS/XML/Atom feeds from a known-good group of URLs
- new Google spreadsheet file of RSS URLs of local orgs (not all Haiti-specific, but could easily be made more relevant with the right Y! Pipe)
- New planning for RSS form usability and categories Toronto_-_Haitian_RSS_Feed_Challenge (Jan. 30th, 2010)
- parallel efforts (but no concrete deliverables yet) in integrating...
- Server Side Discussion...
- Managing News
- Gregarius
- Y! Pipes integration
- tagging tools / methods
Saturday Jan 23, 2010
- Accomplishments:
- Created a silo of raw data (RSS, XML) that can be imported elsewhere (ideally into a more scalable solution, such as Amazon SimpleDB)
- Completed draft form for RSS feeds (input feed) (with French translation)
- Created "Google Doc + Y! Pipes prototype" to feed various RSS digest engines (along with documented scalability problems)
- Teams collaborated both virtually (Baton Rouge (LA), Boulder (CO), Toronto (ON, Canada), and Kansas City (MO)) and in CrisisCamp cities (Boston (MA), Portland (OR), Seattle (WA), and Los Angeles (CA))
- Vendor relationships and possible roles established (Attensa)
- Established Twitter "sister project" (CC LocalTweet)
- Established admin and FTP login to news.crisiscommons.org and configured Admin Role and Admin Menu (Los Angeles, Chad)
- Types of Contributors:
- Categorization Team (raw RSS input, RSS classification, normalization, form usability)
- Aggregator team (prototype engine, architecture, integration with other vendors)
- Tools used: wiki, EtherPad, Google Wave, Google Docs, Gmail, Google Groups, Rondee, Skype, IRC, Yahoo Pipes, Twitter, and Ustream.
- Known Technological difficulties:
- Access to run a cron job on news.crisiscommons.org that auto-imports feeds and uploads 'features' for MN. (Cody is currently working on that with ajturner.)
- Finding a better way to collaborate on the Twitter feeds in Google Docs. (We do have a working Twitter feeds pipe.)
- Next Steps/Needs
- Define keywords and reliability of source (e.g. "Trust measure")
- Define project vision with more detail (see below for brainstormed outline)
- Scenarios or use cases (and users) would be great.
- Define output form (create a prototype UI for beneficiaries of the system: the UN, etc.)
- Define a process/code for individuals to create their own UI (e.g. API).
- Project Vision - Developed by this TEAM.
- To deliver relevant information from RSS feeds to the UN and other relief / response groups, specifically:
- take input from various RSS feeds submitted via a web form
- sort and categorize information from the submitted feeds
- provide the appropriate sorted information/feeds to groups who can effectively use the information
- Participants Names
- Seattle Participants: Pascal Schuback, Derek Gaw, D. Leigh Higgins, Sharon, Rob Harshman
- Kansas City Participants: Mark Sullivan
- Toronto: Sean Yamana, Heather Leson
- Los Angeles: Roger Rustad, Chad Catacchio, Andrew Lih
- Baton Rouge: Chris Penn
- Chicago: Brian Guthrie
- Portland: Maura Brown
Things People Can Do
We could use help with:
- Getting MORE and BETTER raw RSS feeds of good, real time data
- "more" means...
- We are at the point of diminishing returns (duplicates, few unique feeds, finding aggregators of dubious value, etc)
- Help thinking about how to make non-RSS-friendly pages RSS-friendly
- French RSS feeds
- "better" means...
- ways to give organizations a better degree of granularity
- better ways to organize the raw list of feeds currently in Google Doc form (e.g. moving Haiti-related feeds from the Org Reg spreadsheet to the RSS feed spreadsheet)
- intelligent dupe detection
- help thinking about feed categorization. ideas include:
- mainstream media (AP, reuters, NYT, Washington Post, etc)
- grass roots media (blogs, Twitter [its own monster], etc)
- hybrid media (NGO, Twitter, organized grass roots media, etc)
- best of breed aggregation tools (e.g. google blog news search)
- Thinking of better ways to scale
- Thinking of better ways to make this more accessible to humanitarian organizations
- Consider the Managing News (MN) distro described above. MN could work as a bidirectional feed in collaboration with the UN solution.
- Something server-side that they could easily deploy
- Give good step-by-step directions on rapidly deploying a quick turnkey server-side application (Dreamhost, Amazon EC2, etc)
- RSS formatting and category suggestions
- Using Y! Pipes more intelligently
- Slip Y! Pipes in front of certain feeds to increase quality
- Use Y! Pipes to convert HTML pages to RSS-friendly ones
- Thinking about better DB solutions, such as SimpleDB, MySQL, PostgreSQL, etc.
- Finding relevant Haitian Creole information
- helping NGOs know how to intelligently roll tools for their custom needs (no one tool/architecture/feed works for every organization)
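For the "intelligent dupe detection" item in the list above, one simple starting point is to normalize feed URLs before comparing them, so that trivially different spellings of the same address collapse into one entry. The Python sketch below is an assumption-laden illustration, not an existing project tool: the function names are made up, and the normalization rules (lowercase host, drop a leading "www.", drop a trailing slash and any fragment) are one possible choice among many. It deliberately keeps http and https distinct.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url):
    """Return a canonical form of a feed URL for duplicate comparison."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    # The fragment (#...) never changes which feed is fetched, so drop it.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def dedupe(urls):
    """Keep the first occurrence of each feed, comparing normalized URLs."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_feed_url(url)
        if key in seen:
            continue
        seen.add(key)
        unique.append(url)
    return unique
```

This catches only spelling-level duplicates; two different URLs that serve the same content would need a comparison of the feed items themselves.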
Project Summary
- Status: Looking to complete this weekend
- Project Twitter hashtag: #ccrssa
The United Nations Development Programme has asked CrisisCommons to provide an exhaustive list of RSS feeds. More information is available on the current Y! Pipes + Google Doc solution.
- Project Leads: Penn, Chris, Rustad, Roger, Lih, Andrew, Sullivan, Mark, Yamana, Sean, Schuback, Pascal, Pontes, Deb, John, Evan Goldman, Stacen, Fein, Ian Oo, Whitney Williams, Luke Jones, Maple Kuo
- Customer: UNDP (United Nations Development Programme)
- Location of Lead:
- Collab info
- Google Groups - CCRSSA
- Delicious tag: ccrssa
- IRC - irc.rhok.net #ccrssa
- conference bridge
- US - 619-2-RONDEE (619-276-6333), 227772# ("CCRSSA" on your keypad)
- European Rondee Bridge (Germany) +49 157-02488180, 227772#
- Twitter - CCRSSA
Goal:
- validate RSS feeds (working with contact in Chicago)
- used for future crisis occurrences
Current status:
- making a form for adding new RSS projects
- getting translated to French
- checking and verifying links on current database
- Assigned Project Manager(s) Across all Cities:
- Interested participants:[leson, heather]

