Haiti RSS Feed Challenge

From CrisisCommons Wiki


(Note - the UN-SPIDER workflow for imagery, video, etc is now a separate project found here, and will consume this project's outputs)

Project Overview

Various groups (including the United Nations and Pentagon) have asked CrisisCommons to collate many media feeds related to the 2010 Haiti earthquake into useful forms for decision support, aggregation, and integration with other systems. News feeds on Haiti (RSS or otherwise) will eventually be displayed on the CrisisCommons news aggregator, news.crisiscommons.com, and tool sets we have used and created will be made accessible to other disaster relief and public safety agencies.

General Project Requirements

Currently, the working requirements for this RSS Aggregation project include:

  • Aggregated RSS feed of news filtered by keywords
  • Aggregated RSS feed of Haiti and UN-related Twitter tweets filtered by #tags which we can specify
  • Aggregated RSS feed on Haiti UN staff information (who is missing, who has been rescued, who has been confirmed deceased)
  • RSS feed on UN-related stories from the ground (Haitian Voices) which can be used in UN media channels
  • List of specific issues which require UN assistance (where possible highlighting which UN agency should respond to this issue)
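
As an illustration of the first requirement (a news feed filtered by keywords), here is a minimal sketch using only the Python standard library. The keyword list, the sample feed, and the function name `filter_items` are assumptions made for this example; they are not the project's actual keywords or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword list; the project's real keywords are not given on this page.
KEYWORDS = ("haiti", "port-au-prince", "earthquake")

# Made-up RSS 2.0 sample so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example news</title>
  <item><title>Aid convoy reaches Port-au-Prince</title>
        <link>http://example.org/1</link>
        <description>Supplies arrive.</description></item>
  <item><title>Local sports results</title>
        <link>http://example.org/2</link>
        <description>Nothing relevant here.</description></item>
</channel></rss>"""


def filter_items(rss_text, keywords=KEYWORDS):
    """Return (title, link) for RSS 2.0 items whose title or description mentions a keyword."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = (title + " " + description).lower()
        # Plain substring match; real filtering would likely need stemming or phrases.
        if any(keyword in haystack for keyword in keywords):
            matches.append((title, item.findtext("link", default="")))
    return matches


for title, link in filter_items(SAMPLE_RSS):
    print(title, link)
```

The same function could be applied to each collected feed in turn, with the matching items merged into one output feed.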

A more detailed list of what we need help with can be found HERE


Project Info

Project Leads

Chat space information

If you want to help, please hop on the various channels and begin asking questions (preferably on IRC)

  • IRC - #ccrssa
  • Voice conference line:
    • MAIN: 619.276.6333 PIN 227772 ("ccrssa" on keypad)
    • RSS feed line 619.276.6333 PIN 2277721 ("ccrssa 1" on keypad)
    • Server Side 619.276.6333 PIN 2277722 ("ccrssa 2" on keypad)
  • Google Groups - Crisis Commons RSS Aggregator
  • Etherpad temporary notes to be rolled into wiki (please ask before editing) Netvibes workspace.
    • /ccrssa (09/01/23) Main Room
    • /100130-CCRSSA (hackathon end of day notes)
    • /rsscategories (09/01/23)
    • /rsscategories-fr (09/01/23)
    • /rhok (not ours, different project)

Project Flow

Roughly, this project falls into three categories, with varying degrees of overlap. Many of these efforts are running in parallel and are slowly starting to merge.

data input

  • Volunteers helping input and categorize RSS feeds
  • Volunteers helping determine necessary categories

crunching data

data output

  • misc DIY aggregation tools
  • case-by-case basis (we can help find the right combination of tools)
  • keyword detection
  • AJAX tools for easily sifting through mountains of data
  • Drupal plugins (e.g. Managing News)
  • distributing pre-made OPML files
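
For the pre-made OPML files mentioned in the list above, a minimal sketch of generating one from a list of feeds might look like the following. The function name `build_opml` and the example feed entry are hypothetical; the project's real feed list lives in its spreadsheets.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML 2.0 document from (name, feed_url) pairs and return it as a string."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One <outline> per feed; xmlUrl carries the feed address.
        ET.SubElement(body, "outline", {"type": "rss", "text": name, "xmlUrl": url})
    return ET.tostring(opml, encoding="unicode")


# Hypothetical entry, for illustration only.
print(build_opml("Haiti feeds", [("Example Org", "http://example.org/feed.xml")]))
```

Most feed readers can import a file like this, so an organization could subscribe to a whole category of feeds in one step.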

Online Resources Currently in Use

RSS Input Forms

RSS Input Form (volunteers, please use this one)

Spreadsheet(s) (where the data goes)

NGO/Organization Registry (for project leads ONLY, please be careful while editing)

Y! Pipes Labs

Y! Pipes Source (some of these and more are still in progress).

Feed43

Create RSS feeds on the fly, as an alternative to Y! Pipes


CCRSSA - Misc / Unsorted News Sites

3rd Party RSS Solutions

The following RSS-related vendors, projects, and services have a lot of potential to greatly extend the functionality of our project.

Open Source Licensing

Proprietary Licensing

  • Attensa -- Attensa has offered their services on this project. Attensa is a commercial product without source code available. From a preliminary review of their website and available documentation, Attensa could replace Yahoo Pipes in the v1 architecture. Attensa requires a hardware install, and it's unclear whether the appropriate hardware resources are available to the team for implementation. The team is currently testing this product. Charles Davidson and Jeff Nadler of Attensa gave the team a great demo via Adobe Acrobat. (While Attensa is proprietarily licensed, under the hood it is built on F/OSS: Lucene, Java, MySQL, Apache, etc.)

Project Updates

Most current deliverables, and our reasons for building them that way.

  • a web input form for collecting RSS feeds
  • a portal of RSS feeds, contributed to by citizens and utilized by relief agencies and ground workers in Haiti for the purpose of situational awareness.
  • NGO and relief organizations need something light and extensible to help them search the web in crisis situations.

A brief description of how information flows through our project:

Saturday, Feb 13, 2010

Project teams collaborated: Boston, Washington, Kansas City, Toronto and Chicago

Summary: New form ready, 3/4 of the RSS feeds were validated and a new server is up

Next steps:

  1. Server web application testing continues
  2. Need UX/UI Testing of (TEST SITE ONLY not to be distributed as live): http://cc.stealcode.com/
  3. The remainder of the RSS feeds needs to be reviewed

Sub-teams project details: Etherpad: http://etherpad.com/ccrssafeb13

1. FORM:

  1. Created the next iteration of the input form incorporating new categories and new tags.
  2. Added Kreyol translations
  3. This version is now live.

https://spreadsheets.google.com/ccc?key=0AjdNBdiu2rw0dHFjbl9LU2daNFVUZktZcVkza2h6RUE&hl=en

2. RSS Validation:

  • Started the validation effort of the RSS feeds already gathered.

Notes below show the color coding used on the spreadsheet to track RSS feed validation effort:

Instructions:

A. Pick a color, then highlight a block of numbers in the sheet that you will work on (highlight the names of organizations *only*, to avoid confusion), and start checking any RSS feeds that are listed.

B. If none are listed, visit the organization's site and see whether it has RSS; if it does, add the feed to the spreadsheet and mark it green. If not, mark it as described below:

  • mark confirmed feeds with green
  • mark feeds in french in blue (until translation can confirm it's about Haiti)
  • mark pages that could be turned into an RSS feed with orange (until they are converted)
  • mark organizations with no page, or no page that could be turned into an RSS feed in Red

C. When you are done working, "unhighlight" the ones you were working on unless you plan on going back to them soon

Here's a tutorial: http://docs.google.com/Doc?docid=0AWh_U6m-obibZGZybmNwdDNfMjNkMzgzOXFkOQ&hl=en
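
The color coding above can also be expressed as a small decision function, which may help if part of the validation is ever automated. This is only a sketch under stated assumptions: `looks_like_feed` and `spreadsheet_color` are hypothetical names, the feed check only tests whether the text parses as XML with an RSS, Atom, or RDF root element, and the `is_french` and `has_convertible_page` flags stand in for judgments a volunteer would make by hand.

```python
import xml.etree.ElementTree as ET


def looks_like_feed(text):
    """Return True if text parses as XML whose root is an RSS, Atom, or RDF (RSS 1.0) element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Drop any "{namespace}" prefix before comparing the element name.
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    return local_name in ("rss", "feed", "rdf")


def spreadsheet_color(feed_text=None, is_french=False, has_convertible_page=False):
    """Map one organization's check result to the color used on the spreadsheet."""
    if feed_text is not None and looks_like_feed(feed_text):
        # French feeds stay blue until translation confirms they are about Haiti.
        return "blue" if is_french else "green"
    if has_convertible_page:
        return "orange"  # a page that could be turned into an RSS feed
    return "red"  # no page, or nothing that could become a feed


print(spreadsheet_color('<rss version="2.0"><channel/></rss>'))
```

A fetched document that is plain HTML fails the feed check, so it ends up orange or red depending on whether the page looks convertible.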

3. TECH (server/db team):

  1. Server is set up
  2. Currently installing applications and setting up the server.
  3. Building database and testing
  4. Server work will continue. Product not live yet. Need another week of testing
  5. Testing ux, server web apps, mysql, feed storage.
  6. The main test-only site is http://cc.stealcode.com/

Test site only: this site will be shut down and replaced by the CrisisCommons. This is a tryout tool.

Once the db is up and tested, others will be able to flesh out more categories

Problems:

  • one form to rule the world because of how the data flows - This may become a usability problem as the current interface is cluttered with the two languages together. Depending on the back-end, it should be considered if/how this can be split into two.
  • Tags - is this the best way to articulate tags? Suggestions are welcome and encouraged to improve tags, both the list of tags from which people may choose and the layout/description of the tags, to keep it simple, clear and concise.

RSS Form Feed nice to haves:

  • New category: Partial or full feeds
  • Audiences: Regional, International, Local

Saturday, Jan 30, 2010

  • added ~200 more new RSS feeds
    • used Xenu to find RSS/XML/ATOMZ from known good group of URLs
    • new Google spreadsheet file of RSS URLs of local orgs (not all haiti specific, but could easily make more relevant with right Y! Pipe)
    • New planning for RSS form usability and categories Toronto_-_Haitian_RSS_Feed_Challenge (Jan. 30th, 2010)
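
Xenu is a link checker, so the approach above crawls known-good sites for links to feed files. A complementary technique, shown here as a sketch and not as what the team actually ran, is feed autodiscovery: reading the `<link rel="alternate">` tags that many sites put in their HTML head. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and link_type in FEED_TYPES and href:
            # Resolve relative hrefs against the page address.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html_text, base_url):
    """Return the absolute feed URLs advertised in an HTML page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feeds


# Made-up page so the sketch runs without network access.
page = (
    '<html><head>'
    '<link rel="alternate" type="application/rss+xml" href="/news/feed.xml">'
    '<link rel="stylesheet" href="/style.css">'
    '</head><body></body></html>'
)
print(find_feeds(page, "http://example.org/about"))
```

Pages that advertise no feed this way would still need a manual look, or conversion with a tool such as Y! Pipes or Feed43.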

Saturday, Jan 23, 2010

  • Accomplishments:
    • Created a silo of raw data (RSS, XML) that can be imported elsewhere (ideally into a more scalable solution, such as Amazon SimpleDB)
    • Completed draft form for RSS feeds (input feed) (with French translation)
    • Created "Google Doc + Y! Pipes prototype" to feed various RSS digest engines (along with documented scalability problems)
    • Teams collaborated both virtually (Baton Rouge (LA), Boulder (CO), Toronto (ON) (Canada), and Kansas City (MO)) and in CrisisCamp cities (Boston (MA), Portland (OR), Seattle (WA), and Los Angeles (CA))
    • Vendor relationships and possible roles established (Attensa)
    • Established Twitter "sister project" (CC LocalTweet)
    • Established admin and FTP login to news.crisiscommons.org and configured Admin Role and Admin Menu (Los Angeles, Chad)
  • Types of Contributors:
    • Categorization Team (raw RSS input, RSS classification, normalization, form usability)
    • Aggregator team (prototype engine, architecture, integration with other vendors)
  • Tools used: wiki, EtherPad, Google Wave, Google Docs, Gmail, Google Groups, Rondee, Skype, IRC, Yahoo Pipes, Twitter, and uStream.
  • Known Technological difficulties:
    • access to run a cron to auto import feeds on news.crisiscommons.org to upload 'features' for MN. (Cody currently working on that with ajturner.)
    • Finding a better way to collaborate on the Twitter feeds Google Docs. (We do have a working Twitter feeds pipe.)
  • Next Steps/Needs
    • Define keywords and reliability of source (e.g. "Trust measure")
    • Define project vision with more detail (see below for brainstormed outline)
    • Scenarios or use cases (and users) would be great.
    • Define output form (create prototype for beneficiaries of the system; UN etc) UI for the UN;
    • Define a process/code for individuals to create their own UI (e.g. API).
  • Project Vision - Developed by this TEAM.
    • To deliver relevant information from RSS feeds to the UN and other relief / response groups, specifically:
    • take input from various RSS feeds submitted via a web form
    • sort and categorize information from the submitted feeds
    • provide the appropriate sorted information/feeds to groups who can effectively use the information
  • Participants Names

Things People Can Do

We could use help with:

  • Getting MORE and BETTER raw RSS feeds of good, real time data
    • "more" means...
      • We are at the point of diminishing returns (duplicates, few unique feeds, finding aggregators of dubious value, etc)
      • Help thinking about how to make non RSS friendly pages RSS friendly
      • French RSS feeds
    • "better" means...
      • ways to give organizations better degree of granularity
      • better ways to organize the current raw list of feeds in Google Doc form (e.g. moving Haiti-related feeds from the Org Reg spreadsheet to the RSS feed spreadsheet)
      • intelligent dupe detection
      • help thinking about feed categorization. ideas include:
        • mainstream media (AP, reuters, NYT, Washington Post, etc)
        • grass roots media (blogs, Twitter [its own monster], etc)
        • hybrid media (NGO, Twitter, organized grass roots media, etc)
        • best of breed aggregation tools (e.g. google blog news search)
  • Thinking of better ways to scale
  • Thinking of better ways to make this more accessible to humanitarian organizations
    • Consider the Managing News (MN) distro described above. MN could work as a bidirectional feed in collaboration with the UN solution.
    • Something server-side that they could easily deploy
    • Give good step-by-step directions on rapidly deploying a quick turnkey server-side application (Dreamhost, Amazon EC2, etc)
  • RSS formatting and category suggestions
  • Using Y! Pipes more intelligently
    • Slip Y! Pipes in front of certain feeds to increase quality
    • Use Y! Pipes to convert HTML pages to RSS friendly ones
  • Thinking about better DB solutions, such as SimpleDB, MySQL, PostgreSQL, etc.
  • Finding relevant Haitian Creole information
  • helping NGOs know how to intelligently roll tools for their custom needs (no one tool/architecture/feed works for every organization)
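
For the dupe detection item in the list above, one simple starting point is to normalize feed URLs before comparing them. The sketch below is an assumption about what "intelligent" could mean at the most basic level: it only catches duplicates that differ in letter case, a leading "www.", a trailing slash, or a fragment, and the function names are hypothetical. Feeds that carry the same content under different addresses would need content-level comparison instead.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url):
    """Return a canonical form of a feed URL for duplicate comparison."""
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[4:]
    # Treat "/feed" and "/feed/" as the same path; an empty path becomes "/".
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped; the query string is kept because it can select a feed.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def dedupe_feeds(urls):
    """Keep the first occurrence of each feed, comparing normalized URLs."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_feed_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


# Made-up URLs, for illustration only.
candidates = [
    "http://www.example.org/feed/",
    "HTTP://Example.org/feed",
    "http://example.org/feed#top",
    "http://example.org/other.xml",
]
print(dedupe_feeds(candidates))
```

Running this over the raw spreadsheet column before validation would cut the number of rows volunteers have to check by hand.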

Project Summary

  • Status: Looking to complete this weekend
  • Project Twitter hashtag: #ccrssa

The United Nations Development Programme has asked CrisisCommons to provide an exhaustive list of RSS feeds. More information on current Y!Pipes + GmailDoc solution.

Goal:

  • validate RSS feeds (working with contact in Chicago)
  • make the list usable for future crisis occurrences

Current status:

  • making a form for adding new RSS projects
  • getting translated to French
  • checking and verifying links on current database
  • Assigned Project Manager(s) Across all Cities:
  • Interested participants:[leson, heather]