A Beginner’s Guide to Open Data and Open Source for Environmental Justice

The City of New York’s official Open Data website lets you search for everything from street trees to water quality. NYC law mandates full coverage of NYC public data online by 2018.  Image from NYC Open Data.

The BOP Curriculum Team is hard at work writing and piloting middle school science units that meet state and city standards while engaging students in authentic, hands-on restoration work. One core belief of our team is that our work should empower students to connect the science they’re learning to meaningful environmental justice work in their own communities. But we would love to make this work accessible to people of all ages, not just students! Our latest unit, the “Steward-shed Investigation,” rounds up open data resources that students can use to research pollution in their watersheds and propose ways to remediate it. The guide below provides additional context for that work. It is also meant as an explainer for beginners: it clarifies terms like “open data” and “open source,” examines how the open data and open source movements can support environmental justice, explains how BOP fits with both, and, most importantly, shows how you can participate!

What is “open data?”

Open data is “publicly available data structured for usability and computability that can be universally and readily accessed, used, and redistributed free of charge.” You’ll most often hear this term in discussions about government data. Non-partisan non-profits like the Sunlight Foundation advocate for governments around the world, at all levels, to release the data they collect so that those governments become more transparent and accountable to their citizens.

New York City passed its first open data law in 2012, and since then, the city has worked to make more information available online, from street tree locations to water quality measurements and beyond. According to the Mayor’s Office of Data Analytics, NYC’s open data law “mandates full coverage of all City public data by 2018.” (For a good overview of NYC Open Data, read this post from the Gotham Gazette, and check out the city’s official NYC Open Data website.)

But open data isn’t just about the government.  Many businesses, universities, and non-profits rely on open data for their work, and in turn, they can open their data to support others.  (We’ll talk about how BOP makes our data open later in this post.)

What are some of the problems with open data?

“Available” isn’t the same as “easy to use.”  Open data advocates say that accessing data must be a user-friendly process.  According to Mayor de Blasio, NYC’s goal is to make data usable by everyone, not just computer programmers.

As you look at government databases and maps, it’s worth considering:  How easy is it to find the information you’re looking for?  Are there changes that could make it better?  If you’ve got suggestions for improvements, let the agencies know!  Most sites have a feedback form or a contact section you can write to.

What is “open source” technology, and how is it related to open data?

It’s useful to think of data as bits of information. In contrast, “open source” has to do with the technology that uses the data, like computer applications. An application is essentially a set of tasks that a user directs the computer to carry out. The water temperature at every BOP Oyster Restoration Station is data. An application would give a computer a task like “display water temperature data on a user’s screen when a user clicks ‘submit’.”

These tasks are written in programming languages (like C, JavaScript, or Python). The full set of tasks, written in a programming language, is called the “source code” of an application. “Open source” means that the computer programmer has made the source code open to the public and free to use and build on.
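To make “source code” a little more concrete, here is a minimal sketch in Python of the task described above. It is not code from the BOP Digital Platform: the station names and temperature readings are made up, and a plain function stands in for the “submit” button that a real application would show on screen. The lines starting with “#” and the text in triple quotes are comments, written for people rather than for the computer.

```python
# Hypothetical water temperature readings (degrees Celsius), keyed by station name.
# These values are made up for illustration; they are not real BOP data.
water_temperatures = {
    "Station A": [18.5, 19.0, 19.4],
    "Station B": [17.2, 17.8],
}


def on_submit(station_name):
    """Return the text to display when a user picks a station and clicks 'submit'."""
    readings = water_temperatures.get(station_name)
    if readings is None:
        # The user asked for a station we have no data for
        return f"No data found for {station_name}."
    # Turn the list of numbers into one line of text, e.g. "18.5, 19.0, 19.4"
    values = ", ".join(f"{t:.1f}" for t in readings)
    return f"Water temperature at {station_name} (°C): {values}"


print(on_submit("Station A"))
```

Anyone who can read this source code can change it, for example to show temperatures in Fahrenheit, which is exactly the kind of building-on that “open source” allows.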

Just like with open data, “open” doesn’t necessarily mean “accessible.”  There can be several barriers to entry, including:

  • Programming language choice.  Is the source code written in a common computer language that many people know, or a more obscure one?
  • Comments.  The source code for an application is written in a computer language, but most languages allow you to include comments, meant to be read by people instead of computers, in English or the programmer’s native language.  These comments can help someone looking at the source code better understand how the application works.  Source code without comments can make it more difficult for another programmer to use the code.
  • Documentation. In addition to comments in the code, most applications should have documentation about how to use the program, how to get started, and how to contribute to the project. Without it, new users have a harder time contributing. For example, the documentation for Google’s open source programming language “Go” includes a page that explains their specific process for how someone should contribute new code and when a contributor can expect it to be reviewed.
  • Code of Conduct and Community Moderation. Open source projects are built and maintained by people, and fostering productive collaboration is key. A good open source project includes a code of conduct that sets clear expectations for how people should participate and what they can expect from the project’s owner(s) in turn. For example, the code of conduct for Go outlines a process for reporting conduct-related issues and provides a timeline for how quickly moderators will respond. Creating a welcoming space for all contributors is crucial; see, for example, this style guide on writing gender-neutral code for Google’s Chromium, which is designed to make the project more inclusive to people of all genders. Codes of conduct are generally enforced by moderators (on a small project, this might just be the project owner, but on a large project, it could be a whole team), but it’s helpful for the entire project community to hold each other accountable for creating a welcoming space.

Open source projects are an excellent tool for environmental justice: they allow a community to collect and use data in a way that’s best for them, based on their evolving needs, in a process that emphasizes collaboration over competition.

How do open data and open source connect to environmental justice?

In the US, many different government agencies are responsible for protecting the environment and public health, and they collect lots of data to help them make decisions that will impact the public. Many universities and non-profits also collect data about the environment, whether in support of a specific research project or to inform their work more generally.

When people have access to this kind of data, they can advocate for the changes they want to see.  For example, in New York City, several government agencies, universities, and non-profit organizations record water quality measurements and/or make them available online.  This information can help regular New Yorkers decide when it’s safe to swim, fish, or do other recreational activities in the water.  But it also helps organizations decide what actions to take and to recommend to the public to address water pollution issues, whether it’s attending DEP meetings or writing to a city council member.  (For examples of actions you can take once you’ve investigated your local water quality, check out the Clean Water Steward Workbook from S.W.I.M. Coalition, “a coalition dedicated to ensuring swimmable waters around New York City through natural, sustainable stormwater management practices in our neighborhoods.”)

Open sourcing the tools for accessing open data takes a good thing and makes it even better. It means that a water steward in California can tweak a tool developed in New York to create a data visualizer specific to their community. It’s why we’ve made the BOP Digital Platform open source, in addition to making the data accessible. Find out more in the next section!

How does BOP fit in?

BOP Schools and Citizen Science involves dozens of schools and hundreds of New York City students in monitoring Oyster Restoration Stations (ORS).  On a monitoring expedition, students and citizen scientists conduct water quality testing, record site conditions, measure oyster growth, and catalog mobile and sessile species found in the ORS.  They upload this data to the BOP Digital Platform.  All platform users can search or download data from all published expeditions via the Data page of the site.  (For more information, check out our guide to downloading and working with platform data.)

BOP students use this data to conduct original research projects that they present at the Annual BOP Research Symposium. In addition, we love to see teachers use the data in whatever way helps them teach key science and math skills and concepts! Our BOP professional development events often include activities designed to help teachers and students use platform data and make it tangible and meaningful for students (like a recent PD on building oyster measurement frequency distribution histograms from index cards). The BOP team also uses platform data to help make decisions about where to site bigger restoration projects, like BOP’s Community Reefs.
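The index-card histogram activity can also be sketched in a few lines of Python. This is a minimal illustration, not a BOP tool: the shell lengths below are made up, it assumes measurements in whole millimeters, and the 10 mm bin width is an arbitrary choice you can change. Each measurement is dropped into a bin, the bins are counted, and each count is printed as a row of “#” marks, much like stacking index cards into columns.

```python
from collections import Counter

# Hypothetical oyster shell lengths in whole millimeters (made up for illustration).
oyster_lengths_mm = [12, 18, 23, 25, 31, 34, 36, 38, 41, 47, 52, 55]

BIN_WIDTH_MM = 10  # assumed bin width; pick whatever suits your data


def frequency_distribution(lengths, bin_width):
    """Count measurements in each bin [start, start + bin_width), including empty bins."""
    # Round each length down to the start of its bin, e.g. 23 -> 20 when bin_width is 10
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    first, last = min(counts), max(counts)
    # List every bin from smallest to largest so gaps show up as zero counts
    return [(start, counts.get(start, 0)) for start in range(first, last + bin_width, bin_width)]


for start, count in frequency_distribution(oyster_lengths_mm, BIN_WIDTH_MM):
    # One '#' per oyster, like stacking index cards in a column
    print(f"{start:>3}-{start + BIN_WIDTH_MM - 1:<3} mm | {'#' * count}")
```

Trying a few different bin widths with the same data is a good way to see how that choice changes the shape of the histogram.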

In addition to making our data open, the BOP Digital Platform is an example of an open source application, because we’ve made the source code available on GitHub, a website where people, businesses, and non-profits store code they’ve written. GitHub also includes tools to help programmers collaborate on applications.

If you’re new to programming but interested in contributing to open source projects like the BOP Digital Platform, GitHub has a set of helpful guides, including one on best practices for contributing. And whether you code or not, we want you to be a part of building our digital community! Let us know what you think of the platform by giving us feedback: click the “?” symbol at the top to submit your comments.

What are some existing open data and open source projects that focus on environmental justice?

Public Lab “is a community — supported by a 501(c)3 non-profit — which develops and applies open-source tools to environmental exploration and investigation. By democratizing inexpensive and accessible Do-It-Yourself techniques, Public Lab creates a collaborative network of practitioners who actively re-imagine the human relationship with the environment.”  Their list of community partners around the world (including several in NYC!) is an inspiring array of organizations who are using what Public Lab calls “civic technology” to pursue environmental justice.  Their Map Knitter tool pairs their open source code with DIY aerial balloon photography kits (you can make or buy one for under $100) that you can use to document and map environmental incidents in your neighborhood (organizers used this to empower Gulf Coast residents to map the BP oil spill for themselves in 2010).  And their “Open Water” project includes Riffle, an open source water monitoring approach that uses sensors inside a water bottle.

This post in Nature looks at citizen science air quality sensor projects.

Open datasets of interest to the BOP Community

Here’s a short list of water quality open data for NYC that you can use to investigate your local waterway:

  • New York City Water Trail Association (NYCWTA) Water Quality Data. NYCWTA is a not-for-profit stewardship group made up of over 20 community-based non-motorized boating organizations in and around New York City. Their citizen science water quality testing program tests for Enterococcus bacteria near the shore, where both combined sewer overflows (CSOs) and many recreational activities take place. By contrast, the NYC DEP generally tests in so-called “center waterways,” which some argue produce more favorable results because they are farther from pollution sources (for more context, read these public comments to the NYS DEC). NYCWTA tests mainly during the recreational season. You can find their testing site locations on a map; each marker on the map links to a spreadsheet of that site’s data.
  • The non-profit Riverkeeper’s citizen science water quality testing program also publishes NYCWTA’s data on its website, along with Hudson River data from its own samples (mostly collected during the recreational season). You can view their data by location here (individual locations are listed in the right sidebar).
  • Billion Oyster Project: go to the “Data” section under the menu in the left sidebar.
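Once you’ve downloaded one site’s spreadsheet and saved it as a CSV file, a short script can summarize it. The sketch below is hypothetical: the column names “date” and “enterococcus” and the sample values are assumptions, so check the headers in the real file and adjust them. The threshold is also just a parameter you choose; the value 100 used here is arbitrary and is not an official standard. Because bacteria counts can vary enormously from one sample to the next, the script reports a geometric mean, which keeps a single very high reading from dominating the summary.

```python
import csv
import io
import math

# A made-up example of one testing site's data saved as CSV. The column names
# "date" and "enterococcus" are assumptions; match them to your downloaded file.
SAMPLE_CSV = """date,enterococcus
2016-06-01,10
2016-06-08,250
2016-06-15,40
2016-06-22,1000
"""


def summarize_site(csv_text, threshold):
    """Return (sample count, geometric mean, dates with counts above threshold)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumes every value is a positive number; entries like "<10" or 0
    # would need to be cleaned up before taking logarithms.
    values = [float(row["enterococcus"]) for row in rows]
    # Geometric mean: average the logarithms, then convert back
    geo_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    above = [row["date"] for row in rows if float(row["enterococcus"]) > threshold]
    return len(values), geo_mean, above


# To use a real file, read its text with open("your_file.csv").read() instead.
count, geo_mean, above = summarize_site(SAMPLE_CSV, threshold=100)  # threshold chosen arbitrarily
print(f"{count} samples, geometric mean {geo_mean:.0f}, above threshold on: {above}")
```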

Other useful open environmental datasets and/or data visualizers, with some tips for use:

  • Oasis map (city level). An interactive map that lets you turn features on and off, including open space, CSOs, community groups, etc. Check and uncheck the boxes to show and hide features. If there are two boxes next to a feature, the first box usually controls a symbol and the second a written label. This map draws from many different government and non-profit data sources.
  • NYC Open Data
    • When you do a search, you can filter the results using the left sidebar.
    • Try setting the category to “Environment” and browsing.
    • Try setting the type to “Maps” and browsing.
    • Note that there is both official data and community-generated data.
    • Try browsing the tag “water.”
  • NYC Office of Environmental Remediation “SPEED” map. This city map shows sites where petroleum and chemicals are stored, hazardous waste sites, spill locations, solid waste facilities, and more.
  • New York State Department of Environmental Conservation (DEC) databases and maps (state level)
  • Data.gov (federal level)
    • This site is the open data collection for the US federal government, and it also includes data from some city and state government agencies.
    • There is a lot of information here, and it can be hard to sift through, but it might be worth browsing.
  • Environmental Protection Agency (EPA) databases and maps (federal level)
    • The MyEnvironment page offers a snapshot of your local conditions, including air and water quality. The MyMaps section of the page is a good place to start investigating your steward-shed’s conditions.
    • Envirofacts is the EPA’s master database and map of places that are subject to environmental regulations or are of environmental interest. It includes some information from state databases in addition to federal databases.
      • On the Topic Searches page, you can also run separate searches to get specific information about places that produce air pollution, facilities that have permits to discharge wastewater, locations near you that store hazardous materials, etc.
      • (You might be wondering: why are manhole covers all over this map? Underground electrical and communications infrastructure, like manholes and vaults, can fill up with water and sediment polluted with oil, lead, and other contaminants from the streets and from electrical equipment. When electrical companies remove this sediment, they sometimes have to treat it as hazardous waste. This means the manholes end up falling under the “Resource Conservation and Recovery Act,” the law that creates the framework for managing hazardous and non-hazardous solid waste.)

For further reading on open data…

Posted 5/1/17 at 8:19 PM