APIs: How To Ask Robots For Data (And Be Nice About It!)
API: Application Programming Interface.
No matter how often I looked it up, that acronym would not make sense in my head. “Application” made some sense, although that term can mean a million things in tech. But “Programming Interface”? “Programming” is a verb. An action word, if you will. And my only association with the word “interface” is something with a screen and a clacky-clack keyboard in science fiction shows… or a horrifically dumb way to talk about client meetings.
My college degree is in Cultural Anthropology (not the Indiana Jones kind, the other kind, with the fun kind of writing). I was almost to the end of my Data Analytics boot camp and I needed to figure out how to use an API to get the data I needed for my final project. Those three little words, those vague, tech-y words, were the key to everything I needed.
Even now I get frustrated just looking at them. So, let’s make some sense of them, shall we?
Note: I am sure I will completely butcher several technical terms in this entry, and I am ok with that. I can’t translate tech-speak without using words that mean things to other people. Yes, I’ve Googled. No, it’s not enough. I’m doing my best here. Reader discretion is advised.
Word 1: APPLICATION
APIs are little robots that go fetch data for you (if you ask nicely). They use your request to pull just the information you want, wrap it up in a type of file that both you and the robot like to use, and drop it off. The place they deposit the data is a spot you make for it: your “application”, if you will.
A lot of the time, people using APIs will have already built a program that connects to the API robot and schedules regular data requests. I didn’t have anything like that. All I had was a Google Colab notebook (more on those later) and a sample string of code. My “application” was basically me. I’m the application. Please give me data?
Word 2: PROGRAMMING
Programming is a verb, not an adjective, and I will fight anyone about that. But the basic meaning comes across: you need to know how to talk to a computer in order to talk to this robot.
Requests generally need three main parts: the URL of the API’s data source, words used to filter the data you’re asking for (parameters), and a word indicating what format you want the data back in.
For example, we had to ask the GitHub API for a list of all their repositories with the word “tetris” in the name, written in Assembly language. GitHub asked for three parameters: the word to filter on, what to sort the results by, and whether the sort is ascending or descending.
q = "tetris+language:assembly"
sort = "stars"
order = "desc"
The API’s URL is https://api.github.com/search/repositories. The data needed to come back as “json”, which opens in the browser window as the list you asked for, wrapped in just enough code to keep it organized.
Altogether, this is the request: https://api.github.com/search/repositories?q=tetris+language:assembly&sort=stars&order=desc. If you put that in a browser, you’ll see a page with the json-style list on it. That’s your data! Assuming you really wanted to see tetris-related GitHub content, you are good to go!
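If you’d rather let Python do the asking instead of your browser, here’s a rough sketch of that same request using the requests library (which Colab already has). The fields I print at the end (total_count, items, full_name) are what GitHub’s search response uses as far as I can tell, so trust the documentation over me.

import requests

# The same request as the big URL above, just split into polite little pieces.
url = "https://api.github.com/search/repositories"
params = {
    "q": "tetris language:assembly",  # what to filter on
    "sort": "stars",                  # what to sort by
    "order": "desc",                  # biggest first
}

response = requests.get(url, params=params)
response.raise_for_status()           # complain loudly if the robot says no

data = response.json()                # the json blob, now a Python dictionary
print(data["total_count"])            # how many repositories matched
print(data["items"][0]["full_name"])  # the most-starred tetris-in-assembly repo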
Word 3: INTERFACE
This is a noun, yes, but not a very descriptive one. If you’re going by the “person, place, or thing” definition of a noun, it’s certainly a “thing”, but the screen and keyboard in my head are not necessary. The “interface” is the little robot made by your data source. It’s the API itself. And it’s happy to meet you and learn all about what kind of data you need.
That’s the basics! After a couple examples I had a pretty decent grasp on what an API does. I read through my final boot camp assignment, donned my explorer’s cap, and ventured out into the interwebs to find one myself.
There were… some false starts.
The assignment was to find some data out in the wild and pull it into a Google Colab notebook. These things are great; they’re basically like Google Docs for Python code. You can write some text, then make a line of code, run that code, and write more text underneath. It’s brilliant. Makes you feel powerful.
That’s a bit dangerous. I have never been accused of excessive humility. For my first attempt at using an API, I didn’t want some dull, normie dataset. I wanted a cool dataset. I wanted something bold, intense, challenging…
ATTEMPT #1
I went to NASA’s website to look up weather data gathered by rovers on Mars.
To find the API, I literally typed “API” in the search bar at the top of NASA’s site. I don’t know if there’s a better way, but this was pretty efficient. NASA obliged me with a page that let me browse all their APIs; it seems like you need a different API for different major data sources.
I clicked on the option for weather data from InSight, NASA’s Mars lander. I found the documentation for this particular API, which tells you how to use it and what parameters you might need to include. Everything was going well until:
“InSight Has Temporarily Suspended Temperature Measurements!”
This thing was severely missing data. I wasn’t going to get what I needed from it at all.
ATTEMPT #2
I moved on to the Mars Rover Photos API. This documentation was elegant and simple and showed really clear examples of requests I could make. The information turned out to be a little simpler than I thought, though. The API let me grab information about each day’s photos: when each was taken, how big it was, the camera used for it, which rover, the Mars sol/day it was taken, that kind of thing. But it didn’t have anything about what was in the photo itself.
It took me a long time to get my code right to pull anything into my Google Colab notebook. When I finally got the hang of that part, I realized I was just pulling in json files. I knew it was possible to turn that blob of text into clean, readable information, but it felt like one hurdle too many for the data I’d get in return.
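In case it helps anyone else staring at a json blob, here’s roughly what I mean by turning it into something readable, using pandas. Fair warning: the endpoint, the DEMO_KEY, and the column names below are from my memory of the Mars Rover Photos documentation, so treat this as a sketch rather than gospel.

import requests
import pandas as pd

# Ask for every photo Curiosity took on one Martian day (sol 1000).
url = "https://api.nasa.gov/mars-photos/api/v1/rovers/curiosity/photos"
params = {"sol": 1000, "api_key": "DEMO_KEY"}  # DEMO_KEY is fine for light poking around

response = requests.get(url, params=params)
response.raise_for_status()

photos = response.json()["photos"]    # a list of dictionaries, one per photo
df = pd.json_normalize(photos)        # flatten the nested json into tidy columns

print(df[["id", "earth_date", "camera.name", "rover.name"]].head())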
I checked the documentation again. Sols, rover names, names of each camera, the date of each picture from each camera… was that kind of data worth all this? What was I trying to find? What were the questions I was trying to answer?
Information about Mars is very cool. Data from Mars ROVERS? Even cooler. But this dataset did not hold vast secrets. All I could do with this information was count the number of pictures taken by a single rover, or by a single camera, or over a certain length of time. The questions I could answer with that information were things like, “How many rear-facing pictures did Opportunity take compared to Perseverance? How did the number of pictures taken change over the past Martian year?”
Don’t get me wrong, those aren’t bad questions. They just weren’t worth me putting in so much effort to try and parse all that json for this class.
It was time to abandon my dream of discovering new truths about the universe from NASA’s space robots. Instead I focused on a smaller, much more reasonable goal:
ATTEMPT #3
Solving climate change.
OK, that’s not quite what happened, but I did find my data!
The Environmental Protection Agency is eager to talk to data nerds. My search for “API” on their website led me to a vast network of web services, including search engines, interactive maps, case files, and a million acronym-rich data dictionaries. It was like meeting someone who says, “I like Star Trek”, and I’m like “me too!”, and then they start talking and you realize oh, ohhhh they’re THAT kind of fan. And it’s ok! Cuz you may not know any swear words in Klingon, but at least you both respect the superiority of The Next Generation above all the spinoff series!
Hey. It’s my blog, I get to say what I want.
I had to dig through a lot of Klingon to figure out how to get the data I wanted. The API documentation said I needed to specify a table name and some search parameters to get my data, so I looked at source after source to figure out what data lived in what tables.
I found part of the site devoted to Compliance. This was juicy. Here, I could look up a company and see what kinds of issues they’d had when they didn’t meet the environmental standards of the EPA.
THE DATA AT THE END OF THE TUNNEL
Finally, I had everything I needed: the source URL, the parameters, and the ability to pull data into a CSV file instead of json so I could throw it into a gosh-dang Excel workbook like I wanted to in the first place.
I pulled two tables using these requests:
1) “Activities” (cases) in Virginia: https://enviro.epa.gov/enviro/efservice/icis_activity_rpt/state_code/VA/CSV
2) Facilities located in Virginia: https://enviro.epa.gov/enviro/efservice/icis_facility_interest/state_code/VA/CSV
I dropped the CSVs in my Google Drive, connected them to my Notebook, and pulled in my data. Then I got to work.
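(For the record, you don’t strictly need the Google Drive detour. Assuming the EPA’s servers are feeling cooperative and those URLs still return CSVs, pandas will happily read them straight off the internet; here’s a sketch of that shortcut.)

import pandas as pd

# Hand the request URLs straight to pandas instead of saving files first.
activities_url = "https://enviro.epa.gov/enviro/efservice/icis_activity_rpt/state_code/VA/CSV"
facilities_url = "https://enviro.epa.gov/enviro/efservice/icis_facility_interest/state_code/VA/CSV"

activities = pd.read_csv(activities_url)
facilities = pd.read_csv(facilities_url)

print(activities.shape)      # how many rows and columns actually came back
print(facilities.columns)    # what the EPA decided to name things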
And that’s my adventure with APIs! If you’d like to see what happened to all that data, check out my Notebook here. APIs are very cool and very powerful, but they are WILD. If you don’t understand them, don’t worry; every single one is a weird little robot snowflake. Take your time, look through the documentation, and if it’s not working, if the robot is old or mad or extremely pretentious, move on!
There’s so much data out there. Have fun :)