When I built the Nature section of Paperless Ponte Vedra, I looked to GBIF for a list of reported nature sightings. As of this writing, the GBIF aggregator had logged 3,117,737,422 individual reports of nature sightings around the globe. Is that billions? Thankfully, this number was narrowed down to just 7,746,724 unique species. Is that millions? Can that be true? Noah’s crew is evidently expanding along with the universe.
Using GBIF’s polygon filter, I narrowed the 7,746,724 global species down to what I considered a representative nature zone for Ponte Vedra. That left me with a mere 2,427 species to introduce you to!
I’m sorry. I’m just not that into it. I needed something manageable. I needed to tame it.
Excel is so named because it is excellent at taming big data.
Let me rephrase that. Databases are excellent at taming big data. Excel gives mere mortals an entry-level tool to get giant numbers under our control. That hint of control feels Excel-lent.
Data travels by CSV
Big data is usually handed over in the age-old form of CSV. If you have ever seen a file named somthing.csv, you have met CSV. I first met CSV in 1993 when I had to transmit payroll data by dial-up modem. Every two weeks, I plugged a regular old phone cord into my computer. It could be one of those springy spiral cords – it wouldn’t matter. My computer still made the call from Jacksonville, Florida to my destination in Jefferson City, Missouri. In its binary whisper, my Gateway computer recited every single thing you’d see on a paycheck statement for 700 employees in about 30 minutes. Compared to printing and FedEx, it felt like my payroll had traveled by the speed of light.
CSV is an extremely long single-file line of characters. Some data files could reach the moon if you typed them out in their line, but they’re generally looking for a table here on earth. Each character is looking for its seat at the table. Most of the characters have to share a seat with their neighbors; we’ll call it a bench seat.
Commas come along as Separators. They tell the characters when to start the next bench. Warning: the Separator is not always a comma. Sometimes it’s a tab, like the one pictured above, or simply a fixed width. But it’s like Kleenex – we still call it Comma Separated Values (CSV) even if we faked the comma. Isn’t it a marvel to see how well humans have adapted to obviously false language? I wonder if animals do that.
Opening a data file in Excel
Once you have your CSV file safe and sound in your computer, Excel will be eyeing it hungrily. In fact, Excel will probably slap its logo on your CSV file. It’s trying to trick you into double-clicking it. Go ahead. Try it. Here’s what happens when I double-click my species file from GBIF:

See that spacey white box near the top that looks like “taxonKey” and “kingdom” are doing a square dance with a bunch of other weird terms? Those are column headings that Excel didn’t recognize. Excel put ALL of the column headings in the first cell of the spreadsheet (A1). Lucky for them, they can waltz right over cells B1 through ZZZ1 because those are completely empty.
As for the data below, Excel DID get the first two columns correct. Maybe. If any of those numbers had leading zeros, they’re gone; in math land, leading zeroes are insignificant. If any of them were dates, they got calculated as if the slash marks were divide symbols. And then of course, all the rest of the fields are square dancing in column C. Are you getting the picture? If you ever want a sneak peek at a data file, go ahead and double-click it. Or even click File, Open from within Excel. You’ll get the same confused character soup either way. Open – peek – close.
No, you cannot just open a data file. Data must be invited in with an agreement on ground rules. Excel needs to know its dealing with data which has been packaged some kinda way, and don’t go messing with zeros or date slashes.
Importing data into Excel
If there’s anything Excel loves more than numbers, it’s data. Excel has an entire menu dedicated to data. See it up there, midpoint of the menus? Now I will extend a proper Excel welcome to my species list. From a new Excel workbook, I click Data, Get Data, From Text/CSV. There’s my GBIF CSV file. I select it, then click Import. Excel gives it the ol’ wine tasting swirl, then offers me its best guess at structure:

Oh, look! After I told Excel we’re dealing with DATA, Excel figured out that the Values are Separated by tabs, not Commas! So that’s what those gaps were between the square dancers. I don’t have to correct a thing now. Load ’em on in, Sam.

Look how excited Excel gets when you mention the word DATA. It decorated the rows. It put filter and sort buttons on each column. It put a little sign on the side to brag that it counted all 2,427 rows.
Species. 2,427 species. That is two thousand, four hundred and twenty-seven plants and birds and fish and critters with some kind of scientific Latin names I must decipher.
It’s too much. Nine times out of ten, if data earned the right to be delivered in a CSV file, that is a clear sign it’s too much for our wittle bwains to compwehend. It must be tamed.
But first, I will save my loaded data as an Excel file so we don’t lose that nice, clean structure.
Sorting data in Excel
Is it possible your data could be trimmed of excess? I had to concede that 2,427 nature species in Ponte Vedra felt excessive for my Paperless Ponte Vedra project. I’ll leave that completeness for iNaturalist and GBIF. I chose to keep only the most frequently reported species. Column E of my GBIF data is titled numberOfOcurrences. I sorted the list by that column with the highest numbers on top. Can you guess which species was most reported in my Ponte Vedra nature frame?
Click here to check your answer
Pandion haliaetus
Click here to translate your answer
Osprey
Click here to celebrate your answer
SWOOP!
Deleting data in Excel
I decided to discard any species that had been reported less than 10 times. I scrolled down my list until the number of sightings fell below 10. I clicked on the row header for the first species that had been reported less than 10 times. Then I scrolled to the end, held down the Shift key, and clicked the last species on the list. Holding the Shift key selects everything from the first click to the next. Then I pressed that scary and exciting button, DELETE.
Hey. The data was gone but it left empty blue-striped rows. I tried again. I highlighted the rows to go. This time, I used Excel’s own DELETE tool in the Home menu. There. Gone. Beautiful. And not scary–mainly because I knew I could go get the list again if I destroyed good data. Thank you, GBIF!
What was even more beautiful was the bearable size of my list after deleting the rarities. Can you guess how many unique species were reported in my Ponte Vedra frame at least 10 times?
Click here to check your answer
527
Now, that is a reasonable quantity of nature beings I can introduce you to. My big data is sufficiently tamed. Thank you, Excel.