Five public datasets, and a lot of suggestions for exploring them


The world is full of interesting datasets. But even though data is increasingly accessible, it’s sometimes hard to come up with an interesting problem to analyze. Maybe there are too many possible questions, maybe it’s a hassle to set up analytical tools, or maybe it’s just too easy to get distracted by animal GIFs.

Whatever the case, we want to make it easier to start working on interesting problems right away. Here are five datasets, already loaded into Mode’s public database, that you can query, analyze, and visualize right now.

For each dataset, I’ve included a link to the table in Mode’s public data warehouse. If you’re feeling lazy and just want to use a small amount of data (as in, one row), I found the best single row of data from each dataset. And if you’re feeling ambitious, and want to get famous on the internet or explain some things, I added ideas for turning these datasets into maps.

FEC Campaign Finance Data

The Federal Election Commission requires candidates to make their campaign expenses public. This dataset includes over 200,000 campaign expenses from the 2012 U.S. presidential campaign, and it’s full of fascinating findings. Like Herman Cain’s $150,000 expense on Herman Cain. And the $5,000 (the most of any candidate, by far) that Mitt Romney spent at liquor stores. And Ron Paul’s and Romney’s fondness for fast food (and Obama’s apparent preference for Subway).

Herman Cain be like: treat yo self

  • What kinds of questions can you ask? What do candidates spend the most on? Do some candidates build up a spending lead early, while others save for the end? How do spending patterns differ for the incumbent compared with the challengers?
  • What’s the table called? cooldata.fec_2012_presidential_campaign_expenses
  • What’s the best row? wut.
  • Can I use this data to make a map? Absolutely. You could see how money was spent in each state. Or you could check whether different candidates focused on different regions. Or, if you’re feeling particularly ambitious, you could map candidate expenses by day to sketch out how they traveled across the country during their campaigns.
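To make the first question concrete, here is a minimal sketch of a query that finds each candidate’s biggest spending category. It assumes hypothetical columns named `candidate`, `category`, and `amount`; the real table’s column names may differ, so check its schema in Mode before adapting it. To stay self-contained, the sketch runs against an in-memory SQLite table called `expenses` rather than cooldata.fec_2012_presidential_campaign_expenses, and the demo rows are toy values, not real FEC figures.

```python
import sqlite3

# Hypothetical schema: candidate TEXT, category TEXT, amount REAL.
# The real FEC table in Mode may use different column names.
SPEND_BY_CATEGORY_SQL = """
SELECT candidate, category, SUM(amount) AS total
FROM expenses
GROUP BY candidate, category
ORDER BY candidate, total DESC
"""


def top_category_per_candidate(conn: sqlite3.Connection) -> dict[str, tuple[str, float]]:
    """Map each candidate to (category, total) for the category they spent the most on."""
    top: dict[str, tuple[str, float]] = {}
    for candidate, category, total in conn.execute(SPEND_BY_CATEGORY_SQL):
        # Rows are sorted by total (descending) within each candidate,
        # so the first row seen for a candidate is its biggest category.
        top.setdefault(candidate, (category, total))
    return top


if __name__ == "__main__":
    # Toy rows for illustration only; these are not real FEC figures.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expenses (candidate TEXT, category TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO expenses VALUES (?, ?, ?)",
        [("A", "travel", 100.0), ("A", "food", 40.0), ("A", "travel", 20.0),
         ("B", "ads", 500.0), ("B", "food", 10.0)],
    )
    print(top_category_per_candidate(conn))  # prints {'A': ('travel', 120.0), 'B': ('ads', 500.0)}
```

Doing the grouping and sorting in SQL keeps the Python side to a single pass; the same `GROUP BY` works as-is in most SQL dialects once the table and column names are swapped in.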


Crunchbase

Crunchbase is quickly becoming the dataset of record for the startup and venture capital communities. It can show everything from which industries are hot (biotech) to the possible effects of founder experience or age. The dataset includes funding, investment, and acquisition data on more than 40,000 companies.

  • What kinds of questions can you ask? Are there characteristics of companies (industry, location, etc.) that differ by VC? Do some VCs tend to invest together, while others rarely do? Are companies raising more money earlier? Are We In A BUBBLE??
  • What are the tables called? crunchbase.acquisitions, crunchbase.companies, crunchbase.models
  • What’s the best row? This one, which approaches the theoretical limit of how good a row of data can be.
  • Can I use this data to make a map? Yes! Like this rather uninformative one, showing the number of startups by the county where they’re headquartered.
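The question of whether some VCs tend to invest together comes down to counting, for each pair of investors, how many companies they have both backed. A minimal sketch, assuming you have already pulled (investor, company) pairs out of the Crunchbase investment data; that input shape and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable
from itertools import combinations


def co_investment_counts(investments: Iterable[tuple[str, str]]) -> Counter:
    """Count shared portfolio companies for every pair of investors.

    `investments` is an iterable of (investor, company) pairs (hypothetical shape).
    Keys of the result are alphabetically ordered (investor_a, investor_b) tuples.
    """
    investors_by_company: dict[str, set[str]] = defaultdict(set)
    for investor, company in investments:
        # A set ignores repeat rounds by the same investor in the same company.
        investors_by_company[company].add(investor)

    pairs: Counter = Counter()
    for investors in investors_by_company.values():
        # Sorting makes ("X", "Y") and ("Y", "X") the same key.
        for pair in combinations(sorted(investors), 2):
            pairs[pair] += 1
    return pairs
```

Calling `co_investment_counts(rows).most_common(10)` would list the ten investor pairs that co-invest most often; pairs that never appear together simply have no entry.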

UFO Sightings

Quandl, which provides countless free datasets on a huge range of subjects, added data on UFO sightings to Mode. The data includes the number of reported sightings by month. Quandl gets the data from the National UFO Reporting Center (and in case you need to report a sighting, there’s a hotline).

  • What kinds of questions can you ask? Are some months more popular for sightings? What correlates with UFO sightings?
  • What’s the table called? thomas.ufo_sightings
  • What’s the best row? The first one, a sighting from June 1400. The first sighting of the Black Knight?
  • Can I use this data to make a map? No. But you could probably mix it with some Independence Day GIFs and make a killer listicle.
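Because the table records reported sightings by month, asking whether some months are more popular amounts to averaging each calendar month across all years. A minimal sketch, assuming rows come back as (date, sighting count) pairs with one row per month; that shape is an assumption, not the table’s documented schema.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date


def average_sightings_by_calendar_month(rows: Iterable[tuple[date, float]]) -> dict[int, float]:
    """Average the monthly sighting counts for each calendar month (1 = January).

    `rows` is an iterable of (date, sightings) pairs, one per month (assumed shape).
    """
    totals: dict[int, float] = defaultdict(float)
    months_seen: dict[int, int] = defaultdict(int)
    for day, sightings in rows:
        totals[day.month] += sightings
        months_seen[day.month] += 1
    # Sort by month number so January comes first.
    return {month: totals[month] / months_seen[month] for month in sorted(totals)}
```

Averaging rather than summing keeps a month from looking popular just because the data happens to cover more years of it; `max(result, key=result.get)` then picks out the busiest month.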


FiveThirtyEight

FiveThirtyEight, Nate Silver’s data journalism site, produces lots of great analysis. For many articles, they publish the underlying data on GitHub. If you want to explore their data or build on their analyses, we’ve uploaded many of their datasets. A few topics include classic rock radio plays, the ages of Congressional representatives, World Cup predictions, and surveys about defining U.S. geographic regions and international cuisine preferences.

  • What kinds of questions can you ask? From the cuisine survey, do people from different parts of the country prefer different foods? Can we predict which foods someone would like based on their other preferences? From the Congress age data, it might be interesting to see whether people from different states tend to elect representatives of different ages, and whether those ages are related to the ages of their constituents. And from the classic rock data, which classic rock songs should we be most tired of by now? Which radio stations have the laziest DJs?
  • What are the tables called? cooldata.fivethirtyeight_region_survey, cooldata.fivethirtyeight_congress_age, cooldata.fivethirtyeight_world_cup_predictions, cooldata.fivethrityeight_classic_rock_plays, cooldata.fivethirtyeight_classic_rock_songs, fivethirtyeight_food_world_cup
  • What’s the best row? Too soon?
  • Can I use this data to make a map? Yes! I made a map to explore how people from different states defined the South and the Midwest. You could map cuisine preferences by region, or show how the ages of each state’s Congressional representatives have changed over time.
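A map of how people define the Midwest boils down to one number per state: the share of respondents who say their state belongs to that region. A minimal sketch, assuming the region survey has already been reduced to hypothetical (state, answered_yes) pairs for a question like "Is your state part of the Midwest?"; the actual survey columns may be shaped differently.

```python
from collections import Counter
from collections.abc import Iterable


def share_yes_by_state(responses: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of respondents in each state who answered yes.

    `responses` is an iterable of (state, answered_yes) pairs (hypothetical shape).
    """
    yes: Counter = Counter()
    total: Counter = Counter()
    for state, answered_yes in responses:
        total[state] += 1
        if answered_yes:
            yes[state] += 1
    # Counter returns 0 for states with no "yes" answers, so they map to 0.0.
    return {state: yes[state] / total[state] for state in total}
```

Each resulting fraction can drive the fill color of its state on a choropleth map; states with very few respondents are worth flagging, since their shares are noisy.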

Holidays around the globe

This dataset includes a list of all the holidays in the world for the coming year. While this data is useful for analysis, it may be even more valuable for figuring out which parts of the world (and which of your customers) are on holiday.

  • What kinds of questions can you ask? Which countries have the most holidays? Which months and days have the most holidays? Which countries share lots of holidays, and which share only a few?
  • What’s the table called? reference_lookups.holidays_by_country
  • What’s the best row? The liberty row.
  • Can I use this data to make a map? Yes! You could show the average number of holidays per year by country, or who’s on holiday on a given day.
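The questions about which countries have the most holidays and which ones share them reduce to set operations on each country’s holiday dates. A minimal sketch, assuming rows of (country, date) pairs; here two countries "share" a holiday when both observe one on the same date, which is a simplifying assumption (matching on holiday names would be another reasonable choice). The function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date


def holiday_dates_by_country(rows: Iterable[tuple[str, date]]) -> dict[str, set[date]]:
    """Group holiday dates by country; rows are (country, date) pairs (assumed shape)."""
    by_country: dict[str, set[date]] = defaultdict(set)
    for country, day in rows:
        by_country[country].add(day)
    return dict(by_country)


def country_with_most_holidays(by_country: dict[str, set[date]]) -> str:
    """Country with holidays on the most distinct dates (ties go to the first encountered)."""
    return max(by_country, key=lambda country: len(by_country[country]))


def shared_holiday_dates(by_country: dict[str, set[date]], a: str, b: str) -> set[date]:
    """Dates on which both countries observe a holiday; unknown countries share nothing."""
    return by_country.get(a, set()) & by_country.get(b, set())
```

Counting distinct dates rather than rows avoids double-counting a day that carries two holiday entries in the same country.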

Suggestions for More?

Inspired to do something fun with these datasets? Send us a link to your project on Twitter or Facebook, and we’ll share the best work! And if you want to make a map, we’ll soon be publishing a quick tutorial on how to make one, but feel free to email us now if you have any questions.
