Got Technology, Need Data? Here Are the Best Free Data Sources for Analytics & AI

Photo by Cristina Gottardi on Unsplash

The Data Challenge

Have you ever wanted to assess or explore a new BI, analytics, data visualization, or data science technology, but struggled to find a data set to use? Have you grown tired of analyzing the “typical” free data sources–think census, stock market, employment, and inflation rate data–that everyone knows are publicly available? Are you seeking data relevant to a particular concept, but aren’t sure how to collect it? Do you want to generate world-changing insights on a fresh, rarely analyzed concept?
In today’s data-driven world, obsessed with data science and analytics, obtaining robust data sets for each of these use cases is fortunately not the monumental task it was just a few years ago. From static historical government data to real-time social media streams to private company data released for public consumption, the internet is now your data oyster.

Start Here

Below, I’ve compiled a list of great resources providing free, robust data sets for your consumption, as well as hints for finding other data sets that might be of interest to you. While many of these represent traditional datasets, there are lots of hidden gems.

  • World Bank Open Data – Provides free and open access to global development data
  • DATA.GOV – “The home of the U.S. Government’s open data”
  • Canada Open Data – wondering how many immigration applications Canada receives after each U.S. Presidential election? There’s a dataset for that.
  • The CIA World Factbook – download from the archives
  • State/Local Data Portals – many state and local level governments across the U.S. have data portals where you can download datasets related to health, agriculture, public safety, resident requests, building permits, recreation, etc. specific to a given geographic area. To find out whether your area of interest has a data portal, run a simple web search for “<Area> Data Portal”. For example, a search for “Houston Data Portal” returns just that, the Houston Data Portal. I’ve listed some other state and big city data portals below.
  • Kaggle – Kaggle is a go-to platform for free, high-quality datasets that you can use.
  • Wikipedia Database Download – “Wikipedia offers free copies of all available content to interested users.” Want to get into text analytics? This is an immensely comprehensive data set that could keep you analyzing for years. You’ll also need an immense amount of storage, however, if you want to take it offline.
  • Public Datasets on AWS – AWS offers free public datasets, covering fields like genomics, satellite imagery, and weather. 
  • Common Crawl – An open repository of web crawl data collected over 18 years. They make “wholesale extraction, transformation and analysis of open web data accessible to researchers.”
  • Nasdaq Data Link – Provides millions of financial and economic time-series datasets.
  • KONECT – The KONECT project has 1,326 network datasets in 24 categories.
  • SNAP, Stanford Large Network Dataset Collection – Provides datasets covering social networks, online reviews, product links and commonly co-purchased products, and more.
  • Million Song Dataset – “A freely-available collection of audio features and metadata for a million contemporary popular music tracks.” The site also provides complementary datasets around concepts like genre, cover songs, lyrics, etc. that have been contributed by the community.
  • aiHitdata – “aiHitdata is a massive, artificial intelligence/machine learning, automated system that has been trained to build and update company information from the web.”
  • Best Buy APIs – “Our API suite allows you to query Products, Stores and much more. Come on in to explore our data, browse descriptions of the available attributes and see examples of working requests and responses.”
  • Walmart Stores Sales Data – Provides historical sales data for 45 Walmart stores located in different physical regions.
  • Yelp Open Dataset – “The Yelp dataset is a subset of our businesses, reviews, and user data for use in connection with academic research. Available as JSON files, use it to teach students about databases, to learn NLP, or for sample production data while you learn how to make mobile apps.”
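
Several of the sources above ship as JSON files. As a first exploration step, here is a minimal Python sketch that summarizes such a file. It assumes the file stores one JSON object per line; check the format of the file you actually download. The sample records and the field names `city` and `stars` are made-up placeholders for illustration, not taken from any dataset's documentation.

```python
import io
import json
from collections import defaultdict


def mean_by_key(lines, group_key, value_key):
    """Average value_key per group_key over an iterable of JSON text lines."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if group_key not in record or value_key not in record:
            continue  # skip records missing either field
        bucket = totals[record[group_key]]
        bucket[0] += float(record[value_key])
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical in-memory sample standing in for a downloaded file.
SAMPLE = io.StringIO(
    '{"name": "Cafe A", "city": "Houston", "stars": 4.0}\n'
    '{"name": "Cafe B", "city": "Houston", "stars": 3.0}\n'
    '{"name": "Cafe C", "city": "Austin", "stars": 5.0}\n'
)

averages = mean_by_key(SAMPLE, "city", "stars")
print(averages)
```

To run this against a real file, pass an open file handle in place of `SAMPLE`. Reading line by line keeps memory use low, which matters for the larger downloads on this list.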

There are also many lists of publicly available datasets compiled by others that you can use as a springboard to finding the unique data set you are seeking.  I’ve listed several below, but a Google search will produce even more results if you still haven’t found quite what you’re looking for.

Go Anywhere

With such a breadth of free data sources publicly available, the only limits on assessment and exploration in BI, analytics, data visualization, and data science are the time and attention a person dedicates to the actual technologies.
So pick a dataset, or two or three, and start playing. Use the datasets individually or mash them up; they’re free for you to do just that. And while each differs greatly in content, the spirit in which they have been made available is resoundingly consistent–to further data-driven technology, knowledge, insights, and decision making. With such diversity in the data available and in the approaches to exploring it, the worlds of BI, analytics, data visualization, and data science only become more exciting and intriguing with each newly available dataset.
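
A mash-up usually comes down to joining two datasets on a column they share. Here is a minimal sketch in Python using only the standard library. The two CSV snippets, their column names (`region`, `permits`, `requests`), and their values are invented for illustration; real downloads will have their own schemas and often need key cleanup before they line up.

```python
import csv
import io


def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared key column."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # merge the matching rows
    return joined


# Hypothetical extracts from two different portals (made-up values).
PERMITS_CSV = "region,permits\nNorth,120\nSouth,80\nEast,45\n"
REQUESTS_CSV = "region,requests\nNorth,300\nSouth,150\nWest,90\n"

permits = list(csv.DictReader(io.StringIO(PERMITS_CSV)))
requests = list(csv.DictReader(io.StringIO(REQUESTS_CSV)))

combined = join_on(permits, requests, "region")
for row in combined:
    print(row)
```

Only regions present in both inputs survive an inner join, so "East" and "West" drop out here. For larger files, a dataframe library or a database would likely be a better fit than hand-rolled dictionaries.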

For more data tips and tricks, check out our blogs or browse the RPA blogs on Medium.

