6.3. Get Data#
This section covers tools for getting data for your projects.
6.3.1. faker: Create Fake Data in One Line of Code#
!pip install Faker
To quickly create fake data for testing, use faker.
from faker import Faker
fake = Faker()
fake.color_name()
'CornflowerBlue'
fake.name()
'Michael Scott'
fake.address()
'881 Patricia Crossing\nSouth Jeremy, AR 06087'
fake.date_of_birth(minimum_age=22)
datetime.date(1927, 11, 5)
fake.city()
'North Donald'
fake.job()
'Teacher, secondary school'
6.3.2. Silly: Produce Silly Test Data#
!pip install silly
If you want to produce some silly test data, try the silly library.
import silly
name = silly.name()
email = silly.email()
print(f"Her name is {name}. Her email is {email}")
Her name is olivia ringslap. Her email is boatbench@thirty-three-mighty-horses.link
silly.a_thing()
'five cherry onions'
silly.thing()
'container of khaki wads'
silly.things()
'a tote of plans, twenty-four eyes, and eighteen garlic arms'
silly.sentence()
"God himself can't wait to move a hat in Birmingpoop."
silly.paragraph()
"Agustin Neutral-Jerk needs a group of concerns, badly. Agnes Basil can't wait to taste a box of slate chairs in Assesford. To get to Testasia, you need to go to Fantasticheartsound, then drive east. The band 'Queen' will hurl a dance. To get to Arztotzka, you need to go to Cape City Central, then drive north. The world will head to Integrated Eye And Onion to buy a tub of boots. Assemble jean! Seth Violetbag will assemble a laudable ring. Lampton is in South Yemen. Birmingobject is in West Cybertron."
6.3.3. Random User: Generate Random User Data in One Line of Code#
Have you ever wanted to create fake user data for testing? Random User Generator is a free API that generates random user data. Below is how to download and use this data in your code.
import json
from urllib.request import urlopen
# Show 2 random users
data = urlopen("https://randomuser.me/api?results=2").read()
users = json.loads(data)["results"]
users
[{'gender': 'female',
'name': {'title': 'Miss', 'first': 'Ava', 'last': 'Hansen'},
'location': {'street': {'number': 3526, 'name': 'George Street'},
'city': 'Worcester',
'state': 'Merseyside',
'country': 'United Kingdom',
'postcode': 'K7Z 3WB',
'coordinates': {'latitude': '11.9627', 'longitude': '17.6871'},
'timezone': {'offset': '+9:00',
'description': 'Tokyo, Seoul, Osaka, Sapporo, Yakutsk'}},
'email': 'ava.hansen@example.com',
'login': {'uuid': '253e53f9-9553-4345-9047-fb18aec51cfe',
'username': 'heavywolf743',
'password': 'cristina',
'salt': 'xwnpqwtd',
'md5': '2b5037da7d78258f167d5a3f8dc24edb',
'sha1': 'fabbede0577b3fed686afd319d5ab794f1b35b02',
'sha256': 'd42e2061f9c283c4548af6c617727215c79ecafc74b9f3a294e6cf09afc5906f'},
'dob': {'date': '1948-01-21T10:26:00.053Z', 'age': 73},
'registered': {'date': '2011-11-19T03:28:46.830Z', 'age': 10},
'phone': '015242 07811',
'cell': '0700-326-155',
'id': {'name': 'NINO', 'value': 'HT 97 25 71 Y'},
'picture': {'large': 'https://randomuser.me/api/portraits/women/60.jpg',
'medium': 'https://randomuser.me/api/portraits/med/women/60.jpg',
'thumbnail': 'https://randomuser.me/api/portraits/thumb/women/60.jpg'},
'nat': 'GB'},
{'gender': 'male',
'name': {'title': 'Mr', 'first': 'Aubin', 'last': 'Martin'},
'location': {'street': {'number': 8496, 'name': "Rue du Bât-D'Argent"},
'city': 'Strasbourg',
'state': 'Meurthe-et-Moselle',
'country': 'France',
'postcode': 83374,
'coordinates': {'latitude': '-1.3192', 'longitude': '24.0062'},
'timezone': {'offset': '+10:00',
'description': 'Eastern Australia, Guam, Vladivostok'}},
'email': 'aubin.martin@example.com',
'login': {'uuid': '54b9bfa9-5e86-4335-8ae3-164d85df98e7',
'username': 'heavyladybug837',
'password': 'kendra',
'salt': 'LcEMyR5s',
'md5': '2fbd9e05d992eb74f7afcccec02581fc',
'sha1': '530a1bc71a986415176606ea377961d2ce381e5d',
'sha256': 'f5ee7bc47f5615e89f1729dcb49632c6b76a90ba50eb42d782e2790398ebc539'},
'dob': {'date': '1949-04-12T05:01:31.463Z', 'age': 72},
'registered': {'date': '2006-05-28T03:54:36.433Z', 'age': 15},
'phone': '01-88-32-00-30',
'cell': '06-09-79-55-81',
'id': {'name': 'INSEE', 'value': '1NNaN48231023 75'},
'picture': {'large': 'https://randomuser.me/api/portraits/men/65.jpg',
'medium': 'https://randomuser.me/api/portraits/med/men/65.jpg',
'thumbnail': 'https://randomuser.me/api/portraits/thumb/men/65.jpg'},
'nat': 'FR'}]
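The nested records above can be awkward to load into a table. Below is a minimal sketch of flattening one record into dot-separated keys using only the standard library. The helper name `flatten` and the trimmed `sample_user` (copied from the output above) are illustrative and not part of the Random User API.

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse nested dictionaries into a single level with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Trimmed copy of the first user shown above (illustrative subset of fields)
sample_user = {
    "gender": "female",
    "name": {"title": "Miss", "first": "Ava", "last": "Hansen"},
    "location": {
        "street": {"number": 3526, "name": "George Street"},
        "city": "Worcester",
    },
    "email": "ava.hansen@example.com",
}

flat_user = flatten(sample_user)
print(flat_user["name.first"], flat_user["location.street.name"])
```

Applying `flatten` to every element of `users` should give a list of flat dictionaries that is easier to turn into a table.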
6.3.4. fetch_openml: Get OpenML's Dataset in One Line of Code#
OpenML has many interesting datasets. The easiest way to get OpenML's data in Python is to use the sklearn.datasets.fetch_openml function.
In one line of code, you get an OpenML dataset to play with!
from sklearn.datasets import fetch_openml
monk = fetch_openml(name="monks-problems-2", as_frame=True)
print(monk["data"].head(10))
attr1 attr2 attr3 attr4 attr5 attr6
0 1 1 1 1 2 2
1 1 1 1 1 4 1
2 1 1 1 2 1 1
3 1 1 1 2 1 2
4 1 1 1 2 2 1
5 1 1 1 2 3 1
6 1 1 1 2 4 1
7 1 1 1 3 2 1
8 1 1 1 3 4 1
9 1 1 2 1 1 1
6.3.5. Autoscraper: Automate Web Scraping in Python#
!pip install autoscraper
If you want to automatically scrape a website in a few lines of Python code, try autoscraper.
With autoscraper, you can extract elements with certain patterns by simply providing the text with that pattern.
For a more precise selection of elements to extract, use Beautiful Soup.
from autoscraper import AutoScraper
url = "https://stackoverflow.com/questions/2081586/web-scraping-with-python"
wanted_list = ["How to check version of python modules?"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
for res in result:
    print(res)
How to execute a program or call a system command?
What are metaclasses in Python?
Does Python have a ternary conditional operator?
Convert bytes to a string
Does Python have a string 'contains' substring method?
How to check version of python modules?
6.3.6. pandas-datareader: Extract Data from Various Internet Sources Directly into a Pandas DataFrame#
!pip install pandas-datareader
Have you ever wanted to extract series data from various Internet sources directly into a pandas DataFrame? That is when pandas-datareader comes in handy.
The snippet below extracts daily data for "AD" from the av-daily source, covering January 2008 to February 2018.
import os
from datetime import datetime
import pandas_datareader.data as web
df = web.DataReader(
    "AD",
    "av-daily",
    start=datetime(2008, 1, 1),
    end=datetime(2018, 2, 28),
    api_key=os.getenv("ALPHAVANTAGE_API_KEY"),
)
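`os.getenv` returns `None` when the variable is not set. Below is a small sketch of checking for the key up front so that the error message is explicit. The helper name `require_env` is hypothetical, and the demo value is a placeholder rather than a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Placeholder so the demo runs; in practice, export the real key in your shell
os.environ.setdefault("ALPHAVANTAGE_API_KEY", "demo-placeholder")
api_key = require_env("ALPHAVANTAGE_API_KEY")
```

The resulting `api_key` can then be passed as the `api_key` argument in place of the direct `os.getenv` call.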
6.3.7. pytrends: Get the Trend of a Keyword on Google Search Over Time#
!pip install pytrends
If you want to get the trend of a keyword on Google Search over time, try pytrends.
In the code below, I use pytrends to get the interest in the keyword “data science” on Google Search from 2016 to 2021.
from pytrends.request import TrendReq
pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(kw_list=["data science"])
df = pytrends.interest_over_time()
df["data science"].plot(figsize=(20, 7))
<AxesSubplot:xlabel='date'>
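The call above does not set a time range, so pytrends uses its default window (the last five years, as far as I know). To request a fixed range, `build_payload` accepts a `timeframe` string of the form `'YYYY-MM-DD YYYY-MM-DD'`. Below is a minimal sketch of building that string from dates. The helper name `make_timeframe` is hypothetical, and the commented-out call assumes the `pytrends` object created above.

```python
from datetime import date


def make_timeframe(start: date, end: date) -> str:
    """Format two dates as a 'YYYY-MM-DD YYYY-MM-DD' timeframe string."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()} {end.isoformat()}"


timeframe = make_timeframe(date(2016, 1, 1), date(2021, 12, 31))
print(timeframe)  # 2016-01-01 2021-12-31

# Assumed usage with the pytrends object from the example above:
# pytrends.build_payload(kw_list=["data science"], timeframe=timeframe)
```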

6.3.9. Datacommons: Get Statistics about a Location in One Line of Code#
!pip install datacommons
If you want to get some interesting statistics about a location in one line of code, try Datacommons. Datacommons provides publicly available data from open sources (census.gov, cdc.gov, data.gov, etc.). Below are some statistics extracted from Datacommons.
import datacommons_pandas
import plotly.express as px
import pandas as pd
6.3.9.1. Find the Median Income in California Over Time#
median_income = datacommons_pandas.build_time_series("geoId/06", "Median_Income_Person")
median_income.index = pd.to_datetime(median_income.index)
median_income.plot(
    figsize=(20, 10),
    xlabel="Year",
    ylabel="Income",
    title="Median Income in California Over Time",
)
<AxesSubplot:title={'center':'Median Income in California Over Time'}, xlabel='Year', ylabel='Income'>

6.3.9.2. Number of People in the U.S Over Time#
def process_ts(statistics: str):
    count_person = datacommons_pandas.build_time_series('country/USA', statistics)
    count_person.index = pd.to_datetime(count_person.index)
    count_person.name = statistics
    return count_person
count_person_male = process_ts('Count_Person_Male')
count_person_female = process_ts('Count_Person_Female')
count_person = pd.concat([count_person_female, count_person_male], axis=1)
count_person.plot(
figsize=(20, 10),
title="Number of People in the U.S Over Time",
)
<AxesSubplot:title={'center':'Number of People in the U.S Over Time'}>

6.3.9.3. Number of Robberies in the US Over Time#
count_robbery = datacommons_pandas.build_time_series(
"country/USA", "Count_CriminalActivities_Robbery"
)
count_robbery.index = pd.to_datetime(count_robbery.index)
count_robbery.plot(
figsize=(20, 10),
title="Number of Robberies in the US Over Time",
)
<AxesSubplot:title={'center':'Number of Robberies in the US Over Time'}>

6.3.10. Get Google News Using Python#
!pip install GoogleNews
If you want to get Google News results in Python, use GoogleNews. GoogleNews allows you to get search results for a keyword in a specific time interval.
from GoogleNews import GoogleNews
googlenews = GoogleNews()
googlenews.set_time_range('02/01/2022','03/25/2022')
googlenews.search('funny')
googlenews.results()
[{'title': 'Hagan has fastest NHRA Funny Car run in 4 years',
'media': 'ESPN',
'date': 'Feb 26, 2022',
'datetime': datetime.datetime(2022, 2, 26, 0, 0),
'desc': '-- Matt Hagan made the quickest Funny Car run in four years Saturday, \ngiving the new Tony Stewart Racing NHRA team its first No. 1 qualifier and \nsetting the...',
'link': 'https://www.espn.com/racing/story/_/id/33381149/matt-hagan-fastest-nhra-funny-car-pass-4-years',
'img': ''},
{'title': 'Full fields in Top Fuel, Funny Car, and Pro Stock promise fast ...',
'media': 'NHRA',
'date': 'Feb 10, 2022',
'datetime': datetime.datetime(2022, 2, 10, 0, 0),
'desc': 'The pits at Auto Club Raceway at Pomona will be packed with NHRA Camping \nWorld Drag Racing Series teams for the 2022 season-opening Lucas Oil NHRA...',
'link': 'https://www.nhra.com/news/2022/full-fields-top-fuel-funny-car-and-pro-stock-promise-fast-start-winternationals',
'img': ''},
{'title': 'Full Cast Set for Broadway Revival of Funny Girl, Starring ...',
'media': 'Playbill',
'date': 'Feb 7, 2022',
'datetime': datetime.datetime(2022, 2, 7, 0, 0),
'desc': 'Among those newly added to the company are Peter Francis James, Ephie \nAardema, Martin Moran, and Julie Benko. By Margaret Hall. February 07, 2022.',
'link': 'https://playbill.com/article/full-cast-set-for-broadway-revival-of-funny-girl-starring-beanie-feldstein-and-ramin-karimloo',
'img': ''},
{'title': 'Robert Hight tops Funny Car qualifying at season-opening Lucas Oil NHRA Winternationals',
'media': 'ESPN',
'date': 'Feb 18, 2022',
'datetime': datetime.datetime(2022, 2, 18, 0, 0),
'desc': "-- Robert Hight topped Funny Car qualifying Friday night in the NHRA \nCamping World Drag Racing Series' season-opening Lucas Oil NHRA \nWinternationals. Hight, a...",
'link': 'https://www.espn.com/racing/story/_/id/33324340/robert-hight-tops-funny-car-qualifying-season-opening-lucas-oil-nhra-winternationals',
'img': ''},
{'title': 'New NHRA Funny Car Team Owner Ron Capps Throws ...',
'media': 'Autoweek',
'date': 'Feb 21, 2022',
'datetime': datetime.datetime(2022, 2, 21, 0, 0),
'desc': 'Defending Funny Car champion enters season without automaker deal after \nlong-time partner turns him down. By Susan Wade. Feb 21, 2022.',
'link': 'https://www.autoweek.com/racing/nhra/a39160639/ron-capps-throws-dodgemopar-under-bus/',
'img': ''},
{'title': 'VIDEO: Beanie Feldstein, Ramin Karimloo, and More in ...',
'media': 'Broadway World',
'date': 'ar 9, 2022',
'datetime': None,
'desc': 'The highly anticipated Broadway revival of Funny Girl is beginning \nperformances this month! The musical will have its first preview at the \nAugust Wilson on...',
'link': 'https://www.broadwayworld.com/article/VIDEO-Beanie-Feldstein-Ramin-Karimloo-and-More-in-Rehearsal-For-FUNNY-GIRL-20220309',
'img': ''},
{'title': 'Watch: The Funny Girl Sitzprobe, With Beanie Feldstein and ...',
'media': 'TheaterMania',
'date': 'ar 24, 2022',
'datetime': None,
'desc': "Funny Girl is headed back to Broadway. Here is a first look at the cast's \nfirst orchestra rehearsal, with snippets of stars Beanie Feldstein, Ramin \nKarimloo...",
'link': 'https://www.theatermania.com/broadway/news/first-look-the-funny-girl-sitzprobe-with-beanie-fe_93550.html',
'img': ''},
{'title': 'Stephen Colbert, Funny or Die Prep Primetime Pickleball Special for CBS',
'media': 'The Hollywood Reporter',
'date': 'ar 15, 2022',
'datetime': None,
'desc': 'Stephen Colbert, Funny or Die Prep Primetime Pickleball Special for CBS. \nThe special, \'Pickled,\' will see celebrity competitors vie for the "Golden \nGherkin.".',
'link': 'https://www.hollywoodreporter.com/tv/tv-news/stephen-colbert-funny-or-die-primetime-pickleball-cbs-1235111617/',
'img': ''},
{'title': 'Randy Meyer Racing to debut injected nitro Funny Car at ...',
'media': 'NHRA',
'date': 'ar 22, 2022',
'datetime': None,
'desc': "The Funny Car Chaos deal is becoming more popular here in the Midwest, so \nit's an opportunity for us to go race close to home, have some fun, and \ntake on a new...",
'link': 'https://www.nhra.com/news/2022/randy-meyer-racing-debut-injected-nitro-funny-car-funny-car-chaos-event',
'img': ''},
{'title': 'Laurie Zaleski talks about her book ‘Funny Farm’',
'media': 'The Washington Post',
'date': 'Feb 25, 2022',
'datetime': datetime.datetime(2022, 2, 25, 0, 0),
'desc': "This is the Funny Farm, double-entendre intended: “Because it's full of \nanimals, and fit for lunatics,” Zaleski jokes of the sanctuary that she \nbuilt here,...",
'link': 'https://www.washingtonpost.com/books/2022/02/25/funny-farm-rescue-animals/',
'img': ''}]
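Several results above have `'datetime': None`, which appears where the date string could not be parsed. Below is a small sketch of sorting results by date while keeping the unparsed entries at the end. `sample_results` is a trimmed copy of the output above, and `sort_by_date` is an illustrative helper, not part of GoogleNews.

```python
import datetime


def sort_by_date(results: list) -> list:
    """Sort newest first; entries whose 'datetime' is None go last."""
    dated = [r for r in results if r.get("datetime") is not None]
    undated = [r for r in results if r.get("datetime") is None]
    return sorted(dated, key=lambda r: r["datetime"], reverse=True) + undated


# Trimmed copies of three results shown above (only two fields kept)
sample_results = [
    {"media": "NHRA", "datetime": datetime.datetime(2022, 2, 10, 0, 0)},
    {"media": "Broadway World", "datetime": None},
    {"media": "ESPN", "datetime": datetime.datetime(2022, 2, 26, 0, 0)},
]

for item in sort_by_date(sample_results):
    print(item["media"], item["datetime"])
```

Splitting the list first avoids comparing `None` with `datetime` objects, which would raise a `TypeError` during sorting.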
6.3.11. people_also_ask: Python Wrapper for Google People Also Ask#
!pip install people_also_ask
If you want to use Google People Also Ask in Python, try the people_also_ask library.
import people_also_ask as ask
ask.get_related_questions('data science')
['What exactly data science do?',
'Is data science a good career?',
'What are the 3 main concepts of data science?',
'Is data science a easy career?']
ask.get_answer('Is data science a easy career?')
{'has_answer': True,
'question': 'Is data science a easy career?',
'related_questions': ['Is becoming a data scientist easy?',
'Is data science a stressful career?',
'Is Python for data science hard?',
'Do data scientists code a lot?'],
'response': 'The short answer to the above question is a big NO! Data Science is hard to learn is primarily a misconception that beginners have during their initial days. As they discover the unique domain of data science more, they realise that data science is just another field of study that can be learned by working hard.Oct 4, 2022',
'heading': 'The short answer to the above question is a big NO! Data Science is hard to learn is primarily a misconception that beginners have during their initial days. As they discover the unique domain of data science more, they realise that data science is just another field of study that can be learned by working hard.Oct 4, 2022',
'title': 'Is Data Science Hard to Learn? (Answer: NO!) - ProjectPro',
'link': 'https://www.projectpro.io/article/is-data-science-hard-to-learn/522#:~:text=The%20short%20answer%20to%20the,be%20learned%20by%20working%20hard.',
'displayed_link': 'https://www.projectpro.io › article › is-data-science-hard-t...',
'snippet_str': 'The short answer to the above question is a big NO! Data Science is hard to learn is primarily a misconception that beginners have during their initial days. As they discover the unique domain of data science more, they realise that data science is just another field of study that can be learned by working hard.Oct 4, 2022\nhttps://www.projectpro.io › article › is-data-science-hard-t...\nhttps://www.projectpro.io/article/is-data-science-hard-to-learn/522#:~:text=The%20short%20answer%20to%20the,be%20learned%20by%20working%20hard.\nIs Data Science Hard to Learn? (Answer: NO!) - ProjectPro',
'snippet_data': None,
'date': None,
'snippet_type': 'Definition Featured Snippet',
'snippet_str_body': '',
'raw_text': 'Featured snippet from the web\nThe short answer to the above question is a big NO! \nData Science is hard to learn\n is primarily a misconception that beginners have during their initial days. As they discover the unique domain of data science more, they realise that data science is just another field of study that can be learned by working hard.\nOct 4, 2022\nIs Data Science Hard to Learn? (Answer: NO!) - ProjectPro\nhttps://www.projectpro.io\n › article › is-data-science-hard-t...'}
6.3.12. Scrape Facebook Public Pages Without an API Key#
!pip install facebook-scraper
If you want to scrape Facebook public pages without an API key, try facebook-scraper.
With facebook-scraper, you can scrape posts by a user and the profile of a user or a group.
from facebook_scraper import get_profile, get_group_info
get_group_info("thedachshundowners")
{'id': '2685753618191566',
'name': 'Dachshund Owners',
'type': 'Public group',
'members': 128635,
'about': "Hello, Welcome to the Dachshund Owners group.\nPost pictures/videos, share stories, ask advise from other Dachshund lovers.\nYou can post YOUR videos / pics of your Dachshund if they’ve got that viral element, your dog will be seen by millions of people around the globe.\n\n* RULES AND POSTING GUIDELINES\n✅ Post original contents that you created ONLY\n✅ Post with a short description/story about the content\n✅ Include if you want to be credited or not\n✅ Be nice to other members\nNo aggressive behavior\nNo backyard breeding\nNo spam\nNo unrelated content or video\n\nSelling dogs are PROHIBITED. Any transactions made with our group from this point forward will be at your own risk.\nNo Promotions - include(s) SELLING/PROMOTING ITEM(s) Fishing potential buyers. We are just protecting each and everyone from scams!\n\nHello, Sir or Lady.\nThe Dachshund Owners group administrator is speaking.\nWe appreciate you being here. We want to let you know that managing spam and other related tasks is difficult with so many posts to approve. It's a very time-consuming task, and that's why we need your support. Please buy a T-shirt from us so we can keep the group running and providing awesome value! You can get a T-shirt, mug, Canvas, and other items from us that can be customized for Dog lovers. On our Store, you may check out our customized campaigns. Thank You!\nStore Link: https://www.pawowners.com/collectio..."}
get_profile("zuck")
{'Friend_count': None,
'Follower_count': None,
'Following_count': None,
'cover_photo': 'https://scontent-ord5-1.xx.fbcdn.net/v/t31.18172-8/19575079_10103832396388711_8894816584589808440_o.jpg?stp=cp0_dst-jpg_e15_fr_q65&_nc_cat=1&ccb=1-7&_nc_sid=ed5ff1&_nc_ohc=Z5jCEAhNv3AAX9ihcdv&_nc_ht=scontent-ord5-1.xx&oh=00_AfCTBrP26zWK0onpRfKbpJLRlFDwWLmlv1_XlkeVLkE_yw&oe=63CA953D',
'profile_picture': 'https://scontent-ord5-1.xx.fbcdn.net/v/t39.30808-1/312257846_10114737758665291_6588360857015169674_n.jpg?stp=cp0_dst-jpg_e15_q65_s120x120&_nc_cat=1&ccb=1-7&_nc_sid=dbb9e7&_nc_ohc=x2_MUzaxC2cAX9w6LZ6&_nc_ht=scontent-ord5-1.xx&oh=00_AfDiKcBBDdzCymHXd-yjp2stit_VGPQRm9oeibSyDFG8BA&oe=63A81F9E',
'id': '4',
'Name': 'Mark Zuckerberg',
'Work': 'Chan Zuckerberg Initiative\nDecember 1, 2015 - Present\nMeta\nFounder and CEO\nFebruary 4, 2004 - Present\nPalo Alto, California\nBringing the world closer together.',
'Education': 'Harvard University\nComputer Science and Psychology\nAugust 30, 2002 - April 30, 2004\nPhillips Exeter Academy\nClassics\nClass of 2002\nArdsley High School\nHigh school\nAugust 1998 - May 2000',
'Places lived': [{'link': '/profile.php?id=104022926303756&refid=17',
'text': 'Palo Alto, California',
'type': 'Current city'},
{'link': '/profile.php?id=105506396148790&refid=17',
'text': 'Dobbs Ferry, New York',
'type': 'Hometown'}],
'About': "I'm trying to make the world a more open place.",
'Favorite quotes': '"Fortune favors the bold."\n- Virgil, Aeneid X.284\n\n"All children are artists. The problem is how to remain an artist once you grow up."\n- Pablo Picasso\n\n"Make things as simple as possible but no simpler."\n- Albert Einstein'}