Data Collection From Web APIs

<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->

All Contributors

<!-- ALL-CONTRIBUTORS-BADGE:END -->

A curated list of example code to collect data from Web APIs using DataPrep.Connector.
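Note that the snippets on this page use top-level `await`, which works in Jupyter or IPython notebooks. In a plain Python script there is no running event loop, so the coroutine must be driven with `asyncio.run()` instead. A minimal sketch, where `fetch_example` is a hypothetical stand-in for a connector query such as `await conn.query(...)`:

```python
import asyncio

# Hypothetical stand-in for a DataPrep.Connector query; the real call
# needs a connector instance and (usually) an API key.
async def fetch_example():
    await asyncio.sleep(0)  # pretend to wait on a web API
    return "ok"

# In a script, drive the coroutine explicitly instead of `await`-ing it.
result = asyncio.run(fetch_example())
print(result)  # prints "ok"
```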

How to Contribute?

You can contribute to this project in two ways. Please check the contributing guide.

  1. Add your example code on this page
  2. Add a new configuration file to this repo

Why Contribute?

Index

Art

Harvard Art Museums -- Collect Museum Collection Data

<details> <summary>Find objects with "dog" in their titles that were made in 1990.</summary>
from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('object', title='dog', yearmade=1990)
df[['title', 'division', 'classification', 'technique', 'department', 'century', 'dated']]
|   | title | division | classification | technique | department | century | dated |
|---|-------|----------|----------------|-----------|------------|---------|-------|
| 0 | Paris (black dog on street) | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990s |
| 1 | Pregnant Woman with Dog | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990 |
| 2 | Pompeii Dog | Modern and Contemporary Art | Prints | Drypoint | Department of Prints | 20th century | 1990 |
</details> <details> <summary>Find 10 people that are Dutch.</summary>
from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('person', q='culture:Dutch', size=10)
df[['display name', 'gender', 'culture', 'display date', 'object count', 'birth place', 'death place']]
|   | display name | gender | culture | display date | object count | birth place | death place |
|---|--------------|--------|---------|--------------|--------------|-------------|-------------|
| 0 | Joris Abrahamsz. van der Haagen | unknown | Dutch | 1615 - 1669 | 7 | Arnhem or Dordrecht, Netherlands | The Hague, Netherlands |
| 1 | François Morellon de la Cave | unknown | Dutch | 1723 - 65 | 1 | None | None |
| 2 | Cornelis Vroom | unknown | Dutch | 1590/92 - 1661 | 3 | Haarlem(?), Netherlands | Haarlem, Netherlands |
| 3 | Constantijn Daniel van Renesse | unknown | Dutch | 1626 - 1680 | 2 | Maarssen | Eindhoven |
| 4 | Dirck Dalens, the Younger | unknown | Dutch | 1654 - 1688 | 3 | Amsterdam, Netherlands | Amsterdam, Netherlands |
</details> <details> <summary>Find all exhibitions that take place at a Harvard Art Museums venue after 2020-01-01.</summary>
from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('exhibition', venue='HAM', after='2020-01-01')
df
|   | title | begin date | end date | url |
|---|-------|------------|----------|-----|
| 0 | Painting Edo: Japanese Art from the Feinberg Collection | 2020-02-14 | 2021-07-18 | https://www.harvardartmuseums.org/visit/exhibitions/5909 |
</details> <details> <summary>Find 5 records for publications that were published in 2013.</summary>
from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('publication', q='publicationyear:2013', size=5)
df[['title','publication date','publication place','format']]
|   | title | publication date | publication place | format |
|---|-------|------------------|-------------------|--------|
| 0 | 19th Century Paintings, Drawings & Watercolours | January 23, 2013 | London | Auction/Dealer Catalogue |
| 1 | "With Éclat" The Boston Athenæum and the Orig... | 2013 | Boston, MA | Book |
| 2 | "Review: Fragonard's Progress of Love at the F... | 2013 | London | Article/Essay |
| 3 | Alternative Narratives | February 2013 | None | Article/Essay |
| 4 | Victorian & British Impressionist Art | July 11, 2013 | London | Auction/Dealer Catalogue |
</details> <details> <summary>Find 5 galleries that are on floor (Level) 2 in the Harvard Art Museums building.</summary>
from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('gallery', floor=2, size=5)
df[['id','name','theme','object count']]
|   | id | name | theme | object count |
|---|----|------|-------|--------------|
| 0 | 2200 | European and American Art, 17th–19th century | The Emergence of Romanticism in Early Nineteen... | 20 |
| 1 | 2210 | West Arcade | None | 6 |
| 2 | 2340 | European and American Art, 17th–19th century | The Silver Cabinet: Art and Ritual, 1600–1850 | 73 |
| 3 | 2460 | East Arcade | None | 2 |
| 4 | 2700 | European and American Art, 19th century | Impressionism and the Late Nineteenth Century | 19 |
</details>

Business

Yelp -- Collect Local Business Data

<details> <summary>What's the phone number of Capilano Suspension Bridge Park?</summary>
from dataprep.connector import connect

# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)

df = await conn_yelp.query("businesses", term = "Capilano Suspension Bridge Park", location = "Vancouver", _count = 1)

df[["name","phone"]]
|   | name | phone |
|---|------|-------|
| 0 | Capilano Suspension Bridge Park | +1 604-985-7474 |
</details> <details> <summary>Which yoga store has the highest review count in Vancouver?</summary>
from dataprep.connector import connect

# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 1)

# Check all supported categories: https://www.yelp.ca/developers/documentation/v3/all_category_list
df = await conn_yelp.query("businesses", categories = "yoga", location = "Vancouver", sort_by = "review_count", _count = 1)
df[["name", "review_count"]]
|   | name | review_count |
|---|------|--------------|
| 0 | YYOGA Downtown Flow | 107 |
</details> <details> <summary>How many Starbucks stores are in Seattle, and where are they?</summary>
from dataprep.connector import connect

# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)
df = await conn_yelp.query("businesses", term = "Starbucks", location = "Seattle", _count = 1000)

# Remove irrelevant data
df = df[(df['city'] == 'Seattle') & (df['name'] == 'Starbucks')]
df[['name', 'address1', 'city', 'state', 'country', 'zip_code']].reset_index(drop=True)
|   | name | address1 | city | state | country | zip_code |
|---|------|----------|------|-------|---------|----------|
| 0 | Starbucks | 515 Westlake Ave N | Seattle | WA | US | 98109 |
| 1 | Starbucks | 442 Terry Avenue N | Seattle | WA | US | 98109 |
| ... | ... | ... | ... | ... | ... | ... |
| 126 | Starbucks | 17801 International Blvd | Seattle | WA | US | 98158 |
</details> <details> <summary>What are the ratings for a list of restaurants?</summary>
from dataprep.connector import connect
import pandas as pd
import asyncio
# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)

names = ["Miku", "Boulevard", "NOTCH 8", "Chambar", "VIJ’S", "Fable", "Kirin Restaurant", "Cafe Medina", \
 "Ask for Luigi", "Savio Volpe", "Nicli Pizzeria", "Annalena", "Edible Canada", "Nuba", "The Acorn", \
 "Lee's Donuts", "Le Crocodile", "Cioppinos", "Six Acres", "St. Lawrence", "Hokkaido Santouka Ramen"]

query_list = [conn_yelp.query("businesses", term=name, location = "Vancouver", _count=1) for name in names]
results = asyncio.gather(*query_list)
df = pd.concat(await results)
df[["name", "rating", "city"]].reset_index(drop=True)
|   | name | rating | city |
|---|------|--------|------|
| 0 | Miku | 4.5 | Vancouver |
| 1 | Boulevard Kitchen & Oyster Bar | 4.0 | Vancouver |
| ... | ... | ... | ... |
| 20 | Hokkaido Ramen Santouka | 4.0 | Vancouver |
</details>
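The restaurant-ratings example above issues many queries concurrently with `asyncio.gather` and combines the resulting DataFrames with `pd.concat`. The pattern can be sketched on its own with stub coroutines; `query_stub`, the sample names, and the fixed rating below are hypothetical placeholders for real `conn_yelp.query(...)` calls:

```python
import asyncio
import pandas as pd

# Hypothetical stand-in for conn_yelp.query(...): each coroutine
# resolves to a one-row DataFrame, like the Yelp queries above.
async def query_stub(name: str) -> pd.DataFrame:
    await asyncio.sleep(0)
    return pd.DataFrame({"name": [name], "rating": [4.5]})

async def main() -> pd.DataFrame:
    names = ["Miku", "Chambar", "Fable"]
    # Issue all queries concurrently, then combine the results.
    frames = await asyncio.gather(*(query_stub(n) for n in names))
    return pd.concat(frames).reset_index(drop=True)

df = asyncio.run(main())
print(len(df))  # prints 3
```

`gather` preserves the order of its arguments, so the combined frame lines up with the input list.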

Hunter -- Collect and Verify Professional Email Addresses

<details> <summary>Who are the executives of Asana, and what are their emails?</summary>
from dataprep.connector import connect

# You can get "hunter_access_token" by registering as a developer at https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})

df = await conn_hunter.query('all_emails', domain='asana.com', _count=10)

df[df['department']=='executive']
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>first_name</th> <th>last_name</th> <th>email</th> <th>position</th> <th>department</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Dustin</td> <td>Moskovitz</td> <td>dustin@asana.com</td> <td>Cofounder</td> <td>executive</td> </tr> <tr> <th>1</th> <td>Stephanie</td> <td>Heß</td> <td>shess@asana.com</td> <td>CEO</td> <td>executive</td> </tr> <tr> <th>2</th> <td>Erin</td> <td>Cheng</td> <td>erincheng@asana.com</td> <td>Strategic Initiatives</td> <td>executive</td> </tr> </tbody> </table> </div> </details> <details> <summary>What is Dustin Moskovitz's email?</summary>
from dataprep.connector import connect

# You can get "hunter_access_token" by registering as a developer at https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})

df = await conn_hunter.query("individual_email", full_name='dustin moskovitz', domain='asana.com')

df
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>first_name</th> <th>last_name</th> <th>email</th> <th>position</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Dustin</td> <td>Moskovitz</td> <td>dustin@asana.com</td> <td>Cofounder</td> </tr> </tbody> </table> </div> </details> <details> <summary>Are the emails of Asana executives valid?</summary>
from dataprep.connector import connect

# You can get "hunter_access_token" by registering as a developer at https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})

employees = await conn_hunter.query("all_emails", domain='asana.com', _count=10)
executives = employees.loc[employees['department']=='executive']
emails = executives[['email']]

for email in emails.iterrows():
    status = await conn_hunter.query("email_verifier", email=email[1][0])
    emails['status'] = status

emails
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>email</th> <th>status</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>dustin@asana.com</td> <td>valid</td> </tr> <tr> <th>3</th> <td>shess@asana.com</td> <td>NaN</td> </tr> <tr> <th>4</th> <td>erincheng@asana.com</td> <td>NaN</td> </tr> </tbody> </table> </div> </details> <details> <summary>How many available requests do I have left?</summary>
from dataprep.connector import connect

# You can get "hunter_access_token" by registering as a developer at https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})

df = await conn_hunter.query("account")
df
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>requests available</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>19475</td> </tr> </tbody> </table> </div> </details> <details> <summary>What are the counts of each level of seniority of Intercom employees?</summary>
from dataprep.connector import connect

# You can get "hunter_access_token" by registering as a developer at https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})

df = await conn_hunter.query("email_count", domain='intercom.io')
df.drop('total', axis=1)
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>junior</th> <th>senior</th> <th>executive</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>0</td> <td>2</td> <td>2</td> </tr> </tbody> </table> </div> </details>

Finance

Finnhub -- Collect Financial, Market, and Economic Data

<details> <summary>How can I get a list of cryptocurrencies and their exchanges?</summary>
import pandas as pd
from dataprep.connector import connect

# You can get "finnhub_access_token" by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

df = await conn_finnhub.query('crypto_exchange')
exchanges = df['exchange'].to_list()
symbols = []
for ex in exchanges:
    data = await conn_finnhub.query('crypto_symbols', exchange=ex)
    symbols.append(data)
df_symbols = pd.concat(symbols)
df_symbols
|   | description | displaySymbol | symbol |
|---|-------------|---------------|--------|
| 0 | Binance FRONT/ETH | FRONT/ETH | BINANCE:FRONTETH |
| 1 | Binance ATOM/BUSD | ATOM/BUSD | BINANCE:ATOMBUSD |
| ... | ... | ... | ... |
| 281 | Poloniex AKRO/BTC | AKRO/BTC | POLONIEX:BTC_AKRO |
</details> <details> <summary>Which IPO in the current month has the highest total share value?</summary>
import calendar
from datetime import datetime
from dataprep.connector import connect

# You can get "finnhub_access_token" by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

today = datetime.today()
days_in_month = calendar.monthrange(today.year, today.month)[1]
date_from = today.replace(day=1).strftime('%Y-%m-%d')
date_to = today.replace(day=days_in_month).strftime('%Y-%m-%d')
ipo_df = await conn_finnhub.query('ipo_calender', from_=date_from, to=date_to)
ipo_df[ipo_df['totalSharesValue'] == ipo_df['totalSharesValue'].max()]
|   | date | exchange | name | numberOfShares | ... | totalSharesValue |
|---|------|----------|------|----------------|-----|------------------|
| 5 | 2021-02-03 | NYSE | TELUS International (Cda) Inc. | 33333333 | ... | 9.58333e+08 |
</details> <details> <summary>What are the average actual earnings over the last 4 quarters for a list of 10 popular stocks?</summary>
import asyncio
import pandas as pd
from dataprep.connector import connect

# You can get "finnhub_access_token" by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

stock_list = ['TSLA', 'AAPL', 'WMT', 'GOOGL', 'FB', 'MSFT', 'COST', 'NVDA', 'JPM', 'AMZN']
query_list = [conn_finnhub.query('earnings', symbol=symbol) for symbol in stock_list]
query_results = asyncio.gather(*query_list)
stocks_df = pd.concat(await query_results)
stocks_df = stocks_df.groupby('symbol', as_index=False).agg({'actual': ['mean']})
stocks_df.columns = stocks_df.columns.get_level_values(0)
stocks_df = stocks_df.sort_values(by='actual', ascending=False).rename(columns={'actual': 'avg_actual'})
stocks_df.reset_index(drop=True)
|   | symbol | avg_actual |
|---|--------|------------|
| 0 | GOOGL | 12.9375 |
| 1 | AMZN | 8.5375 |
| 2 | FB | 2.4475 |
| ... | ... | ... |
| 9 | TSLA | 0.556 |
</details> <details> <summary>What are the earnings of the last 4 quarters for a given company (e.g., TSLA)?</summary>
from dataprep.connector import connect
from datetime import datetime, timedelta, timezone

# You can get "finnhub_access_token" by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)

today = datetime.now(tz=timezone.utc)
oneyear = today - timedelta(days = 365)
start = int(round(oneyear.timestamp()))

result = await conn_finnhub.query('earnings_calender', symbol='TSLA', from_=start, to=today)
result = result.set_index('date')
result
|   | date | epsActual | epsEstimate | hour | quarter | ... | symbol | year |
|---|------|-----------|-------------|------|---------|-----|--------|------|
| 0 | 2021-01-27 | 0.8 | 1.37675 | amc | 4 | ... | TSLA | 2020 |
| 1 | 2020-10-21 | 0.76 | 0.600301 | amc | 3 | ... | TSLA | 2020 |
| 2 | 2020-07-22 | 0.436 | -0.0267036 | amc | 2 | ... | TSLA | 2020 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3 | 2011-02-15 | -0.094 | -0.101592 | amc | 4 | ... | TSLA | 2010 |
</details>
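The last Finnhub example builds its date window as Unix timestamps (seconds since the epoch). That conversion can be checked standalone with just the standard library, no connector or API key needed:

```python
from datetime import datetime, timedelta, timezone

today = datetime.now(tz=timezone.utc)
one_year_ago = today - timedelta(days=365)

# Epoch-second timestamps, as used for the `from_` parameter above.
start = int(round(one_year_ago.timestamp()))
end = int(round(today.timestamp()))

# The window should span exactly 365 days' worth of seconds
# (up to one second of rounding).
print(end - start)
```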

Geocoding

MapQuest -- Collect Driving Directions, Maps, Traffic Data

<details> <summary>Where is Simon Fraser University? List all places if there is more than one campus.</summary>
from dataprep.connector import connect

# You can get "mapquest_access_token" by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)

BC_BBOX = "-139.06,48.30,-114.03,60.00"
campus = await conn_map.query("place", q = "Simon Fraser University", sort = "relevance", bbox = BC_BBOX, _count = 50)
campus = campus[campus["name"] == "Simon Fraser University"].reset_index()
|   | index | name | country | state | city | address | postalCode | coordinates | details |
|---|-------|------|---------|-------|------|---------|------------|-------------|---------|
| 0 | 0 | Simon Fraser University | CA | BC | Burnaby | 8888 University Drive E | V5A 1S6 | [-122.90416, 49.27647] | ... |
| 1 | 2 | Simon Fraser University | CA | BC | Vancouver | 602 Hastings St W | V6B 1P2 | [-123.113431, 49.284626] | ... |
</details> <details> <summary>How many KFCs are there in Burnaby, and what are their addresses?</summary>
from dataprep.connector import connect

# You can get "mapquest_access_token" by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)

BC_BBOX = "-139.06,48.30,-114.03,60.00"
kfc = await conn_map.query("place", q = "KFC", sort = "relevance", bbox = BC_BBOX, _count = 500)
kfc = kfc[(kfc["name"] == "KFC") & (kfc["city"] == "Burnaby")].reset_index()
print("There are %d KFCs in Burnaby" % len(kfc))
print("Their addresses are:")
kfc['address']

There are 1 KFCs in Burnaby

Their addresses are:

|   | address |
|---|---------|
| 0 | 5094 Kingsway |
</details> <details> <summary>What is the ratio of Starbucks to Tim Hortons in Vancouver?</summary>
from dataprep.connector import connect

# You can get "mapquest_access_token" by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
VAN_BBOX = '-123.27,49.195,-123.020,49.315'
starbucks = await conn_map.query('place', q='starbucks', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)
timmys = await conn_map.query('place', q='Tim Hortons', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)

is_vancouver_sb = starbucks['city'] == 'Vancouver'
is_vancouver_tim = timmys['city'] == 'Vancouver'
sb_in_van = starbucks[is_vancouver_sb]
tim_in_van = timmys[is_vancouver_tim]
print('The ratio of Starbucks:Tim Hortons in Vancouver is %d:%d' % (len(sb_in_van), len(tim_in_van)))

The ratio of Starbucks:Tim Hortons in Vancouver is 188:120

</details> <details> <summary>What is the closest gas station from Metropolist and how far is it?</summary>
from dataprep.connector import connect
from numpy import radians, sin, cos, arctan2, sqrt

def distance_in_km(cord1, cord2):
    R = 6373.0

    lat1 = radians(cord1[1])
    lon1 = radians(cord1[0])
    lat2 = radians(cord2[1])
    lon2 = radians(cord2[0])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * arctan2(sqrt(a), sqrt(1 - a))
    distance = R * c

    return(distance)

# You can get "mapquest_access_token" by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
METRO_TOWN = [-122.9987, 49.2250]
METRO_TOWN_string = '%f,%f' % (METRO_TOWN[0], METRO_TOWN[1])
nearest_petro = await conn_map.query('place', q='gas station', sort='distance', location=METRO_TOWN_string, page='1', pageSize = '1')
print('Metropolist is %fkm from the nearest gas station' % distance_in_km(METRO_TOWN, nearest_petro['coordinates'][0]))
print('The gas station is %s at %s' % (nearest_petro['name'][0], nearest_petro['address'][0]))

Metropolist is 0.376580km from the nearest gas station

The gas station is Chevron at 4692 Imperial St
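The `distance_in_km` helper above is a standard haversine great-circle formula and can be sanity-checked on its own. The Vancouver and Burnaby coordinates below are approximate values chosen for illustration, in the same `[longitude, latitude]` order the example uses:

```python
from numpy import radians, sin, cos, arctan2, sqrt

def distance_in_km(cord1, cord2):
    # Haversine great-circle distance; inputs are [lon, lat] pairs.
    R = 6373.0  # approximate Earth radius in km
    lat1, lon1 = radians(cord1[1]), radians(cord1[0])
    lat2, lon2 = radians(cord2[1]), radians(cord2[0])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return R * 2 * arctan2(sqrt(a), sqrt(1 - a))

# Sanity check: downtown Vancouver to central Burnaby (approximate
# coordinates) should come out to roughly 10-11 km.
d = distance_in_km([-123.1207, 49.2827], [-122.9805, 49.2488])
print(round(d, 1))
```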

</details> <details> <summary>In BC, which city has the most shopping centers?</summary>
from dataprep.connector import connect

# You can get "mapquest_access_token" by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
BC_BBOX = "-139.06,48.30,-114.03,60.00"
GROCERY = 'sic:541105'
shop_list = await conn_map.query("place", sort="relevance", bbox=BC_BBOX, category=GROCERY, _count=500)
shop_list = shop_list[shop_list["state"] == "BC"]
shop_list.groupby('city')['name'].count().sort_values(ascending=False).head(10)
| city | count |
|------|-------|
| Vancouver | 42 |
| Victoria | 24 |
| Surrey | 15 |
| Burnaby | 14 |
| ... | ... |
| North Vancouver | 8 |
</details> <details> <summary>Where is the nearest grocery store to SFU? How far is it in miles, and what is the estimated driving time?</summary>
from dataprep.connector import connect

# You can get "mapquest_access_token" by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
SFU_LOC = '-122.90416, 49.27647'
GROCERY = 'sic:541105'
nearest_grocery = await conn_map.query("place", location=SFU_LOC, sort="distance", category=GROCERY)
destination = nearest_grocery.iloc[0]['details']
name = nearest_grocery.iloc[0]['name']
route = await conn_map.query("route", from_='8888 University Drive E, Burnaby', to=destination)
total_distance = sum([float(i)for i in route.iloc[:]['distance']])
total_time = sum([int(i)for i in route.iloc[:]['time']])
print('The nearest grocery of SFU is ' + name + '. It is ' + str(total_distance) + ' miles far, and It is expected to take ' + str(total_time // 60) + 'm' + str(total_time % 60)+'s of driving.')
route

The nearest grocery of SFU is Nesters Market. It is 1.234 miles far, and It is expected to take 3m21s of driving.

|   | index | narrative | distance | time |
|---|-------|-----------|----------|------|
| 0 | 0 | Start out going east on University Dr toward Arts Rd. | 0.348 | 57 |
| 1 | 1 | Turn left to stay on University Dr. | 0.606 | 84 |
| 2 | 2 | Enter next roundabout and take the 1st exit onto University High St. | 0.28 | 60 |
| 3 | 3 | 9000 UNIVERSITY HIGH STREET is on the left. | 0 | 0 |
</details>

Lifestyle

Spoonacular -- Collect Recipe, Food, and Nutritional Information Data

<details> <summary>Which foods are unhealthy, i.e., have high carbs and high fat content?</summary>
from dataprep.connector import connect
import pandas as pd

dc = connect('spoonacular', _auth={'access_token': API_key}, _concurrency=3, update=True)

df = await dc.query('recipes_by_nutrients', minFat=65, maxFat=100, minCarbs=75, maxCarbs=100, _count=20)

df["calories"] = pd.to_numeric(df["calories"]) # convert string type to numeric
df = df[df['calories']>1100] # considering foods with more than 1100 calories per serving to be unhealthy

df[["title","calories","fat","carbs"]].sort_values(by=['calories'], ascending=False)
|   | title | calories | fat | carbs |
|---|-------|----------|-----|-------|
| 2 | Brownie Chocolate Chip Cheesecake | 1210 | 92g | 79g |
| 8 | Potato-Cheese Pie | 1208 | 80g | 96g |
| 0 | Stuffed Shells with Beef and Broc | 1192 | 72g | 81g |
| 3 | Coconut Crusted Rockfish | 1187 | 72g | 92g |
| 4 | Grilled Ratatouille | 1143 | 82g | 88g |
| 7 | Pecan Bars | 1121 | 84g | 91g |
</details> <details> <summary>Which meat dishes are rich in proteins?</summary>
from dataprep.connector import connect

dc = connect('spoonacular', _auth={'access_token': API_key}, _concurrency=3, update=True)

df = await dc.query('recipes', query='beef', diet='keto', minProtein=25, maxProtein=60, _count=5)
df = df[["title","nutrients"]]

# Output of 'nutrients' column : [{'title': 'Protein', 'amount': 22.3768, 'unit': 'g'}]
g = [] # to extract the exact amount of Proteins in grams and store as list
for i in df["nutrients"]:
  z = i[0]
  g.append(z['amount'])
  
df.insert(1,'Protein(g)',g)
df[["title","Protein(g)"]].sort_values(by='Protein(g)',ascending=False)
|   | title | Protein(g) |
|---|-------|------------|
| 3 | Strip steak with roasted cherry tomatoes and v... | 56.2915 |
| 0 | Low Carb Brunch Burger | 53.7958 |
| 2 | Entrecote Steak with Asparagus | 41.6676 |
| 1 | Italian Style Meatballs | 35.9293 |
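The extraction loop above digs the first nutrient's amount out of each nested `nutrients` record. With made-up rows shaped like that column (the values below are illustrative, not API output), the same step can be written as a list comprehension:

```python
# Made-up rows shaped like the 'nutrients' column in the example above.
rows = [
    [{"title": "Protein", "amount": 22.38, "unit": "g"}],
    [{"title": "Protein", "amount": 41.67, "unit": "g"}],
]

# Same as the for-loop: take the first nutrient dict, keep its amount.
grams = [r[0]["amount"] for r in rows]
print(grams)  # prints [22.38, 41.67]
```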
</details> <details> <summary>Which Italian Vegan dishes are popular?</summary>
from dataprep.connector import connect

dc = connect('spoonacular', _auth={'access_token': API_key}, _concurrency=3, update=True)

df = await dc.query('recipes', query='popular veg dishes', cuisine='italian', diet='vegan', _count=20)
df[["title"]]
|   | title |
|---|-------|
| 0 | Vegan Pea and Mint Pesto Bruschetta |
| 1 | Gluten Free Vegan Gnocchi |
| 2 | Fresh Tomato Risotto with Grilled Green Vegeta... |
</details> <details> <summary>What are the top 5 liked chicken recipes with common ingredients?</summary>
from dataprep.connector import connect
import pandas as pd

dc = connect('spoonacular', _auth={'access_token': API_key}, _concurrency=3, update=True)

df= await dc.query('recipes_by_ingredients', ingredients='chicken,buttermilk,salt,pepper')
df['likes'] = pd.to_numeric(df['likes'])

df[['title', 'likes']].sort_values(by=['likes'], ascending=False).head(5)
|   | title | likes |
|---|-------|-------|
| 9 | Oven-Fried Ranch Chicken | 561 |
| 1 | Fried Chicken and Wild Rice Waffles with Pink ... | 78 |
| 6 | CCC: Carla Hall’s Fried Chicken | 47 |
| 2 | Buttermilk Fried Chicken | 12 |
| 0 | My Pantry Shelf | 10 |
</details> <details> <summary>What are the average calories for high-calorie Korean foods?</summary>
from dataprep.connector import connect
from statistics import mean 

dc = connect('spoonacular', _auth={'access_token': API_key}, _concurrency=3, update=True)

df = await dc.query('recipes', query='korean', minCalories = 500)
nutri = df['nutrients'].tolist()

calories = []
for i in range(len(nutri)):
  calories.append(nutri[i][0]['amount'])

print('Average calories for high calorie Korean foods:', mean(calories),'kcal')

Average calories for high calorie Korean foods: 644.765 kcal

</details>

Music

MusixMatch -- Collect Music Lyrics Data

<details> <summary>What is Katy Perry's Twitter URL?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("artist_info", artist_mbid = "122d63fc-8671-43e4-9752-34e846d62a9c")

df[['name', 'twitter_url']]
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name</th> <th>twitter_url</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Katy Perry</td> <td>https://twitter.com/katyperry</td> </tr> </tbody> </table> </div> </details> <details> <summary>What album is the song "Gone, Gone, Gone" in?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("track_matches", q_track = "Gone, Gone, Gone")

df[['name', 'album_name']]
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name</th> <th>album_name</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Gone, Gone, Gone</td> <td>The World From the Side of the Moon</td> </tr> </tbody> </table> </div> </details> <details> <summary>Which artist/artists group is most popular in Canada?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("top_artists", country = "Canada")

df['name'][0]
'BTS'
</details> <details> <summary>How many genres are in the Musixmatch database?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("genres")

len(df)
362
</details> <details> <summary>Who is the most popular American artist named Michael?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)

df = await conn_musixmatch.query("artists", q_artist = "Michael")
df = df[df['country'] == "US"].sort_values('rating', ascending=False)

df['name'].iloc[0]
'Michael Jackson'
</details> <details> <summary>What is the genre of the album "Atlas"?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

album = await conn_musixmatch.query("album_info", album_id = 11339785)
genres = await conn_musixmatch.query("genres")
album_genre = genres[genres['id'] == album['genre_id'][0][0]]['name']

album_genre.iloc[0]
'Soundtrack'
</details> <details> <summary>What is the link to lyrics of the most popular song in the album "Yellow"?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)

df = await conn_musixmatch.query("album_tracks", album_id = 10266231)
df = df.sort_values('rating', ascending=False)

df['track_share_url'].iloc[0]
'https://www.musixmatch.com/lyrics/Coldplay/Yellow?utm_source=application&utm_campaign=api&utm_medium=SFU%3A1409620992740'
</details> <details> <summary>What are Lady Gaga's albums from most to least recent?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, update = True)

df = await conn_musixmatch.query("artist_albums", artist_mbid = "650e7db6-b795-4eb5-a702-5ea2fc46c848", s_release_date = "desc")

df.name.unique()
array(['Chromatica', 'Stupid Love',
       'A Star Is Born (Original Motion Picture Soundtrack)', 'Your Song'],
      dtype=object)
</details> <details> <summary>Which artists are similar to Lady Gaga?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

df = await conn_musixmatch.query("related_artists", artist_mbid = "650e7db6-b795-4eb5-a702-5ea2fc46c848")

df
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>id</th> <th>name</th> <th>rating</th> <th>country</th> <th>twitter_url</th> <th>updated_time</th> <th>artist_alias_list</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>6985</td> <td>Cast</td> <td>41</td> <td></td> <td></td> <td>2015-03-29T03:32:49Z</td> <td>[キャスト]</td> </tr> <tr> <th>1</th> <td>7014</td> <td>black eyed peas</td> <td>77</td> <td>US</td> <td>https://twitter.com/bep</td> <td>2016-06-30T10:07:05Z</td> <td>[The Black Eyed Peas, ブラック・アイド・ピーズ, heiyandoud...</td> </tr> <tr> <th>2</th> <td>269346</td> <td>OneRepublic</td> <td>74</td> <td>US</td> <td>https://twitter.com/OneRepublic</td> <td>2015-01-07T08:21:52Z</td> <td>[ワンリパブリツク, Gong He Shi Dai, Timbaland presents...</td> </tr> <tr> <th>3</th> <td>276451</td> <td>Taio Cruz</td> <td>60</td> <td>GB</td> <td></td> <td>2016-06-30T10:32:58Z</td> <td>[タイオ クルーズ, tai ou ke lu zi, Trio Cruz, Jacob M...</td> </tr> <tr> <th>4</th> <td>409736</td> <td>Inna</td> <td>54</td> <td>RO</td> <td>https://twitter.com/inna_ro</td> <td>2014-11-13T03:37:43Z</td> <td>[インナ]</td> </tr> <tr> <th>5</th> <td>475281</td> <td>Skrillex</td> <td>62</td> <td>US</td> <td>https://twitter.com/Skrillex</td> <td>2013-11-05T11:28:57Z</td> <td>[スクリレックス, shi qi lei ke si, Sonny, Skillrex]</td> </tr> <tr> <th>6</th> <td>13895270</td> <td>Imagine Dragons</td> <td>82</td> <td>US</td> <td>https://twitter.com/Imaginedragons</td> <td>2013-11-05T11:30:28Z</td> <td>[イマジン・ドラゴンズ, IMAGINE DRAGONS]</td> </tr> <tr> <th>7</th> <td>27846837</td> <td>Shawn Mendes</td> <td>80</td> <td>CA</td> <td></td> <td>2015-02-17T10:33:56Z</td> <td>[ショーン・メンデス, xiaoenmengdezi]</td> </tr> <tr> <th>8</th> <td>33491890</td> <td>Rihanna</td> <td>81</td> <td>GB</td> <td>https://twitter.com/rihanna</td> <td>2018-10-15T20:32:58Z</td> <td>[りあーな, Rihanna, 蕾哈娜, Rhianna, Riannah, Robyn R...</td> </tr> <tr> <th>9</th> <td>33491981</td> <td>Avicii</td> <td>74</td> <td>SE</td> 
<td>https://twitter.com/avicii</td> <td>2018-04-20T18:27:01Z</td> <td>[アヴィーチー, ai wei qi, Avicci]</td> </tr> </tbody> </table> </div> </details> <details> <summary>What are the highest rated songs in Canada from highest to lowest popularity?</summary>
from dataprep.connector import connect

# You can get "musixmatch_access_token" by registering as a developer at https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)

df = await conn_musixmatch.query("top_tracks", country = 'CA')

df[df['is_explicit'] == 0].sort_values('rating', ascending = False).reset_index()
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>index</th> <th>id</th> <th>name</th> <th>rating</th> <th>commontrack_id</th> <th>has_instrumental</th> <th>is_explicit</th> <th>has_lyrics</th> <th>has_subtitles</th> <th>album_id</th> <th>album_name</th> <th>artist_id</th> <th>artist_name</th> <th>track_share_url</th> <th>updated_time</th> <th>genres</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>5</td> <td>201621042</td> <td>Dynamite</td> <td>99</td> <td>114947355</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>39721115</td> <td>Dynamite - Single</td> <td>24410130</td> <td>BTS</td> <td>https://www.musixmatch.com/lyrics/BTS/Dynamite...</td> <td>2021-01-15T16:40:48Z</td> <td>[Pop]</td> </tr> <tr> <th>1</th> <td>9</td> <td>187880919</td> <td>Before You Go</td> <td>99</td> <td>103153140</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-11-20T08:44:05Z</td> <td>[Pop, Alternative]</td> </tr> <tr> <th>2</th> <td>7</td> <td>189704353</td> <td>Breaking Me</td> <td>98</td> <td>105304416</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>34892017</td> <td>Keep On Loving</td> <td>42930474</td> <td>Topic feat. A7S</td> <td>https://www.musixmatch.com/lyrics/Topic-8/Brea...</td> <td>2021-01-19T16:57:29Z</td> <td>[House, Dance]</td> </tr> <tr> <th>3</th> <td>3</td> <td>189626475</td> <td>Watermelon Sugar</td> <td>95</td> <td>103096346</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>36101498</td> <td>Fine Line</td> <td>24505463</td> <td>Harry Styles</td> <td>https://www.musixmatch.com/lyrics/Harry-Styles...</td> <td>2020-02-14T08:07:12Z</td> <td>[Music]</td> </tr> </tbody> </table> </div> </details> <details> <summary>What are other songs in the same album as the song "Before You Go"?</summary>
from dataprep.connector import connect

# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})

song = await conn_musixmatch.query("track_info", commontrack_id = 103153140)
album = await conn_musixmatch.query("album_tracks", album_id = song["album_id"][0])

album
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>id</th> <th>name</th> <th>rating</th> <th>commontrack_id</th> <th>has_instrumental</th> <th>is_explicit</th> <th>has_lyrics</th> <th>has_subtitles</th> <th>album_id</th> <th>album_name</th> <th>artist_id</th> <th>artist_name</th> <th>track_share_url</th> <th>updated_time</th> <th>genres</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>186884178</td> <td>Grace</td> <td>31</td> <td>87857108</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-04-09T10:21:29Z</td> <td>[Folk-Rock]</td> </tr> <tr> <th>1</th> <td>186884184</td> <td>Bruises</td> <td>68</td> <td>70395936</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2020-07-31T12:58:04Z</td> <td>[Music, Alternative]</td> </tr> <tr> <th>2</th> <td>186884187</td> <td>Hold Me While You Wait</td> <td>89</td> <td>95176135</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2020-08-02T07:23:21Z</td> <td>[Music]</td> </tr> <tr> <th>3</th> <td>186884189</td> <td>Someone You Loved</td> <td>95</td> <td>89461086</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2020-06-22T15:34:07Z</td> <td>[Pop, Alternative]</td> </tr> <tr> <th>4</th> <td>186884190</td> <td>Maybe</td> <td>31</td> <td>95541701</td> <td>0</td> <td>1</td> 
<td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-05-20T11:41:00Z</td> <td>[Music]</td> </tr> <tr> <th>5</th> <td>186884191</td> <td>Forever</td> <td>67</td> <td>95541702</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-11-18T10:46:36Z</td> <td>[Music]</td> </tr> <tr> <th>6</th> <td>186884192</td> <td>One</td> <td>31</td> <td>95541699</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-05-19T04:08:23Z</td> <td>[Music]</td> </tr> <tr> <th>7</th> <td>186884193</td> <td>Don't Get Me Wrong</td> <td>31</td> <td>95541698</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-12-20T08:25:26Z</td> <td>[Music]</td> </tr> <tr> <th>8</th> <td>186884194</td> <td>Hollywood</td> <td>31</td> <td>95541700</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> <td>2019-05-21T08:00:54Z</td> <td>[Music]</td> </tr> <tr> <th>9</th> <td>186884195</td> <td>Lost on You</td> <td>31</td> <td>73530089</td> <td>0</td> <td>0</td> <td>1</td> <td>1</td> <td>35611759</td> <td>Divinely Uninspired To A Hellish Extent (Exten...</td> <td>33258132</td> <td>Lewis Capaldi</td> <td>https://www.musixmatch.com/lyrics/Lewis-Capald...</td> 
<td>2020-03-17T08:35:18Z</td> <td>[Alternative]</td> </tr> </tbody> </table> </div> </details>
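The filter-and-sort step in the Musixmatch examples above is plain pandas and works the same on any DataFrame. A minimal sketch on made-up data (the values below are illustrative, not real Musixmatch output):

```python
import pandas as pd

# Toy stand-in for the "top_tracks" response (made-up rows)
df = pd.DataFrame({
    "name": ["Dynamite", "Track B", "Track C"],
    "rating": [99, 98, 97],
    "is_explicit": [0, 1, 0],
})

# Keep non-explicit tracks, highest rating first
clean = df[df["is_explicit"] == 0].sort_values("rating", ascending=False).reset_index(drop=True)
print(clean["name"].tolist())  # ['Dynamite', 'Track C']
```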

Spotify -- Collect Albums, Artists, and Tracks Metadata

<details> <summary>How many followers does Eminem have?</summary>
from dataprep.connector import connect

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

df = await conn_spotify.query("artist", q="Eminem", _count=500)

df.loc[df['# followers'].idxmax(), '# followers']
41157398
</details> <details> <summary>How many singles does Pink Floyd have that are available in Canada?</summary>
from dataprep.connector import connect

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

artist_name = "Pink Floyd"
df = await conn_spotify.query("album", q = artist_name, _count = 500)

df = df.loc[[(artist_name in x) for x in df['artist']]]
df = df.loc[[('CA' in x) for x in df['available_markets']]]
df = df.loc[df['total_tracks'] == '1']
df.shape[0]
12
</details> <details> <summary>In the last quarter of 2020, which artist released the album with the most tracks?</summary>
from dataprep.connector import connect
import pandas as pd

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

df = await conn_spotify.query("album", q = "2020", _count = 500)

df['date'] = pd.to_datetime(df['release_date'])
df = df[df['date'] > '2020-10-01'].drop(columns = ['image url', 'external urls', 'release_date'])
df['total_tracks'] = df['total_tracks'].astype(int)
df = df.loc[df['total_tracks'].idxmax()]
print(df['album_name'] + ", by " + df['artist'][0] + ", tracks: " + str(df['total_tracks']))
ASOT 996 - A State Of Trance Episode 996 (Top 50 Of 2020 Special), by Armin van Buuren ASOT Radio, tracks: 172
</details> <details> <summary>Who is the most popular artist: Eminem, Beyonce, Pink Floyd, or Led Zeppelin?</summary>
# and what are their popularity ratings?
from dataprep.connector import connect

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

artists_and_popularity = []
for artist in ['Beyonce', 'Pink Floyd', 'Eminem', 'Led Zeppelin']:
    df = await conn_spotify.query("artist", q = artist, _count = 500)
    # popularity rating of the best match (the result with the most followers)
    popularity = df.loc[df['# followers'].idxmax(), 'popularity']
    artists_and_popularity.append((artist, popularity))

print(sorted(artists_and_popularity, key=lambda x: x[1], reverse=True))
[('Eminem', 94.0), ('Beyonce', 88.0), ('Pink Floyd', 83.0), ('Led Zeppelin', 81.0)]
</details> <details> <summary>Who are the top 5 artists with the most followers from the current Billboard top 100 artists?</summary>
from dataprep.connector import connect
from bs4 import BeautifulSoup
import requests

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

web_page = requests.get("https://www.billboard.com/charts/artist-100")
html_soup = BeautifulSoup(web_page.text, 'html.parser')
artist_100 = html_soup.find_all('span', class_ = 'chart-list-item__title-text')

artists = {}
artists_top5 = []
for artist in artist_100:
    df_temp = await conn_spotify.query("artist", q = artist.text.strip(), _count = 10)
    df_temp = df_temp.loc[df_temp['popularity'].idxmax()]
    artists[df_temp['name']] = df_temp['# followers']
artists_top5 = sorted(artists, key = artists.get, reverse = True)[:5]
artists_top5
['Ed Sheeran', 'Ariana Grande', 'Drake', 'Justin Bieber', 'Eminem']
</details> <details> <summary>Among the top 10 most popular albums of 2020 from rollingstone.com, which album is sold in the most markets (countries)?</summary>
from dataprep.connector import connect
import asyncio
import pandas as pd

# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)

def count_markets(text):
    lst = text.split(',')
    return len(lst)

album_artists = ["Folklore", "Fetch the Bolt Cutters", "YHLQMDLG", "Rough and Rowdy Ways", "Future Nostalgia",
                 "RTJ4", "Saint Cloud", "Eternal Atake", "What’s Your Pleasure", "Punisher"]

album_list = [conn_spotify.query("album", q = name, _count = 1) for name in album_artists]
combined = asyncio.gather(*album_list)
df = pd.concat(await combined).reset_index()
df = df.drop(columns = ['image url', 'external urls', 'index'])
df['market_count'] = df['available_markets'].apply(lambda x: count_markets(x))
df = df.loc[df['market_count'].idxmax()]
print(df['album_name'] + ", by " + df['artist'][0] + ", with " + str(df['market_count']) + " available countries")
folklore, by Taylor Swift, with 92 available countries

</details>
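The market-counting trick in the last Spotify example is just string splitting plus `idxmax`. A minimal sketch on made-up data (the album rows below are illustrative, not real Spotify output):

```python
import pandas as pd

def count_markets(text):
    # Number of comma-separated market codes in the string
    return len(text.split(','))

# Toy stand-in for the "album" response
df = pd.DataFrame({
    "album_name": ["folklore", "Punisher"],
    "available_markets": ["CA,US,GB", "CA,US"],
})
df["market_count"] = df["available_markets"].apply(count_markets)
best = df.loc[df["market_count"].idxmax()]
print(best["album_name"])  # folklore
```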

iTunes -- Collect iTunes Data

<details> <summary>What audio and video content is available for Jack Johnson?</summary>
from dataprep.connector import connect

conn_itunes = connect('itunes')
df = await conn_itunes.query('search', term="jack+johnson")
df
idTypekindartistNamecollectionNametrackNametrackTime
0tracksongJack JohnsonJack Johnson and Friends: Sing-A-Longs and Lul...Upside Down208643
1tracksongJack JohnsonIn Between Dreams (Bonus Track Version)Better Together207679
2tracksongJack JohnsonIn Between Dreams (Bonus Track Version)Sitting, Waiting, Wishing183721
.....................
49tracksongJack JohnsonSleep Through the StaticWhile We Wait86112
</details> <details> <summary>How to compute the average track time of Rich Brian's music videos?</summary>
from dataprep.connector import connect

conn_itunes = connect('itunes')
df = await conn_itunes.query("search", term="rich+brian", entity="musicVideo")
avg_track_time = df['trackTime'].mean()/(1000*60)
print("The average track time is {:.3} minutes.".format(avg_track_time))

The average track time is 4.13 minutes.

</details> <details> <summary>How to get all Ang Lee's movies made in the United States?</summary>
from dataprep.connector import connect

conn_itunes = connect('itunes')
df = await conn_itunes.query("search", term="Ang+Lee", entity="movie", country="us")
df = df[df['artistName']=='Ang Lee']
df
idtypekindartistNamecollectionNametrackNametrackTime
0trackfeature-movieAng LeeFox 4K HDR Drama CollectionLife of Pi7642675
1trackfeature-movieAng LeeNoneGemini Man7049958
.....................
11trackfeature-movieAng LeeNoneRide With the Devil8290498
</details>
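The unit conversion in the Rich Brian example deserves a note: `trackTime` is in milliseconds, so dividing the mean by `1000 * 60` gives minutes. A sketch with made-up numbers:

```python
import pandas as pd

# Toy trackTime values in milliseconds (made-up, not real iTunes data)
df = pd.DataFrame({"trackTime": [240000, 255600]})
avg_track_time = df["trackTime"].mean() / (1000 * 60)  # ms -> minutes
print("The average track time is {:.3} minutes.".format(avg_track_time))  # 4.13
```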

Networking

IPLegit -- Collect IP Address Data

<details> <summary>How can I check if an IP address is bad, so I can block it from accessing my website?</summary>
from dataprep.connector import connect

# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})

ip_addresses = ['16.210.143.176', 
                '98.124.198.1', 
                '182.50.236.215', 
                '90.104.138.217', 
                '61.44.131.150', 
                '210.64.150.243', 
                '89.141.156.184']

for ip in ip_addresses:
    ip_status = await conn_iplegit.query('status', ip=ip)
    bad_status = ip_status['bad_status'].get(0)
    if bad_status:
        print('block ip address: ', ip_status['ip'].get(0))

block ip address: 98.124.198.1

</details> <details> <summary>Which country are most of my website's visitors from?</summary>
from dataprep.connector import connect
import pandas as pd

# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})

ip_addresses = ['16.210.143.176', 
                '98.124.198.1', 
                '182.50.236.215', 
                '90.104.138.217', 
                '61.44.131.150', 
                '210.64.150.243', 
                '89.141.156.184',
                '85.94.168.133', 
                '98.14.201.52', 
                '98.57.106.207', 
                '185.254.139.250', 
                '206.246.126.82', 
                '147.44.75.68', 
                '123.42.224.40', 
                '253.29.140.44', 
                '97.203.209.153', 
                '196.63.36.253']

ip_details = []
for ip in ip_addresses:
    ip_details.append(await conn_iplegit.query('details', ip=ip))

df = pd.concat(ip_details)
df.country.mode().get(0)

'UNITED STATES'

</details> <details> <summary>Make a map showing locations of people who have visited my website.</summary>
from dataprep.connector import connect
import pandas as pd
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame

# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})

ip_addresses = ['16.210.143.176', 
                '98.124.198.1', 
                '182.50.236.215', 
                '90.104.138.217', 
                '61.44.131.150', 
                '210.64.150.243', 
                '89.141.156.184',
                '85.94.168.133', 
                '98.14.201.52', 
                '98.57.106.207', 
                '185.254.139.250', 
                '206.246.126.82', 
                '147.44.75.68', 
                '123.42.224.40', 
                '253.29.140.44', 
                '97.203.209.153', 
                '196.63.36.253']

ip_details = []
for ip in ip_addresses:
    ip_details.append(await conn_iplegit.query('details', ip=ip))

df = pd.concat(ip_details)
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]
gdf = GeoDataFrame(df, geometry=geometry)   

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15);

png

</details>
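The country lookup above boils down to concatenating the per-IP result frames and taking the mode. A minimal sketch on made-up detail frames (the IPs and countries below are illustrative, not real IPLegit responses):

```python
import pandas as pd

# Toy stand-ins for per-IP frames returned by the 'details' endpoint
ip_details = [
    pd.DataFrame({"ip": ["98.124.198.1"], "country": ["UNITED STATES"]}),
    pd.DataFrame({"ip": ["90.104.138.217"], "country": ["FRANCE"]}),
    pd.DataFrame({"ip": ["98.14.201.52"], "country": ["UNITED STATES"]}),
]
df = pd.concat(ip_details)
most_common = df["country"].mode().get(0)
print(most_common)  # UNITED STATES
```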

News

Guardian -- Collect Guardian News Data

<details> <summary>Which news section contains the most mentions of COVID-19?</summary>
from dataprep.connector import connect

conn_guardian = connect('guardian', update = True, _auth={'access_token': API_key}, _concurrency=3)
df = await conn_guardian.query('article', _q='covid 19', _count=1000)
df.groupby('section').count().sort_values("headline", ascending=False)

sectionheadlineurlpublish_date
World news378378378
Business103103103
US news767676
Opinion727272
Sport535353
Australia news494949
Society444444
Politics343434
Football282828
Global development262626
UK news262626
Education171717
Environment141414
Technology101010
Film101010
Science888
Books888
Life and style777
Television & radio666
Media444
Culture444
Stage444
News444
Travel222
WEHI: Brighter together222
Xero: Resilient business222
Money222
The new rules of work111
LinkedIn: Hybrid workplace111
Global111
Getting back on track111
Westpac Scholars: Rethink tomorrow111
Food111
All together111
</details> <details> <summary>Find articles about COVID-19 precautions.</summary>
from dataprep.connector import connect

conn_guardian = connect('guardian', update = True, _auth={'access_token': API_key}, _concurrency=3)
df2 = await conn_guardian.query('article', _q='covid 19 protect',  _count=100)
df2[df2.section=='Opinion']
idheadlinesectionurlpublish_date
0Billionaires made $1tn since Covid-19. They ca...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-12-09T11:32:20Z
1Jeff Bezos became even richer thanks to Covid-...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-12-13T07:30:00Z
20Here's how to tackle the Covid-19 anti-vaxxers...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-11-26T16:02:14Z
41Can the UK deliver on the Covid vaccine rollou...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-12-11T09:00:24Z
68Covid-19 has turned back the clock on working ...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-12-10T14:19:27Z
84The Guardian view on Covid-19 promises: season...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-12-14T18:42:10Z
88The Guardian view on responding to the Covid-1...Opinionhttps://www.theguardian.com/commentisfree/2020...2020-12-30T18:58:05Z
</details>
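The section ranking above is a plain `groupby` count followed by a sort. A minimal sketch on made-up articles (not real Guardian data):

```python
import pandas as pd

# Toy article frame with the two columns the groupby uses
df = pd.DataFrame({
    "section": ["World news", "Business", "World news", "Sport"],
    "headline": ["h1", "h2", "h3", "h4"],
})
counts = df.groupby("section").count().sort_values("headline", ascending=False)
print(counts.index[0])  # World news
```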

Times -- Collect New York Times Data

<details> <summary>Who is the author of the article 'Yellen Outlines Economic Priorities, and Republicans Draw Battle Lines'?</summary>
from dataprep.connector import connect

# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q='Yellen Outlines Economic Priorities, and Republicans Draw Battle Lines')
df[["authors"]]
idauthors
0By Alan Rappeport
</details> <details> <summary>What is the most recent news from Ottawa?</summary>
from dataprep.connector import connect

# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q="ottawa",sort='newest')
df[['headline','authors','abstract','url','pub_date']].head(1)
headline...pub_date
021 Men Accuse Lincoln Project Co-Founder of Online Harassment...2021-01-31T14:48:35+0000
</details> <details> <summary>What are Headlines of articles where Trump was mentioned in the last 6 months of 2020 in the technology news section</summary>
from dataprep.connector import connect

# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q="Trump",fq='section_name:("technology")',begin_date='20200630',end_date='20201231',sort='newest', _count=50)

print(df['headline'])
print("Trump was mentioned in " + str(len(df)) + " articles")
idheadline
0No, Trump cannot win Georgia’s electoral votes through a write-in Senate campaign.
1How Misinformation ‘Superspreaders’ Seed False Election Theories
2No, Trump’s sister did not publicly back him. He was duped by a fake account.
.....
49Trump Official’s Tweet, and Its Removal, Set Off Flurry of Anti-Mask Posts

Trump was mentioned in 50 articles

</details> <details> <summary>What is the ranking of times a celebrity is mentioned in a headline in latter half of 2020?</summary>
from dataprep.connector import connect
import pandas as pd
# You can get ”times_access_token“ by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
celeb_list = ['Katy Perry', 'Taylor Swift', 'Lady Gaga', 'BTS', 'Rihanna', 'Kim Kardashian']
number_of_mentions = []
for i in celeb_list:
    df1 = await conn_times.query('ac',q=i,begin_date='20200630',end_date='20201231')
    df1 = df1[df1['headline'].str.contains(i)]
    a = len(df1['headline'])
    number_of_mentions.append(a)

print(number_of_mentions)
    
ranking_df = pd.DataFrame({'name': celeb_list, 'number of mentions': number_of_mentions})
ranking_df = ranking_df.sort_values(by=['number of mentions'], ascending=False)
ranking_df

[2, 6, 3, 6, 1, 0]

namenumber of mentions
1Taylor Swift6
3BTS6
2Lady Gaga3
0Katy Perry2
4Rihanna1
5Kim Kardashian0
</details>
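The celebrity ranking boils down to counting substring matches with `str.contains` and sorting the resulting frame. A minimal sketch on made-up headlines (the real code pulls them from the Times API):

```python
import pandas as pd

# Toy headlines, purely illustrative
headlines = pd.Series([
    "Taylor Swift Releases New Album",
    "BTS Tops the Charts",
    "Markets Rally on Vaccine News",
])
celeb_list = ["Taylor Swift", "BTS", "Rihanna"]
number_of_mentions = [int(headlines.str.contains(name).sum()) for name in celeb_list]
ranking_df = pd.DataFrame({"name": celeb_list, "number of mentions": number_of_mentions})
ranking_df = ranking_df.sort_values(by=["number of mentions"], ascending=False)
print(ranking_df)
```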

Science

DBLP -- Collect Computer Science Publication Data

<details> <summary>Who wrote this paper?</summary>
from dataprep.connector import connect
conn_dblp = connect("dblp")
df = await conn_dblp.query("publication", q = "Scikit-learn: Machine learning in Python", _count = 1)
df[["title", "authors", "year"]]
idtitleauthorsyear
0Scikit-learn - Machine Learning in Python.[Fabian Pedregosa, Gaël Varoquaux, Alexandre G...2011
</details> <details> <summary>How to fetch all publications of Andrew Y. Ng?</summary>
from dataprep.connector import connect

conn_dblp = connect("dblp", _concurrency = 5)
df = await conn_dblp.query("publication", author = "Andrew Y. Ng", _count = 2000)
df[["title", "authors", "venue", "year"]].reset_index(drop=True)
idtitleauthorsvenueyear
0The 1st Agriculture-Vision Challenge - Methods...[Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennife...[CVPR Workshops]2020
...............
242An Experimental and Theoretical Comparison of ...[Michael J. Kearns, Yishay Mansour, Andrew Y. ...[COLT]1995
</details> <details> <summary>How to fetch all publications of NeurIPS 2020?</summary>
from dataprep.connector import connect

conn_dblp = connect("dblp", _concurrenncy = 5)
df = await conn_dblp.query("publication", q = "NeurIPS 2020", _count = 5000)

# filter non-neurips-2020 papers
mask = df.venue.apply(lambda x: 'NeurIPS' in x)
df = df[mask]
df = df[(df['year'] == '2020')]
df[["title", "venue", "year"]].reset_index(drop=True)
idtitlevenueyear
0Towards More Practical Adversarial Attacks on ...[NeurIPS]2020
............
1899Triple descent and the two kinds of overfittin...[NeurIPS]2020
</details>
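The NeurIPS filter above uses a lambda because `venue` holds a list per row, so a plain equality test won't work. A minimal sketch on made-up publications (not real DBLP data):

```python
import pandas as pd

# Toy publication frame; venue is a list per row, as in the DBLP response
df = pd.DataFrame({
    "title": ["Paper A", "Paper B", "Paper C"],
    "venue": [["NeurIPS"], ["ICML"], ["NeurIPS"]],
    "year": ["2020", "2020", "2019"],
})
mask = df["venue"].apply(lambda v: "NeurIPS" in v)
neurips_2020 = df[mask & (df["year"] == "2020")].reset_index(drop=True)
print(neurips_2020["title"].tolist())  # ['Paper A']
```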

Shopping

Etsy -- Collect Handmade Marketplace Data

<details> <summary>What are the products I can get when I search for "winter jackets"?</summary>
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth={'access_token': etsy_access_key}, _concurrency = 5)
# Item search
df = await conn_etsy.query("items", keywords = "winter jackets")
df[['title',"url","description","price","currency"]]
idtitleurldescriptionpricecurrencyquantity
0White coat,cashmere coat,wool jacket with belt...https://www.etsy.com/listing/646692584/white-c...★Please leave your phone number to me while yo...183.00USD1
1Vintage 90's Nike ACG Parka Jacket Large N...https://www.etsy.com/listing/937300597/vintage...Vintage 90's Nike ACG Parka Jacket Large N...110.00USD1
...... ...... ...... ............
24Miss yo 2018 Vintage Checker Jacket for Blythe...https://www.etsy.com/listing/613790308/miss-yo...~~ Welcome to our shop ~~\n\nSet include:\n1 Vin...52.00SGD1
</details> <details> <summary>How many favorites does the shop “CrazedGaming” have?</summary>
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth={'access_token': etsy_access_key}, _concurrency = 5)

# Shop search
df = await conn_etsy.query("shops", shop_name = "CrazedGaming",  _count = 1)
df[["name", "url", "favorites"]]
idNameUrlFavorites
0CrazedGaminghttps://www.etsy.com/shop/CrazedGaming?utm_sou...265
</details> <details> <summary>What are the top 10 custom photo pillows ranked by number of favorites?</summary>
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth = {"access_token": etsy_access_key}, _concurrency = 5)

# Item search sort by favorites
df_cp_pillow = await conn_etsy.query("items", keywords = "custom photo pillow", _count = 7000)
df_cp_pillow = df_cp_pillow.sort_values(by = ['favorites'], ascending = False)
df_top10_cp_pillow = df_cp_pillow.iloc[:10]
df_top10_cp_pillow[['title', 'price', 'currency', 'favorites', 'quantity']]
idtitlepricecurrencyfavoritesquantity
68Custom Pet Photo Pillow, Valentines Day Gift, ...29.99USD9619.0320.0
193Custom Shaped Dog Photo Pillow Personalized Mo...29.99USD5523.0941.0
374Custom PILLOW Pet Portrait - Pet Portrait Pill...49.95USD5007.074.0
196Personalized Cat Pillow Mothers Day Gift for M...29.99USD3839.0939.0
69Photo Sequin Pillow Case, Personalized Sequin ...25.49USD3662.0675.0
637Family photo sequin pillow | custom image reve...28.50USD3272.0540.0
44Custom Pet Pillow Custom Cat Pillow best cat l...20.95USD2886.014.0
646Sequin Pillow with Photo Personalized Photo Re...32.00USD2823.01432.0
633Personalized Name Pillow, Baby shower gift, Ba...16.00USD2511.06.0
4416Letter C pillow Custom letter Alphabet pillow ...24.00USD2284.04.0
</details> <details> <summary>What are the prices of active products with quantity greater than 10 for the search keyword "blue 2021 weekly spiral planner"?</summary>
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth={'access_token': etsy_access_key}, _concurrency = 5)

# Item search and filters
planner_df = await conn_etsy.query("items", keywords = "blue 2021 weekly spiral planner", _count = 100)

result_df = planner_df[((planner_df['state'] == 'active') & (planner_df['quantity'] > 10))]
result_df
idtitlestateurldescriptionpricecurrencyquantityviewsfavorites
12021 Plaid About You Medium Daily Weekly Month...activehttps://www.etsy.com/listing/789842329/2021-pl...Planning and organizing life is a snap with th...15.99USD49610011
22021 Undated Diary Planner , Notebook Weekly D...activehttps://www.etsy.com/listing/917640414/2021-un...A6 2021 Yearly Monthly Weekly Agenda Planner ,...12.00GBP7923433168
.... ......... ...... .................
85July 2020-June 2021 Big Blue Year Large Daily ...activehttps://www.etsy.com/listing/776300099/july-20...This 12-month academic year planner offers a c...6.95USD49345431
</details> <details> <summary>What's the average price of blue denim frayed jackets sold in USD on Etsy?</summary>
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth = {"access_token": etsy_access_key}, _concurrency = 5)

# Item search and filters 
df_dbfjacket = await conn_etsy.query("items", keywords = "blue denim frayed jacket", _count = 500)
df_dbfjacket = df_dbfjacket[df_dbfjacket['currency'] == 'USD'].copy()
df_dbfjacket['price'] = df_dbfjacket['price'].astype(float)

# Calculate average price
average_price = round(df_dbfjacket['price'].mean(), 2)
print("The average price for blue denim frayed jacket is: $", average_price)

The average price for blue denim frayed jacket is: $ 58.82

</details> <details> <summary>What are the top 10 most viewed items for the keyword “ceramic wind chimes” with the word “handmade” in the description?</summary>
from dataprep.connector import connect

# You can get ”etsy_access_key“ by following https://www.etsy.com/developers/documentation/getting_started/oauth
conn_etsy = connect("etsy", _auth = {"access_token": etsy_access_key}, _concurrency = 5)

# Item search
df = await conn_etsy.query("items", keywords = "ceramic wind chimes",  _count = 2000)

# Filter and sorting
df = df[(df["description"].str.contains('handmade'))]
new_df = df[["title", "url", "views"]]
new_df.sort_values(by="views", ascending=False).reset_index(drop=True).head(10)
idtitleurlviews
0Hanging ceramic wind chime in gloss white glaz...https://www.etsy.com/listing/101462779/hanging...24406
1Trending Now! Best Seller Birthday Gift for Mo...https://www.etsy.com/listing/555128094/trendin...17058
2Beautiful Ceramic outdoor hanging wind chime -...https://www.etsy.com/listing/155966922/beautif...9758
3Wind Chime, Garden Yard Art for Outdoor Home D...https://www.etsy.com/listing/159252106/wind-ch...8850
4Ceramic cow bells | wind chime bell | wall han...https://www.etsy.com/listing/538608210/ceramic...6540
5Mom Gift Ideas Housewarming Gifts Garden Decor...https://www.etsy.com/listing/171539253/mom-gif...6123
6Ceramic Wind Chimes single strand Wall Hanging...https://www.etsy.com/listing/598234797/ceramic...5288
7Handcraft Ceramic Bird Wind Chime/ Bird Windch...https://www.etsy.com/listing/697798625/handcra...4733
8Glass Wind Chime Green Leaves Windchime Garden...https://www.etsy.com/listing/744753959/glass-w...4579
9Handmade ceramic and driftwood wind chimes Bea...https://www.etsy.com/listing/615210251/handmad...2774
</details>
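A detail worth noting in the Etsy average-price example: `price` comes back as a string, so only that column should be converted to float before averaging (converting the whole frame would fail on text columns). A minimal sketch on made-up prices:

```python
import pandas as pd

# Toy listing frame; prices are strings, as returned by the API
df = pd.DataFrame({
    "price": ["29.99", "49.95", "20.95"],
    "currency": ["USD", "USD", "GBP"],
})
usd = df[df["currency"] == "USD"].copy()
usd["price"] = usd["price"].astype(float)
average_price = round(usd["price"].mean(), 2)
print(average_price)  # 39.97
```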

Social

Twitch -- Collect Twitch Streams and Channels Information

<details> <summary>How many followers does the Twitch user "Logic" have?</summary>
from dataprep.connector import connect

# You can get ”twitch_access_token“ by registering https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth={"access_token":twitch_access_token}, _concurrency=3)

df = await conn_twitch.query("channels", query="logic", _count = 1000)

df = df.where(df['name'] == 'logic').dropna()
df = df[['name', 'followers']]
df.reset_index()
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>index</th> <th>name</th> <th>followers</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>0</td> <td>logic</td> <td>540274.0</td> </tr> </tbody> </table> </div> </details> <details> <summary>Which 5 Twitch users that speak English have the most views and what games do they play?</summary>
from dataprep.connector import connect

# You can get "twitch_access_token" by registering at https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth={"access_token":twitch_access_token}, _concurrency=3)

df = await conn_twitch.query("channels",query="%", _count = 1000)

df = df[df['language'] == 'en']
df = df.sort_values('views', ascending = False)
df = df[['name', 'views', 'game', 'language']]
df = df.head(5)
df.reset_index()
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>index</th> <th>name</th> <th>views</th> <th>game</th> <th>language</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>495</td> <td>Fextralife</td> <td>1280705870</td> <td>The Elder Scrolls Online</td> <td>en</td> </tr> <tr> <th>1</th> <td>9</td> <td>Riot Games</td> <td>1265668908</td> <td>League of Legends</td> <td>en</td> </tr> <tr> <th>2</th> <td>16</td> <td>ESL_CSGO</td> <td>548559390</td> <td>Counter-Strike: Global Offensive</td> <td>en</td> </tr> <tr> <th>3</th> <td>160</td> <td>BeyondTheSummit</td> <td>462493560</td> <td>Dota 2</td> <td>en</td> </tr> <tr> <th>4</th> <td>1</td> <td>shroud</td> <td>433902453</td> <td>Rust</td> <td>en</td> </tr> </tbody> </table> </div> </details> <details> <summary>Which channel has the most viewers for each of the top 10 games?</summary>
from dataprep.connector import connect

# You can get "twitch_access_token" by registering at https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth={"access_token":twitch_access_token}, _concurrency=3)

df = await conn_twitch.query("streams", query="%", _count = 1000)

# Group by games, sum viewers and sort by total viewers
df_new = df.groupby(['game'], as_index = False)['viewers'].agg('sum').rename(columns = {'game':'games', 'viewers':'total_viewers'})
df_new = df_new.sort_values('total_viewers',ascending = False)

# Select the channel with most viewers from each game 
df_2 = df.loc[df.groupby(['game'])['viewers'].idxmax()]

# Select the most popular channels for each of the 10 most popular games
df_new = df_new.head(10)['games']
best_games = df_new.tolist()
result_df = df_2[df_2['game'].isin(best_games)]
result_df = result_df.head(10)
result_df = result_df[['game','channel_name', 'viewers']]
result_df.reset_index()
<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>index</th> <th>game</th> <th>channel_name</th> <th>viewers</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>3</td> <td></td> <td>seonghwazip</td> <td>32126</td> </tr> <tr> <th>1</th> <td>21</td> <td>Call of Duty: Warzone</td> <td>FaZeBlaze</td> <td>7521</td> </tr> <tr> <th>2</th> <td>9</td> <td>Dota 2</td> <td>dota2mc_ru</td> <td>16118</td> </tr> <tr> <th>3</th> <td>2</td> <td>Escape From Tarkov</td> <td>summit1g</td> <td>33768</td> </tr> <tr> <th>4</th> <td>15</td> <td>Fortnite</td> <td>Fresh</td> <td>10371</td> </tr> <tr> <th>5</th> <td>8</td> <td>Hearthstone</td> <td>SilverName</td> <td>16765</td> </tr> <tr> <th>6</th> <td>22</td> <td>Just Chatting</td> <td>Trainwreckstv</td> <td>6927</td> </tr> <tr> <th>7</th> <td>0</td> <td>League of Legends</td> <td>LCK_Korea</td> <td>77613</td> </tr> <tr> <th>8</th> <td>10</td> <td>Minecraft</td> <td>Tfue</td> <td>15209</td> </tr> <tr> <th>9</th> <td>11</td> <td>VALORANT</td> <td>TenZ</td> <td>13617</td> </tr> </tbody> </table> </div> </details> <details> <summary>(1) What is the number of Fortnite and Valorant streams in the past 24 hours? (2) Is there any relationship between viewers and channel followers? </summary>
from dataprep.connector import connect
import pandas as pd

# You can get "twitch_access_token" by registering at https://www.twitch.tv/signup
conn_twitch = connect("twitch", _auth = {"access_token":twitch_access_token}, _concurrency = 3)

df = await conn_twitch.query("streams", query = "%fortnite%VALORANT%", _count = 1000)

df = df[['stream_created_at', 'game', 'viewers', 'channel_followers']]
df['stream_created_at'] = df['stream_created_at'].astype('str') # Convert date to string

for idx, value in enumerate(df['stream_created_at']):
    df.loc[idx, 'stream_created_at'] = value[0:10] + ' ' + value[11:19] # Extract "YYYY-MM-DD HH:MM:SS" (assumes ISO-format strings)

df['stream_created_at'] = pd.to_datetime(df['stream_created_at']) 
df['diff'] = pd.Timestamp.now() - df['stream_created_at'] # Age of each stream
df['diff'] = df['diff'].dt.total_seconds().astype('int')

df2 = df[['channel_followers', 'viewers']].corr(method='pearson') # Find correlation (part 2)

df = df[df['diff'] < 86400] # Keep streams started in the last 24 hours (86,400 seconds)

options = ['Fortnite', 'VALORANT']
df = df[df['game'].isin(options)]
df = df.groupby(['game'], as_index=False)['diff'].agg('count').rename(columns={'diff':'count'})

# Print correlation part 2
print("Correlation between viewers and channel followers:")
print(df2)

# Print part 1
print('Number of streams in the past 24 hours:')
df
Correlation between viewers and channel followers:
                   channel_followers   viewers
channel_followers           1.000000  0.851698
viewers                     0.851698  1.000000

Number of streams in the past 24 hours:

<div> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>game</th> <th>count</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Fortnite</td> <td>3</td> </tr> <tr> <th>1</th> <td>VALORANT</td> <td>3</td> </tr> </tbody> </table> </div> </details>
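The per-game "top channel" selection above hinges on `groupby(...)['viewers'].idxmax()`, which returns the row label of each group's maximum; `.loc` then pulls those full rows back out. A minimal sketch with made-up stream data (names and numbers are illustrative):

```python
import pandas as pd

# Toy stand-in for the streams response
streams = pd.DataFrame({
    "game": ["Dota 2", "Dota 2", "Fortnite", "Fortnite"],
    "channel_name": ["a", "b", "c", "d"],
    "viewers": [100, 250, 80, 40],
})

# idxmax gives the index label of each group's largest 'viewers' value;
# .loc retrieves the complete rows at those labels
top_per_game = streams.loc[streams.groupby("game")["viewers"].idxmax()]
print(top_per_game[["game", "channel_name"]].to_dict("records"))
```

This keeps every column of the winning row, which is why the real example can report `channel_name` and `viewers` side by side afterwards.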

Twitter -- Collect Tweet Information

<details> <summary>What are the 10 latest English tweets from the SFU handle (@SFU)?</summary>
from dataprep.connector import connect

dc = connect('twitter', _auth={'client_id':client_id, 'client_secret':client_secret})

# Querying 100 tweets from @SFU
df = await dc.query("tweets", q="from:@SFU -is:retweet", _count=100)

# Filtering english language tweets
df = df[df['iso_language_code'] == 'en'][['created_at', 'text']]

# Displaying latest 10 tweets
df = df.iloc[0:10,]
print('-----------')
for index, row in df.iterrows():   
    print(row['created_at'], row['text'])
    print('-----------')
-----------
Mon Feb 01 23:59:16 +0000 2021 Thank you to these #SFU student athletes for sharing their insights. #BlackHistoryMonth2021 https://t.co/WGCvGrQOzu
-----------
Mon Feb 01 23:00:56 +0000 2021 How can #SFU address issues of inclusion &amp; access for #Indigenous students &amp; work with them to support their educat… https://t.co/knEM0SSHYu
-----------
Mon Feb 01 21:37:30 +0000 2021 DYK: New #SFU research shows media gender bias; men are quoted 3 times more often than women. #GenderGapTracker loo… https://t.co/c77PsNUIqV
-----------
Mon Feb 01 19:55:03 +0000 2021 With the temperatures dropping, how will you keep warm this winter? Check out our tips on what to wear (and footwea… https://t.co/EOCuYbio4P
-----------
Mon Feb 01 18:06:49 +0000 2021 COVID-19 has affected different groups in unique ways. #SFU researchers looked at the stresses facing “younger” old… https://t.co/gMvcxOlWvb
-----------
Mon Feb 01 16:18:51 +0000 2021 Please follow @TransLink for updates. https://t.co/nQDZQ5JYlt
-----------
Fri Jan 29 23:00:02 +0000 2021 #SFU researchers Caroline Colijn and Paul Tupper performed a modelling exercise to see if screening with rapid test… https://t.co/07aU3SP0j2
-----------
Fri Jan 29 19:01:32 +0000 2021 un/settled, a towering photo-poetic piece at #SFU's Belzberg Library, aims to centre Blackness &amp; celebrate Black th… https://t.co/F6kp0Lwu5A
-----------
Fri Jan 29 17:02:34 +0000 2021 Learning that it’s okay to ask for help is an important part of self-care—and so is recognizing when you don't have… https://t.co/QARn1CRLyp
-----------
Fri Jan 29 00:44:11 +0000 2021 @shashjayy @shashjayy Hi Shashwat, I've spoken to my colleagues in Admissions. They're looking into it and will respond to you directly.
-----------
</details> <details> <summary>What are the top 10 users by retweet count?</summary>
from dataprep.connector import connect
import pandas as pd

dc = connect('twitter', _auth={'client_id':client_id, 'client_secret':client_secret})

# Querying 1000 retweets and filtering only english language tweets
df = await dc.query("tweets", q='RT AND is:retweet', _count=1000)
df = df[df['iso_language_code'] == 'en']

# Iterating over tweets to get users and Retweet Count
retweets = {}
for index, row in df.iterrows():
    if row['text'].startswith('RT'):
        # E.g. tweet 'RT @Crazyhotboye: NMS?\nLeveled up to 80'
        user_retweeted = row['text'][4:row['text'].find(':')]
        if user_retweeted in retweets:
            retweets[user_retweeted] += 1
        else:
            retweets[user_retweeted] = 1
          
# Sorting and displaying top 10 users
cols = ['User', 'RT_Count']
retweets_df = pd.DataFrame(list(retweets.items()), columns=cols)
retweets_df = retweets_df.sort_values(by=['RT_Count'], ascending=False).reset_index(drop=True).iloc[0:10,:]
retweets_df
| id | User | RT_Count |
|----|------|----------|
| 0 | John_Greed | 195 |
| 1 | uEatCrayons | 85 |
| 2 | Demo2020cracy | 78 |
| 3 | store_pup | 75 |
| 4 | miknitem_oasis | 61 |
| 5 | MarkCrypto23 | 54 |
| 6 | realmamivee | 52 |
| 7 | trailblazers | 50 |
| 8 | devilsvalentine | 40 |
| 9 | SharingforCari1 | 38 |
</details> <details> <summary>What are the current top 10 trending topics on Twitter, based on hashtag counts?</summary>
from dataprep.connector import connect
import pandas as pd
import json

dc = connect('twitter', _auth={'client_id':client_id, 'client_secret':client_secret})

pd.options.mode.chained_assignment = None
df = await dc.query("tweets", q=False, _count=2000)

def extract_tags(tags):
    tags_tolist = json.loads(tags.replace("'", '"'))
    only_tag = [str(t['text']) for t in tags_tolist]
    return only_tag

# Remove tweets that have no hashtags (an empty list serializes as '[]')
has_hashtags = df[df['hashtags'].str.len() > 2]
# Keep only English ('en') tweets
has_hashtags = has_hashtags[has_hashtags['iso_language_code'] == 'en']
has_hashtags['tag_list'] = has_hashtags['hashtags'].apply(extract_tags)
tags_and_text = has_hashtags[['text', 'tag_list']]
tag_count = tags_and_text.explode('tag_list').groupby(['tag_list']).agg(tag_count=('tag_list', 'count'))
# Remove tags with only one occurrence
tag_count = tag_count[tag_count['tag_count'] > 1]
tag_count = tag_count.sort_values(by=['tag_count'], ascending=False).reset_index()
# Top 10 hashtags
tag_count = tag_count.iloc[0:10,:]
tag_count
| id | tag_list | tag_count |
|----|----------|-----------|
| 0 | jobs | 52 |
| 1 | TractorMarch | 24 |
| 2 | corpsehusbandallegations | 22 |
| 3 | SidNaazians | 10 |
| 4 | GodMorningTuesday | 8 |
| 5 | SupremeGodKabir | 7 |
| 6 | hiring | 7 |
| 7 | نماز_راہ_نجات_ہے | 6 |
| 8 | London | 5 |
| 9 | TravelTuesday | 5 |
</details>
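The manual dictionary counting in the retweet example can be written more compactly with `collections.Counter`. A self-contained sketch over toy tweet texts; the `[4:text.find(':')]` slice for extracting the handle is the same trick used above:

```python
from collections import Counter

# Toy tweet texts standing in for the query results
tweets = [
    "RT @alice: hello world",
    "RT @bob: check this out",
    "RT @alice: again",
    "plain tweet with no retweet prefix",
]

# Extract the handle between "RT @" and the first colon
handles = [t[4:t.find(":")] for t in tweets if t.startswith("RT @")]
counts = Counter(handles)
print(counts.most_common(2))  # [('alice', 2), ('bob', 1)]
```

`Counter.most_common(n)` replaces the build-a-dict-then-sort step in one call.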

Video

Youtube -- Collect YouTube Content Metadata

<details> <summary>What are the top 10 Fitness Channels?</summary>
from dataprep.connector import connect, info

dc = connect('youtube', _auth={'access_token': auth_token})

df = await dc.query('videos', q='Fitness', part='snippet', type='channel', _count=10)
df[['title', 'description']]
| id | title | description |
|----|-------|-------------|
| 0 | Jordan Yeoh Fitness | Hey! Welcome to my Youtube channel! I got noth... |
| 1 | FitnessBlender | 600 free full length workout videos & counting... |
| 2 | The Fitness Marshall | Get early access to dances by clicking here: h... |
| 3 | POPSUGAR Fitness | POPSUGAR Fitness offers fresh fitness tutorial... |
| 4 | LiveFitness | Hi, I am Nicola and I love all things fitness!... |
| 5 | TpindellFitness | Strive for progress, not perfection. |
| 6 | Love Sweat Fitness | My personal weight loss journey of 45 pounds c... |
| 7 | Martial Arts Fitness | Welcome To My Channel. I love Martial Arts 🥇 ... |
| 8 | Zuzka Light | My name is Zuzka Light, and my channel is all ... |
| 9 | Fitness Factory Lüdenscheid | Schaut unter ff-luedenscheid.com Kostenlos übe... |
</details> <details> <summary>What are the top playlists for a list of singers?</summary>
from dataprep.connector import connect, info
import pandas as pd

dc = connect('youtube', _auth={'access_token': auth_token})

df = pd.DataFrame()
singers = [
  'taylor swift',
  'ed sheeran',
  'shawn mendes',
  'ariana grande',
  'michael jackson',
  'selena gomez',
  'lady gaga',
  'shreya ghoshal',
  'bruno mars',
  ]

for singer in singers:
    df1 = await dc.query('videos', q=singer, part='snippet', type='playlist',
                         _count=1)
    # DataFrame.append was removed in pandas 2.0; concat the new rows instead
    df = pd.concat([df, df1], ignore_index=True)

df[['title', 'description', 'channelTitle']]
| id | title | description | channelTitle |
|----|-------|-------------|--------------|
| 0 | Taylor Swift Discography |  | Sarah Bella |
| 1 | Ed Sheeran - New And Best Songs (2021) | Best Of Ed Sheeran 2021 \|\| Ed Sheeran Greatest... | Full Albums! |
| 2 | Shawn Mendes: The Album 2018 (Full Album) |  | WorldMusicStream |
| 3 | Ariana Grande - Positions (Full Album) | October 30, 2020. | lo115 |
| 4 | Michael Jackson Mix | Michael Jackson's Songs. | Leo Meneses |
| 5 | Selena Gomez - Rare [FULL ALBUM 2020] | selena gomez,selena gomez rare album,selena go... | THUNDERS |
| 6 | Lady Gaga - Greatest Hits | Lady Gaga - Greatest Hits 01 The Edge Of Glory... | Gunther Ruymen |
| 7 | Shreya Ghoshal Tamil Hit Songs \| #TamilSongs \|... |  | Sony Music South |
| 8 | The Best of Bruno Mars |  | Warner Music Australia |
</details> <details> <summary>What are the top 10 sports activities?</summary>
from dataprep.connector import connect, info
import pandas as pd
dc = connect('youtube', _auth={'access_token': auth_token})

df = await dc.query('videos', q='Sports', part='snippet', type='activity', _count=10)
df[['title', 'description', 'channelTitle']]
| id | title | description | channelTitle |
|----|-------|-------------|--------------|
| 0 | Sports Tak | Sports Tak, as the name suggests, is all about... | Sports Tak |
| 1 | Sports | sport : an activity involving physical exertio... | Sports |
| 2 | Greatest Sports Moments | UPDATE: I AM IN THE PROCESS OF MAKING REVISION... | WTD Productions |
| 3 | Viagra Boys - Sports (Official Video) | Director: Simon Jung DOP: Paul Evans Producer:... | viagra boys |
| 4 | Volleyball Open Tournament, Jagdev Kalan \|\| 12... | Volleyball Open Tournament, Jagdev Kalan \|\| 12... | Fine Sports |
| 5 | Beach Bunny - Sports | booking/inquires: beachbunnymusic@gmail.com hu... | Beach Bunny |
| 6 | Top 100 Best Sports Bloopers 2020 | Watch the Top 100 best sports bloopers from 20... | Crazy Laugh Action |
| 7 | Memorable Moments in Sports History | Memorable Moments in Sports History! SUBSCRİBE... | Cenk Bezirci |
| 8 | Craziest “Saving Lives” Moments in Sports History | Craziest “Saving Lives” Moments in Sports Hist... | Highlight Reel |
| 9 | Most Savage Sports Highlights on Youtube (S01E01) | I do these videos ever year or so, they are ba... | Joseph Vincent |
</details>
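Accumulating per-query results into one DataFrame is a recurring pattern in these examples. Since `DataFrame.append` was removed in pandas 2.0, the idiomatic approach is to collect frames in a list and call `pd.concat` once at the end. A minimal sketch with placeholder rows standing in for query results (titles are illustrative, not real API output):

```python
import pandas as pd

frames = []
for singer in ["taylor swift", "ed sheeran"]:
    # Stand-in for `await dc.query(...)`: one toy row per "query"
    frames.append(pd.DataFrame({"title": [f"top playlist for {singer}"]}))

# One concat at the end avoids quadratic copying from repeated row-by-row appends
df = pd.concat(frames, ignore_index=True)
print(len(df))  # 2
```

`ignore_index=True` renumbers the combined rows 0..N-1 so the per-query indices don't collide.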

Weather

OpenWeatherMap -- Collect Current and Historical Weather Data

<details> <summary>What is the temperature in London, Ontario?</summary>
from dataprep.connector import connect

# You can get "access_token" by signing up at https://openweathermap.org/
owm_connector = connect("openweathermap", _auth={"access_token": access_token})
df = await owm_connector.query('weather', q='London,Ontario,CA')
df[["temp"]]
| id | temp (K) |
|----|----------|
| 0 | 267.96 |
</details> <details> <summary>What is the wind speed in each provincial and territorial capital?</summary>
from dataprep.connector import connect
import pandas as pd
import asyncio

# You can get "access_token" by signing up at https://openweathermap.org/
conn = connect("openweathermap", _auth={'access_token': access_token}, _concurrency=5)

names = ["Edmonton", "Victoria", "Winnipeg", "Fredericton", "St. John's", "Halifax", "Toronto", "Charlottetown", \
 "Quebec City", "Regina", "Yellowknife", "Iqaluit", "Whitehorse"]

query_list = [conn.query("weather", q = name) for name in names]
results = asyncio.gather(*query_list)
df = pd.concat(await results)
df['name'] = names
df[["name", "wind"]].reset_index(drop=True)
| id | name | wind (m/s) |
|----|------|------------|
| 0 | Edmonton | 6.17 |
| 1 | Victoria | 1.34 |
| 2 | Winnipeg | 2.57 |
| 3 | Fredericton | 4.63 |
| 4 | St. John's | 5.14 |
| 5 | Halifax | 5.14 |
| 6 | Toronto | 1.76 |
| 7 | Charlottetown | 5.14 |
| 8 | Quebec City | 3.09 |
| 9 | Regina | 4.12 |
| 10 | Yellowknife | 3.60 |
| 11 | Iqaluit | 5.66 |
| 12 | Whitehorse | 9.77 |
</details>
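The wind-speed example issues its queries concurrently: it builds a list of coroutines and `await`s an `asyncio.gather` over them. The same pattern can be sketched without the API using toy coroutines (outside a notebook, the top-level `await` becomes `asyncio.run`):

```python
import asyncio

async def fake_query(city: str) -> dict:
    # Stand-in for conn.query("weather", q=city)
    await asyncio.sleep(0)
    return {"name": city, "wind": 1.0}

async def main() -> list:
    cities = ["Edmonton", "Victoria", "Winnipeg"]
    tasks = [fake_query(c) for c in cities]
    # gather runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print([r["name"] for r in results])  # ['Edmonton', 'Victoria', 'Winnipeg']
```

Because `gather` preserves input order, the example can safely assign `df['name'] = names` after concatenating the results.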

⬆️ Back to Index

Contributors ✨

Thanks goes to these wonderful people (emoji key):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section --> <!-- prettier-ignore-start --> <!-- markdownlint-disable --> <table> <tr> <td align="center"><a href="http://wooya.me"><img src="https://avatars1.githubusercontent.com/u/998606?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Weiyuan Wu</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=dovahcrow" title="Code">💻</a> <a href="#maintenance-dovahcrow" title="Maintenance">🚧</a></td> <td align="center"><a href="http://www.sfu.ca/~peiw/"><img src="https://avatars0.githubusercontent.com/u/15167104?v=4?s=100" width="100px;" alt=""/><br /><sub><b>peiwangdb</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=peiwangdb" title="Code">💻</a> <a href="#maintenance-peiwangdb" title="Maintenance">🚧</a></td> <td align="center"><a href="https://github.com/nick-zrymiak"><img src="https://avatars0.githubusercontent.com/u/35017006?v=4?s=100" width="100px;" alt=""/><br /><sub><b>nick-zrymiak</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=nick-zrymiak" title="Code">💻</a> <a href="#maintenance-nick-zrymiak" title="Maintenance">🚧</a></td> <td align="center"><a href="https://www.pallavibharadwaj.com"><img src="https://avatars1.githubusercontent.com/u/17384838?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Pallavi Bharadwaj</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=pallavibharadwaj" title="Code">💻</a></td> <td align="center"><a href="https://www.linkedin.com/in/hilal-asmat/"><img src="https://avatars1.githubusercontent.com/u/28606148?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Hilal Asmat</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=h-asmat" title="Documentation">📖</a></td> <td align="center"><a href="https://github.com/Wukkkinz-0725"><img src="https://avatars.githubusercontent.com/u/60677420?v=4?s=100" 
width="100px;" alt=""/><br /><sub><b>Wukkkinz-0725</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=Wukkkinz-0725" title="Code">💻</a> <a href="#maintenance-Wukkkinz-0725" title="Maintenance">🚧</a></td> <td align="center"><a href="https://github.com/Yizhou150"><img src="https://avatars.githubusercontent.com/u/62522644?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Yizhou</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=Yizhou150" title="Code">💻</a> <a href="#maintenance-Yizhou150" title="Maintenance">🚧</a></td> </tr> <tr> <td align="center"><a href="https://github.com/Lakshay-sethi"><img src="https://avatars.githubusercontent.com/u/58126894?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Lakshay-sethi</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=Lakshay-sethi" title="Code">💻</a> <a href="#maintenance-Lakshay-sethi" title="Maintenance">🚧</a></td> <td align="center"><a href="https://github.com/kla55"><img src="https://avatars.githubusercontent.com/u/65270679?v=4?s=100" width="100px;" alt=""/><br /><sub><b>kla55</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=kla55" title="Code">💻</a></td> <td align="center"><a href="https://github.com/hwec0112"><img src="https://avatars.githubusercontent.com/u/74725096?v=4?s=100" width="100px;" alt=""/><br /><sub><b>hwec0112</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=hwec0112" title="Code">💻</a></td> <td align="center"><a href="https://github.com/yxie66"><img src="https://avatars.githubusercontent.com/u/22486719?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Yi Xie</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=yxie66" title="Code">💻</a> <a href="#maintenance-yxie66" title="Maintenance">🚧</a></td> <td align="center"><a href="https://github.com/liuyejia"><img 
src="https://avatars.githubusercontent.com/u/10182591?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Yejia Liu</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=liuyejia" title="Code">💻</a></td> <td align="center"><a href="https://github.com/sahmad11"><img src="https://avatars.githubusercontent.com/u/53022377?v=4?s=100" width="100px;" alt=""/><br /><sub><b>sahmad11</b></sub></a><br /><a href="https://github.com/sfu-db/DataConnectorConfigs/commits?author=sahmad11" title="Code">💻</a></td> </tr> </table> <!-- markdownlint-restore --> <!-- prettier-ignore-end --> <!-- ALL-CONTRIBUTORS-LIST:END -->