I recently pulled data from the YouTube API, and I'm trying to create a data frame from that information.
When I loop through each item with the "print" function, I get 25 rows of output for each variable (which is what I want in the data frame I create).
How can I create a new data frame that contains those 25 rows instead of just 1 line?
This is how I currently loop through each item:
df = pd.DataFrame(columns=['video_title', 'video_id', 'date_created'])

# loop to build the columns for the DataFrame
x = 0
while x < len(response['items']):
    video_title = response['items'][x]['snippet']['title']
    video_id = response['items'][x]['id']['videoId']
    date_created = response['items'][x]['snippet']['publishedAt']
    x = x + 1
    # print(video_title, video_id)

df = df.append({'video_title': video_title, 'video_id': video_id,
                'date_created': date_created}, ignore_index=True)
=========ANSWER UPDATE==========
THANK YOU TO EVERYONE THAT GAVE INPUT !!!
The code that created the DataFrame was:
import pandas as pd

video_title = []
video_id = []
date_created = []

x = 0
while x < len(response['items']):
    video_title.append(response['items'][x]['snippet']['title'])
    video_id.append(response['items'][x]['id']['videoId'])
    date_created.append(response['items'][x]['snippet']['publishedAt'])
    x = x + 1
    # print(video_title, video_id)

df = pd.DataFrame({'video_title': video_title, 'video_id': video_id,
                   'date_created': date_created})
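For reference, the same frame can also be built in one pass with a list comprehension. The `response` dict below is a minimal stand-in for the real API payload, shaped like the fields used above:

```python
import pandas as pd

# minimal stand-in for the real YouTube API response
response = {'items': [
    {'id': {'videoId': 'abc123'},
     'snippet': {'title': 'First video', 'publishedAt': '2023-01-01T00:00:00Z'}},
    {'id': {'videoId': 'def456'},
     'snippet': {'title': 'Second video', 'publishedAt': '2023-01-02T00:00:00Z'}},
]}

# one dict per item, then a single DataFrame construction
rows = [{'video_title': item['snippet']['title'],
         'video_id': item['id']['videoId'],
         'date_created': item['snippet']['publishedAt']}
        for item in response['items']]
df = pd.DataFrame(rows)
print(df)
```

This avoids the manual index counter entirely; with the real response it would produce one row per item, just like the list-based version.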
Based on what I know about the YouTube API's return objects, the values of 'title', 'videoId' and 'publishedAt' are strings.
A strategy for making a single df from these strings is:
Store the strings in lists, so you will have three lists.
Convert the lists into a df.
You will get a df with x rows, based on the x values that are retrieved.
Example:
import pandas as pd

video_title = []
video_id = []
date_created = []

x = 0
while x < len(response['items']):
    video_title.append(response['items'][x]['snippet']['title'])
    video_id.append(response['items'][x]['id']['videoId'])
    date_created.append(response['items'][x]['snippet']['publishedAt'])
    x = x + 1
    # print(video_title, video_id)

df = pd.DataFrame({'video_title': video_title, 'video_id': video_id,
                   'date_created': date_created})
I have written code to retrieve JSON data from a URL. It works fine: I give a start and end date, and it loops through the date range and appends everything to a dataframe.
The columns are populated with the JSON sensor data and its corresponding values, so the column names look like sensor_1. When I request the data from the URL, it sometimes happens that new sensors appear while old ones are switched off and deliver no data anymore, so the length of the columns changes. In that case my code just adds new columns.
What I want instead of new columns is a new header in the ongoing dataframe.
What I currently get with my code:
datetime;sensor_1;sensor_2;sensor_3;new_sensor_8;new_sensor_9;sensor_10;sensor_11;
2023-01-01;23.2;43.5;45.2;NaN;NaN;NaN;NaN;NaN;
2023-01-02;13.2;33.5;55.2;NaN;NaN;NaN;NaN;NaN;
2023-01-03;26.2;23.5;76.2;NaN;NaN;NaN;NaN;NaN;
2023-01-04;NaN;NaN;NaN;75;12;75;93;123;
2023-01-05;NaN;NaN;NaN;23;31;24;15;136;
2023-01-06;NaN;NaN;NaN;79;12;96;65;72;
What I want:
datetime;sensor_1;sensor_2;sensor_3;
2023-01-01;23.2;43.5;45.2;
2023-01-02;13.2;33.5;55.2;
2023-01-03;26.2;23.5;76.2;
datetime;new_sensor_8;new_sensor_9;sensor_10;sensor_11;
2023-01-04;75;12;75;93;123;
2023-01-05;23;31;24;15;136;
2023-01-06;79;12;96;65;72;
My loop to retrieve the data:
import datetime
import json
from datetime import timedelta

import numpy as np
import pandas as pd
import requests

start_date = datetime.datetime(2023, 1, 1, 0, 0)
end_date = datetime.datetime(2023, 1, 6, 0, 0)

sensor_data = pd.DataFrame()
while start_date < end_date:
    q = 'url'
    r = requests.get(q)
    j = json.loads(r.text)
    sub_data = pd.DataFrame()
    if 'result' in j:
        # first column holds the timestamps; don't shadow the datetime module
        timestamps = pd.to_datetime(np.array(j['result']['data'])[:, 0])
        sensors = np.array(j['result']['sensors'])
        data = np.array(j['result']['data'])[:, 1:]
        df_new = pd.DataFrame(data, index=timestamps, columns=sensors)
        sub_data = pd.concat([sub_data, df_new])
    sensor_data = pd.concat([sensor_data, sub_data])
    start_date += timedelta(days=1)
If 2 DataFrames will do for you, then you can simply split using the column names:

df1 = df[['datetime', 'sensor_1', 'sensor_2', 'sensor_3']]
df2 = df[['datetime', 'new_sensor_8', 'new_sensor_9', 'sensor_10', 'sensor_11']]

Note the [[ used, and use .dropna() to lose the NaN rows.
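A minimal runnable sketch of that split-and-drop, using a small frame shaped like the example above (column names assumed from the question):

```python
import numpy as np
import pandas as pd

# small frame shaped like the example output in the question
df = pd.DataFrame({
    'datetime': ['2023-01-01', '2023-01-02', '2023-01-04'],
    'sensor_1': [23.2, 13.2, np.nan],
    'sensor_2': [43.5, 33.5, np.nan],
    'new_sensor_8': [np.nan, np.nan, 75.0],
})

# select the old-sensor columns, then drop the rows that are NaN there
df1 = df[['datetime', 'sensor_1', 'sensor_2']].dropna()
# same idea for the new-sensor columns
df2 = df[['datetime', 'new_sensor_8']].dropna()
print(df1)
print(df2)
```

Each resulting frame keeps only the date range in which its sensors were actually delivering data.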
My data is in the form given below
Number
ABCD0001
ABCD0002
ABCD0003
GHIJ768O
GHIJ7681
GHIJ7682
SEDFTH1
SEDFTH2
SEDFTH3
I want to split this data into multiple columns using a PostgreSQL/Python script.
The output data should be like
Number1 Number2 Number3
ABCD0001 GHIJ7680 SEDFTH1
ABCD0002 GHIJ7681 SEDFTH2
Can I do this using a PostgreSQL query or via a Python script?
This is just a quick solution to your problem; I'm still learning Python myself, so this code snippet could probably be optimized a lot. But it solves your problem.
import pandas as pd

number = ['ABCD0001', 'ABCD0002', 'ABCD0003', 'GHIJ768O', 'GHIJ7681',
          'GHIJ7682', 'SEDFTH1', 'SEDFTH2', 'SEDFTH3']

def find_letters(list_of_str):
    abc_arr = []
    ghi_arr = []
    sed_arr = []
    for text in list_of_str:
        if 'ABC' in text:
            abc_arr.append(text)
        if 'GHI' in text:
            ghi_arr.append(text)
        if 'SED' in text:
            sed_arr.append(text)
    df = pd.DataFrame({'ABC': abc_arr, 'GHI': ghi_arr, 'SED': sed_arr})
    return df

print(find_letters(number))
This code gives this output:
Screenshot Of Output
Edit:
I just realized the first output you showed is probably a dataframe as well; the code below shows how to handle it if your data comes from a df and not a list.
import pandas as pd

data = {'Numbers': ['ABCD0001', 'ABCD0002', 'ABCD0003', 'GHIJ768O', 'GHIJ7681',
                    'GHIJ7682', 'SEDFTH1', 'SEDFTH2', 'SEDFTH3']}
df = pd.DataFrame(data)
print(df)

def find_letters(frame):
    abc_arr = []
    ghi_arr = []
    sed_arr = []
    # flatten the single-column DataFrame into a list of one-element rows
    list_of_str = frame.values.tolist()
    for row in list_of_str:
        text = row[0]
        if 'ABC' in text:
            abc_arr.append(text)
        if 'GHI' in text:
            ghi_arr.append(text)
        if 'SED' in text:
            sed_arr.append(text)
    return pd.DataFrame({'ABC': abc_arr, 'GHI': ghi_arr, 'SED': sed_arr})

find_letters(df)
Which gives this output.
I have been able to get the calculation to work but now I am having trouble appending the results back into the data frame e3. You can see from the picture that the values are printing out.
brand_list = list(e3['Brand Name'])
product_segment_list = list(e3['Product Segment'])

# Create a list of tuples: data
data = list(zip(brand_list, product_segment_list))

for i in data:
    step1 = e3.loc[(e3['Brand Name'] == i[0]) & (e3['Product Segment'] == i[1])]
    Delta_Price = step1['Price'].diff(1).div(step1['Price'].shift(1), axis=0).mul(100.0)
    print(Delta_Price)
It's easier to use groupby. On each pass of the loop, r is just the group of rows from the e3 dataframe for one category, and i is the group key:

new_df = []
for i, r in e3.groupby(['Brand Name', 'Product Segment']):
    price_num = r['Price'].diff(1).values
    price_den = r['Price'].shift(1).values
    r['Price Delta'] = price_num / price_den
    new_df.append(r)
e3_ = pd.concat(new_df, axis=0)
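The loop can also be collapsed into a single vectorized call: `pct_change` computes the same `diff / shift` ratio within each group. The column names are taken from the question; the sample data below is purely illustrative:

```python
import pandas as pd

# illustrative frame with the question's column names
e3 = pd.DataFrame({
    'Brand Name': ['A', 'A', 'A', 'B', 'B'],
    'Product Segment': ['x', 'x', 'x', 'y', 'y'],
    'Price': [100.0, 110.0, 99.0, 50.0, 75.0],
})

# percent change of Price within each (Brand Name, Product Segment) group;
# the first row of every group is NaN since it has no previous price
e3['Price Delta'] = (
    e3.groupby(['Brand Name', 'Product Segment'])['Price'].pct_change() * 100
)
print(e3)
```

This writes the result straight back into e3, which is what the question was after, and avoids rebuilding the frame with concat.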
I'm analyzing some data in a loop of 10 iterations; each iteration represents one of the data sets. I've managed to create a data frame with pandas at the end of each iteration, and now I need to export each one with a different name. Here is a snippet of the code.
for t in range(len(myFiles)):
    DATA = np.array(importdata(t))
    data = DATA[:, 1:8]
    Numbers = data[:, 0:5]
    Stars = data[:, 5:7]
    [numbers, repetitions] = Frequence(Numbers)
    rep_n, freq_n = translate(repetitions, data)
    [stars, Rep_s] = Frequence(Stars)
    rep_s, freq_s = translate(Rep_s, data)
    DF1 = dataframe(numbers, rep_n, freq_n)
    DF2 = dataframe(stars, rep_s, freq_s)
Data frames DF1 and DF2 must be stored separately with different names in each loop iteration.
You can create lists of DataFrames:

ListDF1, ListDF2 = [], []
for t in range(len(myFiles)):
    ...
    rep_s, freq_s = translate(Rep_s, data)
    ListDF1.append(dataframe(numbers, rep_n, freq_n))
    ListDF2.append(dataframe(stars, rep_s, freq_s))

Then to select a DataFrame, use indexing:

# get first DataFrame
print(ListDF1[0])
EDIT: If you need to export with different filenames, use the t variable to get DF1_0.csv, DF2_0.csv, then DF1_1.csv, DF2_1.csv, ... filenames, because Python counts from 0:

for t in range(len(myFiles)):
    ...
    DF1.to_csv(f'DF1_{t}.csv')
    DF2.to_csv(f'DF2_{t}.csv')
You can use the microseconds from datetime, since they will differ between writes:

from datetime import datetime

for t in range(len(myFiles)):
    DATA = np.array(importdata(t))
    data = DATA[:, 1:8]
    Numbers = data[:, 0:5]
    Stars = data[:, 5:7]
    [numbers, repetitions] = Frequence(Numbers)
    rep_n, freq_n = translate(repetitions, data)
    [stars, Rep_s] = Frequence(Stars)
    rep_s, freq_s = translate(Rep_s, data)
    DF1 = dataframe(numbers, rep_n, freq_n)
    DF2 = dataframe(stars, rep_s, freq_s)
    DF1.to_csv(f'DF1_{datetime.now().strftime("%f")}.csv')
    DF2.to_csv(f'DF2_{datetime.now().strftime("%f")}.csv')
I am currently using Python to web scrape the three-point statistics for every NBA player and am trying to put this data in a data frame. The code below is my attempt at adding the values to the data frame. The variables players, teams, threePointAttempts, and threePointPercentage are all lists containing 50 values. These are refilled on every iteration of the while loop because the script moves through each page of the NBA site.
while i < 10:
    soup = BeautifulSoup(d.page_source, 'html.parser').find('table')
    headers, [_, *data] = [i.text for i in soup.find_all('th')], [[i.text for i in b.find_all('td')] for b in soup.find_all('tr')]
    final_data = [i for i in data if len(i) > 1]
    data_attrs = [dict(zip(headers, i)) for i in final_data]
    print(data_attrs)
    players = [i['PLAYER'] for i in data_attrs]
    teams = [i['TEAM'] for i in data_attrs]
    threePointAttempts = [i['3PA'] for i in data_attrs]
    threePointPercentage = [i['3P%'] for i in data_attrs]
    data_df = data_df.append(pd.DataFrame(players, columns=['Player']), ignore_index=True)
    data_df = data_df.append(pd.DataFrame(teams, columns=['Team']), ignore_index=True)
    data_df = data_df.append(pd.DataFrame(threePointAttempts, columns=['3PA']), ignore_index=True)
    data_df = data_df.append(pd.DataFrame(threePointPercentage, columns=['3P%']), ignore_index=True)
    data_df = data_df[['Player', 'Team', '3PA', '3P%']]
The issue I am having is that the data frame fills column by column instead of row by row: all the Player values stack first, then the Team values below them, then 3PA and 3P%, with NaN in the other columns for each block of rows.
Try:

temp_df = pd.DataFrame({'Player': players,
                        'Team': teams,
                        '3PA': threePointAttempts,
                        '3P%': threePointPercentage})
data_df = data_df.append(temp_df, ignore_index=True)
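Note that DataFrame.append was removed in pandas 2.0; the equivalent with pd.concat looks like this. The sample lists below are illustrative stand-ins for one page of scraped values:

```python
import pandas as pd

# illustrative stand-ins for one page of scraped values
players = ['Player A', 'Player B']
teams = ['AAA', 'BBB']
threePointAttempts = ['5.0', '7.1']
threePointPercentage = ['38.1', '41.2']

data_df = pd.DataFrame(columns=['Player', 'Team', '3PA', '3P%'])
temp_df = pd.DataFrame({'Player': players,
                        'Team': teams,
                        '3PA': threePointAttempts,
                        '3P%': threePointPercentage})
# pd.concat replaces the removed DataFrame.append
data_df = pd.concat([data_df, temp_df], ignore_index=True)
print(data_df)
```

Inside the scraping loop, the same concat line can be run once per page to accumulate all 50-row blocks with the columns side by side.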