How to create a dataframe? - python
df4 = []
for i in (my_data.points.values.tolist()[0]):
    df3 = pd.json_normalize(i)
    df4.append(df3)
df5 = pd.DataFrame(df4)
df5.head()
When I run this code I get this error: Must pass 2-d input. shape=(16001, 1, 3)
pd.json_normalize returns a DataFrame, so df4 ends up as a list of 16001 DataFrames, each of shape (1, 3), and pd.DataFrame sees that list as 3-D input. What you need instead is a list of dictionaries, which pd.DataFrame can convert directly.
For example:
dict_list=[
{"id":1,"name":"apple","price":10},
{"id":1,"name":"orange","price":20},
{"id":1,"name":"pineapple","price":15},
]
df=pd.DataFrame(dict_list)
In your case:
df4 = []
for i in (my_data.points.values.tolist()[0]):
    # df3 = pd.json_normalize(i)
    # Since the structure is not mentioned, I'm assuming "i" is a
    # dictionary that holds the relevant row
    df4.append(i)
df5 = pd.DataFrame(df4)
df5.head()
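If the records are nested, a possible alternative is to hand the whole list to pd.json_normalize in one call, since it accepts a list of dictionaries and returns a single 2-D DataFrame. This is only a sketch: the `points` list and its keys are invented for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for my_data.points.values.tolist()[0]
points = [
    {"id": 1, "pos": {"x": 0.0, "y": 1.5}},
    {"id": 2, "pos": {"x": 2.0, "y": 3.5}},
]

# One call flattens every record; nested keys become dotted column names
flat = pd.json_normalize(points)

# If per-record frames already exist, concatenate them
# instead of wrapping the list in pd.DataFrame
per_record = [pd.json_normalize(p) for p in points]
stacked = pd.concat(per_record, ignore_index=True)
```

Both `flat` and `stacked` are ordinary 2-D frames, so `head()` works on either.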
Related
Appending dictionaries generated from a loop to the same dataframe
I have a loop within a nested loop that at the end generates 6 dictionaries. Each dictionary has the same keys but different values. At the end of every iteration I would like to append the dictionary to the same dataframe, but it keeps failing. In the end I would like to have a table with 6 columns plus an index which holds the keys. This is the idea behind what I'm trying to do:
dictionary = dict()
for i in blahh:
    dictionary[i] = dict(zip(blahh['x'][i], blahh['y'][i]))
    df = pd.DataFrame(dictionary)
    df_final = pd.concat([dictionary, df])
I get the error: cannot concatenate object of type '<class 'dict'>'; only series and dataframe objs are valid
I created a practice dataset, if necessary, here:
letts = [('a','b','c'),('e','f','g'),('h','i','j'),('k','l','m'),('n','o','p')]
numns = [(1,2,3),(4,5,6),(7,8,9),(10,11,12),(13,14,15)]
dictionary = dict()
for i in letts:
    for j in numns:
        dictionary = dict(zip(i, j))
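I am confused by your practice dataset, but the modifications below may give you an idea. pd.concat only accepts Series and DataFrame objects, so start from an empty DataFrame and concatenate DataFrames rather than the dictionary itself:
df_final = pd.DataFrame()
dictionary = dict()
for i in blahh:
    dictionary[i] = dict(zip(blahh['x'][i], blahh['y'][i]))
    # "index must be passed" is a placeholder; supply a real index here
    df = pd.DataFrame(dictionary, index="index must be passed")
    df_final = pd.concat([df_final, df])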
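One possible reading of the goal (a table with one column per dictionary and the shared keys as the index) can be sketched by collecting the dictionaries first and building the DataFrame once. The data, the pairing of keys with values, and the column labels d1, d2, ... are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data: one shared set of keys, six rows of values
keys = ("a", "b", "c")
value_rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
              (10, 11, 12), (13, 14, 15), (16, 17, 18)]

# Build one dictionary per iteration and store it under its own label
dictionaries = {}
for n, values in enumerate(value_rows, start=1):
    dictionaries[f"d{n}"] = dict(zip(keys, values))

# A dict of dicts becomes a DataFrame:
# outer keys -> columns, inner keys -> index
df_final = pd.DataFrame(dictionaries)
```

Building the frame once after the loop also avoids calling pd.concat on every iteration.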
Add values from a nested JSON to a pandas dataframe
I have the following JSON object: {"code":"Ok","matchings":[{"confidence":0.025755,"geometry":"qnp{bBww{kH??~D_I}E_J{EaJ{E{I{AsCoJgQfKuTjJwNtF}HdBuBnAgBpFsF~EeEzAsAt#i#lA}#x#q#lEmCjDuBdDoAvFmAfYmEtAUrJyDj#_#h#m#`#u#T}#J{#B_A?gAGmAM}#Su#]u#wN{QwI{KcA}Aa#gASiAWsBOwCGmDCoJ??cEH?{FA{HgIXuG`#eHrAsLdDkI|CkIfDq#VoDlB_GzDaE`D_A|#kA`AeAx#sI~G}DlDk#j#mClCiOrQwGvJiGxJoFdK_HjP{Pne#aLt\\sK~]oKb_#sG~TeJ`_#q#fD{#dEoBlMwBxQaAbI{Dh\\wKrfAiRbvBy#`KaLjwAyHj_AANM~AUxC}#tKi#bHe#jGfBj#t#V|#\\TFjAXz#HhASxAy#vCcBjX~GvG`BlEjAv\\xJfBf#dThG~Ad#nFrBnCbBdCvBzB`DbCfEr{#b~A","legs":[{"annotation":{"nodes":[330029575,5896466632,330029575,5896466588,5896466587,5896466586,5896466637,330029340,330029339,330029338,1497356855,1880770263,46388213,1880770262,1880770257,2021835257,3306177380,46387099,2021835255,6909770873,46385948,6909770874,46384887,46382454]},"steps":[],"distance":332.2,"duration":93.1,"summary":"","weight":93.1},{"annotation":{"nodes":[46384887,46382454,5888264001,6909802199,3296872014,6909802198,5888264003,6909802197,3296872012,6909802194,6909802195,6909802193,6909802196,3296872013,3296872015]},"steps":[],"distance":88.1,"duration":13.5,"summary":"","weight":13.5},{"annotation":{"nodes":[3296872013,3296872015,6909802186,6909802187,6909770884,3296872017,6909802185,4904066416,3296872018,1614187163]},"steps":[],"distance":62.3,"duration":12.4,"summary":"","weight":12.4},{"annotation":{"nodes":[3296872018,1614187163,2054127599,1614187129,5896479942,6909802219,46384372,1027299576,6909802220,46389815]},"steps":[],"distance":144,"duration":25.2,"summary":"","weight":25.2},{"annotation":{"nodes":[6909802220,46389815,6296436095,6296436094,298079716,6296436096,46391324,1083528076,6909802221,6909802222,46393158]},"steps":[],"distance":90.6,"duration":10.1,"summary":"","weight":10.1},{"annotation":{"nodes":[6909802222,46393158,46393795,6909802223,1027299602,6909802224,46396846,46398397,2054127645,46399502,46400708,1027299589,6712474212,6903665704,46402805,46403163,4374153462]},"steps":[],"distanc
e":422.9,"duration":40.1,"summary":"","weight":40.1},{"annotation":{"nodes":[46403163,4374153462,46404084,1027299603,364146312,2262500170]},"steps":[],"distance":273.6,"duration":24.7,"summary":"","weight":24.7},{"annotation":{"nodes":[364146312,2262500170,5289718695]},"steps":[],"distance":170.9,"duration":15.3,"summary":"","weight":15.3},{"annotation":{"nodes":[2262500170,5289718695,2054127657,1693195716,46408565,6913837768,1693195721,2262500247,1693195714,2262500104,1693195717]},"steps":[],"distance":56.9,"duration":14.2,"summary":"","weight":14.2},{"annotation":{"nodes":[46397705,46401323,46405521]},"steps":[],"distance":86.6,"duration":12.6,"summary":"","weight":12.6},{"annotation":{"nodes":[46401323,46405521,46410773]},"steps":[],"distance":156.5,"duration":22.5,"summary":"","weight":22.5},{"annotation":{"nodes":[46405521,46410773,452003319,452003320]},"steps":[],"distance":95.4,"duration":13.8,"summary":"","weight":13.8},{"annotation":{"nodes":[452003319,452003320,46411428,46414457,46419384,46421801]},"steps":[],"distance":226.4,"duration":32.6,"summary":"","weight":32.6},{"annotation":{"nodes":[46419384,46421801,46421802,46421735]},"steps":[],"distance":69.2,"duration":10,"summary":"","weight":10},{"annotation":{"nodes":[46421802,46421735,46421416]},"steps":[],"distance":34.1,"duration":4.9,"summary":"","weight":4.9},{"annotation":{"nodes":[46421735,46421416,46420466]},"steps":[],"distance":2.7,"duration":0.3,"summary":"","weight":0.3},{"annotation":{"nodes":[46421416,46420466]},"steps":[],"distance":31.4,"duration":4.6,"summary":"","weight":4.6},{"annotation":{"nodes":[46421416,46420466,452003307,452003308,46421260,46422467,5761752102,46423905]},"steps":[],"distance":135.5,"duration":25,"summary":"","weight":25},{"annotation":{"nodes":[5761752102,46423905,46424346,5777055555,5713213408,46425605,5777055050,5777346784,5777055556,5713221227,46426685,46427741,3175895442,3183752428,5826014405,46428227]},"steps":[],"distance":106.5,"duration":14.9,"summary":"","w
eight":14.9},{"annotation":{"nodes":[5826014405,46428227,3175895443,5826014406,3175895444,5826014368,5826014369,5826014374,46429570,5826014373,5826014375,5826014372,5826014358,5826014371,5826014370,5826014376]},"steps":[],"distance":172.7,"duration":15.7,"summary":"","weight":15.7},{"annotation":{"nodes":[2054127660,2054127638,2054127605,6296435009,2054127599,6909770882,3296872018,4904066416,6909802185,3296872017,6909770884,6909802187,6909802186,3296872015,3296872013,6909802196,6909802193,6909802195,6909802194,3296872012,6909802197,5888264003,6909802198,3296872014,6909802199,5888264001,46382454,46384887,6909770874,46385948,6909770873,2021835255,46387099,3306177380,2021835257]},"steps":[],"distance":317.7,"duration":46.1,"summary":"","weight":46.1},{"annotation":{"nodes":[3306177380,2021835257,1880770257,1880770262,46388213,1880770263,1497356855,330029338,330029339,330029340,5896466637]},"steps":[],"distance":150.4,"duration":29.4,"summary":"","weight":29.4}],"distance":80317.8,"duration":10983.5,"weight_name":"duration","weight":10983.5}],"tracepoints":[{"alternatives_count":0,"waypoint_index":0,"matchings_index":0,"location":[4.929932,52.372217],"name":"Willem Theunisse Blokstraat","distance":10.791613,"hint":"CAkHgHAJBwAlAAAAAAAAAAAAAAAAAAAALCd0QQAAAAAAAAAAAAAAACUAAAAAAAAAAAAAAAAAAAABAAAAjDlLAPkiHwP3OEsAGiMfAwAArxMz7Ejh"},null,{"alternatives_count":0,"waypoint_index":1,"matchings_index":0,"location":[4.932506,52.3709],"name":"Frans de Wollantstraat","distance":11.915926,"hint":"pwUBAPYEAYAHAAAARwAAAAAAAAAAAAAA3_qaQE0JPUIAAAAAAAAAAAcAAABHAAAAAAAAAAAAAAABAAAAmkNLANQdHwPtQksAxB0fAwAA_xUz7Ejh"},{"alternatives_count":0,"waypoint_index":472,"matchings_index":0,"location":[4.932745,52.373288],"name":"Piet 
Heinkade","distance":0.98867,"hint":"gwUBgMgFAQAFAAAADQAAABoBAABYAAAAQMS3QHTNW0HsWZ1DmZ2WQgUAAAANAAAAGgEAAFgAAAABAAAAiURLACgnHwN9REsAIycfAwoADwkz7Ejh"},null,null,{"alternatives_count":1,"waypoint_index":473,"matchings_index":0,"location":[4.934022,52.371637],"name":"Piet Heinkade","distance":2.713742,"hint":"NA8HADsPB4ACAAAADwAAADoAAAA-AAAAjU82QIAqg0FUpSdCLoWJQgIAAAAPAAAAOgAAAD4AAAABAAAAhklLALUgHwNfSUsAsCAfAwQAvxUz7Ejh"},null,null,{"alternatives_count":1,"waypoint_index":474,"matchings_index":0,"location":[4.93213,52.371794],"name":"Frans de Wollantstraat","distance":10.337677,"hint":"AgUBgAcFAQABAAAABAAAAAwAAAAAAAAA1paeP-KrBUAomAdBAAAAAAEAAAAEAAAADAAAAAAAAAABAAAAIkJLAFIhHwOrQksAeiEfAwIA7xQz7Ejh"},{"alternatives_count":1,"waypoint_index":475,"matchings_index":0,"location":[4.93074,52.372528],"name":"Isaac Titsinghkade","distance":0.65222,"hint":"AwkHgAYJBwA5AAAACwAAAAAAAACMAAAA_Fe_QWP_k0AAAAAA33FqQjkAAAALAAAAAAAAAIwAAAABAAAAtDxLADAkHwOtPEsANCQfAwAADw4z7Ejh"},null,null]}
I want to add all values that belong to the key nodes to one column in a pandas dataframe. When I run:
for i in output["matchings"][0]['legs']:
    result = i['annotation']['nodes']
    df = pd.DataFrame(result, columns=['node'])
df
only a fraction gets added to the dataframe. What am I doing wrong?
At the end of your for loop, df holds only the nodes of the last leg, because every iteration overwrites it. You have to collect the nodes of all legs in a single dataframe instead. Extending your code:
frames = []
for i in output["matchings"][0]['legs']:
    result = i['annotation']['nodes']
    frames.append(pd.DataFrame(result, columns=['node']))
# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
df = pd.concat(frames, ignore_index=True)
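A shorter variant, as a sketch: flatten the node lists into one Python list and build the DataFrame a single time. The `output` dictionary here is a trimmed, hypothetical stand-in for the parsed JSON response, keeping only the keys the loop touches.

```python
import pandas as pd

# Trimmed stand-in for the parsed response (only two legs, few nodes)
output = {
    "matchings": [
        {"legs": [
            {"annotation": {"nodes": [330029575, 5896466632]}},
            {"annotation": {"nodes": [46384887, 46382454, 5888264001]}},
        ]}
    ]
}

# Flatten the node lists of all legs into one list
nodes = [
    node
    for leg in output["matchings"][0]["legs"]
    for node in leg["annotation"]["nodes"]
]

# Build the single-column frame once
df = pd.DataFrame({"node": nodes})
```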
Cannot assign to function call when looping through and converting excel files
With this code:
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i,snlist in list(zip(range(1,13),sn)):
    'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist, skiprows=range(6))
I get this error:
    'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist, skiprows=range(6))
    ^
SyntaxError: cannot assign to function call
I can't understand the error or how to solve it. What's the problem? df+str(i) also returns an error. I want the result to be:
df1 = pd.read_excel.. list1...
df2 = pd.read_excel... list2....
You can't assign the result of pd.read_excel to 'df{}'.format(str(i)), which is a string that looks like "df1", "df2", "df3" etc. That is why you get this error message. The message is probably confusing because Python treats this as an assignment to a "function call" (the call to .format()).
It seems like you want a list or a dictionary of DataFrames instead. To do this, assign the result of pd.read_excel to a variable, e.g. df, and then append that to a list, or add it to a dictionary of DataFrames.
As a list:
dataframes = []
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes.append(df)
As a dictionary:
dataframes = {}
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes[i] = df
In both cases, you can access the DataFrames by indexing like this:
for i in range(len(dataframes)):
    print(dataframes[i])
    # Note indexes will start at 0 here instead of 1
    # You may want to change your `range` above to start at 0
Or more simply:
for df in dataframes:
    print(df)
In the case of the dictionary, you'd probably want:
for i, df in dataframes.items():
    print(i, df)
    # Here, `i` is the key and `df` is the actual DataFrame
If you really do want df1, df2 etc as the keys, then do this instead:
dataframes[f'df{i}'] = df
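As a possible shortcut, pd.read_excel returns a dictionary of DataFrames keyed by sheet name when called with sheet_name=None, which removes the need for the loop. The sketch below wraps that call and a re-keying step in two helpers; the helper names and the sample frames are invented for illustration, and reading a real workbook additionally needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

def load_all_sheets(path, rows_to_skip=6):
    """Read every sheet of a workbook into a dict keyed by sheet name."""
    return pd.read_excel(path, sheet_name=None, skiprows=range(rows_to_skip))

def number_frames(frames_by_sheet):
    """Re-key a dict of DataFrames as df1, df2, ... in sheet order."""
    return {
        f"df{i}": frame
        for i, frame in enumerate(frames_by_sheet.values(), start=1)
    }

# Stand-in for load_all_sheets('test.xlsx'), so the example runs without a file
sheets = {
    "Jan": pd.DataFrame({"a": [1]}),
    "Feb": pd.DataFrame({"a": [2]}),
}
dataframes = number_frames(sheets)
```

With a real file, `number_frames(load_all_sheets('test.xlsx'))` would give the same df1, df2, ... keys the question asks for.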
Creating a definition that takes undefined number of parameters
What would be the best method of turning code like the one below into something that accepts as many dataframes as we would like?
def q_grab(df, df2, df3, q):
    #accepts three dataframes and a column name. Looks up column in all dataframes and combine to one
    data = df[q], df2[q], df3[q]
    headers = [q+"_1", q+"_2", q+"_3"]
    data2 = pd.concat(data, axis = 1, keys=headers)
    return data2

q = 'covid_condition'
data2 = q_grab(df, df2, df3, q) #If I run function pid_set first, it will create new df based on pID it looks like
One approach is to use the * operator to collect the positional arguments into a tuple (but name your final argument, so it isn't part of that tuple). Something like this:
def q_grab(*dfs, q=None): # q is a named argument to signal end of positional arguments
    data = [df[q] for df in dfs]
    # i + 1 keeps the original numbering (q_1, q_2, ...)
    headers = [q+"_"+str(i + 1) for i in range(len(dfs))]
    data2 = pd.concat(data, axis = 1, keys=headers)
    return data2

q = 'covid_condition'
data2 = q_grab(df, df2, df3, q=q)
A probably cleaner alternative is to pass a list of dataframes as the first argument:
def q_grab(dfs, q):
called with:
data2 = q_grab([df, df2, df3], q)
using the function code as above.
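A self-contained sketch of the variadic version, here with four dataframes. It makes q a required keyword-only argument (a small departure from the q=None default above) and uses made-up sample data.

```python
import pandas as pd

def q_grab(*dfs, q):
    """Pull column q from every DataFrame and join them side by side."""
    data = [df[q] for df in dfs]
    # Number the headers from 1: q_1, q_2, ...
    headers = [f"{q}_{i}" for i in range(1, len(dfs) + 1)]
    return pd.concat(data, axis=1, keys=headers)

# Invented sample frames that share one column name
df = pd.DataFrame({"covid_condition": [1, 0]})
df2 = pd.DataFrame({"covid_condition": [0, 0]})
df3 = pd.DataFrame({"covid_condition": [1, 1]})
df4 = pd.DataFrame({"covid_condition": [0, 1]})

combined = q_grab(df, df2, df3, df4, q="covid_condition")
```

Because q has no default, forgetting it raises a TypeError at the call site instead of failing later inside the function.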
Separate column data with a comma to two columns for dataframe
The data set I pulled from an API return looks like this:
([['Date', 'Value']], [[['2019-08-31', 445000.0], ['2019-07-31', 450000.0], ['2019-06-30', 450000.0]]])
I'm trying to create a DataFrame with two columns from the data: Date & Value
Here's what I've tried:
df = pd.DataFrame(city_data, index =['a', 'b'], columns =['Names'] . ['Names1'])
city_data[['Date','Value']] = city_data['Date'].str.split(',',expand=True)
city_data
city_data.append({"header": column_value, "Value": date_value})
city_data = pd.DataFrame()
This code was used to create the dataset. I pulled the lists from the API return:
column_value = data["dataset"]["column_names"]
date_value = data["dataset"]["data"]
city_data = ([column_value], [date_value])
city_data
Instead of creating a dataframe with two columns from the data, in most cases I get the "TypeError: list indices must be integers or slices, not str"
Is this what you are looking for?
d = ([['Date', 'Value']], [[['2019-08-31', 445000.0], ['2019-07-31', 450000.0], ['2019-06-30', 450000.0]]])
pd.DataFrame(d[1][0], columns=d[0][0])
This returns a DataFrame with the two columns Date and Value and one row per date.
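The extra level of nesting comes from wrapping both lists in another list, so it may be simpler to pass the API lists to pd.DataFrame directly. A minimal sketch, assuming the response has the shape described in the question; the `data` dictionary below is a hand-written stand-in for the real API return.

```python
import pandas as pd

# Hand-written stand-in for the parsed API response
data = {
    "dataset": {
        "column_names": ["Date", "Value"],
        "data": [
            ["2019-08-31", 445000.0],
            ["2019-07-31", 450000.0],
            ["2019-06-30", 450000.0],
        ],
    }
}

# Rows go in as the data, column names as the header; no extra wrapping
city_data = pd.DataFrame(
    data["dataset"]["data"],
    columns=data["dataset"]["column_names"],
)

# Optional: parse the date strings into real timestamps
city_data["Date"] = pd.to_datetime(city_data["Date"])
```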