Rows are Int64Index, but Python reports them as string and DataFrame - python

I formed a dataset whose two columns are edge lists taken from another dataset, and I mistakenly built one of the columns as an Int64Index type when extracting the indexes, as pictured here.
I am trying to extract the numbers from each cell, but run into problems. When I try to handle the number as a string using the int() command, I get an error that 'DataFrame' object is not callable. However, when I try to use a pandas DataFrame command such as to_numeric(), I get an AttributeError: 'str' object has no attribute 'to_numeric'.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array(["boo", "foo", "bar"]), columns=['col1'])
d = {'col1': ["boo", "boo", "boo", "bar", "foo", "bar"], 'Title': ["no", "yes", "stop", "yes", "stop", "go"], 'Example': ["p", "y", "x", "f", "v", "g"]}
df2 = pd.DataFrame(data=d)
d = {'Example': ["p", "y", "x", "f", "v", "g"], 'Title': ["no", "yes", "stop", "yes", "stop", "go"]}
df3 = pd.DataFrame(data=d)

edges = pd.DataFrame(columns=['source', 'target'])
for i in range(len(df1)):
    val = df1['col1'][i]
    stuff = df2[df2["col1"] == val]
    stuff = stuff.reset_index()
    for k in range(len(stuff)):
        s = stuff["Example"][k]
        p = stuff["Title"][k]
        j = df2.index[(df2['Example'] == s) & (df2['Title'] == p)]
        edges = edges.append({'source': i, 'target': j}, ignore_index=True)
m = edges['target'].tolist()[3]
print(m)
import re
pattern = re.compile("[\d*]")
print(type("".join(pattern.findall(m)[2:-2])))
"".join(pattern.findall(m)[2:-2])
int("".join(pattern.findall(m)[2:-2]))
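For what it's worth, a minimal sketch of the underlying issue (assuming the lookup matches exactly one row): df2.index[...] returns an Index object, and its string repr is what the regex ends up parsing. Pulling out the element directly avoids the string round-trip entirely. The value here is hypothetical:

```python
import pandas as pd

# hypothetical stand-in for j = df2.index[(df2['Example'] == s) & ...]
j = pd.Index([4])

# j is an Index, not a number; take its (assumed unique) element directly
target = int(j[0])
print(target)  # → 4
```

Storing `int(j[0])` as the 'target' when building edges would make the regex step unnecessary.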

Having problems trying to convert a json to a DataFrame

I am trying to solve this problem:
1. Create a function 'wbURL' that takes an indicator, a country code, and begin and end years, and creates a URL of the following format:
http://api.worldbank.org/countries/COUNTRYCODE/indicators/INDICATOR?format=json&date=BEGIN_DATE:END_DATE
2. Create another function 'wbDF' that takes the same inputs and returns a dataframe constructed from the response. The response will come as a JSON list, and the 1st (i.e., index 1) element of this list contains the relevant data. Extract this element, which is a list of dictionaries---this is what you want to construct the dataframe out of. Drop all columns except indicator, country, date and value. For the country and value columns, notice that the data is itself a dictionary: use apply to extract the value out of these dictionaries.
This is the code I wrote out:
import json
import pandas as pd
import requests

def wbURL(contcode, ind, begin, end):
    return f'http://api.worldbank.org/countries{contcode}/indicators/{ind}?format=json&date={begin}:{end}'

def wbDF(contcode, ind, begin, end):
    url = wbURL(contcode, ind, begin, end)
    response = requests.get(url)
    wb_raw = response.content
    wb = json.loads(wb_raw)
    data = wb[1]
    df = pd.DataFrame(data)
    df = df.drop(columns=['countryiso3code', 'unit', 'obs_status', 'decimal'])
    df['indicator'] = [d['id'] for d in [d['indicator'] for d in data]]
    df['country'] = [d['value'] for d in [d['country'] for d in data]]
    df['date'] = [d['date'] for d in data]
    df['value'] = [d['value'] for d in data]
    return df

test = wbDF('GBR', 'SP.DYN.LE00.IN', 2000, 2019)
print(test)
When I run it, I get an error:
IndexError: list index out of range
I would like someone to have a look at the code I have written and give me some advice on how I should change it to make it work.
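One likely culprit, sketched below rather than asserted: the f-string in wbURL drops the slash between 'countries' and the country code. When the URL is malformed, the World Bank API tends to return an error document whose JSON list has only one element, so wb[1] raises IndexError. A corrected URL builder might look like:

```python
# Sketch of a corrected wbURL; note the '/' before {contcode},
# which the original f-string omits.
def wbURL(contcode, ind, begin, end):
    return (f'http://api.worldbank.org/countries/{contcode}'
            f'/indicators/{ind}?format=json&date={begin}:{end}')

print(wbURL('GBR', 'SP.DYN.LE00.IN', 2000, 2019))
# → http://api.worldbank.org/countries/GBR/indicators/SP.DYN.LE00.IN?format=json&date=2000:2019
```

Printing the URL and opening it in a browser is a quick way to confirm the response actually has a data element at index 1 before the rest of wbDF runs.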

Appending dictionaries generated from a loop to the same dataframe

I have a loop within a nested loop that at the end generates 6 dictionaries. Each dictionary has the same keys but different values. At the end of every iteration I would like to append the dictionary to the same dataframe, but it keeps failing.
At the end I would like to have a table with 6 columns plus an index which holds the keys.
This is the idea behind what I'm trying to do:
dictionary = dict()
for i in blahh:
    dictionary[i] = dict(zip(blahh['x'][i], blahh['y'][i]))
df = pd.DataFrame(dictionary)
df_final = pd.concat([dictionary, df])
I get the error:
cannot concatenate object of type '<class 'dict'>'; only series and dataframe objs are valid
I created a practice dataset if necessary:
letts = [('a','b','c'),('e','f','g'),('h','i','j'),('k','l','m'),('n','o','p')]
numns = [(1,2,3),(4,5,6),(7,8,9),(10,11,12),(13,14,15)]
dictionary = dict()
for i in letts:
    for j in numns:
        dictionary = dict(zip(i, j))
I'm confused by your practice dataset, but the modifications below could provide an idea...
df_final = pd.DataFrame()
dictionary = dict()
for i in blahh:
    dictionary[i] = dict(zip(blahh['x'][i], blahh['y'][i]))
    df = pd.DataFrame(dictionary, index="index must be passed")
    df_final = pd.concat([df_final, df])
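As a sketch of the stated goal (6 columns plus an index holding the keys), collecting everything into one dict-of-dicts and building the DataFrame once may be simpler than concatenating per iteration, since pandas aligns the inner keys into the index. Using the practice data (the `col0`, `col1`, ... names are made up here):

```python
import pandas as pd

letts = [('a','b','c'), ('e','f','g'), ('h','i','j'), ('k','l','m'), ('n','o','p')]
numns = [(1,2,3), (4,5,6), (7,8,9), (10,11,12), (13,14,15)]

# map a (hypothetical) column name to each inner dict instead of
# overwriting `dictionary` on every pass
dictionary = {f'col{n}': dict(zip(i, j))
              for n, (i, j) in enumerate(zip(letts, numns))}

# pandas aligns all inner keys into one index; missing cells become NaN
df_final = pd.DataFrame(dictionary)
print(df_final.shape)  # → (15, 5)
```

With real data whose dictionaries share the same keys, the index would simply hold those keys and no NaNs would appear.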

Cannot assign to function call when looping through and converting excel files

With this code:
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    'df{}'.format(str(i)) = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
I get this error:
'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist,
skiprows=range(6))
^ SyntaxError: cannot assign to function call
I can't understand the error or how to solve it. What's the problem?
df + str(i) also returns an error.
I want the result to be:
df1 = pd.read_excel.. list1...
df2 = pd.read_excel... list2....
You can't assign the result of pd.read_excel to 'df{}'.format(str(i)), which is a string that looks like "df1", "df2", etc. That is why you get this error message; the message is probably confusing because Python treats the left-hand side as a function call, and a function call cannot be an assignment target.
It seems like you want a list or a dictionary of DataFrames instead.
To do this, assign the result of pd.read_excel to a variable, e.g. df, and then append it to a list or add it to a dictionary of DataFrames.
As a list:
dataframes = []
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in zip(range(1, 13), sn):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes.append(df)
As a dictionary:
dataframes = {}
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in zip(range(1, 13), sn):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes[i] = df
In both cases, you can access the DataFrames by indexing like this:
for i in range(len(dataframes)):
    print(dataframes[i])
    # Note indexes will start at 0 here instead of 1
    # You may want to change your `range` above to start at 0
Or more simply:
for df in dataframes:
    print(df)
In the case of the dictionary, you'd probably want:
for i, df in dataframes.items():
    print(i, df)
    # Here, `i` is the key and `df` is the actual DataFrame
If you really do want df1, df2 etc as the keys, then do this instead:
dataframes[f'df{i}'] = df
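A compact illustration of the dictionary-with-string-keys approach (the sheet contents here are made up, standing in for pd.read_excel):

```python
import pandas as pd

dataframes = {}
for i in range(1, 4):
    # stand-in for pd.read_excel(...); any DataFrame works for the demo
    dataframes[f'df{i}'] = pd.DataFrame({'sheet': [i]})

print(sorted(dataframes))             # → ['df1', 'df2', 'df3']
print(dataframes['df2']['sheet'][0])  # → 2
```

This gives you the df1/df2/... names you wanted, as dictionary keys rather than separate variables.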

Filtering a Pandas DataFrame through a list dictionary

Movie Dataframe
I have a DataFrame that contains movie information, and I'm trying to filter the rows so that if the list of dictionaries in a row contains 'name' == 'specified genre', the movies with that genre are displayed.
I have tried using a list comprehension:
filter = ['Action']
expectedResult = [d for d in df if d['name'] in filter]
However, I end up with an error:
TypeError: string indices must be integers
d is a column name in your code. That's why you are getting this error.
See the following example:
import pandas as pd
df = pd.DataFrame({"abc": [1,2,3], "def": [4,5,6]})
for d in df:
    print(d)
Gives:
abc
def
I think what you are trying to do could be achieved by:
df = pd.DataFrame({"genre": ["something", "something else"], "abc": ["movie1", "movie2"]})
movies = df.to_dict("records")
[m["abc"] for m in movies if m["genre"] == "something"]
Which gives:
['movie1']
Your loop, for d in df, iterates over the column headings, not the rows, so d will hold a column name such as 'genres' rather than a movie. Try running:
for d in df:
    print(d)
and you will see.
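If the goal is to keep the rows whose genres column (a list of dictionaries) contains a given 'name', a boolean mask over that column may be what's wanted. A sketch with made-up data (the 'title' and 'genres' column names are assumptions about the movie frame):

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['Movie A', 'Movie B'],
    'genres': [[{'id': 28, 'name': 'Action'}],
               [{'id': 35, 'name': 'Comedy'}]],
})

wanted = ['Action']
# keep rows where any genre dict in the list has a matching 'name'
mask = df['genres'].apply(lambda gs: any(g['name'] in wanted for g in gs))
print(df[mask]['title'].tolist())  # → ['Movie A']
```

Unlike the list comprehension, this iterates over the cell values of one column, so each element really is a list of dictionaries.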

Trying to split output by ','

I have an object for my output. Now I want to split my output and create a df with the values.
This is the output I work with:
Seriennummer
701085.0 ([1525.5804581812297, 255.9005481721001, 0.596...
701086.0 ([1193.0420594479258, 271.17468806239793, 0.65...
701087.0 ([1265.5151604213813, 217.26487934586433, 0.60...
701088.0 ([1535.8282855508626, 200.6196628705149, 0.548...
701089.0 ([1500.4964672930257, 247.8883736673866, 0.583...
701090.0 ([1203.6453723293514, 258.5749562983118, 0.638...
701091.0 ([1607.1851164005993, 209.82194423587782, 0.56...
701092.0 ([1711.7277933836879, 231.1560159770871, 0.567...
dtype: object
This is what I am doing and my attempt to split my output:
x = df.T.iloc[1]
y = df.T.iloc[2]

def logifunc(x, c, a, b):
    return c / (1 + (a) * np.exp(-b * (x)))

result = df.groupby('Seriennummer').apply(lambda grp:
    opt.curve_fit(logifunc, grp.mrwSmpVWi, grp.mrwSmpP, p0=[110, 400, -2]))
print(result)

for element in result:
    parts = element.split(',')
    print(parts)
It doesn't work. I get the Error:
AttributeError: 'tuple' object has no attribute 'split'
@jezrael It works, but now it shows a lot of data I don't need. Do you have an idea how I can drop the rows with the data I don't need?
Seriennummer 0 1 2
701085.0 1525.5804581812297 255.9005481721001 0.5969011082719918
701085.0 [ 9.41414894e+03 -2.07982124e+03 -2.30130078e+00] [-2.07982124e+03 1.44373786e+03 9.59282709e-01] [-2.30130078e+00 9.59282709e-01 7.75807643e-04]
701086.0 1193.0420594479258 271.17468806239793 0.6592054681687264
701086.0 [ 5.21906135e+03 -2.23855187e+03 -2.11896425e+00] [-2.23855187e+03 2.61036500e+03 1.67396324e+00] [-2.11896425e+00 1.67396324e+00 1.22581746e-03]
701087.0 1265.5151604213813 217.26487934586433 0.607183527397275
Use Series.explode with DataFrame constructor:
s = result.explode()
df1 = pd.DataFrame(s.tolist(), index=s.index)
If the data is small and/or performance is not important:
df1 = result.explode().apply(pd.Series)
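If only the fitted parameters are wanted (dropping the covariance matrices that curve_fit also returns), selecting the first tuple element per row before building the DataFrame is one option. A sketch on a toy stand-in for the groupby result; the column names c, a, b are taken from logifunc's signature:

```python
import numpy as np
import pandas as pd

# toy stand-in for the groupby/curve_fit result: each value is (popt, pcov)
result = pd.Series({
    701085.0: (np.array([1525.58, 255.90, 0.597]), np.eye(3)),
    701086.0: (np.array([1193.04, 271.17, 0.659]), np.eye(3)),
})
result.index.name = 'Seriennummer'

# .str[0] picks the first tuple element (the fitted parameters) per row,
# skipping the covariance matrices entirely
params = pd.DataFrame(result.str[0].tolist(), index=result.index,
                      columns=['c', 'a', 'b'])
print(params.shape)  # → (2, 3)
```

This yields one row per Seriennummer with just the three fit parameters, instead of the interleaved parameter/covariance rows from explode.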
