how to convert string to datatable excel using pandas? - python

Following my previous question, I'm now trying to put data in a table and convert it to an Excel file, but I can't get the table I want. Can anyone help or explain the cause? This is the final output I want to get:
This is the data I'm printing:
Hotel1 : chambre double - {'lpd': ('112', '90','10'), 'pc': ('200', '140','10')}
and here is my code
import pandas as pd
import ast
s="Hotel1 : chambre double - {'lpd': ('112', '90','10'), 'pc': ('200', '140','10')}"
ds = []
for l in s.splitlines():
    d = l.split("-")
    if len(d) > 1:
        df = pd.DataFrame(ast.literal_eval(d[1].strip()))
        ds.append(df)
for df in ds:
    df.reset_index(drop=True, inplace=True)
df = pd.concat(ds, axis= 1)
cols = df.columns
cols = [((col.split('.')[0], col)) for col in df.columns]
df.columns=pd.MultiIndex.from_tuples(cols)
print(df.T)
df.to_excel("v.xlsx")
But this is what I get:
How can I solve the problem, please? This is the final and most important part. Thank you in advance.

Within the for loop, the value "Hotel1 : chambre double" is held in d[0] (you can check this yourself by printing d[0]).
In your previous question, the "Name3" column was built by the following line of code:
cols = [((col.split('.')[0], col)) for col in df.columns]
Now, to save "Hotel1 : chambre double", you need to access it within the first for loop.
import pandas as pd
import ast
s="Hotel1 : chambre double - {'lpd': ('112', '90','10'), 'pc': ('200', '140','10')}"
ds = []
cols = []
for l in s.splitlines():
    d = l.split("-")
    if len(d) > 1:
        df = pd.DataFrame(ast.literal_eval(d[1].strip()))
        ds.append(df)
        # Pair the hotel name (d[0]) with each column; extend() keeps tuples from earlier lines
        cols.extend((d[0].strip(), col) for col in df.columns)
for df in ds:
    df.reset_index(drop=True, inplace=True)
df = pd.concat(ds, axis= 1)
df.columns=pd.MultiIndex.from_tuples(cols)
print(df.T)
df.T.to_csv(r"v.csv")
This works because you read d[0] (the hotel name) inside the for loop and build the column-name tuples while you still have access to it.
You then create the MultiIndex columns with the line of code you already had, outside the loop:
df.columns=pd.MultiIndex.from_tuples(cols)
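Finally, to write the transposed table to a file, add the following line at the bottom. Note that it writes a CSV file, which Excel can open; for a native workbook, df.T.to_excel("v.xlsx") can be used instead: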
df.T.to_csv(r"v.csv")
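As a possible variation, here is a minimal sketch that handles several input lines at once by passing a dictionary to pd.concat, so the hotel names become the outer level of the MultiIndex. The second line ("Hotel2 ...") is made-up data for illustration only, and the names frames and table are my own.

```python
import ast
import pandas as pd

# The "Hotel2" line is hypothetical sample data, not from the question.
s = """Hotel1 : chambre double - {'lpd': ('112', '90', '10'), 'pc': ('200', '140', '10')}
Hotel2 : chambre simple - {'lpd': ('80', '60', '5'), 'pc': ('150', '100', '5')}"""

frames = {}
for line in s.splitlines():
    # Split only on the first " - " so the dictionary text stays intact
    name, sep, payload = line.partition(" - ")
    if not sep:
        continue  # skip lines without a payload
    frames[name.strip()] = pd.DataFrame(ast.literal_eval(payload.strip()))

# The dict keys become the outer column level; transpose to get one row per (hotel, rate)
table = pd.concat(frames, axis=1).T
print(table)
```

Calling table.to_excel("v.xlsx") afterwards should write the same layout to a workbook, assuming an Excel writer such as openpyxl is installed.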

Related

Adding empty rows in Pandas dataframe

I'd like to insert empty rows at regular intervals in my dataframe.
I have the following code, which does what I want, but I'm struggling to adjust it to my needs:
s = pd.Series('', data_only_trades.columns)
f = lambda d: d.append(s, ignore_index=True)
set_rows = np.arange(len(data_only_trades)) // 4
empty_rows = data_only_trades.groupby(set_rows, group_keys=False).apply(f).reset_index(drop=True)
How can I adjust the code so I add two or more rows instead of one?
How can I set a starting point (e.g. it should start with row 5 -- Do I have to use .loc then in arange?)
Also tried this code but I was struggling in setting the starting row and the values to blank (I got NaN):
df_new = pd.DataFrame()
for i, row in data_only_trades.iterrows():
    df_new = df_new.append(row)
    for _ in range(2):
        df_new = df_new.append(pd.Series(), ignore_index=True)
Thank you!
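I think you can use NumPy:
import numpy as np
# Build a block of empty strings with as many columns as df
v = np.full((numberOfRowsYouWant, df.shape[1]), "", dtype=object)
# Stack it under the existing values; columns= keeps the original headers
pd.DataFrame(np.vstack((df.values, v)), columns=df.columns)
But if you want to keep your own approach, simply convert the NaN values to "":
df.fillna("")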
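For the two specific questions (more than one blank row, and a starting row), one possible approach is to slice the frame into chunks and concatenate them with a block of empty-string rows. This is only a sketch: the helper name add_blank_rows and its parameters are my own, and it uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

def add_blank_rows(df, every=4, n_blank=2, start=0):
    """Insert n_blank empty-string rows after each group of `every` rows,
    leaving the first `start` rows untouched."""
    blank = pd.DataFrame("", index=range(n_blank), columns=df.columns)
    pieces = [df.iloc[:start]] if start else []
    for pos in range(start, len(df), every):
        pieces.append(df.iloc[pos:pos + every])
        pieces.append(blank)
    return pd.concat(pieces, ignore_index=True)

# Small made-up frame standing in for data_only_trades
data = pd.DataFrame({"a": range(6), "b": list("uvwxyz")})
result = add_blank_rows(data, every=4, n_blank=2, start=0)
shifted = add_blank_rows(data, every=4, n_blank=2, start=2)
print(result)
```

Because the blank rows hold "" rather than NaN, no fillna call is needed afterwards, but numeric columns become object dtype.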

Columns in a dataframe(header) replace the contents having special format_Python 3.6

I have an Excel sheet whose column headers contain dynamic prefixes, like "s.FName", "g.LName", "Age", "Address", "P.CAR", "S.Licsence", etc. (about 100 columns).
The problem is that some of the columns do not have a prefix, so if I use the code below, those column headers end up empty. I tried pathlib as well, but it doesn't work.
dataset = pd.read_excel(fileloc)
df = pd.DataFrame(dataset)
df.columns = df.columns.str.split('.').str[1]
Is there any way I can put a condition in the 3rd line of code?
I also tried the code below, but it gives an error:
dataset = pd.read_excel(fileloc)
df = pd.DataFrame(dataset)
colindex = 0
for (columnName, columndata) in df.iteritems():
    if str(columnName).__contains__('.'):
        df.insert(colindex,str(columnName[0]).split('.')[1],columndata.value,True)
    else:
        df.insert(colindex, str(columnName[0]).split('.')[0],columndata.value,True)
EDIT
===Solution===
This may not be the optimal or the correct way to do this; other solutions are always welcome.
collist = []
dataset = pd.read_excel(fileloc)
df = pd.DataFrame(dataset)
for col in dataset.columns:
    if '.' in str(col):
        # Keep only the part after the dot, e.g. "s.FName" -> "FName"
        ncol = str(col).split('.')[1]
    else:
        # No dot in the header: keep it unchanged
        ncol = str(col)
    collist.append(ncol)
# Map each original header to its cleaned name and rename in one step
col_rename_dict = {i: j for i, j in zip(dataset.columns, collist)}
df.rename(columns=col_rename_dict, inplace=True)
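A shorter alternative may be to take the last piece after splitting, instead of the piece at position 1, so that headers without a dot are left as they are. The sketch below uses an empty DataFrame with the example headers from the question in place of the real Excel file.

```python
import pandas as pd

# Stand-in for pd.read_excel(fileloc): only the headers matter here
df = pd.DataFrame(columns=["s.FName", "g.LName", "Age", "Address", "P.CAR", "S.Licsence"])

# str[-1] is the text after the last dot, or the whole header when there is no dot
df.columns = df.columns.astype(str).str.split(".").str[-1]

new_names = list(df.columns)
print(new_names)
```

Note that this keeps the text after the last dot, which differs from split('.')[1] only when a header contains more than one dot.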

loop over columns in dataframes python

I want to loop over 2 columns in a specific dataframe and access the data by column name, but it gives me a type error on line 3:
i=0
for name,value in df.iteritems():
    q1=df[name].quantile(0.25)
    q3=df[name].quantile(0.75)
    IQR=q3-q1
    min=q1-1.5*IQR
    max=q3+1.5*IQR
    minout=df[df[name]<min]
    maxout=df[df[name]>max]
    new_df=df[(df[name]<max) & (df[name]>min)]
    i+=1
    if i==2:
        break
It looks like you want to exclude outliers based on the 1.5*IQR rule. Here is a simpler solution:
Input dummy data:
import numpy as np
import pandas as pd
np.random.seed(0)
# Four columns of 1000 normally distributed values each
df = pd.DataFrame({'col%s' % (i+1): np.random.normal(size=1000)
                   for i in range(4)})
Removing the outliers (keep data: Q1-1.5IQR < data < Q3+1.5IQR):
Q1 = df.iloc[:, :2].quantile(.25)
Q3 = df.iloc[:, :2].quantile(.75)
IQR = Q3-Q1
non_outliers = (df.iloc[:, :2] > Q1-1.5*IQR) & (df.iloc[:, :2] < Q3+1.5*IQR)
new_df = df[non_outliers.all(axis=1)]
A TypeError can happen for many reasons, so it would be better if you added part of the DataFrame to help us understand the issue.
To loop over the rows and read specific columns by name, you can also use the iterrows() function:
import pandas as pd
df = pd.read_csv('filename.csv')
for _, content in df.iterrows():
    print(content['columnname'])  # use the name of the column you want to read
Refer to the following link for more information:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows
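If you do want an explicit loop over columns by name, here is a minimal sketch of the same 1.5*IQR rule. It iterates over df.columns[:2] rather than df.iteritems(), since iteritems() was removed in pandas 2.0 (items() is the replacement). The dummy data and the names bounds and mask are my own.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({f"col{i + 1}": rng.normal(size=200) for i in range(4)})

# Compute the lower and upper fence for each of the first two columns
bounds = {}
for name in df.columns[:2]:
    q1 = df[name].quantile(0.25)
    q3 = df[name].quantile(0.75)
    iqr = q3 - q1
    bounds[name] = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Keep only the rows that fall strictly inside the fences in both columns
mask = pd.Series(True, index=df.index)
for name, (low, high) in bounds.items():
    mask &= (df[name] > low) & (df[name] < high)
new_df = df[mask]
print(len(df), len(new_df))
```

Using low and high instead of min and max also avoids shadowing Python's built-in functions.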

Cannot assign to function call when looping through and converting excel files

With this code:
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i,snlist in list(zip(range(1,13),sn)):
    'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist, skiprows=range(6))
I get this error:
'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist,
skiprows=range(6))
^ SyntaxError: cannot assign to function call
I can't understand the error or how to solve it. What's the problem?
df+str(i) also returns an error.
I want the result to be:
df1 = pd.read_excel.. list1...
df2 = pd.read_excel... list2....
You can't assign the result of pd.read_excel to 'df{}'.format(str(i)), which is a string that looks like "df1", "df2", "df3", etc. That is why you get this error message. The message is probably confusing because Python sees the left-hand side as a function call (the call to .format) and reports an assignment to it.
It seems like you want a list or a dictionary of DataFrames instead.
To do this, assign the result of pd.read_excel to a variable, e.g. df, and then append that to a list or add it to a dictionary of DataFrames.
As a list:
dataframes = []
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes.append(df)
As a dictionary:
dataframes = {}
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes[i] = df
In both cases, you can access the DataFrames by indexing like this:
for i in range(len(dataframes)):
    # Note: indexes start at 0 here instead of 1. This works for the list as is;
    # for the dictionary, change `range(1, 13)` above to start at 0 first
    print(dataframes[i])
Or more simply:
for df in dataframes:
    print(df)
In the case of the dictionary, you'd probably want:
for i, df in dataframes.items():
    # Here, `i` is the key and `df` is the actual DataFrame
    print(i, df)
If you really do want df1, df2 etc as the keys, then do this instead:
dataframes[f'df{i}'] = df
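As a side note, pd.read_excel can also load every sheet in one call: with sheet_name=None it returns a dictionary that maps each sheet name to its DataFrame. The sketch below uses two small in-memory frames as a stand-in for that dictionary (the sheet names "Jan" and "Feb" are made up) and then builds the "df1", "df2", ... keys from it.

```python
import pandas as pd

# Stand-in for: pd.read_excel('test.xlsx', sheet_name=None, skiprows=range(6))
# which returns a dict of {sheet name: DataFrame}. Sheet names here are hypothetical.
sheets = {
    "Jan": pd.DataFrame({"x": [1, 2]}),
    "Feb": pd.DataFrame({"x": [3, 4]}),
}

# Number the sheets from 1 and key each DataFrame as "df1", "df2", ...
dataframes = {f"df{i}": df for i, df in enumerate(sheets.values(), start=1)}

for key, df in dataframes.items():
    print(key, df.shape)
```

Keeping the sheet names as keys is often more readable than numbered names, so the renaming step is optional.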

Loop through cell range (Every 3 cells) and add ranking to it

The problem is that I am trying to make a ranking for every 3 cells in that column using pandas.
For example:
This is the outcome I want
I have no idea how to make it.
I tried something like this:
for i in range(df.iloc[1:],df.iloc[,:],3):
    counter = 0
    i['item'] += counter + 1
The code is completely wrong, but I need help with the range and with putting df.iloc in the brackets in pandas.
Does this match the requirements?
import pandas as pd
df = pd.DataFrame()
df['Item'] = ['shoes','shoes','shoes','shirts','shirts','shirts']
df2 = pd.DataFrame()
for i, item in enumerate(df['Item'].unique(), 1):
    df2.loc[i-1,'rank'] = i
    df2.loc[i-1, 'Item'] = item
df2['rank'] = df2['rank'].astype('int')
print(df)
print("\n")
print(df2)
df = df.merge(df2, on='Item', how='inner')
print("\n")
print(df)
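A more compact option might be pd.factorize, which numbers each distinct value in order of first appearance, so no helper frame or merge is needed. If the ranking should instead depend purely on position (every 3 rows, whatever the values are), integer division on the row number does that. Both are sketches on the same sample data; the column name rank_by_position is my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Item": ["shoes", "shoes", "shoes", "shirts", "shirts", "shirts"]})

# Rank by value: factorize returns 0-based codes in order of first appearance
df["rank"] = pd.factorize(df["Item"])[0] + 1

# Rank by position: rows 0-2 get 1, rows 3-5 get 2, and so on
df["rank_by_position"] = np.arange(len(df)) // 3 + 1

print(df)
```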
