How to write data to an Excel file using Python

I'm writing the data in my dictionary to an Excel file like this:
import pandas as pd
my_dict = { 'one': 100, 'two': 200, 'three': 300}
df = pd.DataFrame(my_dict.items(), columns=['Summary','Count'])
with pd.ExcelWriter('outfile.xlsx') as writer:
    df.to_excel(writer, sheet_name='sheet1', index=False)
With the code above I get the desired output.
I have another list whose values need to go into the third column of the sheet:
my_list = [10,20,30]
Expected output:
Edit: I need to add the data from my_dict and my_list at the same time.
I have tried to find a solution but unfortunately couldn't. Any help is appreciated!
Many thanks!!

To add the data from my_dict and my_list at the same time when defining df, you can chain .assign() onto the pd.DataFrame() call; it creates a column named my_list from the list my_list:
df = pd.DataFrame(my_dict.items(), columns=['Summary','Count']).assign(my_list=my_list)
Of course, the most straightforward way is to split this into two statements: define the DataFrame with pd.DataFrame first, then add the column, as below. Since that takes two statements, I'm not sure whether you'd still count it as "at the same time".
df = pd.DataFrame(my_dict.items(), columns=['Summary','Count']) # Your existing statement unchanged
df['my_list'] = my_list
Result:
print(df)
  Summary  Count  my_list
0     one    100       10
1     two    200       20
2   three    300       30
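Putting it together with the question's Excel export, here is a sketch that assumes an .xlsx engine such as openpyxl is installed (pandas needs one to write .xlsx files). The list must have exactly one value per dictionary entry; otherwise assign raises a ValueError about mismatched lengths.

```python
import importlib.util

import pandas as pd

my_dict = {'one': 100, 'two': 200, 'three': 300}
my_list = [10, 20, 30]

# Build Summary/Count from the (key, value) pairs, then attach my_list as the
# third column in the same expression.
df = pd.DataFrame(list(my_dict.items()), columns=['Summary', 'Count']).assign(my_list=my_list)

# Writing .xlsx needs an Excel engine such as openpyxl; this sketch skips the
# export if none is installed.
if importlib.util.find_spec('openpyxl') is not None:
    with pd.ExcelWriter('outfile.xlsx') as writer:
        # index=False keeps my_list in the third column (C) of the sheet.
        df.to_excel(writer, sheet_name='sheet1', index=False)

print(df)
```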

This may also solve your problem
import pandas as pd
my_dict = {'summary': ['one', 'two', 'three'], 'count': [100, 200, 300]}
my_list = [10,20,30]
df = pd.DataFrame.from_dict(my_dict)
df['my_list'] = my_list
df.to_excel('df.xlsx', index=False)  # index=False keeps my_list in the 3rd column
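If you want to keep the dictionary from the question as your input, here is a minimal sketch of reshaping it into the column-oriented form used above (one list per column). The lowercase column names simply follow this answer.

```python
import pandas as pd

my_dict = {'one': 100, 'two': 200, 'three': 300}
my_list = [10, 20, 30]

# Turn {label: count} into {column_name: list_of_values}; dict order is
# preserved, so labels and counts stay paired row by row.
columns = {'summary': list(my_dict.keys()), 'count': list(my_dict.values())}

df = pd.DataFrame.from_dict(columns)
df['my_list'] = my_list
print(df)
```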

Related

Cannot assign to function call when looping through and converting excel files

With this code:
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i,snlist in list(zip(range(1,13),sn)):
    'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist, skiprows=range(6))
I get this error:
    'df{}'.format(str(i)) = pd.read_excel('test.xlsx',sheet_name=snlist, skiprows=range(6))
    ^
SyntaxError: cannot assign to function call
I don't understand the error or how to solve it. What's the problem?
Using df+str(i) also returns an error.
I want the result to be:
df1 = pd.read_excel.. list1...
df2 = pd.read_excel... list2....
You can't assign the result of pd.read_excel to 'df{}'.format(str(i)). That expression is a method call that evaluates to a string such as "df1", "df2", etc., and Python only accepts assignment targets such as plain names, attributes, and subscripts on the left-hand side of =. That is why you get this error; the message mentions a "function call" because .format(...) is one.
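To see that Python rejects this while compiling the line, before pandas is ever called, here is a small sketch that compiles the same kind of statement from a string. The variable names are only for illustration, and the exact wording of the message varies slightly between Python versions.

```python
# Compile (without running) a statement whose assignment target is a method call.
bad_source = "'df{}'.format(str(1)) = 42"

try:
    compile(bad_source, '<example>', 'exec')
    message = None
except SyntaxError as err:
    # Recent Python versions report "cannot assign to function call",
    # and 3.10+ appends a hint about '=='.
    message = err.msg

print(message)
```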
It seems like you want a list or a dictionary of DataFrames instead.
To do this, assign the result of pd.read_excel to a variable, e.g. df, and then append it to a list or add it to a dictionary of DataFrames.
As a list:
dataframes = []
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes.append(df)
As a dictionary:
dataframes = {}
xls = pd.ExcelFile('test.xlsx')
sn = xls.sheet_names
for i, snlist in list(zip(range(1, 13), sn)):
    df = pd.read_excel('test.xlsx', sheet_name=snlist, skiprows=range(6))
    dataframes[i] = df
In both cases, you can access the DataFrames by indexing, like this:
for i in range(len(dataframes)):
    print(dataframes[i])
    # Note that the indexes start at 0 here, while the dictionary keys above
    # start at 1, so with the dictionary dataframes[0] raises a KeyError.
    # You may want to change your `range` above to start at 0.
Or, more simply, for the list (iterating over a dictionary gives you its keys, not the DataFrames):
for df in dataframes:
    print(df)
In the case of the dictionary, you'd probably want:
for i, df in dataframes.items():
    print(i, df)
    # Here, `i` is the key and `df` is the actual DataFrame
If you really do want df1, df2, etc. as the keys, use this line in the loop instead:
dataframes[f'df{i}'] = df
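If you want every sheet, here is a sketch of the same dictionary approach that numbers sheets with enumerate instead of zip(range(1, 13), sn), which silently stops after 12 sheets. collect_frames and the in-memory fake_sheets reader are hypothetical names for illustration; with a real workbook you would pass a function that calls pd.read_excel, as in the comment. pandas can also return all sheets at once, as a dict keyed by sheet name, with pd.read_excel('test.xlsx', sheet_name=None).

```python
import pandas as pd


def collect_frames(sheet_names, read_sheet, start=1):
    # Hypothetical helper: build {'df1': ..., 'df2': ...}, one entry per sheet,
    # numbered from `start`. enumerate covers every sheet, however many there are.
    return {f'df{i}': read_sheet(name) for i, name in enumerate(sheet_names, start=start)}


# With a real workbook this would be, for example:
#   xls = pd.ExcelFile('test.xlsx')
#   frames = collect_frames(
#       xls.sheet_names,
#       lambda name: pd.read_excel(xls, sheet_name=name, skiprows=range(6)),
#   )
# Below, a hypothetical in-memory reader stands in for read_excel.
fake_sheets = {
    'Jan': pd.DataFrame({'value': [1, 2]}),
    'Feb': pd.DataFrame({'value': [3, 4]}),
}
frames = collect_frames(list(fake_sheets), fake_sheets.__getitem__)
print(frames['df2'])
```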

How to convert a Python dictionary to a pandas DataFrame

I have this Python dictionary and I need to convert it into a pandas DataFrame.
This is what it looks like:
thisdict = {}
thisdict["Column1"] = 1
thisdict["Column2"] = 2
thisdict  # this prints:
{'Column1': 1, 'Column2': 2}
And I need to convert this to a pandas DataFrame.
I tried:
df = pd.DataFrame(thisdict)
and got this error:
ValueError: If using all scalar values, you must pass an index
Can someone please help me?
You need to assign each column as a list.
Try replacing lines 2 and 3 with:
thisdict["Column1"] = [1]
thisdict["Column2"] = [2]
Complete code:
import pandas as pd
thisdict = {}
thisdict["Column1"] = [1]
thisdict["Column2"] = [2]
df = pd.DataFrame(thisdict)
Output:
   Column1  Column2
0        1        2
If for some reason you don't want to make your columns lists, you could do this instead:
df = pd.DataFrame(pd.Series(thisdict)).transpose()
If your dictionary is going to be a single row of a DataFrame, you need to pass an index with a single element:
pd.DataFrame(thisdict, index=[0])
Output:
Column1 Column2
0 1 2
It's not clear from your question what you want, but here are a couple of options; I think you probably want the second one. To get it, make sure you use lists when you build your dictionary.
Option-1
thisdict = {}
thisdict["Column1"] = 1
thisdict["Column2"] = 2
print(thisdict)
print("Below is Option1:")
df = pd.DataFrame(list(thisdict.items()),columns = ['ColumnA','ColumnB'])
print(df)  # or display(df) in Jupyter
Option-2
thisdict = {}
thisdict["Column1"] = [1]
thisdict["Column2"] = [2]
print(thisdict)
print("Below is Option2:")
df = pd.DataFrame(thisdict)
print(df)  # or display(df) in Jupyter
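One more option not shown above: wrapping the dictionary in a list makes pandas read it as a list of records, one row per dict, so scalar values work without an explicit index. A minimal sketch using the question's dictionary:

```python
import pandas as pd

thisdict = {}
thisdict["Column1"] = 1
thisdict["Column2"] = 2

# A list of dicts is treated as records: each dict becomes one row and its
# keys become the column names.
df = pd.DataFrame([thisdict])
print(df)
```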

Adding a pandas.dataframe to another one with it's own name

I have data that I want to retrieve from a couple of text files in a folder. For each file in the folder, I create a pandas.DataFrame to store the data. For now this works correctly, and all the files have the same number of rows.
Now what I want to do is to add each of these dataframes to a 'master' dataframe containing all of them. I would like to add each of these dataframes to the master dataframe with their file name.
I already have the file name.
For example, let say I have 2 dataframes with their own file names, I want to add them to the master dataframe with a header for each of these 2 dataframes representing the name of the file.
What I have tried now is the following:
# T0 data
t0_path = "C:/Users/AlexandreOuimet/Box Sync/Analyse Opto/Crunch/GF data crunch/T0/*.txt"
t0_folder = glob.glob(t0_path)
t0_data = pd.DataFrame()
for file in t0_folder:
    raw_data = parseGFfile(file)
    file_data = pd.DataFrame(raw_data, columns=['wavelength', 'max', 'min'])
    file_name = getFileName(file)
    t0_data.insert(loc=len(t0_data.columns), column=file_name, value=file_data)
Could someone help me with this please?
Thank you :)
Edit:
I think I was not clear enough, this is what I am expecting as an output:
You may be looking for the concat function. Here's an example:
import pandas as pd
A = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [4, 5, 6]})
B = pd.DataFrame({'Col1': [7, 8, 9], 'Col2': [10, 11, 12]})
a_filename = 'a_filename.txt'
b_filename = 'b_filename.txt'
A['filename'] = a_filename
B['filename'] = b_filename
C = pd.concat((A, B), ignore_index = True)
print(C)
Output:
Col1 Col2 filename
0 1 4 a_filename.txt
1 2 5 a_filename.txt
2 3 6 a_filename.txt
3 7 10 b_filename.txt
4 8 11 b_filename.txt
5 9 12 b_filename.txt
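Since the question asks for a header per file rather than a file-name column, here is a hedged variant of the same concat idea: with axis=1 and keys=, each file's columns sit side by side under an outer header holding its file name. This is a minimal sketch that assumes every file yields the same number of rows (as the question says), so rows line up by position; the sample values and file names are made up.

```python
import pandas as pd

A = pd.DataFrame({'wavelength': [400, 500], 'max': [1.0, 2.0], 'min': [0.5, 1.5]})
B = pd.DataFrame({'wavelength': [400, 500], 'max': [3.0, 4.0], 'min': [2.5, 3.5]})

# axis=1 places the frames next to each other (aligned on the row index);
# keys= adds an outer column level holding each frame's file name.
master = pd.concat([A, B], axis=1, keys=['a_filename.txt', 'b_filename.txt'])
print(master)

# Selecting by the outer header gives back one file's columns.
print(master['b_filename.txt'])
```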
There are a couple of changes that make this easy. I'll list them and the reasoning below:
1. Collect the per-file DataFrames in a list and combine them once at the end, instead of starting from an empty master DataFrame and growing it.
2. Instead of calling getFileName, simply create a new column called "file_name" holding the file path the DataFrame was built from, for every record in it. That way, when you combine the DataFrames, each record's origin is clear. I commented where you can edit that line if you want to use string methods to clean up the file names.
3. At the end, don't use insert. For combining DataFrames with the same columns (a union operation, if you're familiar with SQL or set theory), use pd.concat. Older answers use DataFrame.append for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0. Note that pd.concat raises a ValueError if the glob pattern matches no files.
# T0 data
import glob
import pandas as pd
t0_path = "C:/Users/AlexandreOuimet/Box Sync/Analyse Opto/Crunch/GF data crunch/T0/*.txt"
t0_folder = glob.glob(t0_path)
frames = []
for file in t0_folder:
    raw_data = parseGFfile(file)
    file_data = pd.DataFrame(raw_data, columns=['wavelength', 'max', 'min'])
    file_data['file_name'] = file  # You can make edits here
    frames.append(file_data)
t0_data = pd.concat(frames, ignore_index=True)
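A runnable sketch of this loop under stated assumptions: read_gf_file is a hypothetical stand-in for parseGFfile that assumes three whitespace-separated numeric columns per line, and two small temporary files replace the real folder. It also shows one way to clean up the file names, keeping only the base name without its extension.

```python
import glob
import os
import tempfile

import pandas as pd


def read_gf_file(path):
    # Hypothetical stand-in for parseGFfile: assumes three whitespace-separated
    # numeric columns per line. Replace with your own parser.
    return pd.read_csv(path, sep=r'\s+', header=None, names=['wavelength', 'max', 'min'])


def load_folder(pattern):
    frames = []
    for path in sorted(glob.glob(pattern)):
        file_data = read_gf_file(path)
        # Keep only the base name without its extension, e.g. "sample1".
        file_data['file_name'] = os.path.splitext(os.path.basename(path))[0]
        frames.append(file_data)
    if not frames:
        # pd.concat([]) would raise a less descriptive ValueError.
        raise FileNotFoundError(f'no files match {pattern!r}')
    return pd.concat(frames, ignore_index=True)


# Demo with two small temporary files.
with tempfile.TemporaryDirectory() as folder:
    samples = [('sample1.txt', '400 1.0 0.5\n500 2.0 1.5\n'),
               ('sample2.txt', '400 3.0 2.5\n500 4.0 3.5\n')]
    for name, rows in samples:
        with open(os.path.join(folder, name), 'w') as fh:
            fh.write(rows)
    t0_data = load_folder(os.path.join(folder, '*.txt'))

print(t0_data)
```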

How to build a dataframe from values in a json string that is a list of dictionaries?

I have a giant list of values that I've downloaded and I want to build and insert them into a dataframe.
I thought it would be as easy as:
import pandas as pd
df = pd.DataFrame()
records = giant list of dictionary
df['var1'] = records[0]['key1']
df['var2'] = records[0]['key2']
and I would get a dataframe such as
var1 var2
val1 val2
However, my dataframe appears to be empty? I can individually print values from records no problem.
Simple Example:
t = [{'v1': 100, 'v2': 50}]
df['var1'] = t[0]['v1']
df['var2'] = t[0]['v2']
I would like to be:
var1 var2
100 50
One entry of your list of dictionaries looks like something you'd pass to the pd.Series constructor. You can turn that into a pd.DataFrame if you want to with the series method pd.Series.to_frame. I transpose at the end because I assume you wanted the dictionary to represent one row.
pd.Series(t[0]).to_frame().T
v1 v2
0 100 50
pandas does exactly that for you!
>>> import pandas as pd
>>> t = [{'v1': 100, 'v2': 50}]
>>> df=pd.DataFrame(t)
>>> df
v1 v2
0 100 50
EDIT
>>> import pandas as pd
>>> t = [{'v1': 100, 'v2': 50}]
>>> df=pd.DataFrame([t[0]['v1']], index=None, columns=['var1'])
>>> df
   var1
0   100
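Why the question's attempt comes out empty, and one way to get the var1/var2 names: a new pd.DataFrame() has no rows, so assigning a scalar to a column broadcasts it over an empty index. A minimal sketch using the simple example from the question; the rename mapping is only an illustration of choosing your own column names.

```python
import pandas as pd

t = [{'v1': 100, 'v2': 50}]

# The original approach: the scalar is broadcast over the existing index,
# and a fresh DataFrame has no rows, so the column stays empty.
empty = pd.DataFrame()
empty['var1'] = t[0]['v1']
print(len(empty))

# Build from the whole list of dicts (one row per dict), then rename the
# keys to the column names you want.
df = pd.DataFrame(t).rename(columns={'v1': 'var1', 'v2': 'var2'})
print(df)
```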

how to convert csv to dictionary using pandas

How can I convert a csv into a dictionary using pandas? For example I have 2 columns, and would like column1 to be the key and column2 to be the value. My data looks like this:
"name","position"
"UCLA","73"
"SUNY","36"
cols = ['name', 'position']
df = pd.read_csv(filename, names = cols)
Since the first line of your sample CSV data is a header, you can read the file as a one-column DataFrame indexed by name and squeeze it into a pd.Series:
>>> pd.read_csv(filename, index_col=0).squeeze('columns').to_dict()
{'UCLA': 73, 'SUNY': 36}
Older answers pass squeeze=True to pandas.read_csv() instead; that keyword was deprecated in pandas 1.4 and removed in pandas 2.0. If you want the first line treated as data rather than as a header, pass header=None.
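A self-contained sketch of the same read, with an io.StringIO buffer standing in for the file (swap in your real filename). It assumes the exact sample data from the question, where pandas parses the quoted numbers as integers.

```python
import io

import pandas as pd

# In-memory stand-in for the question's CSV file.
csv_text = '"name","position"\n"UCLA","73"\n"SUNY","36"\n'

# The first line becomes the header by default; index_col=0 makes "name" the
# index, and squeeze("columns") turns the single remaining column into a Series.
series = pd.read_csv(io.StringIO(csv_text), index_col=0).squeeze('columns')
mapping = series.to_dict()
print(mapping)
```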
Convert the columns to a list, then zip and convert to a dict:
In [37]:
df = pd.DataFrame({'col1':['first','second','third'], 'col2':np.random.rand(3)})
print(df)
dict(zip(list(df.col1), list(df.col2)))
col1 col2
0 first 0.278247
1 second 0.459753
2 third 0.151873
[3 rows x 2 columns]
Out[37]:
{'third': 0.15187291615699894,
'first': 0.27824681093923298,
'second': 0.4597530377539677}
ankostis's answer is, in my opinion, the most elegant solution when you have the file on disk.
However, if you don't want to or can't take the detour of saving to and loading from the file system, you can also do it like this:
df = pd.DataFrame({"name": ["UCLA", "SUNY"], "position": [73, 36]})
series = df["position"]
series.index = df["name"]
series.to_dict()
Result:
{'UCLA': 73, 'SUNY': 36}
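An equivalent one-liner on the same example DataFrame, offered as an alternative rather than as this answer's own code: set_index moves the name column into the index, so selecting position gives a Series keyed by name.

```python
import pandas as pd

df = pd.DataFrame({"name": ["UCLA", "SUNY"], "position": [73, 36]})

# "name" becomes the index; selecting "position" yields a Series keyed by
# name, which to_dict() turns into {name: position}.
mapping = df.set_index("name")["position"].to_dict()
print(mapping)

# The zip-based answer above gives the same result.
same = dict(zip(df["name"], df["position"]))
```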
