Saving lists to Excel in Python

I have lists with different numbers of variables in them. They save to Excel fine, but when I query the cells afterwards, I see that each list has been saved as a string, not a list.
Is there a solution to this? Or am I looking at it from the wrong angle?
The code below should make clearer what I mean.
import streamlit as st
import pandas as pd

df_path = 'database/meters_try.xlsx'
df = pd.read_excel(df_path)

a = ['2023-02-11']
b = 'PR'
c = ['A', 'B']
d = 'AAA'
e = 'SHIFT'
f = ['PERSON1', 'PERSON2']
g = ['PERSON3', 'PERSON4', 'PERSON5']
h = ['QQ']
i = ['0']
j = ['50', '110']
k = ['50', '60']
l = 'NOTES.'

b_type = type(b)
st.write(b_type)
c_type = type(c)
st.write(c_type)

def add_data(a, b, c, d, e, f, g, h, i, j, k, l, df):
    temp_df = pd.DataFrame({
        "column1": [a],
        "column2": [b],
        "column3": [c],
        "column4": [d],
        "column5": [e],
        "column6": [f],
        "column7": [g],
        "column8": [h],
        "column9": [i],
        "column10": [j],
        "column11": [k],
        "column12": [l]
    })
    df_meters = pd.read_excel(df_path)
    df_meters = df_meters.append(temp_df, ignore_index=True)
    df_meters.to_excel(df_path, index=False)
    st.write(df_meters)

button = st.button("Save!")
if button:
    add_data(a, b, c, d, e, f, g, h, i, j, k, l, df)
    st.success("Saved")

def find_column_data_type(df):
    for col in df.columns:
        col_type = type(df[col][0])
        st.write(f"{col} column data type: {col_type}")

if st.button("Find the columns data type"):
    find_column_data_type(df)

I'm guessing Excel doesn't know what a Python list is.
After you've read the Excel file back, you can convert each cell back to a list using eval:
import pandas as pd

file = "/home/bera/Desktop/testexcel.xlsx"
df = pd.DataFrame(data={'A': [[1, 2, 3], [4, 5, 6]]})
#            A
#    [1, 2, 3]
#    [4, 5, 6]
# type(df.iloc[0]["A"]) -> list

df.to_excel(file, index=False)
df2 = pd.read_excel(file)
#            A
#    [1, 2, 3]
#    [4, 5, 6]
# type(df2.iloc[0]["A"]) -> str  (now it's a string)

df2["A"] = df2["A"].map(eval)
# type(df2.iloc[0]["A"]) -> list
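A caveat on the approach above: eval will execute whatever expression a cell happens to contain, so for a workbook that is not fully trusted, ast.literal_eval from the standard library is a safer drop-in; it only parses Python literals. A minimal sketch (using an in-memory frame with already-stringified cells instead of a real Excel file):

```python
import ast

import pandas as pd

# After a round trip through Excel, list cells come back as strings like "[1, 2, 3]"
df2 = pd.DataFrame({"A": ["[1, 2, 3]", "[4, 5, 6]"]})

# literal_eval parses only literals, so a malicious cell cannot run arbitrary code
df2["A"] = df2["A"].map(ast.literal_eval)

print(df2.iloc[0]["A"])        # [1, 2, 3]
print(type(df2.iloc[0]["A"]))  # <class 'list'>
```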

Related

Python apply function to each row of DataFrame

I have a DataFrame with two columns: Type and Name. The values in each cell are lists of equal length, i.e. we have pairs (Type, Name). I want to:
Group Name by its Type
Create a column for each Type holding the corresponding Names
My current code is a for loop:
for idx, row in df.iterrows():
    for t in list(set(row["Type"])):
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]
but it works very slowly. How can I speed up this code?
EDIT Here is a code example which illustrates what I want to obtain, but it needs to be faster:
import pandas as pd

df = pd.DataFrame({"Type": [["1", "1", "2", "3"], ["2", "3"]], "Name": [["A", "B", "C", "D"], ["E", "F"]]})
unique = list(set(t for types in df["Type"] for t in types))  # all distinct types across rows
for t in unique:
    df[t] = None
    df[t] = df[t].astype('object')
for idx, row in df.iterrows():
    for t in unique:
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]
You could write a function my_function(param) and then do something like this:
df['type'] = df['name'].apply(lambda x: my_function(x))
There are likely better alternatives to lambda functions, but lambdas are what I remember. If you post a simplified mock of your original data and the desired output, it will be easier to find the best answer; I'm not certain I understand what you're trying to do. A literal group-by should be done with the DataFrame groupby method.
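As a minimal, runnable sketch of the apply pattern just described (my_function and the column names are hypothetical, made up for illustration):

```python
import pandas as pd

def my_function(name):
    # hypothetical per-row transformation: upper-case the name
    return name.upper()

df = pd.DataFrame({"name": ["alice", "bob"]})
# apply calls my_function once per element; the lambda wrapper is optional
df["type"] = df["name"].apply(my_function)
print(df["type"].tolist())  # ['ALICE', 'BOB']
```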
If I understand correctly, your dataframe looks something like this:
df = pd.DataFrame({'Name': ['a,b,c', 'd,e,f,g'], 'Type': ['3,3,2', '1,2,2,1']})
      Name     Type
0    a,b,c    3,3,2
1  d,e,f,g  1,2,2,1
where the elements are comma-separated strings.
Start with running:
df['Name:Type'] = (df['Name']+":"+df['Type']).map(process)
using:
def process(x):
    x_, y_ = x.split(':')
    x_ = x_.split(','); y_ = y_.split(',')
    s = zip(x_, y_)
    str_ = ','.join(':'.join(y) for y in s)
    return str_
Then you will get a new column holding one combined string per row, a:3,b:3,c:2 and d:1,e:2,f:2,g:1. This reduces the problem to a single column.
Finally produce the dataframe required by:
l = ','.join(df['Name:Type'].to_list()).split(',')
pd.DataFrame([i.split(':') for i in l], columns=['Name', 'Type'])
Giving:
  Name Type
0    a    3
1    b    3
2    c    2
3    d    1
4    e    2
5    f    2
6    g    1
Is it the result you want? (If not, add an example of the desired output to your question.)
res = df.explode(['Name', 'Type']).groupby('Type')['Name'].agg(list)
print(res)
Type
1    [A, B]
2    [C, E]
3    [D, F]
Name: Name, dtype: object
UPD
df1 = df.apply(lambda x: pd.Series(x['Name'], x['Type']).groupby(level=0).agg(list).T, axis=1)
res = pd.concat([df, df1], axis=1)
print(res)
           Type          Name       1    2    3
0  [1, 1, 2, 3]  [A, B, C, D]  [A, B]  [C]  [D]
1        [2, 3]        [E, F]     NaN  [E]  [F]

How to split one row into multiple rows in python

I have a pandas dataframe that has one long row as a result of a flattened json list.
I want to go from the example:
{'0_id': 1, '0_name': a, '0_address': USA, '1_id': 2, '1_name': b, '1_address': UK, '1_hobby': ski}
to a table like the following:
id  name  address  hobby
1   a     USA
2   b     UK       ski
Any help is greatly appreciated :)
There you go:
import json

json_data = '{"0_id": 1, "0_name": "a", "0_address": "USA", "1_id": 2, "1_name": "b", "1_address": "UK", "1_hobby": "ski"}'
arr = json.loads(json_data)
result = {}
for k in arr:
    kk = k.split("_")
    if int(kk[0]) not in result:
        # template with every expected field, including "address"
        result[int(kk[0])] = {"id": "", "name": "", "address": "", "hobby": ""}
    result[int(kk[0])][kk[1]] = arr[k]
for key in result:
    print("%s %s %s" % (key, result[key]["name"], result[key]["address"]))
If you want the fields to be more dynamic, you have two choices: either go through the whole input first and gather all possible field names before building the empty template, or just check whether a key exists in result when you return the results :)
This way only works if every column follows this pattern, but should otherwise be pretty robust.
import pandas as pd

data = {'0_id': '1', '0_name': 'a', '0_address': 'USA', '1_id': '2', '1_name': 'b', '1_address': 'UK', '1_hobby': 'ski'}
df = pd.DataFrame(data, index=[0])
indexes = set(x.split('_')[0] for x in df.columns)
to_concat = []
for i in indexes:
    target_columns = [col for col in df.columns if col.startswith(i)]
    df_slice = df[target_columns]
    df_slice.columns = [x.split('_')[1] for x in df_slice.columns]
    to_concat.append(df_slice)
new_df = pd.concat(to_concat)
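Another way to sketch the same reshaping: build a dict of row-dicts keyed on the numeric prefix, then let pandas fill any missing fields with NaN (the data is taken from the question; DataFrame.from_dict with orient='index' turns each inner dict into one row):

```python
import pandas as pd

data = {'0_id': '1', '0_name': 'a', '0_address': 'USA',
        '1_id': '2', '1_name': 'b', '1_address': 'UK', '1_hobby': 'ski'}

rows = {}
for key, value in data.items():
    idx, field = key.split('_', 1)  # "0_id" -> ("0", "id")
    rows.setdefault(idx, {})[field] = value

# orient='index' makes each inner dict one row; absent fields become NaN
df = pd.DataFrame.from_dict(rows, orient='index')
print(df)
```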

2D list to csv - by column

I'd like to export the content of a 2D-list into a csv file.
The size of the sublists can be different. For example, the 2D-list can be something like :
a = [ ['a','b','c','d'], ['e','f'], ['g'], [], ['h','i'] ]
I want my csv to store the data like this - "by column" :
a,e,g, ,h
b,f, , ,i
c
d
Do I have to add some blank spaces to get the same size for each sublist ? Or is there another way to do so ?
Thank you for your help
You can use itertools.zip_longest:
import itertools, csv

a = [['a','b','c','d'], ['e','f'], ['g'], [], ['h','i']]
with open('filename.csv', 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    write = csv.writer(f)
    write.writerows(itertools.zip_longest(*a, fillvalue=''))
Output:
a,e,g,,h
b,f,,,i
c,,,,
d,,,,
It can be done using pandas and the transpose attribute (T):
import pandas as pd

a = [['a','b','c','d'], ['e','f'], ['g'], [], ['h','i']]
pd.DataFrame(a).T.to_csv('test.csv')
Result:
(test.csv)
,0,1,2,3,4
0,a,e,g,,h
1,b,f,,,i
2,c,,,,
3,d,,,,
import itertools
import pandas as pd
First create a dataframe using a nested array:
a = ['a','b','c','d']
b = ['e','f']
c = ['g']
d = []
e = ['h','i']
nest = [a,b,c,d,e]
df = pd.DataFrame((_ for _ in itertools.zip_longest(*nest)), columns=['a', 'b', 'c', 'd', 'e'])
like that:
a b c d e
0 a e g None h
1 b f None None i
2 c None None None None
3 d None None None None
and then store it using pandas:
df.to_csv('filename.csv', index=False)
We have three tasks to do here: pad the sublists so they all have the same length, transpose, and write to csv.
Sadly, Python has no built-in function for padding; however, it can be done relatively easily. I would do it the following way
(the code below gives the result exactly as requested in the question):
a = [['a','b','c','d'],['e','f'],['g'],[],['h','i']]
ml = max([len(i) for i in a]) #number of elements of longest sublist
a = [(i+[' ']*ml)[:ml] for i in a] #adding ' ' for sublist shorter than longest
a = list(zip(*a)) #transpose
a = [','.join(i) for i in a] #create list of lines to be written
a = [i.rstrip(', ') for i in a] #jettison spaces if not followed by value
a = '\n'.join(a) #create string to be written to file
with open('myfile.csv','w') as f: f.write(a)
Content of myfile.csv:
a,e,g, ,h
b,f, , ,i
c
d

Python: Adding integer elements of a nested list to a list

So, I have two lists whose integer elements need to be added.
nested_lst_1 = [[6],[7],[8,9]]
lst = [1,2,3]
I need to add them such that every element in the nested list, will be added to its corresponding integer in 'lst' to obtain another nested list.
nested_list_2 = [[6 + 1],[7 + 2],[8 + 3,9 + 3]]
or
nested_list_2 = [[7],[9],[11,12]]
Then, I need to use the integers from nested_list_1 and nested_list_2 as indices to extract a substring from a string.
nested_list_1 = [[6],[7],[8,9]] *obtained above*
nested_list_2 = [[7],[9],[11,12]] *obtained above*
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
string[6:7] = 'CG'
string[7:9] = 'GTA'
string[8:11] = 'TACG'
string[9:12] = 'ACGA'
(Note: these ranges treat both endpoints as inclusive, unlike Python's half-open slicing.)
Then, I need to create a nested list of the substrings obtained:
nested_list_substrings = [['CG'],['GTA'],['TACG','ACGA']]
Finally, I need to use these substrings as key values in a dictionary which also possesses keys of type string.
keys = ['GG', 'GTT', 'TCGG']
nested_list_substrings = [['CG'],['GTA'],['TACG','ACGA']]
DNA_mutDNA = {'GG':['CG'], 'GTT':['GTA'], 'TCGG':['TACG','ACGA']}
I understand that this is a multi-step problem, but if you could assist in any way, I really appreciate it.
Assuming you don't need the intermediate variables, you can do all this with a dictionary comprehension:
a = [[6], [7], [8, 9]]
b = [1, 2, 3]
keys = ['GG', 'GTT', 'TCGG']
s = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
DNA_mutDNA = {k: [s[start:start + length + 1] for start in starts]
              for k, starts, length in zip(keys, a, b)}
You can produce the substring list directly with a nested list comprehension, nested_lst_2 isn't necessary.
nested_lst_1 = [[6],[7],[8,9]]
lst = [1,2,3]
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
keys = ['GG', 'GTT', 'TCGG']
substrings = [[string[v:i+v+1] for v in u] for i, u in zip(lst, nested_lst_1)]
print(substrings)
DNA_mutDNA = dict(zip(keys, substrings))
print(DNA_mutDNA)
output
[['CG'], ['GTA'], ['TACG', 'ACGA']]
{'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}
In[2]: nested_lst_1 = [[6],[7],[8,9]]
...: lst = [1,2,3]
...: string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
...: keys = ['GG', 'GTT', 'TCGG']
In[3]: nested_lst_2 = [[elem + b for elem in a] for a, b in zip(nested_lst_1, lst)]
In[4]: nested_list_substrings = []
...: for a, b in zip(nested_lst_1, nested_lst_2):
...: nested_list_substrings.append([string[c:d + 1] for c, d in zip(a, b)])
...:
In[5]: {k: v for k, v in zip(keys, nested_list_substrings)}
Out[5]: {'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}
Surely not the most readable way to do it, here is a bit of functional style fun:
nested_lst_1 = [[6], [7], [8,9]]
lst = [1, 2, 3]
nested_lst_2 = list(map(
list,
map(map, map(lambda n: (lambda x: n+x), lst), nested_lst_1)))
nested_lst_2
Result looks as expected:
[[7], [9], [11, 12]]
Then:
from itertools import starmap
from operator import itemgetter
make_slices = lambda l1, l2: starmap(slice, zip(l1, map(lambda n: n+1, l2)))
string = 'AGTCATCGTACGATCATCGAAGCTAGCAGCATGAC'
get_slice = lambda s: itemgetter(s)(string)
nested_list_substrings = list(map(
lambda slices: list(map(get_slice, slices)),
starmap(make_slices, zip(nested_lst_1, nested_lst_2))))
nested_list_substrings
Result:
[['CG'], ['GTA'], ['TACG', 'ACGA']]
And finally:
keys = ['GG', 'GTT', 'TCGG']
DNA_mutDNA = dict(zip(keys, nested_list_substrings))
DNA_mutDNA
Final result:
{'GG': ['CG'], 'GTT': ['GTA'], 'TCGG': ['TACG', 'ACGA']}

Python: elegant and safe way to create several lists

Normally, in order to store results in several lists in Python, I create the corresponding empty lists before the loop.
A = []
B = []
C = []
D = []
E = []
F = []
for i in range(100):
    # do some stuff
Is there a method to create the lists in a single line of code (or a few)?
If the lists are logically similar (I hope so, because one hundred different variables is violence on programmer), create a dictionary of lists:
list_names = ['a', 'b', 'c' ]
d = {name:[] for name in list_names}
This creates a dictionary:
d = {'a': [], 'b': [], 'c': []}
where you can access individual lists:
d['a'].append(...)
or work on all of them at once:
for v in d.values():
    v.append(...)
The further advantage over individual lists is that you can pass your whole dict to a method.
You can use an object:
>>> from string import ascii_uppercase
>>> class MyLists(object):
...     def __init__(self):
...         for char in ascii_uppercase:
...             setattr(self, char, [])
...
>>> l = MyLists()
>>> l.A
[]
>>> l.B
[]
a, b, c, d, e, f = [], [], [], [], [], []
lists = [[] for _ in range(6)]  # a list comprehension, not a generator, so it can be indexed
lists[5].append(...)
or
A, B, C, D, E, F = ([] for _ in range(6))
or with defaultdict:
from collections import defaultdict

d = defaultdict(list)
d['F'].append(...)
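A runnable sketch of the dict-of-lists idea above, assuming the loop's work is just sorting numbers into buckets (the bucket names here are made up for illustration):

```python
from collections import defaultdict

d = defaultdict(list)  # any missing key starts out as an empty list
for i in range(10):
    # hypothetical "do some stuff": bucket numbers by parity
    d['even' if i % 2 == 0 else 'odd'].append(i)

print(d['even'])  # [0, 2, 4, 6, 8]
print(d['odd'])   # [1, 3, 5, 7, 9]
```

The whole collection can then be passed around as one object instead of six separate variables.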
