I have a csv file containing lines like this:
A,x1
A,x2
A,x3
B,x4
B,x5
B,x6
The first part reflects the group (A or B) a value (x1, x2, ...) belongs to.
What I want to do now is import that csv file into Python, so that I end up with two lists:
ListA = [x1, x2, x3]
ListB = [x4, x5, x6]
Can someone help me out with that?
Thanks in advance :)
file_path = "path_to_your_csv"
stream_in = open(file_path, 'r')
A = []
B = []
for line in stream_in:
    # split each line into its group label and its value
    group, value = line.strip().split(",")
    if group == 'A':
        A.append(value)
    if group == 'B':
        B.append(value)
stream_in.close()
print(A)
print(B)
After putting your data in a pandas Series object named ser, just type ser.loc["A"]
and ser.loc["B"] to get the data slices you want (note the square brackets: .loc is an indexer, not a method).
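For example (a minimal sketch, building the Series by hand with the group labels as the index):

```python
import pandas as pd

# The group labels become the index, so .loc can slice by group
ser = pd.Series(["x1", "x2", "x3", "x4", "x5", "x6"],
                index=["A", "A", "A", "B", "B", "B"])

ListA = ser.loc["A"].tolist()  # ['x1', 'x2', 'x3']
ListB = ser.loc["B"].tolist()  # ['x4', 'x5', 'x6']
```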
Using preassigned names for your vectors leads to lots of duplicated logic, which gets more and more complicated as you add new vectors to your data description...
It's much better to use a dictionary:
data = [['a', 12.3], ['a', 12.4], ['b', 0.4], ['c', 1.2]]
vectors = {}  # an empty dictionary
for key, value in data:
    vectors.setdefault(key, []).append(value)
The relevant docs, from the official Python documentation:
setdefault(key[, default])
If key is in the dictionary, return its value.
If not, insert key with a value of default and return default.
default defaults to None.
append(x)
appends x to the end of the sequence (same as s[len(s):len(s)] = [x])
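An equivalent spelling uses collections.defaultdict, which bakes the default into the dictionary itself:

```python
from collections import defaultdict

data = [['a', 12.3], ['a', 12.4], ['b', 0.4], ['c', 1.2]]

vectors = defaultdict(list)  # missing keys start out as an empty list
for key, value in data:
    vectors[key].append(value)

# dict(vectors) == {'a': [12.3, 12.4], 'b': [0.4], 'c': [1.2]}
```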
You could try:
In[1]: import pandas as pd
In[2]: df = pd.read_csv(file_name, header=None)
In[3]: print(df)
Out[3]:
   0   1
0  A  x1
1  A  x2
2  A  x3
3  B  x4
4  B  x5
5  B  x6
In[4]: ListA = df[0].tolist()
In[5]: print(ListA)
Out[5]: ['A', 'A', 'A', 'B', 'B', 'B']
In[6]: ListB = df[1].tolist()
In[7]: print(ListB)
Out[7]: ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
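To get the per-group lists the question actually asks for, a groupby on the first column works too (a sketch, reading the sample data from a string instead of a file):

```python
import pandas as pd
from io import StringIO

csv_text = "A,x1\nA,x2\nA,x3\nB,x4\nB,x5\nB,x6\n"
df = pd.read_csv(StringIO(csv_text), header=None)

# Collect column 1 into one list per group label in column 0
groups = df.groupby(0)[1].apply(list).to_dict()
ListA = groups['A']  # ['x1', 'x2', 'x3']
ListB = groups['B']  # ['x4', 'x5', 'x6']
```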
I have a dataframe with strings and a dictionary whose values are lists of strings.
I need to check whether each string in the dataframe contains any element of any value in the dictionary, and if it does, label it with the matching key from the dictionary. In short, I need to categorize all the strings in the dataframe with keys from the dictionary.
For example.
df = pd.DataFrame({'a':['x1','x2','x3','x4']})
d = {'one':['1','aa'],'two':['2','bb']}
I would like to get something like this:
df = pd.DataFrame({
    'a': ['x1', 'x2', 'x3', 'x4'],
    'Category': ['one', 'two', 'x3', 'x4']})
I tried this, but it has not worked:
df['Category'] = np.nan
for k, v in d.items():
    for l in v:
        df['Category'] = [k if l in str(x).lower() else x for x in df['a']]
Any ideas appreciated!
First, create a function that does this for you:
def func(val):
    for key, vals in d.items():
        if val in vals:
            return key
Now make use of the str accessor and the apply() method (here picking out the second character of each string):
df['Category'] = df['a'].str[1].apply(func)
Finally, use the fillna() method:
df['Category'] = df['Category'].fillna(df['a'])
Now if you print df you will get your expected output:
    a Category
0  x1      one
1  x2      two
2  x3       x3
3  x4       x4
Edit:
You can also do this by:
def func(val):
    for key, vals in d.items():
        if any(l in val for l in vals):
            return key
then:
df['Category'] = df['a'].apply(func)
Finally:
df['Category'] = df['Category'].fillna(df['a'])
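The edited version, put together as a self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': ['x1', 'x2', 'x3', 'x4']})
d = {'one': ['1', 'aa'], 'two': ['2', 'bb']}

def func(val):
    # Return the first key whose value list has an element contained in val
    for key, vals in d.items():
        if any(l in val for l in vals):
            return key

# Unmatched rows come back as None, which fillna replaces with the original string
df['Category'] = df['a'].apply(func).fillna(df['a'])
# df['Category'].tolist() == ['one', 'two', 'x3', 'x4']
```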
I've come up with the following heuristic, which looks really dirty.
It outputs what you desire, albeit with some warnings, since I've used chained indexing to assign values into the dataframe.
import pandas as pd
import numpy as np

def main():
    df = pd.DataFrame({'a': ['x1', 'x2', 'x3', 'x4']})
    d = {'one': ['1', 'aa'], 'two': ['2', 'bb']}
    found = False
    i = 0
    df['Category'] = np.nan
    for x in df['a']:
        for k, v in d.items():
            for item in v:
                if item in x:
                    df['Category'][i] = k
                    found = True
                    break
            else:
                # no item of this key matched; keep the original string
                df['Category'][i] = x
            if found:
                found = False
                break
        i += 1
    print(df)

main()
Having issues with building a find-and-replace tool in Python. The goal is to search a column in an Excel file for a string and swap out every letter of the string based on the key-value pairs of the dictionary, then write the entire new string back to the same cell. So "ABC" should convert to "BCD". I have to find and replace any occurrence of the individual characters.
The code below runs without errors while debugging, but newval never gets created and I don't know why. There are no issues writing data to the cell when newval does get created.
input: df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
expected output: df = pd.DataFrame({'Code1': ['BCD1', 'C5DE', 'D3EF']})
mycolumns = ["Col1", "Col2"]
mydictionary = {'A': 'B', 'B': 'C', 'C': 'D'}
for x in mycolumns:
    # 1. If the mycolumn value exists in the headerlist of the file
    if x in headerlist:
        # 2. Get column coordinate
        col = df.columns.get_loc(x) + 1
        # 3. iterate through the rows underneath that header
        for ind in df.index:
            # 4. log the row coordinate
            rangerow = ind + 2
            # 5. get the original value of that coordinate
            oldval = df[x][ind]
            for count, y in enumerate(oldval):
                # 6. generate replacement value
                newval = df.replace({y: mydictionary}, inplace=True, regex=True, value=None)
                print("old: " + str(oldval) + " new: " + str(newval))
                # 7. update the cell
                ws.cell(row=rangerow, column=col).value = newval
            else:
                print("not in the string")
    else:
        # print(df)
        print("column doesn't exist in workbook, moving on")
else:
    print("done")
wb.save(filepath)
wb.close()
I know there's something going on with enumerate, and I'm probably not stitching the string back together after I do the replacements? Or maybe a dictionary is the wrong solution for what I'm trying to do; the key:value pairing is what led me to use it. I have a little programming background but very little with Python. I'd appreciate any help.
newval never gets created and I don't know why.
DataFrame.replace with inplace=True returns None.
>>> df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
>>> df = df.replace('ABC1','999')
>>> df
Code1
0 999
1 B5CD
2 C3DE
>>> q = df.replace('999','zzz', inplace=True)
>>> print(q)
None
>>> df
Code1
0 zzz
1 B5CD
2 C3DE
>>>
An alternative could be to use str.translate on the column (via its .str accessor) to encode the entire Series:
>>> df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
>>> mydictionary = {'A': 'B', 'B': 'C', 'C': 'D'}
>>> table = str.maketrans('ABC','BCD')
>>> df
Code1
0 ABC1
1 B5CD
2 C3DE
>>> df.Code1.str.translate(table)
0 BCD1
1 C5DD
2 D3DE
Name: Code1, dtype: object
>>>
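Note that str.maketrans also accepts a dict mapping single characters to their replacements, so the table can be built straight from mydictionary (a small sketch):

```python
import pandas as pd

df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
mydictionary = {'A': 'B', 'B': 'C', 'C': 'D'}

# Equivalent to str.maketrans('ABC', 'BCD'); characters not in the dict pass through
table = str.maketrans(mydictionary)
df['Code1'] = df['Code1'].str.translate(table)
# df['Code1'].tolist() == ['BCD1', 'C5DD', 'D3DE']
```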
I'd like to export the content of a 2D-list into a csv file.
The size of the sublists can be different. For example, the 2D-list can be something like :
a = [ ['a','b','c','d'], ['e','f'], ['g'], [], ['h','i'] ]
I want my csv to store the data like this - "by column" :
a,e,g, ,h
b,f, , ,i
c
d
Do I have to add some blank spaces to get the same size for each sublist? Or is there another way to do this?
Thank you for your help
You can use itertools.zip_longest:
import itertools, csv
a = [ ['a','b','c','d'], ['e','f'], ['g'], [], ['h','i'] ]
with open('filename.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerows(itertools.zip_longest(*a, fillvalue=''))
Output:
a,e,g,,h
b,f,,,i
c,,,,
d,,,,
It can also be done using pandas and the transpose attribute (.T):
import pandas as pd
pd.DataFrame(a).T.to_csv('test.csv')
Result:
(test.csv)
,0,1,2,3,4
0,a,e,g,,h
1,b,f,,,i
2,c,,,,
3,d,,,,
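If the row and column labels in test.csv are unwanted, to_csv can drop them (a small sketch; missing entries become empty fields, matching the zip_longest output above):

```python
import pandas as pd

a = [['a', 'b', 'c', 'd'], ['e', 'f'], ['g'], [], ['h', 'i']]

# Transpose, then write without the index column and header row
pd.DataFrame(a).T.to_csv('test.csv', index=False, header=False)
```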
import itertools
import pandas as pd
First create a dataframe using a nested array:
a = ['a','b','c','d']
b = ['e','f']
c = ['g']
d = []
e = ['h','i']
nest = [a,b,c,d,e]
df = pd.DataFrame((_ for _ in itertools.zip_longest(*nest)), columns=['a', 'b', 'c', 'd', 'e'])
which looks like this:
a b c d e
0 a e g None h
1 b f None None i
2 c None None None None
3 d None None None None
and then store it using pandas:
df.to_csv('filename.csv', index=False)
We have three tasks to do here: pad the sublists so all have the same length, transpose, and write to csv.
Sadly Python has no built-in function for the padding; however, it can be done relatively easily. I would do it the following way
(the following code is intended to give the result exactly as requested in the OP):
a = [['a','b','c','d'],['e','f'],['g'],[],['h','i']]
ml = max([len(i) for i in a]) #number of elements of longest sublist
a = [(i+[' ']*ml)[:ml] for i in a] #adding ' ' for sublist shorter than longest
a = list(zip(*a)) #transpose
a = [','.join(i) for i in a] #create list of lines to be written
a = [i.rstrip(', ') for i in a] #jettison spaces if not followed by value
a = '\n'.join(a) #create string to be written to file
with open('myfile.csv','w') as f: f.write(a)
Content of myfile.csv:
a,e,g, ,h
b,f, , ,i
c
d
I have a few global variables and I have a list. Within a function I am using the list and updating the values as below, but the global variables don't seem to be updated.
a = "hello"
b = "how"
c = "are you"
data = ([a,"abc","xyz"], [b,"pqr","mno"], [c,"test","quest"])

def checklist():
    global data, a, b, c
    for values in data:
        values[0] = values[1]

checklist()
print a + ":" + b + ":" + c
Now I expect the global variables to be updated, which is not happening; I still see the old values. Could someone explain how to update global variables from the list?
The loop changes the elements of data, which won't change the other variables.
When you run values[0] = values[1], values[0] is rebound to another object, but a stays the same:
In [52]: a = '12'
In [53]: li = [a, 'b', 'c']
In [54]: id(li[0])
Out[54]: 140264171560632
In [55]: id(a)
Out[55]: 140264171560632
In [56]: li[0] = 'a'
In [57]: li
Out[57]: ['a', 'b', 'c']
In [58]: a
Out[58]: '12'
In [60]: id(li[0])
Out[60]: 140264267728616
In [61]: id(a)
Out[61]: 140264171560632
You intend the values in data to be changed in the for loop?
The reason that's not happening is that you're changing the list elements, not the variables themselves.
a, b, c = "hello", "how", "are you"
data = ([a,"abc","xyz"], [b,"pqr","mno"], [c,"test","quest"])

def checklist():
    global data, a, b, c
    for values in data:
        values[0] = values[1]

checklist()
print a + ":" + b + ":" + c
hello:how:are you
print(data)
(['abc', 'abc', 'xyz'], ['pqr', 'pqr', 'mno'], ['test', 'test', 'quest'])
This won't work if what you're trying to do is edit a variable that you previously inserted into a list.
values[0] = values[1]
sets position 0 of values to values[1], but it does not modify a, b, or c!
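If you really do want the loop to rebind the module-level names, one workaround (shown as a sketch; it only works for true module-level globals) is to store the names as strings in the list and assign through globals():

```python
a, b, c = "hello", "how", "are you"
# The first element is now the *name* of the global, not its value
data = (['a', "abc", "xyz"], ['b', "pqr", "mno"], ['c', "test", "quest"])

def checklist():
    for values in data:
        # Rebind the module-level variable named values[0]
        globals()[values[0]] = values[1]

checklist()
print(a + ":" + b + ":" + c)  # abc:pqr:test
```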
Suppose I have a 2D list,
a= [['a','b','c',1],
['a','b','d',2],
['a','e','d',3],
['a','e','c',4]]
I want to obtain a list such that if the first two elements of two rows are identical, the fourth elements are summed, the third element is dropped, and the rows are combined, like the following:
b = [['a','b',3],
['a','e',7]]
What is the most efficient way to do this?
If your list is already sorted, then you can use itertools.groupby. Once you group by the first two elements, you can use a generator expression to sum the 4th element and create your new lists.
>>> from itertools import groupby
>>> a= [['a','b','c',1],
['a','b','d',2],
['a','e','d',3],
['a','e','c',4]]
>>> [g[0] + [sum(i[3] for i in g[1])] for g in groupby(a, key = lambda i : i[:2])]
[['a', 'b', 3],
['a', 'e', 7]]
Using pandas's groupby (selecting column 3 so only the numeric column is summed):
import pandas as pd
df = pd.DataFrame(a)
df.groupby([0, 1])[3].sum().reset_index().values.tolist()
Output:
Out[19]: [['a', 'b', 3], ['a', 'e', 7]]
You can use pandas groupby methods to achieve that goal.
import pandas as pd
a= [['a','b','c',1],
['a','b','d',2],
['a','e','d',3],
['a','e','c',4]]
df = pd.DataFrame(a)
df_sum = df.groupby([0,1])[3].sum().reset_index()
array_return = df_sum.values
list_return = array_return.tolist()
print(list_return)
list_return is the result you want.
If you're interested, here is an implementation in plain Python. I've only tested it on the dataset you provided.
a = [['a','b','c',1],
     ['a','b','d',2],
     ['a','e','d',3],
     ['a','e','c',4]]

b_dict = {}
for row in a:
    key = (row[0], row[1])
    b_dict[key] = b_dict[key] + row[3] if key in b_dict else row[3]
b = [[key[0], key[1], value] for key, value in b_dict.items()]