Python list of global variables not updated

I have a few global variables and a list. Within a function I am using the list and updating the values as below, but the global variables don't seem to be updated.
a = "hello"
b = "how"
c = "are you"
data = ([a, "abc", "xyz"], [b, "pqr", "mno"], [c, "test", "quest"])

def checklist():
    global data, a, b, c
    for values in data:
        values[0] = values[1]

checklist()
print a + ":" + b + ":" + c
I expect the global variables to be updated, but that is not happening; I still see the old values. Could someone explain how to update global variables from the list?

The loop over data changes data's elements, which doesn't change any other variable. When you run values[0] = values[1], values[0] is rebound to another object, but a stays the same:
In [52]: a = '12'
In [53]: li = [a, 'b', 'c']
In [54]: id(li[0])
Out[54]: 140264171560632
In [55]: id(a)
Out[55]: 140264171560632
In [56]: li[0] = 'a'
In [57]: li
Out[57]: ['a', 'b', 'c']
In [58]: a
Out[58]: '12'
In [60]: id(li[0])
Out[60]: 140264267728616
In [61]: id(a)
Out[61]: 140264171560632

You intend the values in data to be changed in the for loop? The reason that's not happening is that you're changing the list elements, not the variables themselves.

a,b,c = "hello", "how", "are you"
data = ([a,"abc","xyz"],[b,"pqr","mno"],[c,"test","quest"])
def checklist():
    global data, a, b, c
    for values in data:
        values[0] = values[1]
checklist()
print a + ":" + b + ":"+ c
hello:how:are you
print(data)
(['abc', 'abc', 'xyz'], ['pqr', 'pqr', 'mno'], ['test', 'test', 'quest'])
This approach can't work if you want to edit a variable that you inserted in a list: putting a variable into a list stores the object it refers to, not the name. values[0] = values[1] assigns values[1] into position 0 of values, but it does not modify a, b or c.
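If the goal really is to rebind the globals from the list, one option (a sketch, assuming everything lives at module level) is to store the variable *names* in the list and assign through globals():

```python
a = "hello"
b = "how"
c = "are you"
# Store variable *names*, not the current string values.
data = (["a", "abc", "xyz"], ["b", "pqr", "mno"], ["c", "test", "quest"])

def checklist():
    for values in data:
        # Rebind the module-level variable whose name is values[0].
        globals()[values[0]] = values[1]

checklist()
print(a + ":" + b + ":" + c)  # abc:pqr:test
```

In practice a plain dict mapping names to values is usually cleaner than mutating globals() like this.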

Related

Python find and replace tool using pandas and a dictionary

I'm having issues building a find-and-replace tool in Python. The goal is to search a column in an Excel file for a string and swap out every letter of the string based on the key:value pairs of the dictionary, then write the entire new string back to the same cell. So "ABC" should convert to "BCD". I have to find and replace every occurrence of the individual characters.
The code below runs without errors, but newval never gets created and I don't know why. There are no issues writing data to the cell when newval is created.
input: df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
expected output: df = pd.DataFrame({'Code1': ['BCD1', 'C5DE', 'D3EF']})
mycolumns = ["Col1", "Col2"]
mydictionary = {'A': 'B', 'B': 'C', 'C': 'D'}
for x in mycolumns:
    # 1. If the mycolumns value exists in the header list of the file
    if x in headerlist:
        # 2. Get the column coordinate
        col = df.columns.get_loc(x) + 1
        # 3. Iterate through the rows underneath that header
        for ind in df.index:
            # 4. Log the row coordinate
            rangerow = ind + 2
            # 5. Get the original value at that coordinate
            oldval = df[x][ind]
            for count, y in enumerate(oldval):
                # 6. Generate the replacement value
                newval = df.replace({y: mydictionary}, inplace=True, regex=True, value=None)
                print("old: " + str(oldval) + " new: " + str(newval))
                # 7. Update the cell
                ws.cell(row=rangerow, column=col).value = newval
            else:
                print("not in the string")
    else:
        # print(df)
        print("column doesn't exist in workbook, moving on")
else:
    print("done")
wb.save(filepath)
wb.close()
I know there's something going on with enumerate, and I'm probably not stitching the string back together after I do the replacements? Or maybe a dictionary is the wrong solution for what I am trying to do; the key:value pair is what led me to use it. I have a little programming background but very little with Python. I appreciate any help.
newval never gets created and I don't know why.
DataFrame.replace with inplace=True will return None.
>>> df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
>>> df = df.replace('ABC1','999')
>>> df
Code1
0 999
1 B5CD
2 C3DE
>>> q = df.replace('999','zzz', inplace=True)
>>> print(q)
None
>>> df
Code1
0 zzz
1 B5CD
2 C3DE
>>>
An alternative could be to use str.translate on the column (via its str attribute) to encode the entire Series:
>>> df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
>>> mydictionary = {'A': 'B', 'B': 'C', 'C': 'D'}
>>> table = str.maketrans('ABC','BCD')
>>> df
Code1
0 ABC1
1 B5CD
2 C3DE
>>> df.Code1.str.translate(table)
0 BCD1
1 C5DD
2 D3DE
Name: Code1, dtype: object
>>>
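Since str.maketrans also accepts a dict of single-character keys, the translation table can be built straight from mydictionary rather than from parallel strings. A small sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'Code1': ['ABC1', 'B5CD', 'C3DE']})
mydictionary = {'A': 'B', 'B': 'C', 'C': 'D'}

# str.maketrans accepts a mapping of single characters directly.
table = str.maketrans(mydictionary)
result = df['Code1'].str.translate(table)
print(result.tolist())  # ['BCD1', 'C5DD', 'D3DE']
```

Note the output matches the table, not the asker's expected output: to get 'C5DE' and 'D3EF', the dictionary would also need 'D': 'E' and 'E': 'F' entries.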

how to apply on certain condition and append value inside it separated by - in pandas through python

I have created a dataframe which contains the columns Name and Mains.
data = [['Anshu', '8321-1328-11'], ['Hero', '83211-1128-11'], ['Naman', '65432-8765-4']]
df = pd.DataFrame(data, columns = ['Name', 'Mains'])
I want to derive a new column df['new_mains'] from the Mains column with the following condition: if the number is separated as 4-4-2 digits, a 0 should be prepended so the updated number is separated as 5-4-2. Is it possible to do so in pandas?
Pretty sure it could be done. For example,
def my_func(strn):
    a, b, c = strn.split('-')
    new_a = '0' + a if len(a) == 4 else a
    new_b = '0' + b if len(b) == 3 else b
    new_c = '9' + c if len(c) == 1 else c
    return '-'.join([new_a, new_b, new_c])
And then,
df['New_Mains'] = df['Mains'].apply(my_func)
Note: This goes off the assumption that 'a' is either of length 4 or 5. If 'a', 'b', 'c' can be of other lengths, you can instead do something like the following (it works for the current scenario as well):
new_a = '0' * (5 - len(a)) + a
new_b = '0' * (4 - len(b)) + b
new_c = '9' * (2 - len(c)) + c
More on str.split here. Basically, in your case the string reads something like "99999-9999-99" with "-" as a separator. So,
"99999-9999-99".split('-') #would return
['99999', '9999', '99']
where a = '99999', b = '9999', c = '99'.
new_a, new_b, new_c are variables to hold new values of a, b and c after checking for the conditional statements. Finally join the strings new_a, new_b, new_c to look like original strings from 'Mains' column. More on str.join
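A vectorized variant of the same idea (a sketch; it assumes the parts should be left-padded to widths 5, 4 and 2, with '9' as the pad character for the last part, as in my_func above):

```python
import pandas as pd

data = [['Anshu', '8321-1328-11'], ['Hero', '83211-1128-11'], ['Naman', '65432-8765-4']]
df = pd.DataFrame(data, columns=['Name', 'Mains'])

# Split each number into its three parts, pad each part to the target
# width, and join them back together with '-'.
parts = df['Mains'].str.split('-', expand=True)
df['New_Mains'] = (parts[0].str.zfill(5) + '-'           # pad with '0' to width 5
                   + parts[1].str.zfill(4) + '-'         # pad with '0' to width 4
                   + parts[2].str.pad(2, fillchar='9'))  # pad with '9' to width 2
print(df['New_Mains'].tolist())
```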

How can I create a DataFrame slice object piece by piece?

I have a DataFrame, and I want to select certain rows and columns from it. I know how to do this using loc. However, I want to be able to specify each criteria individually, rather than in one go.
import numpy as np
import pandas as pd
idx = pd.IndexSlice
index = [np.array(['foo', 'foo', 'qux', 'qux']),
np.array(['a', 'b', 'a', 'b'])]
columns = ["A", "B"]
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)
print df
print df.loc[idx['foo', :], idx['A':'B']]
A B
foo a 0.676649 -1.638399
b -0.417915 0.587260
qux a 0.294555 -0.573041
b 1.592056 0.237868
A B
foo a -0.470195 -0.455713
b 1.750171 -0.409216
Requirement
I want to be able to achieve the same result with something like the following bit of code, where I specify each criteria one by one. It's also important that I'm able to use a slice_list to allow dynamic behaviour [i.e. the syntax should work whether there are two, three or ten different criteria in the slice_list].
slice_1 = 'foo'
slice_2 = ':'
slice_list = [slice_1, slice_2]
column_slice = "'A':'B'"
print df.loc[idx[slice_list], idx[column_slice]]
You can achieve this using the slice built-in function. You can't build slices with strings, since ':' there is a literal character, not a syntactical one.
slice_1 = 'foo'
slice_2 = slice(None)
column_slice = slice('A', 'B')
df.loc[idx[slice_1, slice_2], idx[column_slice]]
You might have to build your "slice lists" a little differently than you intended, but here's a relatively compact method using df.merge() and df.ix[]:
# Build a "query" dataframe
slice_df = pd.DataFrame(index=[['foo','qux','qux'],['a','a','b']])
# Explicitly name columns
column_slice = ['A','B']
slice_df.merge(df, left_index=True, right_index=True, how='inner').ix[:,column_slice]
Out[]:
A B
foo a 0.442302 -0.949298
qux a 0.425645 -0.233174
b -0.041416 0.229281
This method also requires you to be explicit about your second index and columns, unfortunately. But computers are great at making long tedious lists for you if you ask nicely.
EDIT - Example of a method to dynamically build a slice list that could be used as above.
Here's a function that takes a dataframe and spits out a list that can then be used to create a "query" dataframe to slice the original by. It only works with dataframes with 1 or 2 indices. Let me know if that's an issue.
def make_df_slice_list(df):
    if df.index.nlevels == 1:
        slice_list = []
        # Only one level of index
        for dex in df.index.unique():
            if input("DF index: " + dex + " - Include? Y/N: ") == "Y":
                # Add to slice list
                slice_list.append(dex)
    if df.index.nlevels > 1:
        slice_list = [[] for _ in xrange(df.index.nlevels)]
        # Multi level
        for i in df.index.levels[0]:
            print "DF index:", i, "has subindexes:", [dex for dex in df.ix[i].index]
            sublist = input("Enter a the indexes you'd like as a list: ")
            # If no response, use the first entry
            if len(sublist) == 0:
                sublist = [df.ix[i].index[0]]
            # Add an entry to the first index list for each sub item passed
            [slice_list[0].append(i) for item in sublist]
            # Add each of the second index list items
            [slice_list[1].append(item) for item in sublist]
    return slice_list
I'm not advising this as a way to communicate with your user, it's just an example. When you use it you have to pass strings (e.g. "Y" and "N"), lists of strings (["a","b"]) and empty lists [] at the prompts. Example:
In [115]: slice_list = make_df_slice_list(df)
DF index: foo has subindexes: ['a', 'b']
Enter a the indexes you'd like as a list: []
DF index: qux has subindexes: ['a', 'b']
Enter a the indexes you'd like as a list: ['a','b']
In [116]:slice_list
Out[116]: [['foo', 'qux', 'qux'], ['a', 'a', 'b']]
# Back to my original solution, but now passing the list:
slice_df = pd.DataFrame(index=slice_list)
column_slice = ['A','B']
slice_df.merge(df, left_index=True, right_index=True, how='inner').ix[:,column_slice]
Out[117]:
A B
foo a -0.249547 0.056414
qux a 0.938710 -0.202213
b 0.329136 -0.465999
Building up on the answer by Ted Petrou:
slices = [('foo', slice(None)), slice('A', 'B')]
print df.loc[tuple(idx[s] for s in slices)]
A B
foo a -0.465421 -0.591763
b -0.854938 1.221204
slices = [('foo', slice(None)), 'A']
print df.loc[tuple(idx[s] for s in slices)]
foo a -0.465421
b -0.854938
Name: A, dtype: float64
slices = [('foo', slice(None))]
print df.loc[tuple(idx[s] for s in slices)]
A B
foo a -0.465421 -0.591763
b -0.854938 1.221204
You have to use tuples when calling __getitem__ (loc[...]) with a 'dynamic' argument.
You could also avoid building the slice objects by hand:
def to_selector(s):
    if isinstance(s, (tuple, list)):
        return tuple(map(to_selector, s))
    ps = [None if len(p) == 0 else p for p in s.split(':')]
    assert len(ps) > 0 and len(ps) <= 2
    if len(ps) == 1:
        assert ps[0] is not None
        return ps[0]
    return slice(*ps)
query = [('foo', ':'), 'A:B']
df.loc[tuple(idx[to_selector(s)] for s in query)]
do you mean this?
import numpy as np
import pandas as pd
idx = pd.IndexSlice
index = [np.array(['foo', 'foo', 'qux', 'qux']),
np.array(['a', 'b', 'a', 'b'])]
columns = ["A", "B"]
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)
print df
#
la1 = lambda df: df.loc[idx['foo', :], idx['A':'B']]
la2 = lambda df: df.loc[idx['qux', :], idx['A':'B']]
laList = [la1, la2]
result = map(lambda la: la(df), laList)
print result[0]
print result[1]
A B
foo a 0.162138 -1.382822
b -0.822986 -0.403766
qux a 0.191695 -1.125841
b 0.669254 -0.704894
A B
foo a 0.162138 -1.382822
b -0.822986 -0.403766
A B
qux a 0.191695 -1.125841
b 0.669254 -0.704894
Did you simply mean this?
df.loc[idx['foo',:], :].loc[idx[:,'a'], :]
In a slightly more general form, for example:
def multiindex_partial_row_slice(df, part_idx, criteria):
    slc = idx[tuple([slice(None) if i != part_idx else criteria
                     for i in range(len(df.index.levels))])]
    return df.loc[slc, :]

multiindex_partial_row_slice(df, 1, slice('a', 'b'))
Similarly you can always narrow your current column set by appending .loc[:, columns] to your currently sliced view.
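Pulling the slice-object approach together, here's a self-contained sketch (Python 3 syntax, deterministic data so the result shape is predictable): each criterion is built separately and then combined in a single .loc call.

```python
import numpy as np
import pandas as pd

index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['a', 'b', 'a', 'b'])]
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['A', 'B'])

# Build each criterion separately, then combine them in one .loc call.
row_criteria = ['foo', slice(None)]   # level 0: 'foo'; level 1: everything
col_criteria = slice('A', 'B')        # label-based column slice

result = df.loc[tuple(row_criteria), col_criteria]
print(result)  # the two 'foo' rows, columns A and B
```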

Python concatenate arrays based on information from a list

I have a list with about 500 elements in it. For illustration I have:
list3 = ['a', 'b', 'c', 'a']
where 'a', 'b' and 'c' are the names of the arrays:
a = np.random.normal( 0, 1, ( 500, 20 ) )
b = np.random.normal( 0, 1, ( 500, 30 ) )
c = np.random.normal( 0, 1, ( 500, 30 ) )
I want to concatenate the arrays in the list in the order present in the list.
So, for my example I want to obtain:
C = np.concatenate( ( a, b, c, a ), 1 )
I don't have an idea how to approach this other than to store the arrays in a dictionary and then do a string search and concatenation in a for loop. Is there an elegant way to do this ?
You can use the locals() dictionary to access the variables by name
d = locals()
np.concatenate([d[x] for x in list3], 1)
If you want to be compact:
np.concatenate([dict(a=a, b=b, c=c)[x] for x in list3], 1)
Or to avoid the redundant dictionary creation:
by_label = dict(a=a, b=b, c=c)
np.concatenate([by_label[x] for x in list3], 1)
You can use the globals object to get the arrays based on name.
globals()["a"] # array a
So can do
np.concatenate(tuple(globals()[x] for x in list3),1)
You can easily get such a dictionary of all local variables by calling the locals() function. For example, to look up a variable named 'a':
var = 'a'
locals()[var]
Since np.concatenate appears to take a tuple, you could use:
lc = locals()
C = np.concatenate(tuple(lc[var] for var in list3), 1)
Why don't you store the variables directly instead of their names?
Like:
list3 = [a, b, c, a]
C = np.concatenate(list3, axis=1)
Or you can use eval() (which doesn't seem to be recommended most of the time):
list3 = ['a', 'b', 'c', 'a']
CC = np.concatenate([eval(i) for i in list3], axis=1)
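For completeness, a sketch of the dictionary approach mentioned in the question, which avoids both locals() and eval(). The array shapes are the ones from the question; a seeded generator keeps it reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
arrays = {
    'a': rng.normal(0, 1, (500, 20)),
    'b': rng.normal(0, 1, (500, 30)),
    'c': rng.normal(0, 1, (500, 30)),
}
list3 = ['a', 'b', 'c', 'a']

# Look each name up in the dict and concatenate along axis 1.
C = np.concatenate([arrays[name] for name in list3], axis=1)
print(C.shape)  # (500, 100)
```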

How to get two arrays out of one csv file in Python?

I've a csv file containing lines like this:
A,x1
A,x2
A,x3
B,x4
B,x5
B,x6
The first part reflects the group (A or B) a value (x1, x2, ...) belongs to.
What I want to do now is importing that csv file in Python, so I have two lists in the end:
ListA = [x1, x2, x3]
ListB = [x4, x5, x6]
Can someone help me out with that?
Thanks in advance :)
file_path = "path_to_your_csv"
stream_in = open(file_path, 'r')
A = []
B = []
for line in stream_in.readlines():
    add_to_list = line.split(",")[1].strip()
    if 'A' in line:
        A.append(add_to_list)
    if 'B' in line:
        B.append(add_to_list)
stream_in.close()
print A
print B
After putting your data in a pandas Series object named ser, just type ser.loc["A"] and ser.loc["B"] to get the data slices you want.
Using preassigned names for your vectors leads to lots of duplicated logic that gets more and more complicated as you add new vectors to your data description...
It's much better to use a dictionary:
data = [['a', 12.3], ['a', 12.4], ['b', 0.4], ['c', 1.2]]
vectors = {}  # an empty dictionary
for key, value in data:
    vectors.setdefault(key, []).append(value)
The relevant docs, from the python official documentation
setdefault(key[, default])
If key is in the dictionary, return its value.
If not, insert key with a value of default and return default.
default defaults to None.
append(x)
appends x to the end of the sequence (same as s[len(s):len(s)] = [x])
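Applied to the CSV from the question, the whole thing fits in a few lines (a sketch; io.StringIO stands in for the real file here):

```python
import csv
import io

csv_text = "A,x1\nA,x2\nA,x3\nB,x4\nB,x5\nB,x6\n"

# Group the second column by the key in the first column.
vectors = {}
for key, value in csv.reader(io.StringIO(csv_text)):
    vectors.setdefault(key, []).append(value)

print(vectors['A'])  # ['x1', 'x2', 'x3']
print(vectors['B'])  # ['x4', 'x5', 'x6']
```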
You could try:
In[1]: import pandas as pd
In[2]: df = pd.read_csv(file_name, header=None)
In[3]: print(df)
Out[3]:
   0   1
0  A  x1
1  A  x2
2  A  x3
3  B  x4
4  B  x5
5  B  x6
In[4]: ListA = df[0].tolist()
In[5]: print(ListA)
Out[5]: ['A', 'A', 'A', 'B', 'B', 'B']
In[6]: ListB = df[1].tolist()
In[7]: print(ListB)
Out[7]: ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
