I have a knot in my head. I wasn't even sure what to google for (or how to formulate my title).
I want to write a function that takes a term that occurs in the name of a .csv file and, at the same time, names a df after that term.
Like so:
def read_data_into_df(name):
    df_{name} = pd.read_csv(f"file_{name}.csv")
Of course the df_{name} part is not working. But I hope you get the idea.
Is this possible without hard coding?
Thanks!
IIUC, you can use globals():
def read_data_into_df(name):
    globals()[f"df_{name}"] = pd.read_csv(f"file_{name}.csv")
If I were you, I would create a dictionary and add keys with:
dictionary[f"df_{name}"] = whatever_you_want
If there are only a couple of dataframes, just accept the minimal code repetition:
def read_data_into_df(name):
    return pd.read_csv(f"file_{name}.csv")
...
df_ham = read_data_into_df('ham')
df_spam = read_data_into_df('spam')
df_bacon = read_data_into_df('bacon')
...
# Use df_ham, df_spam and df_bacon
If there's a lot of them, or the exact data frames are generated, I would use a dictionary to keep track of the dataframes:
dataframes = {}
def read_data_into_df(name):
    return pd.read_csv(f"file_{name}.csv")
...
for name in ['ham', 'spam', 'bacon']:
    dataframes[name] = read_data_into_df(name)
...
# Use dataframes['ham'], dataframes['spam'] and dataframes['bacon']
# Or iterate over dataframes.values() or dataframes.items()!
I would like to replace the word consolidation (which occurs twice in the following string) with another value held in a variable (e.g. breakout/outofconsolidation/inside).
Can you help me achieve this, please?
dfconsolidationcsv.to_csv(r'symbols\stocks_consolidation_sp500.csv', index = False)
a = ('breakout')
df{a}csv.to_csv(r'symbols\stocks_{a}_sp500.csv', index = False)
Unless there is a justifiable reason to be creating dynamic variable assignments, I would avoid doing so. In this case, defining your DataFrame variables in a dict is probably sufficient:
# store df in a dict instead of separate variables
df_dict = dict()
df_dict['consolidation'] = dfconsolidationcsv
df_dict['breakout'] = dfbreakoutcsv
...
# invoke command for a specific variable
a = 'breakout'
df_dict[a].to_csv(r'symbols\stocks_%s_sp500.csv' % a, index = False)
Now, if there is an overwhelming reason why you HAVE to use pre-existing variable names that need to be changed dynamically, I think you can do something like this:
a = 'breakout'
exec("df%scsv.to_csv(r'symbols\stocks_%s_sp500.csv', index=False)" % (a, a))
I have some code I'm trying to refactor, which looks a bit like this in python 3:
# some_obj.query(streetaddress="burdon")
# some_obj.query(area="bungo")
# some_obj.query(region="bingo")
# some_obj.query(some_other_key="bango")
How can I DRY this up so I have something like this?
# define a list of tuples like so:
set_o_tuples = [
    ("streetaddress", "burdon"),
    ("area", "bungo"),
    ("region", "bingo"),
    ("some_other_key", "bango"),
]
And then loop over it like so:
for key, val in set_o_tuples:
    some_obj.query(key=val)
When I try to run this code, I get an exception like the following, as Python doesn't accept keyword arguments being passed in like this:
SyntaxError: keyword can't be an expression
What is the idiomatic way to DRY this up, so I don't have to repeat loads of code like the example above?
Update: sorry folks, I think the example I put together above missed a few important details. I basically have some pytest code like so:
def test_can_search_by_location(self, db, docs_from_csv):
    """
    We want to be able to query the contents of all the below fields when we query by location:
    [streetaddress, locality, region, postcode]
    """
    search = SomeDocument.search()
    locality_query = search.query('match', locality="some val")
    locality_query_res = locality_query.execute()
    region_query = search.query('match', region="region val")
    region_query_res = region_query.execute()
    postcode_query = search.query('match', postcode="postcode_val")
    postcode_query_res = postcode_query.execute()
    streetaddress_query = search.query('match', streetaddress="burdon")
    streetaddress_query_res = streetaddress_query.execute()
    # concat_value comes from elsewhere in the test setup
    location_query = search.query('match', location=concat_value)
    location_query_res = location_query.execute()
    assert len(locality_query_res) == len(location_query_res)
    assert len(region_query_res) == len(location_query_res)
    assert len(streetaddress_query_res) == len(location_query_res)
    assert len(postcode_query_res) == len(location_query_res)
I was trying to DRY up some of this, as there are similar examples I have, but after reading the comments, I've rethought it - the savings in space don't really justify the changes. Thanks for the pointers.
You could define a list of dictionaries instead, and then unpack them when calling the method in a loop:
list_of_dicts = [
    {"streetaddress": "burdon"},
    {"area": "bungo"},
    {"region": "bingo"},
    {"some_other_key": "bango"},
]
for kwargs in list_of_dicts:
    some_obj.query(**kwargs)
Use dictionary unpacking
some_obj.query(**{key: val})
I wouldn't recommend what you're doing, though. The original method is clean and obvious, while your new one could be confusing, so I would keep it as is. That said, this looks like a poorly designed Python API: some_obj.query should simply accept multiple keyword arguments in one call. You could make your own wrapper like this:
def query(obj, **kwargs):
    # python 3.6 or later to preserve kwargs order
    for key, value in kwargs.items():
        obj.query(**{key: value})
And then call it like so:
query(some_obj, streetaddress='burdon', area='bungo', region='bingo', some_other_key='bango')
This is probably a very basic question but I haven't been able to figure this out.
I'm currently using the following to append values to an empty list
shoes = {'groups':['running','walking']}
df_shoes_group_names = pd.DataFrame(shoes)
shoes_group_name=[]
for type in df_shoes_group_names['groups']:
    shoes_group_name.append(type)
shoes_group_name
['running', 'walking']
I'm trying to accomplish the same from within a function; however, when I call the function the list comes back blank:
shoes_group_name=[]
def list_builder(dataframe_name):
    if 'shoes' in dataframe_name:
        for type in df_shoes_group_names['groups']:
            shoes_group_name.append(type)
list_builder(df_shoes_group_names)
shoes_group_name
[]
The reason for the function is that eventually I'll have multiple DFs with different products, so I'd like to have if statements within the function handle the creation of each list.
For example, future cases could look like this:
df_shoes_group_names
df_boots_group_names
df_sandals_group_names
shoes_group_name=[]
boots_group_name=[]
sandals_group_name=[]
def list_builder(dataframe_name):
    if 'shoes' in dataframe_name:
        for type in df_shoes_group_names['groups']:
            shoes_group_name.append(type)
    elif 'boots' in dataframe_name:
        for type in df_boots_group_names['groups']:
            boots_group_name.append(type)
    elif 'sandals' in dataframe_name:
        for type in df_sandals_group_names['groups']:
            sandals_group_name.append(type)
list_builder(df_shoes_group_names)
list_builder(df_boots_group_names)
list_builder(df_sandals_group_names)
Not sure if I'm approaching this the right way, so any advice would be appreciated.
Best,
You should never call or search a variable name as if it were a string.
Instead, use a dictionary to store a variable number of variables.
Bad practice
# dataframes
df_shoes_group_names = pd.DataFrame(...)
df_boots_group_names = pd.DataFrame(...)
df_sandals_group_names = pd.DataFrame(...)
def foo(x):
    if 'shoes' in x:  # <-- THIS WILL NOT WORK: x is a DataFrame, so this checks its columns
        ...  # do something with x
Good practice
# dataframes
df_shoes_group_names = pd.DataFrame(...)
df_boots_group_names = pd.DataFrame(...)
df_sandals_group_names = pd.DataFrame(...)
dfs = {'shoes': df_shoes_group_names,
'boots': df_boots_group_names,
'sandals': df_sandals_group_names}
def foo(key):
    if 'shoes' in key:  # <-- THIS WILL WORK: key is a string
        ...  # do something with dfs[key]
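Applied to your example, list_builder then collapses into a single loop over the dict (a sketch, assuming each frame has the 'groups' column from your question):
group_names = {}

for key, df in dfs.items():
    # one list of group names per product, keyed the same way as dfs
    group_names[key] = list(df['groups'])

# group_names['shoes'] -> ['running', 'walking'], etc.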
I'm trying to access a key in a dictionary before "declaring" it.
Similar to this:
test_dict = {'path': '/root/secret/', 'path2': test_dict['path']+'meow/'}
I am aware that I could accomplish this on the next line, like:
test_dict['path2'] = test_dict['path'] + 'meow'
However, for readability I would prefer writing all the keys inside the dict literal, for a config file.
Is this possible in Python?
Convince yourself that this is not possible: you cannot refer to an object that hasn't even been created yet. What you can do, however, is use a string variable. This should do what you want relatively easily.
import os

p = '/root/secret/'
test_dict = {'path': p, 'path2': os.path.join(p, 'meow')}
Also, it's good practice to use os.path.join when concatenating sub-paths together.
@cᴏʟᴅsᴘᴇᴇᴅ, I think this is more readable; imagine if OP were to add 15 paths.
import os

p = '/root/secret/'
# initiate dict
test_dict = {}
# assign values
test_dict['path'] = p
test_dict['path2'] = os.path.join(p, 'meow')
I have a dictionary:
big_dict = {1:"1",
2:"2",
...
1000:"1000"}
(Note: My dictionary isn't actually numbers to strings)
I am passing this dictionary into a function that calls for it, and I use the dictionary often, for different functions. However, on occasion I want to send in big_dict with an extra key:value pair, such that the dictionary I send in would be equivalent to:
big_dict[1001]="1001"
But I don't want to actually add the value to the dictionary. I could make a copy of the dictionary and add it there, but I'd like to avoid the memory + CPU cycles this would consume.
The code I currently have is:
big_dict[1001]="1001"
function_that_uses_dict(big_dict)
del big_dict[1001]
While this works, it seems rather kludgy.
If this were a string I'd do:
function_that_uses_string(myString + 'what I want to add on')
Is there any equivalent way of doing this with a dictionary?
As pointed out by Veedrac in his answer, this problem has already been solved in Python 3.3+ in the form of the ChainMap class:
function_that_uses_dict(ChainMap({1001 : "1001"}, big_dict))
If you don't have Python 3.3 you should use a backport, and if for some reason you don't want to, then below you can see how to implement it by yourself :)
You can create a wrapper, similar to this:
class DictAdditionalValueWrapper:
    def __init__(self, baseDict, specialKey, specialValue):
        self.baseDict = baseDict
        self.specialKey = specialKey
        self.specialValue = specialValue

    def __getitem__(self, key):
        if key == self.specialKey:
            return self.specialValue
        return self.baseDict[key]

    # ...
You need to supply all the other dict methods, of course, or use UserDict as a base class, which should simplify this.
and then use it like this:
function_that_uses_dict(DictAdditionalValueWrapper(big_dict, 1001, "1001"))
This can be easily extended to a whole additional dictionary of "special" keys and values, not just single additional element.
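For instance, the wrapper can take a dict of overrides instead of a single pair (a sketch along the same lines; the class name is made up):
class DictOverlayWrapper:
    def __init__(self, baseDict, overrides):
        self.baseDict = baseDict
        self.overrides = overrides

    def __getitem__(self, key):
        # overrides win; every other key falls through to the base dict
        if key in self.overrides:
            return self.overrides[key]
        return self.baseDict[key]
which is essentially what ChainMap does for you.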
You can also extend this approach to achieve something similar to your string example:
class AdditionalKeyValuePair:
    def __init__(self, specialKey, specialValue):
        self.specialKey = specialKey
        self.specialValue = specialValue

    def __add__(self, d):
        if not isinstance(d, dict):
            raise Exception("Not a dict in AdditionalKeyValuePair")
        return DictAdditionalValueWrapper(d, self.specialKey, self.specialValue)
and use it like this:
function_that_uses_dict(AdditionalKeyValuePair(1001, "1001") + big_dict)
If you're on 3.3+, just use ChainMap. Otherwise use a backport.
new_dict = ChainMap({1001: "1001"}, old_dict)
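A quick demonstration of the lookup order (the keys mirror the question's example):
from collections import ChainMap

big_dict = {1: "1", 2: "2"}
new_dict = ChainMap({1001: "1001"}, big_dict)

new_dict[1001]  # '1001' -- served by the small overlay dict
new_dict[1]     # '1'    -- falls through to big_dict
big_dict        # {1: '1', 2: '2'} -- original unchanged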
You can add the extra key-value pair while leaving the original dictionary as-is, like this:
>>> def function_that_uses_bdict(big_dict):
...     print(big_dict[1001])
...
>>> dct = {1: '1', 2: '2'}
>>> function_that_uses_bdict(dict(list(dct.items()) + [(1001, '1001')]))
1001
>>> dct
{1: '1', 2: '2'} # original unchanged
This is a bit annoying too, but you could just have the function take two parameters: big_dict, and a temporary dictionary created just for that call (so something like fxn(big_dict, {1001: '1001'})). Then you could access both dictionaries without changing the first one, and without copying big_dict.
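A sketch of that two-parameter approach (fxn and the key 1001 are placeholders from the question):
def fxn(big_dict, extra):
    # look in the temporary dict first, then fall back to the big one
    def lookup(key):
        return extra[key] if key in extra else big_dict[key]
    return lookup(1001)

fxn({1: '1', 2: '2'}, {1001: '1001'})  # -> '1001'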