How to iteratively create vectors with different names in Python

I have a pandas DataFrame
temp = pd.DataFrame({'country': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
                     'seg': ['S1', 'S2', 'S1', 'S2', 'S1', 'S2', 'S1', 'S2'],
                     'agegroup': ['1', '2', '2', '1', '1', '2', '2', '1'],
                     'N': [21, 22, 23, 24, 31, 32, 33, 34]})
and a vector like
vector = ['country', 'seg']
What I want to do is create two vectors named vector_country and vector_seg, which will contain the respective columns of temp, in this case the columns country and seg.
I have tried
for vec in vector:
    'vector_' + str(vec) = temp[[vec]]
So in the end I would like to end up with two vectors:
vector_country, which will contain the temp.country and
vector_seg, which will contain the temp.seg
Is it possible to do something like that in Python?

Do not try to dynamically name variables. This is bad practice and will make your code intractable.
A better alternative is to use dictionaries, as so:
v = {}
for vec in ['country', 'seg']:
    v[vec] = temp[vec].values
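For instance, on a trimmed version of the frame from the question, each dict entry then holds one column's values (a minimal sketch, assuming pandas is importable as pd):

```python
import pandas as pd

temp = pd.DataFrame({'country': ['C1', 'C1', 'C2', 'C2'],
                     'seg': ['S1', 'S2', 'S1', 'S2']})

# one dict entry per requested column, instead of one dynamically named variable each
v = {}
for vec in ['country', 'seg']:
    v[vec] = temp[vec].values

print(list(v['country']))  # ['C1', 'C1', 'C2', 'C2']
```

Looking up v['seg'] then gives exactly what vector_seg would have held, without touching the namespace.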

Related

Function, for loop, calculate number of cars at a certain speed, 2D list

I am trying to create a function that calculates the total number of cars that pass a checkpoint at each specified speed (40, 50, 60, ...) km/h from a CSV file. I am a total novice in Python and have not had much luck. I have tried different variations of for loops to extract the "Gällande Hastighet" (speed) column from the 2D list into a new list, but I am not allowed to use pandas. I have tried csv.reader and DictReader to append every 5th element into the new list but don't get any output. I have also tried using range. I have tried so many different alternatives and don't really know how to approach the question anymore. Any advice, resources, or example code is appreciated.
The list looks like this:
[['MätplatsID', 'Gällande Hastighet', 'Hastighet', 'Datum', 'Tid'],
 ['14075010', '40', '55', '2021-09-11', '11:15:31'],
 ['14075010', '40', '54', '2021-09-11', '08:09:17'],
 ['14075010', '40', '53', '2021-09-11', '13:02:41']]
The end result should look like this:
There are 69 measurements where the speed is 40 km/h
My rough code so far gives no output:
import csv

def number_of_cars(kamera_data):
    with open('kameraData.csv', 'r', encoding='UTF-8') as csvfile:
        csv_reader = list(csv.reader(csvfile, delimiter=';'))
        count = 0
        for i in range(len(csv_reader)):
            for j in range(len(csv_reader[i])):
                count += data[i][j]
        print(count)
So, there are many ways to achieve what you need.
The CSV File:
MätplatsID,Gällande Hastighet,Hastighet,Datum,Tid
14075010,40,55,2021-09-11,11:15:31
14075010,40,54,2021-09-11,08:09:17
14075010,40,53,2021-09-11,13:02:41
14075010,41,53,2021-09-11,13:02:41
14075010,41,53,2021-09-11,13:02:41
14075010,44,53,2021-09-11,13:02:41
Using DictReader, you can read the CSV file, iterate over the rows, and tally each measured speed in another dict, which can hold counts for multiple speeds.
After this, you just need to print each of the keys stored in this new dict.
Here is an example:
import csv

filename = 'carros.csv'
qty = 0
speeds = {}
with open(filename, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # use .get() to fetch the existing count, or 0 if this is the first time the speed appears
        speeds[row['Gällande Hastighet']] = speeds.get(row['Gällande Hastighet'], 0) + 1
for k in speeds.keys():
    print(f"There are {speeds[k]} measurements where the speed is {k} km/h.")
The result will be like this:
There are 3 measurements where the speed is 40 km/h.
There are 2 measurements where the speed is 41 km/h.
There are 1 measurements where the speed is 44 km/h.
IIUC, you want to transpose your list-of-lists and then count -
col_names, *data = lst
print(col_names)
# ['MätplatsID', 'Gällande Hastighet', 'Hastighet', 'Datum', 'Tid']
print(data)
# [['14075010', '40', '55', '2021-09-11', '11:15:31'],
# ['14075010', '40', '54', '2021-09-11', '08:09:17'],
# ['14075010', '40', '53', '2021-09-11', '13:02:41']]
transposed_data = zip(*data)
data_dict = dict()
for col_name, values in zip(col_names, transposed_data):
    data_dict[col_name] = values
Output
# print(data_dict)
{'MätplatsID': ('14075010', '14075010', '14075010'),
 'Gällande Hastighet': ('40', '40', '40'),
 'Hastighet': ('55', '54', '53'),
 'Datum': ('2021-09-11', '2021-09-11', '2021-09-11'),
 'Tid': ('11:15:31', '08:09:17', '13:02:41')}
Now you can count the number of entries for a particular speed -
print(len([speed for speed in data_dict['Gällande Hastighet'] if int(speed) == 40]))
# 3
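As a stdlib-only aside (not from either answer above): collections.Counter does the same tally in one pass, and looking the column up by name in the header row avoids hard-coding "every 5th element":

```python
from collections import Counter

rows = [['MätplatsID', 'Gällande Hastighet', 'Hastighet', 'Datum', 'Tid'],
        ['14075010', '40', '55', '2021-09-11', '11:15:31'],
        ['14075010', '40', '54', '2021-09-11', '08:09:17'],
        ['14075010', '40', '53', '2021-09-11', '13:02:41']]

header, *data = rows
speed_col = header.index('Gällande Hastighet')   # locate the speed column by name
counts = Counter(row[speed_col] for row in data)

for speed, n in counts.items():
    print(f"There are {n} measurements where the speed is {speed} km/h")
```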

tinyDB: update all occurrences of a value

How do I update all instances of a value within a tinyDB?
So, for example, if I have a db like
db = TinyDB('test.json') # assume empty
db.insert({'a':'1', 'b':'2', 'c':'1'})
db.insert({'a':'2', 'b':'3', 'c':'1'})
db.insert({'a':'4', 'b':'1', 'c':'3'})
How do I update all values of '1' to say '5'?
How to update all values within say column 'c' only?
As an addition. If some of my columns contain arrays,
e.g.
db = TinyDB('test.json') # assume empty
db.insert({'a':'1', 'b':['2', '5', '1'], 'c':'1'})
db.insert({'a':'2', 'b':['3', '4', '1'], 'c':'1'})
db.insert({'a':'4', 'b':['1', '4', '4'], 'c':'3'})
how could I perform the same points above?

How to look for a string in a dataframe name but not in a column?

I would like to know how to use str.contains for a dataframe name, not for a column of the dataframe. I have a df1 with a column target that contains strings such as lung, tum, liver, etc. And I have multiple df2 which contain the specific target in their names. I would like to create a loop that applies a condition when the name of the df2 contains the specific target that is in df1. It would be something like:
if df2.str.contains(target in df1):
    do condition
I can create a list that contains all targets with this :
target_available = df1['target'].unique().tolist()
So it would be:
if df2.str.contains(target_available):
    do condition
So for my df2 called XXX_dataframe, when XXX is equal to the target that is in target_available, then do condition / when XXX is equal to another target that is in target_available, then do condition / etc.
Examples :
# Few columns of a df
# The column target can have other possibilities
Model_rad = {'model_uid': [1, 2, 3, 4, 5, 6, 7, 8],
             'pathology': ['nsclc', 'nsclc', 'nsclc', 'covid', 'glioma', 'meningioma', 'gbm', 'breast-cancer'],
             'purpose': ['', '', '', '', '', '', '', ''],
             'target': ['lung', 'tum', 'tum', 'lung', 'tum', 'tum', 'tum', 'tum'],
             'version_availability': ['v0.0', 'v1.0', '', 'v1.0', 'v1.0', 'v1.0', '', '']}
# Creating the df
Model_rad = pd.DataFrame(Model_rad)
# Before creating target_ref_series data frames we need to get unique values from target in Model_rad
target_unique = Model_rad['target'].unique().tolist()
# Creating data frames target ref series
for target in target_unique:  # For each target in model rad
    # We create empty data frames target ref series from rad
    vars()[target+'_ref_series_from_rad'] = pd.DataFrame()
# The data frames target_ref_series_from_rad have some columns but we don't need them for now
# for each model available in the column version availability
for models in Model_rad.version_availability:
    # print(models)
    # If models exist (column where models not empty, so different than '')
    if models != '':
        print(models)
        # We create the variable patho_and_target_available, which is the pathology where models are not empty
        patho_and_target_available = Model_rad[Model_rad['version_availability'] != '']
So at this moment the idea is like:
when target in vars()[target+'_ref_series_from_rad'] equals target_available in patho_and_target_available['target']:
    do condition
Thank you for your help !
I am not too sure what you mean, but if you want to check if a column name equals some string you can use df.columns.values. This gives you a list of column names.
If you want to check if the contents of a column match a specific string you could use df['column'] == 'query' to obtain a true/false series.
Furthermore, you could use this as df_query = df[df['column'] == 'query'] to get a filtered dataframe.
If you want to store a filtered dataframe for each of your lookup values, consider a dictionary:
df_dict = {}
for val in lookup_values:
    df_dict[val] = df[df['column'] == val]
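A runnable sketch of that pattern on a toy frame (the column and values below mirror the question's target column, but the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'target': ['lung', 'tum', 'tum', 'lung'],
                   'model_uid': [1, 2, 3, 4]})

# one filtered dataframe per unique value, keyed by that value
df_dict = {}
for val in df['target'].unique():
    df_dict[val] = df[df['target'] == val]

print(len(df_dict['tum']))  # 2 rows match 'tum'
```

df_dict['lung'] then plays the role of lung_ref_series_from_rad, with no vars() tricks needed.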

Python dictionary with multiple lists in to pandas dataframe

I'm trying to get a dictionary with multiple list values and one string into one dataframe.
Here's the information I'm trying to get into the dataframe:
{'a': ['6449.70000', '1', '1.000'],
'b': ['6446.40000', '1', '1.000'],
'c': ['6449.80000', '0.04879000'],
'h': ['6449.90000', '6449.90000'],
'l': ['6362.00000', '6120.30000'],
'o': '6442.30000',
'p': ['6413.12619', '6353.50910'],
't': [5272, 16027],
'v': ['1299.86593468', '4658.87787321']}
The 3 values represented by key "a" all have their own names, say a1, a2 and a3, then b1, b2, and b3. Preferably I want to define them myself. This goes for all keys, so there should be 19 columns.
I've read a lot about this.
Take multiple lists into dataframe
https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/
http://pbpython.com/pandas-list-dict.html
Video tutorials youtube
Based on these readings I think I could iterate through it with a for loop, build separate dataframes, and then join/merge them. But that seems like more work than should be required.
What is the most efficient / readable / logic way to do this using Python 3.6?
Do the cleaning up in pure Python:
colnames = []
values = []
for key, value in d.items():  # .iteritems() is Python 2 only; use .items() on Python 3
    if type(value) == list:
        for c in range(len(value)):
            colnames.append(key + str(c+1))
        values += value
    else:
        colnames.append(key + '1')
        values.append(value)
df = pd.DataFrame(values, index=colnames).T
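Run on a trimmed-down version of the sample dict, the same cleanup yields a one-row frame with one column per list element (a sketch; it relies on dict insertion order, which is guaranteed from Python 3.7):

```python
import pandas as pd

d = {'a': ['6449.70000', '1', '1.000'],
     'o': '6442.30000',
     't': [5272, 16027]}

colnames = []
values = []
for key, value in d.items():
    if isinstance(value, list):
        for c in range(len(value)):
            colnames.append(key + str(c + 1))   # a1, a2, a3, ...
        values += value
    else:
        colnames.append(key + '1')              # lone strings still get a '1' suffix
        values.append(value)

df = pd.DataFrame(values, index=colnames).T
print(list(df.columns))  # ['a1', 'a2', 'a3', 'o1', 't1', 't2']
```

With the full 9-key dict from the question this produces the expected 19 columns.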

Create multiple dataframes in loop

I have a list, with each entry being a company name
companies = ['AA', 'AAPL', 'BA', ....., 'YHOO']
I want to create a new dataframe for each entry in the list.
Something like
(pseudocode)
for c in companies:
    c = pd.DataFrame()
I have searched for a way to do this but can't find it. Any ideas?
Just to underline my comment on @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:
Created names might easily conflict with variables already used by your logic.
Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.
This is why dicts were included in the language. The correct way to proceed is:
d = {}
for name in companies:
    d[name] = pd.DataFrame()
Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:
d = {name: pd.DataFrame() for name in companies}
Once d is created the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:
for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'
In Python 2 you are better off writing
for name, df in d.iteritems():
because this avoids instantiating a list of (name, df) tuples.
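A minimal sketch of both access patterns (the company names are from the question; the frames are simply left empty here):

```python
import pandas as pd

companies = ['AA', 'AAPL', 'BA', 'YHOO']
d = {name: pd.DataFrame() for name in companies}

# look up a single company...
print(d['AAPL'].empty)  # True

# ...or loop over all of them
for name, df in d.items():
    print(name, df.shape)
```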
You can do this (although obviously use exec with extreme caution if this is going to be public-facing code):
for c in companies:
    exec('{} = pd.DataFrame()'.format(c))
Adding to the above great answers: these work flawlessly if you need to create empty data frames, but suppose instead you need to create multiple dataframes based on some filtering — say the list you have is a column of some bigger dataframe and you want a separate dataframe for each unique company in it.
First take the unique names of the companies:
compuniquenames = df.company.unique()
Create a dictionary to store your data frames:
companydict = {elem: pd.DataFrame() for elem in compuniquenames}
The above two steps are already covered in the post; then:
for key in companydict.keys():
    companydict[key] = df[df.company == key]
The above will give you a data frame for all the unique companies with matching record.
Below is the code for dynamically creating data frames in loop:
companies = ['AA', 'AAPL', 'BA', ....., 'YHOO']
for eachCompany in companies:
    # Dynamically create data frames
    vars()[eachCompany] = pd.DataFrame()
For difference between vars(),locals() and globals() refer to the below link:
What's the difference between globals(), locals(), and vars()?
You can do it this way:
for xxx in yyy:
    globals()[f'dataframe_{xxx}'] = pd.DataFrame(xxx)
The following is reproducible -> so let's say you have a list with the df/company names:
companies = ['AA', 'AAPL', 'BA', 'YHOO']
you probably also have data, presumably also a list? (or rather list of lists) like:
content_of_lists = [
    [['a', '1'], ['b', '2']],
    [['c', '3'], ['d', '4']],
    [['e', '5'], ['f', '6']],
    [['g', '7'], ['h', '8']]
]
In this special example the dfs should probably look very much alike, so this does not need to be very complicated:
dic = {}
for n, m in zip(companies, range(len(content_of_lists))):
    dic["df_{}".format(n)] = pd.DataFrame(content_of_lists[m]).rename(columns={0: "col_1", 1: "col_2"})
Here you would have to use dic["df_AA"] to get to the dataframe inside the dictionary.
But should you require more "distinct" naming of the dataframes, I think you would have to use, for example, if-conditions:
dic = {}
for n, m in zip(companies, range(len(content_of_lists))):
    if n == 'AA':
        special_naming_1 = pd.DataFrame(content_of_lists[m]).rename(columns={0: "col_1", 1: "col_2"})
    elif n == 'AAPL':
        special_naming_2 ...
It is a little more effort, but it allows you to grab the dataframe object in a more conventional way by just writing special_naming_1 instead of dic['df_AA'], and gives you more control over the dataframe and column names if that's important.
