iterating through csv dataframe to create/assign variables

I have a csv file that will contain a frequently updated (overwritten) dataframe with a few rows of purchase orders, something like this:
uniqueId item action quantity price
123 widget1 buy 10 99.44
234 widget2 sell 15 19.99
345 widget3 buy 2 999.99
This csv file will be passed to my python code by another program; my code will check for its presence every few minutes. Once it appears, the code will read it. I'm not including the code for that, since that's not the issue.
The idea is to turn this purchase order dataframe into something that I can pass to my (already written) place-the-order code. I want to iterate through each row in order (enumerate?), and assign the values from that row to variables that I use in the order code, then reassign the new values to the same variable for the next row after the order from that row has been placed.
As I understand it, itertuples are probably the way to go for iterating through it, but I'm new enough to python that I can't figure out the actual mechanism/syntax of using it to do what I want. All my trial-and-error tests for assigning the values to reusable variables result in syntax errors.
I'm having a mental block on what is probably very basic python! I know how to iterate through the rows and print 'em out--plenty of examples out there show me how to do that--but not how to turn the data into something I can use elsewhere. Can someone walk me through an example or two that actually applies to what I'm trying to do?

Like you said, you can quite easily iterate over a dataframe with .itertuples()
Here's how I would go about it (df is your dataframe; for it I used the data from your example):
Code:
for row in df.itertuples():
    print(row)
Output:
Pandas(Index=0, uniqueId=123, item='widget1', action='buy', quantity=10, price=99.44)
Pandas(Index=1, uniqueId=234, item='widget2', action='sell', quantity=15, price=19.99)
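Pandas(Index=2, uniqueId=345, item='widget3', action='buy', quantity=2, price=999.99)
If you want specific entries of a tuple, you can use the position in the tuple as index (position 0 is the row index, so the first column is at position 1). The column name also works as an attribute, e.g. row.uniqueId.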
Code:
for row in df.itertuples():
    uniqueID = row[1]
    print(uniqueID)
Output:
123
234
345
I'm not sure what the rest of your code looks like. If your place-the-order code is inside a function, you can call that function in the for loop after assigning the variables to your liking (it is called place_the_order here, since hyphens are not allowed in Python names):
for row in df.itertuples():
    uniqueID = row[1]
    item = row[2]
    action = row[3]
    quantity = row[4]
    price = row[5]
    place_the_order(uniqueID, item, action, quantity, price)
(You could even skip assigning the variables and just call place_the_order(row[1], row[2], ...). In my opinion it is more readable to assign the variables.)
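Because itertuples() yields namedtuples, the fields can also be read by column name instead of by position. Here is a minimal sketch, assuming the CSV is comma-separated with the header shown in the question. The inline csv_text replaces the real file, and place_the_order is a hypothetical stand-in for the existing order code:

```python
import io

import pandas as pd

# Inline stand-in for the CSV file from the question (assumed comma-separated).
csv_text = """uniqueId,item,action,quantity,price
123,widget1,buy,10,99.44
234,widget2,sell,15,19.99
345,widget3,buy,2,999.99
"""

placed = []  # records what the stand-in function received


def place_the_order(unique_id, item, action, quantity, price):
    """Hypothetical placeholder for the real place-the-order code."""
    placed.append((unique_id, item, action, quantity, price))


df = pd.read_csv(io.StringIO(csv_text))

# index=False drops the row index, so each tuple holds only the CSV columns.
for row in df.itertuples(index=False):
    # Fields are available by name because the headers are valid identifiers.
    place_the_order(row.uniqueId, row.item, row.action, row.quantity, row.price)

print(placed[0])
```

The loop variable row is rebound on every iteration, so the same names are reused for each order without any extra bookkeeping.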
If your place-the-order code is not in a function I would recommend using a nested dictionary with the row index as key and a dictionary of the content of the row as value. The row index is easily accessible as it is the first item in the tuple.
content_of_rows = {}
for row in df.itertuples():
    index = row[0]
    uniqueID = row[1]
    item = row[2]
    action = row[3]
    quantity = row[4]
    price = row[5]
    content_of_rows.update({index:{"uniqueID":uniqueID, "item": item, "action": action, "quantity": quantity, "price": price}})
print(content_of_rows)
Output:
{0: {'uniqueID': 123, 'item': 'widget1', 'action': 'buy', 'quantity': 10, 'price': 99.44},
1: {'uniqueID': 234, 'item': 'widget2', 'action': 'sell', 'quantity': 15, 'price': 19.99},
2: {'uniqueID': 345, 'item': 'widget3', 'action': 'buy', 'quantity': 2, 'price': 999.99}}
With this approach you don't reuse one variable for every row; the nested dictionary is essentially just a different way of writing the dataframe. You can iterate over a dictionary in much the same way as over tuples, but instead of numerical indices you have to use the keys.
for row in content_of_rows:
    # row is the key, so in the first iteration it would be 0, in the second iteration it would be 1, and so on
    print(content_of_rows[row])
Output:
{'uniqueID': 123, 'item': 'widget1', 'action': 'buy', 'quantity': 10, 'price': 99.44}
{'uniqueID': 234, 'item': 'widget2', 'action': 'sell', 'quantity': 15, 'price': 19.99}
{'uniqueID': 345, 'item': 'widget3', 'action': 'buy', 'quantity': 2, 'price': 999.99}
If you want to get the uniqueID of the rows, you would do something like this:
Code:
for row in content_of_rows:
    print(content_of_rows[row]["uniqueID"]) # just put the second key you're looking for right after the first, also in []
Output:
123
234
345
It's usually best to put different parts of your code into functions, so if you haven't done that already I'd recommend you do so. That way you can use the same variables for each row.
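The nested dictionary built by hand above can also be obtained directly from pandas with DataFrame.to_dict. This is a sketch under the assumption that the columns are named as in the question. place_the_order is again a hypothetical stand-in, with parameter names chosen to match the column names so that each row can be unpacked into it:

```python
import pandas as pd

df = pd.DataFrame({
    "uniqueId": [123, 234, 345],
    "item": ["widget1", "widget2", "widget3"],
    "action": ["buy", "sell", "buy"],
    "quantity": [10, 15, 2],
    "price": [99.44, 19.99, 999.99],
})

# orient="index" gives {row_index: {column: value}}, like content_of_rows.
content_of_rows = df.to_dict(orient="index")

# orient="records" gives a list with one dict per row.
orders = df.to_dict(orient="records")


def place_the_order(uniqueId, item, action, quantity, price):
    """Hypothetical placeholder; returns a summary instead of placing an order."""
    return f"{action} {quantity} x {item} @ {price} (id {uniqueId})"


# ** unpacks each row dict into keyword arguments of the function.
summaries = [place_the_order(**order) for order in orders]
print(summaries[0])
```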
I hope this (kinda long, sorry about that) answer could help you. Greetings from Bavaria!

How would I pull out similar value pairs from a dictionary and put them in a new dictionary?

I have a dictionary filled with 3 keys: "Ticker", "Title" and "Value". The ticker key contains 100 stock tickers that corresponds to the value of the purchase and title that are in the same position. So here is an example of the dictionary:
{'Ticker': {0: 'AKUS', 1: 'HHC', 2: 'IOVA', 3: 'DDOG'},
'Title': {0: '10%', 1: 'Dir, 10%', 2: 'Dir', 3: 'Dir'},
'Value': {0: '+$374,908,350', 1: '+$109,214,243', 2: '+$65,000,000', 3: '+$49,999,940'}}
So "AKUS" corresponds with the 10% and +$374,908,350.
I am only showing 4 items in the dictionary but my actual dictionary has 100.
My question is regarding a new dictionary that only contains tickers and values but everything in that dictionary has the same title.
For example, I want to create a 10% dictionary that contains all the tickers and values of stocks where the title contained 10%.
I know some stocks have multiple titles but I don't mind the same stocks being in more than one dictionary. Would some one be able to let me know how I should go about doing this? Thank you in advance, I have been stuck on this for a while.

This is simple to do with pandas, if you are OK with using it. Assuming your dictionary is named d:
import pandas as pd
df = pd.DataFrame.from_dict(d)
df10 = df[df['Title'].str.contains('10%')]
print(df10)
produces
   Ticker     Title          Value
0    AKUS       10%  +$374,908,350
1     HHC  Dir, 10%  +$109,214,243

As you only want the "Ticker" and "Value" entries whose "Title" contains "10%", you need to filter both "Ticker" and "Value" against that condition. You can do it verbosely with a for loop or with a dictionary comprehension:
stocks = {'Ticker': {0: 'AKUS', 1: 'HHC', 2: 'IOVA', 3: 'DDOG'},
          'Title': {0: '10%', 1: 'Dir, 10%', 2: 'Dir', 3: 'Dir'},
          'Value': {0: '+$374,908,350', 1: '+$109,214,243', 2: '+$65,000,000', 3: '+$49,999,940'}
          }
ten_percent_stocks = {"Ticker":{}, "Value": {}}
for k, v in stocks["Title"].items():
    if "10%" in v:
        ten_percent_stocks["Ticker"][k] = stocks["Ticker"][k]
        ten_percent_stocks["Value"][k] = stocks["Value"][k]
With dictionary comprehension you could get the same result by doing this:
ten_percent_stocks = {"Ticker": {k: v for k, v in stocks["Ticker"].items()
                                 if "10%" in stocks["Title"][k]},
                      "Value": {k: v for k, v in stocks["Value"].items()
                                if "10%" in stocks["Title"][k]}
                      }
But I find writing the actual for loop a bit cleaner.
The result in both cases is:
{'Ticker': {0: 'AKUS', 1: 'HHC'}, 'Value': {0: '+$374,908,350', 1: '+$109,214,243'}}
As a further change to your original dictionary: since the three inner dictionaries always share the same indices, you could store each stock as a single tuple in the order ticker, title, value, instead of keeping the information in three separate dictionaries, i.e.:
stocks = {0: ('AKUS', '10%', '+$374,908,350'),
          1: ('HHC', 'Dir, 10%', '+$109,214,243'),
          2: ('IOVA', 'Dir', '+$65,000,000'),
          3: ('DDOG', 'Dir', '+$49,999,940')
          }
# Filtering stocks where title contains "10%":
ten_percent_stocks = {k: (v[0], v[2]) for k, v in stocks.items() if "10%" in v[1]}
Giving you the result as follows:
{0: ('AKUS', '+$374,908,350'), 1: ('HHC', '+$109,214,243')}

I have a dictionary filled with 3 keys: "Ticker", "Title" and "Value". The ticker key contains 100 stock tickers that corresponds to the value of the purchase and title that are in the same position.
This organization of the data makes no sense. Your data represents 100 stock tickers; a stock ticker is a coherent thing that can be represented with a dict; therefore, the data should be a list of dicts, where each dict has the 'Ticker', 'Title' and 'Value' keys and gives that data for that stock.
So, the first step is to organize the data properly; then any further manipulation is trivial.
We know how to get the necessary data for a stock with a given ID number: we access each key, and then for each of those keys we index again with the ID. So, getting the ticker for stock 0 is just all_stocks['Ticker'][0] (supposing our original dict is named all_stocks), etc. We can easily use that sort of logic to create a dictionary for stock 0. Let's wrap that logic in a function:
def make_stock_dict(all_stocks, stock_id):
    return {
        'Ticker': all_stocks['Ticker'][stock_id],
        'Title': all_stocks['Title'][stock_id],
        'Value': all_stocks['Value'][stock_id]
    }
Now it's trivial to use a list comprehension to apply the function:
converted_stocks = [make_stock_dict(all_stocks, i) for i in range(100)]
For example, I want to create a 10% dictionary that contains all the tickers and values of stocks where the title contained 10%.
Same idea, now that we have properly organized data: we figure out the code that tells us whether a stock qualifies, then we apply it to each stock.
[s for s in converted_stocks if '10%' in s['Title']]
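Put together, the reorganize-then-filter idea can be sketched end to end on the four sample stocks from the question. This variant iterates over the IDs actually present in the data instead of assuming range(100), and the final step keeps only the ticker and value, which is an interpretation of what the question asks for:

```python
all_stocks = {
    "Ticker": {0: "AKUS", 1: "HHC", 2: "IOVA", 3: "DDOG"},
    "Title": {0: "10%", 1: "Dir, 10%", 2: "Dir", 3: "Dir"},
    "Value": {0: "+$374,908,350", 1: "+$109,214,243",
              2: "+$65,000,000", 3: "+$49,999,940"},
}

# One dict per stock; the IDs come from the data rather than a fixed range.
converted_stocks = [
    {field: column[stock_id] for field, column in all_stocks.items()}
    for stock_id in all_stocks["Ticker"]
]

# Keep only ticker and value for stocks whose title mentions "10%".
ten_percent = [
    {"Ticker": s["Ticker"], "Value": s["Value"]}
    for s in converted_stocks
    if "10%" in s["Title"]
]
print(ten_percent)
```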

Python 3.9.5: One dictionary assignment is overwriting multiple keys [BUG?]

I am reading a .csv called courses. Each row corresponds to a course which has an id, a name, and a teacher. They are to be stored in a Dict. An example:
list_courses = {
1: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'},
...
}
While iterating the rows using enumerate(file_csv.readlines()) I am performing the following:
list_courses={}
for idx, row in enumerate(file_csv.readlines()):
    # Skip blank rows.
    if row.isspace(): continue
    # If we're using the row, turn it into a list.
    row = row.strip().split(",")
    # If it's the header row, take note of the header. Use these values for the dictionaries' keys.
    # As of 3.7 a Dict remembers the order in which the keys were inserted.
    # Since the order is constant, simply load each other row into the corresponding key.
    if not idx:
        sheet_item = dict.fromkeys(row)
        continue
    # Loop through the keys in sheet_item. Assign the value found in the row, converting to int where necessary.
    for idx, key in enumerate(list(sheet_item)):
        sheet_item[key] = int(row[idx].strip()) if key == 'id' or key == 'mark' else row[idx].strip()
    # Course list
    print("ADDING COURSE WITH ID {} TO THE DICTIONARY:".format(sheet_item['id']))
    list_courses[sheet_item['id']] = sheet_item
    print("\tADDED: {}".format(sheet_item))
    print("\tDICT : {}".format(list_courses))
Thus, the list_courses dictionary is printed after each sheet_item is added to it.
Now comes the issue - when reading in two courses, I expect that list_courses should read:
list_courses = {
1: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'},
2: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}
}
However, the output of my print statements (substantiated by errors later in my program) is:
ADDING COURSE WITH ID 1 TO THE DICTIONARY:
ADDED: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'}
DICT : {1: {'id': 1, 'name': 'Biology', 'teacher': 'Mr. D'}}
ADDING COURSE WITH ID 2 TO THE DICTIONARY:
ADDED: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}
DICT : {1: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}, 2: {'id': 2, 'name': 'History', 'teacher': 'Mrs. P'}}
Thus, the id with which the sheet_item is being added to list_courses is correct (1 or 2); however, the assignment that occurs for the second course appears to overwrite the value for key 1. I'm not even sure how this is possible. Please let me know your thoughts.

You're using the same dictionary for both the header and all the rows. You never create any new dictionaries after the header. Key assignments are overwriting previous ones, because there are no new dictionaries to write to.
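The effect can be reproduced in a few lines, independent of any CSV handling. The sketch below uses made-up course tuples. The point is that storing a dict in another dict stores a reference to it, not a copy:

```python
sheet_item = {"id": None, "name": None}
list_courses = {}

for course_id, name in [(1, "Biology"), (2, "History")]:
    sheet_item["id"] = course_id           # mutates the one shared dict
    sheet_item["name"] = name
    list_courses[course_id] = sheet_item   # stores a reference, not a copy

# Both keys point at the same object, so both show the last row written.
print(list_courses[1] is list_courses[2])
print(list_courses[1]["name"])
```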
Store the keys in a list, and create a new sheet_item for every row, just before the inner for loop:
list_courses={}
keys = None # Let Python know this is defined
for idx, row in enumerate(file_csv.readlines()):
    # Skip blank rows.
    if row.isspace(): continue
    # If we're using the row, turn it into a list.
    row = row.strip().split(",")
    # If it's the header row, take note of the header. Use these values for the dictionaries' keys.
    # As of 3.7 a Dict remembers the order in which the keys were inserted.
    # Since the order is constant, simply load each other row into the corresponding key.
    if not idx:
        keys = row
        continue
    # Start a fresh dictionary for this row so earlier rows are not overwritten.
    sheet_item = {}
    # Loop through the keys. Assign the value found in the row, converting to int where necessary.
    for idx, key in enumerate(keys):
        sheet_item[key] = int(row[idx].strip()) if key == 'id' or key == 'mark' else row[idx].strip()
    # Course list
    print("ADDING COURSE WITH ID {} TO THE DICTIONARY:".format(sheet_item['id']))
    list_courses[sheet_item['id']] = sheet_item
    print("\tADDED: {}".format(sheet_item))
    print("\tDICT : {}".format(list_courses))
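As an alternative to splitting lines by hand, the standard library's csv.DictReader reads the header itself and yields a new dict for every data row, which sidesteps the shared-dictionary problem. This is a sketch rather than a drop-in replacement. The inline text stands in for the real courses file, and INT_FIELDS is a name introduced here for the columns the original code converts to int:

```python
import csv
import io

# Inline stand-in for the courses file; column names assumed from the question.
file_csv = io.StringIO("id,name,teacher\n1,Biology,Mr. D\n\n2,History,Mrs. P\n")

INT_FIELDS = {"id", "mark"}  # columns converted to int, as in the original code

list_courses = {}
for row in csv.DictReader(file_csv):  # yields a new dict for every data row
    course = {
        key: int(value) if key in INT_FIELDS else value.strip()
        for key, value in row.items()
    }
    list_courses[course["id"]] = course

print(list_courses)
```

DictReader also skips completely empty rows, so the explicit blank-row check is not needed here.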

Get data from list of API response

price_list = [{'symbol': 'ETHBTC', 'lastPrice': '0.03574700'}, {'symbol': 'BTCUSDT', 'lastPrice': '57621.08000000'}]
print(price_list[1]['lastPrice']) # index = 1 for BTCUSDT, print 57621.08000000 => OK.
I need to get lastPrice for BTCUSDT.
Currently I can get it by index.
However, is it possible to get it by referring symbol?

Given that you only have two keys, which amount to an identifier and a value, you could flatten your price_list into a dict of symbol: lastPrice pairs. That makes it much easier to access the price you need via data[symbol]:
data = {k['symbol']: float(k['lastPrice']) for k in price_list}
data
#{'ETHBTC': 0.035747, 'BTCUSDT': 57621.08}
data['BTCUSDT']
#57621.08
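A variant of the same lookup table, offered as a sketch: it keeps the prices as Decimal instead of float, so the digits from the API string are preserved exactly, and it uses dict.get so that an unknown symbol returns None instead of raising KeyError. The symbol "DOGEUSDT" is only an example of a key that is not in the list:

```python
from decimal import Decimal

price_list = [
    {"symbol": "ETHBTC", "lastPrice": "0.03574700"},
    {"symbol": "BTCUSDT", "lastPrice": "57621.08000000"},
]

# Map each symbol to its last price, parsed exactly from the string.
prices = {entry["symbol"]: Decimal(entry["lastPrice"]) for entry in price_list}

btc = prices.get("BTCUSDT")
missing = prices.get("DOGEUSDT")  # not in the list, so get() returns None
print(btc, missing)
```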

how to normalize this below json using panda in django

With the view.py query below, my output looks something like this. You can see that the choices field contains multiple arrays; how can I normalize it serially? Here is my JSON:
{"pages":[{"name":"page1","title":"SurveyWindow Pvt. Ltd. Customer Feedback","description":"Question marked * are compulsory.",
"elements":[{"type":"radiogroup","name":"question1","title":"Do you like our product? *","isRequired":true,
"choices":[{"value":"Yes","text":"Yes"},{"value":"No","text":"No"}]},{"type":"checkbox","name":"question2","title":"Please Rate Our PM Skill","isRequired":false,"choices":[{"value":"High","text":"High"},{"value":"Low","text":"Low"},{"value":"Medium","text":"Medium"}]},{"type":"radiogroup","name":"question3","title":"Do you like our services? *","isRequired":true,"choices":[{"value":"Yes","text":"Yes"},{"value":"No","text":"No"}]}]}]}
this is my view.py
jsondata=SurveyMaster.objects.all().filter(survey_id='1H2711202014572740')
q = jsondata.values('survey_json_design')
qs_json = pd.DataFrame.from_records(q)
datatotable = pd.json_normalize(qs_json['survey_json_design'], record_path=['pages','elements'])
qs_json = datatotable.to_html()

Based on your comments and picture here's what I would do to go from the picture to something more SQL-friendly (what you refer to as "normalization"), but keep in mind this might blow up if you don't have sufficient memory.
Create a new list which you'll fill with the new data, then iterate over the pandas table's rows, and then over every item in your list. For every iteration in the inner loop use the data from the row (minus the column you're iterating over). For convenience I added it as the last element.
import pandas as pd
# Example data
df = pd.DataFrame({"choices": [[{"text": "yes", "value": "yes"},
                                {"text": "no", "value": "no"}],
                               [{"ch1": 1, "ch2": 2}, {"ch3": "ch3"}]],
                   "name": ["kostas", "rajesh"]})
data = []
for i, row in df.iterrows():
    for val in row["choices"]:
        data.append((*row.drop("choices").values, val))
df = pd.DataFrame(data, columns=["names", "choices"])
print(df)
    names                          choices
0  kostas  {'text': 'yes', 'value': 'yes'}
1  kostas    {'text': 'no', 'value': 'no'}
2  rajesh             {'ch1': 1, 'ch2': 2}
3  rajesh                   {'ch3': 'ch3'}
This is where I guess you want to go. All that's left is to just modify the column / variable names with your own data.
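The same reshaping can also be sketched without explicit loops, using DataFrame.explode to give each choice its own row and then spreading the choice dicts into columns. The sample frame below is an assumption modelled on two of the survey questions from the JSON above, not the actual query result:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["question1", "question2"],
    "choices": [
        [{"value": "Yes", "text": "Yes"}, {"value": "No", "text": "No"}],
        [{"value": "High", "text": "High"}, {"value": "Low", "text": "Low"},
         {"value": "Medium", "text": "Medium"}],
    ],
})

# One row per list element; the other columns are repeated alongside it.
exploded = df.explode("choices").reset_index(drop=True)

# Turn each choice dict into its own columns ("value", "text").
choice_cols = pd.DataFrame(exploded["choices"].tolist())

flat = pd.concat([exploded.drop(columns="choices"), choice_cols], axis=1)
print(flat)
```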

How to extract specific values from a list of dictionaries in python

I have a list of dictionaries like the one shown below, and I would like to extract the partID and the corresponding quantity for a specific orderID using Python, but I don't know how to do it.
dataList = [{'orderID': 'D00001', 'customerID': 'C00001', 'partID': 'P00001', 'quantity': 2},
{'orderID': 'D00002', 'customerID': 'C00002', 'partID': 'P00002', 'quantity': 1},
{'orderID': 'D00003', 'customerID': 'C00003', 'partID': 'P00001', 'quantity': 1},
{'orderID': 'D00004', 'customerID': 'C00004', 'partID': 'P00003', 'quantity': 3}]
So for example, when I search my dataList for a specific orderID == 'D00003', I would like to receive both the partID ('P00001') and the corresponding quantity (1) of the specified order. How would you go about this? Any help is much appreciated.

It depends.
If you are not going to do this lookup many times, you can just iterate over the list of dictionaries until you find the "correct" one:
search_for_order_id = 'D00001'
for d in dataList:
    if d['orderID'] == search_for_order_id:
        print(d['partID'], d['quantity'])
        break # assuming orderID is unique
Outputs
P00001 2
Since this solution is O(n), if you are going to do this search a lot of times it will add up.
In that case it will be better to transform the data to a dictionary of dictionaries, with orderID being the outer key (again, assuming orderID is unique):
better = {d['orderID']: d for d in dataList}
This is also O(n) but you pay it only once. Any subsequent lookup is an O(1) dictionary lookup:
search_for_order_id = 'D00001'
print(better[search_for_order_id]['partID'], better[search_for_order_id]['quantity'])
Also outputs
P00001 2

I believe you would like to familiarize yourself with the pandas package, which is very useful for data analysis. If these are the kind of problems you're up against, I advise you to take the time and take a tutorial in pandas. It can do a lot, and is very popular.
Your dataList is very similar to a DataFrame structure, so what you're looking for would be as simple as:
import pandas as pd
df = pd.DataFrame(dataList)
df[df['orderID']=='D00003']
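The expression above returns the whole matching row. To pull out just the two requested fields, one option is .loc with a column list, as in this sketch using the dataList from the question:

```python
import pandas as pd

dataList = [
    {"orderID": "D00001", "customerID": "C00001", "partID": "P00001", "quantity": 2},
    {"orderID": "D00002", "customerID": "C00002", "partID": "P00002", "quantity": 1},
    {"orderID": "D00003", "customerID": "C00003", "partID": "P00001", "quantity": 1},
    {"orderID": "D00004", "customerID": "C00004", "partID": "P00003", "quantity": 3},
]

df = pd.DataFrame(dataList)

# Boolean mask selects the rows, the list selects the columns.
match = df.loc[df["orderID"] == "D00003", ["partID", "quantity"]]

part_id = quantity = None
if not match.empty:
    # Take the first matching row (orderID is assumed to be unique).
    part_id, quantity = match.iloc[0]
print(part_id, quantity)
```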

You can use this:
results = [[x['orderID'], x['partID'], x['quantity']] for x in dataList]
for i in results:
    print(i)
Also,
results = [['Order ID: ' + x['orderID'], 'Part ID: ' + x['partID'], 'Quantity: ' + str(x['quantity'])] for x in dataList]

To get the partID you can make use of the filter function.
myData = [{"x": 1, "y": 1}, {"x": 2, "y": 5}]
filtered = filter(lambda item: item["x"] == 1, myData) # Search myData for an object with x equal to 1
# Get the next item from the filter (the matching item) and get the y property.
print(next(filtered)["y"])
You should be able to apply this to your situation.
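Applied to the dataList from the question, the filter approach might look like the sketch below. Note that filter takes the predicate first and the iterable second. find_order is a hypothetical helper name, and passing None as the default to next() avoids a StopIteration when no order matches:

```python
dataList = [
    {"orderID": "D00001", "customerID": "C00001", "partID": "P00001", "quantity": 2},
    {"orderID": "D00002", "customerID": "C00002", "partID": "P00002", "quantity": 1},
    {"orderID": "D00003", "customerID": "C00003", "partID": "P00001", "quantity": 1},
    {"orderID": "D00004", "customerID": "C00004", "partID": "P00003", "quantity": 3},
]


def find_order(orders, order_id):
    """Return (partID, quantity) for order_id, or None if no order matches."""
    matches = filter(lambda order: order["orderID"] == order_id, orders)
    order = next(matches, None)  # default avoids StopIteration on no match
    if order is None:
        return None
    return order["partID"], order["quantity"]


print(find_order(dataList, "D00003"))
print(find_order(dataList, "D99999"))
```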
