Save dataframes to multiple CSVs retaining dataframe name - python

How can I export multiple dataframes to CSVs named after the dataframes themselves, in generic code?
I tried:
dframes_list = [economy, finance, language]
for i, df in enumerate(dframes_list, 1):
    filename_attempt1 = "{}.csv".format(i)
    filename_attempt2 = f"{i}.csv"
    df.to_csv(filename_attempt2)  # to_save doesn't exist; to_csv is the pandas method
Expected Output:
file saved: "economy.csv"
file saved: "finance.csv"
file saved: "language.csv"

Relying on variable names as data is strongly discouraged in Python, because getting a string name back out of a variable is not trivial.
The best approach is to create another list holding the names as strings and use zip:
dframes_list = [economy, finance, language]
names = ['economy', 'finance', 'language']
for name, df in zip(names, dframes_list):
    filename = "df_{}.csv".format(name)
    df.to_csv(filename)
Another idea is to create a dict of DataFrames:
dframes_dict = {'economy': economy, 'finance': finance, 'language': language}
for name, df in dframes_dict.items():
    filename = "df_{}.csv".format(name)
    df.to_csv(filename)
If you need to process each DataFrame in the dict before saving, use:
for k, v in dframes_dict.items():
    v = v.set_index('date')
    # other processing for each DataFrame
    dframes_dict[k] = v
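Putting the dict approach together, a minimal sketch of the full save loop (the toy DataFrames below stand in for the question's economy, finance and language):
import pandas as pd

# Toy stand-ins for the question's DataFrames
economy = pd.DataFrame({'gdp': [1, 2]})
finance = pd.DataFrame({'rate': [0.1, 0.2]})
language = pd.DataFrame({'speakers': [10, 20]})

dframes_dict = {'economy': economy, 'finance': finance, 'language': language}
for name, df in dframes_dict.items():
    df.to_csv(f"{name}.csv", index=False)
    print(f'file saved: "{name}.csv"')  # matches the expected output above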

If you're doing this in a notebook, you can use a hack: search locals() and use a regex to match 'dframes_list = \[.+\]', which should return a string value like
'dframes_list = [economy, finance, language]'
You can then strip away text until you're left with 'economy, finance, language', at which point you can split and get a list of names.
A Colab version works like this:
import re

temp_local = dict(locals())
data = {}
for k, v in temp_local.items():
    try:
        if re.match(r'dframes_list = \[.+\]', v):
            data[k] = v
            print(k, v)
    except TypeError:  # most locals aren't strings, so re.match raises for them
        pass
then,
names = re.findall(r'\[.+\]', data[key])[0].replace('[', '').replace(']', '').split(', ')
where key has been identified from the data dict.
This isn't recommended, though.
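If you really want to recover names programmatically, a somewhat less fragile hack (still not recommended) is an identity lookup against a namespace you control; name_of below is a hypothetical helper, not a library function:
import pandas as pd

def name_of(obj, namespace):
    # Return every name bound to this exact object in the namespace
    return [name for name, val in namespace.items() if val is obj]

economy = pd.DataFrame({'gdp': [1]})
dframes_list = [economy]
for df in dframes_list:
    for name in name_of(df, dict(globals())):
        if name != 'df':  # skip the loop variable itself
            df.to_csv(f"{name}.csv", index=False)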

Related

Looking for a better way to accomplish dataframe to dictionary by series

Here's a portion of what the Excel file looks like. Meant to include this the first time. Thanks for the help so far.
Name Phone Number Carrier
FirstName LastName1 3410142531 Alltel
FirstName LastName2 2437201754 AT&T
FirstName LastName3 9247224091 Boost Mobile
FirstName LastName4 6548310018 Cricket Wireless
FirstName LastName5 8811620411 Project Fi
I am converting a list of names, phone numbers, and carriers to a dictionary for easy reference by other code. The idea is separate code will be able to call a name and access that person's phone number and carrier.
I got the output I need, but I'm wondering if there's an easier way I could have accomplished this task and gotten the same output. Though it's fairly concise, I'm interested in any module or built-in of which I'm not aware. My Python skills are beginner at best. I wrote this in Thonny with Python 3.6.4. Thanks!
# Imports
import pandas as pd
import math

# Assign spreadsheet filename to `file`
file = 'Phone_Numbers.xlsx'

# Load spreadsheets
xl = pd.ExcelFile(file)

# Load a sheet into a DataFrame by name: df1
df1 = xl.parse('Sheet1', header=0)

# Put the dataframe into a dictionary to start
phone_numbers = df1.to_dict(orient='records')

# Converts Phone_Numbers.xlsx to a dictionary
x = 0
temp_dict = {}
for item in phone_numbers:
    temp_list = []
    for key in phone_numbers[x]:
        tempholder = phone_numbers[x][key]
        # Checks to see if there is a blank and if the phone number comes up as a float
        if (isinstance(tempholder, float) or isinstance(tempholder, int)) and math.isnan(tempholder) == False:
            # Converts any floats to strings for use in later code
            tempholder = str(int(tempholder))
        else:
            pass
        temp_list.append(tempholder)
    # Makes the first item in the list the key and adds the rest as values
    temp_dict[temp_list[0]] = temp_list[1:]
    x += 1
print(temp_dict)
Here's the desired output:
{'FirstName LastName1': ['3410142531', 'Alltel'], 'FirstName LastName2': ['2437201754', 'AT&T'], 'FirstName LastName3': ['9247224091', 'Boost Mobile'], 'FirstName LastName4': ['6548310018', 'Cricket Wireless'], 'FirstName LastName5': ['8811620411', 'Project Fi']}
One way to do it would be to iterate through the dataframe and use a dictionary comprehension:
temp_dict = {row['Name']:[row['Phone Number'], row['Carrier']] for _, row in df.iterrows()}
where df is your original dataframe (the result of xl.parse('Sheet1', header=0)). This iterates through all rows in your dataframe, creating a dictionary key for each Name, with phone number and carrier as its values (in a list), as you indicated in your output.
To make sure that your phone number is not null (as you did in your loop), you could add an if clause to your dict comprehension, such as this:
temp_dict = {row['Name']: [row['Phone Number'], row['Carrier']]
             for _, row in df.iterrows()
             if not math.isnan(row['Phone Number'])}
df.set_index('Name').T.to_dict('list')
should do the job. Here df is your dataframe.
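A minimal sketch of that one-liner on two rows of the sample data (built by hand here instead of being read from the Excel file):
import pandas as pd

df = pd.DataFrame({
    'Name': ['FirstName LastName1', 'FirstName LastName2'],
    'Phone Number': [3410142531, 2437201754],
    'Carrier': ['Alltel', 'AT&T'],
})
print(df.set_index('Name').T.to_dict('list'))
# {'FirstName LastName1': [3410142531, 'Alltel'],
#  'FirstName LastName2': [2437201754, 'AT&T']}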

How to create a hierarchical dictionary from a csv?

I am trying to build a hierarchical dict (please see below the desired output I am looking for) from my csv file.
The following is my code so far. I was searching through itertools, thinking it might be the best tool for this task; I cannot use pandas. I think I need to put the values of the key into a new dictionary, then map the policy interfaces and build a new dict?
import csv
import pprint
from itertools import groupby
new_dict = []
with open("test_.csv", "rb") as file_data:
    reader = csv.DictReader(file_data)
    for keys, grouping in groupby(reader, lambda x: x['groupA_policy']):
        new_dict.append(list(grouping))
pprint.pprint(new_dict)
My csv file looks like this:
GroupA_Host,groupA_policy,groupA_policy_interface,GroupB_Host,GroupB_policy,GroupB_policy_interface
host1,policy10,eth0,host_R,policy90,eth9
host1,policy10,eth0.1,host_R,policy90,eth9.1
host2,policy20,eth2,host_Q,policy80,eth8
host2,policy20,eth2.1,host_Q,policy80,eth8.1
The desired output I want achieve is this:
[{'GroupA_Host': 'host1',
  'GroupB_Host': 'host_R',
  'GroupB_policy': 'policy90',
  'groupA_policy': 'policy10',
  'interfaces': [{'GroupB_policy_interface': 'eth9',
                  'groupA_policy_interface': 'eth0'},
                 {'GroupB_policy_interface': 'eth9.1',
                  'groupA_policy_interface': 'eth0.1'}]},
 {'GroupA_Host': 'host2',
  'GroupB_Host': 'host_Q',
  'GroupB_policy': 'policy80',
  'groupA_policy': 'policy20',
  'interfaces': [{'GroupB_policy_interface': 'eth8',
                  'groupA_policy_interface': 'eth2'},
                 {'GroupB_policy_interface': 'eth8.1',
                  'groupA_policy_interface': 'eth2.1'}]}]
I don't think itertools is necessary here. The important thing is to recognize that you're using ('GroupA_Host', 'GroupB_Host', 'groupA_policy', 'GroupB_policy') as the key for the grouping -- so you can use a dictionary to collect interfaces keyed on that tuple:
d = {}
for row in reader:
    key = row['GroupA_Host'], row['GroupB_Host'], row['groupA_policy'], row['GroupB_policy']
    interface = {'groupA_policy_interface': row['groupA_policy_interface'],
                 'GroupB_policy_interface': row['GroupB_policy_interface']}
    if key in d:
        d[key].append(interface)
    else:
        d[key] = [interface]

as_list = []
for key, interfaces in d.items():
    record = {}
    record['GroupA_Host'] = key[0]
    record['GroupB_Host'] = key[1]
    record['groupA_policy'] = key[2]
    record['GroupB_policy'] = key[3]
    record['interfaces'] = interfaces
    as_list.append(record)
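As a design note, the if/else bookkeeping in the first loop can be collapsed with dict.setdefault:
d.setdefault(key, []).append(interface)
and pretty-printing the result (pprint is already imported in the question's code) should match the desired output, up to dict ordering:
pprint.pprint(as_list)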

Reading a csv file and counting a row depending on another row

I have a csv file where I need to read different columns and sum their numbers up depending on the value in another column of the dataset.
The question is:
How do the flight phases (ex. take off, cruise, landing..) contribute
to fatalities?
I have to sum up column number 23 for each distinct value in column 28.
I have a solution with masks and a lot of if statements:
import pandas as pd

database = pd.read_csv('Aviation.csv', quotechar='"', skipinitialspace=True,
                       delimiter=',', encoding='latin1').fillna(0)
data = database.as_matrix()
TOcounter = 0
for r in data:
    if r[28] == "TAKEOFF":
        TOcounter += r[23]
print(TOcounter)
This example shows the general idea of my solution, where I would have to add an if statement and a counter for every distinct value in column 28.
But I was wondering if there is a smarter solution to the issue.
The raw data can be found at: https://raw.githubusercontent.com/edipetres/Depressed_Year/master/Dataset_Assignment/AviationDataset.csv
It sounds like what you are trying to achieve is
df.groupby('Broad.Phase.of.Flight')['Total.Fatal.Injuries'].sum()
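A minimal sketch of that one-liner end to end (the column names come from the linked dataset; the read_csv options mirror the question's):
import pandas as pd

df = pd.read_csv('Aviation.csv', quotechar='"', skipinitialspace=True,
                 delimiter=',', encoding='latin1')
print(df.groupby('Broad.Phase.of.Flight')['Total.Fatal.Injuries'].sum())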
This is a quick solution that doesn't check for errors, such as whether a string can be converted to float. You should also think about looking the right columns up by their header text instead of relying on column indexes (like 23 and 28),
but this should work:
import csv
import urllib2
import collections

url = 'https://raw.githubusercontent.com/edipetres/Depressed_Year/master/Dataset_Assignment/AviationDataset.csv'
response = urllib2.urlopen(url)
df = csv.reader(response)

d = collections.defaultdict(list)
for i, row in enumerate(df):
    key = row[28]
    if key == "" or i == 0:  # skip blank keys and the header row
        continue
    val = 0 if row[23] == "" else float(row[23])
    d[key].append(val)  # defaultdict(list) makes setdefault unnecessary

d2 = {}
for k, v in d.iteritems():
    d2[k] = sum(v)
for k, v in d2.iteritems():
    print "{}:{}".format(k, v)
Result:
TAXI:110.0
STANDING:193.0
MANEUVERING:6430.0
DESCENT:1225.0
UNKNOWN:919.0
TAKEOFF:5267.0
LANDING:592.0
OTHER:107.0
CRUISE:6737.0
GO-AROUND:783.0
CLIMB:1906.0
APPROACH:4493.0

Python reading Excel spreadsheet, creating multiple lists according to variables and conditions

Hi, there's an Excel spreadsheet showing Product ID and Location columns.
I want to list all the locations of each product ID in sequence with no duplication.
For example:
53424 has Phoenix, Matsuyama, Phoenix, Matsuyama, Phoenix, Matsuyama, Phoenix.
56224 has Samarinda, Boise, Seoul.
etc.
What's the best way to achieve it with Python?
I can only read the cells in the spreadsheet but have no idea how best to proceed.
Thank you.
import xlrd

the_file = xlrd.open_workbook("C:\\excel file.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")
for row_index in range(0, the_sheet.nrows):
    product_id = the_sheet.cell(row_index, 0).value
    location = the_sheet.cell(row_index, 1).value
You can make use of Python's groupby() function to remove the consecutive duplicates, as follows:
from collections import defaultdict
from itertools import groupby
import xlrd

the_file = xlrd.open_workbook(r"excel file.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")

products = defaultdict(list)
for row_index in range(1, the_sheet.nrows):
    products[int(the_sheet.cell(row_index, 0).value)].append(the_sheet.cell(row_index, 1).value)

for product, v in sorted(products.items()):
    print "{} has {}.".format(product, ', '.join(k for k, g in groupby(v)))
This uses a defaultdict(list), so each key in the dictionary holds a product ID and its value is automatically a list of the matching locations. groupby() then reads the values back out and yields only one entry for each run of consecutive identical values. Finally the list this produces is joined together with commas.
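For instance, groupby() collapses only consecutive repeats, which is exactly the behaviour the question's expected output shows:
from itertools import groupby

locs = ['Phoenix', 'Matsuyama', 'Matsuyama', 'Phoenix']
print(', '.join(k for k, g in groupby(locs)))  # Phoenix, Matsuyama, Phoenix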
You should use a dictionary to store the data from Excel and then traverse it by product ID.
So, the following code should help you out:
import xlrd

the_file = xlrd.open_workbook("C:\\excel file.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")
dataset = dict()
for row_index in range(0, the_sheet.nrows):
    product_id = the_sheet.cell(row_index, 0).value
    location = the_sheet.cell(row_index, 1).value
    if product_id in dataset:
        dataset[product_id].append(location)
    else:
        dataset[product_id] = [location]

for product_id in sorted(dataset.keys()):
    print "{0} has {1}.".format(product_id, ", ".join(dataset[product_id]))
The above preserves the order of locations per product_id (in sequence), but note that unlike the groupby() answer it does not collapse consecutive duplicate locations, which the question asked for.

Most elegant way to break CSV columns into separate data structures using Python?

I'm trying to pick up Python. As part of the learning process I'm porting a project I wrote in Java to Python. I'm at a section now where I have a list of CSV headers of the form:
headers = [a, b, c, d, e, .....]
and separate lists of groups that these headers should be broken up into, e.g.:
headers_for_list_a = [b, c, e, ...]
headers_for_list_b = [a, d, k, ...]
. . .
I want to take the CSV data and turn it into dicts based on these groups, e.g.:
list_a = [
    {b: val_1b, c: val_1c, e: val_1e, ...},
    {b: val_2b, c: val_2c, e: val_2e, ...},
    {b: val_3b, c: val_3c, e: val_3e, ...},
    ...
]
where for example, val_1b is the first row of the 'b' column, val_3c is the third row of the 'c' column, etc.
My first "Java instinct" is to do something like:
for row in data:
    for col_num, val in enumerate(row):
        col_name = headers[col_num]
        if col_name in group_a:
            dict_a[col_name] = val
        elif col_name in group_b:
            dict_b[col_name] = val
        ...
    list_a.append(dict_a)
    list_b.append(dict_b)
    ...
However, this method seems inefficient/unwieldy and doesn't possess the elegance that Python programmers are constantly talking about. Is there a more "Zen-like" way I should try, in keeping with the philosophy of Python?
Try Python's csv module, in particular the csv.DictReader class.
import csv

groups = dict(a=headers_for_list_a, b=headers_for_list_b)
lists = dict((name, []) for name in groups)
for row in csv.DictReader(csvfile, fieldnames=headers):
    for name, grp_headers in groups.items():
        lists[name].append(dict((header, row[header]) for header in grp_headers))
Not necessarily the most Pythonic way to achieve the same thing as your code, but this version of your code is somewhat more concise due to the use of generator expressions:
from itertools import izip

for row in data:
    dict_a = dict((col_name, val) for col_name, val in izip(headers, row)
                  if col_name in group_a)
    dict_b = dict((col_name, val) for col_name, val in izip(headers, row)
                  if col_name in group_b)
    list_a.append(dict_a)
    list_b.append(dict_b)
Also, use sets for group_a and group_b instead of lists - the in operator works faster on sets. But Jason Humber is right, DictReader is way more elegant; see the following version:
from csv import DictReader

for row in DictReader(your_file, headers):
    dict_a = dict((k, row[k]) for k in group_a)
    dict_b = dict((k, row[k]) for k in group_b)
    list_a.append(dict_a)
    list_b.append(dict_b)
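As a small follow-up to the note about sets, the conversion is a one-off cost up front (a sketch, assuming the header lists from the question):
group_a = set(headers_for_list_a)  # O(1) membership tests in the loops above
group_b = set(headers_for_list_b)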
