After looking around for about a week, I have been unable to find an answer that I can get to work. I am making an assignment manager for a project for my first year CS class. Everything else works how I'd like it to (no GUI, just text) except that I cannot save data to use each time you reopen it. Basically, I would like to save my classes dictionary:
classes = {period_1:assignment_1, period_2:assignment_2, period_3:assignment_3, period_4:assignment_4, period_5:assignment_5, period_6:assignment_6, period_7:assignment_7}
after the program closes so that I can retain the data stored in the dictionary. However, I cannot get anything I have found to work. Again, this is a beginner CS class, so I don't need anything fancy, just something basic that will work. I am using a school-licensed form of Canopy for the purposes of the class.
L3viathan's post might be a direct answer to this question, but I would suggest the following for your purpose: use pickle.
import pickle

# To save a dictionary to a pickle file:
with open("assignments.p", "wb") as f:
    pickle.dump(classes, f)

# To load from a pickle file:
with open("assignments.p", "rb") as f:
    classes = pickle.load(f)
By this method, the variable would retain its original structure without having to write and convert to different formats manually.
Either use the csv library, or do something simple like:
with open("assignments.csv", "w") as f:
    for key, value in classes.items():
        f.write(key + "," + value + "\n")
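To get the dictionary back on the next run, you can read the same file with the csv module. A minimal round-trip sketch, assuming plain string keys and values that contain no commas (the sample values here are made up):

```python
import csv

# Hypothetical sample data standing in for the real assignments.
classes = {"period_1": "assignment_1", "period_2": "assignment_2"}

# Write each key/value pair as one CSV row.
with open("assignments.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for key, value in classes.items():
        writer.writerow([key, value])

# Read the rows back into a fresh dictionary.
restored = {}
with open("assignments.csv", newline="") as f:
    for key, value in csv.reader(f):
        restored[key] = value

print(restored == classes)  # True
```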
Edit: Since it seems that you can't read or write files in your system, here's an alternative solution (with pickle and base85):
import pickle, base64

def save(something):
    pklobj = pickle.dumps(something)
    print(base64.b85encode(pklobj).decode('utf-8'))

def load():
    pklobj = base64.b85decode(input("> ").encode('utf-8'))
    return pickle.loads(pklobj)
To save something, call save on your object and copy the printed string to your clipboard; you could then save it in a file, for instance.
>>> save(classes) # in my case: {34: ['foo#', 3]}
fCGJT081iWaRDe;1ONa4W^ZpJaRN&NWpge
To load, you call load() and enter the string:
>>> load()
> fCGJT081iWaRDe;1ONa4W^ZpJaRN&NWpge
{34: ['foo#', 3]}
The pickle approach described by @Ébe Isaac and @L3viathan is the way to go. In case you also want to do something else with the data, you might want to consider pandas (but only if you do more than just export the data).
As there are only basic strings in your dictionary (according to your comment below your question), pandas is straightforward to use; if you have more complicated data structures, you should use the pickle approach:
import pandas as pd
classes = {'period_1':'assignment_1', 'period_2':'assignment_2', 'period_3':'assignment_3', 'period_4':'assignment_4', 'period_5':'assignment_5', 'period_6':'assignment_6', 'period_7':'assignment_7'}
pd.DataFrame.from_dict(classes, orient='index').sort_index().rename(columns={0: 'assignments'}).to_csv('my_csv.csv')
That gives you the following output:
assignments
period_1 assignment_1
period_2 assignment_2
period_3 assignment_3
period_4 assignment_4
period_5 assignment_5
period_6 assignment_6
period_7 assignment_7
In detail:
.from_dict(classes, orient='index') creates the actual dataframe using the dictionary as input
.sort_index() sorts the index, which is unsorted because a dictionary was used to create the dataframe
.rename(columns={0: 'assignments'}) assigns a more meaningful name to the column (by default 0 is used)
.to_csv('my_csv.csv') finally exports the dataframe to a csv file
If you want to read in the file again you can do it as follows:
df2 = pd.read_csv('my_csv.csv', index_col=0)
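If you then need the plain dictionary again, the re-read DataFrame converts back with to_dict(). A round-trip sketch, assuming the CSV was written as shown above (with shortened sample data):

```python
import pandas as pd

# dict -> DataFrame -> CSV, as in the answer above.
classes = {'period_1': 'assignment_1', 'period_2': 'assignment_2'}
pd.DataFrame.from_dict(classes, orient='index').rename(
    columns={0: 'assignments'}).to_csv('my_csv.csv')

# CSV -> DataFrame -> dict.
df2 = pd.read_csv('my_csv.csv', index_col=0)
classes_again = df2['assignments'].to_dict()
print(classes_again == classes)  # True
```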
Let's say we have a dictionary
import numpy as np
d={}
d["s0"]=3
d["s1"]=np.int16(3)
d["s2"]=np.array("hello")
d["s3"]=np.array([2])
d["s4"]=np.linspace(0,2, 3)
One way to save this dictionary is to use json, which means serializing the arrays and storing them as lists. In this case there can be loss of precision.
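For completeness, the json route could look like the sketch below, where NumPy scalars and arrays are first converted to plain Python types (the helper name to_jsonable is my own):

```python
import json
import numpy as np

d = {"s0": 3, "s1": np.int16(3), "s2": np.array("hello"),
     "s3": np.array([2]), "s4": np.linspace(0, 2, 3)}

def to_jsonable(v):
    # Arrays become (nested) lists, NumPy scalars become Python scalars.
    if isinstance(v, np.ndarray):
        return v.tolist()
    if isinstance(v, np.generic):
        return v.item()
    return v

text = json.dumps({k: to_jsonable(v) for k, v in d.items()})
restored = json.loads(text)
print(restored["s4"])  # [0.0, 1.0, 2.0]
```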
Another way is to convert this into a pandas DataFrame and save that to hdf:
import pandas as pd
df = pd.DataFrame(d)
store = pd.HDFStore('store.h5')
store["data"] = df
But this failed. I get:
ValueError: arrays must all be same length
Yet a third way is to use deepdish:
import deepdish as dd

dd.io.save("test.h5", d)
The problem with this method was that it wants my keys to be strings, and it drops some of the data without raising an error:
$h5ls test.h5
s3 Dataset {1}
s4 Dataset {3}
Note that "s0", "s1" and "s2" were not saved to the file and no error was reported. So what is the safest way to store a python dictionary to an hdf file?
I don't want to use a pickle dump because it will be hard to read back in Fortran. This question is not a duplicate of this question because it shows how those methods fail to store the needed data.
So this isn't a cutting-edge answer that will impress any uber-geeks, but if you need portability, I'd suggest an INI file. There isn't a language or platform in existence that can't read or write one.
Here's a FORTRAN library for handling them.
https://github.com/szaghi/FiNeR
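On the Python side, the standard library's configparser reads and writes INI files. A minimal sketch (the section name "data" is arbitrary, and INI stores everything as strings, so arrays would need to be formatted and parsed yourself):

```python
import configparser

# INI values must be strings; the array is flattened to a space-separated string.
d = {"s0": "3", "s3": "2", "s4": "0.0 1.0 2.0"}

# Write the dictionary as one INI section.
config = configparser.ConfigParser()
config["data"] = d
with open("store.ini", "w") as f:
    config.write(f)

# Read the section back into a plain dict.
config2 = configparser.ConfigParser()
config2.read("store.ini")
restored = dict(config2["data"])
print(restored == d)  # True
```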
I'm working with one script that dumps a pandas series to a yaml file:
with open('ex.py','w') as f:
yaml.dump(a_series,f)
And then another script that opens the yaml file for the pandas series:
with open('ex.py','r') as f:
yaml.safe_load(a_series,f)
I'm trying to safe_load the series but I get a constructor error. How can I specify that the pandas series is safe to load?
When you use PyYAML's load, you implicitly declare that you trust everything in the YAML document you are loading. That is why you should use yaml.safe_load instead.
In your case this leads to an error, because safe_load doesn't know how to construct pandas internals that have tags in the YAML document like:
!!python/name:pandas.core.indexes.base.Index
and
!!python/tuple
etc.
You would need to provide constructors for all those objects, add them to the SafeLoader, and then do a_series = yaml.safe_load(f).
Doing that can be a lot of work, especially since what looks like a small change to the data used in your series might require you to add constructors.
You could dump the dict representation of your Series and load that back. Of course some information is lost in this process, I am not sure if that is acceptable:
import yaml
from pandas import Series

def series_representer(dumper, data):
    return dumper.represent_mapping(u'!pandas.series', data.to_dict())

yaml.add_representer(Series, series_representer, Dumper=yaml.SafeDumper)

def series_constructor(loader, node):
    d = loader.construct_mapping(node)
    return Series(d)

yaml.add_constructor(u'!pandas.series', series_constructor, Loader=yaml.SafeLoader)

data = Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

with open('ex.yaml', 'w') as f:
    yaml.safe_dump(data, f)

with open('ex.yaml') as f:
    s = yaml.safe_load(f)

print(s)
print(type(s))
which gives:
a 1
b 2
c 3
d 4
e 5
dtype: int64
<class 'pandas.core.series.Series'>
And the ex.yaml file contains:
!pandas.series {a: 1, b: 2, c: 3, d: 4, e: 5}
There are a few things to note:
YAML documents are normally written to files with a .yaml extension. Using .py is bound to get you confused, or have you overwrite some program source files at some point.
yaml.load() and yaml.safe_load() take a stream as their first parameter. You use them like:
data = yaml.safe_load(stream)
and not like:
yaml.safe_load(data, stream)
It would be better to have a two-step constructor, which allows you to construct self-referential data structures. However, Series.append() doesn't seem to work for that:
def series_constructor(loader, node):
    d = Series()
    yield d
    d.append(Series(loader.construct_mapping(node)))
If dumping the Series via a dictionary is not good enough (because it simplifies the series' data), and if you don't care about the readability of the generated YAML, you can use to_pickle() instead of .to_dict(). However, you would then have to work with temporary files, as that method is not flexible enough to handle file-like objects and expects a file name string as its argument.
I am working on a Django project where a user can upload a CSV file that is stored in a database. In most CSV files I have seen, the first row contains the header and the values appear underneath, but in my case the headers are in a column, like this (my CSV data).
I do not understand how to save this type of data to my Django model.
You can transpose your data. I think that is more appropriate for your dataset if you want to do real analysis. Usually things such as id values would be the row index, and names such as company_id, company_name, etc. would be the columns. This will allow you to do further analysis (mean, std, variance, pct_change, group_by) and use pandas at its fullest. That said:
import pandas as pd
df = pd.read_csv('yourcsvfile.csv')
df2 = df.T
Also, as @H.E. Lee pointed out, in order to save your model to your database, you can either use your dataframe's to_sql method to save it in MySQL (e.g. via your connection), use to_json and then import the data if you're using MongoDB, or write your own transformation function for your database.
You can flip it with the built-in csv module quite easily, no need for cumbersome modules like pandas (which in turn requires NumPy...). Since you didn't specify which Python version you're using, and this procedure differs slightly between versions, I'll assume Python 3.x:
import csv
# use open("file.csv", "rb") in Python 2.x
with open("file.csv", "r", newline="") as f:  # open the file for reading
    data = list(map(list, zip(*csv.reader(f))))  # read the CSV and flip it
If you're using Python 2.x you should also use itertools.izip() instead of zip() and you don't have to turn the map() output into a list (it already is).
Also, if the rows are uneven in your CSV you might want to use itertools.zip_longest() (itertools.izip_longest() in Python 2.x) instead.
Either way, this will give you a 2D list data where the first element is your header and the rest of them are the related data. What you plan to do from there depends purely on your DB... If you want to deal with the data only, just skip the first element of data when iterating and you're done.
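If it helps, here's a self-contained sketch of that flip on a small column-oriented CSV (the header names and values are made up), pairing the flipped header with each data column to get one record dict per entry:

```python
import csv
import io

# Made-up column-oriented CSV: each row is "header,value,value,...".
raw = "company_id,1,2\ncompany_name,foo,bar\n"

# Read and flip, exactly as in the answer above (in-memory file for the demo).
data = list(map(list, zip(*csv.reader(io.StringIO(raw)))))

# First element is the header; zip it with each remaining row.
header, rows = data[0], data[1:]
records = [dict(zip(header, row)) for row in rows]
print(records)  # [{'company_id': '1', 'company_name': 'foo'}, {'company_id': '2', 'company_name': 'bar'}]
```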
Given your data it may be best to store each row as a string entry using TextField. That way you can be sure not to lose any structure going forward.
I'm attempting to convert a JSON file to an SQLite or CSV file so that I can manipulate the data with python. Here is where the data is housed: JSON File.
I found a few converters online, but those couldn't handle the quite large JSON file I was working with. I tried using a python module called sqlbiter but again, like the others, was never really able to output or convert the file.
I'm not sure where to go now. If anyone has any recommendations or insights on how to get this data into a database, I'd really appreciate it.
Thanks in advance!
EDIT: I'm not looking for anyone to do it for me, I just need to be pointed in the right direction. Are there other methods I haven't tried that I could learn?
You can utilize pandas module for this data processing task as follows:
First, you need to read the JSON file using with, open and json.load.
Second, you need to change the format of your file a bit by changing the large dictionary that has a main key for every airport into a list of dictionaries instead.
Third, you can now utilize some pandas magic to convert your list of dictionaries into a DataFrame using pd.DataFrame(data=list_of_dicts).
Finally, you can utilize pandas's to_csv function to write your DataFrame as a CSV file into disk.
It would look something like this:
import pandas as pd
import json
with open('./airports.json.txt', 'r') as f:
    j = json.load(f)

l = list(j.values())
df = pd.DataFrame(data=l)
df.to_csv('./airports.csv', index=False)
You need to load your JSON file and parse it so that all the fields are available, or load the contents into a dictionary. Then you can use pyodbc to write those fields to the database, or write them to a CSV (after import csv).
But this is just a general idea. You need to study Python and how to do every step.
For instance for writting to the database you could do something like:
for i in range(0, max_len):
    sql_order = "UPDATE MYTABLE SET MYTABLE.MYFIELD ...."
    cursor1.execute(sql_order)
    cursor1.commit()
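As a concrete sketch of that idea, here is a version using the standard library's sqlite3 instead of pyodbc, with parameterized queries rather than string-built SQL (the table name, field names, and sample data are all invented):

```python
import json
import sqlite3

# Invented sample standing in for the parsed JSON file.
rows = json.loads('[{"name": "ABC", "city": "X"}, {"name": "DEF", "city": "Y"}]')

# In-memory database for the demo; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (name TEXT, city TEXT)")

# Named placeholders pull values straight from each dict.
conn.executemany("INSERT INTO airports VALUES (:name, :city)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM airports").fetchone()[0])  # 2
```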
In the scikit-learn Python library there are many datasets that can be accessed easily with the following commands.
For example, to load the iris dataset:
from sklearn import datasets

iris = datasets.load_iris()
And we can now assign data and target/label variables as follows:
X = iris.data    # assigns feature dataset to X
Y = iris.target  # assigns labels to Y
My question is how to create my own data dictionary using my own data either in csv, xml or any other format into something similar above so data can be called easily and features/labels are easily accessed.
Is this possible? Can someone help me?
By the way I am using the spyder (anaconda) platform by continuum.
Thanks!
I see at least two (easy) solutions to your problem.
First, you can store your data in whichever structure you like.
# Storing in a list
my_list = []
my_list.append(iris.data)
my_list[0] # your data
# Storing in a dictionary
my_dict = {}
my_dict["data"] = iris.data
my_dict["data"] # your data
Or, you can create your own class:
class MyStructure:
    def __init__(self, data, target):
        self.data = data
        self.target = target

my_class = MyStructure(iris.data, iris.target)
my_class.data  # your data
Hope it helps
If ALL you want to do is read data from CSV files and have them organized, I would recommend you simply use either pandas or numpy's genfromtxt function.
mydata = numpy.genfromtxt(filepath, *params)
If the CSV is formatted regularly, you can extract for example the names of each column by specifying:
mydata = numpy.genfromtxt(filepath, unpack=True, names=True, delimiter=',')
then you can access any column's data you want by simply typing its name/header:
mydata['your header']
(Pandas also has a similar convenient way of grabbing data in an organized manner from CSV or similar files.)
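As a self-contained sketch of that genfromtxt route, using an in-memory CSV with made-up column names a and b:

```python
import io
import numpy as np

# Made-up CSV with a header row.
raw = io.StringIO("a,b\n1,2\n3,4")

# names=True turns the header row into field names of a structured array.
mydata = np.genfromtxt(raw, names=True, delimiter=',')
print(mydata['a'])  # [1. 3.]
```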
However if you want to do it the long way and learn:
Simply put, you want to write a class for the data that you are using, complete with its own access, modify, read, and #dosomething functions. Instead of code for this, I think you would benefit more from going in and reading, for example, the iris class, or an introduction to classes from any beginner's guide to object-oriented programming.
To do what you want, for an object MyData, you could have for example
read(#file) function that reads from a given file of some expected format and returns some specified structure. For reading from csv files, you can simply use numpy's loadtxt method.
modify(#some attribute)
etc.