How to store complex csv data in django? - python

I am working on a Django project where users can upload a CSV file that gets stored in the database. In most CSV files I've seen, the first row contains the headers and the values are underneath, but in my case the headers are in the first column, like this (my CSV data):
I don't understand how to save this type of data to my Django model.

You can transpose your data. I think that layout is more appropriate for your dataset if you want to do real analysis. Usually things such as id values would be the row index, and names such as company_id, company_name, etc. would be the columns. This lets you do further analysis (mean, std, var, pct_change, groupby) and use pandas to its fullest. With that in mind:
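import pandas as pd
# header=None: the first row is data, not column names; index_col=0: the first column (your field names) becomes the index
df = pd.read_csv('yourcsvfile.csv', header=None, index_col=0)
df2 = df.T  # transpose: the field names become columns, one row per record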
Also, as H.E. Lee pointed out, to save the data to your database you can use the DataFrame's to_sql method (e.g. for MySQL; it needs a SQLAlchemy engine or a sqlite3 connection), if you're using MongoDB you can use to_json and then import the data, or you can write your own function that transforms the rows and inserts them into your database.
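A minimal sketch of the transpose-then-to_sql route, assuming the file has field names down the first column and one record per column. The sample data, the table name "company" and the in-memory SQLite database are hypothetical stand-ins; for MySQL you would pass a SQLAlchemy engine instead of the sqlite3 connection.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample: field names run down the first column, one record per column.
SAMPLE = """company_id,1,2
company_name,Acme,Globex
city,Paris,Berlin
"""


def column_header_csv_to_frame(f):
    """Read a CSV whose headers are in the first column; return one row per record."""
    # header=None: the first line is data; index_col=0: field names become the index.
    df = pd.read_csv(f, header=None, index_col=0)
    # Transposing turns the field names into columns; drop the leftover integer index.
    return df.T.reset_index(drop=True)


def save_to_sqlite(df, conn, table):
    """Write the frame to `table`, replacing it if it already exists."""
    # to_sql accepts a sqlite3 connection directly; other databases need a SQLAlchemy engine.
    df.to_sql(table, conn, index=False, if_exists="replace")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    frame = column_header_csv_to_frame(io.StringIO(SAMPLE))
    save_to_sqlite(frame, conn, "company")
    print(conn.execute("SELECT * FROM company").fetchall())
```

All values come in as text here because each original column mixes ids and names; convert the columns you need (e.g. with pd.to_numeric) before saving.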

You can flip it with the built-in csv module quite easily; there's no need for heavyweight modules like pandas (which in turn requires NumPy). Since you didn't specify which Python version you're using, and this procedure differs slightly between versions, I'll assume Python 3.x:
import csv
# open("file.csv", "rb") in Python 2.x
with open("file.csv", "r", newline="") as f:  # open the file for reading
    data = list(map(list, zip(*csv.reader(f))))  # read the CSV and flip it
If you're using Python 2.x, you should also use itertools.izip() instead of zip(), and you don't have to turn the map() output into a list (it already is one).
Also, if the rows are uneven in your CSV you might want to use itertools.zip_longest() (itertools.izip_longest() in Python 2.x) instead.
Either way, this will give you a 2D list data where the first element is your header and the rest of them are the related data. What you plan to do from there depends purely on your DB... If you want to deal with the data only, just skip the first element of data when iterating and you're done.
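Building on the flip above, a small sketch that pairs the header (the first flipped row) with each remaining row to get one dict per record. The function name and sample data are hypothetical; the dicts are a convenient shape for whatever database layer you use next.

```python
import csv
import io


def read_column_header_csv(f):
    """Return one dict per record from a CSV whose field names are in the first column."""
    # zip(*rows) turns columns into rows; the first resulting tuple holds the field names.
    flipped = list(zip(*csv.reader(f)))
    if not flipped:
        return []  # empty file
    header, records = flipped[0], flipped[1:]
    return [dict(zip(header, record)) for record in records]


if __name__ == "__main__":
    sample = "company_id,1,2\ncompany_name,Acme,Globex\n"
    for row in read_column_header_csv(io.StringIO(sample)):
        print(row)
```

In Django, assuming a hypothetical model Company whose field names match the headers, the rows could then be saved with Company.objects.bulk_create([Company(**row) for row in rows]).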

Given your data, it may be best to store each row as a string entry using a TextField. That way you can be sure not to lose any structure going forward.
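One way to keep each row's structure intact inside a plain text column is to serialize it as JSON; JSON is an assumption here, not something the answer specifies. The sketch covers only the serialization; the Django side would be a hypothetical field such as raw = models.TextField() holding each string.

```python
import csv
import io
import json


def rows_to_text(f):
    """Serialize each CSV row (a list of cell strings) to a JSON string."""
    return [json.dumps(row) for row in csv.reader(f)]


def text_to_row(text):
    """Recover the original list of cell values from a stored string."""
    return json.loads(text)


if __name__ == "__main__":
    stored = rows_to_text(io.StringIO('company_id,1,2\ncompany_name,"Acme, Inc.",Globex\n'))
    print(stored)
    print(text_to_row(stored[1]))
```

Quoting embedded commas is handled by JSON, so a cell like "Acme, Inc." survives the round trip.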

Related

Converting JSON file to SQLITE or CSV

I'm attempting to convert a JSON file to an SQLite or CSV file so that I can manipulate the data with python. Here is where the data is housed: JSON File.
I found a few converters online, but those couldn't handle the quite large JSON file I was working with. I tried using a python module called sqlbiter but again, like the others, was never really able to output or convert the file.
I'm not sure where to go now; if anyone has recommendations or insights on how to get this data into a database, I'd really appreciate it.
Thanks in advance!
EDIT: I'm not looking for anyone to do it for me, I just need to be pointed in the right direction. Are there other methods I haven't tried that I could learn?
You can use the pandas module for this data-processing task as follows:
First, read the JSON file using a with statement, open() and json.load().
Second, change the structure of the data a bit by turning the large dictionary, which has a key for every airport, into a list of dictionaries.
Third, use pandas to convert the list of dictionaries into a DataFrame with pd.DataFrame(data=list_of_dicts).
Finally, use the DataFrame's to_csv method to write it to disk as a CSV file.
It would look something like this:
import json
import pandas as pd
with open('./airports.json.txt', 'r') as f:
    j = json.load(f)  # one top-level key per airport
l = list(j.values())  # list of per-airport dictionaries
df = pd.DataFrame(data=l)
df.to_csv('./airports.csv', index=False)
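If pandas is more than you need, the same dict-of-dicts to CSV conversion can be sketched with the standard library's csv.DictWriter. The airport codes and field names in the usage below are hypothetical, since the real JSON layout isn't shown; records missing a field get an empty cell.

```python
import csv
import io
import json


def airports_json_to_csv(json_text, out):
    """Flatten a dict-of-dicts JSON document into CSV rows written to `out`."""
    data = json.loads(json_text)
    records = list(data.values())  # one dict per airport
    # Collect every key that appears in any record, keeping first-seen order.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


if __name__ == "__main__":
    sample = '{"KJFK": {"icao": "KJFK", "name": "John F Kennedy Intl"}}'
    buf = io.StringIO()
    airports_json_to_csv(sample, buf)
    print(buf.getvalue())
```

For real files, open the output with open('./airports.csv', 'w', newline='') and pass the JSON file's contents as json_text. For a very large file, json.load still reads everything into memory, as the pandas version does.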
You need to load your JSON file and parse it so that all the fields are available (for example, load the contents into a dictionary); then you could use pyodbc to write those fields to the database, or write them to a CSV file with the csv module.
But this is just a general idea; you'll need to study Python and learn how to do each step.
For instance, to write to the database you could do something like:
for i in range(0, max_len):
    sql_order = "UPDATE MYTABLE SET MYTABLE.MYFIELD ...."
    cursor1.execute(sql_order)
    cursor1.commit()
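Building SQL strings by hand inside the loop works, but passing the values as query parameters lets the driver handle quoting. This sketch swaps pyodbc for Python's built-in sqlite3 (the asker mentioned SQLite); pyodbc uses the same ? placeholder style, so the pattern carries over. The table layout and the name/city fields are hypothetical.

```python
import json
import sqlite3


def load_airports(conn, json_text):
    """Insert one row per airport from a dict-of-dicts JSON document."""
    data = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS airports (code TEXT PRIMARY KEY, name TEXT, city TEXT)"
    )
    # Build parameter tuples instead of formatting values into the SQL string.
    rows = [(code, rec.get("name"), rec.get("city")) for code, rec in data.items()]
    conn.executemany(
        "INSERT OR REPLACE INTO airports (code, name, city) VALUES (?, ?, ?)", rows
    )
    conn.commit()  # one commit for the whole batch
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect("airports.db")
    with open("./airports.json.txt") as f:
        print(load_airports(conn, f.read()), "rows loaded")
```

Committing once after executemany is usually much faster than committing inside the loop.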

make custom spreadsheets with python

I have a pandas data frame with two columns:
years of experience and salary
I want to save a CSV file with these two columns and also have some stats at the top of the file, as in the image:
Is there any option to handle this with pandas or any other library, or do I have to write a script that writes it line by line, adding the commas between fields?
pandas does not support what you want to do here. The problem is that your format is not valid CSV. The RFC for CSV (RFC 4180) states that each record is located on a separate line, implying that a line corresponds to a record, with an optional header line. Your format adds the average and max values, which do not correspond to records.
As I see it, you have three paths from here:
i. Create two separate data frames and write each to its own CSV file (strictly speaking it would be three), one with your records and one with the additional values.
ii. Write your data frame to CSV first, then open that file and insert your additional values at the top.
iii. If your goal is importing into Excel, however, gefero's suggestion is the right hint: try using the XlsxWriter package to write directly to cells in a spreadsheet.
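Option ii can be sketched by writing the summary rows first with the csv module and then the DataFrame's header and values into the same file. The choice of mean/min/max and the column names are assumptions standing in for the stats in the image; as explained above, the result is deliberately not strict CSV.

```python
import csv

import pandas as pd


def write_csv_with_stats(df, path, column):
    """Write summary rows for `column` above a normal CSV dump of `df`."""
    stats = [
        ["mean", df[column].mean()],
        ["min", df[column].min()],
        ["max", df[column].max()],
    ]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(stats)  # the non-record lines go first
        writer.writerow(list(df.columns))  # then the usual header line
        # .values.tolist() converts NumPy scalars to plain Python values
        writer.writerows(df.values.tolist())


if __name__ == "__main__":
    data = pd.DataFrame({"years": [1, 2, 3], "salary": [40000, 50000, 60000]})
    write_csv_with_stats(data, "salaries.csv", "salary")
```

With exactly three stats rows, the file can be read back with nrows=3 for the stats and skiprows=3 for the records.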
You can read the file as two separate parts (stats and CSV data).
Reading the stats:
import pandas
number_of_stats_rows = 3
stats = pandas.read_csv(file_path, nrows=number_of_stats_rows, header=None).fillna('')
Reading the rest of the file:
other_data = pandas.read_csv(file_path, skiprows=number_of_stats_rows).fillna('')
Take a look at XlsxWriter. Perhaps it's what you're looking for.

Reading csv from url and pushing it in DB through pandas

The URL returns CSV-formatted data. I am trying to fetch the data and push it into a database. However, I can't read the data: the loop only prints the header of the file, not the complete CSV data. Is there a better option?
#!/usr/bin/python3
import pandas as pd
data = pd.read_csv("some-url")  # URL not provided due to security restrictions.
for row in data:
    print(row)
Iterating over a DataFrame directly yields its column labels, which is why only the header is printed. Instead, you can iterate through the results of df.to_dict(orient="records"):
import pandas as pd
data = pd.read_csv("some-url")
for row in data.to_dict(orient="records"):
    # On each iteration, `row` is a dict mapping each column name to that row's value.
    # Use this dict to create a record for your DB insert, e.g. as raw SQL or
    # to create an instance for an ORM like SQLAlchemy.
    print(row)  # placeholder for your insert logic
I do a similar thing to pre-format data for SQLAlchemy inserts, although I'm using Pandas to merge data from multiple sources rather than just reading the file.
Side note: there are plenty of other ways to do this without pandas, simply iterating through the lines of the file. However, pandas's intuitive handling of CSVs makes it an attractive shortcut for what you need.
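For the pandas-free route mentioned in the side note, a sketch using urllib.request to download the text and csv.DictReader to get one dict per row, inserted with sqlite3's named placeholders. The table items(name, price) and its columns are hypothetical; adapt them to the real CSV header and your database driver.

```python
import csv
import io
import sqlite3
import urllib.request


def fetch_csv_text(url):
    """Download a CSV document and decode it (assumes the server sends UTF-8)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def insert_csv_rows(conn, csv_text):
    """Insert each CSV record into a hypothetical items(name, price) table."""
    reader = csv.DictReader(io.StringIO(csv_text))  # one dict per data row
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
    # Named placeholders (:name, :price) are filled from each row dict by key.
    cur = conn.executemany(
        "INSERT INTO items (name, price) VALUES (:name, :price)", reader
    )
    conn.commit()
    return cur.rowcount  # total rows inserted by the executemany call


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(insert_csv_rows(conn, fetch_csv_text("some-url")))
```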

How do you create your own data dictionary/structure in python

In the scikit-learn Python library there are many datasets that can be accessed easily. For example, to load the iris dataset:
from sklearn import datasets
iris = datasets.load_iris()
And we can now assign the data and target/label variables as follows:
X = iris.data  # assigns the feature dataset to X
Y = iris.target  # assigns the labels to Y
My question is how to turn my own data, whether in CSV, XML or any other format, into a structure similar to the one above, so that the data can be loaded easily and the features/labels are easy to access.
Is this possible? Can someone help me?
By the way, I am using the Spyder (Anaconda) platform by Continuum.
Thanks!
I see at least two (easy) solutions to your problem.
First, you can store your data in whichever structure you like.
# Storing in a list
my_list = []
my_list.append(iris.data)
my_list[0] # your data
# Storing in a dictionary
my_dict = {}
my_dict["data"] = iris.data
my_dict["data"] # your data
Or, you can create your own class:
class MyStructure:
    def __init__(self, data, target):
        self.data = data
        self.target = target
my_class = MyStructure(iris.data, iris.target)
my_class.data  # your data
Hope it helps
If ALL you want to do is read data from CSV files and have it organized, I'd recommend simply using either pandas or NumPy's genfromtxt function:
mydata = numpy.genfromtxt(filepath, *params)
If the CSV is formatted regularly, you can, for example, pick up the name of each column from the header row by specifying:
mydata = numpy.genfromtxt(filepath, names=True, delimiter=',', dtype=None, encoding='utf-8')
Then you can access any column's data simply by its name/header (note that genfromtxt replaces spaces in header names with underscores by default):
mydata['your_header']
(pandas also offers a similarly convenient way to load data in an organized form from CSV or similar files.)
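If NumPy isn't available, the same access-by-header-name pattern can be sketched with the standard library. The function name and sample columns are hypothetical; unlike genfromtxt with dtype=None, values stay as strings here.

```python
import csv
import io


def read_columns(f):
    """Return {header: [values...]} so each column can be accessed by name."""
    reader = csv.DictReader(f)
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


if __name__ == "__main__":
    sample = "sepal_length,species\n5.1,setosa\n6.2,virginica\n"
    cols = read_columns(io.StringIO(sample))
    print(cols["species"])
```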
However, if you want to do it the long way and learn:
Simply put, you want to write a class for the data you are using, complete with its own methods to access, modify, read, and do whatever else you need. Rather than code for this, I think you would benefit more from reading, for example, the code behind the iris dataset, or an introduction to simple classes in any beginner's guide to object-oriented programming.
To do what you want, for an object MyData you could have, for example:
read(file): a function that reads from a given file of some expected format and returns some specified structure. For reading from CSV files, you can simply use NumPy's loadtxt function.
modify(attribute)
etc.
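A minimal sketch of such a MyData class that mimics the iris.data / iris.target access pattern. It assumes a CSV with a header row, numeric feature columns and one label column whose name you pass in; it uses the csv module rather than loadtxt so that string labels can be kept. All names here are hypothetical.

```python
import csv
import io


class MyData:
    """Container exposing .data, .target and .feature_names, like a scikit-learn dataset."""

    def __init__(self, data, target, feature_names):
        self.data = data
        self.target = target
        self.feature_names = feature_names

    @classmethod
    def read(cls, f, label_column):
        """Read a CSV with a header row; `label_column` names the target column."""
        reader = csv.DictReader(f)
        feature_names = [n for n in reader.fieldnames if n != label_column]
        data, target = [], []
        for row in reader:
            data.append([float(row[n]) for n in feature_names])  # assumes numeric features
            target.append(row[label_column])
        return cls(data, target, feature_names)

    def modify(self, index, values):
        """Replace the feature values of one sample."""
        self.data[index] = list(values)


if __name__ == "__main__":
    sample = "sepal_length,sepal_width,species\n5.1,3.5,setosa\n6.2,2.9,virginica\n"
    ds = MyData.read(io.StringIO(sample), "species")
    X, Y = ds.data, ds.target
    print(X, Y)
```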

Exporting a list to a CSV/space separated and each sublist in its own column

I'm sure there is an easy way to do this, so here goes. I'm trying to export my lists to CSV as columns (basically, that's how another program will be able to use the data I've generated). I have the group called [frames], which contains [frame001], [frame002], [frame003], etc. I would like the generated CSV file to have all the values for [frame001] in the first column, [frame002] in the second column, and so on. I thought that if I saved the file as CSV I could manipulate it in Excel; however, I figure there is a solution I can program to skip that step.
This is the code that I have tried using so far:
import csv
data = [frames]
out = csv.writer(open(filename,"w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
I have also tried:
import csv
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
If there's a way to do this so that all the values are space separated, that would be ideal, but at this point I've been trying this for hours and can't get my head around the right solution.
What you're describing is transposing a two-dimensional array of data. In Python you can do this easily with the zip function, as long as the inner lists all have the same length:
out.writerows(zip(*data))
If they are not all the same length, you can use itertools.zip_longest (itertools.izip_longest in Python 2) to fill the missing fields with a default value (even '').
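A complete Python 3 sketch of the zip_longest approach, writing each inner list as one column and using a space as the delimiter, as the asker preferred. The function name is hypothetical, and frames is assumed to be a list of lists; pass delimiter="," for ordinary CSV.

```python
import csv
import io
from itertools import zip_longest


def write_columns(lists, out, delimiter=" "):
    """Write each inner list as one column; shorter lists are padded with ''."""
    writer = csv.writer(out, delimiter=delimiter)
    # zip_longest(*lists) yields one row per index: (lists[0][i], lists[1][i], ...)
    writer.writerows(zip_longest(*lists, fillvalue=""))


if __name__ == "__main__":
    frames = [[1, 2, 3], [4, 5], [6, 7, 8]]  # hypothetical frame001, frame002, frame003
    with open("frames.txt", "w", newline="") as f:
        write_columns(frames, f)
```

Opening the file with newline="" keeps the csv module in control of line endings.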
