I normally use txt files, but I need to use a CSV. I based this on how I handle txt files, and I am not sure what I am doing wrong. Can anyone help me, please?
Home = "Road"
House = 5
def Save(Home,House):
    Saved=open('Saved.csv', 'a')
    Saved.write(Home+House+"/n")
    Saved.close()
Save(Home,House)
I get this error
File "F:/Pygame/Test12.py", line 74, in Save
Saved.write(Home+House+"/n")
TypeError: cannot concatenate 'str' and 'int' objects
1) That's not a .csv file.
2) In Python, you cannot concatenate integers with strings without converting them first.
3) Doing this: Home+str(House) would be legal, but when you want to read your file back you have to separate the two fields (and you provided no way of separating them).
Here's code that would create a real csv file:
import csv
def Save(Home,House):
    with open('Saved.csv', 'a') as Saved:
        cw = csv.writer(Saved)
        cw.writerow([Home,House])
When you compose your row, you can put in any data you want; the csv module will convert it to a string if needed.
BTW, to read it back, use a csv.reader and iterate through the rows. Since you know the datatype, you can convert the 2nd column to int directly.
with open('Saved.csv', 'r') as Saved:
    cr = csv.reader(Saved)
    for row in cr:
        Home = row[0]
        House = int(row[1])
        # now you have to do something with those variables :)
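For reference, here is a minimal round-trip sketch of the write and read steps above, assuming Python 3. The helper names save_row and load_rows, the temporary directory, and the second sample row are made up for illustration; they are not part of the original code.

```python
import csv
import os
import tempfile

def save_row(path, home, house):
    # newline='' lets the csv module control the line endings (Python 3)
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([home, house])

def load_rows(path):
    with open(path, 'r', newline='') as f:
        # Convert the second column back to int while reading
        return [(row[0], int(row[1])) for row in csv.reader(f)]

# Hypothetical location so the example does not touch a real Saved.csv
path = os.path.join(tempfile.mkdtemp(), 'Saved.csv')
save_row(path, "Road", 5)
save_row(path, "Main St, North", 12)  # the embedded comma gets quoted for us
rows = load_rows(path)
print(rows)
```

The second row is there to show why the csv module is worth using: a value that contains a comma still comes back as a single field.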
You cannot concatenate an integer with a string; use the following to convert the int to a string:
Saved.write(Home + str(House) + "\n")
Python variables have types. You are trying to add the integer 5 (House) to the string 'Road' (Home), and that operation is not defined. To make this work you have to convert the number 5 to the string '5'. As ettanany suggests, use
Saved.write(Home+str(House)+"\n")
Also, note that it's '\n' not '/n'
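If you want to keep building the line by hand, a small sketch of the three fixes together (str() conversion, a separator between the fields, and "\n" with a backslash) could look like this. format_line is a hypothetical helper name, and unlike the csv module it does not quote values that themselves contain commas.

```python
def format_line(*values):
    # str() handles the int, "," separates the fields, "\n" ends the line
    return ",".join(str(v) for v in values) + "\n"

line = format_line("Road", 5)
print(repr(line))
```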
I have three very long CSV files, and I need some advice/help with manipulating the code. Basically, I want the program to be broad/basic enough where I can add any limitations and it'll work.
For example, if I want to set the code to find where column 1==x and column 2 ==y, I want the code to also work if I want column 1!=r and column 2
import csv
file = input('csv files: ').split(',')
filters = input('Enter the filters: ').split(',')
f = open(csv_file,'r')
p=csv.reader(f)
header_eliminator = next(p,[])
I run into issues with the "file" part because if I choose to only use one file rather than the three I want to use now, it won't work. Same goes for the filters. The filters could be like
4==10,5>=4
This means that column 4 of the file(s) would equal 10 and column 5 of the file(s) would be greater than or equal to 4. However, I might also want the filters to look like this:
1==4.333, 5=="6/1/2014 0:00:00", 6<=60.0, 7!=6
So I want to be able to use it for other things! I'm having so much trouble with this, do you have any advice on how to get started? Thanks!
Pandas is excellent for dealing with csv files. I'd recommend installing it: pip install pandas
Then, if you want to read 3 csv files and do checks on the columns, you'll just need to familiarize yourself with indexing in pandas. The only method you need to know for now is .iloc, since it seems you are indexing by the integer position of the columns.
import pandas as pd
files = input('Enter the csv files: ').split(',')
data = []
#keeping a list of the files allows us to input a different number of files
#we use pandas to read in each file into a pandas dataframe which is then stored in an element of the list. The length of the list is the number of files.
for names in files:
    data.append(pd.read_csv(names.strip()))
#You can then perform checks like this to see if the column 2 of all files are equal to 3
#(.all() collapses the per-row comparison into a single True/False per file)
print(all((i.iloc[:, 2] == 3).all() for i in data))
You can write a generator that takes a bunch of filenames and outputs the lines one by one, and feed that into csv.reader. The tricky part is the filter. If you let the filter be a single line of Python code, then you can use eval for that part. As an example:
import csv
#filenames = input('csv files: ').split(',')
#filters = input('Enter the filters: ').split(',')
# todo: for debug
# in this implementation, filters is a single python expression that can
# reference the 'col' variable which is a list of the current columns
filenames = 'a.csv,b.csv,c.csv'
filters = '"a" in col[0] and "2" in col[2]'
# todo: debug generate test files
for name in 'abc':
    with open('{}.csv'.format(name), 'w') as fp:
        fp.write('the header row\n')
        for row in range(3):
            fp.write(','.join('{}{}{}'.format(name, row, col) for col in range(3)) + '\n')
def header_squash(filenames):
    """Iterate multiple files line by line after squashing header line
    and any empty lines.
    """
    for filename in filenames:
        with open(filename) as fp:
            next(fp)
            for line in fp:
                if line.strip():
                    yield line
for col in csv.reader(header_squash(filenames.split(','))):
    # the namespace only controls which names the expression can see;
    # eval is still not safe to run on untrusted input
    if eval(filters, { 'col':col }):
        # passed the filter, do the work
        print(col)
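If you would rather not eval user input, one alternative is to parse filters of the form shown in the question (4==10,5>=4) yourself and map each comparison symbol onto the operator module. The sketch below is only one way to do it and makes several assumptions not stated in the question: column numbers are zero-based, values that look numeric are compared as floats, a text-versus-number mismatch counts as "not equal", and a quoted value does not itself contain a comma. The names OPS, coerce, parse_filters and row_matches are made up for this example.

```python
import csv
import io
import operator
import re

OPS = {'==': operator.eq, '!=': operator.ne, '>=': operator.ge,
       '<=': operator.le, '>': operator.gt, '<': operator.lt}
FILTER_RE = re.compile(r'\s*(\d+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$')

def coerce(text):
    """Return a float when the text looks numeric, else the unquoted string."""
    text = text.strip().strip('"')
    try:
        return float(text)
    except ValueError:
        return text

def parse_filters(spec):
    """Turn '4==10,5>=4' into a list of (column, op_function, value)."""
    parsed = []
    for part in spec.split(','):
        match = FILTER_RE.match(part)
        if not match:
            raise ValueError('bad filter: %r' % part)
        col, op, value = match.groups()
        parsed.append((int(col), OPS[op], coerce(value)))
    return parsed

def row_matches(row, filters):
    for col, op, value in filters:
        cell = coerce(row[col])
        # Only compare like with like; a type mismatch counts as "not equal"
        if type(cell) is not type(value):
            if op is operator.ne:
                continue
            return False
        if not op(cell, value):
            return False
    return True

# Made-up sample data standing in for one of the csv files
sample = io.StringIO('h0,h1,h2\n1,10,4\n2,10,3\n3,7,9\n')
reader = csv.reader(sample)
next(reader, None)  # skip the header row
filters = parse_filters('1==10, 2>=4')
matches = [row for row in reader if row_matches(row, filters)]
print(matches)
```

Because the filter list can have any length and the reader can be fed from any number of files, the same code covers one file or three, and one filter or several.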
My data looks like below
['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
Irrespective of the brackets, quotes and back slashes, I'd like to separate the data by ',' and write to a CSV file like below
Patient,A,G,P,RNA
Specifying delimiter = ',' did not help. The output file then looks like
['Patient, A','G','P','RNA']
all in a single cell. I want to split them into multiple columns. How can I do that?
Edit - Mentioning quotechar='|' split them into different cells but it now looks like
|['Patient, A','G','P','RNA']|
Edit-
out_file_handle = csv.writer(out_file, quotechar='|', lineterminator='\n', delimiter = ",")
data = ''.join(mydict.get(word.lower(), word) for word in re.split('(\W+)', transposed))
data = [data,]
out_file_handle.writerow(data)
transposed:
['Patient, A','G','P','RNA']
data:
['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
And it has multiple rows, the above is one of the rows from the entire data.
You first need to read this data into a Python array, by processing the string as a CSV file in memory:
from io import StringIO  # Python 3 (on Python 2: from StringIO import StringIO)
import csv
data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
clean_data = list(csv.reader( StringIO(data[0]) ))
However the output is still a single string, because it's not even a well-formed CSV! In which case, the best thing might be to filter out all those junk characters?
import re
clean_data = re.sub(r"[\[\]']", "", data[0])
Now clean_data is 'Patient, A, G, P, RNA', which is clean CSV you can write straight to a file.
If what you're trying to do is write data in the form of ['[\'Patient, A\', \'G\', \'P\', \'RNA\']'], where you have a list of these strings, to a file, then it's really a question in two parts.
The first is how to separate the data into the correct format, and the second is how to write it to a file.
If that is the form of your data, for every row, then something like this should work (to get it into the correct format):
data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']', ...]
newData = [[field.strip() for field in entry.replace("\'", "")[1:-1].split(",")] for entry in data]
That will give you data in the following form (the strip() removes the space left after each comma):
[["Patient", "A", "G", "P", "RNA"], ...]
and then you can write it to a file as suggested in the other answers:
with open('new.csv', 'w', newline='') as write_file:  # Python 3; on Python 2 use open('new.csv', 'wb')
    file_writer = csv.writer(write_file)
    for dataEntry in newData:
        file_writer.writerow(dataEntry)
If you don't actually care about using the data in this round, and just want to clean it up, then you can just do entry.replace("\'", "")[1:-1] for each entry in data and then write those strings to a file.
The [1:-1] bits are just to remove the leading and trailing square brackets.
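Another option, sketched here under the assumption that every entry is a valid Python list literal like the one in the question, is to let ast.literal_eval do the parsing instead of stripping characters by hand. The io.StringIO object is only a stand-in for the real output file.

```python
import ast
import csv
import io

data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']']

rows = []
for entry in data:
    # Parse the string back into a real list of strings
    items = ast.literal_eval(entry)
    # Split items that still contain a comma, and trim the stray spaces
    rows.append([part.strip() for item in items for part in item.split(',')])

out = io.StringIO()  # stand-in for open('new.csv', 'w', newline='')
csv.writer(out, lineterminator='\n').writerows(rows)
print(out.getvalue())
```

literal_eval only accepts literals, so it does not execute arbitrary code the way eval would, and it raises an error on malformed entries rather than silently producing odd fields.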
Python has a CSV writer. Start off with
import csv
Then try something like this
with open('new.csv', 'w', newline='') as write_file:  # Python 3; on Python 2 use open('new.csv', 'wb')
    file_writer = csv.writer(write_file)
    for row in data:
        file_writer.writerow(row)
Edit:
You might have to wrangle the data a bit first before writing it, since it looks like it's a string and not actually a list. Try playing around with the split() function
fields = data.split(',')
"""
SAVING DATA INTO CSV FORMAT
* This format is used for many purposes, mainly for deep learning.
* This type of file can be used to view data in MS Excel or any similar
Application
"""
# == Imports ===================================================================
import csv
import sys
# == Initialisation Function ===================================================
def initialise_csvlog(filename, fields):
    """
    Call this initialisation function before using the insertion function
    * This function checks for existing data before anything is written, so
      that an existing log is never overwritten
    * If the file does not exist (or is empty), it creates a new one and
      writes the header row
    * If it already contains data, it exits with a message
    Parameters
    ----------
    filename : String
        Filename along with directory which needs to be created
    fields : List
        Columns that need to be initialised
    """
    try:
        with open(filename,'r') as csvfile:
            existing_fields = next(csv.reader(csvfile), None)
    except IOError:
        # The file does not exist yet
        existing_fields = None
    if existing_fields:
        print("Data Already Exists")
        sys.exit("Please Create a new empty file")
    # Note: a bare "except:" around sys.exit() would catch its SystemExit
    # and then overwrite the existing file
    with open(filename,'w') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(fields)
# == Data Insertion Function ===================================================
def write_data_csv(filename, row_data):
    """
    This function saves the row data into the CSV created
    * The row data must be a list of lists (one inner list per row)
    Parameters
    ----------
    filename : String
        Filename along with directory of the CSV file to append to
    row_data : List
        List of rows, where each row is a list of column values
    """
    with open(filename,'a') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerows(row_data)
if __name__ == '__main__':
    """
    This block is used to test the feature; run the file independently
    NOTE: DATA IN row_data MUST BE A LIST OF LISTS AS SHOWN
    """
    filename = "TestCSV.csv"
    fields = ["sno","Name","Work","Department"]
    #Init
    initialise_csvlog(filename,fields)
    #Add Data
    row_data = [["1","Jhon","Coder","Pythonic"]]
    write_data_csv(filename,row_data)
# == END =======================================================================
Read the module and you can start using CSV and viewing the data in Excel or any similar application (Calc in LibreOffice).
NOTE: Remember that the data must be a list of lists, as shown in the __main__ block (row_data).
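As a variation on the same idea, here is a rough sketch (assuming Python 3) that folds the two steps into one function: it appends rows given as dictionaries and writes the header only when the file is new or empty. append_rows, the temporary path and the second sample record are invented for this example and are not part of the module above.

```python
import csv
import os
import tempfile

def append_rows(filename, fields, rows):
    """Append dict rows; write the header only when the file is new or empty."""
    is_new = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, 'a', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)

# Hypothetical location so the example does not touch a real TestCSV.csv
filename = os.path.join(tempfile.mkdtemp(), 'TestCSV.csv')
fields = ["sno", "Name", "Work", "Department"]
append_rows(filename, fields, [{"sno": "1", "Name": "Jhon", "Work": "Coder", "Department": "Pythonic"}])
append_rows(filename, fields, [{"sno": "2", "Name": "Ann", "Work": "Tester", "Department": "QA"}])

with open(filename, newline='') as csvfile:
    lines = list(csv.reader(csvfile))
print(lines)
```

With this shape there is no separate initialisation call to forget, and calling it repeatedly never duplicates the header.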
I know this question has been asked before, but never with the following caveats:
1. I'm a complete python n00b. Also a JSON noob.
2. The JSON file / string is not the same as those seen in json2csv examples.
3. The CSV file output is supposed to have standard columns.
Due to point number 1, I'm not aware of most terminologies and technologies used for this. So please bear with me.
Point number 2: Here's a single line of the supposed JSON file:
"id":"123456","about":"YESH","can_post":true,"category":"Community","checkins":0,"description":"OLE!","has_added_app":false,"is_community_page":false,"is_published":true,"likes":48,"link":"www.fake.com","name":"Test Name","parking":{"lot":0,"street":0,"valet":0},"talking_about_count":0,"website":"www.fake.com/blog","were_here_count":0^
Weird, I know - it lacks braces and brackets and stuff. Which is why I'm convinced posted solutions won't work.
I'm not sure what the 0^ at the end of the line is, but I see it at the end of every line. I'm assuming the 0 is the value for "were_here_count" while the ^ is a... line terminator? EDIT: Apparently, I can just disregard it.
Of note is that the value of "parking" appears to be yet another array - I'm fine with just displaying it as is (minus the double quotes).
Point number 3: Here's the columns of the supposed CSV file output. This is the complete column set - the JSON file won't always have them all.
ID STRING,
ABOUT STRING,
ATTIRE STRING,
BAND_MEMBERS STRING,
BEST_PAGE STRING,
BIRTHDAY STRING,
BOOKING_AGENT STRING,
CAN_POST STRING,
CATEGORY STRING,
CATEGORY_LIST STRING,
CHECKINS STRING,
COMPANY_OVERVIEW STRING,
COVER STRING,
CONTEXT STRING,
CURRENT_LOCATION STRING,
DESCRIPTION STRING,
DIRECTED_BY STRING,
FOUNDED STRING,
GENERAL_INFO STRING,
GENERAL_MANAGER STRING,
GLOBAL_BRAND_PARENT_PAGE STRING,
HOMETOWN STRING,
HOURS STRING,
IS_PERMANENTLY_CLOSED STRING,
IS_PUBLISHED STRING,
IS_UNCLAIMED STRING,
LIKES STRING,
LINK STRING,
LOCATION STRING,
MISSION STRING,
NAME STRING,
PARKING STRING,
PHONE STRING,
PRESS_CONTACT STRING,
PRICE_RANGE STRING,
PRODUCTS STRING,
RESTAURANT_SERVICES STRING,
RESTAURANT_SPECIALTIES STRING,
TALKING_ABOUT_COUNT STRING,
USERNAME STRING,
WEBSITE STRING,
WERE_HERE_COUNT STRING
Here's my code so far:
import os
num = '1'
inPath = "./fb-data_input/"
outPath = "./fb-data_output/"
#Get list of Files, put them in filenameList array
fileNameList = os.listdir(path)
#Process per file in
for item in fileNameList:
    print("Processing: " + item)
    fb_inputFile = open(inPath + item, "rb").read().split("\n")
    fb_outputFile = open(outPath + "fbdata-IAB-output" + num, "wb")
    num++
    jsonString = fb_inputFile.split("\",\"")
    jsonField = jsonString[0]
    jsonValue = jsonString[1]
    jsonHash[?] = [?,?]
    #Do Code stuff here
Up until the for loop, it just loads the json file names into an array, and then processes it one by one.
Here's my logic for the rest of the code:
1. Split the json string by something. Perhaps the "," so that other commas won't get split.
2. Store it into a hashmap / 2D array (dynamic?)
3. Trim away the JSON fields and the first and/or last double quotes.
4. Add the resulting output to another hashmap, with those set columns, putting in null in a column that the JSON file does not have.
5. And then I output the result to a CSV.
It sounds logical in my head, but I'm pretty sure there's something I missed. And of course, I have a hard time putting it in code.
Can I have some help on this? Thanks.
P.S.
Additional information:
OS: Mac OSX
Target platform OS: Ubuntu of some sort
Here is a full solution, based on your original code:
import os
import json
import traceback
from csv import DictWriter
def get_columns():
    columns = []
    with open("columns.txt") as f:
        columns = [line.split()[0] for line in f if line.strip()]
    return columns
if __name__ == "__main__":
    in_path = "./fb-data_input/"
    out_path = "./fb-data_output/"
    columns = get_columns()
    bad_keys = ("has_added_app", "is_community_page")
    for filename in os.listdir(in_path):
        json_filename = os.path.join(in_path, filename)
        csv_filename = os.path.join(out_path, "%s.csv" % (os.path.basename(filename)))
        # Python 3: "utf-8-sig" writes the UTF-8 BOM, and newline="" is what the csv module expects
        with open(json_filename) as f, open(csv_filename, "w", encoding="utf-8-sig", newline="") as csv_file:
            writer = DictWriter(csv_file, columns)
            writer.writeheader()
            for line_number, line in enumerate(f, start=1):
                try:
                    data = json.loads("{%s}" % (line.strip().strip('^')))
                    # fix parking column
                    if "parking" in data:
                        data['parking'] = ", ".join("%s: %s" % (k, str(v)) for k, v in data['parking'].items())
                    data = {k.upper(): v for k, v in data.items() if k not in bad_keys}
                except Exception as e:
                    traceback.print_exc()
                    data = {columns[0]: "Error on line %s of %s: %s" % (line_number, json_filename, e)}
                writer.writerow(data)
Edited: Full unicode support plus extended error information.
So, first off, your string is valid json if you just add curly braces around it. You can then deserialize it with Python's json library. Set up your csv columns as a dictionary with each of them pointing to whatever you want as a default value (None? ""? your choice). Once you've deserialized the json to a dict, just loop through each key there and fill in the csv_columns dict as appropriate. Then just use Python's csv module to write it out:
import json
import csv
string = '"id":"123456","about":"YESH","can_post":true,"category":"Community","checkins":0,"description":"OLE!","has_added_app":false,"is_community_page":false,"is_published":true,"likes":48,"link":"www.fake.com","name":"Test Name","parking":{"lot":0,"street":0,"valet":0},"talking_about_count":0,"website":"www.fake.com/blog","were_here_count":0^'
string = '{%s}' % string[:-1]
json_dict = json.loads(string)
#make 'parking' a string. I'm assuming that's your only hash.
json_dict['parking'] = json.dumps(json_dict['parking'])
csv_cols_list = ['a','b','c'] #put your actual csv columns here
csv_cols = {col: '' for col in csv_cols_list}
for k, v in json_dict.items():
    if k in csv_cols:
        csv_cols[k] = v
#now just write to csv using Python's csv library
Note: this is a general answer that assumes that your "json" will be valid key/value pairs. Your "parking" key is a special case you'll need to deal with somehow. I left it as is because I don't know what you want with it. I'm also assuming the '^' at the end of your string was a typo.
[EDIT] Changed to account for parking and the '^' at the end. [/EDIT]
Either way, the general idea here is what you want.
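To round off the "write to csv" step, here is one possible sketch using csv.DictWriter, which can fill in missing columns (restval) and skip unexpected keys (extrasaction) on its own. The input line is a shortened version of the sample from the question, the column list is a small subset chosen for illustration, the io.StringIO object stands in for the output file, and upper-casing the keys to match the column names is an assumption about the desired output.

```python
import csv
import io
import json

# Shortened version of the sample line from the question
line = '"id":"123456","about":"YESH","likes":48,"parking":{"lot":0,"street":0,"valet":0},"were_here_count":0^'
# Illustrative subset of the full column list
columns = ['ID', 'ABOUT', 'LIKES', 'PARKING', 'PHONE', 'WERE_HERE_COUNT']

# Drop the trailing '^' and wrap in braces to get valid JSON
record = json.loads('{%s}' % line.rstrip().rstrip('^'))

row = {}
for key, value in record.items():
    # Nested values (such as parking) are kept as a JSON string
    if isinstance(value, (dict, list)):
        value = json.dumps(value)
    row[key.upper()] = value

out = io.StringIO()  # stand-in for the real output file
writer = csv.DictWriter(out, fieldnames=columns, restval='',
                        extrasaction='ignore', lineterminator='\n')
writer.writeheader()
writer.writerow(row)
print(out.getvalue())
```

Every output row then has the same number of columns, whichever keys happen to be present on a given input line.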
The first thing is that your input is not JSON. It's just a delimited string in which each column name and value is quoted.
Here is a solution that would work:
import csv
columns = ['ID', 'ABOUT', ... ]
with open('input_file.txt', 'r') as f, open('output_file.txt', 'w') as o:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(o, delimiter=',')
    writer.writerow(columns)
    for row in reader:
        # row is a list of 'key:value' items; split each one on its first colon
        pairs = (item.split(':', 1) for item in row if ':' in item)
        data = {k.strip('"').upper(): v.strip('"') for k, v in pairs}
        row = [data.get(v, '') for v in columns]
        writer.writerow(row)
In this loop, for each line we read from the input file, a dictionary is created. The key is the first value from the 'foo:bar' pair, and we convert it to upper case.
Next, for each column, we try to fetch a value from this dictionary in the order that the columns are written out. If a value for the column doesn't exist, a blank '' is returned. These values are collected in a list row. This makes sure no matter how many columns are missing, we write an equal number of columns to the output.
I am trying to find the min and max values in a csv file and write them to a text file. Currently my code writes all of the data to the output file, and I am unsure how to pull the data out of the multiple columns and sort it accordingly.
Any guidance would be appreciated, as I don't have a good lead on how to figure this out.
read_file = open("riskfactors.csv", 'r')
def create_file():
    read_file = open("riskfactors.csv", 'r')
    write_file = open("best_and_worst.txt", "w")
    for line_str in read_file:
        read_file.readline()
        print (line_str,file=write_file)
    write_file.close()
    read_file.close()
Assuming your file is a standard .csv file containing only numbers separated by semicolons:
1;5;7;6;
3;8;1;1;
Then it's easiest to use the str.split() command, followed by a type conversion to int.
You could store all values in a list (or quicker: set) and then get the maximum:
valuelist=[]
for line_str in read_file:
    # strip() drops the newline; the trailing ";" leaves an empty last cell
    for cell in line_str.strip().split(";"):
        if cell:
            valuelist.append(int(cell))
print(max(valuelist))
print(min(valuelist))
Warning: If your file contains non-number entries you'd have to filter them out. .csv-files can also have different delimiters.
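To illustrate the warning, here is a hedged sketch that takes the delimiter as a parameter, skips cells that are not numbers, and tracks the minimum and maximum per column rather than over the whole file. column_min_max is a made-up name, and the sample data (including its header row) is invented to show the skipping.

```python
import csv
import io

def column_min_max(lines, delimiter=';'):
    """Return {column_index: (min, max)} over the numeric cells of each column."""
    stats = {}
    for row in csv.reader(lines, delimiter=delimiter):
        for index, cell in enumerate(row):
            try:
                value = float(cell)
            except ValueError:
                continue  # header text, blanks from a trailing delimiter, etc.
            low, high = stats.get(index, (value, value))
            stats[index] = (min(low, value), max(high, value))
    return stats

# Made-up sample with a header row and trailing semicolons
sample = io.StringIO('a;b;c;d;\n1;5;7;6;\n3;8;1;1;\n')
stats = column_min_max(sample)
print(stats)
```

Passing delimiter=',' would handle a comma-separated file with the same function.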
import sys, csv
def risk_key(row):
    # This assumes risk factors are prioritised by key columns 1, 3
    # and that column 1 is numeric while column 3 is textual
    return (int(row[0]), row[2])
l = sorted(csv.reader(sys.stdin), key=risk_key)
# Write out the first and last rows
csv.writer(sys.stdout).writerows([l[0], l[-1]])
Now, I took a shortcut and said the input and output files were sys.stdin and sys.stdout. You'd probably replace these with the file objects you created in your original question. (e.g. read_file and write_file)
However, in my case, I'd probably just run it (if I were using linux) with:
$ ./foo.py <riskfactors.csv >best_and_worst.txt
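If you do swap in file objects, a possible shape is sketched below. It uses min() and max() with a key function instead of sorting everything, under the same assumed ordering (numeric column 1, then textual column 3). best_and_worst and risk_key are hypothetical names, and the io.StringIO objects with made-up rows stand in for read_file and write_file from the question.

```python
import csv
import io

def risk_key(row):
    # Assumed ordering: numeric column 1 first, then textual column 3
    return (int(row[0]), row[2])

def best_and_worst(read_file, write_file):
    rows = list(csv.reader(read_file))
    writer = csv.writer(write_file, lineterminator='\n')
    # Smallest key first, largest key second
    writer.writerows([min(rows, key=risk_key), max(rows, key=risk_key)])

# Made-up rows standing in for riskfactors.csv
read_file = io.StringIO('3,x,beta\n1,y,alpha\n2,z,gamma\n')
write_file = io.StringIO()
best_and_worst(read_file, write_file)
print(write_file.getvalue())
```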