Python CSV parsing and file generation - python

I've got a CSV of over 500 entries and I'm trying to generate redirect files. The formatting of the CSV is:
/contact,/contact-us,
/about,/about-us,
The /contact is the old URL and the /contact-us is the new URL.
The formatting of the desired .htm file is:
url = "/contact"
is_hidden = 0
==
<?php
function onStart(){return Redirect::to("/contact-us");}
?>
==
The filenames for the .htm files are unimportant (they could be 1.htm, 2.htm, etc.).
I haven't really touched Python in several years and I'm not sure if it's the best option, but from what I've been reading, it seems like a solid choice for CSV parsing.
Any help would be greatly appreciated.
Edit:
This is what I have so far:
import csv
with open('redirects.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print('url = "' + row[0] + '"\nis_hidden = 0\n==\n\n<?php\nfunction onStart(){return Redirect::to("' + row[1] + '");}\n?>\n==')
This prints out exactly what I need. I just need to put each entry into a .htm file (auto-incremented filename).
Edit #2:
I got what I was looking for with this code:
import csv
count = 0
with open('redirects.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        count += 1
        count_str = str(count)
        with open('redirects/' + count_str + '.htm', 'w') as file:
            file.write('url = "' + row[0] + '"\nis_hidden = 0\n==\n\n<?php\nfunction onStart(){return Redirect::to("' + row[1] + '");}\n?>\n==')
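For reference, a variant of the Edit #2 loop is sketched below under a few assumptions: Python 3, an output folder named redirects (created if it is missing), and the write_redirects/TEMPLATE names, which are illustrative rather than taken from the post. It writes the same text as Edit #2, uses str.format() instead of concatenation, and skips blank or incomplete lines, which would otherwise raise an IndexError on row[1].

```python
import csv
from pathlib import Path

# Same text the Edit #2 loop writes; {{ and }} are literal braces for str.format().
TEMPLATE = (
    'url = "{old}"\n'
    'is_hidden = 0\n'
    '==\n'
    '\n'
    '<?php\n'
    'function onStart(){{return Redirect::to("{new}");}}\n'
    '?>\n'
    '=='
)


def write_redirects(csv_path, out_dir="redirects"):
    """Write one numbered .htm redirect file per CSV row; return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if it is missing
    count = 0
    with open(csv_path, newline="") as csvfile:
        for row in csv.reader(csvfile):
            # Rows look like ['/contact', '/contact-us', ''] because of the trailing comma;
            # skip blank or incomplete lines instead of raising IndexError.
            if len(row) < 2 or not row[0].strip():
                continue
            count += 1
            page = TEMPLATE.format(old=row[0].strip(), new=row[1].strip())
            (out / "{}.htm".format(count)).write_text(page)
    return count
```

Calling write_redirects('redirects.csv') would produce redirects/1.htm, redirects/2.htm, and so on, one per usable row.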

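If I understand correctly, something like the code below might work.
counter = 0
with open('filename', 'r') as src:
    for text in src:
        # each line looks like "/old,/new," -> split on commas and trim whitespace
        fields = [x.strip() for x in text.split(",")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or incomplete lines
        old_url, new_url = fields[0], fields[1]
        with open(str(counter) + '.htm', 'w') as f:
            # writelines() adds no newlines, so append one to each line
            f.writelines(line + '\n' for line in [
                'url = "' + old_url + '"',
                'is_hidden = 0',
                '==',
                '<?php',
                'function onStart(){return Redirect::to("' + new_url + '");}',
                '?>',
                '=='])
        counter += 1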

Related

Trying to read files named file1,file2,file3 using for loop in Python

I am pretty new to Python and am trying to run a script to edit CSV files. The problem is that I need to split the CSV files into smaller pieces (they are large and I'm getting memory errors) and then run another script to edit the pieces. When I combine these two scripts and run a test, the script reads only the first small file and none of the others.
For example, when I split the main CSV file, the pieces are named big-1.csv, big-2.csv, and so on. When the script picks up the files to edit, only big-1.csv gets edited; the rest do not.
The script is:
import csv
from csv import DictWriter
divisor = 990
outfileno = 1
outfile = None
with open('MOCK_DATA.csv', 'r', newline='') as infile:
    infile_iter = csv.reader(infile, delimiter='\t')
    header = next(infile_iter)
    for index, row in enumerate(infile_iter):
        if index % divisor == 0:
            if outfile:
                outfile.close()
            outfilename = 'big-{}.csv'.format(outfileno)
            outfile = open(outfilename, 'w', newline='')
            outfileno += 1
            writer = csv.writer(outfile, delimiter='\t', quoting=csv.QUOTE_NONE)
            writer.writerow(header)
        writer.writerow(row)
# Don't forget to close the last file
if outfile:
    outfile.close()
#export the data
# with correct quoting, and that you are stuck with what you have.
for i in range(1,2):
    with open("big-" + str(i) + ".csv") as people_file:
        next(people_file)
        corrected_people = []
        for person_line in people_file:
            chomped_person_line = person_line.rstrip()
            person_tokens = chomped_person_line.split(",")
            # check that each field has the expected type
            try:
                corrected_person = {
                    "id": person_tokens[0],
                    "first_name": person_tokens[1],
                    "last_name": "".join(person_tokens[2:-3]),
                    "email": person_tokens[-3],
                    "gender": person_tokens[-2],
                    "ip_address": person_tokens[-1]
                }
                if not corrected_person["ip_address"].startswith(
                        "") and corrected_person["ip_address"] != "n/a":
                    raise ValueError
                corrected_people.append(corrected_person)
            except (IndexError, ValueError):
                # print the ignored lines, so manual correction can be performed later.
                print("Could not parse line: " + chomped_person_line)
    with open("fix-" + str(i) + ".csv", "w") as corrected_people_file:
        writer = DictWriter(
            corrected_people_file,
            fieldnames=[
                "id", "first_name", "last_name", "email", "gender", "ip_address"
            ], delimiter=',')
        writer.writeheader()
        writer.writerows(corrected_people)
I think this may be an issue with reading the smaller files in the for loop. The script runs without any errors. Please help.
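The loop for i in range(1,2): is the likely reason: range() excludes its stop value, so it yields only 1 and only big-1.csv is ever opened. Because the splitting loop increments outfileno after opening each chunk, it finishes one past the last chunk number, so range(1, outfileno) should cover every chunk when both parts run in the same script. If the editing step runs on its own, the chunks could instead be found on disk. The sketch below assumes the big-N.csv naming from the question; chunk_files is a hypothetical helper name.

```python
import glob
import os
import re


def chunk_files(directory="."):
    """Return the big-N.csv paths in directory, sorted by N numerically."""
    def part_number(path):
        match = re.search(r"big-(\d+)\.csv$", os.path.basename(path))
        return int(match.group(1)) if match else -1

    # A plain sorted() would put big-10.csv before big-2.csv, hence the numeric key.
    return sorted(glob.glob(os.path.join(directory, "big-*.csv")), key=part_number)


# range(1, 2) stops before 2, so the original loop only ever sees i == 1.
print(list(range(1, 2)))  # [1]
```

The editing loop could then iterate over enumerate(chunk_files(), start=1), opening each path and using the number for the matching fix-N.csv output.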

Unable to read csv file present in Unix in Python

I want to read a CSV file located on a Unix machine at a path such as /var/lib/Folder/abc.csv.
I am using the code below to read this file, but it doesn't seem to return any rows, so it never enters the for loop.
file_path = "/var/lib/Folder/abc.csv"
with open(file_path, newline='') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        logging.debug(str(datetime.datetime.now()) + " Checking rows...")
        logging.debug(str(datetime.datetime.now()) + " Row(" + str(count) + ") = " + row)
The CSV file looks something like this:
"Account ID","Detail","Description","Date created"
"123456","Customer","Savings","2017/10/24"
I am using Python 2.7
This works when I try it locally. However, I am actually running this from Jenkins, and the file is placed on my Jenkins master server. I copied the file from the code server to Jenkins using:
with ssh_shell.open(path + fileName, "rb") as remote_file:
    with open(path + fileName, "wb") as local_file:
        shutil.copyfileobj(remote_file, local_file)
After this, reading the file doesn't work, i.e. the code never enters that for loop. Any idea why?
You could try the following:
reader = csv.reader(csv_file, delimiter=',', quotechar='"')
Also, check what "reader" actually contains. If the for loop is never entered, the contents were not read correctly, so start debugging from there.
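One way to see what the reader is working with is to look at the file itself before parsing it. The helper below (inspect_file is a hypothetical name) reports whether the path exists, its size, and the first few raw lines; a size of 0 after the Jenkins copy, or unexpected bytes at the start, would point at the copy step rather than at csv.reader. It should run on Python 2.7 as well as 3.

```python
import os


def inspect_file(path, max_lines=3):
    """Collect basic facts about a file to explain why csv.reader yields no rows."""
    info = {"exists": os.path.exists(path), "size": None, "first_lines": []}
    if info["exists"]:
        info["size"] = os.path.getsize(path)  # 0 bytes would mean the copy wrote nothing
        with open(path, "rb") as f:  # binary mode shows the raw bytes, including \r and BOMs
            for i, raw in enumerate(f):
                if i >= max_lines:
                    break
                info["first_lines"].append(raw)
    return info
```

For example, print(inspect_file("/var/lib/Folder/abc.csv")) on the Jenkins master, right after the copy.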
Try this; I changed row to str(row), because concatenating a string with a list (row is a list) raises a TypeError:
file_path = "/var/lib/Folder/abc.csv"
with open(file_path, newline='') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        logging.debug(str(datetime.datetime.now()) + " Checking rows...")
        logging.debug(str(datetime.datetime.now()) + " Row(" + str(count) + ") = " + str(row))
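The snippets above also use count without defining it. A minimal sketch, assuming Python 3, that numbers rows with enumerate() and lets logging do the string formatting (log_rows is a hypothetical name):

```python
import csv
import logging


def log_rows(path):
    """Log every CSV row with a 1-based row number; return the number of rows seen."""
    count = 0
    # newline="" is the Python 3 idiom for csv files; Python 2.7's built-in open()
    # has no newline argument, so there the file would be opened with "rb" instead.
    with open(path, newline="") as csv_file:
        for count, row in enumerate(csv.reader(csv_file), start=1):
            # logging applies %-formatting itself, so the list needs no str() call here.
            logging.debug("Row(%d) = %s", count, row)
    return count
```

If the timestamps matter, logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s") adds them to every record, which replaces the str(datetime.datetime.now()) concatenation.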

How to save multiple xml files in python

I'm attempting to get a series of weather reports from a website. The code below builds the URLs for the XMLs I want. What would be the best way to save the returned XMLs under different names?
with open('file.csv') as csvfile:
    towns_csv = csv.reader(csvfile, dialect='excel')
    for rows in towns_csv:
        x = float(rows[2])
        y = float(rows[1])
        url = ("http://api.met.no/weatherapi/locationforecast/1.9/?")
        lat = "lat="+format(y)
        lon = "lon="+format(x)
        text = url + format(lat) + ";" + format(lon)
I have been saving single XMLs with this code:
response = requests.get(text)
xml_text=response.text
winds= bs4.BeautifulSoup(xml_text, "xml")
f = open('test.xml', "w")
f.write(winds.prettify())
f.close()
The first column of the CSV file has city names on it, I would ideally like to use those names to save each XML file as it is created. I'm sure another for loop would do, I'm just not sure how to create it.
Any help would be great, thanks again stack.
You have done most of the work already. Just use rows[0] as your filename. Assuming rows[0] is 'mumbai', rows[0]+'.xml' gives you 'mumbai.xml' as the filename. You might want to check whether city names contain spaces or other characters that need to be removed.
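A sketch of that clean-up step, assuming that letters, digits, underscores and hyphens are the only characters worth keeping in a file name (city_to_filename is a hypothetical helper):

```python
import re


def city_to_filename(city, extension=".xml"):
    """Turn a city name from rows[0] into a safe file name, e.g. 'New York' -> 'New_York.xml'."""
    # Replace each run of characters other than letters, digits, "_" or "-" with "_".
    name = re.sub(r"[^\w\-]+", "_", city.strip())
    return (name or "unnamed") + extension
```

In the loop below, f = open(rows[0]+'.xml', "w") would become f = open(city_to_filename(rows[0]), "w").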
with open('file.csv') as csvfile:
    towns_csv = csv.reader(csvfile, dialect='excel')
    for rows in towns_csv:
        x = float(rows[2])
        y = float(rows[1])
        url = ("http://api.met.no/weatherapi/locationforecast/1.9/?")
        lat = "lat="+format(y)
        lon = "lon="+format(x)
        text = url + format(lat) + ";" + format(lon)
        response = requests.get(text)
        xml_text=response.text
        winds= bs4.BeautifulSoup(xml_text, "xml")
        f = open(rows[0]+'.xml', "w")
        f.write(winds.prettify())
        f.close()

How to not just add a new first column to csv but alter the header names

I would like to do the following: read a CSV file, add a new first column, rename some of the columns, and then load the records from the CSV file. Ultimately, I would like the first column to be populated with the file name.
I'm fairly new to Python. I've more or less worked out how to change the fieldnames; however, loading the data is a problem because it's looking for the original fieldnames, which no longer match.
Code snippet
import csv
import os
inputFileName = "manifest1.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"
with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.DictReader(inFile)
    fieldnames = ['MapSvcName','ClientHostName', 'Databasetype', 'ID_A', 'KeepExistingData', 'KeepExistingMapCache', 'Name', 'OnPremisePath', 'Resourcestype']
    w = csv.DictWriter(outfile,fieldnames)
    w.writeheader()
    # *** Here is where I start to go wrong
    # copy the rest
    for node, row in enumerate(r,1):
        w.writerow(dict(row))
Error
File "D:\Apps\Python27\ArcGIS10.3\lib\csv.py", line 148, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'Databases [xsi:type]', 'Resources [xsi:type]', 'ID'
I would like some assistance, not just to learn but to truly understand what I need to do.
Cheers and thanks
Peter
Update:
I think I've worked it out.
import csv
import os
inputFileName = "manifest1.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"
with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.reader(inFile)
    w = csv.writer(outfile)
    header = next(r)
    header.insert(0, 'MapSvcName')
    #w.writerow(header)
    # NOTE: next(r) above already consumed the header, so this line skips the
    # first data row; drop it unless the file really has a second header line
    next(r, None) # skip the first row from the reader, the old header
    # write new header
    w.writerow(['MapSvcName','ClientHostName', 'Databasetype', 'ID_A', 'KeepExistingData', 'KeepExistingMapCache', 'Name', 'OnPremisePath', 'Resourcestype'])
    prevRow = next(r)
    prevRow.insert(0, '0')
    w.writerow(prevRow)
    for row in r:
        if prevRow[-1] == row[-1]:
            val = '0'
        else:
            val = prevRow[-1]
        row.insert(0,val)
        prevRow = row
        w.writerow(row)
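The original DictWriter error has a specific cause: every dict passed to writerow() may only contain keys listed in fieldnames, while the rows coming out of DictReader still use the old header names. Renaming each row's keys before writing avoids it. The sketch below assumes Python 3 file modes, and the old-to-new name mapping is inferred from the error message and the new fieldnames list, so it may not match the real manifest; add_column_and_rename and RENAMES are hypothetical names.

```python
import csv

# Assumed mapping from the original header names (taken from the ValueError
# message) to the new names; adjust it to match the real manifest.
RENAMES = {
    "Databases [xsi:type]": "Databasetype",
    "ID": "ID_A",
    "Resources [xsi:type]": "Resourcestype",
}


def add_column_and_rename(in_file, out_file, source_name, new_column="MapSvcName"):
    """Copy CSV records, renaming columns and prepending a column set to source_name."""
    reader = csv.DictReader(in_file)
    # New header: the added column first, then the old names passed through RENAMES.
    renamed = [RENAMES.get(name, name) for name in reader.fieldnames]
    writer = csv.DictWriter(out_file, fieldnames=[new_column] + renamed)
    writer.writeheader()
    for row in reader:
        # Rebuild each record under the new keys so DictWriter accepts it.
        new_row = {RENAMES.get(key, key): value for key, value in row.items()}
        new_row[new_column] = source_name
        writer.writerow(new_row)
```

For example, opening inputFileName with newline='' and outputFileName with 'w' and newline='', then calling add_column_and_rename(inFile, outfile, inputFileName), fills MapSvcName with the input file name. Passing extrasaction='ignore' to DictWriter would also silence the error, but it drops the unknown columns rather than renaming them.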

How do I iterate through 2 CSV files and get data from one and add to the other?

I'm trying to iterate over a CSV file that has a 'master list' of names, and compare it to another CSV file that contains only the names of people who were present and made phone calls.
I'm trying to iterate over the master list, compare each name to the names in the other CSV file, take the number of calls made by that person, and write a new CSV file containing the number of calls. If the name isn't found, or if the count is 0, that column needs to contain 0.
I'm not sure if it's something incredibly simple I'm overlooking, or if I'm going about this entirely the wrong way.
Edited for formatting.
import csv
import sys
masterlst = open('masterlist.csv')
comparelst = open(sys.argv[1])
masterrdr = csv.DictReader(masterlst, dialect='excel')
comparerdr = csv.DictReader(comparelst, dialect='excel')
headers = comparerdr.fieldnames
with open('callcounts.csv', 'w') as outfile:
    wrtr = csv.DictWriter(outfile, fieldnames=headers, dialect='excel', quoting=csv.QUOTE_MINIMAL, delimiter=',', escapechar='\n')
    wrtr.writerow(dict((fn,fn) for fn in headers))
    for lines in masterrdr:
        for row in comparerdr:
            if lines['Names'] == row['Names']:
                print(lines['Names'] + ' has ' + row['Calls'] + ' calls')
                wrtr.writerow(row)
            elif lines['Names'] != row['Names']:
                row['Calls'] = ('%s' % 0)
                wrtr.writerow(row)
                print(row['Names'] + ' had 0 calls')
masterlst.close()
comparelst.close()
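The nested loop in the question only compares the first master-list name properly: the inner for loop reads comparerdr to the end, and on the next pass of the outer loop there is nothing left to iterate. A small demonstration with made-up data:

```python
import csv
import io

# A DictReader is a one-pass iterator over its file, like comparerdr above.
reader = csv.DictReader(io.StringIO("Names,Calls\nAnn,3\nBob,0\n"))
first_pass = [row["Names"] for row in reader]
second_pass = [row["Names"] for row in reader]  # nothing left to read
print(first_pass)   # ['Ann', 'Bob']
print(second_pass)  # []
```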
Here's how I'd do it, assuming the file sizes do not prove to be problematic:
import csv
import sys
with open(sys.argv[1]) as comparelst:
    comparerdr = csv.DictReader(comparelst, dialect='excel')
    headers = comparerdr.fieldnames
    names_and_counts = {}
    for line in comparerdr:
        names_and_counts[line['Names']] = line['Calls']
        # or, if you're sure you only want the ones with 0 calls, just use a set and only add the line['Names'] values for which line['Calls'] == '0'
with open('masterlist.csv') as masterlst:
    masterrdr = csv.DictReader(masterlst, dialect='excel')
    with open('callcounts.csv', 'w') as outfile:
        wrtr = csv.DictWriter(outfile, fieldnames=headers, dialect='excel', quoting=csv.QUOTE_MINIMAL, delimiter=',', escapechar='\n')
        wrtr.writerow(dict((fn,fn) for fn in headers))
        # or if you're on 2.7, wrtr.writeheader()
        for line in masterrdr:
            if names_and_counts.get(line['Names']) == '0':
                row = {'Names': line['Names'], 'Calls': '0'}
                wrtr.writerow(row)
That writes just the rows with 0 calls, which is what your text description said - you could tweak it if you wanted to write something else for non-0 calls.
Thanks everyone for the help. I was able to nest another with statement inside of my outer loop, and add a variable to test whether or not the name from the master list was found in the compare list. This is my final working code.
import csv
import sys
masterlst = open('masterlist.csv')
comparelst = open(sys.argv[1])
masterrdr = csv.DictReader(masterlst, dialect='excel')
comparerdr = csv.DictReader(comparelst, dialect='excel')
headers = comparerdr.fieldnames
with open('callcounts.csv', 'w') as outfile:
    wrtr = csv.DictWriter(outfile, fieldnames=headers, dialect='excel', quoting=csv.QUOTE_MINIMAL, delimiter=',', escapechar='\n')
    wrtr.writerow(dict((fn,fn) for fn in headers))
    for line in masterrdr:
        found = False
        with open(sys.argv[1]) as loopfile:
            looprdr = csv.DictReader(loopfile, dialect='excel')
            for row in looprdr:
                if row['Names'] == line['Names']:
                    line['Calls'] = row['Calls']
                    wrtr.writerow(line)
                    found = True
                    break
        if found == False:
            line['Calls'] = '0'
            wrtr.writerow(line)
masterlst.close()
comparelst.close()
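The final version re-opens and scans the compare file once per master-list name, which is fine for small files. A sketch of the same idea with a single pass over each file, along the lines of the dictionary answer above, assuming Names and Calls columns as in the question and an output of just those two columns (write_call_counts is a hypothetical name):

```python
import csv


def write_call_counts(master_file, compare_file, out_file):
    """Write Names,Calls for every master-list name, using 0 when a name has no calls."""
    # Read the compare list once into a name -> calls lookup.
    calls_by_name = {row["Names"]: row["Calls"] for row in csv.DictReader(compare_file)}
    writer = csv.DictWriter(out_file, fieldnames=["Names", "Calls"])
    writer.writeheader()
    for row in csv.DictReader(master_file):
        name = row["Names"]
        # Missing names and empty counts both become "0".
        writer.writerow({"Names": name, "Calls": calls_by_name.get(name) or "0"})
```

It would be called with the three files already open, e.g. masterlist.csv, the file named by sys.argv[1], and callcounts.csv opened for writing, all with newline='' on Python 3.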
