I'm attempting to get a series of weather reports from a website. I have the code below, which creates the URLs for the XML files I want. What would be the best way to save each returned XML file under a different name?
import csv
with open('file.csv') as csvfile:
    towns_csv = csv.reader(csvfile, dialect='excel')
    for rows in towns_csv:
        x = float(rows[2])
        y = float(rows[1])
        url = ("http://api.met.no/weatherapi/locationforecast/1.9/?")
        lat = "lat="+format(y)
        lon = "lon="+format(x)
        text = url + format(lat) + ";" + format(lon)
I have been saving single XML files with this code:
import bs4
import requests
response = requests.get(text)
xml_text = response.text
winds = bs4.BeautifulSoup(xml_text, "xml")
f = open('test.xml', "w")
f.write(winds.prettify())
f.close()
The first column of the CSV file holds the city names, and ideally I would like to use those names to save each XML file as it is created. I'm sure another for loop would do it; I'm just not sure how to write it.
Any help would be great, thanks again stack.
You have done most of the work already. Just use rows[0] as your filename. Assuming rows[0] is 'mumbai', then rows[0]+'.xml' will give you 'mumbai.xml' as the filename. You might want to check whether the city names contain spaces or other characters that should be removed.
import csv
import bs4
import requests
with open('file.csv') as csvfile:
    towns_csv = csv.reader(csvfile, dialect='excel')
    for rows in towns_csv:
        x = float(rows[2])
        y = float(rows[1])
        url = ("http://api.met.no/weatherapi/locationforecast/1.9/?")
        lat = "lat="+format(y)
        lon = "lon="+format(x)
        text = url + format(lat) + ";" + format(lon)
        response = requests.get(text)
        xml_text = response.text
        winds = bs4.BeautifulSoup(xml_text, "xml")
        # name each file after the city in the first column, e.g. 'mumbai.xml'
        with open(rows[0]+'.xml', "w") as f:
            f.write(winds.prettify())
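City names taken straight from the CSV may contain spaces, slashes or other characters that are awkward or illegal in file names. A minimal sketch of a sanitizing step, assuming you only want letters, digits, hyphens and underscores in the name (`safe_filename` is a hypothetical helper, not part of any library):

```python
import re

def safe_filename(city, ext=".xml"):
    """Turn a city name from the CSV into a conservative file name.

    Hypothetical helper: strips surrounding whitespace, replaces runs of
    whitespace with underscores, and drops anything that is not a letter,
    digit, hyphen or underscore.
    """
    name = re.sub(r"\s+", "_", city.strip())
    name = re.sub(r"[^\w-]", "", name)
    # fall back to a placeholder if nothing usable is left
    return (name or "unnamed") + ext

print(safe_filename("New York"))
print(safe_filename(" Oslo/Bergen "))
```

In the loop above, `open(rows[0]+'.xml', "w")` would then become `open(safe_filename(rows[0]), "w")`.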
I've never really used the csv module before, but I've managed to scrape a bunch of information from websites and now want to put it in a CSV file. The issue I'm having is that each of my list values is being split into single characters separated by commas (i.e. Jane Doe becomes J,a,n,e, ,D,o,e).
Also, I have three lists (one with names, one with emails, and one with titles), and I would like to add each as its own column in the CSV file (so col1 = Name, col2 = title, col3 = email).
Any thoughts on how to execute this? Thanks.
from bs4 import BeautifulSoup
import requests
import csv
urls = ''
with open('websites.txt', 'r') as f:
    for line in f.read():
        urls += line
urls = list(urls.split())
name_lst = []
position_lst = []
email_lst = []
for url in urls:
    print(f'CURRENTLY PARSING: {url}')
    print()
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    try:
        for information in soup.find_all('tr', class_='sidearm-staff-member'):
            names = information.find("th", attrs={'headers': "col-fullname"}).text.strip()
            positions = information.find("td", attrs={'headers': "col-staff_title"}).text.strip()
            emails = information.find("td", attrs={'headers': "col-staff_email"}).script
            target = emails.text.split('var firstHalf = "')[1]
            fh = target.split('";')[0]
            lh = target.split('var secondHalf = "')[1].split('";')[0]
            emails = fh + '#' + lh
            name_lst.append(names)
            position_lst.append(positions)
            email_lst.append(emails)
    except Exception as e:
        pass
with open('test.csv', 'w') as csv_file:
    csv_writer = csv.writer(csv_file)
    for line in name_lst:
        csv_writer.writerow(line)
    for line in position_lst:
        csv_writer.writerow(line)
    for line in email_lst:
        csv_writer.writerow(line)
Writing your data column by column is easy: write one row per list index, where each row contains the elements of the three lists at that index. Here is the code:
with open('test.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    # zip() pairs up the i-th name, position and email into one row
    for name, position, email in zip(name_lst, position_lst, email_lst):
        csv_writer.writerow([name, position, email])
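The J,a,n,e effect comes from the fact that `csv.writer.writerow()` expects a sequence of fields, and a Python string is itself a sequence of characters. The sketch below reproduces the problem and the zip-based fix in memory with `io.StringIO`; the names, titles and emails are made-up sample values, and `write_rows` is a hypothetical helper:

```python
import csv
import io

def write_rows(rows):
    """Write an iterable of rows with csv.writer and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

# Made-up sample data standing in for name_lst, position_lst and email_lst.
names = ["Jane Doe", "John Roe"]
titles = ["Coach", "Director"]
emails = ["jane@example.com", "john@example.com"]

# Wrong: a bare string is treated as a row whose fields are its characters.
print(write_rows([names[0]]))
# Right: zip() pairs the i-th name, title and email into one row.
print(write_rows(zip(names, titles, emails)))
```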
Assuming that name_lst, position_lst and email_lst are all correct and of the same size, your problem is in the last part of your code, where you write them to a CSV file.
Here is a way to do this:
fieldnames = ['Name', 'Position', 'Email']
with open('Data_to_Csv.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(len(name_lst)):
        writer.writerow({'Name': name_lst[i], 'Position': position_lst[i], 'Email': email_lst[i]})
This would, of course, fail if the lists have unequal lengths. You need to make sure you add placeholder values for entries that are not available, so that all three lists have the same number of values.
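Instead of padding the lists by hand, `itertools.zip_longest` from the standard library can fill the gaps. A minimal sketch, assuming an empty string is an acceptable placeholder (the fill value, the helper name `to_csv` and the sample data are illustrative):

```python
import csv
import io
from itertools import zip_longest

def to_csv(name_lst, position_lst, email_lst, fill=""):
    """Write the three lists as columns, padding shorter lists with `fill`."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["Name", "Position", "Email"])
    writer.writeheader()
    for name, position, email in zip_longest(
            name_lst, position_lst, email_lst, fillvalue=fill):
        writer.writerow({"Name": name, "Position": position, "Email": email})
    return buf.getvalue()

# Made-up data: the email list is one entry short.
print(to_csv(["Jane Doe", "John Roe"], ["Coach", "Director"], ["jane@example.com"]))
```

Note that `zip_longest` only pads at the end. If a value is missing in the middle of a list, the rows are already misaligned, so appending a placeholder at scrape time, as suggested above, is the more reliable fix.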
I am very new to Python.
I have a list of stock names in a CSV file. I extract the names and append each one to a website domain to create URLs. I am trying to write the URLs I created into another CSV file, but it only writes the last one in the list. I want it to write all of the URLs into the CSV.
with open('names.csv', 'r') as datafile:
    for line in datafile:
        domain = f'https://ceo.ca/{line}'
        urls_link = (domain.strip())
        print(urls_link)
        y = open("url.csv","w")
        y.writelines(urls_link)
        y.close()
names.csv: https://i.stack.imgur.com/WrrLw.png
url.csv: https://i.stack.imgur.com/BYEgN.png
I want url.csv to look like this:
https://i.stack.imgur.com/y4xre.png
I apologise if I worded some things horribly.
You can use the csv module in Python.
Try using this code:
from csv import writer, reader
in_FILE = "names.csv"
out_FILE = 'url.csv'
urls = list()
with open(in_FILE, 'r', newline='') as infile:
    read = reader(infile, delimiter=",")
    for domain_row in read:
        for domain in domain_row:
            url = f'https://ceo.ca/{domain.strip()}'
            urls.append(url)
# open the output once, after collecting every URL, and write one URL per row
with open(out_FILE, 'w', newline='') as outfile:
    write = writer(outfile)
    for url in urls:
        write.writerow([url])
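The key point is to open url.csv once and write each URL as its own row: opening a file in "w" mode truncates it, so reopening it for every name keeps only the last one. A sketch of a single-pass variant that streams from input to output without an intermediate list, assuming names.csv has one name in the first column of each row (`names_to_urls` is a hypothetical helper):

```python
import csv

BASE_URL = "https://ceo.ca/"  # base URL taken from the question

def names_to_urls(in_path, out_path, base=BASE_URL):
    """Read one name per row from in_path and write one URL per row to out_path.

    Hypothetical helper: the output file is opened once, so every URL is kept.
    Returns the list of URLs written.
    """
    urls = []
    with open(in_path, newline="") as infile, \
         open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if not row or not row[0].strip():
                continue  # skip blank lines
            url = base + row[0].strip()
            writer.writerow([url])
            urls.append(url)
    return urls
```

Usage: `names_to_urls("names.csv", "url.csv")`.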
I've got a CSV of over 500 entries and I'm trying to generate redirect files. The formatting of the CSV is:
/contact,/contact-us,
/about,/about-us,
The /contact is the old URL and the /contact-us is the new URL.
The formatting of the desired .htm file is:
url = "/contact"
is_hidden = 0
==
<?php
function onStart(){return Redirect::to("/contact-us");}
?>
==
The filenames of the .htm files are unimportant (they could be 1.htm, 2.htm, etc.).
I haven't really touched Python in several years and I'm not sure if it's the best option, but from what I've been reading, it seems like it's a solid choice for CSV parsing.
Any help would be greatly appreciated.
Edit:
This is what I have so far:
import csv
with open('redirects.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print('url = "'+row[0]+'"\nis_hidden = 0\n==\n\n<?php\nfunction onStart(){return Redirect::to("'+row[1]+'");}\n?>\n==')
This prints out exactly what I need. I just need to put each entry into a .htm file (auto-incremented filename).
Edit #2:
I got what I was looking for with this code:
import csv
with open('redirects.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # enumerate() supplies the auto-incremented file number: 1.htm, 2.htm, ...
    for count, row in enumerate(reader, start=1):
        with open('redirects/' + str(count) + '.htm', 'w') as file:
            file.write('url = "' + row[0] + '"\nis_hidden = 0\n==\n\n<?php\nfunction onStart(){return Redirect::to("' + row[1] + '");}\n?>\n==')
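A slightly more defensive variant of the asker's working code is sketched below. It keeps the template in one place, uses `enumerate()` instead of a manual counter, creates the output folder if it is missing, and skips blank or incomplete rows. The function names and the use of `pathlib` are illustrative assumptions, not part of the question:

```python
import csv
from pathlib import Path

# Same .htm layout the asker's working code produces
# (doubled braces are literal braces in str.format).
TEMPLATE = (
    'url = "{old}"\n'
    'is_hidden = 0\n'
    '==\n'
    '\n'
    '<?php\n'
    'function onStart(){{return Redirect::to("{new}");}}\n'
    '?>\n'
    '=='
)

def render_redirect(old, new):
    """Return the .htm body redirecting `old` to `new`."""
    return TEMPLATE.format(old=old, new=new)

def write_redirects(csv_path, out_dir="redirects"):
    """Write one numbered .htm file per CSV row; return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="") as csvfile:
        for row in csv.reader(csvfile):
            if len(row) < 2 or not row[0].strip():
                continue  # skip blank or incomplete lines
            count += 1
            (out / f"{count}.htm").write_text(
                render_redirect(row[0].strip(), row[1].strip()))
    return count
```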
If I understand correctly, something like the following might work.
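import csv
with open('filename', newline='') as csvfile:
    # each row is: old URL, new URL (the trailing comma adds an empty third field)
    rows = [row for row in csv.reader(csvfile) if len(row) >= 2]
for counter, row in enumerate(rows):
    old_url, new_url = row[0].strip(), row[1].strip()
    with open(str(counter) + '.htm', 'w') as f:
        # writelines() adds no line breaks, so append '\n' to each line
        f.writelines(line + '\n' for line in [
            'url = "' + old_url + '"',
            'is_hidden = 0',
            '==',
            '<?php',
            'function onStart(){return Redirect::to("' + new_url + '");}',
            '?>',
            '=='])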
I need to take information from a list and add a 'year' column derived from the file name. I'm still not sure how to add the 'year' field to each record. Can I use append?
As for the output file, I just need to use outputcsv.writerow(records), don't I?
This is the part of the code where I'm stuck:
filenames = ('babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv')
outFile = open('babyQldAll.csv','w')
csvFile_out = csv.writer(outFile, delimiter=',')
for filename in filenames:
    name, ext = filename.split('.')
    year = name[-4:] #extract year from file names
    records = extract_names(filename)
    # Get (name, count, gender) from list "records",
    # and add value of "year" and write into output file (using "for" loop )
The output file should look like:
2010,Lola,69,Girl
For input, I have 5 files, babyQld2010.csv, babyQld2011.csv, babyQld2012.csv, babyQld2012.csv and babyQld2014.csv, which contain rows like:
Mia,425,William,493
I have to convert these into the output format, which I have already done, saving the result in the list 'records':
Lola,69,Girl
Now I need to add a 'year' field to each entry in 'records' and export everything to a CSV file.
This is my full code:
import csv
def extract_names(filename):
    ''' Extract babyname, count, gender from a csv file,
    and return the data in a list.
    '''
    inFile = open(filename, 'r', newline='')  # 'rU' mode was removed in Python 3.11
    csvFile = csv.reader(inFile, delimiter=',')
    # Initialization
    records = []
    rowNum = 0
    for row in csvFile:
        if rowNum != 0:
            # +++++ You code here ++++
            # Read each row of csv file and save information in list 'records'
            # as (name, count, gender)
            records.append([row[0], row[1], "Female"])
            records.append([row[2], row[3], "Male"])
            print('Process each row...')
        rowNum += 1
    inFile.close()
    return(records)
#### Start main program #####
filenames = ('babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv')
with open('babyQldAll.csv', 'w', newline='') as outFile:
    csvFile_out = csv.writer(outFile, delimiter=',')
    for filename in filenames:
        name, ext = filename.split('.')
        year = name.split('.')[0][-4:] #extract year from file names
        records = extract_names(filename)
        for record in records:
            # csv.writer has writerow(), not write()
            csvFile_out.writerow([year] + record)
        print("Write in csv file...")
To get the year from the CSV file name, split the string at '.' and take the last four characters of the first part. Example:
>>> s = 'babyQld2010.csv'
>>> s.split('.')[0][-4:]
'2010'
Then iterate over your list of records (which you say is correct); for each list within it, use list concatenation to create a new list with the year at the start, and write that to the CSV file.
I would also suggest that you use the with statement to open the file you write to (and also in the function where you read the other CSV files). Example:
filenames = ('babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv')
with open('babyQldAll.csv', 'w', newline='') as outFile:
    csvFile_out = csv.writer(outFile, delimiter=',')
    for filename in filenames:
        name, ext = filename.split('.')
        year = name.split('.')[0][-4:] #extract year from file names
        records = extract_names(filename)
        for record in records:
            csvFile_out.writerow([year] + record)
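On the two sub-questions in the post: `record.append(year)` would work but puts the year in the last column, whereas the desired output has it first, which `[year] + record` (or `record.insert(0, year)`) gives you. And `writerow(records)` writes the whole list as a single row, whereas `writerows()` writes one row per record. A small in-memory illustration with made-up records (`with_year` is a hypothetical helper):

```python
import csv
import io

def with_year(records, year):
    """Return new rows with `year` prepended, leaving `records` unchanged."""
    return [[year] + record for record in records]

records = [["Lola", "69", "Girl"], ["Mia", "425", "Girl"]]  # made-up rows

# writerows(): one CSV line per record.
good = io.StringIO()
csv.writer(good).writerows(with_year(records, "2010"))
print(good.getvalue())

# writerow(records): the whole list becomes ONE line whose fields are the
# string forms of the inner lists, which is usually not what you want.
bad = io.StringIO()
csv.writer(bad).writerow(records)
print(bad.getvalue())
```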
Yes, you can just append the year column to each row as you read it in from your source files. You can read in and write out each row as a dictionary, so that you can use your existing column headers to address the data if you need to massage it on the way through.
When you create a csv.DictWriter, you specify your headers (fieldnames); you can then write them out with its writeheader() method.
import csv
file_list = ['babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv']
# Python 3's csv module needs text mode with newline='', not 'wb'/'rb'
outFile = open('babyQldAll.csv', 'w', newline='')
csv_writer = csv.DictWriter(outFile,
                            fieldnames=['name','count','gender','year'])
csv_writer.writeheader()
for a_file in file_list:
    name,ext = a_file.split('.')
    year = name[-4:]
    with open(a_file, 'r', newline='') as inFile:
        # assumes each source file has a header row naming its columns name, count, gender
        csv_read_in = csv.DictReader(inFile)
        for row in csv_read_in:
            row['year'] = year
            csv_writer.writerow(row)
outFile.close()
Hope this helps.
I want to read a file that is split into two sections and handle each section differently.
First:
This file lists certain steps that the code has to follow. How do I read the steps until the string [data] appears?
[Steps]
step1 = WebAddress
step2 = Tab
step3 = SecurityType
step4 = Criteria
step5 = Date
step6 = Click1
step7 = Results
step8 = Download
[data]
......
Second:
How can I read everything after [data]?
[data]
WebAddress___________________________ Destination___________ Tab_____________ SecurityType___________________________________________________ Criteria___ Date_______ Click1_ Results_ Download
https://mbsdisclosure.fanniemae.com/ q:\\%s\\raw\\fnmapool Advanced Search Interim MBS: Single-Family Issue Date 09/01/2012 Search 100 CSV XML
https://mbsdisclosure.fanniemae.com/ q:\\%s\\raw\\fnmapool Advanced Search Preliminary Mega: Fannie Mae/Ginnie Mae backed Adjustable Rate Issue Date 09/01/2012 Search 100 CSV XML
https://mbsdisclosure.fanniemae.com/ q:\\%s\\raw\\fnmapool Advanced Search Preliminary Mega: Fannie Mae/Ginnie Mae backed Fixed Rate Issue Date 09/01/2012 Search 100 CSV XML
I want to pass along everything under each step____________________ column header, where step is one of the step names (e.g. WebAddress).
So for example, if step1 = WebAddress how do I read everything under WebAddress__________________________ and so on? Thanks!
First:
with open(file_name) as f:
    # everything before the [data] marker is the [Steps] section
    print(f.read().split("[data]")[0])
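If the file is large, you can avoid loading it all at once by reading lines only until the [data] marker. A sketch using `itertools.takewhile`, assuming the marker sits on its own line; `read_until_marker` and `parse_steps` are hypothetical helpers:

```python
from itertools import takewhile

def read_until_marker(path, marker="[data]"):
    """Return the lines before the first line equal to `marker` (ignoring whitespace)."""
    with open(path) as f:
        lines = takewhile(lambda line: line.strip() != marker, f)
        return [line.rstrip("\n") for line in lines]

def parse_steps(lines):
    """Collect 'stepN = Value' lines into a dict; other lines are skipped."""
    steps = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:
            steps[key.strip()] = value.strip()
    return steps
```

For the sample file, `parse_steps(read_until_marker(file_name))["step1"]` would give `'WebAddress'`.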
Second:
with open(file_name) as f:
    pre_data, post_data = [s.strip() for s in f.read().split("[data]")]
post_data_lines = post_data.splitlines()
headers = post_data_lines[0].split()
print(headers)
for line in post_data_lines[1:]:
    print(line.split())
    print(dict(zip(headers, line.split())))
I'm also not sure how your [data] section is delimited; you may want line.split('\t') if it's tab-separated.
This is untested, but it should work. It doesn't quite get you all the way to what you want, but it covers most of it (the "hard" parts).
To split by header width, use:
file_name = "testdata.txt"
with open(file_name) as f:
    pre_data, post_data = [s.strip() for s in f.read().split("[data]")]
post_data_lines = post_data.splitlines()
headers = post_data_lines[0].split()
for line in post_data_lines[1:]:
    tmpline = []
    pos = 0
    for itm in headers:
        # each header token (name plus underscores) is as wide as its column
        tmpline.append(line[pos:pos+len(itm)])
        pos += len(itm)+1
    print(dict(zip(headers, tmpline)))
And if you want the actual header names without the underscores, use:
file_name = "testdata.txt"
with open(file_name) as f:
    pre_data, post_data = [s.strip() for s in f.read().split("[data]")]
post_data_lines = post_data.splitlines()
headers = post_data_lines[0].split()
headers2 = [s.replace("_", " ").strip() for s in headers]
for line in post_data_lines[1:]:
    tmpline = []
    pos = 0
    for itm in headers:
        tmpline.append(line[pos:pos+len(itm)])
        pos += len(itm)+1
    print(dict(zip(headers2, tmpline)))
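The header-width idea can be wrapped in reusable functions that also answer the original goal of looking up a column by step name. A sketch under the same assumptions as above (each header's underscore padding marks its column width and columns are separated by one space); the helper names are hypothetical, and the tiny sample in the usage note is a shortened stand-in, not the real data:

```python
def column_spans(header_line):
    """Map each header name (underscores removed) to its (start, end) slice.

    Assumes tokens like 'Tab_____' separated by single spaces, where each
    token's length is the width of its column.
    """
    spans, pos = {}, 0
    for token in header_line.split():
        spans[token.strip("_")] = (pos, pos + len(token))
        pos += len(token) + 1
    return spans

def parse_fixed_width(lines):
    """Turn a header line plus data lines into a list of dicts."""
    spans = column_spans(lines[0])
    return [{name: line[start:end].strip() for name, (start, end) in spans.items()}
            for line in lines[1:] if line.strip()]

def values_for_step(rows, steps, step):
    """Return every value in the column named by steps[step], e.g. 'step1'."""
    column = steps[step]
    return [row[column] for row in rows]
```

For example, with the header `"WebAddress__ Tab____"` and the row `"example.com  Adv"`, `values_for_step(rows, {"step1": "WebAddress"}, "step1")` returns `['example.com']`.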
First step:
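>>> import configparser
>>> cfg = configparser.RawConfigParser()
>>> with open('sample.cfg') as f:
...     # parse only the part before [data]; the data rows are not key = value pairs
...     cfg.read_string(f.read().split('[data]')[0])
...
>>> cfg.get('Steps', 'step1')
'WebAddress'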
Second step:
>>> import csv, io
>>> with open('sample.cfg') as f:
...     data_section = f.read()
...
>>> data = data_section[data_section.index('[data]')+len('[data]')+1:]
>>> reader = csv.reader(io.StringIO(data), delimiter='\t')
>>> header = next(reader)  # skip the header row
>>> results = [row for row in reader]
Now results is a list of lists, each inner list holding the items of one row from the data section (assuming the columns are tab-separated).
[['https://mbsdisclosure.fanniemae.com/','q:\\\\%s\\\\raw\\\\fnmapool','Advanced Search', 'Interim MBS: Single-Family', 'Issue Date','09/01/2012','Search','100', 'CSV XML']...]