So I've never really used import csv before, but I've managed to scrape a bunch of information from websites and now want to put them in a csv file. The issue I'm having is that all my list values are being separated by commas (i.e. Jane Doe = J,a,n,e, ,D,o,e).
Also, I have three lists (one with names, one with emails, and one with titles) and I would like to add each as its own column in the CSV file (so col1 = Name, col2 = Title, col3 = Email).
Any thoughts on how to execute this? Thanks.
from bs4 import BeautifulSoup
import requests
import csv

urls = ''
with open('websites.txt', 'r') as f:
    for line in f.read():
        urls += line
urls = list(urls.split())

name_lst = []
position_lst = []
email_lst = []

for url in urls:
    print(f'CURRENTLY PARSING: {url}')
    print()
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    try:
        for information in soup.find_all('tr', class_='sidearm-staff-member'):
            names = information.find("th", attrs={'headers': "col-fullname"}).text.strip()
            positions = information.find("td", attrs={'headers': "col-staff_title"}).text.strip()
            emails = information.find("td", attrs={'headers': "col-staff_email"}).script
            target = emails.text.split('var firstHalf = "')[1]
            fh = target.split('";')[0]
            lh = target.split('var secondHalf = "')[1].split('";')[0]
            emails = fh + '#' + lh
            name_lst.append(names)
            position_lst.append(positions)
            email_lst.append(emails)
    except Exception as e:
        pass

with open('test.csv', 'w') as csv_file:
    csv_writer = csv.writer(csv_file)
    for line in name_lst:
        csv_writer.writerow(line)
    for line in position_lst:
        csv_writer.writerow(line)
    for line in email_lst:
        csv_writer.writerow(line)
Writing your data column by column is straightforward: write rows where each row contains the elements of the three lists at the same index. Here is the code:
with open('test.csv', 'w', newline='') as csv_file:  # newline='' avoids blank rows on Windows
    csv_writer = csv.writer(csv_file)
    for name, position, email in zip(name_lst, position_lst, email_lst):
        csv_writer.writerow([name, position, email])
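Note that zip stops at the shortest list, so rows are silently dropped if one list is shorter than the others. A quick self-contained demonstration with made-up values (using an in-memory buffer instead of a file):

```python
import csv
import io

name_lst = ['Jane Doe', 'John Roe']
position_lst = ['Coach']          # shorter: John Roe's row is dropped
email_lst = ['jane#example.com', 'john#example.com']

buf = io.StringIO()
csv_writer = csv.writer(buf)
# zip yields pairs only as far as the shortest input
for name, position, email in zip(name_lst, position_lst, email_lst):
    csv_writer.writerow([name, position, email])

print(buf.getvalue())  # only one row: Jane Doe,Coach,jane#example.com
```

If that silent truncation is a problem, see the padding approach in the next answer.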
Assuming that name_lst, position_lst and email_lst are all correct and of the same size, your problem is in the last part of your code, where you write to the CSV file.
Here is a way to do this:
fieldnames = ['Name', 'Position', 'Email']
with open('Data_to_Csv.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(len(name_lst)):
        writer.writerow({'Name': name_lst[i], 'Position': position_lst[i], 'Email': email_lst[i]})
This would of course fail if the lengths of the lists are unequal. You need to make sure you add dummy values for entries that are not available, so that the three lists have the same number of values.
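As a sketch of that padding idea (the three lists here are hypothetical stand-ins for the scraped data), itertools.zip_longest fills in the missing entries automatically instead of requiring manual dummy values:

```python
import csv
from itertools import zip_longest

# Hypothetical scraped lists of unequal length
name_lst = ['Jane Doe', 'John Roe', 'Ann Poe']
position_lst = ['Coach', 'Manager']        # one entry missing
email_lst = ['jane#example.com']           # two entries missing

with open('Data_to_Csv.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Position', 'Email'])
    # zip_longest pads the shorter lists with 'N/A' instead of dropping rows
    for name, position, email in zip_longest(name_lst, position_lst, email_lst, fillvalue='N/A'):
        writer.writerow([name, position, email])
```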
Related
I am very new to python.
I have a list of stock names in a CSV file. I extract the names and append each one to a website domain to create URLs. I am trying to write the URLs I created into another CSV file, but it only writes the last one in the list. I want it to write all of the URLs into the CSV file.
with open('names.csv', 'r') as datafile:
    for line in datafile:
        domain = f'https://ceo.ca/{line}'
        urls_link = (domain.strip())
        print(urls_link)
        y = open("url.csv", "w")
        y.writelines(urls_link)
        y.close()
names.csv: https://i.stack.imgur.com/WrrLw.png
url.csv: https://i.stack.imgur.com/BYEgN.png
I would want the url csv look like this:
https://i.stack.imgur.com/y4xre.png
I apologise if I worded some things horribly.
You can use the csv module in Python. Try this code:
from csv import writer, reader

in_FILE = "names.csv"
out_FILE = 'url.csv'

urls = list()
with open(in_FILE, 'r') as infile:
    read = reader(infile, delimiter=",")
    for domain_row in read:
        for domain in domain_row:
            url = f'https://ceo.ca/{domain.strip()}'
            urls.append(url)

with open(out_FILE, 'w') as outfile:
    write = writer(outfile)
    for url in urls:
        write.writerow([url])
How do I write two columns in my CSV file? The first should have data[0] and the second should have data[1]
with open('list_of_courses.csv', 'w', newline='', delimiter=',') as f:
    thewriter = csv.writer(f)
    for dept_courses in dept_url_dict.values():
        newpage = requests.get("https://bulletin.temple.edu" + dept_courses)
        courses = BeautifulSoup(newpage.content, 'html.parser')
        courselist = courses.select('p.courseblocktitle')
        print(dept_courses)
        for c in courselist:
            string = c.text
            data = string.split(".")
            thewriter.writerow(data[0] + "," + data[1])
I want the CSV file to have two columns, but it currently has a column for each character.
Try this...
with open('list_of_courses.csv', 'w', newline='') as f:  # open() takes no delimiter argument
    thewriter = csv.writer(f)
    for dept_courses in dept_url_dict.values():
        newpage = requests.get("https://bulletin.temple.edu" + dept_courses)
        courses = BeautifulSoup(newpage.content, 'html.parser')
        courselist = courses.select('p.courseblocktitle')
        print(dept_courses)
        for c in courselist:
            string = c.text
            data = string.split(".")
            thewriter.writerow([data[0], data[1]])
You should be passing a list to thewriter, not a string.
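A minimal standalone demonstration of the difference (using an in-memory buffer and made-up course data rather than the question's file):

```python
import csv
import io

buf = io.StringIO()
w = csv.writer(buf)

# Passing a string: csv treats it as a sequence of characters,
# so each character becomes its own column.
w.writerow("CIS 1051")

# Passing a list: each element becomes one column.
w.writerow(["CIS 1051", "Intro to Problem Solving"])

print(buf.getvalue())
# First row:  C,I,S, ,1,0,5,1
# Second row: CIS 1051,Intro to Problem Solving
```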
I want this output written to a CSV file:
['https://www.lendingclub.com/loans/personal-loans' '6.16% to 35.89%']
['https://www.lendingclub.com/loans/personal-loans' '1% to 6%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.discover.com/personal-loans/' '6.99% to 24.99%']
However when I run the code to write the output to CSV I only get the last line written to the CSV file:
['https://www.discover.com/personal-loans/' '6.99% to 24.99%']
Could it be because my printed output is not comma separated? I attempted to circumvent having to put a comma in there by using a space as the delimiter. Let me know your thoughts. Would love some help on this because I am having the hardest time reshaping this collected data.
plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/']

# cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    # captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall('(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        try:
            irate = str(matches[0])
            array = np.asarray(irate)
            array2 = np.append(link, irate)
            array2 = np.asarray(array2)
            print(array2)
            #with open('test.csv', "w") as csv_file:
            #    writer = csv.writer(csv_file, delimiter=' ')
            #    for line in test:
            #        writer.writerow(line)
        except IndexError:
            pass
When it comes to writing CSV files, pandas comes in handy.
import datetime
import requests as r
from bs4 import BeautifulSoup as bs
import numpy as np
import regex as re
import pandas as pd

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/']
df = pd.DataFrame({'Link': [], 'APR Rate': []})

# cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    # captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall('(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        irate = ''
        try:
            irate = str(matches[0])
            df2 = pd.DataFrame({'Link': [link], 'APR Rate': [irate]})
            df = pd.concat([df, df2], join="inner")
        except IndexError:
            pass

df.to_csv('CSV_File.csv', index=False)
I have stored each link and its irate value in a data frame df2, which I concatenate to the parent data frame df. At the end, I write the parent data frame df to a CSV file.
I think the problem is that you are opening the file in write-mode (the "w" in open('test.csv', "w")), meaning that Python overwrites what's already written in the file. I think you're looking for append-mode:
# open the file before the loop, and close it after
csv_file = open("test.csv", 'a')  # change the 'w' to an 'a'
csv_file.truncate(0)  # clear the contents of the file
writer = csv.writer(csv_file, delimiter=' ')  # make the writer beforehand for efficiency
for n in paragraph:
    matches = re.findall('(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
    try:
        irate = str(matches[0])
        array = np.asarray(irate)
        array2 = np.append(link, irate)
        array2 = np.asarray(array2)
        print(array2)
        writer.writerow(array2)  # write the [link, rate] pair as one row
    except IndexError:
        pass
# close the file
csv_file.close()
If this doesn't work, please let me know!
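An equivalent sketch using a with block, which avoids the manual close() and the truncate() trick by opening the file once, in write mode, before the loop. The plcompetitors list and scraped_rates mapping here are hypothetical stand-ins for the scraping in the question:

```python
import csv

# Hypothetical stand-ins for the scraped data in the question
plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.discover.com/personal-loans/']
scraped_rates = {'https://www.lendingclub.com/loans/personal-loans': '6.16% to 35.89%',
                 'https://www.discover.com/personal-loans/': '6.99% to 24.99%'}

# Open once, before the loop: every row accumulates in the same file,
# and the with block closes it automatically.
with open('test.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for link in plcompetitors:
        irate = scraped_rates[link]
        writer.writerow([link, irate])
```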
Sorry if this has been asked, but is it possible to skip a column when writing to a csv file?
Here is the code I have:
with open("list.csv", "r") as f:
    reader2 = csv.reader(f)
    for row in reader2:
        url = 'http://peopleus.intelius.com/results.php?ReportType=33&qi=0&qk=10&qp=' + row
        req = urllib.request.Request(url)
        response = urllib.request.urlopen(req)
        html = response.read()
        retrieved_name = b'class="singleName">(.*?)<\/h1'
        retrieved_number = b'<div\sclass="phone">(.*?)<\/div'
        retrieved_nothing = b"(Sorry\swe\scouldn\\'t\sfind\sany\sresults)"
        if re.search(retrieved_nothing, html):
            noth = re.search(retrieved_nothing.decode('utf-8'), html.decode('utf-8')).group(1)
            add_list(phone_data, noth)
        else:
            if re.search(retrieved_name, html):
                name_found = re.search(retrieved_name.decode('utf-8'), html.decode('utf-8')).group(1)
            else:
                name_found = "No name found on peopleus.intelius.com"
            if re.search(retrieved_number, html):
                number_found = re.search(retrieved_number.decode('utf-8'), html.decode('utf-8')).group(1)
            else:
                number_found = "No number found on peopleus.intelius.com"
            add_list(phone_data, name_found, number_found)
        with open('column_skip.csv', 'a+', newline='') as mess:
            writ = csv.writer(mess, dialect='excel')
            writ.writerow(phone_data[-1])
        time.sleep(10)
Assuming that there is data in the first three rows of column_skip.csv, can I have my program start writing its info in column 4?
Yeah, don't use the csv.writer method; write it as a simple file write operation:
file_path = 'your_csv_file.csv'
with open(file_path, 'w') as fp:
    # following are the data you want to write to csv
    fp.write("%s, %s, %s" % ('Name of col1', 'col2', 'col4'))
    fp.write("\n")
I hope this helps...
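For what it's worth, the csv module can also leave columns blank: writing empty strings produces empty cells, so the real data starts further right. A sketch, where phone_data_row is a hypothetical stand-in for one entry from the question's phone_data list:

```python
import csv

# Hypothetical row from the question's scraper
phone_data_row = ['Jane Doe', '555-0100']

with open('column_skip.csv', 'a+', newline='') as mess:
    writ = csv.writer(mess, dialect='excel')
    # Three empty strings produce three blank cells, so the real
    # data starts in column 4.
    writ.writerow(['', '', ''] + phone_data_row)
```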
I'm attempting to get a series of weather reports from a website. I have the code below, which creates the URLs I need for the XMLs I want. What would be the best way to save the returned XMLs under different names?
with open('file.csv') as csvfile:
    towns_csv = csv.reader(csvfile, dialect='excel')
    for rows in towns_csv:
        x = float(rows[2])
        y = float(rows[1])
        url = ("http://api.met.no/weatherapi/locationforecast/1.9/?")
        lat = "lat=" + format(y)
        lon = "lon=" + format(x)
        text = url + format(lat) + ";" + format(lon)
I have been saving single XMLs with this code:
response = requests.get(text)
xml_text = response.text
winds = bs4.BeautifulSoup(xml_text, "xml")
f = open('test.xml', "w")
f.write(winds.prettify())
f.close()
The first column of the CSV file has city names on it, I would ideally like to use those names to save each XML file as it is created. I'm sure another for loop would do, I'm just not sure how to create it.
Any help would be great, thanks again stack.
You have done most of the work already. Just use rows[0] as your filename. Assuming rows[0] is 'mumbai', then rows[0]+'.xml' will give you 'mumbai.xml' as the filename. You might want to check if city names have spaces which need to be removed, etc.
with open('file.csv') as csvfile:
    towns_csv = csv.reader(csvfile, dialect='excel')
    for rows in towns_csv:
        x = float(rows[2])
        y = float(rows[1])
        url = ("http://api.met.no/weatherapi/locationforecast/1.9/?")
        lat = "lat=" + format(y)
        lon = "lon=" + format(x)
        text = url + format(lat) + ";" + format(lon)
        response = requests.get(text)
        xml_text = response.text
        winds = bs4.BeautifulSoup(xml_text, "xml")
        f = open(rows[0] + '.xml', "w")
        f.write(winds.prettify())
        f.close()
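If the city names may contain spaces or other characters that are awkward in filenames, a small sanitizing helper (hypothetical, not part of the original answer) can be applied to rows[0] before building the filename:

```python
import re

def safe_filename(name: str) -> str:
    """Strip surrounding whitespace and replace any run of characters
    that isn't a letter, digit, hyphen or underscore with '_',
    so the city name is safe to use as a filename."""
    cleaned = name.strip()
    cleaned = re.sub(r'[^A-Za-z0-9_-]+', '_', cleaned)
    return cleaned + '.xml'

print(safe_filename('  New York '))   # New_York.xml
print(safe_filename('mumbai'))        # mumbai.xml
```

In the loop above, `f = open(safe_filename(rows[0]), "w")` would then replace the direct concatenation.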