I'm new to Python and I'm trying to scrape some data and save it to a CSV file.
I'm trying to loop over a CSV file containing a list of URLs, read the data from each URL, and write that information to another CSV file.
The following code writes only about half of the data to the CSV, even though everything prints fine while it runs:
df_link = pd.read_csv('url_list')
with open('url_list.csv', newline='') as urls, open('output.csv', 'w', newline='') as output:
    csv_urls = csv.reader(urls)
    csv_output = csv.writer(output)
    csv_output.writerow(['details','date'])
    for link in df_link.iterrows():
        url = link[1]['url']
        browser.get(url)
        soup = BeautifulSoup(browser.page_source)
        csv_file = open('output.csv', 'w')
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['details'])
        details=[i.text for i in soup.find_all(class_='product-info-content-block product-info')]
        print('details :', details)
        dt = date.today()
        print('date :', dt)
        csv_output.writerow([str(details).strip('[]'), dt])
        csv_file.close()
Everything prints fine while the code is running, but not all the rows of data are written to the output CSV.
I hope someone can help.
Thank you!
It looks like you are opening output.csv twice: once at the beginning and again inside the for loop. Because csv_file = open('output.csv', 'w') uses mode 'w', the file is truncated on every iteration, so rows written earlier can be lost.
So moving the part below out of the loop should help (and since csv_output already writes to output.csv, you can drop these lines altogether):
csv_file = open('output.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['details'])
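A minimal sketch of the loop with output.csv opened only once, assuming the scraping step can be wrapped in a function. The browser and BeautifulSoup calls are replaced by a hypothetical scrape_details() stub, and the URLs and file name are placeholders, so the output below is illustrative rather than real scraped data.

```python
import csv
from datetime import date


def scrape_details(url):
    # Hypothetical stand-in for browser.get(url) + BeautifulSoup parsing.
    # Replace with the real scraping code; it should return a list of strings.
    return [f"details for {url}"]


def write_details(urls, out_path):
    # The output file is opened exactly once, so 'w' truncates it only at the start.
    with open(out_path, 'w', newline='') as output:
        csv_output = csv.writer(output)
        csv_output.writerow(['details', 'date'])
        for url in urls:
            details = scrape_details(url)
            # Join the list instead of str(details).strip('[]') to avoid quote residue.
            csv_output.writerow(['; '.join(details), date.today().isoformat()])


urls = ['https://example.com/a', 'https://example.com/b']  # placeholder URLs
write_details(urls, 'output_demo.csv')
```

Keeping a single writer for the whole loop means every row lands in the same file object, in order.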
I am having trouble with this Python problem.
I want to save changing variables to a CSV file, but when the code runs again with a different variable, it overwrites the previous one. The variables are not predetermined; they are generated while the code runs, so every time the program loops, a different email is passed.
Here is my code:
import csv
def hello(hme):
    header = ['email']
    data = [hme]
    with open('countries.csv', 'w', encoding='UTF8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(data)
hello(["test#icloud.com"])
Thank you!
You should open the file in append mode instead of write mode:
'a' instead of 'w'
import csv
def hello(hme):
    header = ['email']
    data = [hme]
    with open('countries.csv', 'a', encoding='UTF8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(data)
hello(["test#icloud.com"])
Just replace 'w' with 'a': 'w' truncates (overwrites) the file when it is opened, while 'a' appends to the end of the file whenever you write to it.
with open('countries.csv', 'a', encoding='UTF8', newline='') as f:
For the "email" header, just write it once before your loop over the emails, so that it is not duplicated.
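One way to keep the header from being duplicated when the function itself appends is to write it only when the file is new or empty. This is a sketch, assuming hme is a single email string (the question passes a one-element list) and using a separate demo file name so an existing countries.csv is not touched.

```python
import csv
import os


def hello(hme, path='countries.csv'):
    # The header is needed only when the file does not exist yet or is empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # 'a' appends, so emails from earlier calls are kept.
    with open(path, 'a', encoding='UTF8', newline='') as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(['email'])
        writer.writerow([hme])


# Start from a clean demo file so the example is repeatable.
if os.path.exists('countries_demo.csv'):
    os.remove('countries_demo.csv')

for email in ['first#icloud.com', 'second#icloud.com']:
    hello(email, 'countries_demo.csv')
```

After the two calls the file holds the header once, followed by one row per email.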
Read the file contents first; add the new data; write the data to a file.
import csv
def hello(hme):
    try:
        with open('countries.csv', encoding='UTF8', newline='') as f:
            stuff = list(csv.reader(f))
    except FileNotFoundError:
        # this must be the first time the function was called
        stuff = [['email']]
    stuff.append([hme])
    with open('countries.csv', 'w', encoding='UTF8', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(stuff)
If your file really has only one column, you don't need the csv module at all; just append the new line to the file.
# assumes header is present
def hello(hme):
    with open('countries.csv', 'a', encoding='UTF8') as f:
        f.write(hme + '\n')
I have two CSV files that I need to convert to JSON. My code is below:
import csv
import json
import os
import glob
os.chdir(r'C:\Users\user\Desktop\test' )
result = glob.glob( '*.csv' )
print (result)
def make_json(csvFile, jsonFile):
    csvFile, jsonFile = '',''
    for i in result:
        data = {}
        with open(csvFile, encoding='utf-8') as csvf:
            csvReader = csv.DictReader(csvf)
            for rows in csvReader:
                key = rows['id']
                data[key] = rows
        with open(jsonFile, 'w', encoding='utf-8') as jsonf:
            jsonf.write(json.dumps(data, indent=4))
        csvFilePath =f"{i}"
        jsonFilePath =f"{i.split('.')[-2]}.json"
make_json(csvFile, jsonFile)
I get an error saying csvFile is not defined, but the third line from the end does set the CSV file path.
Note: I am asking for the error in this code; I already know a working version that uses pandas.
Below is corrected code, but I would recommend learning to use the Python debugger so you can track down logic flaws like this yourself next time. Documentation for the Python debugger (pdb) is here:
https://docs.python.org/3/library/pdb.html
Your code was structured so that, for each CSV file, you did not set the file name until after you had tried to open it. The immediate error you saw occurred because you called make_json() before defining csvFile and jsonFile.
I would recommend changing the code to:
import csv
import json
import glob
def make_json(csvList):
    for csvFile in csvList:
        data = {}
        with open(csvFile, encoding='utf-8') as csvf:
            csvReader = csv.DictReader(csvf)
            for rows in csvReader:
                key = rows['id']
                data[key] = rows
        jsonFile =f"{csvFile.split('.')[-2]}.json"
        with open(jsonFile, 'w', encoding='utf-8') as jsonf:
            jsonf.write(json.dumps(data, indent=4))
make_json(glob.glob('*.csv'))
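One caveat with csvFile.split('.')[-2]: it keeps only the piece before the last dot, so a name like sample.v1.csv becomes v1.json. A variant using os.path.splitext, which strips only the final extension, is sketched below; the sample file and its id/name columns are hypothetical test data.

```python
import csv
import json
import os


def make_json(csv_list):
    for csv_file in csv_list:
        data = {}
        with open(csv_file, encoding='utf-8', newline='') as csvf:
            for row in csv.DictReader(csvf):
                data[row['id']] = row  # key each row by its 'id' column
        # splitext removes only the final extension: 'sample.v1.csv' -> 'sample.v1'
        json_file = os.path.splitext(csv_file)[0] + '.json'
        with open(json_file, 'w', encoding='utf-8') as jsonf:
            json.dump(data, jsonf, indent=4)


# Hypothetical sample input; real file names would come from glob.glob('*.csv').
with open('sample.v1.csv', 'w', encoding='utf-8', newline='') as f:
    f.write('id,name\n1,Alice\n2,Bob\n')

make_json(['sample.v1.csv'])
```

json.dump writes straight to the file object, which is equivalent to jsonf.write(json.dumps(...)).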
You should try this
import csv, json, os, glob
os.chdir(r'C:\Users\user\Desktop\test' )
result = glob.glob( '*.csv' )
print(result)
def make_json():
    for i in result:
        with open(i, encoding='utf-8') as csvf:
            data = [row for row in csv.DictReader(csvf)]
        with open(f"{i.split('.')[-2]}.json", 'w', encoding='utf-8') as jsonf:
            json.dump(data, jsonf)
make_json()
You never defined the arguments you pass to make_json() (csvFile and jsonFile); the paths you actually compute are stored in csvFilePath and jsonFilePath, which are never used.
I am trying to do something I have not done before in Python.
The code below collects data from my test database and puts it into a text file under my headers 'Test1', 'Test2', 'Test3'. This is working fine.
What I am trying to do now is add a header (above the current header row) and a footer to the file.
python code:
file = 'file.txt'
header_names = {'t1':'Test1', 't2': 'Test2','t3':'Test3'}
with open(file, 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=header_names.keys(), restval='', extrasaction='ignore')
    w.writerow(header_names)
    for doc in res['test']['test']:
        my_dict = doc['test']
        w.writerow(my_dict)
Current file output using the above code:
file.txt
Test1,Test2,Test3
Bob,john,Male
Cat,Long,female
Dog,Short,Male
Case,Fast,Male
Nice,who,Male
Ideal .txt output:
{header}
Filename:file.txt
date:
{data}
Test1,Test2,Test3
Bob,john,Male
Cat,Long,female
Dog,Short,Male
Case,Fast,Male
Nice,who,Male
{Footer}
this file was generated by using python.
The {header}, {data} and {footer} markers are not needed in the file; they are only there to make clear what is needed. I hope this makes sense.
Something like this
import csv
from datetime import date
# prepare some sample data
data = [['Bob', 'John', 'Male'],
        ['Cat', 'Long', 'Female']]
fieldnames = ['test1', 'test2', 'test3']
data = [dict(zip(fieldnames, row)) for row in data]
# actual part that writes to a file
with open('spam.txt', 'w', newline='') as f:
    f.write('filename:spam.txt\n')
    f.write(f'date:{date.today().strftime("%Y%m%d")}\n\n')
    wrtr = csv.DictWriter(f, fieldnames = fieldnames)
    wrtr.writeheader()
    wrtr.writerows(data)
    f.write('\nwritten with python\n')
Output in the file:
filename:spam.txt
date:20190321

test1,test2,test3
Bob,John,Male
Cat,Long,Female

written with python
Now, all that said, do you really need to write a header and footer? They will break an otherwise well-formed CSV file and require extra effort later on when reading it.
Or, put differently: is CSV the format that best suits your needs? Maybe JSON would be better.
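To illustrate the "extra effort" when reading such a file back, here is a sketch that extracts the CSV block from the spam.txt layout above. It assumes the table is exactly the run of lines between the first and second blank line; the file is recreated first (with a fixed date) so the example is self-contained.

```python
import csv

# Recreate a file with the same layout as spam.txt above.
with open('spam.txt', 'w', newline='') as f:
    f.write('filename:spam.txt\n')
    f.write('date:20190321\n\n')
    wrtr = csv.DictWriter(f, fieldnames=['test1', 'test2', 'test3'])
    wrtr.writeheader()
    wrtr.writerows([{'test1': 'Bob', 'test2': 'John', 'test3': 'Male'},
                    {'test1': 'Cat', 'test2': 'Long', 'test3': 'Female'}])
    f.write('\nwritten with python\n')


def read_table(path):
    with open(path, newline='') as f:
        lines = f.read().splitlines()  # handles both '\n' and '\r\n' endings
    # Assumption: the CSV block sits between the first and second blank line.
    blanks = [i for i, line in enumerate(lines) if line == '']
    table_lines = lines[blanks[0] + 1:blanks[1]]
    return list(csv.DictReader(table_lines))


rows = read_table('spam.txt')
print(rows)
```

A plain CSV file could instead be passed straight to csv.DictReader without any of this slicing.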
import csv
import datetime
vardate = datetime.datetime.now().strftime("%x")
file = 'file.txt'
header_names = {'t1':'Test1', 't2': 'Test2','t3':'Test3'}
with open(file, 'w', newline='') as f:
    f.seek(0,0)  # move the cursor to the start of the file (already there after opening with 'w')
    # write() takes a single string; writelines("File Name: ", file) would raise a TypeError
    f.write("File Name: " + file + "\n")
    f.write("date: " + vardate + "\n")
    w = csv.DictWriter(f, fieldnames=header_names.keys(), restval='',
                       extrasaction='ignore')
    w.writerow(header_names)
    for doc in res['test']['test']:
        my_dict = doc['test']
        w.writerow(my_dict)
    f.seek(0,2)  # move the cursor to the end of the file before writing the footer
    f.write("This is generated using Python\n")
import urllib
import re
import csv
player_code = open("Desktop/OHL PYTHON/test2.txt").read()
player_code = player_code.split("\r")
for pc in player_code:
    htmlfile = urllib.urlopen( "http://www.eliteprospects.com/iframe_player_stats_small.php?player="+pc+"")
    htmltext = htmlfile.read()
    regex = '<font color="#000099">(.+?)</font>'
    pattern = re.compile(regex)
    team = re.findall(pattern,htmltext)
    data = pc, team
    with open('my_games.csv', 'w') as csvfile:
        fieldnames = ['pc', 'team','League', 'Gp', 'G','A','P','Pims']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames,delimiter= ":",
                                extrasaction ='ignore')
        i=0
        writer.writeheader()
        for pc in player_code:
            writer.writerow({'pc':[pc],'team':[team]})
            i+=1
This is only returning one line of data over and over. Any direction would be helpful! Thank you.
You should open the file before the for loop, or use 'a+' mode. Mode 'w' opens the file and truncates its contents every time it is opened.
with open('my_games.csv', 'w') as csvfile:
    for pc in player_code:
or
with open('my_games.csv', 'a+') as csvfile:
Opening the file once is the better approach.
Since you loop over player_code a second time when writing the data, you get repeated lines. Just remove
i=0
for pc in player_code:
    writer.writerow({'pc':[pc],'team':[team]})
    i+=1
and instead just have the line below; I think it will work.
writer.writerow({'pc':[pc],'team':[team]})
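Putting both answers together, a Python 3 sketch that opens the output once and writes one row per player is shown below. The download and regex step is replaced by a hypothetical fetch_team() stub, the player codes and demo file name are placeholders, and the team list is joined into one string rather than written as a Python list literal.

```python
import csv


def fetch_team(pc):
    # Hypothetical stand-in for downloading the player page and running
    # re.findall(pattern, htmltext); should return a list of team names.
    return [f"Team of {pc}"]


def write_games(player_codes, out_path):
    fieldnames = ['pc', 'team', 'League', 'Gp', 'G', 'A', 'P', 'Pims']
    # Opened once, before the loop, so 'w' truncates the file only once.
    with open(out_path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=':',
                                extrasaction='ignore')
        writer.writeheader()
        for pc in player_codes:
            team = fetch_team(pc)
            # One row per player, written right after that player's data is fetched;
            # missing columns are filled with DictWriter's default restval ('').
            writer.writerow({'pc': pc, 'team': ', '.join(team)})


write_games(['101', '202'], 'my_games_demo.csv')
```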
I have a CSV file with certain columns that I need to extract. One of those columns is a text string from which I need to extract the first and last items. I have a print statement in a for loop that gets exactly what I need, but I cannot figure out how to get that data into either a list or a dict. I'm not sure which is best to use.
Code so far:
f1 = open ("report.csv","r") # open input file for reading
users_dict = {}
with open('out.csv', 'wb') as f: # output csv file
writer = csv.writer(f)
with open('report.csv','r') as csvfile: # input csv file
reader = csv.DictReader(csvfile, delimiter=',')
for row in reader:
print row['User Name'],row['Address'].split(',')[0],row['Last Login DateTime'],row['Address'].split(',')[7]
users_dict.update(row)
#users_list.append(row['Address'].split(','))
#users_list.append(row['Last Login DateTime'])
#users_list.append(row[5].split(',')[7])
print users_dict
f1.close()
Input from file:
User Name,Display Name,Login Name,Role,Last Login DateTime,Address,Application,AAA,Exchange,Comment
SUPPORT,SUPPORT,SUPPORT,124,2015-05-29 14:32:26,"Test Company,Bond St,London,London,1111 111,GB,test#test.com,IS",,,LSE,
Output on print:
SUPPORT Test Company 2015-05-29 14:32:26 IS
Using this code, I've got the line you need:
import csv
f1 = open ("report.csv","r") # open input file for reading
users_dict = {}
with open('out.csv', 'wb') as f: # output csv file
    writer = csv.writer(f)
    with open('report.csv','r') as csvfile: # input csv file
        reader = csv.DictReader(csvfile, delimiter=',')
        for row in reader:
            print row['User Name'],row['Address'].split(',')[0],row['Last Login DateTime'],row['Address'].split(',')[7]
            users_dict.update(row)
            #users_list.append(row['Address'].split(','))
            #users_list.append(row['Last Login DateTime'])
            #users_list.append(row[5].split(',')[7])
print users_dict
f1.close()
The only changes are:
Adding import csv at the top.
Indenting the code just after the with open('out.csv' ......
Does this solve your problem?
After some testing, I finally got the line written to the CSV file:
for row in reader:
    writer.writerow([row['User Name'],row['Address'].split(',')[0],row['Last Login DateTime'],row['Address'].split(',')[7]])
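On the list-or-dict question: users_dict.update(row) merges every row into the same dict, so only the last row's values survive. A list with one dict per user keeps them all. Below is a Python 3 sketch of that approach; the output column names are illustrative choices, the sample file is built from the row shown in the question, and address[-1] is used for the "last item" (equal to index 7 for that row).

```python
import csv

# Output column names are illustrative, not taken from the report.
FIELDS = ['User Name', 'Company', 'Last Login DateTime', 'Last Address Item']


def extract_users(in_path, out_path):
    users = []  # one dict per input row
    with open(in_path, newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            address = row['Address'].split(',')
            users.append({
                'User Name': row['User Name'],
                'Company': address[0],             # first item of Address
                'Last Login DateTime': row['Last Login DateTime'],
                'Last Address Item': address[-1],  # last item of Address
            })
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(users)
    return users


# Sample input built from the row shown in the question.
with open('report_sample.csv', 'w', newline='') as f:
    f.write('User Name,Display Name,Login Name,Role,Last Login DateTime,'
            'Address,Application,AAA,Exchange,Comment\n')
    f.write('SUPPORT,SUPPORT,SUPPORT,124,2015-05-29 14:32:26,'
            '"Test Company,Bond St,London,London,1111 111,GB,test#test.com,IS",,,LSE,\n')

users = extract_users('report_sample.csv', 'out_sample.csv')
print(users)
```

Opening the output with newline='' in text mode is the Python 3 counterpart of the 'wb' mode used in the Python 2 code above.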