This may be considered a second part to the question Finding an element within an element using Selenium Webdriver.
What I'm doing here is, after extracting each text from the table, writing it into a csv file.
Here is the code:
from selenium import webdriver
import os
import csv
chromeDriver = "/home/manoj/workspace2/RedTools/test/chromedriver"
os.environ["webdriver.chrome.driver"] = chromeDriver
driver = webdriver.Chrome(chromeDriver)
driver.get("https://www.betfair.com/exchange/football/coupon?id=2")
list2 = driver.find_elements_by_xpath('//*[@data-sportid="1"]')
couponlist = []
finallist = []
for game in list2[1:]:
    coup = game.find_element_by_css_selector('span.home-team').text
    print(coup)
    couponlist.append(coup)
print(couponlist)
print('its done')
outfile = open("./footballcoupons.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Games"])
writer.writerows(couponlist)
Results of 3 print statements:
Santos Laguna
CSMS Iasi
AGF
Besiktas
Malmo FF
Sirius
FCSB
Eibar
Newcastle
Pescara
[u'Santos Laguna', u'CSMS Iasi', u'AGF', u'Besiktas', u'Malmo FF', u'Sirius', u'FCSB', u'Eibar', u'Newcastle', u'Pescara']
its done
Now, you can see the code where I write these values into the csv, but they end up written weirdly (please see the snapshot). Can someone help me fix this?
According to the documentation, writerows takes as parameter a list of rows, and
A row must be an iterable of strings or numbers for Writer objects
You are passing a list of strings, so writerows iterates over your strings, making a row out of each character.
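To see the effect, here is a minimal reproduction (Python 3 shown, writing to an in-memory buffer so nothing touches disk):

import csv
import io

buf = io.StringIO()
csv.writer(buf).writerows(["Santos Laguna"])  # a list of strings, not a list of rows
print(buf.getvalue())
# S,a,n,t,o,s, ,L,a,g,u,n,a  <- each character became a separate column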
You could use a loop:
for team in couponlist:
    writer.writerow([team])
or turn your list into a list of lists, then use writerows:
couponlist = [[team] for team in couponlist]
writer.writerows(couponlist)
But anyway, there's no need to use csv if you only have one column...
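For instance, a minimal sketch writing the list as plain text (the .txt filename is just an example):

# one team name per line; no csv machinery needed for a single column
with open("footballcoupons.txt", "w") as outfile:
    outfile.write("Games\n")
    for team in couponlist:
        outfile.write(team + "\n")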
Related
So, here is the deal: I have this code below and it produces multiple results. How do I put all these results in a single document? I was wondering if it was possible to make all of this a list of links. It's coming out this way:
['http://acervo.estadao.com.br/pagina/#!/20171101-45305-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20171004-45277-spo-1-pri-a1-not/busca/Minist%C3%A9rio', 'http://acervo.estadao.com.br/pagina/#!/20171004-45277-nac-1-pri-a1-not/busca/Minist%C3%A9rio', 'http://acervo.estadao.com.br/pagina/#!/20171109-45313-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20171219-45353-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20171122-45326-spo-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20171122-45326-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20171229-45363-spo-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20171229-45363-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20180105-45370-nac-1-pri-a1-not/busca/minist%C3%A9rio']
['http://acervo.estadao.com.br/pagina/#!/20180202-45398-spo-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20180202-45398-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20180131-45396-spo-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20100702-42626-spo-1-pri-a1-not/busca/Ministro', 'http://acervo.estadao.com.br/pagina/#!/20101202-42779-spo-1-pri-a1-not/busca/Minist%C3%A9rio', 'http://acervo.estadao.com.br/pagina/#!/20101220-42797-spo-1-pri-a1-not/busca/Minist%C3%A9rio', 'http://acervo.estadao.com.br/pagina/#!/20100904-42690-spo-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20101102-42749-spo-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20100514-42577-nac-1-pri-a1-not/busca/ministro', 'http://acervo.estadao.com.br/pagina/#!/20100915-42701-spo-1-pri-a1-not/busca/Minist%C3%A9rio']
But I wanted something like a list, like this:
http://acervo.estadao.com.br/pagina/#!/20171101-45305-nac-1-pri-a1-not/busca/ministro
http://acervo.estadao.com.br/pagina/#!/20180202-45398-spo-1-pri-a1-not/busca/ministro
http://acervo.estadao.com.br/pagina/#!/20180131-45396-spo-1-pri-a1-not/busca/ministro
http://acervo.estadao.com.br/pagina/#!/20171101-45305-nac-1-pri-a1-not/busca/ministro
A bunch of links, in the order they were fetched, in a .txt document. I have no idea how to start (I'm a newbie in programming).
opts = Options()
opts.add_argument("user-agent=Mozilla/5.0")
driver = webdriver.Chrome(chrome_options=opts)
x = 1
driver.get("http://acervo.estadao.com.br/procura/#!/ministro%3B minist%C3%A9rio|||/Acervo/capa//1/2000|2010|2010///Primeira")
time.sleep(5)
page_number = driver.find_element_by_class_name("page-ultima-qtd").text
for i in range(int(page_number)):
    link = ("http://acervo.estadao.com.br/procura/#!/ministro%3B minist%C3%A9rio|||/Acervo/capa//{}/2000|2010|2010///Primeira").format(x)
    #driver.get(link)
    links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.LINK_TEXT, "LEIA ESTA EDIÇÃO")))
    references = [link.get_attribute("href") for link in links]
    driver.find_element_by_class_name("seta-right").click()
    time.sleep(1)
    print(references)
    x = x + 1
    #print(x)
    print(i)
import csv
list1 = ['a','b','c']
list2 = ['a','b','c']
#if the output you're getting is lists, you could put them all into one list first
master = list1 + list2
#concatenated lists
print(master)
#then simply send to file
with open("filenames.csv", 'w') as f:
wr = csv.writer(f, lineterminator='\n')
for row in master:
wr.writerow([row])
Simplest solution: format your references list before printing, i.e.
# print(references)
print("\n".join(references))
or print them one by one (slightly more verbose, but it works):
# print(references)
for ref in references:
    print(ref)
and then use your OS redirections to redirect the output to a file (linux example):
$ python yourscript.py > myurls.txt
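Alternatively, if you'd rather not rely on shell redirection, you could open the output file once per batch and write it from Python directly (a sketch; myurls.txt as above, replacing the print(references) inside your loop):

# "a" appends, so each page's batch of links is added to the file
with open("myurls.txt", "a") as out:
    for ref in references:
        out.write(ref + "\n")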
I'm having trouble getting anything to write in my output file (word_count.txt).
I expect the script to review all 500 phrases in my phrases.txt document, and output a list of all the words and how many times they appear.
from re import findall,sub
from os import listdir
from collections import Counter
# path to folder containg all the files
str_dir_folder = '../data'
# name and location of output file
str_output_file = '../data/word_count.txt'
# the list where all the words will be placed
list_file_data = '../data/phrases.txt'
# loop through all the files in the directory
for str_each_file in listdir(str_dir_folder):
    if str_each_file.endswith('data'):
        # open file and read
        with open(str_dir_folder+str_each_file,'r') as file_r_data:
            str_file_data = file_r_data.read()
        # add data to list
        list_file_data.append(str_file_data)
# clean all the data so that we don't have all the nasty bits in it
str_full_data = ' '.join(list_file_data)
str_clean1 = sub('t','',str_full_data)
str_clean_data = sub('n',' ',str_clean1)
# find all the words and put them into a list
list_all_words = findall('w+',str_clean_data)
# dictionary with all the times a word has been used
dict_word_count = Counter(list_all_words)
# put data in a list, ready for output file
list_output_data = []
for str_each_item in dict_word_count:
    str_word = str_each_item
    int_freq = dict_word_count[str_each_item]
    str_out_line = '"%s",%d' % (str_word,int_freq)
    # populates output list
    list_output_data.append(str_out_line)
# create output file, write data, close it
file_w_output = open(str_output_file,'w')
file_w_output.write('n'.join(list_output_data))
file_w_output.close()
Any help would be great (especially if I'm able to actually output 'single' words within the output list).
Thanks very much.
It would be helpful if we had more information, such as what you've tried and what error messages you received. As kaveh commented above, this code has some major indentation issues. Once I got past those, there were a number of other logic errors to work through. I've made some assumptions:
- list_file_data is assigned '../data/phrases.txt', but there is then a loop through all files in a directory. Since you don't have any handling for multiple files elsewhere, I've removed that logic and referenced the file listed in list_file_data (and added a small bit of error handling). If you do want to walk through a directory, I'd suggest using os.walk() (http://www.tutorialspoint.com/python/os_walk.htm); see the sketch after the code below.
- You named your file 'phrases.txt' but then check whether the files end with 'data'. I've removed this logic.
- You've placed the data set into a list, when findall works just fine with strings and ignores the special characters that you've manually removed. Test at https://regex101.com/ to make sure.
- Changed 'w+' to '\w+' - check out the above link.
- Converting to a list outside of the output loop isn't necessary - your dict_word_count is a Counter object, which has an 'iteritems' method to roll through each key and value. Also changed the variable name to 'counter_word_count' to be slightly more accurate.
- Instead of manually generating csv's, I've imported csv and utilized the writerow method (and quoting options).
Code below, hope this helps:
import csv
import os
from collections import Counter
from re import findall,sub
# name and location of output file
str_output_file = '../data/word_count.txt'
# the list where all the words will be placed
list_file_data = '../data/phrases.txt'
if not os.path.exists(list_file_data):
    raise OSError('File {} does not exist.'.format(list_file_data))

with open(list_file_data, 'r') as file_r_data:
    str_file_data = file_r_data.read()
# find all the words and put them into a list
list_all_words = findall('\w+',str_file_data)
# dictionary with all the times a word has been used
counter_word_count = Counter(list_all_words)
with open(str_output_file, 'w') as output_file:
    fieldnames = ['word', 'freq']
    writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)
    writer.writerow(fieldnames)
    for key, value in counter_word_count.iteritems():
        output_row = [key, value]
        writer.writerow(output_row)
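And if you do end up needing the directory walk mentioned above, a minimal os.walk() sketch might look like this ('../data' is the folder from the question; filtering on '.txt' is an assumption):

import os

list_file_data = []
for dirpath, dirnames, filenames in os.walk('../data'):
    for filename in filenames:
        if filename.endswith('.txt'):  # assumed extension filter
            with open(os.path.join(dirpath, filename), 'r') as f:
                list_file_data.append(f.read())
str_file_data = ' '.join(list_file_data)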
Something like this?
from collections import Counter
from glob import glob
def extract_words_from_line(s):
    # make this as complicated as you want for extracting words from a line
    return s.strip().split()

tally = sum(
    (Counter(extract_words_from_line(line))
     for infile in glob('../data/*.data')
     for line in open(infile)),
    Counter())

for k in sorted(tally, key=tally.get, reverse=True):
    print k, tally[k]
I have written a Python web scraper and would like to output the data strings I have gotten into a csv/excel file. So far, I have a for loop that accesses multiple database websites and stores the data in a string. I would like to write out these strings each time I complete the web scraping, before moving on to the next page.
Someone suggested creating a whole repository of them, or a dictionary, and then referencing it. I tried implementing that, but my code instead returns the data in one cell rather than spanning multiple cells, because I have a header at the top that separates the data into my desired attributes.
Substances = []
Whole_list = []
f = open(filename) # chemtest.txt
for sub in f:
    Substances.append(sub)
    print sub

for substance in Substances:
    #some logic
    names1 = [data]
    Whole_list.append(names1)

with open('chemtest.csv', 'wb') as myfile: #creates new chemtest.csv
    wr = csv.writer(myfile)
    wr.writerow(Whole_list)
So far I'm running through 2 websites as a test and my outputs are:
names1 = ['Acetaldehyde', 'Acetaldehyde', '75-07-0', 'GO1N1ZPR3B', 'CC=O']
Whole_list = [['Acetaldehyde', 'Acetaldehyde', '75-07-0', 'GO1N1ZPR3B', 'CC=O']]
names1 = ['Acetone', 'Acetone', '67-64-1', '1364PS73AF', '=O']
Whole_list = [['Acetaldehyde', 'Acetaldehyde', '75-07-0', 'GO1N1ZPR3B', 'CC=O'], ['Acetone', 'Acetone', '67-64-1', '1364PS73AF', '=O']]
What is wrong with my method exactly and how can I improve it?
Use writerows (note the s at the end). writerow is for writing one line at a time.
wr.writerows(Whole_list)
As a side note, capitalized variable names are usually reserved for classes, so prefer whole_list.
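The difference in one line: writerow expects a single row (a list of fields), while writerows expects a list of such rows. Using the variables from your code:

wr.writerow(names1)       # one row: the fields of a single substance
wr.writerows(Whole_list)  # one row per inner list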
I have two lists in Python: 'away' and 'home'. I want to append them to an already existing csv file such that I write a row solely of the 1st element of away, then the 1st element of home, then the 2nd element of away, then the 2nd element of home, etc., with empty spaces in between them, so it will be like this:
away1
home1
away2
home2
away3
home3
and so on and so on. The size of the away and home lists is the same, but might change day to day. How can I do this?
Thanks
Looks like you just want the useful and flexible zip built-in.
>>> away = ["away1", "away2", "away3"]
>>> home = ["home1", "home2", "home3"]
>>> list(zip(away, home))
[('away1', 'home1'), ('away2', 'home2'), ('away3', 'home3')]
import csv
away = ["away1", "away2", "away3"]
home = ["home1", "home2", "home3"]
record_list = [list(item) for item in zip(away, home)]
print record_list
with open("sample.csv", "a") as fp:
writer = csv.writer(fp)
writer.writerows(record_list)
# record_list = [['away1', 'home1'], ['away2', 'home2'], ['away3', 'home3']]
You should use the writerows method to write multiple lists at a time, one list per row.
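Note that writerows(record_list) puts each away/home pair side by side in one row (two columns). If you instead want each name on its own row, as in the layout shown in the question, you could flatten the pairs first (a sketch using itertools.chain):

import csv
import itertools

with open("sample.csv", "a") as fp:
    writer = csv.writer(fp)
    # interleave away1, home1, away2, home2, ... one name per row
    for name in itertools.chain.from_iterable(zip(away, home)):
        writer.writerow([name])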
I’m trying to split downloaded data into a 2D array with different data types. The downloaded data looks like this:
000|17:40
000|17:45
010|17:50
025|17:55
056|18:00
178|18:05
202|18:10
203|18:15
190|18:20
072|18:25
013|18:30
002|18:35
000|18:40
000|18:45
000|18:50
000|18:55
000|19:00
000|19:05
000|19:10
000|19:15
000|19:20
000|19:25
000|19:30
000|19:35
000|19:40
I’m using the following code to parse this into a two dimensional array:
#!/usr/bin/python
import urllib2
response = urllib2.urlopen('http://gps.buienradar.nl/getrr.php?lat=52&lon=4')
html = response.read()
htmlsplit = []
for record in html.split("\r\n"):
    htmlsplit.append(record.split("|"))
print htmlsplit
This is working great, but as expected, it treats everything as a string. I’ve found some examples that split into integers. That’s great if both sides were integers, but in my case it’s an integer | string (or maybe some kind of Python time format).
How can I split this directly into different data types?
Something like this?
for record in html.split("\r\n"): # beware, newlines are treacherous!
    s = record.split("|")
    htmlsplit.append((int(s[0]), s[1]))
Just write a parser for each record, if you have data this simple. However, I would add a try/except clause to catch errors for non-conforming lines, empty lines, etc., which may be present in the data. The code above is very fragile. Also, you might want to break only at \n and then clean your strings with strip() (i.e. replace s[1] by s[1].strip()). The integer conversion takes care of surrounding whitespace automatically.
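For example, a slightly more defensive version along those lines (a sketch that skips malformed or empty lines instead of raising):

htmlsplit = []
for record in html.split("\n"):
    try:
        count, time_str = record.split("|")
        htmlsplit.append((int(count), time_str.strip()))
    except ValueError:
        # wrong number of fields, or a non-integer first field: skip the line
        continue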
Use str.splitlines instead of splitting on \r\n
Use the csv module to iterate over the lines:
import csv
txt = '000|17:40\n000|17:45\n000|17:50\n000|17:55\n000|18:00\n000|18:05\n000|18:10\n000|18:15\n000|18:20\n000|18:25\n000|18:30\n000|18:35\n000|18:40\n000|18:45\n000|18:50\n000|18:55\n000|19:00\n000|19:05\n000|19:10\n000|19:15\n000|19:20\n000|19:25\n000|19:30\n000|19:35\n000|19:40\n'
reader = csv.reader(txt.splitlines(), delimiter='|')
column1 = []
column2 = []
for c1, c2 in reader:
    column1.append(c1)
    column2.append(c2)
You can also use the DictReader
import StringIO
reader2 = csv.DictReader(StringIO.StringIO(txt),
                         fieldnames=['int', 'time'],
                         delimiter='|')
column1 = []
column2 = []
for row in reader2:
    column1.append(row['time'])
    column2.append(row['int'])
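Note that both readers hand back strings for every field; if you want the first column as integers, as the question asks, convert it explicitly when appending:

# csv always yields strings, so do the int conversion yourself
for row in reader2:
    column1.append(row['time'])
    column2.append(int(row['int']))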