tl;dr:
When writing to CSV, how do I ensure that the last element of a previous write and the first element of a subsequent write are separated by a comma rather than a newline?
I am currently trying to collect followers data for a list of Twitter users using Tweepy. In the code below, you can see that I'm using pagination, as some users have a lot of followers. I'm trying to put all the followers into one csv file per user; however, when I test this code and inspect the csv, I can see that there is only a newline between page writes, but no comma. I do not want improper CSV format to come back and bite me later in this project.
for page in tweepy.Cursor(api.followers_ids, screen_name=username).pages():
    with open(f'output/{username}.csv', 'a') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(page)
I've thought of enumerating the pages:
for i, page in enumerate(tweepy.Cursor(api.followers_ids,screen_name=username).pages()):
and doing something like: if i > 0, write a comma at the end of the file before appending the next page. This feels inefficient, though, as I'd have to open the file, write a ',', and close it each time this happens, and I need every second I can save for this project.
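One possible approach that sidesteps the per-page appends entirely (a sketch only, reusing the api and username objects from the code above): collect every page into a single list and let csv.writer write one row at the end, so it inserts all the commas for you.

import csv
import tweepy

# Sketch, assuming api and username are already defined as in the question's code
all_follower_ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name=username).pages():
    all_follower_ids.extend(page)          # flatten every page into one list

# newline='' is the csv module's recommended way to open output files
with open(f'output/{username}.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(all_follower_ids)      # a single comma-separated row

This trades memory for simplicity: all follower IDs for one user are held in a list before a single write, which also removes the repeated open/close inside the loop.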
Related
I have JSON data which I am pulling in via an API.
Here's my code:
# list of each api url to use
link = []

# for every id in accounts, create a new url and add it to the link list
for id in accounts:
    link.append('https://example.ie:0000/v123/accounts/' + id + '/users')

accountReq = []
for i in link:
    accountReq.append(requests.get(i, headers=headers).json())

with open('masterSheet.txt', 'x') as f:
    for each in accountReq:
        account = each['data']
        for data in account:
            list = (data['username'] + " " + " ", data['first_name'], data['last_name'])
        f.write(str(list) + "\n")
This pulls in the data, no problem.
If I do
print(data['username'] + " " + " ", data['first_name'], data['last_name'])
I get all of the data back, around 500 lines.
However, the problem I am having is that when I try to write to my file, it writes about 8 lines of data and then stops running with no errors.
I'm assuming it's due to the data size. How can I fix my issue of not writing all of the data to the .txt file?
Are you trying to write each data point to the file? Your f.write() call is outside the nested for loop, so you are actually only writing the last list tuple that you create for each account to the file.
You should move the f.write() inside the inner for loop if you intend to write every single data point into the file.
for data in account:
    list = (data['username'] + " " + " ", data['first_name'], data['last_name'])
    f.write(str(list) + "\n")
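As a side note, list shadows the built-in list type; a minimal sketch of the full corrected loop with a neutral variable name (same data as above, only renamed) could look like this:

with open('masterSheet.txt', 'x') as f:
    for each in accountReq:
        for data in each['data']:
            record = (data['username'] + " " + " ", data['first_name'], data['last_name'])
            f.write(str(record) + "\n")    # one line per user, not just the last one per account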
I'm new to Python and I need to perform the following two tasks in a .txt file which contains more than 500 lines with lots of information: dates, hours, comments, names, etc.
(1) Replace the substrings "p. m." and "a. m." with "PM" and "AM". (Already done.)
(2) I need to save the output into another file since I need to keep the original one. (This is the main issue).
I'm familiar with the concepts of open, read and close. But I have not solved this task yet:
with open('Dates of arrival.txt', 'r', encoding='utf-8') as file:
    filedata = file.read()
    filedata.replace("p.\xa0m.", "PM").replace("a.\xa0m.", "AM")  # This output is the one I want to save as a .txt file.
I know I have to open another file to contain the information, but the file 'dates of arrival1.txt' is empty.
with open('dates of arrival1.txt', 'w') as wf:
    wf.write(file)  # I am not sure if file is the correct word to put there.
So, the main problem is how to combine these two pieces of code into one in order to perform tasks (1) and (2) and save the output into a .txt file. It may not be as difficult as I think, but I need a little help on this one.
Thanks for the help.
Assuming you're happy with your string replace statements, the code can be simplified to the following:
with open('Dates of arrival.txt', 'r', encoding='utf-8') as file, open('dates of arrival1.txt', 'w') as wf:
    wf.write(file.read().replace("p.\xa0m.", "PM").replace("a.\xa0m.", "AM"))
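Two details are worth noting about the original attempt: str.replace() returns a new string rather than changing filedata in place, so its result has to be assigned somewhere, and wf.write() needs that string (not the file object) as its argument. A more step-by-step sketch of the same fix:

with open('Dates of arrival.txt', 'r', encoding='utf-8') as file:
    filedata = file.read()

# replace() returns a new string; re-assign it to keep the result
filedata = filedata.replace("p.\xa0m.", "PM").replace("a.\xa0m.", "AM")

with open('dates of arrival1.txt', 'w', encoding='utf-8') as wf:
    wf.write(filedata)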
My website just launched a new, simple component that contains a series of links. Every 24 hours, the links update/change based on an algorithm. I want to see how long a particular link stays in the component (because, based on the algorithm, a particular link may stay in the component for multiple days, or it may be present for just one day).
I'm working on building a Python crawler to crawl the frontend of the website where this new component is present, and I want to have a simple output likely in a CSV file with two columns:
Column 1: URL (the URL that was found within the component)
Column 2: #/days seen (The number of times the Python crawler saw that URL. If it crawls every day, this could be simply thought of as the #/days the crawler has seen that particular URL. So this number would be updated every time the crawler runs. Or, if it was the first time a particular URL was seen, the URL would simply be added to the bottom of the list with a "1" in this column)
How can this be achieved from an output perspective? I'm pretty new to Python, but I'm pretty sure I've got the crawling part covered to identify the links. I'm just not sure how to accomplish the output part, especially as it will update daily, and I want to keep the historical data of how many times the link has been seen.
You need to learn how to web scrape; I suggest using the Beautiful Soup package for that.
Your scraping script should then iterate over your csv file, incrementing the count for each URL it finds, or adding a new row if it isn't found.
Put this script in a cron job to run it once every 24 hours.
For the output part, you can do something like this:
from tempfile import NamedTemporaryFile
import shutil
import csv

links_found = []  # find the links here

filename = 'myfile.csv'  # the CSV that keeps the historical counts
# Open the temp file in text mode so csv.writer can write to it
tempfile = NamedTemporaryFile(mode='w', newline='', delete=False)

with open(filename, newline='') as csv_file, tempfile:
    reader = csv.reader(csv_file)
    writer = csv.writer(tempfile)

    # Increment existing links
    existing_links = []
    writer.writerow(next(reader))  # copy the header row across unchanged
    for row in reader:
        link = row[0]
        existing_links.append(link)
        times = int(row[1])
        if link in links_found:
            row[1] = str(times + 1)
        writer.writerow(row)

    # Add new links
    for link in links_found:
        if link not in existing_links:
            writer.writerow([link, 1])

# Replace the original file with the updated temp file
shutil.move(tempfile.name, filename)
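If the temporary-file shuffle feels heavier than needed, an alternative sketch (my own suggestion, assuming the CSV has no header row and that links_found comes from the crawl step) loads the counts into a dictionary, updates them in memory, and rewrites the file:

import csv
import os

counts = {}
# Read the existing counts, if the file already exists
if os.path.exists('myfile.csv'):
    with open('myfile.csv', newline='') as csv_file:
        for url, seen in csv.reader(csv_file):
            counts[url] = int(seen)

# Increment every link seen in today's crawl, adding new ones with a count of 1
for link in links_found:
    counts[link] = counts.get(link, 0) + 1

# Rewrite the whole file with the updated counts
with open('myfile.csv', 'w', newline='') as csv_file:
    csv.writer(csv_file).writerows(counts.items())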
I have a text file which contains text in the first 20 or so lines, followed by CSV data. Some of the text in the text section contains commas, so trying csv.reader or csv.DictReader on the whole file doesn't work well.
I want to skip past the text section and only then start to parse the CSV data.
Searches don't yield much other than instructions to either use csv.reader/csv.DictReader and iterate through the rows that are returned (which doesn't work because of the commas in the text), or to read the file line by line and split the lines using ',' as the delimiter.
The latter works up to a point, but it produces strings, not numbers. I could convert the strings to numbers but I'm hoping that there's a simple way to do this either with the csv or numpy libraries.
As requested - Sample data:
This is the first line. This is all just text to be skipped.
The first line doesn't always have a comma - maybe it's in the third line
Still no commas, or was there?
Yes, there was. And there it is again.
and so on
There are more lines but they finally stop when you get to
EndOfHeader
1,2,3,4,5
8,9,10,11,12
3, 6, 9, 12, 15
Thanks for the help.
Edit #2
A suggested answer gave the following link entitled Read file from line 2...
That's kind of what I'm looking for, but I want to be able to read through the lines until I find the "EndOfHeader" and then call on the CSV library to handle the remainder of the file.
The reply by saimadhu.polamuri is part of what I've tried, specifically
with open(filename, 'r') as f:
    first_line = f.readline()
    for line in f:
        # test if line equals EndOfHeader. If true then parse as CSV
But that's where it comes apart - I can't see how to have CSV work with the data from this point forward.
With thanks to #Mike for the suggestion, the code is actually reasonably straightforward.
import csv

with open('data.csv') as f:                # open the file
    for i in range(7):                     # loop over the first 7 lines
        f.readline()                       # just read and discard them (could also use next(f))
    r = csv.reader(f, delimiter=',')       # now pass the file handle to a csv reader
    for row in r:                          # and loop over the resulting rows
        print(row)                         # print the row, or do something else
In my actual code, it will search for the EndOfHeader line and use that to decide where to start parsing the CSV
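A rough sketch of that idea (assuming the header always ends with a line that reads exactly EndOfHeader, and converting each field to an int as the original question asks):

import csv

with open('data.csv') as f:
    for line in f:
        if line.strip() == 'EndOfHeader':           # skip everything up to and including the marker
            break
    for row in csv.reader(f):                       # the reader picks up right after the marker
        numbers = [int(field) for field in row]     # int() tolerates the stray spaces in the sample data
        print(numbers)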
I'm posting this as an answer, as the question that this one supposedly duplicates doesn't explicitly consider this issue of the file handle and how it can be passed to a CSV reader, and so it may help someone else.
Thanks to all who took time to help.
I am trying to write a script to automate browsing to my most commonly visited websites. I have put the websites into a list and am trying to open them using the webbrowser module in Python. My code looks like the following at the moment:
import webbrowser

f = open("URLs", "r")
list = f.readline()
for line in list:
    webbrowser.open_new_tab(list)
This only reads the first line from my file "URLs" and opens it in the browser. Could anyone please help me understand how I can read through the entire file and also open the URLs in different tabs?
I'm also open to other options that can help me achieve the same.
You have two main problems.
The first problem you have is that you are using readline and not readlines. readline will give you the first line in the file, while readlines gives you a list of your file contents.
Take this file as an example:
# urls.txt
http://www.google.com
http://www.imdb.com
Also, get into the habit of using a context manager, as this will close the file for you once you have finished reading from it. Right now, even though there is no real danger for what you are doing, you are leaving your file open.
Here is the information from the documentation on files. There is a mention of best practices for handling files and using with.
The second problem in your code is that, when you are iterating over list (which you should not use as a variable name, since it shadows the built-in list), you are passing list into your webbrowser call. This is definitely not what you are trying to do. You want to pass your loop variable.
So, taking all this into account, your final solution will be:
import webbrowser

with open("urls.txt") as f:
    for url in f:
        webbrowser.open_new_tab(url.strip())
Note the strip that is called in order to ensure that newline characters are removed.
You're not reading the file properly. You're only reading the first line. Also, assuming you were reading the file properly, you're still trying to open list, which is incorrect. You should be trying to open line.
This should work for you:
import webbrowser

with open('file name goes here') as f:
    all_urls = f.read().split('\n')

for each_url in all_urls:
    webbrowser.open_new_tab(each_url)
My answer assumes that you have one URL per line in the text file. If they are separated by spaces, simply change that line to all_urls = f.read().split(' '). If they're separated in another way, just change the split accordingly.
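For reference, a small variant (my own sketch, not part of the original answer) that uses splitlines() so blank lines or a trailing newline don't produce empty URLs:

import webbrowser

with open('file name goes here') as f:        # placeholder filename from the answer above
    for each_url in f.read().splitlines():
        if each_url.strip():                  # skip blank lines
            webbrowser.open_new_tab(each_url.strip())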