Simple question: I've got this code.
I want to read rows with DictReader from the csv module, cast each entry to float, and put it in the data list. At the end of the scan I want to print the first 10 elements of the list, but I get an error about the visibility of data.
with open(train, "r") as traincsv:
    trainreader = csv.DictReader(traincsv)
    for row in trainreader:
        data = [float(row['Sales'])]
print(data[:10])
If I put the print inside the for loop, like this:
with open(train, "r") as traincsv:
    trainreader = csv.DictReader(traincsv)
    for row in trainreader:
        data = [float(row['Sales'])]
        print(data[:10])
It prints all the entries, not just the first 10.
You are overwriting data every time in the for loop. This is the source of your problem.
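Please upload an example input and I will try it, but I believe the code below will fix your problem by appending to data instead of overwriting it.
Also, it is good practice to keep only the file-reading work inside the with block: the rows have to be consumed while the file is still open, and everything else can happen after the block ends.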
# Declare data as an empty list before iterating
data = []
# Open the file; the reader must be consumed while the file is still open
with open(train, "r") as traincsv:
    trainreader = csv.DictReader(traincsv)
    # Iterate through all of trainreader, appending one float per row
    for row in trainreader:
        data.append(float(row['Sales']))
# The file is closed here; print the first 10 results
print(data[:10])
Equivalent ways of appending a value to a list (the first builds a new list on every iteration, so prefer the other two):
data = data + [float(row['Sales'])]
data += [float(row['Sales'])]
data.append(float(row['Sales']))
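A minimal, self-contained sketch of the corrected pattern, assuming a Sales column as in the question. It uses io.StringIO with made-up rows in place of the real train file; with a real file, keep the loop inside the with open(...) block. The itertools.islice variant is an optional extra for when you only ever need the first 10 values, since it stops reading after 10 rows.

```python
import csv
import io
import itertools

# Hypothetical stand-in for the real training file; only the 'Sales' column matters here
sample_csv = io.StringIO("Store,Sales\n" + "".join(f"{n},{n * 1.5}\n" for n in range(1, 16)))

data = []
reader = csv.DictReader(sample_csv)
for row in reader:
    # Append one float per row instead of rebinding `data` on every iteration
    data.append(float(row['Sales']))

print(data[:10])

# If only the first 10 values are needed, islice stops consuming rows after 10
sample_csv.seek(0)
first_ten = [float(row['Sales']) for row in itertools.islice(csv.DictReader(sample_csv), 10)]
print(first_ten)
```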
I have code that generates strings from 000000000000 to ffffffffffff, which are written to a file.
I'm trying to implement a check for whether the program was closed, so that I can read the last value from the file (say, 00000000781B) and continue the for loop from there.
The variable attempt in for attempt in to_attempt: is a tuple and always starts from zero.
Is it possible to continue the for loop from a specified value?
import itertools
f = open("G:/empty/last.txt", "r")
lines = f.readlines()
rand_string = str(lines[0])
f.close()
letters = '0123456789ABCDEF'
print(rand_string)
for length in range(1, 20):
    to_attempt = itertools.product(letters, repeat=length)
    for attempt in to_attempt:
        gen_string = rand_string[length:] + ''.join(attempt)
        print(gen_string)
You have to store a value in a file to keep track of where you left off. I'm assuming the main loop running from 000000000000 to ffffffffffff is the to_attempt one. All you need to do is store the loop position in a file; you can use a new variable to keep track of it.
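import itertools
try:
    with open('save.txt', 'r') as reader:
        save = int(reader.read())
except FileNotFoundError:
    save = 0
# rest of the code
# itertools.product has no len(), so skip the first `save` items with islice
for i, attempt in enumerate(itertools.islice(to_attempt, save, None), start=save):
    # Open in 'w' mode to overwrite the checkpoint, and write a string, not an int
    with open('save.txt', 'w') as writer:
        writer.write(str(i))
    # rest of the code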
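Because the strings are just fixed-width hexadecimal numbers, another option is to skip itertools.product entirely and count with integers: convert the saved string back with int(..., 16) and format each number with a zero-padded hex format. This is a different technique from the counter above, sketched under the assumption that every string is 12 uppercase hex digits; the helper names and the temporary checkpoint path are hypothetical, not the question's G:/empty/last.txt.

```python
import itertools
import os
import tempfile

WIDTH = 12  # 000000000000 .. FFFFFFFFFFFF

def hex_strings_from(start, width=WIDTH):
    """Yield fixed-width uppercase hex strings, beginning with `start` itself."""
    for n in range(int(start, 16), 16 ** width):
        yield format(n, '0{}X'.format(width))

def load_checkpoint(path, default='0' * WIDTH):
    """Return the last saved string, or `default` if nothing has been saved yet."""
    try:
        with open(path) as f:
            return f.read().strip() or default
    except FileNotFoundError:
        return default

def save_checkpoint(path, value):
    # 'w' mode truncates the file, so it always holds just the latest value
    with open(path, 'w') as f:
        f.write(value)

# Hypothetical checkpoint location for this demo
checkpoint = os.path.join(tempfile.mkdtemp(), 'last.txt')
save_checkpoint(checkpoint, '00000000781B')

# Resuming repeats the saved value once, then continues from the next one
resumed = list(itertools.islice(hex_strings_from(load_checkpoint(checkpoint)), 3))
print(resumed)  # ['00000000781B', '00000000781C', '00000000781D']

# The integer order matches itertools.product over the same 16 symbols
letters = '0123456789ABCDEF'
via_product = [''.join(t) for t in itertools.islice(itertools.product(letters, repeat=3), 0x7A, 0x7D)]
print(via_product == list(itertools.islice(hex_strings_from('07A', width=3), 3)))  # True
```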
I am pulling 10 tweets from Twitter using tweepy and storing them in a CSV to display later on the front-end webpage. My code refreshes every 60 minutes, and sometimes I get an IndexError.
Here is the exact error:
return ks[5]
IndexError: List index out of range
Here is the function that retrieves that particular tweet from the CSV:
def tweet6(self):
    with codecs.open('HELLOTWITTER.csv', 'r', encoding='utf-8', errors='ignore') as f:
        reader = csv.reader(f)
        d = {}
        for i, row in enumerate(reader):
            d[row[0]]=row[1:]
            if (i>=10):
                break
    ks=list(d)
    return (ks[5])
This error occurs only occasionally, and I can't figure out why, because the CSV has all 10 tweets written to it every time the code refreshes.
Also, surprisingly, if I run the code once more, the error disappears and the webpage loads the tweets without any issues!
What am I missing?
Any help is much appreciated! Thanks!
As Ken White pointed out in the comments above, the error is caused by trying to access an index that is outside the bounds of the list.
One possible cause is a blank row in your CSV file: csv.reader returns a blank line as an empty list, so row[0] does not exist and Python raises IndexError.
To guard against that, skip rows that have no elements before indexing them:
if len(row) < 1:
    continue
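Another place to look is the dictionary itself. list(d) returns a list of the dictionary's keys (the first column of each row, in insertion order on Python 3.7+), so ks[5] is the 6th distinct key. Because d is keyed on row[0], rows that share the same first column overwrite each other, and together with skipped blank rows that can leave fewer than six keys, which is exactly when ks[5] raises IndexError.
Since the error comes and goes between runs, it is also worth checking whether the page sometimes reads the CSV while the refresh job is still rewriting it; at that moment fewer rows would be available.
To fix this, keep ks = list(d) (d[5] would not work, because d is a dictionary keyed by strings and would raise KeyError) and check the length of ks before indexing it.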
Your whole snippet should look something like this:
import codecs
import csv
def tweet6(self):
    with codecs.open('HELLOTWITTER.csv', 'r', encoding='utf-8', errors='ignore') as f:
        reader = csv.reader(f)
        d = {}
        for i, row in enumerate(reader):
            # Skip blank rows, which csv.reader returns as empty lists
            if len(row) < 1:
                continue
            # Map the first column to the rest of the row's values
            d[row[0]] = row[1:]
            # If past row 10 then break
            if i >= 10:
                break
    # d is a dict, so take its keys as a list
    ks = list(d)
    # Return the 6th tweet, or None if fewer than 6 distinct rows were read
    return ks[5] if len(ks) > 5 else None
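A small sketch that makes the failure mode explicit, assuming the tweet text is in the first column: it skips blank rows, ignores repeated first-column values (which would collapse in a dict anyway), keeps at most 10 keys, and returns None instead of indexing blindly when fewer rows were available. The helper name nth_tweet and the sample rows are hypothetical; io.StringIO stands in for HELLOTWITTER.csv.

```python
import csv
import io

def nth_tweet(f, n, limit=10):
    """Return the first column of the n-th (0-based) distinct non-blank row, or None."""
    keys = []
    for row in csv.reader(f):
        if not row:            # a blank line comes back as an empty list
            continue
        if row[0] in keys:     # a repeated key would overwrite the earlier one in a dict
            continue
        keys.append(row[0])
        if len(keys) >= limit:
            break
    return keys[n] if n < len(keys) else None

sample = io.StringIO("tweet a,1\n\ntweet b,2\ntweet c,3\ntweet a,4\ntweet d,5\ntweet e,6\ntweet f,7\n")
print(nth_tweet(sample, 5))   # tweet f

short = io.StringIO("tweet a,1\ntweet b,2\n")
print(nth_tweet(short, 5))    # None: only two rows were available
```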
My code:
import csv
with open('file.csv', 'r') as information:
    reader = csv.reader(information, delimiter=';')
    for row in reader:
        print('Highest score is: {} on {} by {}'.format(row[2], row[1], row[0]))
Contents of the CSV file:
Anton;12-05-2016;29
Douwe Bob;13-05-2016;44
Anton;11-05-2016;39
Douwe Bob;12-05-2016;55
Anton;10-05-2016;29
Douwe Bob;11-05-2016;69
When I run the program, all the lines are printed, not just the one with the max score. I tried max(row[2]), but it doesn't seem to work; there must be something I'm doing wrong. Can anyone help me out?
I only want to print the line with the max score in row[2].
To anyone about to say "we won't do your homework": this isn't homework, I just want to improve my programming skills!
def get_score_from_row(row):
    return float(row[-1])
with open('file.csv', 'r') as information:
    max_score_row = max(csv.reader(information, delimiter=';'), key=get_score_from_row)
This should give you the row with the highest score.
Here's how I worked it out. I created a new list to store the numbers and then, since each row is itself a list, took the number at its known position and stored it in the new list.
import csv
# create a new list to store only the numbers
my_list = []
with open('C:\\Users\\mnickey\\Documents\\names.txt', 'r') as information:
    reader = csv.reader(information, delimiter=';')
    for row in reader:
        # used to confirm that each row from the reader is a list
        print(type(row))
        # assuming that you know where the numbers are in the file;
        # convert to int so max() compares numbers, not strings
        num = int(row[2])
        # append this to the list
        my_list.append(num)
print(max(my_list))
I hope this helps. Note that it's likely not as advanced or efficient as others might post but it should work.
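A self-contained sketch combining both ideas: max() with a key function returns the whole row, and converting the score to int makes the comparison numeric (as strings, '100' would lose to '69' because '1' sorts before '6'). io.StringIO holds the rows from the question in place of file.csv.

```python
import csv
import io

# The rows from the question, fed through StringIO instead of file.csv
information = io.StringIO(
    "Anton;12-05-2016;29\n"
    "Douwe Bob;13-05-2016;44\n"
    "Anton;11-05-2016;39\n"
    "Douwe Bob;12-05-2016;55\n"
    "Anton;10-05-2016;29\n"
    "Douwe Bob;11-05-2016;69\n"
)

reader = csv.reader(information, delimiter=';')
# key=... compares scores as integers and max() returns the full row
best = max(reader, key=lambda row: int(row[2]))
print('Highest score is: {} on {} by {}'.format(best[2], best[1], best[0]))
# Highest score is: 69 on 11-05-2016 by Douwe Bob

# Why the int conversion matters once scores have different lengths
print(max(['69', '100']), max(['69', '100'], key=int))  # 69 100
```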
I'm using openpyxl in Python, and I'm trying to run through 50k lines, grab data from each row, and write it to a file. However, what I'm finding is that it runs incredibly slowly the further I get into it. The first 1k lines go super fast, in less than a minute, but after that each batch of 1k lines takes longer and longer.
I was opening an .xlsx file. Would it be faster to read a .txt file as CSV, or a JSON file, or to convert the data into some other format that reads faster?
I have 20 unique values in one column, and each row pairs one of them with a random value. For each unique value, I'm trying to build one string containing all of the values that go with it:
Value1: 1243,345,34,124,
Value2: 1243,345,34,124,
etc, etc
I'm running through the list of values and checking whether a file with that name exists. If it does, I append the new value to that file; if it doesn't, I create the file and then reopen it for appending. I keep a dictionary that maps each name to its append-mode file object, so whenever I want to write something I look up the name and write to that file, instead of opening a new file every time.
The first 1k took less than a minute; now I'm on records 4k to 5k, and it has already been running for 5 minutes. It seems to take longer as the record count goes up, and I wonder how to speed it up. It's not printing to the console at all.
writeFile = 1
theDict = {}
for row in ws.iter_rows(rowRange):
    for cell in row:
        #grabbing the value
        theStringValueLocation = "B" + str(counter)
        theValue = ws[theStringValueLocation].value
        theName = cell.value
        textfilename = theName + ".txt"
        if os.path.isfile(textfilename):
            listToAddTo = theDict[theName]
            listToAddTo.write("," + theValue)
            if counter == 1000:
                print "1000"
                st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
        else:
            writeFileName = open(textfilename, 'w')
            writeFileName.write(theValue)
            writeFileName = open(textfilename, 'a')
            theDict[theName] = writeFileName
        counter = counter + 1
I added some timestamps to the code above (that part isn't shown), and you can see the output below. The problem is that each 1k run takes longer and longer: 2 minutes the first time, then 3 minutes, then 5, then 7. By the time it hits 50k, I'm worried it will take an hour or more, which is far too long.
1000
2016-02-25 15:15:08
2000
2016-02-25 15:17:07
3000
2016-02-25 15:20:52
2016-02-25 15:25:28
4000
2016-02-25 15:32:00
5000
2016-02-25 15:40:02
6000
2016-02-25 15:51:34
7000
2016-02-25 16:03:29
8000
2016-02-25 16:18:52
9000
2016-02-25 16:35:30
10000
Some things I should make clear. First, I don't know the names of the values ahead of time. Maybe I should run through and grab those in a separate Python script to make this go faster?
Second, I need a string of all the values separated by commas; that's why I put them into a text file to grab later. I was thinking of using a list, as was suggested to me, but I'm wondering whether that will have the same problem. I think the problem has to do with reading from Excel. Any way I can get a comma-separated string out of it is fine; I can do it another way.
Or maybe I could use try/except instead of checking for the file every time, and if there is an error, assume I need to create a new file? Maybe checking whether the file exists on every row is what makes it so slow?
This question is a continuation of my original one, and I took some suggestions from there: What is the fastest performance tuple for large data sets in python?
I think what you're trying to do is get a key out of column B of the row, and use that for the filename to append to. Let's speed it up a lot:
from collections import defaultdict
Value_entries = defaultdict(list)  # dict of lists of row data
for row in ws.iter_rows(rowRange):
    key = row[1].value
    Value_entries[key].extend([cell.value for cell in row])
# All done. Now write files:
for key, values in Value_entries.items():
    with open(key + '.txt', 'w') as f:
        # join() needs strings, so convert numeric cell values first
        f.write(','.join(str(v) for v in values))
It looks like you only want cells from column B. In that case you can pass min_col and max_col to ws.iter_rows() (older openpyxl releases also offered ws.get_squared_range() for this) to restrict the number of cells to look at.
for row in ws.iter_rows(min_col=2, max_col=2, min_row=1, max_row=ws.max_row):
    for cell in row:  # each row is always a sequence
        filename = cell.value
        if os.path.isfile(filename):
            …
It's not clear what's happening with the else branch of your code but you should probably be closing any files you open as soon as you have finished with them.
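If you do want to keep one append handle per name open while looping, as the question does, contextlib.ExitStack is one way to guarantee they all get closed, and checking the dictionary instead of calling os.path.isfile avoids a filesystem lookup on every row. This is a sketch with hypothetical (name, value) pairs and a temporary output directory; it writes the first value as-is and prefixes later ones with a comma, matching the question's format.

```python
import contextlib
import os
import tempfile

pairs = [('Value1', 1243), ('Value2', 7), ('Value1', 345), ('Value1', 34)]  # hypothetical rows
outdir = tempfile.mkdtemp()

with contextlib.ExitStack() as stack:
    handles = {}  # name -> open file object, each opened exactly once
    for name, value in pairs:
        if name in handles:
            handles[name].write(',' + str(value))
        else:
            path = os.path.join(outdir, name + '.txt')
            # enter_context registers the file so the stack closes it later
            handles[name] = stack.enter_context(open(path, 'w'))
            handles[name].write(str(value))
# Leaving the with block closes every file registered with the stack

with open(os.path.join(outdir, 'Value1.txt')) as f:
    print(f.read())  # 1243,345,34
```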
Based on the other question you linked to, and the code above, it appears you have a spreadsheet of name-value pairs. The name is in column A and the value is in column B. A name can appear multiple times in column A, and there can be a different value in column B each time. The goal is to create a list of all the values that show up for each name.
First, a few observations on the code above:
counter is never initialized. Presumably it is initialized to 1.
open(textfilename,...) is called twice without closing the file in between. Calling open allocates some memory to hold data related to operating on the file. The memory allocated for the first open call may not get freed until much later, maybe not until the program ends. It is better practice to close files when you are done with them (see using open as a context manager).
The looping logic isn't correct. Consider:
First iteration of inner loop:
for cell in row:                        # cell refers to A1
    valueLocation = "B" + str(counter)  # valueLocation is "B1"
    value = ws[valueLocation].value     # value gets contents of cell B1
    name = cell.value                   # name gets contents of cell A1
    textfilename = name + ".txt"
    ...
    opens file with name based on contents of cell A1, and
    writes value from cell B1 to the file
    ...
    counter = counter + 1               # counter = 2
But each row has at least two cells, so on the second iteration of the inner loop:
for cell in row:                        # cell now refers to cell B1
    valueLocation = "B" + str(counter)  # valueLocation is "B2"
    value = ws[valueLocation].value     # value gets contents of cell B2
    name = cell.value                   # name gets contents of cell B1
    textfilename = name + ".txt"
    ...
    opens file with name based on contents of cell "B1"   <<<< wrong file
    writes the value of cell "B2" to the file              <<<< wrong value
    ...
    counter = counter + 1               # counter = 3 when cell B1 is processed
Repeat for each of the 50K rows. Depending on how many unique values are in column B, the program could end up trying to keep hundreds or thousands of files open (named after the contents of cells A1, B1, A2, B2, ...), which makes it very slow or crashes it.
Each item yielded by iter_rows() is a tuple of the cells in that row.
As people suggested in the other question, use a dictionary of lists to store the values and write them all out at the end. (I'm using Python 3.5, so you may have to adjust this if you are using 2.7.)
Here is a straightforward solution:
from collections import defaultdict
data = defaultdict(list)
# gather the values into lists associated with each name
# data will look like { 'name1':['value1', 'value42', ...],
#                       'name2':['value7', 'value23', ...],
#                       ...}
for row in ws.iter_rows():
    name = row[0].value
    value = row[1].value
    data[name].append(value)
for key, valuelist in data.items():
    # turn the list of values into one long comma-separated string
    # e.g., ['value1', 'value42', ...] => 'value1,value42, ...'
    # (str() is needed because join() fails on numeric cell values)
    value = ",".join(str(v) for v in valuelist)
    with open(key + ".txt", "w") as f:
        f.write(value)
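To check the grouping logic without a spreadsheet, here is a pure-Python sketch of the same defaultdict approach. The rows are hypothetical tuples standing in for cell values; newer openpyxl releases can yield plain values via ws.iter_rows(values_only=True), and load_workbook(..., read_only=True) is also worth trying for large files, but neither is used below, so openpyxl is not required to run it.

```python
from collections import defaultdict

def group_values(rows):
    """Map each name (first column) to a comma-separated string of its values (second column)."""
    data = defaultdict(list)
    for name, value, *_ in rows:          # ignore any extra columns
        data[name].append(str(value))     # str() so numbers can be joined
    return {name: ",".join(values) for name, values in data.items()}

# Hypothetical rows standing in for the spreadsheet's (column A, column B) values
rows = [("Value1", 1243), ("Value2", 1243), ("Value1", 345), ("Value2", 345), ("Value1", 34)]
print(group_values(rows))
# {'Value1': '1243,345,34', 'Value2': '1243,345'}
```

Every row is visited once and each output file would be opened once at the end, so the cost grows linearly with the number of rows instead of slowing down as more files accumulate.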
I have a script that consumes a bus-location API, and I am trying to parse the lat/lng fields, which are float objects. I keep receiving this error:
row.append(Decimal(items['longitude'].encode('utf-16')))
AttributeError: 'float' object has no attribute 'encode'
# IMPORTS
from decimal import *
#Make Python understand how to read things on the Internet
import urllib2
#Make Python understand the stuff in a page on the Internet is JSON
import simplejson as json
from decimal import Decimal
# Make Python understand csvs
import csv
# Make Python know how to take a break so we don't hammer API and exceed rate limit
from time import sleep
# tell computer where to put CSV
outfile_path='C:\Users\Geoffrey\Desktop\pycharm1.csv'
# open it up, the w means we will write to it
writer = csv.writer(open(outfile_path, 'wb'))
#create a list with headings for our columns
headers = ['latitude', 'longitude']
#write the row of headings to our CSV file
writer.writerow(headers)
# GET JSON AND PARSE IT INTO DICTIONARY
# We need a loop because we have to do this for every JSON file we grab
#set a counter telling us how many times we've gone through the loop, this is the first time, so we'll set it at 1
i=1
#loop through pages of JSON returned, 100 is an arbitrary number
while i<100:
    #print out what number loop we are on, which will make it easier to track down problems when they appear
    print i
    #create the URL of the JSON file we want. We search for 'egypt', want English tweets,
    #and set the number of tweets per JSON file to the max of 100, so we have to do as little looping as possible
    url = urllib2.Request('http://api.metro.net/agencies/lametro/vehicles' + str(i))
    #use the JSON library to turn this file into a Pythonic data structure
    parsed_json = json.load(urllib2.urlopen('http://api.metro.net/agencies/lametro/vehicles'))
    #now you have a giant dictionary.
    #Type in parsed_json here to get a better look at this.
    #You'll see the bulk of the content is contained inside the value that goes with the key, or label "results".
    #Refer to results as an index. Just like list[1] refers to the second item in a list,
    #dict['results'] refers to values associated with the key 'results'.
    print parsed_json
    #run through each item in results, and jump to an item in that dictionary, ex: the text of the tweet
    for items in parsed_json['items']:
        #initialize the row
        row = []
        #add every 'cell' to the row list, identifying the item just like an index in a list
        #if latitude is not None:
            #latitude = str(latitude)
        #if longitude is not None:
            #longitude = str(longitude)
        row.append(Decimal(items['longitude'].encode('utf-16')))
        row.append(Decimal(items['latitude'].encode('utf-16')))
        #row.append(bool(services['predictable'].unicode('utf-8')))
        #once you have all the cells in there, write the row to your csv
        writer.writerow(row)
    #increment our loop counter, now we're on the next time through the loop
    i = i +1
    #tell Python to rest for 5 secs, so we don't exceed our rate limit
    sleep(5)
encode is available only for strings. In your case items['longitude'] is a float, and float doesn't have an encode method. You can cast it to a string first and then call encode.
You can write it like this:
str(items['longitude']).encode('utf-16')
str(items['latitude']).encode('utf-16')
However, Decimal cannot parse a UTF-16-encoded string, so for the Decimal call drop .encode() and pass str(items['longitude']) directly.
encode is a method that strings have, not floats.
Change row.append(Decimal(items['longitude'].encode('utf-16'))) to row.append(Decimal(str(items['longitude']))), and do the same for the latitude line. Converting to str is all Decimal needs; the UTF-16 encoding step is unnecessary and produces a value Decimal cannot parse.
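A Python 3 sketch of the fixed loop, using a hypothetical JSON payload in the shape the question's code expects (an items list of objects with latitude and longitude) and an in-memory CSV instead of the real API and output file. It writes latitude before longitude to match the header row; the question's loop appends longitude first, which swaps the two columns relative to its headers. Going through str() matters because Decimal built directly from a float keeps the float's exact binary value.

```python
import csv
import io
import json
from decimal import Decimal

# Hypothetical payload shaped like {'items': [{'latitude': ..., 'longitude': ...}, ...]}
payload = '{"items": [{"latitude": 34.05, "longitude": -118.25}, {"latitude": 33.9, "longitude": -118.4}]}'
parsed_json = json.loads(payload)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['latitude', 'longitude'])
for item in parsed_json['items']:
    # The values are already floats: no .encode() needed, just str() -> Decimal
    writer.writerow([Decimal(str(item['latitude'])), Decimal(str(item['longitude']))])

print(out.getvalue())

# Decimal(float) is exact binary, which is usually not what you want here
print(Decimal(0.1) == Decimal('0.1'))  # False
```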