Strange error in Python (IndexError) - python

Sorry if this is a dumb question but I am having a few issues with my code.
I have a Python script that scrapes Reddit and sets the top picture as my desktop background.
I want it to only download if the picture is big enough, but I am getting a strange error.
>>> m = '1080x608'
>>> w = m.rsplit('x', 1)[0]
>>> print(w)
1080
>>> h = m.rsplit('x', 1)[1]
>>> print(h)
608
This works fine, but the following doesn't, despite being almost the same.
>>> m = '1280×721'
>>> w = m.rsplit('x', 1)[0]
>>> h = m.rsplit('x', 1)[1]
Traceback (most recent call last):
File "<pyshell#35>", line 1, in <module>
h = m.rsplit('x', 1)[1]
IndexError: list index out of range

In your second example, × is not the same as x; it is a multiplication sign. If you are getting these strings from somewhere and then parsing them, you should first do
m = m.replace('×', 'x')

× != x. Split returns a one-element list, and you are trying to retrieve the second element from it.
'1080x608'.rsplit('x', 1) # ['1080', '608']
'1280×721'.rsplit('x', 1) # ['1280\xc3\x97721']
In the second case there is no second element in the list; it contains only one element.
MCVE would be:
l = ['something']
l[1]
With exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
To make sure the string is split into exactly two parts, you can use str.partition. From its documentation:
Split the string at the first occurrence of sep, and return a 3-tuple
containing the part before the separator, the separator itself, and
the part after the separator. If the separator is not found, return a
3-tuple containing the string itself, followed by two empty strings.
w, sep, h = m.partition('x')
# h and sep will be empty if there is no separator in m
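Putting the two answers together, a minimal sketch (written for Python 3) might normalize the multiplication sign first and then use partition. The helper name parse_dimensions is hypothetical and not part of the original script:

```python
def parse_dimensions(text):
    """Return (width, height) as ints, or None if text is not 'WxH'."""
    # Treat the multiplication sign (U+00D7) as a plain 'x'.
    normalized = text.replace('\u00d7', 'x').lower()
    width, sep, height = normalized.partition('x')
    if not sep:
        # No separator found: partition returned (text, '', '').
        return None
    try:
        return int(width), int(height)
    except ValueError:
        # One side was empty or not a number.
        return None


print(parse_dimensions('1080x608'))
print(parse_dimensions('1280\u00d7721'))
print(parse_dimensions('1280'))
```

Returning None instead of raising lets the caller skip pictures whose size string cannot be parsed.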

Related

python - reading list with strings and convert it to int() but keep specific format

I have a file full of strings which I read into a list. Now I'd like to find a specific line (for example the first line below) by looking for .../002/..., then add 5 to that 002 to get /007/, in order to find my next line containing /007/.
The file looks like this
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MYD021KM/2018/002/MYD021KM.A2018002.1345.006.2018003152137.hdf
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MYD021KM/2018/004/MYD021KM.A2018004.1345.006.2018005220045.hdf
With this I could identify, for example, the first line:
match = re.findall("/(\d{3})/", data_time_filtered[i])
The problem now is: how do I convert the strings to integers while keeping the format 00X? Is this approach correct?
match_conv = ["{WHAT's in HERE?}".format(int(i)) for i in match]
EDIT according to suggested answers below:
So apparently there's no way to directly read the numbers in the string and keep them as they are?
Adding 0s to the number with zfill and the other suggested functions makes it more complicated, as /00x/ should remain at most 3 digits (they represent days of the year). So I was looking for an efficient way to keep the numbers from the string as they are and make them "math-able".
We can first define a function that adds an integer to a string and returns a string, padded with zeros to keep the same length:
def add_to_string(s, n):
    total = int(s) + n
    return '{:0{}}'.format(total, len(s))
add_to_string('003', 2)
#'005'
add_to_string('00030', 12 )
#'00042'
We can then use re.sub with a replacement function. We use the regex r"(?<=/)\d{3}(?=/)" that matches a group of 3 digits, preceded and followed by /, without including them in the match.
The replacement function takes a match as parameter, and returns a string. You could hardcode it, like this:
import re
def add_5_and_replace(match):
    return add_to_string(match.group(0), 5)
url = 'https://nasa.gov/archive/allData/6/MYD021KM/2018/002/MYD021KM.hdf'
new = re.sub(r"(?<=/)\d{3}(?=/)", add_5_and_replace, url)
print(new)
# https://nasa.gov/archive/allData/6/MYD021KM/2018/007/MYD021KM.hdf
But it could be better to pass the value to add. Either use a lambda:
def add_and_replace(match, n=1):
    return add_to_string(match.group(0), n)
url = 'https://nasa.gov/archive/allData/6/MYD021KM/2018/002/MYD021KM.hdf'
new = re.sub(r"(?<=/)\d{3}(?=/)", lambda m: add_and_replace(m, n=5), url)
Or a partial function. A complete solution could then be:
import re
from functools import partial
def add_to_string(s, n):
    total = int(s)+n
    return '{:0{}}'.format(total, len(s))
def add_and_replace(match, n=1):
    return add_to_string(match.group(0), n)
url = 'https://nasa.gov/archive/allData/6/MYD021KM/2018/002/MYD021KM.hdf'
new = re.sub(r"(?<=/)\d{3}(?=/)", partial(add_and_replace, n=3), url)
print(new)
# https://nasa.gov/archive/allData/6/MYD021KM/2018/005/MYD021KM.hdf
If you only want to add the default value 1 to your number, you can simply write
new = re.sub(r"(?<=/)\d{3}(?=/)", add_and_replace, url)
print(new)
# https://nasa.gov/archive/allData/6/MYD021KM/2018/003/MYD021KM.hdf
Read about mini format language here:
c = "{:03}".format(25) # format a number to 3 digits, fill with 0
print(c)
Output:
025
An int cannot be 001 or 002; it can only be 1 or 2.
You can get a similar result by using strings.
>>> "3".zfill(3)
'003'
>>> "33".zfill(3)
'033'
>>> "33".rjust(3, '0')
'033'
>>> int('033')
33
>>> a = 3
>>> a.zfill(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'zfill'
Or you can use rjust and ljust:
>>> '2'.ljust(3,'0')
'200'
>>> '2'.rjust(3,'0')
'002'
>>>
Or:
>>> '{0:03d}'.format(2)
'002'
Or:
>>> format(2, '03')
'002'
Or:
>>> "%03d" % 2
'002'
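On the follow-up in the question about keeping the numbers "math-able": one possible reading is that the three-digit field only needs to be an int while comparing or adding, and the padded form only matters when building a new string. A rough Python 3 sketch under that assumption, using the two URLs from the question (the names DAY_FIELD, day_of and find_by_day are hypothetical):

```python
import re

# Three digits enclosed by slashes, as in the answer's regex.
DAY_FIELD = re.compile(r"(?<=/)\d{3}(?=/)")


def day_of(url):
    """Return the day-of-year field of url as an int, or None if absent."""
    match = DAY_FIELD.search(url)
    return int(match.group(0)) if match else None


def find_by_day(urls, day):
    """Return the urls whose day-of-year field equals day (an int)."""
    return [url for url in urls if day_of(url) == day]


urls = [
    "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MYD021KM/2018/002/MYD021KM.A2018002.1345.006.2018003152137.hdf",
    "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MYD021KM/2018/004/MYD021KM.A2018004.1345.006.2018005220045.hdf",
]

first_day = day_of(urls[0])                    # 2
next_urls = find_by_day(urls, first_day + 2)   # the /004/ entry
print(first_day, next_urls)
```

Comparing ints sidesteps the padding question entirely; formatting with '{:03}' is only needed when a new /00X/ string has to be written out.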

Search through directory for items with multiple criteria

I'm trying to write some code that searches through a directory and pulls out all the items that start with certain numbers (defined by a list) and that end with '.labels.txt'. This is what I have so far.
lbldir = '/musc.repo/Data/shared/my_labeled_images/labeled_image_maps/'
picnum = []
for ii in os.listdir(picdir):
num = ii.rstrip('.png')
picnum.append(num)
lblpath = []
for file in os.listdir(lbldir):
if fnmatch.fnmatch(file, '*.labels.txt') and fnmatch.fnmatch(file, ii in picnum + '.*'):
lblpath.append(os.path.abspath(file))
Here is the error I get
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-a03c65e65a71> in <module>()
3 lblpath = []
4 for file in os.listdir(lbldir):
----> 5 if fnmatch.fnmatch(file, '*.labels.txt') and fnmatch.fnmatch(file, ii in picnum + '.*'):
6 lblpath.append(os.path.abspath(file))
TypeError: can only concatenate list (not "str") to list
I realize the ii in picnum part won't work but I don't know how to get around it. Can this be accomplished with the fnmatch module or do I need regular expressions?
The error comes because you are trying to add ".*" (a string) to the end of picnum, which is a list, and not a string.
Also, ii in picnum isn't giving you back each item of picnum, because you are not iterating over ii. It just has the last value that it was assigned in your first loop.
Instead of testing both at once with the and, you might have a nested test that operates when you find a file matching .labels.txt, as below. This uses re instead of fnmatch to extract the digits from the beginning of the file name, instead of trying to match each picnum. This replaces your second loop:
import re
for file in os.listdir(lbldir):
    if file.endswith('.labels.txt'):
        startnum = re.match(r"\d+", file)
        if startnum and startnum.group(0) in picnum:
            lblpath.append(os.path.abspath(file))
I think that should work, but it is obviously untested without your actual file names.
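For reference, the same idea can be wrapped in a function and tried against throwaway files. This is only a sketch in Python 3: the function name find_label_files and the demo file names are made up, and it joins each name with lbldir on the assumption that paths inside that directory are wanted (os.path.abspath on a bare file name resolves against the current working directory instead).

```python
import os
import re
import tempfile


def find_label_files(lbldir, picnum):
    """Return paths of '*.labels.txt' files whose leading digits are in picnum."""
    wanted = set(picnum)
    matches = []
    for name in sorted(os.listdir(lbldir)):
        if not name.endswith('.labels.txt'):
            continue
        startnum = re.match(r"\d+", name)
        if startnum and startnum.group(0) in wanted:
            # Assumption: the caller wants paths inside lbldir.
            matches.append(os.path.join(lbldir, name))
    return matches


# Demo with empty files standing in for the real directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('12.labels.txt', '13.labels.txt', '12.notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    found = [os.path.basename(p) for p in find_label_files(demo_dir, ['12'])]

print(found)  # ['12.labels.txt']
```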

Value Error : invalid literal for int() with base 10: ''

I'm new to Python and I don't know why I sometimes get this error.
This is the code:
import random
sorteio = []
urna = open("urna.txt")
y = 1
while y <= 50:
sort = int(random.random() * 392)
print sort
while sort > 0:
x = urna.readline()
sort = sort - 1
print x
sorteio = sorteio + [int(x)]
y = y + 1
print sorteio
Where urna.txt is a file in this format:
1156
459
277
166
638
885
482
879
33
559
I'll be grateful if anyone knows why this error appears and how to fix it.
Upon attempting to read past the end of the file, you're getting an empty string '' which cannot be converted to an int.
>>> int('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
To satisfy the requirement of selecting 50 random lines from the text file, if I understand your problem correctly:
import random
with open("urna.txt") as urna:
    sorteio = [int(line) for line in urna] # all lines of the file as ints
selection = random.sample(sorteio, 50)
print selection
.readline() returns an empty string when you come to the end of the file, and that is not a valid number.
Test for it:
if x.strip(): # not empty apart from whitespace
    sorteio = sorteio + [int(x)]
You appear to be appending to a list; lists have a method for that:
sorteio.append(int(x))
If you want to get a random sample from your file, there are better methods. One is to read all values, then use random.sample(), or you can pick values as you read the file line by line all the while adjusting the likelihood the next line is part of the sample. See a previous answer of mine for a more in-depth discussion on that subject.
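The second approach mentioned above, picking values while reading and adjusting the likelihood as you go, is commonly known as reservoir sampling. The linked answer is not reproduced here, so the following is a generic Python 3 sketch of that technique rather than the answerer's own code; sample_lines is a hypothetical name and the StringIO object stands in for open("urna.txt").

```python
import io
import random


def sample_lines(lines, k):
    """Reservoir sampling: pick up to k items from an iterable of unknown length."""
    reservoir = []
    for index, line in enumerate(lines):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Keep the new item with probability k / (index + 1).
            j = random.randint(0, index)
            if j < k:
                reservoir[j] = line
    return reservoir


urna = io.StringIO("1156\n459\n277\n166\n638\n")  # stand-in for the real file
numbers = (int(line) for line in urna if line.strip())  # skip blank lines
selection = sample_lines(numbers, 3)
print(selection)
```

Only k values are held in memory at any time, which matters for large files; for a file of a few hundred lines, reading everything and calling random.sample() is simpler.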

In what order does Python resolve functions? (why does string.join(lst.append('a')) fail?)

How does string.join resolve? I tried using it as below:
import string
list_of_str = ['a','b','c']
string.join(list_of_str.append('d'))
But got this error instead (exactly the same error in 2.7.2):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/string.py", line 318, in join
return sep.join(words)
TypeError
The append does happen, as you can see if you try to join list_of_str again:
print string.join(list_of_str)
-->'a b c d'
here's the code from string.py (couldn't find the code for the builtin str.join() for sep):
def join(words, sep = ' '):
"""join(list [,sep]) -> string
Return a string composed of the words in list, with
intervening occurrences of sep. The default separator is a
single space.
(joinfields and join are synonymous)
"""
return sep.join(words)
What's going on here? Is this a bug? If it's expected behavior, how does it resolve/why does it happen? I feel like I'm either about to learn something interesting about the order in which python executes its functions/methods OR I've just hit a historical quirk of Python.
Sidenote: of course it works to just do the append beforehand:
list_of_str.append('d')
print string.join(list_of_str)
-->'a b c d'
list_of_str.append('d')
does not return the new list_of_str.
The method append has no return value and so returns None.
To make it work you can do this:
>>> import string
>>> list_of_str = ['a','b','c']
>>> string.join(list_of_str + ['d'])
Although that is not very Pythonic, and there is no need to import string. Calling join on the separator itself is better (a single space here, to match the default separator of string.join):
>>> list_of_str = ['a','b','c']
>>> ' '.join(list_of_str + ['d'])
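To make the order of evaluation concrete, here is a small Python 3 illustration (the variable names result, error and joined are only for the demo). The argument expression is evaluated first, so the append really happens, but what gets passed on to join is its return value, None:

```python
list_of_str = ['a', 'b', 'c']

# Arguments are evaluated before the call, so append() runs first...
result = list_of_str.append('d')

# ...but it mutates the list in place and returns None.
print(result)        # None
print(list_of_str)   # ['a', 'b', 'c', 'd']

# Joining None is what raises the TypeError.
error = None
try:
    ' '.join(result)
except TypeError as exc:
    error = exc

# Joining the (already mutated) list works as expected.
joined = ' '.join(list_of_str)
print(joined)        # a b c d
```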

Python: needs more than 1 value to unpack

What am I doing wrong to get this error?
replacements = {}
replacements["**"] = ("<strong>", "</strong>")
replacements["__"] = ("<em>", "</em>")
replacements["--"] = ("<blink>", "</blink>")
replacements["=="] = ("<marquee>", "</marquee>")
replacements["##"] = ("<code>", "</code>")
for delimiter, (open_tag, close_tag) in replacements: # error here
    message = self.replaceFormatting(delimiter, message, open_tag, close_tag)
The error:
Traceback (most recent call last):
File "", line 1, in
for doot, (a, b) in replacements:
ValueError: need more than 1 value to unpack
All the value tuples have two values, right?
It should be:
for delimiter, (open_tag, close_tag) in replacements.iteritems(): # or .items() in py3k
I think you need to call .items() like the third example in this link
for delimiter, (open_tag, close_tag) in replacements.items():
    message = self.replaceFormatting(delimiter, message, open_tag, close_tag)
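The reason the original loop fails is that iterating over a dict yields only its keys, so Python tries to unpack the two-character key "**" into delimiter and (open_tag, close_tag). A short Python 3 demonstration with a trimmed-down copy of the dict (the variables keys, error and pairs exist only for this demo; the exact error wording differs between Python versions):

```python
replacements = {
    "**": ("<strong>", "</strong>"),
    "__": ("<em>", "</em>"),
}

# Iterating a dict directly yields its keys only.
keys = list(replacements)
print(keys)  # ['**', '__']

# Unpacking a key like "**" gives delimiter='*', then fails on the nested target.
error = None
try:
    for delimiter, (open_tag, close_tag) in replacements:
        pass
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError

# .items() yields (key, value) pairs, which the nested target can unpack.
pairs = [(delimiter, open_tag, close_tag)
         for delimiter, (open_tag, close_tag) in replacements.items()]
print(pairs)
```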
