How can I solve this error? (Python crawling)

I have been crawling Flickr data for two weeks, and until now the crawl had been working well.
Today, however, when I ran the Python code in Windows PowerShell, this error occurred:
Traceback (most recent call last):
  File "getdata_tag.py", line 3, in <module>
    nsid= info["owner"]["nsid"];
TypeError: string indices must be integers, not str
How can I modify this code?
I will add the code here.

It looks like either info["owner"] or info itself is a string, not a dictionary.
Check which of the two is the case (for example, by printing type(info) and type(info["owner"])). If info is a string, remove ["owner"]["nsid"]; if only info["owner"] is a string, remove just ["nsid"].
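The check described above can be sketched as follows. This is only an illustration under assumptions: the helper name extract_nsid is hypothetical, the sample NSID values are made up, and the idea that a string payload might be unparsed JSON text is a guess about the crawler, since the original code was never posted.

```python
import json


def extract_nsid(info):
    """Return the owner's NSID from a photo-info payload, or None.

    Hypothetical helper: handles a dict, a JSON string, or a payload
    whose "owner" field is already a plain string.
    """
    # Assumption: a string payload may be JSON text that was never parsed.
    if isinstance(info, str):
        try:
            info = json.loads(info)
        except ValueError:
            return None  # not JSON at all, e.g. an HTML error page
    if not isinstance(info, dict):
        return None
    owner = info.get("owner")
    if isinstance(owner, dict):
        return owner.get("nsid")
    # Assumption: if "owner" is a string, it is the NSID itself.
    if isinstance(owner, str):
        return owner
    return None


print(extract_nsid({"owner": {"nsid": "123@N00"}}))  # made-up sample value
print(extract_nsid("<html>rate limited</html>"))
```

Returning None instead of raising lets a long-running crawl log the bad response and move on, which may matter if the service started returning error pages after two weeks of requests.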

Related

This simple piece of code gives an error when it gets looped for the second time

import random
import string
for x in range(0,15):
    print "something"
    string= str(random.choice(string.letters)+str(random.randint(100,10000))+random.choice(string.letters)+str(random.randint(0,100)))
    print string
Why does this code throw an error when it runs for the second time inside the for loop? I have no idea how it works perfectly for the first time and throws this error:
something
J6554r15
something
Traceback (most recent call last):
  File "C:\Users\test\Desktop\soooo.py", line 8, in <module>
    string= str(random.choice(string.letters)+str(random.randint(100,10000))+random.choice(string.letters)+str(random.randint(0,100)))
AttributeError: 'str' object has no attribute 'letters'
What am I missing here?

You assign to the variable string inside the loop, which rebinds the name that previously referred to the imported string module. On the second iteration, string is therefore an actual str rather than the module, so string.letters no longer exists. Use a different variable name.
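A minimal sketch of the renamed version, written for Python 3 (the question's code is Python 2). The names random_token and token are illustrative choices, and string.ascii_letters stands in for the Python 2 string.letters, which Python 3 does not have.

```python
import random
import string


def random_token():
    """Build letter + number + letter + number, as in the question."""
    letters = string.ascii_letters  # available in both Python 2 and 3
    return (random.choice(letters) + str(random.randint(100, 10000))
            + random.choice(letters) + str(random.randint(0, 100)))


for _ in range(3):
    # The result is bound to 'token', not 'string', so the module
    # name is still intact on every later iteration.
    token = random_token()
    print(token)
```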

Why does ord() fail when porting from Python 2 to Python 3?

I am trying to port a Python library called heroprotocol from Python 2 to Python 3. This library is used to parse replay files from an online game called Heroes of the Storm, for the purpose of getting data from the file (i.e. who played against who, when did they die, when did the game end, who won, etc).
It seems that this library was created for Python 2, and since I am using Python 3 (specifically Anaconda, Jupyter notebook) I would like to convert it to Python 3.
The specific issue I am having is that when I run
header = protocol.decode_replay_header(mpq.header['user_data_header']['content'])
which should get some basic data about the replay file, I get this error:
TypeError: ord() expected string of length 1, but int found
I googled the ord() function and found a few posts about the usage of ord() in Python 3, but none of them solved the issue I am having. I also tried posting in the "Issues" section on Github, but I got no response yet.
Why am I seeing this error?

According to the issue you raised, the exception occurs on line 69 of decoders.py:
self._next = ord(self._data[self._used])
The obvious reason this would succeed in Python 2 but fail in Python 3 is that self._data is a bytestring. In Python 2, bytestrings are the "standard" string objects, so that indexing into one returns the character at that position (itself a string) …
# Python 2.7
>>> b'whatever'[3]
't'
… and calling ord() on the result behaves as expected:
>>> ord(b'whatever'[3])
116
However, in Python 3, everything is different: the standard string object is a Unicode string, and bytestrings are instead sequences of integers. Because of this, indexing into a bytestring returns the relevant integer directly …
# Python 3.6
>>> b'whatever'[3]
116
… so calling ord() on that integer makes no sense:
>>> ord(b'whatever'[3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected string of length 1, but int found
So, you ought to be able to prevent the specific exception you're asking about here by simply removing the call to ord() on that and similar lines:
self._next = self._data[self._used]
… although of course it's likely that further problems (out of scope for this question) will be revealed as a result.
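If the ported code still has to run on both Python versions, one possible approach is a small helper that only calls ord() when indexing returned a string. The name byte_at is hypothetical and is not part of heroprotocol.

```python
def byte_at(data, index):
    """Return the integer value of the byte at data[index]."""
    value = data[index]
    # Python 3: indexing bytes already yields an int.
    # Python 2: it yields a one-character str, so ord() is still needed.
    return value if isinstance(value, int) else ord(value)


print(byte_at(b'whatever', 3))
```

With such a helper, the line from decoders.py could read self._next = byte_at(self._data, self._used), under the assumption that self._data is always a bytestring.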

Unicodedata.normalize() ValueError: invalid normalization form

I'm trying to take foreign language text and output a human-readable, filename-safe equivalent. After looking around, it seems like the best option is unicodedata.normalize(), but I can't get it to work. I've tried putting the exact code from some answers here and elsewhere, but it keeps giving me this error. I only got one success, when I ran:
unicodedata.normalize('NFD', '\u00C7')
'C\u0327'
But every other time, I get an error. Here's my code I've tried:
unicodedata.normalize('NFKD', u'\u2460') #error, not sure why. Look same as above.
s = 'ذهب الرجل'
unicodedata.normalize('NKFC',s) #error
unicodedata.normalize('NKFD', 'ñ') #error
Specifically, the error I get is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid normalization form
I don't understand why this isn't working. All of these are strings, which means they are unicode in Python 3. I tried encoding them using .encode(), but then normalize() said it only takes arguments of string, so I know that can't be it. I'm seriously at a loss because even code I'm copying from here seems to error out. What's going on here?

Looking at unicodedata.c, the only way you can get that error is if you enter an invalid form string. The valid values are "NFC", "NFKC", "NFD", and "NFKD", but you seem to be using values with the "F" and "K" switched around:
>>> import unicodedata
>>>
>>> unicodedata.normalize('NKFD', 'ñ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid normalization form
>>>
>>> unicodedata.normalize('NFKD', 'ñ')
'ñ'
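With a valid form name, the decomposed text can be reduced further toward the filename-safe output the question asks about. The sketch below is one common approach, not the only one; strip_accents is a hypothetical helper name. It removes combining marks after NFKD decomposition and does not transliterate non-Latin scripts such as Arabic.

```python
import unicodedata


def strip_accents(text):
    """Decompose with NFKD, then drop combining marks such as U+0303."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("ñ"))
print(strip_accents("\u00C7"))
print(strip_accents("\u2460"))
```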

How to format a write statement in Python?

I have data that I want to print to a file. For missing data, I wish to print the mean of the actual data. However, the mean is calculated to more than the required 4 decimal places. How can I write the mean to the file and format it at the same time?
I have tried the following, but keep getting errors:
outfile.write('{0:%.3f}'.format(str(mean))+"\n")

First, remove the % since it makes your format syntax invalid. See a demonstration below:
>>> '{:%.3f}'.format(1.2345)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid conversion specification
>>> '{:.3f}'.format(1.2345)
'1.234'
>>>
Second, don't wrap mean in str(), since the f format code expects a number (f stands for fixed-point float formatting) and is not defined for strings. Below is a demonstration of this bug:
>>> '{:.3f}'.format('1.2345')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
>>> '{:.3f}'.format(1.2345)
'1.234'
>>>
Third, the +"\n" is unnecessary since you can put the "\n" in the string you used on str.format.
Finally, as shown in my demonstrations, you can remove the 0 since it is redundant.
In the end, the code should be like this:
outfile.write('{:.3f}\n'.format(mean))
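To show the corrected call in context, here is a small sketch that fills missing entries with the mean while writing. It assumes missing values are marked with None and that at least one real value exists; write_values is a hypothetical name, and io.StringIO stands in for the real output file.

```python
import io


def write_values(outfile, values):
    """Write one value per line with 3 decimals; None becomes the mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)  # assumes at least one real value
    for v in values:
        outfile.write('{:.3f}\n'.format(mean if v is None else v))


buffer = io.StringIO()  # stand-in for open('out.txt', 'w')
write_values(buffer, [1.0, None, 2.5])
print(buffer.getvalue())
```

For the 4 decimal places mentioned in the question, the format specification would be '{:.4f}\n' instead.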

You don't need to convert to string using str(). Also, the "%" is not required. Just use:
outfile.write('{0:.3f}'.format(mean)+"\n")

First of all, the formatting of your string has nothing to do with your write statement. You can reduce your problem to:
string = '{0:%.3f}'.format(str(mean))+"\n"
outfile.write(string)
Then, your string specification is incorrect and should be:
string = '{0:.3f}\n'.format(mean)

outfile.write('{:.3f}\n'.format(mean))

ValueError: Unknown format code 'g' for object of type 'str'

I am new to Python and I am trying to write a simple print function but I am getting a strange error. This is my code:
#! /usr/bin/env python3.2
import numpy as np
a=np.arange(1,10,1)
print(a)
for i in a:
    print(i)
    print(type(i))
    print("{0:<12g}".format(i))
The output is:
[1 2 3 4 5 6 7 8 9]
1
<class 'numpy.int64'>
Traceback (most recent call last):
  File "./test.py", line 9, in <module>
    print("{0:<12g}".format(i))
ValueError: Unknown format code 'g' for object of type 'str'
Why does print take the "numpy.int64" as a string? I have to add that it works perfectly for a normal list: (e.g. [1,2,3,4]) I would be most grateful to any ideas on this issue, thanks ;-).

This is a known bug and should be fixed in version 2.0. In the interim, you can use the old syntax %f that works.
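The interim workaround can be sketched as below. Two options are shown: the old % syntax, and converting the value to a built-in float before calling str.format. The helper name format_general is hypothetical, and a plain int is used as input so the snippet runs without NumPy; the conversion is assumed to work the same way for a numpy.int64.

```python
def format_general(value, width=12):
    """Left-align value in a field of the given width using the 'g' code."""
    # float() turns NumPy scalars and plain ints into a built-in float,
    # which str.format knows how to handle.
    return "{0:<{1}g}".format(float(value), width)


for i in [1, 2, 3]:  # stand-in for iterating over np.arange(1, 10, 1)
    print("%-12g|" % i)              # old-style formatting
    print(format_general(i) + "|")   # str.format after conversion
```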

Someone will be able to give you a more in-depth answer, but what I think is happening here is that "{0:<12g}".format(i) is not treating the numpy.int64 value as a number. Escaping the braces would not help: str.format escapes literal braces by doubling them ({{ and }}), not with backslashes, and the braces here are meant to be a replacement field anyway. Converting the value first, as in "{0:<12g}".format(float(i)), should avoid the error.
