I'm new to Python and trying to build a web scraper with Scrapy, and I am getting a lot of non-printing characters and blank strings in the results. I'm attempting to iterate through a dictionary with a for loop where the values are lists, then run the .strip() method to get rid of all the non-printing characters. Only now I get this error instead: "TypeError: list indices must be integers or slices, not str". I know I must be reaching into the object wrong, but after a few days of sifting through docs and similar exceptions I haven't found a way to resolve it yet.
The code I'm using is:
# -*- coding: utf-8 -*-
import scrapy
from ..items import JobcollectorItem
from ..AutoCrawler import searchIndeed
class IndeedSpider(scrapy.Spider):
    name = 'indeed'
    page_number = 2
    start_urls = [searchIndeed.current_page_url]
    def parse(self, response):
        items = JobcollectorItem()
        position = response.css('.jobtitle::text').extract()
        company = response.css('span.company::text').extract()
        location = response.css('.location::text').extract()
        # print(position[0])
        items['position'] = position
        items['company'] = company
        items['location'] = location
        for key in items.keys():
            prestripped = items[key]
            for object in prestripped:
                object = object.strip('\n')
            items[key] = prestripped
        yield items
I'm using python 3.7.4. Any tips on simplifying the function to get rid of the nested for loops would also be appreciated. The code for the entire project can be found here.
Thanks for the help!
Edit0:
The exception is thrown at line 27 reading:
" prestripped = items[key][value]
TypeError: list indices must be integers or slices, not str"
Edit1:
The data structure is items{'key':[list_of_strings]} where the dictionary name is items, the keys are strings, and each key's value is a list, with each list element being a string.
Edit2:
Updated the code to reflect Alex.Kh's answer. Also, here is an approximation of what is currently getting returned: {company: ['\nCompany Name', '\n', '\nCompany Name', '\n', '\n', '\n',], location: ['Some City, US', 'Some City, US'], position: [' ', '\n', '\nPosition Name', ' ', ' Position Name']}
In addition to my comment, I think I know how to simplify and fix your code as well.
...
for key in items.keys():
    restripped = items[key]
    #BEWARE: a novice mistake here as object is just a copy
    for object in restripped: #assuming extract() returns a list
        object=object.strip() # will change a temporary copy
    items[key] = restripped
...
I am not sure why exactly you needed a value in your loop, so you could also just say for key in items.keys():. Your main mistake was probably accessing the dictionary incorrectly (items[key][value] -> items[key], as value is actually the value that corresponds to that key).
Edit: I spotted a huge mistake on my part in the for loop. The loop variable is only a name bound to each element in turn, so the statement object=object.strip() rebinds that name and will not affect the actual list. Guess not using Python for a while does make you forget certain features.
I will leave the incorrect solution as a reminder both to me and others. The correct way to use the strip() method is as follows:
...
#correct solution
for key in items.keys():
    restripped = items[key]
    for i,object in enumerate(restripped):
        # alternatively: restripped[i]=restripped[i].strip()
        restripped[i]=object.strip()
    items[key] = restripped
...
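Since the question also asks about avoiding the nested loops, here is a minimal sketch using a comprehension. It assumes the data behaves like a plain dict of lists of strings (a regular dict stands in for the Scrapy item here, and the name clean_values is made up for illustration). It also drops entries that are empty after stripping, which may or may not be what you want.

```python
def clean_values(items):
    """Return a new dict with every string stripped and empty strings removed."""
    return {
        key: [text.strip() for text in values if text.strip()]
        for key, values in items.items()
    }


# Sample data modeled on the output shown in the question.
scraped = {
    'company': ['\nCompany Name', '\n', '\nCompany Name', '\n'],
    'location': ['Some City, US', 'Some City, US'],
    'position': [' ', '\n', '\nPosition Name', ' ', ' Position Name'],
}
cleaned = clean_values(scraped)
print(cleaned)
```

Building a new dict avoids mutating the lists while iterating over them, at the cost of calling strip() twice per string.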
So this is what I'm trying to do:
input: ABCDEFG
Desired output:
***DEFG
A***EFG
AB***FG
ABC***G
ABCD***
and this is the code I wrote:
def loop(input):
    output = input
    for index in range(0, len(input)-3): #column length
        output[index:index +2] = '***'
        output[:index] = input[:index]
        output[index+4:] = input[index+4:]
        print output + '\n'
But I get the error: TypeError: 'str' object does not support item assignment
You cannot modify the contents of a string; you can only create a new string with the changes. So instead of the function above you'd want something like this:
def loop(input):
    for index in range(0, len(input)-2): #one line per position of '***'
        output = input[:index] + '***' + input[index+3:]
        print output
Strings are immutable. You can not change the characters in a string, but have to create a new string. If you want to use item assignment, you can transform it into a list, manipulate the list, then join it back to a string.
def loop(s):
    for index in range(0, len(s) - 2):
        output = list(s) # create list from string
        output[index:index+3] = list('***') # replace sublist
        print(''.join(output)) # join list to string and print
Or, just create a new string from slices of the old string combined with '***':
        output = s[:index] + "***" + s[index+3:] # create new string directly
        print(output) # print string
Also note that there seemed to be a few off-by-one errors in your code, and you should not use input as a variable name, as it shadows the builtin function of the same name.
In Python, strings are immutable - once they're created they can't be changed. That means that unlike a list you cannot assign to an index to change the string.
string = "Hello World"
string[0] # => "H" - getting is OK
string[0] = "J" # !!! ERROR !!! Can't assign to the string
In your case, I would make output a list: output = list(input) and then turn it back into a string when you're finished: return "".join(output)
In Python you can't assign values to specific indexes in a string, so you will probably want to use concatenation instead. Something like:
for index in range(0, len(input)-2):
    output = input[:index]
    output += "***"
    output += input[index+3:]
You're going to want to watch the bounds though. Slicing past the end of a string does not raise an error (it just returns an empty string), but the index+4 in the question skips one character too many and len(input)-3 stops one line short, so use index+3 and len(input)-2.
Strings are immutable, so they don't support item assignment the way a list does. You could use str.join to concatenate slices of your string, creating a new string for each iteration:
def loop(inp):
    return "\n".join([inp[:i]+"***"+inp[i+3:] for i in range(len(inp)-2)])
inp[:i] gets the first slice, which is an empty string on the first iteration and grows by one character each iteration. inp[i+3:] gets a slice starting three characters past the current index i, also moving across the string one character at a time. You then just need to concatenate both slices with your *** string.
In [3]: print(loop("ABCDEFG"))
***DEFG
A***EFG
AB***FG
ABC***G
ABCD***
I'm trying to debug some Python 2.7.3 code to loop through a list of items and convert each to a string:
req_appliances = ['9087000CD', 'Olympus', 185]
for i in range(0, len(req_appliances)):
    req_appliances[i] = str(req_appliances[i])
print req_appliances
The output is as follows:
['9087000CD', 'Olympus', '185']
In the example above, I've set the value of req_appliances explicitly to test the loop. In the actual code, req_appliances is an argument to a function. I do not know the type of the argument at runtime, but it appears to be a list of scalar values. I do know that when I invoke the function, I see the following error message:
File ".../database.py", line 8277, in get_report_appliance_list
req_appliances[i] = str(req_appliances[i])
TypeError: 'str' object does not support item assignment
I'm trying to deduce for what values of argument req_appliances it would be possible for this error condition to arise. It seems to me that all of the values are scalar and each (even if immutable) should be a valid LHS expression in an assignment. Is there something I'm missing here? Here is the code in context, in the function in which it is defined:
def get_report_appliance_list(self, req_appliances, filter_type=None):
    appliances = {}
    appliance_list = []
    if filter_type != None:
        if filter_type not in ('appliances', 'servers'):
            raise ValueError("appliance filter_type must be one of 'appliances' or 'servers'")
    active_con = self.get_con()
    if active_con is None:
        raise Exception('No database connections are available.')
    con = None
    in_expr_items = ''
    if req_appliances != None:
        # Create a string like '(%s, %s, ...)' to represent
        # the 'in' expression items in the SQL.
        print(req_appliances)
        for i in range(0, len(req_appliances)):
            req_appliances[i] = str(req_appliances[i])
            in_expr_items += '%s,'
        in_expr_items = '(' + in_expr_items[:-1] + ') '
A str acts like a sequence type (you can iterate over it), but strings in Python are immutable, so you can't assign new values to any of the indices.
I expect what's happening here is that you're trying to run this when req_appliances is a str object.
I came up with two ways to fix this:
First, just check if it's a str before you iterate over it:
if isinstance(req_appliances, basestring):
    return req_appliances
Second, you could check each item to see if it's already a string before trying to assign to it.
req_appliances = ['9087000CD', 'Olympus', 185]
for i in range(0, len(req_appliances)):
    if req_appliances[i] != str(req_appliances[i]):
        req_appliances[i] = str(req_appliances[i])
print req_appliances
Here, I'm actually checking whether the member is equal to its string representation. This is true when you iterate over strings.
>>> a = 'a'
>>> a[0] == str(a[0])
True
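To make the failure mode concrete, here is a small Python 3 sketch (the question uses Python 2.7, and the helper name normalize_appliances is hypothetical). It assumes the caller may pass either a single string or a list of scalars, and it builds a new list rather than assigning into the argument, so a str argument can no longer trigger the item-assignment error.

```python
def normalize_appliances(req_appliances):
    """Return a list of strings for a single string or an iterable of scalars."""
    if isinstance(req_appliances, str):
        # A bare string is treated as one appliance, not a sequence of characters.
        return [req_appliances]
    return [str(item) for item in req_appliances]


# Reproduce the original error: item assignment on a str raises TypeError.
try:
    value = '9087000CD'
    value[0] = str(value[0])
except TypeError as exc:
    error_message = str(exc)

from_list = normalize_appliances(['9087000CD', 'Olympus', 185])
from_string = normalize_appliances('9087000CD')
print(error_message)
print(from_list, from_string)
```

Whether a bare string should be wrapped, rejected, or returned as-is depends on what the callers expect; wrapping is just one option.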
This is not really an answer to your question, but a piece of style advice. If you happen to use for i in range(0, len(something)) a lot, you should use for i, obj in enumerate(something), map(func, something), or a list comprehension [func(x) for x in something] instead.
Another red flag is the use of string += inside a loop. It is better to create a list and join it. This also eliminates the need to do stuff like [:-1] in order to get rid of trailing commas.
Regarding your code you could simplify it a lot:
def get_report_appliance_list(self, req_appliances, filter_type=None):
    appliances = {}
    appliance_list = []
    if filter_type not in (None, 'appliances', 'servers'):
        raise ValueError("appliance filter_type must be one of 'appliances' or 'servers'")
    active_con = self.get_con()
    if active_con is None:
        raise Exception('No database connections are available.')
    # Create a string like '(%s, %s, ...)' to represent
    # the 'in' expression items in the SQL.
    in_expr_items = '(' + ','.join(['%s'] * len(req_appliances)) + ') '
    req_appliances = map(str, req_appliances)
...
Apart from that I would recommend that get_con() throws so you do not have to check for None in your code.
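As a rough illustration of the join-based approach above, the following Python 3 sketch builds the placeholder string and the stringified parameters together. The function name build_in_clause is invented for this example, and it assumes the database driver uses %s placeholders as in the question.

```python
def build_in_clause(values):
    """Return a '(%s,%s,...)' placeholder string and the values as strings."""
    params = [str(value) for value in values]
    placeholders = '(' + ','.join(['%s'] * len(params)) + ')'
    return placeholders, params


placeholders, params = build_in_clause(['9087000CD', 'Olympus', 185])
print(placeholders)
print(params)
```

Because a new list is returned, the caller's argument is never modified, which also sidesteps the item-assignment error from the question.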
I am getting an error here and I am wondering if any of you can see where I went wrong. I am pretty much a beginner in python and can not see where I went wrong.
temp = int(temp)^2/key
for i in range(0, len(str(temp))):
    final = final + chr(int(temp[i]))
"temp" is made up of numbers. "key" is also made of numbers. Any help here?
First, you defined temp as an integer (also, in Python, ^ isn't the "power" symbol. You're probably looking for **):
temp = int(temp)^2/key
But then you treated it as a string:
chr(int(temp[i]))
        ^^^^^^^
Was there another string named temp? Or are you looking to extract the ith digit, which can be done like so:
str(temp)[i]
final = final + chr(int(temp[i]))
On that line temp is still a number, so use str(temp)[i]
EDIT
>>> temp = 100 #number
>>> str(temp)[0] #convert temp to string and access i-th element
'1'
>>> int(str(temp)[0]) #convert character to int
1
>>> chr(int(str(temp)[0]))
'\x01'
>>>
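Putting both answers together, here is a short sketch with a made-up value for temp. It only shows the difference between ^ and ** and how to get at the digits of an integer; what the original loop should do with each digit is not clear from the question. Note also that / binds more tightly than ^, so int(temp)^2/key is parsed as int(temp) ^ (2/key).

```python
temp = 12

xor_result = temp ^ 2     # bitwise XOR: 0b1100 ^ 0b0010 == 0b1110
power_result = temp ** 2  # exponentiation

# Convert to a string first; an int itself cannot be indexed.
digits = [int(char) for char in str(power_result)]

print(xor_result, power_result, digits)
```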
I'm both learning Python and trying to get GitHub issues into a readable form. Using the advice on How can I convert JSON to CSV?, I came up with this:
import json
import csv
f = open('issues.json')
data = json.load(f)
f.close()
f = open("issues.csv", "wb+")
csv_file = csv.writer(f)
csv_file.writerow(["gravatar_id", "position", "number", "votes", "created_at", "comments", "body", "title", "updated_at", "html_url", "user", "labels", "state"])
for item in data:
    csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])
Where "issues.json" is the JSON file containing my GitHub issues. When I try to run that, I get
File "foo.py", line 14, in <module>
csv_file.writerow([item["gravatar_id"], item["position"], item["number"], item["votes"], item["created_at"], item["comments"], item["body"], item["title"], item["updated_at"], item["html_url"], item["user"], item["labels"], item["state"]])
TypeError: string indices must be integers
What am I missing here? Which are the "string indices"? I'm sure that once I get this working I'll have more issues, but for now, I'd just love for this to work!
When I tweak the for statement to simply
for item in data:
    print item
what I get is ... "issues" -- so I'm doing something more basic wrong. Here's a bit of my JSON content:
{"issues": [{"gravatar_id": "44230311a3dcd684b6c5f81bf2ec9f60", "position": 2.0, "number": 263, "votes": 0, "created_at": "2010/09/17 16:06:50 -0700", "comments": 11, "body": "Add missing paging (Older>>) links...
when I print data, it looks like it is getting munged really oddly:
{u'issues': [{u'body': u'Add missing paging (Older>>) lin...
The variable item is a string. An index looks like this:
>>> mystring = 'helloworld'
>>> print mystring[0]
h
The above example uses the 0 index of the string to refer to the first character.
Strings can't have string indices (like dictionaries can). So this won't work:
>>> mystring = 'helloworld'
>>> print mystring['stringindex']
TypeError: string indices must be integers
item is most likely a string in your code; the string indices are the ones in the square brackets, e.g., gravatar_id. So I'd first check your data variable to see what you received there; I guess that data is a list of strings (or at least a list containing at least one string) while it should be a list of dictionaries.
TypeError for Slice Notation str[a:b]
Short Answer
Use a colon : instead of a comma , in between the two indices a and b in str[a:b]:
my_string[0,5] # wrong ❌
my_string[0:5] # correct ✅
Long Answer
When working with strings and slice notation (a common sequence operation), it can happen that a TypeError is raised, pointing out that the indices must be integers, even if they obviously are.
Example
>>> my_string = "Hello, World!"
>>> my_string[0,5]
TypeError: string indices must be integers
We obviously passed two integers for the indices to the slice notation, right? So what is the problem here?
This error can be very frustrating - especially at the beginning of learning Python - because the error message is a little bit misleading.
Explanation
We implicitly passed a tuple of two integers to the slice notation when we called my_string[0,5]. 0,5 evaluates to the same tuple as (0,5) does - even without the parentheses. Why though?
A trailing comma , is actually enough for the Python interpreter to evaluate something as a tuple:
>>> my_variable = 0,
>>> type(my_variable)
<class 'tuple'>
So what we did there, this time explicitly:
>>> my_string = "Hello, World!"
>>> my_tuple = 0, 5
>>> my_string[my_tuple]
TypeError: string indices must be integers
Now, at least, the error message makes sense.
Solution
We need to replace the comma , with a colon : to separate the two integers correctly, not having them interpreted as a tuple:
>>> my_string = "Hello, World!"
>>> my_string[0:5]
'Hello'
A clearer and more helpful error message could have been something like:
TypeError: string indices must be integers not tuple
                                               ^^^^^
                                        (actual type here)
A good error message should show the user directly what they did wrong! With this kind of information it would have been much easier to find the root cause and solve the problem - and you wouldn't have had to come here.
So next time, when you find yourself responsible for writing error description messages, remind yourself of this example and add the reason (or other useful information) to the error message! Help other people (or maybe even your future self) to understand what went wrong.
Lessons learned
slice notation uses colons : to separate its indices (and step range, i.e., str[from:to:step])
tuples are defined by commas , (i.e., t = 1,)
add some information to error messages for users to understand what went wrong
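The difference between the two spellings can be checked directly. This is only a small sketch; the exact wording of the error message varies between Python versions, so the code stores it rather than assuming a particular text.

```python
my_string = "Hello, World!"

# A comma inside the brackets builds a tuple, which str cannot be indexed with.
try:
    my_string[0, 5]
except TypeError as exc:
    tuple_error = str(exc)

# A colon builds a slice object; both spellings below are equivalent.
with_colon = my_string[0:5]
with_slice = my_string[slice(0, 5)]

print(tuple_error)
print(with_colon, with_slice)
```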
data is a dict object. So, iterate over it like this:
Python 2
for key, value in data.iteritems():
    print key, value
Python 3
for key, value in data.items():
    print(key, value)
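For the GitHub issues case specifically, the list of issue dicts sits under the "issues" key, so that is what the loop needs to iterate over. Below is a Python 3 sketch under that assumption. The JSON here is a tiny stand-in with made-up title and state values and only a few of the fields, and an in-memory buffer replaces issues.csv so the example runs on its own.

```python
import csv
import io
import json

# A tiny stand-in for issues.json with the same top-level shape as in the question.
raw = '{"issues": [{"number": 263, "votes": 0, "title": "Add paging", "state": "open"}]}'
data = json.loads(raw)

fields = ["number", "votes", "title", "state"]
buffer = io.StringIO()  # an in-memory file keeps the sketch self-contained
writer = csv.writer(buffer)
writer.writerow(fields)

# Iterate over the list stored under "issues", not over the dict's keys.
for item in data["issues"]:
    writer.writerow([item[field] for field in fields])

csv_text = buffer.getvalue()
print(csv_text)
```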
I had a similar issue with Pandas. You need to use the iterrows() function to iterate through a Pandas DataFrame (see the Pandas documentation for iterrows):
import pandas as pd
data = pd.read_csv('foo.csv')
for index, item in data.iterrows():
    print('{} {}'.format(item["gravatar_id"], item["position"]))
Note that you need to handle the index that the function also returns for each row.
As a rule of thumb, when I receive this error in Python I compare the function signature with the function execution.
For example:
def print_files(file_list, parent_id):
    for file in file_list:
        print('title: %s, id: %s' % (file['title'], file['id']))
So if I call this function with the parameters in the wrong order, passing the list as the 2nd argument and a string as the 1st argument:
print_files(parent_id, list_of_files) # <----- Accidentally switching arguments location
The function will try to iterate over the parent_id string instead of file_list and it will expect to see the index as an integer pointing to the specific character in string and not an index which is a string (title or id).
This will lead to the TypeError: string indices must be integers error.
Because Python is dynamically typed (as opposed to languages like Java, C# or TypeScript), it will not warn you about this mistake before the code runs.
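One way to catch a swapped call earlier is an explicit check plus keyword arguments. This is only a sketch: format_files is a hypothetical variant of the function above that returns the lines instead of printing them, and the sample file data is made up.

```python
def format_files(file_list, parent_id):
    """Return one line per file; fail early if the arguments look swapped."""
    if isinstance(file_list, str):
        raise TypeError("file_list must be a list of dicts, got a str")
    return ['title: %s, id: %s' % (f['title'], f['id']) for f in file_list]


list_of_files = [{'title': 'report.txt', 'id': 'abc123'}]

# Keyword arguments make the call independent of parameter order.
lines = format_files(parent_id='root', file_list=list_of_files)
print(lines)
```

With the check in place, a call like format_files('root', list_of_files) fails with a message that names the actual problem.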
How to read the first element of this JSON?
when the file appears like this
for i in data[1]:
    print("Testing"+i['LocalObservationDateTime'])
This is not working for me.
Below is the JSON file
[
{
"LocalObservationDateTime":"2022-09-15T19:05:00+02:00",
"EpochTime":1663261500,
"WeatherText":"Mostly cloudy",
"WeatherIcon":6,
"HasPrecipitation":false,
"PrecipitationType":"None",
"IsDayTime":true,
"Temperature":{
"Metric":{
"Value":11.4,
"Unit":"C",
"UnitType":17
},
"Imperial":{
"Value":52.0,
"Unit":"F",
"UnitType":18
}
},
"RealFeelTemperature":{
"Metric":{
"Value":8.4,
"Unit":"C",
"UnitType":17,
"Phrase":"Chilly"
}
}
},
{
"LocalObservationDateTime":"2022-09-16T19:05:00+02:00",
"EpochTime":1663261500,
"WeatherText":"Mostly cloudy",
"WeatherIcon":6,
"HasPrecipitation":false,
"PrecipitationType":"None",
"IsDayTime":true,
"Temperature":{
"Metric":{
"Value":11.4,
"Unit":"C",
"UnitType":17
},
"Imperial":{
"Value":52.0,
"Unit":"F",
"UnitType":18
}
},
"RealFeelTemperature":{
"Metric":{
"Value":8.4,
"Unit":"C",
"UnitType":17,
"Phrase":"Chilly"
}
}
}
]
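In that snippet, data[1] is the second element of the list and is itself a dict, so for i in data[1] iterates over its keys, which are strings, and i['LocalObservationDateTime'] then raises the error. A sketch of reading the first element instead, assuming data is the result of parsing the JSON above (shortened here to two fields per entry):

```python
import json

# Shortened version of the JSON from the question (two observations).
raw = '''
[
  {"LocalObservationDateTime": "2022-09-15T19:05:00+02:00",
   "Temperature": {"Metric": {"Value": 11.4, "Unit": "C"}}},
  {"LocalObservationDateTime": "2022-09-16T19:05:00+02:00",
   "Temperature": {"Metric": {"Value": 11.4, "Unit": "C"}}}
]
'''
data = json.loads(raw)

# data[0] is the first dict; index it directly with string keys.
first_time = data[0]["LocalObservationDateTime"]

# To visit every observation, loop over the list itself.
all_times = [entry["LocalObservationDateTime"] for entry in data]

print("Testing " + first_time)
print(all_times)
```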
This can happen if a comma is missing. I ran into it when I had a list of two-tuples, each of which consisted of a string in the first position, and a list in the second. I erroneously omitted the comma after the first component of a tuple in one case, and the interpreter thought I was trying to index the first component.
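A minimal sketch of that situation, with made-up data. A variable is used for the string so the mistake only shows up at runtime.

```python
first = "alpha"

try:
    # The comma after `first` is missing, so Python reads `first [3, 4]`
    # as indexing the string with the tuple (3, 4).
    pairs = [("beta", [1, 2]), (first [3, 4])]
except TypeError as exc:
    missing_comma_error = str(exc)

# With the comma restored, the second entry is a proper two-tuple.
pairs = [("beta", [1, 2]), (first, [3, 4])]

print(missing_comma_error)
print(pairs)
```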
Converting the lower case letters to upper:
str1 = "Hello How are U"
new_str = " "
for i in str1:
    if str1[i].islower():
        new_str = new_str + str1[i].upper()
print(new_str)
Error :
TypeError: string indices must be integers
Solution :
for i in range(0, len(str1)):  # use range so that i is an integer index into the string
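Alternatively, since iterating over a string already yields its characters, the index can be dropped entirely. The sketch below assumes the goal is to uppercase the whole string while keeping spaces and existing capitals, which is slightly different from the loop above (that one only collects the converted lowercase letters).

```python
str1 = "Hello How are U"

# Iterating over a string yields characters, so no indexing is needed.
new_str = "".join(ch.upper() if ch.islower() else ch for ch in str1)

print(new_str)
```

For this particular goal, str1.upper() gives the same result in one call.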