Value Error with indexing string despite string present


I want to use .index() to search a column of a 2D list and return the location of that line so I can then alter data at that location. I've been trying to solve a smaller version of this below.
data_test = [["2016-12-14T07:39:00.000000Z",0],["2016-12-14T07:40:00.000000Z",1],\
["2016-12-14T07:41:00.000000Z",2], ["2016-12-14T07:42:00.000000Z",3]]
string = "2016-12-14T07:39:00.000000Z"
if data_test[0][0] == string:
    print('works')
else:
    print("does not work")
print(data_test.index(string))
The string comparison works, so nothing is wrong there, but the index call on the last line raises:
ValueError: '2016-12-14T07:39:00.000000Z' is not in list
In full operation I will be checking a list of thousands of rows, so I'm trying to avoid just looping through and doing a string comparison at each level. Any alternatives and help would be highly appreciated.

data_test = [["2016-12-14T07:39:00.000000Z",0],["2016-12-14T07:40:00.000000Z",1],
["2016-12-14T07:41:00.000000Z",2], ["2016-12-14T07:42:00.000000Z",3]]
string = "2016-12-14T07:39:00.000000Z"
for i, data in enumerate(data_test):
    if data[0] == string:
        print("works, index {}".format(i))
    else:
        pass
I'm not sure what you want to do with the index, so this could be a vastly inefficient way of going through the list. If you want to transform the list, it might be better to rebuild the list as you iterate through it once and make the transforms as you go.
EDIT:
The \ was unnecessary: an expression inside brackets can be split over multiple lines without it. You can also omit the else clause here entirely if you're not rebuilding the list. I just assume that you are.

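The string you're looking for is not an element of data_test; it is an element of data_test[0]. list.index compares its argument against each top-level element, and here those are the inner lists, so data_test.index(string) cannot find it.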
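For thousands of rows, one way to avoid rescanning the list on every search is to build a lookup from the first column to the row index once. The following is a minimal sketch, assuming the timestamps in the first column are unique; row_of and find_row are illustrative names, not part of the original code.

```python
data_test = [
    ["2016-12-14T07:39:00.000000Z", 0],
    ["2016-12-14T07:40:00.000000Z", 1],
    ["2016-12-14T07:41:00.000000Z", 2],
    ["2016-12-14T07:42:00.000000Z", 3],
]

# One-off search: index() works once it is given the column, not the rows.
first_column = [row[0] for row in data_test]
position = first_column.index("2016-12-14T07:40:00.000000Z")

# Repeated searches: build the timestamp -> row index lookup once
# (assumes timestamps are unique).
row_of = {row[0]: i for i, row in enumerate(data_test)}


def find_row(timestamp):
    """Return the row index for timestamp, or None if it is absent."""
    return row_of.get(timestamp)


i = find_row("2016-12-14T07:41:00.000000Z")
if i is not None:
    data_test[i][1] = 99  # alter the data at that location
```

The dictionary costs one pass over the list to build, after which each lookup is a hash lookup rather than a scan. If rows are inserted or removed later, the lookup has to be rebuilt.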

Related

Check if terms are in columns and remove

Originally I wanted to filter only for specific terms, however I've found python will match the pattern regardless of specificity eg:
temp_possibilities = ['temp', 'degc']
temp = (df.filter(regex='|'.join(re.escape(x) for x in temp_possibilities))
        .columns.to_list())
Output does find the correct columns, but unfortunately it also returns columns like temp_uncalibrated, which I do not want.
So to solve this, so far I define and remove unwanted columns first, before filtering ie:
if 'temp_uncalibrated' in df.columns:
    df = df.drop('temp_uncalibrated',axis = 1)
else:
    pass
However, I have found more and more of these unwanted columns, and now the code looks messy and hard to read with all the terms. Is there a way to do this more succinctly? I tried putting the terms in a list and doing it that way, but it does not work, i.e.:
if list in df.columns:
    df = df.drop(list,axis = 1)
else:
    pass
I thought maybe a def function might be a better way to do it, but not really sure where to start.
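One possible approach is to keep the unwanted names in a single collection and filter the column names in one pass. The sketch below works on a plain list of names so it runs without pandas; select_columns and the sample column names are hypothetical.

```python
import re


def select_columns(columns, wanted_terms, unwanted=()):
    """Return the names that contain a wanted term, minus explicit exclusions."""
    pattern = re.compile("|".join(re.escape(term) for term in wanted_terms))
    excluded = set(unwanted)
    return [
        name for name in columns
        if pattern.search(name) and name not in excluded
    ]


# Hypothetical column names standing in for df.columns.
columns = ["temp", "temp_uncalibrated", "degc_probe", "salinity"]
selected = select_columns(
    columns,
    wanted_terms=["temp", "degc"],
    unwanted=["temp_uncalibrated", "temp_raw"],  # absent names are harmless
)
```

With a DataFrame, df.columns could be passed as columns and the result used as df[selected]. The check `if list in df.columns` fails because it asks whether the list as a whole is a single column label. If dropping is preferred, df.drop(columns=unwanted, errors="ignore") skips names that are not present, so no if/else is needed.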

What's the fastest way to find if a string is not in a group of strings?

I have a bunch of strings and I need to be able to tell if I have already used them. Right now I add all of the strings to a main string called titles. I then used:
# titles = some string with a bunch of strings in it
# n_title is the string I want to check if it is in titles
if n_title not in titles:
    pass  # do something
else:
    pass  # do something else
My question is would it be better if titles was a dictionary or an array or are they all the same run time? I believe most of the time my n_title will not be in the titles, if that makes any difference.

Use a set instead of a list or one long string. A membership test on a set is a hash lookup that takes roughly constant time on average, while in on a list compares the elements one by one, and in on a string is a substring search, which can also match part of a longer title. A set also stores each title only once.
titles = {'title1', 'title2', 'title3'}
n_title = 'title4'
if n_title not in titles:
    pass  # do something
else:
    pass  # do something else
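The check-then-record pattern from the question can be sketched as follows. This is a minimal illustration: seen_titles and is_new are assumed names, not part of the original code.

```python
seen_titles = set()


def is_new(title):
    """Return True the first time a title is seen, False afterwards."""
    if title in seen_titles:
        return False
    seen_titles.add(title)
    return True


# The third entry repeats the first, so it is reported as already used.
results = [is_new(t) for t in ["alpha", "beta", "alpha"]]
```

Since most lookups in the question are expected to miss, the constant average cost of a set lookup matters more here than it would for mostly-hit lookups, because a miss on a list or a string has to examine everything.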

How to create a DataFrame with Reddit API loop and manage the list

I'm very new to Reddit API (PRAW/PSAW), Python, as well as programming in general. What I'm trying to do is get top submissions from certain subreddits within 6 months, then convert the list into a DataFrame and to CSV file later.
I want to:
Get the length of the list
Sort by date(epoch)
Make a data frame out of this
What I tried so far:
list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    if submission.created_utc >=1569902400 and submission.created_utc <=1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id) # This seems to get me the data I want.
len() # I want to check the length, but it doesn't work. It just gives me a row of zeroes.
sorted(submission.created_utc) # This also doesn't work. It says 'float' object is not iterable.
# I tried converting to int, but also didn't work.
pd.DataFrame(list_submission) # Also doesn't work.
So in brief,
I suppose making a data frame out of this can as well solve the first 2 problems, although I think being able to do that using the codes will be helpful when evaluating the list!

To answer the 3 parts of your question:
To get the length of a list, pass the list to the len() function: to find the length of list_submission, call len(list_submission). Right now nothing is ever added to list_submission, so its length would be zero.
If the submission matches the requirements, append it to the list with list_submission.append(submission). After the for loop is complete, use sorted() to sort the entire list. Pass in the whole list plus the key you want to sort on: sorted(list_submission, key=lambda submission: submission.created_utc). sorted() returns a new list, so assign the result. You are getting an error because sorted() was given a single float (submission.created_utc) rather than a list.
To convert the list into a DataFrame, build one row of plain values per submission, since pd.DataFrame does not read attributes off arbitrary objects. You can use columns = ['created_utc', 'title', 'score', 'id'] to set the column names.
Final code will look something like the following:
import pandas as pd

# reddit is the authenticated praw.Reddit instance from the question
list_submission = []
for submission in reddit.subreddit('bitcoin').top(limit=None):
    if 1569902400 <= submission.created_utc <= 1585627200:
        print(submission.created_utc, submission.title, submission.score, submission.id)
        list_submission.append(submission)
print(len(list_submission))
# sorted() returns a new list, so keep the result
list_submission = sorted(list_submission, key=lambda submission: submission.created_utc)
rows = [(s.created_utc, s.title, s.score, s.id) for s in list_submission]
df = pd.DataFrame(rows, columns = ['created_utc', 'title', 'score', 'id'])
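The filter, count, and sort steps can be tried without a Reddit connection. The sketch below uses SimpleNamespace objects as hypothetical stand-ins for PRAW submissions; only the four attributes used in the question are modelled, and the sample values are made up for illustration.

```python
from types import SimpleNamespace

START, END = 1569902400, 1585627200  # epoch bounds from the question

# Hypothetical stand-ins for PRAW submission objects.
submissions = [
    SimpleNamespace(created_utc=1580000000.0, title="b", score=10, id="x2"),
    SimpleNamespace(created_utc=1500000000.0, title="too old", score=99, id="x0"),
    SimpleNamespace(created_utc=1570000000.0, title="a", score=5, id="x1"),
]

# Keep only submissions inside the time window.
in_range = [s for s in submissions if START <= s.created_utc <= END]

# Sort in place by creation time (oldest first).
in_range.sort(key=lambda s: s.created_utc)

# Plain tuples are what a DataFrame constructor or csv.writer can consume.
rows = [(s.created_utc, s.title, s.score, s.id) for s in in_range]
count = len(rows)
```

The rows list can then be handed to pd.DataFrame together with the column names, and the resulting frame written out with its to_csv method.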

Efficient way to convert a list of one string to a string and perform split operation

I have a list which contains a string shown below. I have defined mylist in the global space as a string using "".
mylist = ""
mylist = ["1.22.43.45"]
I get an execution error stating that the split operation is not possible as it is being performed on a list rather than the string.
mylist.rsplit(".",1)[-1]
I tried to resolve it by using the following code:
str(mylist.rsplit(".",1)[-1])
Is this the best way to do it? The output I want is 45. I am splitting the string and accessing the last element. Any help is appreciated.

mylist=["1.22.43.45"]
newstring = mylist[0].rsplit(".",1)[-1]
First select the element in your list, then split it, then choose the last element of the split.

Just because you assigned mylist = "" first, doesn't mean it'll cast the list to a string. You've just reassigned the variable to point at a list instead of an empty string.
You can accomplish what you want using:
mylist = ["1.22.43.45"]
mylist[-1].rsplit('.', 1)[-1]
Which will get the last item from the list and try and perform a rsplit on it. Of course, this won't work if the list is empty, or if the last item in the list is not a string. You may want to wrap this in a try/except block to catch IndexError for example.
EDIT: Added the [-1] index to the end to grab the last list item from the split, since rsplit() returns a list, not a string. See DrBwts' answer
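The try/except suggestion could look like the sketch below. last_segment is a hypothetical helper name, and returning None on failure is an assumption about what the caller wants.

```python
def last_segment(items, sep="."):
    """Return the text after the final sep in the last item, or None.

    None is returned when the list is empty (IndexError) or when the
    last item has no rsplit method, e.g. it is not a string (AttributeError).
    """
    try:
        return items[-1].rsplit(sep, 1)[-1]
    except (IndexError, AttributeError):
        return None


value = last_segment(["1.22.43.45"])
empty = last_segment([])
not_a_string = last_segment([42])
```

Note that the result is still a string; wrap it in int() if a number is needed.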

You can access the first element (the string, in your case) with the index operator []:
mylist[0].rsplit(".", 1)[-1]

Python: How do I remove the formatting when joining items from a list into a string?

Good morning, all:
I'm working on a small Python program which matches together small strings in order to make up fictional company names. The name segments are stored in three lists, and a random string from each list is chosen each time a new name is requested. For example, the program might pick "Eli", "rce" and "Softworks" from the three lists, which would give me "Elirce Softworks".
seg1 = namesegs1[random.randint(0, seg1_length)]
seg2 = namesegs2[random.randint(0, seg2_length)]
seg3 = namesegs3[random.randint(0, seg3_length)]
new_name = "{0}{1} {2}".format(seg1, seg2, seg3)
However, the code actually returns ['Eli']['rce'] ['Softworks']. It makes sense, given that it's joining items from lists, but I don't see why these can't be removed in some way or another.
Here's one way I've made it work:
new_name = new_name.replace("'", "")
new_name = new_name.replace(",", "")
new_name = new_name.replace("[", "")
new_name = new_name.replace("]", "")
That gets rid of the formatting nicely, but it's less than satisfactory and it feels like I'm doing it wrong. Is there a better way to be going about this?
Many thanks for your time.

It looks like namesegs1 is a list of lists rather than a list of strings.
What do you get with the following?
new_name = "{0}{1} {2}".format(seg1[0], seg2[0], seg3[0])
By the way, use random.choice(namesegs1) to select a random item from your list, rather than that thing you've done with random.randint.
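The random.choice suggestion can be sketched as follows, assuming the segment lists hold plain strings; the list contents are made up for illustration. It also avoids an off-by-one risk: random.randint(0, n) can return n itself, which is out of range when n is the length of the list.

```python
import random

# Hypothetical name segments stored as plain strings.
namesegs1 = ["Eli", "Nor"]
namesegs2 = ["rce", "vex"]
namesegs3 = ["Softworks", "Labs"]


def make_name():
    """Build a company name from one random segment of each list."""
    return "{0}{1} {2}".format(
        random.choice(namesegs1),
        random.choice(namesegs2),
        random.choice(namesegs3),
    )


name = make_name()
```

If the lists really are lists of one-element lists, the strings have to be unwrapped first, for example with random.choice(namesegs1)[0].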

Are you familiar with Python's string join functionality?
To me it sounds like what you need.
http://docs.python.org/2/library/stdtypes.html#str.join

Suppose your logic with the randomizer generates an input list as:
input_list = [['Eli'],['rce'],['Softworks']]
the following code will give you what you need:
input_list = [i[0] for i in input_list]
''.join(input_list)
