Python - Pulling data from a file that matches a parameter - python

I have a file that contains information about users and the number of times they have logged in. I am trying to pull all users that have a login count of >= 250 and save them to another file. I am new to Python and keep getting an "invalid literal with base 10" error when I run this portion of my code. Can anyone explain why this happens so I can prevent it in the future? TIA
thanks
def main():
    userInformation = readfile("info")
    suspicious = []
    for i in userInformation :
        if(int(i[2])>=250):
            suspicious.append(i)
Full code below if needed:
#Reading the file function
def readFile(filename):
    file = open(filename,'r')
    lines = [x.split('\n')[0].split(';') for x in file.readlines()]
    file.close()
    return lines
def writeFile(suspicious):
    file = open('suspicious.txt','w')
    for i in suspicious:
        file.write('{};{};{};{}\n'.format(i[0],i[1],i[2],i[3]))
    file.close()
def main()
    userInformation = readfile("info")
    suspicious = []
    for i in userInformation :
        if(int(i[2])>=250):
            suspicious.append(i)
    writeFile(suspicious)
    print('Suspicious users:')
    for i in suspicious:
        print('{} {}'.format(i[0],i[1]))
main()
Here are some lines from my file:
Jodey;Lamins;278
Chris;Taylors;113
David;Mann;442
etc
etc

The "invalid literal for int() with base 10" error is raised when int() is given a string that is not a valid base-10 integer. In other words, i[2] holds text that int() cannot convert, most likely an empty field or non-numeric text such as a header. Also, it would be best to correctly format your main function.
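To see which inputs trigger the error, here is a minimal sketch. The helper name parse_count is hypothetical and not part of the question's code; it only illustrates how int() behaves on clean, padded, empty, and non-numeric text.

```python
def parse_count(field):
    """Return the field as an int, or None if it is not a base-10 integer."""
    try:
        return int(field)
    except ValueError:
        # Raised for '', 'abc', '27x' and similar. int() itself
        # tolerates surrounding whitespace such as ' 278\n'.
        return None


print(parse_count("278"))      # a clean number
print(parse_count(" 278\n"))   # surrounding whitespace is accepted
print(parse_count(""))         # empty field, e.g. after a trailing ';'
print(parse_count("logins"))   # header text
```

Returning None instead of raising is one possible design; printing the offending line before re-raising would work just as well for tracking down the bad row.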

Ok, so I took your example file and played with it a little. The issues I faced were mostly spacing issues. So, here's the code you might like:
UsersInfoFileName = '/path/to/usersinfofile.txt'
MaxRetries = 250
usersWithExcessRetries = []
with open(UsersInfoFileName, 'r') as f:
    lines = f.readlines()
# Strip surrounding whitespace and drop empty lines
consecutiveLines = (line.strip() for line in lines if line.strip())
for line in consecutiveLines:
    # >= matches the "login of >= 250" condition in the question
    if int(line.split(';')[-1]) >= MaxRetries:
        usersWithExcessRetries.append(line)
for suspUsers in usersWithExcessRetries:
    print(suspUsers)
Here's what it does:
Reads all lines in the given file
Filters all lines by excluding lines which may be empty
Removes surrounding white spaces for the remaining lines
Reads last semi-colon separated value, and compares it with MaxRetries
Adds the stripped line to a list if the value is at least MaxRetries
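The steps above can also be sketched as a small function that works on any list of lines, which makes it easy to try without a file. The name find_suspicious and the choice to skip rows whose last field is not a number are assumptions for illustration; the threshold uses >= as asked in the question.

```python
def find_suspicious(lines, threshold=250):
    """Return stripped lines whose last ';'-separated field is >= threshold."""
    suspicious = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        try:
            count = int(line.split(';')[-1])
        except ValueError:
            continue  # skip headers or malformed rows instead of crashing
        if count >= threshold:
            suspicious.append(line)
    return suspicious


sample = ["Jodey;Lamins;278\n", "\n", "Chris;Taylors;113\n", "David;Mann;442\n"]
result = find_suspicious(sample)
print(result)
```

Writing the result to another file is then a matter of opening it with 'w' and writing each entry followed by a newline.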

Related

Python: How to return specific lines from a text

I am new here and new to programming too.
I am reading Jamie Chan's Learn Python in One Day and am currently at the Practical Project section. I am trying to make Python read a line from a txt file. The txt file contains a name and a number separated by a comma.
This is the text file
Benny, 102
Ann, 100
Carol, 214
Darren, 129
I succeeded in making it read the first line, but trying to print the second line by calling on the name there keeps returning "nill". When I switch the lines, the same thing occurs: it reads the name in the first line but returns "nill" for the name in the second line.
This is the function I tried to use to read the texts:
def getUserPoint(userName):
    f = open('userScores.txt', 'r')
    for line in f:
        result = line.splitlines()
        if userName in line:
            return result
        else:
            return "nill"
    f.close()
s = getUserPoint(input('Ann'))
print(s)
And this is the result:
nill
And these are the instructions:
Each line records the information of one user. The first value is the user’s username and the second is the user’s score.
Next, the function reads the file line by line using a for loop. Each line is then split using the split() function
Let’s store the results of the split() function in the list content.
Next, the function checks if any of the lines has the same username as the value that is passed in as the parameter. If there is, the function closes the file and returns the score beside that username. If there isn’t, the function closes the file and returns the string ‘-1’
Am terribly sorry for the long winded post.
You can use:
def getUserPoint(userName):
    f = open('userScores.txt', 'r')
    for line in f.readlines():
        result = line.splitlines()
        if userName in line:
            f.close()
            return result
    f.close()
    return "nill"
s = getUserPoint(input('Ann'))
print(s)
One problem is that the else branch runs as soon as the first line fails to match, and its return immediately ends both the loop and the function.
You need to return the default result only after you've looked at all lines.
def getUserPoint(userName):
    with open('userScores.txt') as f:
        for line in f:
            if userName == line.rstrip().split(',')[0]:
                return line
    return "nill"
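Then, as shown, you either want to split on the comma and check the first column, or test userName in line. If you test against the result of splitlines() instead, you are checking
'Ann' in ["Ann, 100"]
which is False: splitlines() returns a list of whole lines with the trailing newline removed, and membership in a list requires an exact match with an element.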
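The difference between the three checks can be seen on a single in-memory line. This is only an illustration; the variable names are made up for the example.

```python
line = "Ann, 100\n"

# Substring test on the string itself
in_string = "Ann" in line

# Membership test on the list from splitlines(): needs an exact element match
in_list = "Ann" in line.splitlines()

# Comparing the first comma-separated column avoids partial matches
first_column = line.rstrip().split(",")[0]

print(in_string, in_list, first_column == "Ann")   # True False True
# A partial name matches as a substring but not as a column
print("An" in line, "An" == first_column)          # True False
```

The last line shows why comparing the first column is the safer choice: a substring test would also accept "An" for the user "Ann".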
See below
The code takes care of closing the file.
It will return None if no match found, else 'user point' is returned
def get_user_point(user_name):
    with open('userScores.txt', 'r') as f:
        lines = [l.strip() for l in f]
    for line in lines:
        parts = line.split(',')
        if user_name == parts[0]:
            # strip the space that follows the comma in "Ann, 100"
            return parts[1].strip()
Thanks everyone for the help...
This code by OneCricketeer worked:
def getUserPoint(userName):
    with open('userScores.txt') as f:
        for line in f:
            if userName == line.split(',')[0]:
                return line
    return "nill"
Since I am new to Python and programming in general, I will probably be asking a lot more questions.
Thanks for the help everyone.

Python reading file error

I'm trying to read a list of numbers from a text file and I am getting this error when I run my code:
ValueError: invalid literal for float(): -4.4987000e-01 -2.0049000e-01 -4.8729000e-01 -6.1085000e-02 -5.1024000e-02 -2.1653000e-02
Here is my code:
def read_data_file(datafile, token):
    dataset = []
    with open(datafile, 'r') as file:
        for line in file:
            #split each word by token
            data = line[:-1].split(token)
            tmp = []
            for x in data:
                if x != "":
                    x = float(x)
                    tmp.append(x)
                else:
                    tmp.append(1e+99)
            dataset.append(tmp)
    return dataset
The program encounters the error at the line: x = float(x)
You have not provided the full code and data file to reproduce your problem exactly. But it's possible to guess your issue from the error message.
Your line[:-1].split(token) failed to break the line up into single-number strings. Either you picked an incorrect separator (token is a poor choice of name), or omitting the last character from the line broke it, or both.
Try this (it also removes many unnecessary lines):
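A short illustration of how a wrong separator produces exactly this kind of error message. The tab separator here is an assumption chosen for the example; the question does not say which token was actually passed.

```python
line = "-4.4987000e-01 -2.0049000e-01 -4.8729000e-01\n"

# The separator does not occur in the line, so the whole line comes back
# as a single token, and float() on that token raises ValueError.
wrong = line[:-1].split("\t")

# split() with no argument splits on any run of whitespace and
# ignores the trailing newline.
right = line.split()
values = [float(tok) for tok in right]

print(wrong)
print(values)
```

Printing repr(line) for the first line of the real file is a quick way to see which separator it actually uses.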
def read_data_file(datafile):
    dataset = []
    with open(datafile, 'r') as f:
        # the with block closes the file once the loop is done
        for line in f:
            dataset.append( [ float(token) for token in line.split() ] )
            # default (separator=None) works for most situations
            dataset.append(1e+99)
            # do you really need to append a fake number?
            # a better way might be a list of lists
    return dataset
This assumes that the data in your file is on one line, like below:
-4.4987000e-01 -2.0049000e-01 -4.8729000e-01 -6.1085000e-02 -5.1024000e-02 -2.1653000e-02
I made a few adjustments and ran your code as below and it worked. It may not be what you want, but if that's the case then you need to be more specific as to what you want. If the data in your file doesn't look like the above then you also need to show us exactly how it resides in your file.
def read_data_file(datafile, token):
    dataset = []
    with open(datafile, 'r') as file:
        for line in file:
            #split each word by token
            data = line[:-1].split(token)
            tmp = []
            for x in data:
                if x != "":
                    x = float(x)
                    tmp.append(x)
                else:
                    tmp.append(1e+99)
            dataset.append(tmp)
    return dataset
dataset = read_data_file('test_data.txt', ' ')
print(dataset)
# output
'''
[[-0.44987, -0.20049, -0.48729, -0.061085, -0.051024, -0.021653]]
'''
As mentioned, using token as a parameter name is not a good choice, but it will work. If your data is always going to be space separated and on a single line, then do away with 'token' and do this: data = line[:-1].split(' ')
Use str.strip to remove any trailing whitespace, then split by the required token. You can use map to apply float to each element of the list.
Ex:
def read_data_file(datafile, token):
    dataset = []
    with open(datafile, 'r') as file:
        for line in file:
            #split each word by token and convert each piece to float
            #(list() is needed in Python 3, where map returns an iterator)
            dataset.append(list(map(float, line.strip().split(token))))
    return dataset
You need to add a check before you call float to make sure there is no other data that will cause float to raise the error. A common culprit is the NUL character \x00. You will need to call something like
filtered_value = x.replace("\x00", "")
filtered_value = float(filtered_value)
tmp.append(filtered_value)
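Wrapped into a helper, that check might look like the sketch below. The name to_float and the decision to return None for text that still cannot be converted are assumptions for illustration, not part of the original code.

```python
def to_float(text):
    """Convert text to float after dropping NUL characters; None if still invalid."""
    cleaned = text.replace("\x00", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Still not a number, e.g. an empty string
        return None


print(to_float("1.5\x00"))
print(to_float("-2.0049000e-01"))
print(to_float(""))
```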

Same value in list keeps getting repeated when writing to text file

I'm a total noob to Python and need some help with my code.
The code is meant to take Input.txt [http://pastebin.com/bMdjrqFE], split it into separate Pokemon (in a list), and then split each of those into separate values, which I use to reformat the data and write it to Output.txt.
However, when I run the program, only the last Pokemon gets output, 386 times. [http://pastebin.com/wkHzvvgE]
Here's my code:
f = open("Input.txt", "r")#opens the file (input.txt)
nf = open("Output.txt", "w")#opens the file (output.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
num = 0
tab = """ """
newl = """NEWL
"""
slash = "/"
while num != 386:
    current = pokeData
    current.append(line)
    print current[num]
    for tab in current:
        words = tab.split()
        print words
    for newl in words:
        nf.write('%s:{num:%s,species:"%s",types:["%s","%s"],baseStats:{hp:%s,atk:%s,def:%s,spa:%s,spd:%s,spe:%s},abilities:{0:"%s"},{1:"%s"},heightm:%s,weightkg:%s,color:"Who cares",eggGroups:["%s"],["%s"]},\n' % (str(words[2]).lower(),str(words[1]),str(words[2]),str(words[3]),str(words[4]),str(words[5]),str(words[6]),str(words[7]),str(words[8]),str(words[9]),str(words[10]),str(words[12]).replace("_"," "),str(words[12]),str(words[14]),str(words[15]),str(words[16]),str(words[16])))
    num = num + 1
nf.close()
f.close()
There are quite a few problems with your program, starting with the file reading.
To read the lines of a file into a list you can use file.readlines().
So instead of
f = open("Input.txt", "r")#opens the file (input.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
You can just do this
pokeData = open("Input.txt", "r").readlines() # This will return each line within an array.
Next, you are misunderstanding the uses of for and while.
A for loop in Python is designed to iterate through a list or other iterable, as shown below. I don't know what you were trying to do with for newl in words; a for loop creates a new variable (or rebinds an existing one) and then sets it to each element in turn. Refer below.
array = ["one", "two", "three"]
for i in array: # i is created
    print (i)
The output will be:
one
two
three
So to fix a lot of this code you can replace the whole while loop with something like this.
(The code below assumes your input file is formatted so that all the words are separated by tabs.)
for line in pokeData:
    words = line.split(tab) # Split the line by tabs
    nf.write('your very long and complicated string')
Other helpers
The formatted string that you write to the output file looks very similar to the JSON format. There is a built-in Python module called json that can convert a native Python dict to a JSON string. This will probably make things a lot easier for you, but either way works.
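As a rough sketch of the json approach, the example below builds one entry from an already split line. The shortened record and its column positions are assumptions for illustration; the real Input.txt has more columns. Note that json.dumps quotes every key, so the output is valid JSON rather than the unquoted-key style in the question's format string.

```python
import json

# Hypothetical, shortened record standing in for one split line of Input.txt
words = ["x", "1", "Bulbasaur", "Grass", "Poison", "45", "49"]

entry = {
    words[2].lower(): {
        "num": int(words[1]),
        "species": words[2],
        "types": [words[3], words[4]],
        "baseStats": {"hp": int(words[5]), "atk": int(words[6])},
    }
}

# json.dumps handles the quoting, commas and nesting
text = json.dumps(entry)
print(text)
```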
Hope this helps

How to open a file and change a line

I've looked at loads of threads on this but can't find the right answer:
I'm a bit new to Python. I'm opening a file in Python, inserting one line before the first line and one before the last line, and then joining all the lines into a string variable at the end.
Here's the function I'm trying to run on the opened file. It changes all lines where mode="0" to mode="1" or vice versa.
def Modes(file, mode):
    endstring = ''
    if(mode == 1):
        mode0 = 'mode="0"'
        mode1 = 'mode="1"'
    else:
        mode0 = 'mode="1"'
        mode1 = 'mode="0"'
    for line in iter(file):
        if 'Modes' in line:
            line = line.replace(mode0,mode1)
        endstring += line
    return endstring
So I try the following:
mode = 1
input_file = open("c:\myfile.txt")
input_file.readlines()
lengthlines = len(input_file)
#insert line at position 1
input_file.insert(1,'<VARIABLE name="init1" />')
#insert line at last line position - 1
input_file.insert(lengthlines,'<VARIABLE name="init2" />')
#join the lines back up again
input_file = "".join(input_file)
#run the modes function - to replace all occurrences of mode=x
finalfile = Modes(input_file,mode)
print finalfile
And then I'm getting the error "object of type 'file' has no len()" and general object/list errors.
It seems I'm getting objects/lists etc. mixed up, but I'm not sure where. I would be grateful for any assistance. Cheers
input_file.readlines() returns the content but does not assign it to input_file.
You'll have to assign the return value of the call to a variable, like so:
file_content = input_file.readlines()
and then pass that to len()
lengthlines = len(file_content)
EDIT: solving the issue with len() leads to further exceptions.
This should roughly do what you want:
mode = 1
# raw string so the backslash in the path is never treated as an escape
with open(r"c:\myfile.txt") as input_file:
    file_content = input_file.readlines()
file_content.insert(0,'<VARIABLE name="init1" />')
file_content.append('<VARIABLE name="init2" />')
finalfile = Modes(file_content,mode)
print finalfile
You might have to alter your string concatenation in the function if you want to stick with several lines.
endstring += line + '\n'
return endstring.rstrip('\n')
This does not yet write the new content back to the file though.
EDIT2: And it is always good practice to close the file when you are done with it, therefore I updated the above to use a context manager that takes care of this. You could also explicitly call input_file.close() after you are finished.
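For reference, here is a Python 3 sketch of the same flow that runs on an in-memory list instead of a file. The function name swap_modes and the sample lines are made up for the example; each inserted line carries its own newline so the joined result keeps one element per line.

```python
def swap_modes(lines, mode):
    """Return one string with mode="0"/mode="1" swapped on lines containing 'Modes'."""
    old, new = ('mode="0"', 'mode="1"') if mode == 1 else ('mode="1"', 'mode="0"')
    out = []
    for line in lines:
        if 'Modes' in line:
            line = line.replace(old, new)
        out.append(line)
    return "".join(out)


# Stand-in for the list returned by input_file.readlines()
file_content = ['<Modes mode="0" />\n', '<Other mode="0" />\n']
file_content.insert(0, '<VARIABLE name="init1" />\n')
file_content.append('<VARIABLE name="init2" />\n')

finalfile = swap_modes(file_content, 1)
print(finalfile)
```

Collecting the lines in a list and joining once at the end avoids repeated string concatenation, though for small files either approach is fine.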

Locating a particular word in a text file and storing the integer after it as a variable?

For example, if a text file contains the text "Serves 4", what code can I use in order to locate where the word "Serves" occurs in the file (assume it occurs once) and store the integer right after it as a variable?
I have already started parsing the file, and the current code is:
def parse_file(filename):
    file = open(filename, 'r')
    name = (file.readline()).strip(' \t\n\r') #name on first line
    ingredients = []
    instructions = ""
    servings = 0
    line = ""
As of now, I reach the keyword "Serves" and the integer that follows it using this piece of code (the text files being read are consistent in their formatting, which is why it is done like this)
#reading serving size which is several numbers (\d+)
line = (file.readline()).strip(' \t\n\r')
servings = int((re.search("\d+", line)).group(0))
However, this only works if the text file I am dealing with does not have any line breaks in between the name, ingredients, and instructions sections of the text file. I need for my code to be able to handle all cases, line break or not.
This is the kind of file I am dealing with:
https://www.dropbox.com/s/cbumq16pgjqyr95/recipe.txt?dl=0
You can loop over the lines and, when you find Serves in a line, use re.search() to grab the number:
import re
with open('your_file.txt','r') as f:
    for line in f:
        if 'Serves' in line:
            print(re.search(r'Serves (\d+)', line).group(1))
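To store the number in a variable rather than print it, the loop can be wrapped in a function. This is a sketch that assumes the keyword is "Serves", as in the question's example text; the name find_servings and the sample recipe lines are hypothetical. Because it scans every line, blank lines between sections do not matter.

```python
import re


def find_servings(lines):
    """Return the integer after 'Serves' from the first matching line, else None."""
    for line in lines:
        match = re.search(r"Serves\s+(\d+)", line)
        if match:
            return int(match.group(1))
    return None


# Stand-in for the lines of the recipe file, including blank lines
recipe = ["Pancakes\n", "\n", "Serves 4\n", "\n", "1 cup flour\n"]
servings = find_servings(recipe)
print(servings)
```

An open file object can be passed in place of the list, since iterating over a file also yields one line at a time.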
