Taking information by line from a file to tuple - python

I'm writing a Python decryption program for a school project.
First of all, I have a function that takes a file name as its argument. It must read the file line by line and return a tuple.
The file contains three things: a number (whatever it is), the decrypted text, and the encrypted text.
import sys
fileName = sys.argv[-1]
def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i,)
    return tuple #does nothing why?
    print(tuple)
load_data(fileName)
Output:
('13\n', 'mecanisme chiffres substituer\n', "'dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr'")
Output needed:
(13,'mecanisme chiffres substituer','dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr')
The tuple needs to look like this: (count, word_list, crypted), with 13 as count, and so on.
If someone can help me, that would be great.
Sorry if I'm asking my question badly.

You could try this to avoid the '\n' characters at the end
import sys
fileName = sys.argv[-1]
def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i.strip(''' \n'"'''),)
    return tuple
print(load_data(fileName))
Note that a function ends as soon as it reaches a return statement. If you want to print the value of tuple, do it before the return statement, or print the returned value.
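If the first element should be an integer, as in the needed output, one possible variant is sketched below. It assumes the file holds the three values on its first three lines and that the quotes around the encrypted text are literally in the file. The name result is my own choice, used to avoid shadowing the built-in tuple.

```python
def load_data(file_name):
    """Return (count, word_list, crypted) read from a three-line file."""
    with open(file_name, 'r') as data:
        # Drop the trailing newline from each line.
        lines = [line.rstrip('\n') for line in data]
    # Assumption: the three values are on the first three lines, in this order.
    count, word_list, crypted = lines[:3]
    # Convert the first line to an int and strip the literal quotes
    # that surround the encrypted text in the sample output.
    result = (int(count), word_list, crypted.strip("'\""))
    return result
```

Calling print(load_data(fileName)) would then show the count as 13 rather than '13\n'.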

I am a little confused about what the file in question looks like, but from what I could infer from the output you got the file appears to be something like this:
some number
decrypted text
encrypted text
If so, the most straightforward way to do this would be
with open('lines.txt','r') as f:
    all_the_text = f.read()
list_of_text = all_the_text.split('\n')
tuple_of_text = tuple(list_of_text)
print(tuple_of_text)
Explanation:
The open built-in function creates an object that allows you to interact with the file. We call open with the argument 'r' to indicate that we only want to read from the file. Doing this within a with statement ensures that the file gets closed properly when you are done with it. The as keyword followed by f places the file object into the variable f. f.read() reads in all of the text in the file. String objects in Python have a split method that returns a list of the substrings separated by a given delimiter, without including the delimiter itself. To turn that list into a tuple, simply pass it to tuple.
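One detail worth checking: if the file ends with a newline, split('\n') leaves an empty string at the end of the list. A small sketch of the difference, using a made-up string in place of the file contents:

```python
text = "13\ndecrypted text\nencrypted text\n"  # hypothetical file contents

# split('\n') leaves an empty string after the final newline.
print(tuple(text.split('\n')))
# ('13', 'decrypted text', 'encrypted text', '')

# splitlines() does not produce that trailing empty string.
print(tuple(text.splitlines()))
# ('13', 'decrypted text', 'encrypted text')
```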

Related

I'm trying this question, but I just can't seem to get the code working. I've attached my work after the problem statement.

load_datafile() takes a single string parameter representing the filename of a datafile.
This function must read the content of the file, convert all letters to their lowercase, and store
the result in a string, and finally return that string. I will refer to this string as data throughout
this specification, you may rename it. You must also handle all exceptions in case the datafile
is not available.
Sample output:
data = load_datafile('harry.txt')
print(data)
the hottest day of the summer so far was drawing to a close and a drowsy silence
lay over the large, square houses of privet drive.
load_wordfile() takes a single string argument representing the filename of a wordfile.
This function must read the content of the wordfile and store all words in a one-dimensional
list and return the list. Make sure that the words do not have any additional whitespace or newline character in them. You must also handle all exceptions in case the files are not
available.
Sample outputs:
pos_words = load_wordfile("positivewords.txt")
print(pos_words[2:9])
['abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed',
'acclamation']
neg_words = load_wordfile("negativewords.txt")
print(neg_words[10:19])
['aborts', 'abrade', 'abrasive', 'abrupt', 'abruptly', 'abscond', 'absence',
'absent-minded', 'absentee']
MY CODE BELOW
def load_datafile('harryPotter.txt'):
    data = ""
    with open('harryPotter.txt') as file:
        lines = file.readlines()
        temp = lines[-1].lower()
    return data
Your code has two main problems. The first is that you assign an empty string to the variable data and return it, so no matter what you do with the contents of the file, you always return an empty string. The second is that file.readlines() returns a list of strings, where each line in the file is an element of the list, and you are only converting the last element, lines[-1], to lowercase.
To fix your code, make sure that you store the contents of the file in the data variable, and apply the lower() method to each line in the file, not just the last one. Something like this:
def load_datafile(file_name):
    data = ''
    with open(file_name) as file:
        lines = file.readlines()
        for line in lines:
            # Each line already ends with its own '\n'
            data = data + line.lower()
    return data
The previous example is not the best way of doing this but it's very easy to understand what is happening and I think that is more important when you are starting. To make it more efficient you might want to change it to:
def load_datafile(file_name):
    with open(file_name) as file:
        # Lines keep their own '\n', so join on the empty string
        return ''.join(line.lower() for line in file)
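The assignment also asks for exception handling and for a load_wordfile() function. A minimal sketch of both follows. It assumes the word file holds one word per line, and that returning an empty string or an empty list is an acceptable way to signal a missing file. Neither detail is stated in the specification.

```python
def load_datafile(file_name):
    """Return the file's content in lowercase, or '' if it cannot be read."""
    try:
        with open(file_name) as file:
            return file.read().lower()
    except OSError as error:  # covers FileNotFoundError, PermissionError, ...
        print('Could not read', file_name, '-', error)
        return ''  # assumption: an empty string signals failure


def load_wordfile(file_name):
    """Return the words in the file as a list, one entry per non-blank line."""
    try:
        with open(file_name) as file:
            # strip() removes surrounding whitespace and the newline.
            return [line.strip() for line in file if line.strip()]
    except OSError as error:
        print('Could not read', file_name, '-', error)
        return []  # assumption: an empty list signals failure
```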

Accept multiple files in parameter using args python

I need to be able to import and manipulate multiple text files in the function parameter. I figured using *args in the function parameter would work, but I get an error about tuples and strings.
def open_file(*filename):
    file = open(filename,'r')
    text = file.read().strip(punctuation).lower()
    print(text)
open_file('Strawson.txt','BigData.txt')
ERROR: expected str, bytes or os.PathLike object, not tuple
How do I do this the right way?
When you use the *args syntax in a function parameter list it allows you to call the function with multiple arguments that will appear as a tuple to your function. So to perform a process on each of those arguments you need to create a loop. Like this:
from string import punctuation
# Make a translation table to delete punctuation
no_punct = dict.fromkeys(map(ord, punctuation))
def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        print(text)
        print()
#test
open_file('Strawson.txt', 'BigData.txt')
I've also included a dictionary no_punct that can be used to remove all punctuation from the text. And I've used a with statement so each file will get closed automatically.
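To see the two ideas in isolation, here is a small sketch that needs no files. The function show_args is a hypothetical helper, written only to show what the starred parameter receives.

```python
from string import punctuation

# Mapping each punctuation code point to None makes translate() delete it.
no_punct = dict.fromkeys(map(ord, punctuation))


def show_args(*filenames):
    # Inside the function, the collected arguments are a single tuple.
    return type(filenames).__name__, filenames


print(show_args('Strawson.txt', 'BigData.txt'))
# ('tuple', ('Strawson.txt', 'BigData.txt'))
print('Hello, World! (test)'.translate(no_punct).lower())
# hello world test
```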
If you want the function to "return" the processed contents of each file, you can't just put return into the loop because that tells the function to exit. You could save the file contents into a list, and return that at the end of the loop. But a better option is to turn the function into a generator. The Python yield keyword makes that simple. Here's an example to get you started.
def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        yield text
def create_tokens(*filenames):
    tokens = []
    for text in open_file(*filenames):
        tokens.append(text.split())
    return tokens
files = '1.txt','2.txt','3.txt'
tokens = create_tokens(*files)
print(tokens)
Note that I removed the word.strip(punctuation).lower() stuff from create_tokens: it's not needed because we're already removing all punctuation and folding the text to lower-case inside open_file.
We don't really need two functions here. We can combine everything into one:
def create_tokens(*filenames):
    for filename in filenames:
        #print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        yield text.split()
tokens = list(create_tokens('1.txt','2.txt','3.txt'))
print(tokens)
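For comparison, the list-based alternative mentioned earlier (collect the results and return them after the loop) might look like the sketch below. The name read_files is hypothetical.

```python
from string import punctuation

no_punct = dict.fromkeys(map(ord, punctuation))


def read_files(*filenames):
    """Return a list with the cleaned text of each file, in order."""
    texts = []
    for filename in filenames:
        with open(filename) as file:
            text = file.read()
        texts.append(text.translate(no_punct).lower())
    # A single return after the loop, so every file gets processed.
    return texts
```

The generator version avoids holding every file's text in memory at once, while this version is simpler to index into afterwards.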

Same value in list keeps getting repeated when writing to text file

I'm a total noob to Python and need some help with my code.
The code is meant to take Input.txt [http://pastebin.com/bMdjrqFE], split it into separate Pokemon (in a list), and then split each of those into separate values, which I use to reformat the data and write it to Output.txt.
However, when I run the program, only the last Pokemon gets output, 386 times. [http://pastebin.com/wkHzvvgE]
Here's my code:
f = open("Input.txt", "r")#opens the file (input.txt)
nf = open("Output.txt", "w")#opens the file (output.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
num = 0
tab = """ """
newl = """NEWL
"""
slash = "/"
while num != 386:
    current = pokeData
    current.append(line)
    print current[num]
    for tab in current:
        words = tab.split()
        print words
    for newl in words:
        nf.write('%s:{num:%s,species:"%s",types:["%s","%s"],baseStats:{hp:%s,atk:%s,def:%s,spa:%s,spd:%s,spe:%s},abilities:{0:"%s"},{1:"%s"},heightm:%s,weightkg:%s,color:"Who cares",eggGroups:["%s"],["%s"]},\n' % (str(words[2]).lower(),str(words[1]),str(words[2]),str(words[3]),str(words[4]),str(words[5]),str(words[6]),str(words[7]),str(words[8]),str(words[9]),str(words[10]),str(words[12]).replace("_"," "),str(words[12]),str(words[14]),str(words[15]),str(words[16]),str(words[16])))
    num = num + 1
nf.close()
f.close()
There are quite a few problems with your program, starting with the file reading.
To read the lines of a file into a list you can use file.readlines().
So instead of
f = open("Input.txt", "r")#opens the file (input.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
You can just do this
pokeData = open("Input.txt", "r").readlines() # This will return each line within an array.
Next, you are misunderstanding the uses of for and while.
A for loop in Python is designed to iterate through a list, as shown below. I don't know what you were trying to do with for newl in words; a for loop creates a new variable and then iterates through the list, assigning each element to that variable in turn. Refer below.
array = ["one", "two", "three"]
for i in array: # i is created
    print (i)
The output will be:
one
two
three
So to fix a lot of this code, you can replace the whole while loop with something like this.
(The code below assumes your input file is formatted such that all the words are separated by tabs.)
for line in pokeData:
    words = line.split(tab) # Split the line by tabs
    nf.write('your very long and complicated string')
Other helpers
The formatted string that you write to the output file looks very similar to the JSON format. There is a built-in Python module called json that can convert a native Python dict to a JSON string. This will probably make things a lot easier for you, but either way works.
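As a rough sketch of the json approach: the record below is hypothetical, with field names borrowed from the format string in the question. Note that json.dumps emits strict JSON, so the keys come out quoted, unlike the hand-built string.

```python
import json

# Hypothetical record; the field names mirror the question's format string.
pokemon = {
    'num': 1,
    'species': 'Bulbasaur',
    'types': ['Grass', 'Poison'],
    'baseStats': {'hp': 45, 'atk': 49},
}

# json.dumps converts the dict to a JSON string; sort_keys keeps the key order stable.
line = json.dumps({pokemon['species'].lower(): pokemon}, sort_keys=True)
print(line)
```

Building one dict per input line and passing it to json.dumps also takes care of quoting and escaping inside the values.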
Hope this helps

python2.7 csv: iterate through lines and store in array by appending with split()

I'm trying to figure out if my comment in the loop is correct. Will the variable 'device' be a "list of lists" like I'm hoping? If so, can I call the data by using device[0][0]? Or, say I want the third line and second item, use device[2][1]?
def deviceFile():
    devFile = raw_input('Enter the full file path to your device list file: \n-->')
    open(devFile, 'r')
    device = []
    for line in devFile:
        # creating an array here to hold each line. Call with object[0][0]
        device.append(line.split(','))
    return(device)
Edit:
def deviceFile():
    '''This def will be used to retrieve a list of devices that will have
    commands sent to it via sendCommands(). The user will need to supply
    the full file path to the raw_input(). This file will need to be a csv,
    will need to have column0 be the IP address, and column1 be the
    hostname(for file naming purposes). If a hostname is not supplied in
    column1, the file will be named with the IP address.'''
    devFile = raw_input('Enter the full file path to your device list file: \n-->')
    thisFile = open(devFile, 'r')
    device = []
    for line in thisFile:
        # creating an array here to hold each line. Call with object[0][0]
        device.append(line.split(','))
    thisFile.close()
    return(device)
This is more of an 'am I doing this logically' than an 'is my code perfect' type of question. I want each line of the csv to be its own list, and to be able to access it by calling it in my main program:
devices = deviceFile()
machine = devices[0][0]
returns the first item on the first line
machine = devices[2][1]
returns the second item on the third line
Your problem is that you're not doing anything with the file object (the thing that open returns), but instead trying to operate on the filename as if it were a file object. So, just change this:
devFile = raw_input('Enter the full file path to your device list file: \n-->')
open(devFile, 'r')
to this:
devPath = raw_input('Enter the full file path to your device list file: \n-->')
devFile = open(devPath, 'r')
Once you do that, it "works", but maybe not in the way you intended. For example, for this file:
abc, def
ghi, jkl
You'll get back this:
[['abc', ' def\n'], ['ghi', ' jkl\n']]
The '\n' characters are there because for line in devFile: returns lines with the newline characters preserved. If you want to get rid of them, you have to do something, such as rstrip('\n').
The spaces are there because split doesn't do anything magical with spaces. You ask it to split 'abc, def' on ',' and you're going to get 'abc' and ' def' back. If you want to get rid of them, strip the result.
You've got various other minor problems (e.g., you never close the file), but none of them will actually stop your code from working.
So:
def deviceFile():
    devPath = raw_input('Enter the full file path to your device list file: \n-->')
    devFile = open(devPath, 'r')
    device = []
    for line in devFile:
        # creating an array here to hold each line. Call with object[0][0]
        device.append([value.strip() for value in line.rstrip('\n').split(',')])
    return(device)
Now you'll return this:
[['abc', 'def'], ['ghi', 'jkl']]
That device.append([value.strip() for value in line.rstrip('\n').split(',')]) looks pretty complicated. Calling rstrip on each line before you can split it is no big deal, but that list comprehension to call strip on each value makes it a bit hard to read. And if you don't know what list comprehensions are (which seems likely, given that you've got that explicit loop around append for the device list), you have to do something like this:
device = []
for line in devFile:
    values = []
    for value in line.rstrip('\n').split(','):
        values.append(value.strip())
    device.append(values)
However, there's a much easier way to do this. The csv module in the standard library takes care of all that tricky stuff with the newlines and whitespace, as well as things you haven't even thought of yet (like quoted or escaped values).
import csv
def deviceFile():
    devPath = raw_input('Enter the full file path to your device list file: \n-->')
    with open(devPath) as devFile:
        # skipinitialspace=True drops the space after each comma
        return list(csv.reader(devFile, skipinitialspace=True))
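A quick way to check how csv.reader treats spaces and quoted values is to feed it an in-memory file. The sketch below is written for Python 3 and uses made-up sample data. One caveat: the reader only drops the space after a delimiter when skipinitialspace=True is passed.

```python
import csv
import io

# Hypothetical file contents, held in memory so the sketch runs anywhere.
sample = io.StringIO('abc, def\nghi, "j,kl"\n')

# skipinitialspace=True drops the space that follows each comma.
rows = list(csv.reader(sample, skipinitialspace=True))
print(rows)
# [['abc', 'def'], ['ghi', 'j,kl']]
```

The quoted value keeps its embedded comma, which a plain split(',') would have broken in two.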
Correct me if I'm wrong, I think you're trying to read a file and then store each line in the file (separated by a comma) into an array. For example, if you have a text file that simply says "one, two, three" do you want this to create an array ['one', 'two', 'three']? If so, you don't need a for loop, simply do this:
def deviceFile():
    devFile = raw_input('Enter the full file path to your device list file: \n-->')
    myFile = open(devFile, 'r') # Open file in read mode
    readFile = myFile.read() # Read the file
    # Replace enter characters with commas
    readFile = readFile.replace('\n', ',')
    # Split the file by commas, return an array
    return readFile.split(',')
The reason you don't need a for loop is because str.split() already returns an array. In fact, you don't even need to append "device", you don't need device at all. See the string documentation for more info.

Trouble sorting a list with python

I'm somewhat new to Python. I'm trying to sort a list of strings and integers. The list contains some symbols that need to be filtered out (i.e. ro!ad should end up as road). Also, the items are all on one line, separated by spaces. I need to use two arguments: one for the input file and one for the output file. The output should be sorted with numbers first and then the words, without the special characters, each on a different line. I've been looking at loads of list functions but am having some trouble putting this together, as I've never had to do anything like this. Any takers?
So far I have the basic stuff
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1] #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file but exactly as the file is written and not sorted at all. The goal is to take a file (arg1.txt) and sort it and make a new file (arg2.txt) which will be cmd line variables. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file.
data = ifile.readlines()
is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to
data = ifile.readlines()[0].split()
You may not even need the key function any more since numbers are placed before letters by default. I don't see anything in your code to remove special characters though.
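One caveat when dropping the key function: the items are strings, so they are compared character by character. Digits do sort before letters, but '10' also sorts before '9'. A small sketch with made-up input; sort_key is a hypothetical helper:

```python
tokens = "road 10 apple 9".split()  # hypothetical one-line input

# Plain string sorting puts digits before letters, but compares
# character by character, so '10' lands before '9'.
print(sorted(tokens))
# ['10', '9', 'apple', 'road']


def sort_key(token):
    # Numbers first, ordered by value; then words, ordered alphabetically.
    if token.isdigit():
        return (0, int(token), '')
    return (1, 0, token)


print(sorted(tokens, key=sort_key))
# ['9', '10', 'apple', 'road']
```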
Since they are all on the same line, you don't really need readlines:
with open('some.txt') as f:
    data = f.read() #now data = "item 1 item2 etc..."
you can use re to filter out unwanted characters
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print sorted(my_list)
however you will need to do more to force numbers to sort maybe something like
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings)
Also, they are all on one line separated by a space.
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list.
To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
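Putting the read-and-split idea together with the filtering and the output file, a Python 3 sketch could look like this. It assumes "special characters" means anything that is not an ASCII letter or digit, and the function names are my own.

```python
import re


def sort_tokens(text):
    """Return numbers (ascending) followed by cleaned words (alphabetical)."""
    # Assumption: anything that is not a letter or digit gets removed.
    cleaned = [re.sub(r'[^A-Za-z0-9]', '', token) for token in text.split()]
    cleaned = [token for token in cleaned if token]  # drop tokens that became empty
    numbers = sorted((t for t in cleaned if t.isdigit()), key=int)
    words = sorted(t for t in cleaned if not t.isdigit())
    return numbers + words


def sort_file(infilename, outfilename):
    with open(infilename) as ifile:
        text = ifile.read()
    with open(outfilename, 'w') as ofile:
        # One item per line, as the question asks.
        ofile.write('\n'.join(sort_tokens(text)) + '\n')
```

From a script, this could be called as sort_file(sys.argv[1], sys.argv[2]) after checking that both arguments were supplied.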
