Python- File Parsing

Write a program which reads a text
file called input.txt which contains
an arbitrary number of lines of the
form "<city>, <country>", then records this
information using a dictionary, and
finally outputs to the screen a list
of countries represented in the file
and the number of cities contained.
For example, if input.txt contained the following:
New York, US
Angers, France
Los Angeles, US
Pau, France
Dunkerque, France
Mecca, Saudi Arabia
The program would output the following (in some order):
Saudi Arabia : 1
US : 2
France : 3
My code:
from os import dirname
def parseFile(filename, envin, envout = {}):
    exec "from sys import path" in envin
    exec "path.append(\"" + dirname(filename) + "\")" in envin
    envin.pop("path")
    lines = open(filename, 'r').read()
    exec lines in envin
    returndict = {}
    for key in envout:
        returndict[key] = envin[key]
    return returndict
I get "SyntaxError: invalid syntax" when I run it with my file name.
The file name I used is input.txt.

I don't understand what you are trying to do, so I can't really explain how to fix it. In particular, why are you execing the lines of the file? And why write exec "foo" instead of just foo? I think you should go back to a basic Python tutorial...
Anyway, what you need to do is:
open the file using its full path
for line in file: process the line and store it in a dictionary
return the dictionary
That's it, no exec involved.
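The three steps above can be sketched roughly as follows. This is only a minimal sketch: the function name count_countries and the decision to skip lines without a comma are assumptions, and the function takes any iterable of lines so it can be tried without a file on disk.

```python
def count_countries(lines):
    """Count how many cities are listed for each country.

    `lines` is any iterable of "City, Country" strings, for example an
    open file object. Lines without a comma are skipped (an assumption).
    """
    counts = {}
    for line in lines:
        if ',' not in line:
            continue  # blank or malformed line
        # Split on the last comma so a city name containing a comma survives.
        city, country = line.rsplit(',', 1)
        country = country.strip()
        counts[country] = counts.get(country, 0) + 1
    return counts


sample = [
    "New York, US\n",
    "Angers, France\n",
    "Los Angeles, US\n",
    "Pau, France\n",
    "Dunkerque, France\n",
    "Mecca, Saudi Arabia\n",
]
for country, n in count_countries(sample).items():
    print("{0} : {1}".format(country, n))
```

With a real file, the same function can be called as count_countries(fh) inside a "with open(path) as fh:" block, where path is the full path to input.txt.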

Yup, that's a whole lot of crap you either don't need or shouldn't do. Here's how I'd do it prior to Python 2.7 (after that, use collections.Counter as shown in the other answers). Mind you, this'll return the dictionary containing the counts, not print it; you'd have to do that externally. I'd also prefer not to give a complete solution for homework, but it's already been done, so I suppose there's no real damage in explaining a bit about it.
def parseFile(filename):
    with open(filename, 'r') as fh:
        lines = fh.readlines()
    d={}
    for country in [line.split(',')[1].strip() for line in lines]:
        d[country] = d.get(country,0) + 1
    return d
Let's break that down a bit, shall we?
with open(filename, 'r') as fh:
    lines = fh.readlines()
This is how you'd normally open a text file for reading. It will raise an IOError exception if the file doesn't exist, you don't have permission, or the like, so you'll want to catch that. readlines() reads the entire file and splits it into lines; each line becomes an element in a list.
d={}
This simply initializes an empty dictionary
for country in [line.split(',')[1].strip() for line in lines]:
Here is where the fun starts. The bracket-enclosed part to the right is called a list comprehension, and it basically generates a list for you. What it pretty much says, in plain English, is "for each element 'line' in the list 'lines', take that element/line, split it on each comma, take the second element (index 1) of the list you get from the split, strip off any whitespace from it, and use the result as an element in the new list".
Then, the left part of it just iterates over the generated list, giving the name 'country' to the current element in the scope of the loop body.
d[country] = d.get(country,0) + 1
Ok, ponder for a second what would happen if instead of the above line, we'd used the following:
d[country] = d[country] + 1
It'd crash, right (KeyError exception), because d[country] doesn't have a value the first time around.
So we use the get() method, all dictionaries have it. Here's the nifty part - get() takes an optional second argument, which is what we want to get from it if the element we're looking for doesn't exist. So instead of crashing, it returns 0, which (unlike None) we can add 1 to, and update the dictionary with the new count. Then we just return the lot of it.
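To see the difference concretely, here is a small sketch comparing plain indexing with get(). The sample list is made up for illustration.

```python
countries = ["US", "France", "US"]  # made-up sample data

d = {}
try:
    d["US"] = d["US"] + 1  # "US" is not in d yet, so the lookup fails
except KeyError as exc:
    print("plain indexing failed with KeyError:", exc)

# get() returns the default (0) when the key is missing, so the first
# occurrence of a country starts its count at 0 + 1.
for country in countries:
    d[country] = d.get(country, 0) + 1

print(d)
```

Without the second argument, get() returns None for a missing key, which is why the 0 matters here.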
Hope it helps.

I would use a defaultdict plus a list to maintain the structure of the information,
so additional statistics can be derived.
import collections
def parse_cities(filepath):
    countries_cities_map = collections.defaultdict(list)
    with open(filepath) as fd:
        for line in fd:
            values = line.strip().split(',')
            if len(values) == 2:
                # strip each part so " US" and "US" end up as the same key
                city, country = [value.strip() for value in values]
                countries_cities_map[country].append(city)
    return countries_cities_map
def format_cities_per_country(countries_cities_map):
    # items() and print(...) with one argument work on Python 2 and 3
    for country, cities in countries_cities_map.items():
        print(" {ncities} Cities found in {country} country".format(country=country, ncities = len(cities)))
if __name__ == '__main__':
    import sys
    filepath = sys.argv[1]
    format_cities_per_country(parse_cities(filepath))

import collections
def readFile(fname):
    with open(fname) as inf:
        return [tuple(s.strip() for s in line.split(",")) for line in inf]
def countCountries(city_list):
    return collections.Counter(country for city,country in city_list)
def main():
    cities = readFile("input.txt")
    countries = countCountries(cities)
    print("{0} cities found in {1} countries:".format(len(cities), len(countries)))
    # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for country, num in countries.items():
        print("{country}: {num}".format(country=country, num=num))
if __name__=="__main__":
    main()

Related

Taking information by line from a file to tuple

I'm writing a Python decryption program for a school project.
First of all, I have a function that takes a file as its argument. I must read the file line by line and return a tuple.
The file contains 3 things: a number (whatever it is), the decrypted text, and the encrypted text.
import sys
fileName = sys.argv[-1]
def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i,)
    return tuple #does nothing why?
    print(tuple)
load_data(fileName)
Output:
('13\n', 'mecanisme chiffres substituer\n', "'dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr'")
Output needed:
(13,'mecanisme chiffres substituer','dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr')
The tuple needs to look like (count, word_list, crypted), with 13 as count and so on.
If someone can help me, that would be great.
Sorry if I'm asking my question the wrong way.

You could try this to avoid the '\n' characters at the end
import sys
fileName = sys.argv[-1]
def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i.strip(''' \n'"'''),)
    return tuple
print(load_data(fileName))
Note that a function ends as soon as it reaches a return statement. If you want to print the value of tuple, do it before the return statement, or print the returned value.

I am a little confused about what the file in question looks like, but from what I could infer from the output you got the file appears to be something like this:
some number
decrypted text
encrypted text
If so, the most straightforward way to do this would be
with open('lines.txt','r') as f:
    all_the_text = f.read()
    list_of_text = all_the_text.split('\n')
    tuple_of_text = tuple(list_of_text)
    print(tuple_of_text)
Explanation:
The open built-in function creates an object that allows you to interact with the file. We use open with the argument 'r' to let it know we only want to read from the file. Doing this within a with statement ensures that the file gets closed properly when you are done with it. The as keyword followed by f tells us that we want the file object to be placed into the variable f. f.read() reads in all of the text in the file. String objects in python contain a split method that will place strings separated by some delimiter into a list without placing the delimiter into the separated strings. The split method will return the results in a list. To put it into a tuple, simply pass the list into tuple.
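The requested output starts with the integer 13 rather than the string '13', so the first line still needs converting. A rough sketch of that step, assuming the file really has exactly three lines in the order count, decrypted text, encrypted text. parse_record is a hypothetical helper name, and removing quote characters around the last line is an assumption based on the output shown in the question.

```python
def parse_record(lines):
    """Turn three text lines into a (count, word_list, crypted) tuple.

    Assumes exactly three lines: an integer, the decrypted text, and the
    encrypted text (possibly wrapped in quote characters).
    """
    # Remove only the line ending, so spaces inside the text are untouched.
    count, word_list, crypted = [line.rstrip("\r\n") for line in lines]
    return int(count), word_list, crypted.strip("'\"")


sample = ["13\n", "mecanisme chiffres substituer\n", "'dmnucmnn gmnu'\n"]
print(parse_record(sample))
```

With a real file this could be called as parse_record(f.readlines()) inside a with block. A file with extra or missing lines would raise a ValueError at the unpacking step, which is probably what you want for malformed input.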

Need help iterating through a dictionary - Python

I'm a beginner in Python.
I have a simple dictionary called G8_Leaders.txt - as follows: {Cameron:UK, Merkel:Germany, Obama:USA, Putin:Russia}
and I'm trying to iterate through the pairs - display the contents in a column, using a basic "for" loop like so:
f0 = "G8_Leaders.txt"
f1 = open(f0)
for i in f1:
    print(i, end=" ")
else:
    print("Finished with Document: ", f0)
I'd like the results in a column, such as:
Merkel Germany
Cameron UK
Obama USA
and so on... However, the results are only printed on one line, as they are displayed in the text file.
What is the proper way to do this?

First, you have to fix the representation of your dictionary in the file so that it looks like this:
d = {'Cameron':'UK', 'Merkel':'Germany', 'Obama':'USA', 'Putin':'Russia'}
It can be done programmatically by invoking str() on each key and value of d.
Assuming that that is fixed, one solution would be:
# Create a file for test bench:
d = {'Cameron':'UK', 'Merkel':'Germany', 'Obama':'USA', 'Putin':'Russia'}
with open("G8_Leaders.txt", "w") as f:
    f.write(str(d) +'\n')
#Read and print from the file:
with open("G8_Leaders.txt", "r") as f:
    data = f.readlines()
for line in data:
    d = eval(line)
    for y, z in d.items():
        print(y, z)
Output:
Cameron UK
Merkel Germany
Putin Russia
Obama USA
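One caveat with the snippet above: eval() runs whatever expression the file contains. If the file only holds a dictionary literal, ast.literal_eval from the standard library is a safer replacement, since it accepts only Python literals. A small sketch, with the file contents inlined as a string so it runs on its own:

```python
import ast

line = "{'Cameron':'UK', 'Merkel':'Germany', 'Obama':'USA', 'Putin':'Russia'}\n"

# literal_eval parses literals (dicts, lists, strings, numbers, ...) and
# raises an exception for anything else, such as a function call.
leaders = ast.literal_eval(line.strip())
for name, country in leaders.items():
    print(name, country)

rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True  # not a literal, so it is refused rather than executed
print("function call rejected:", rejected)
```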

OK, a couple of things are going on here. f1 is a file object, which is not what you are expecting. If you want to access it as a string, you would do
f1_string = f1.read()
which reads the file into the variable f1_string as a string. You could do:
for line in f1.readlines():
    # operate on one line at a time in this block
if you would prefer to read a line at a time, which will work in your case if each element is on its own line in the text file. Since we don't know exactly what your text file looks like, it's hard to give more specific help. The json module people are referring to can be found here, but I'm not entirely sure that's what you want.

Assuming that your text file is just as you've described it, then try the following:
with open(f0, 'r') as f:
    for line in f:
        items = line.split(',')
        for item in items:
            # strip() removes the spaces and newline left over from the split
            leader = item.split(':')[0].replace('{','').strip()
            country = item.split(':')[1].replace('}','').strip()
            print(leader,country)
This should work regardless of whether or not your data has line breaks in it.
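If the file really holds unquoted text like {Cameron:UK, Merkel:Germany, ...}, the same idea can be wrapped in a function that returns a dictionary. This is a sketch under that assumption: parse_leaders is a hypothetical name, and names or countries that themselves contain ',' or ':' would break it.

```python
def parse_leaders(text):
    """Parse '{Name:Country, Name:Country}' text into a dict."""
    pairs = {}
    # Drop surrounding whitespace and the outer braces, then split on commas.
    body = text.strip().strip('{}')
    for item in body.split(','):
        if ':' not in item:
            continue  # skip empty pieces, e.g. from a trailing comma
        leader, country = item.split(':', 1)
        pairs[leader.strip()] = country.strip()
    return pairs


text = "{Cameron:UK, Merkel:Germany, Obama:USA, Putin:Russia}\n"
for leader, country in parse_leaders(text).items():
    print(leader, country)
```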

Try this to iterate through a dictionary:
leaders = {'Cameron':'UK', 'Merkel':'Germany', 'Obama':'USA', 'Putin':'Russia'}
for leader in leaders:
    print(leader, leaders[leader])

import json
with open("fname.txt") as f:
    print("\n".join(x + " " + y for x, y in json.load(f).items()))
Just load it as JSON (at least I think so; it's not entirely clear whether you have properly formatted JSON in your text file). A warning, though: dictionaries are unordered, so if you need a certain order you will have to provide more clarification.

Reading data from multiple csv-text files into a single text file using python

So I'm creating a program which should be able to read 8 separate text files and gather the information from those files into a single file.
The first file contains information about the athletes like this:
number;name;division.
The other files contain results from individual sport events like this:
number;result.
The program should be able to gather all the information about the athletes and put into a single file like this:
number;name;division;event1;event2...;event7.
The number is the athlete's participant number, and all other information should be "linked" to that number.
I'm really confused whether to use dict or list or both to handle and store the information from the text files.
The program is a lot more complex than explained above but I can work out the details myself. Also the allowed import libraries are math, random and time. I know these are pretty vague instructions but like I said I don't need a complete, functional program but rather guidelines how to get started. Thanks!

Consult this post for how to read a file line-by-line.
with open(...) as f:
    for line in f:
        <do something with line>
Consult this post on how to split each line of a CSV.
Consult this post about how to add to a dictionary. I suggest adding a tuple as each entry in the dictionary.
d['mynewkey'] = ('mynewvalue',)  # a one-element tuple; note the trailing comma
Then concatenate and reassign the tuples to add data from new files:
d['mynewkey']=d['mynewkey'] + (newval1, newval2, newval3)
And remember, it is the commas that make a tuple, not the parentheses.
That should get you started.
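Put together, the steps above might look roughly like this. It is only a sketch under a few assumptions: each line ends with a '.' as in the question, the helper names are made up, and lists are used instead of tuples so results can be appended in place. It uses no imports, in line with the restriction in the question, and takes lists of lines so it can be tried without any files.

```python
def load_athletes(lines):
    """Build {number: [name, division]} from 'number;name;division.' lines."""
    athletes = {}
    for line in lines:
        line = line.strip().rstrip('.')
        if not line:
            continue
        number, name, division = line.split(';')
        athletes[number] = [name, division]
    return athletes


def add_results(athletes, lines):
    """Append each 'number;result.' value to the matching athlete's list."""
    for line in lines:
        line = line.strip().rstrip('.')
        if not line:
            continue
        number, result = line.split(';')
        if number in athletes:  # ignore results for unknown numbers
            athletes[number].append(result)


# Sample lines standing in for the contents of the real files.
athletes = load_athletes(["1;Jordan;Basketball.\n", "2;Michael;Soccer.\n"])
add_results(athletes, ["2;23.5.\n", "1;13.\n"])   # event 1
add_results(athletes, ["1;15.\n", "2;25.7.\n"])   # event 2
for number, fields in sorted(athletes.items()):
    print(';'.join([number] + fields) + '.')
```

Note that if an athlete is missing from one event file, the later results shift one column to the left. Appending a placeholder for missing results would avoid that.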

First of all, open the CSV file for writing, then open all of your text files.
To do this, use Python's with statement. You can easily open all the text files in one line :)
with open('result.csv', 'w') as csvfile:
    # write column headers
    csvfile.write('number;name;division;event1; ...')
    with open('file1.txt', 'r') as f1, open('file2.txt' , 'r') as f2, open(...) as f:
        f1_line = f1.readline()
        f2_line = f2.readline()
        # rest of your logic ....
        csvfile.write(';'.join(item for item in [number, name, division, event1, ...]) + '.\n')
Once all the files are open, read from them line by line. Collect the lines from all files, extract what you need from each line, and write it to the CSV file :)
PS. I don't know how many lines your files will have, but loading everything into memory (a list or dict, whatever) isn't a good idea....

You can use a dict with the athlete numbers as keys to identify them, and use a class to store all other information in a meaningful and nice way. The results can be added to a list on an athlete object, and the athlete object can be identified by the number (which is the dict key).
Sample input athletes.csv:
1;Jordan;Basketball.
2;Michael;Soccer.
3;Ariell;Swimming.
Sample input athletes_events.csv:
2;23.5.
2;25.7.
3;174.5.
1;13.
1;15.
2;21.3.
3;159.9.
2;28.6
1;19.
Code:
class Athlete:
    def __init__(self, name, division):
        self.name = name
        self.division = division
        self.events = []
athletes = {}
with open("athletes.csv") as file:
    for line in file:
        number, name, division = line.strip(".\n").split(";")
        # could cast number to int, but we don't have to
        athletes[number] = Athlete(name, division)
with open("athletes_events.csv") as file:
    for line in file:
        number, result = line.strip("\n").split(";")
        result = float(result.rstrip("."))
        try:
            athletes[number].events.append(result)
        except KeyError:
            print("There's no athlete with number %s" % number)
for number, athlete in sorted(athletes.items()):
    print("%s - %s (%s):" % (athlete.name, athlete.division, number))
    for i, result in enumerate(athlete.events, 1):
        print(" Event %i = %s" % (i, result))
    print()
Result:
Jordan - Basketball (1):
Event 1 = 13.0
Event 2 = 15.0
Event 3 = 19.0
Michael - Soccer (2):
Event 1 = 23.5
Event 2 = 25.7
Event 3 = 21.3
Event 4 = 28.6
Ariell - Swimming (3):
Event 1 = 174.5
Event 2 = 159.9
Just replace the print()s by some file writing operation.
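As a rough illustration of that last step, the loop below writes one number;name;division;event1;... line per athlete instead of printing. It is a sketch: the Athlete class is repeated so the block runs on its own, write_results is a hypothetical helper, and io.StringIO stands in for a real output file.

```python
import io


class Athlete:
    def __init__(self, name, division):
        self.name = name
        self.division = division
        self.events = []


def write_results(athletes, out):
    """Write 'number;name;division;event1;...' lines to a file-like object."""
    for number, athlete in sorted(athletes.items()):
        fields = [number, athlete.name, athlete.division]
        fields += [str(result) for result in athlete.events]
        out.write(';'.join(fields) + '\n')


athletes = {"1": Athlete("Jordan", "Basketball")}
athletes["1"].events += [13.0, 15.0, 19.0]

buffer = io.StringIO()  # with a real file: open("results.csv", "w")
write_results(athletes, buffer)
print(buffer.getvalue(), end='')
```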

How to check a list for a string then select the rest of that item,

At the moment I have a text file with people who swim and their times, such as this,
jack 12
sarah 20
ben 4
Now I would like to be able to search this for, say, sarah and have it return her time.
This is what I currently have.
def Timers(swimmer):
    myFile = open("race.txt","r")
    lists = []
    for eachLine in myFile:
        lists += [eachLine.rstrip("\n")]
So I have compiled all the lines into a single list. I know I can check the list to see whether a swimmer is there, but I cannot work out how to select just the time.
I know that once I have, say, "sarah 20", I can use split and then format it to get the time.
Thank you for the help.

You want a dict, a Python mapping, instead, and to read the file only once:
def Timers():
    with open("race.txt","r") as myFile:
        swimmers = {}
        for line in myFile:
            if line.strip():
                swimmer, timer = line.split()
                swimmers[swimmer] = timer
    return swimmers
The .split() call splits the line on whitespace, giving you a name and a timer string for each line.
Now Timers() returns a mapping containing all swimmer names as the keys, and their times as values. You can simply look up each and every swimmer:
timers = Timers()
print(timers['sarah'])

Another approach to the problem:
def Timer(swimmer):
    myFile = open("race.txt", "r")
    lists = myFile.readlines()
    found = [l for l in lists if l.startswith(swimmer)][0] # Gets first found swimmer
    time = found.split()[-1] # Gets last item (eg. time) in splitted list
    myFile.close()
    return time
print(Timer('jack'))
This works even if the swimmer is specified with both first and last name. I used the same way to open the file as you did. But you really should use the with-statement as in the previous answer!
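One caveat with startswith(): a search for 'ben' would also match a line for 'benjamin'. Here is a sketch that compares the whole name instead, assuming the time is always the last whitespace-separated field. find_time is a hypothetical name, and it takes the lines as an argument so it can be tried without race.txt.

```python
def find_time(lines, swimmer):
    """Return the time recorded for `swimmer`, or None if not listed."""
    for line in lines:
        parts = line.rsplit(None, 1)  # split once, at the last whitespace run
        if len(parts) != 2:
            continue  # blank line or a line without a time
        name, time = parts
        if name.strip() == swimmer:
            return time
    return None


lines = ["jack 12\n", "sarah 20\n", "benjamin 7\n", "ben 4\n"]
print(find_time(lines, "ben"))
```

Returning None for an unknown swimmer is a design choice; the list-indexing version above would raise an IndexError instead.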

How to strip out special characters of a list in python?

I'm currently doing a project for class and I need a little advice/help. I have a csv file that I'm extracting data from. (I am not using the csv module because I'm not familiar with it and the instructor warned us that it's complicated.) I've gotten the data into lists using a function I created. It works fine if the values are just a string of numbers, but if there is a percent sign or 'N/A' in the cell, then I get an error. Here is the code:
def get_values(file, index):
    '''(file object, int) -> list
    Return a list of states and corresponding values at a particular index in file.'''
    values_list = []
    for i in range(6):
        file.readline()
    for line in file:
        line_list = line.split(',')
        values_list.append(line_list[index])
    values_list = [i.rstrip('%') for i in values_list]
    values_list = [float(i) for i in values_list]
    return values_list
while True:
    try:
        file_name = input('Enter in file name: ')
        input_file = open( file_name, 'r')
        break
    except IOError:
        print('File not found.')
heart_list = get_values(input_file, 1)
input_file.close()
input_file = open( 'riskfactors.csv', 'r')
HIV_list = get_values(input_file, 8)
input_file.close()
I would like to strip the %, but nothing I've tried has worked so far. Any suggestions?

Without seeing a complete SSCCE with sample inputs, it's hard to be sure, but I'm willing to bet the problem is this:
values_list = [i.rstrip('%') for i in values_list]
That will strip any '%' characters off the end of each value, but it won't strip any '%' characters anywhere else. And in a typical CSV file, that isn't good enough.
My guess is that you have a line like this:
foo , 10% , bar
This will split into:
['foo ', ' 10% ', ' bar\n']
So, you add the ' 10% ' to values_list, and the rstrip line will do nothing, because it doesn't end with a '%', it ends with a ' '.
Or, alternatively, it may just be this:
foo,bar,10%
So you get this:
['foo', 'bar', '10%\n']
… which has the same problem.
If this (either version) is the problem, what you want to do is something like:
values_list = [i.strip().rstrip('%') for i in values_list]
Meanwhile, you can make this a lot simpler by just getting rid of the list comprehension. Why try to fix every row after the fact, when you can fix the single values as you add them? For example:
for line in file:
    line_list = line.split(',')
    value = line_list[index]
    value = value.strip().rstrip('%')
    value = float(value)
    values_list.append(value)
return values_list
And now, things are simple enough that you can merge multiple lines without making it less readable.
Of course you still need to deal with 'N/A'. The question is whether you want to treat that as 0.0, or None, or skip it over, or do something different, but whatever you decide, you might consider using try around the float instead of checking for 'N/A', to make your code more robust. For example:
value = value.strip().rstrip('%')
try:
    value = float(value)
except ValueError as e:
    # maybe log the error, or log the error only if not N/A, or...
    pass # or values_list.append(0.0), or whatever
else:
    values_list.append(value)
By the way, dealing with this kind of stuff is exactly why you should use the csv module.
Here's how you use csv. Instead of this:
for line in file:
    line_list = line.split(',')
Just do this:
for line_list in csv.reader(file):
That's complicated?
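And it takes care of the trailing newline, quoting, escaping, and all kinds of other nonsense that you'll forget to test for. (It does not strip whitespace around values by default, so keep the strip() call; skipinitialspace=True only drops spaces right after a delimiter.)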
In other words, most likely, if you'd used csv, besides saving one line of code, you wouldn't have had this problem in the first place—and the same would be true for 8 of the next 10 problems you're going to run into.
But if you're learning from an instructor who thinks csv is too complicated… well, it's a good thing you're motivated enough to try to figure things out for yourself and ask questions outside of class, so there's some hope…
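For completeness, here is a sketch of how get_values might look with csv.reader and the try/except idea from above. The six skipped header lines come from the question; skipping unparseable cells such as 'N/A' is just one of the options discussed, and the sample data is invented.

```python
import csv
import io


def get_values(file, index, header_lines=6):
    """Return the values in column `index` as floats.

    Cells that cannot be converted (for example 'N/A') are skipped.
    """
    reader = csv.reader(file)
    values = []
    for row_number, row in enumerate(reader):
        if row_number < header_lines or len(row) <= index:
            continue  # header line, or a row that is too short
        cell = row[index].strip().rstrip('%')
        try:
            values.append(float(cell))
        except ValueError:
            pass  # e.g. 'N/A'
    return values


# Invented sample: six header lines followed by three data rows.
sample = "h\n" * 6 + "Alabama, 10% ,x\nAlaska,N/A,y\nArizona,7.5,z\n"
print(get_values(io.StringIO(sample), 1))
```

With a real file, the csv documentation recommends opening it with newline='' before passing it to csv.reader.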
