python object taking full list

python object taking full list - python

I am trying to write my first object-oriented program.
The code I have come up with is:
class Lattice:
'Commomn base for all sublattice'
latc = 0
def __init__(self, conc, name, pot):
self.atcon = atcon
self.name =name
self.pot=[]
Lattice.latc += 1
atcon=[]
with open(inpf, "r") as f:
for line in f:
match = re.search(reall, line)
if match:
if (match.group(1).strip() == "atcon"):
atcon.append(match.group(2).split())
print("CON =>"+str(atcon))
print("CON[0] =>"+str(atcon[0]))
lat0=Lattice(atcon[0],pot[0],name[0])
print("lat0 =>"+str(lat0.atcon))
I was expecting that lat0.atcon will be atcon[0]
But the result of last 3 print statement is:
CON =>[['0.5d0', '0.50d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0']]
CON[0] =>['0.5d0', '0.50d0']
lat0 =>[['0.5d0', '0.50d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0'], ['0.98d0', '0.02d0']]
I don't understand why. I am an absolute beginner and with no formal python training (learning using net and SO), so, please be patience.
Update:
After the accepted reply, this is the code I am currently working with. The problem with this is, I am reading everything in a list and then inserting those list to the lat0 i.e.
#if match found
atcon.append(match.group(2).split())
# after getting all match, in global scope
lat0=Lattice(atcon[0],name[0],pot[0])
So, I think I am either wasting the list or the object lat0. Is it possible the I directly populate lat0 when the match is found?
e.g.
#if match found for <name>
lat0=Lattice(name)
mini.py:
#!/usr/bin/env python3
import sys
import re
class Lattice:
'Commomn base for all sublattice'
latc = 0
def __init__(self, conc, names, pots):
self.atcon = conc
self.name =names
self.pot=pots
Lattice.latc += 1
reall='(.*)\s*=\s*(.*)'
inpf = sys.argv[1]
print(inpf)
with open(inpf, "r") as f:
pot=[]
name=[]
atcon=[]
for line in f:
match = re.search(reall, line)
if match:
if (match.group(1).strip() == "atcon"):
atcon.append(match.group(2).split())
if (match.group(1).strip() == "pot"):
pot.append(match.group(2).split())
if (match.group(1).strip() == "name"):
name.append(match.group(2).split())
lat0=Lattice(atcon[0],name[0],pot[0])
print("POT =>"+str(pot))
print("NAME =>"+str(name))
print("CON =>"+str(atcon))
print("CON[0] =>"+str(atcon[0]))
print("lat0 =>"+str(lat0.pot))
Typical Input
pot=Rh1.pot Rh2.pot Fe1a.pot Fe2a.pot
name=Rh-up Fe-up
atcon=0.5d0 0.50d0
pot=Rh3.pot Rh4.pot Fe3a.pot Fe4a.pot
name=Rh-up Fe-up
atcon=0.98d0 0.02d0

I'm betting that you either wrote or tested this class in IDLE. At which point I'm sure it got really confusing but the error is very simple. When you instantiate your class it's generally advisable to use the values you sent to the __init__ and not reference other ones.
class Lattice:
'Commomn base for all sublattice'
latc = 0
def __init__(self, conc, name, pot):
self.atcon = conc
self.name =name
self.pot=pot
Lattice.latc += 1
What happened is that atcon, pon and name were defined in the global scope and you referenced them as in the example bellow:
atcon=[1, 2, 3, 4, 5, 6]
pot = [7,8,9]
name = ["foo", "bar"]
class globs:
def __init__(self):
self.atcon = atcon
self.pot = pot
self.name = name
Which gave the following output:
>>> g = globs()
>>> g.atcon
[1, 2, 3, 4, 5, 6]
>>> g.pot
[7, 8, 9]
>>> g.name
['foo', 'bar']
EDIT Answer to extended edit to original question.
I think I get it. Two things still confuse me:
If I follow the code, it looks like you only want to get the 1st hit in the file as lat0. But you haven't explained if you only want the 1st hit in the file, or a list of objects of all hits.
You do a split, but according to your sample input that will still return a list i.e. ["Rh1.pot", "Rh2.pot", "Fe1a.pot", "Fe2a.pot"], I could be presumptuous but I've added a [0] after the split to retrieve only the 1st hit. Remove that if I missed the point.
Here is code that will stop the loop after the first hit is found. I declare atcon, pot, and name as lists because .split() will return a list, but I don't append the results not to waste memory. I also return Lattice object to exit the function and avoid wasting time parsing the rest of the lines.
Additionally the final if atcon and pot and name is there to avoid returning in case there is a piece of text that matches, but doesn't contain all important info. In python if of a empty list will be False. You can leave the rest of the code as is (except the print statements).
inpf = sys.argv[1]
print(inpf)
def parse(inpf):
atcon, pot, name = [], [], []
reall='(.*)\s*=\s*(.*)'
with open(inpf, "r") as f:
for line in f:
print(line)
if match:
if (match.group(1).strip() == "atcon"):
atcon = match.group(2).split()[0]
if (match.group(1).strip() == "pot"):
pot = match.group(2).split()[0]
if (match.group(1).strip() == "name"):
name = match.group(2).split()[0]
if atcon and pot and name:
return Lattice(atcon, name, pot)
lat0 = parse("test.txt")
print("lat0 =>"+str(lat0.pot)+" "+str(lat0.name)+" "+str(lat0.atcon))
Tested on
atcon=0.5d0 0.50d0
atcon=0.5d0 0.50d0
atcon=0.5d0 0.50d0
pot=Rh1.pot Rh2.pot Fe1a.pot Fe2a.pot
name=Rh-up Fe-up
atcon=0.5d0 0.50d0
pot=Rh3.pot Rh4.pot Fe3a.pot Fe4a.pot
name=Rh-up Fe-up
atcon=0.98d0 0.02d0
Output:
lat0 =>Rh1.pot Rh-up 0.5d0

Related

Reading data from a file in python, but taking specific data from the file throughout

Hi I am trying to read data from a .dat file, the file has information like this:
1
Carmella Henderson
24.52
13.5
21.76
2
Christal Piper
14.98
11.01
21.75
3
Erma Park
12.11
13.51
18.18
4
Dorita Griffin
20.05
10.39
21.35
5
Marlon Holmes
18.86
13.02
13.36
From this data I need the person number, name and the first number, like so:
1 #person number
Marlon Holmes #Name
18.86 # First number
13.02
13.36
However currently my code is reading the data from the file but not these specific parts, it is simply printing the file
This is my code currently for this specific part:
def Cucumber_Scoreboard():
with open('veggies_2015.dat', 'r') as f:
count = 0
for line in f:
count **= 1
if count % 2 == 0:
print (line)
Im not sure where it's going wrong, I tried to put the data from the file into a list and try it from there but had no success, any help would be greatly appreciated
Whole file code if needed:
def menu():
exit = False
while not exit:
print("To enter new competitior data, type new")
print("To view the competition score boards, type Scoreboard")
print("To view the Best Overall Growers Scoreboard, type Podium")
print("To review this years and previous data, type Data review")
print("Type quit to exit the program")
choice = raw_input("Which option would you like?")
if choice == 'new':
new_competitor()
elif choice == 'Scoreboard':
scoreboard_menu()
elif choice == 'Podium':
podium_place()
elif choice == 'Data review':
data_review()
elif choice == 'quit':
print("Goodbye")
raise SystemExit
"""Entering new competitor data: record competitor's name and vegtables lengths"""
def competitor_data():
global competitor_num
l = []
print("How many competitors would you like to enter?")
competitors = raw_input("Number of competitors:")
num_competitors = int(competitors)
for i in range(num_competitors):
name = raw_input("Enter competitor name:")
Cucumber = raw_input("Enter length of Cucumber:")
Carrot = raw_input("Enter length of Carrot:")
Runner_Beans = raw_input("Enter length of Runner Beans:")
l.append(competitor_num)
l.append(name)
l.append(Cucumber)
l.append(Carrot)
l.append(Runner_Beans)
competitor_num += 1
return (l)
def new_competitor():
with open('veggies_2016.txt', 'a') as f:
for item in competitor_data():
f.write("%s\n" %(item))
def scoreboard_menu():
exit = False
print("Which vegetable would you like the scoreboard for?")
vegetable = raw_input("Please type either Cucumber, Carrot or Runner Beans:")
if vegetable == "Cucumber":
Cucumber_Scoreboard()
elif vegetable == "Carrot":
Carrot_Scoreboard()
elif vegetable == "Runner Beans":
Runner_Beans_Scoreboard()
def Cucumber_Scoreboard():
with open('veggies_2015.dat', 'r') as f:
count = 0
for line in f:
count **= 1
if count % 2 == 0:
print (line)

This doesn't feel the most elegant way of doing it, but if you're going line by line, you need an extra counter in there which results in nothing happening for a set amount of "surplus" lines, before resetting your counters. Note that excess_count only needs to be incremented once because you want the final else to reset both counters, again which will not result in something being printed but still results in a skipped line.
def Cucumber_Scoreboard():
with open('name_example.txt', 'r') as f:
count = 0
excess_count = 0
for line in f:
if count < 3:
print (line)
count += 1
elif count == 3 and excess_count < 1:
excess_count += 1
else:
count = 0
excess_count = 0
EDIT: Based on your comments, I have extended this answer. Really, what you have asked should be raised as another question because it is detached from your main question. As pointed out by jDo, this is not ideal code because it will fail instantly if there is a blank line or missing data causing a line to skip artificially. Also, the new code is stuffed in around my initial answer. Use this only as an illustration of resetting counters and lists in loops, it's not stable for serious things.
from operator import itemgetter
def Cucumber_Scoreboard():
with open('name_example.txt', 'r') as f:
count = 0
excess_count = 0
complete_person_list = []
sublist = []
for line in f:
if count < 3:
print (line)
sublist.append(line.replace('\n', ''))
count += 1
elif count == 3 and excess_count < 1:
excess_count += 1
else:
count = 0
excess_count = 0
complete_person_list.append(sublist)
sublist = []
sorted_list = sorted(complete_person_list, key=itemgetter(2), reverse = True)
return sorted_list
a = Cucumber_Scoreboard()

You could make the program read the whole file line by line getting all the information. Then because the data is in a known format (eg position, name ...) skip the unneeded lines with file.readline() which will move you to the next line.

Someone recently asked how to save the player scores for his/her game and I ended up writing a quick demo. I didn't post it though since it would be a little too helpful. In its current form, it doesn't fit your game exactly but maybe you can use it for inspiration. Whatever you do, not relying on line numbers, modulo and counting will save you a headache down the line (what if someone added an empty/extra line?).
There are advantages and drawbacks associated with all datatypes. If we compare your current data format (newline separated values with no keys or category/column labels) to json, yours is actually more efficient in terms of space usage. You don't have any repetitions. In key/value pair formats like json and python dictionaries, you often repeat the keys over and over again. This makes it a human-readable format, it makes order insignificant and it means that the entire thing could be written on a single line. However, it goes without saying that repeating all the keys for every player is not efficient. If there were 100.000 players and they all had a firstname, lastname, highscore and last_score, you'd be repeating these 4 words 100.000 times. This is where actual databases become the sane choice for data storage. In your case though, I think json will suffice.
import json
import pprint
def scores_load(scores_json_file):
""" you hand this function a filename and it returns a dictionary of scores """
with open(scores_json_file, "r") as scores_json:
return json.loads(scores_json.read())
def scores_save(scores_dict, scores_json_file):
""" you hand this function a dictionary and a filename to save the scores """
with open(scores_json_file, "w") as scores_json:
scores_json.write(json.dumps(scores_dict))
# main dictionary of dictionaries - empty at first
scores_dict = {}
# a single player stat dictionary.
# add/remove keys and values at will
scores_dict["player1"] = {
"firstname" : "whatever",
"lastname" : "whateverson",
"last_score" : 3,
"highscore" : 12,
}
# another player stat dictionary
scores_dict["player2"] = {
"firstname" : "something",
"lastname" : "somethington",
"last_score" : 5,
"highscore" : 15,
}
# here, we save the main dictionary containing stats
# for both players in a json file called scores.json
scores_save(scores_dict, "scores.json")
# here, we load them again and turn them into a
# dictionary that we can easily read and write to
scores_dict = scores_load("scores.json")
# add a third player
scores_dict["player3"] = {
"firstname" : "person",
"lastname" : "personton",
"last_score" : 2,
"highscore" : 3,
}
# save the whole thing again
scores_save(scores_dict, "scores.json")
# print player2's highscore
print scores_dict["player2"]["highscore"]
# we can also invent a new category (key/value pair) on the fly if we want to
# it doesn't have to exist for every player
scores_dict["player2"]["disqualified"] = True
# print the scores dictionary in a pretty/easily read format.
# this isn't necessary but just looks nicer
pprint.pprint(scores_dict)
"""
The contents of the scores.json pretty-printed in my terminal:
$ cat scores.json | json_pp
{
"player3" : {
"firstname" : "person",
"last_score" : 2,
"lastname" : "personton",
"highscore" : 3
},
"player2" : {
"highscore" : 15,
"lastname" : "somethington",
"last_score" : 5,
"firstname" : "something"
},
"player1" : {
"firstname" : "whatever",
"last_score" : 3,
"lastname" : "whateverson",
"highscore" : 12
}
}
"""

Create a function that reads one "record" (5 lines) at a time, then call it repeatedly:
def read_data(in_file):
rec = {}
rec["num"] = in_file.next().strip()
rec["name"] = in_file.next().strip()
rec["cucumber"] = float(in_file.next().strip())
# skip 2 lines
in_file.next()
in_file.next()
return rec
EDIT: improved the code + added a usage example
The read_data() function reads one 5-line record from a file and returns its data as a dictionary. An example of using this function:
def Cucumber_Scoreboard():
with open('veggies_2015.dat', 'r') as in_file:
data = []
try:
while True:
rec = read_data(in_file)
data.append(rec)
except StopIteration:
pass
data_sorted = sorted(data, key = lambda x: x["cucumber"])
return data_sorted
cucumber = Cucumber_Scoreboard()
from pprint import pprint
pprint(cucumber)

Using Python Classes and Lists to print reports from a csv

I have a homework assignment that I have been stuck on for several days.
Basic problem description:
Incident class has properties: ID, time, type, location, narrative and status
methods: init, brief, isMorning, resolve
script takes one argument, the full path of crime report csv.
First few lines of CSV:
ID Time Type Location Narrative
1271 11:54 AM Drug Violation Wolf Ridge Report of possible drug violation. Student was referred to the university.
My code so far:
import sys
class Incident:
def __init__(self, ID, time, type, location, narrative, status):
self.ID = id
self.time = time
self.type = type
self.location = location
self.narrative = narrative
self.status = status
def brief(self):
print '''{0}: {1}, {2}
{3}
'''.format(self.ID, self.type, self.status, self.narrative)
def isMorning(self):
if 'AM' in self.time:
return True
else:
return False
def resolve(self):
if self.status == 'Pending':
self.status = 'Resolved'
try:
dataset = sys.argv[1] except IndexError:
print 'Usage: Requires full path input file name.'
sys.exit()
# Create an empty list to contain the Incident objects. crimeList = []
# Read the crime report. with open(dataset, 'r') as f:
# Read the header.
headers = f.readline().split(',')
# Read each record and parse the attributes.
for line in f:
lineList = line.strip().split(',')
reportNumber = lineList[0]
timeReported = lineList[1]
incidentType = lineList[2]
location = lineList[3]
narrative = lineList[4]
status = lineList[5].strip()
### Create initialize an Incident object instance and store it in a variable
crime = Incident(reportNumber, timeReported, incidentType, location, narrative, status)
### Append the new Incident object to the crimeList.
crimeList.append(crime)
What i'm stuck on:
I need to access the "nth" Incident in the crimeList and run various methods. I can't seem to find a way to access the item and have it functional to run methods on.
I've tried enumerating and splicing but just can't get anything to work?
Anyone have any suggestions?

Look up the nth crime from your crimeList like so: x=crimeList[n], and then call the methods on that instance: x.brief(), x.resolve(), etc.

How do I instantiate a group of objects from a text file?

I have some log files that look like many lines of the following:
<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>
<tickSize tickerId=0, field=3, size=25>
<tickSize tickerId=0, field=8, size=534349>
<tickPrice tickerId=0, field=2, price=201.82, canAutoExecute=1>
I need to define a class of type tickPrice or tickSize. I will need to decide which to use before doing the definition.
What would be the Pythonic way to grab these values? In other words, I need an effective way to reverse str() on a class.
The classes are already defined and just contain the presented variables, e.g., tickPrice.tickerId. I'm trying to find a way to extract these values from the text and set the instance attributes to match.
Edit: Answer
This is what I ended up doing-
with open(commandLineOptions.simulationFilename, "r") as simulationFileHandle:
for simulationFileLine in simulationFileHandle:
(date, time, msgString) = simulationFileLine.split("\t")
if ("tickPrice" in msgString):
msgStringCleaned = msgString.translate(None, ''.join("<>,"))
msgList = msgStringCleaned.split(" ")
msg = message.tickPrice()
msg.tickerId = int(msgList[1][9:])
msg.field = int(msgList[2][6:])
msg.price = float(msgList[3][6:])
msg.canAutoExecute = int(msgList[4][15:])
elif ("tickSize" in msgString):
msgStringCleaned = msgString.translate(None, ''.join("<>,"))
msgList = msgStringCleaned.split(" ")
msg = message.tickSize()
msg.tickerId = int(msgList[1][9:])
msg.field = int(msgList[2][6:])
msg.size = int(msgList[3][5:])
else:
print "Unsupported tick message type"

I'm not sure how you want to dynamically create objects in your namespace, but the following will at least dynamically create objects based on your loglines:
Take your line:
line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'
Remove chars that aren't interesting to us, then split the line into a list:
line = line.translate(None, ''.join('<>,'))
line = line.split(' ')
Name the potential class attributes for convenience:
line_attrs = line[1:]
Then create your object (name, base tuple, dictionary of attrs):
tickPriceObject = type(line[0], (object,), { key:value for key,value in [at.split('=') for at in line_attrs]})()
Prove it works as we'd expect:
print(tickPriceObject.field)
# 2

Approaching the problem with regex, but with the same result as tristan's excellent answer (and stealing his use of the type constructor that I will never be able to remember)
import re
class_instance_re = re.compile(r"""
<(?P<classname>\w[a-zA-Z0-9]*)[ ]
(?P<arguments>
(?:\w[a-zA-Z0-9]*=[0-9.]+[, ]*)+
)>""", re.X)
objects = []
for line in whatever_file:
result = class_instance_re.match(line)
classname = line.group('classname')
arguments = line.group('arguments')
new_obj = type(classname, (object,),
dict([s.split('=') for s in arguments.split(', ')]))
objects.append(new_obj)

Python: How to loop through blocks of lines and copy specific text within lines

Input file:
DATE: 07/01/15 # 0800 HYRULE HOSPITAL PAGE 1
USER: LINK Antibiotic Resistance Report
--------------------------------------------------------------------------------------------
Activity Date Range: 01/01/15 - 02/01/15
--------------------------------------------------------------------------------------------
HH0000000001 LINK,DARK 30/M <DIS IN 01/05> (UJ00000001) A001-01 0A ZELDA,PRINCESS MD
15:M0000001R COMP, Coll: 01/02/15-0800 Recd: 01/02/15-0850 (R#00000001) ZELDA,PRINCESS MD
Source: SPUTUM
PSEUDOMONAS FLUORESCENS LEVOFLOXACIN >=8 R
--------------------------------------------------------------------------------------------
HH0000000002 FAIRY,GREAT 25/F <DIS IN 01/06> (UJ00000002) A002-01 0A ZELDA,PRINCESS MD
15:M0000002R COMP, Coll: 01/03/15-2025 Recd: 01/03/15-2035 (R#00000002) ZELDA,PRINCESS MD
Source: URINE- STRAIGHT CATH
PROTEUS MIRABILIS CEFTRIAXONE-other R
--------------------------------------------------------------------------------------------
HH0000000003 MAN,OLD 85/M <DIS IN 01/07> (UJ00000003) A003-01 0A ZELDA,PRINCESS MD
15:M0000003R COMP, Coll: 01/04/15-1800 Recd: 01/04/15-1800 (R#00000003) ZELDA,PRINCESS MD
Source: URINE-CLEAN VOIDED SPEC
ESCHERICHIA COLI LEVOFLOXACIN >=8 R
--------------------------------------------------------------------------------------------
Completely new to programming/scripting and Python. How do you recommend looping through this sample input to grab specific text in the fields?
Each patient has a unique identifier (e.g. HH0000000001). I want to grab specific text from each line.
Output should look like:
Date|Time|Name|Account|Specimen|Source|Antibiotic
01/02/15|0800|LINK, DARK|HH0000000001|PSEUDOMONAS FLUORESCENS|SPUTUM|LEVOFLOXACIN
01/03/15|2025|FAIRY, GREAT|HH0000000002|PROTEUS MIRABILIS|URINE- STRAIGHT CATH|CEFTRIAXONE-other
Edit: My current code looks like this:
(Disclaimer: I am fumbling around in the dark, so the code is not going to be pretty at all.
input = open('report.txt')
output = open('abx.txt', 'w')
date = '' # Defining global variables outside of the loop
time = ''
name = ''
name_last = ''
name_first = ''
account = ''
specimen = ''
source = ''
output.write('Date|Time|Name|Account|Specimen|Source\n')
lines = input.readlines()
for index, line in enumerate(lines):
print index, line
if last_line_location:
new_patient = True
if not first_time_through:
output.write("{}|{}|{}, {}|{}|{}|{}\n".format(
'Date', # temporary placeholder
'Time', # temporary placeholder
name_last.capitalize(),
name_first.capitalize(),
account,
'Specimen', # temporary placeholder
'Source' # temporary placeholder
) )
last_line_location = False
first_time_through = False
for each in lines:
if line.startswith('HH'): # Extract account and name
account = line.split()[0]
name = line.split()[1]
name_last = name.split(',')[0]
name_first = name.split(',')[1]
last_line_location = True
input.close()
output.close()
Currently, the output will skip the first patient and will only display information for the 2nd and 3rd patient. Output looks like this:
Date|Time|Name|Account|Specimen|Source
Date|Time|Fairy, Great|HH0000000002|Specimen|Source
Date|Time|Man, Old|HH0000000003|Specimen|Source
Please feel free to make suggestions on how to improve any aspect of this, including output style or overall strategy.

You code actually works if you add...
last_line_location = True
first_time_through = True
...before your for loop
You asked for pointers as well though...
As has been suggested in the comments, you could look at the re module.
I've knocked something together that shows this. It may not be suitable for all data because three records is a very small sample, and I've made some assumptions.
The last item is also quite contrived because there's nothing definite to search for (such as Coll, Source). It will fail if there are no spaces at the start of the final line, for example.
This code is merely a suggestion of another way of doing things:
import re
startflag = False
with open('report.txt','r') as infile:
with open('abx.txt','w') as outfile:
outfile.write('Date|Time|Name|Account|Specimen|Source|Antibiotic\n')
for line in infile:
if '---------------' in line:
if startflag:
outfile.write('|'.join((date, time, name, account, spec, source, anti))+'\n')
else:
startflag = True
continue
if 'Activity' in line:
startflag = False
acc_name = re.findall('HH\d+ \w+,\w+', line)
if acc_name:
account, name = acc_name[0].split(' ')
date_time = re.findall('(?<=Coll: ).+(?= Recd:)', line)
if date_time:
date, time = date_time[0].split('-')
source_re = re.findall('(?<=Source: ).+',line)
if source_re:
source = source_re[0].strip()
anti_spec = re.findall('^ +(?!Source)\w+ *\w+ + \S+', line)
if anti_spec:
stripped_list = anti_spec[0].strip().split()
anti = stripped_list[-1]
spec = ' '.join(stripped_list[:-1])
Output
Date|Time|Name|Account|Specimen|Source|Antibiotic
01/02/15|0800|LINK,DARK|HH0000000001|PSEUDOMONAS FLUORESCENS|SPUTUM|LEVOFLOXACIN
01/03/15|2025|FAIRY,GREAT|HH0000000002|PROTEUS MIRABILIS|URINE- STRAIGHT CATH|CEFTRIAXONE-other
01/04/15|1800|MAN,OLD|HH0000000003|ESCHERICHIA COLI|URINE-CLEAN VOIDED SPEC|LEVOFLOXACIN
Edit:
Obviously, the variables should be reset to some dummy value between writes on case of a corrupt record. Also, if there is no line of dashes after the last record it won't get written as it stands.

append in a list causing value overwrite

I am facing a peculiar problem. I will describe in brief bellow
Suppose i have this piece of code -
class MyClass:
__postBodies = []
.
.
.
for the_file in os.listdir("/dir/path/to/file"):
file_path = os.path.join(folder, the_file)
params = self.__parseFileAsText(str(file_path)) #reads the file and gets some parsed data back
dictData = {'file':str(file_path), 'body':params}
self.__postBodies.append(dictData)
print self.__postBodies
dictData = None
params = None
Problem is, when i print the params and the dictData everytime for different files it has different values (the right thing), but as soon as the append occurs, and I print __postBodies a strange thing happens. If there are thee files, suppose A,B,C, then
first time __postBodies has the content = [{'body':{A dict with some
data related to file A}, 'file':'path/of/A'}]
second time it becomes = [{'body':{A dict with some data relaed to
file B}, 'file':'path/of/A'}, {'body':{A dict with some data relaed to
file B}, 'file':'path/of/B'}]
AND third time = [{'body':{A dict with some data relaed to file C},
'file':'path/of/A'}, {'body':{A dict with some data relaed to file C},
'file':'path/of/B'}, {'body':{A dict with some data relaed to file C},
'file':'path/of/C'}]
So, you see the 'file' key is working very fine. Just strangely the 'body' key is getting overwritten for all the entries with the one last appended.
Am i making any mistake? is there something i have to? Please point me to a direction.
Sorry if I am not very clear.
EDIT ------------------------
The return from self.__parseFileAsText(str(file_path)) call is a dict that I am inserting as 'body' in the dictData.
EDIT2 ----------------------------
as you asked, this is the code, but i have checked that params = self.__parseFileAsText(str(file_path)) call is returning a diff dict everytime.
def __parseFileAsText(self, fileName):
i = 0
tempParam = StaticConfig.PASTE_PARAMS
tempParam[StaticConfig.KEY_PASTE_PARAM_NAME] = ""
tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = "text"
tempParam[StaticConfig.KEY_PASTE_PARAM_EXPIREDATE] = "N"
tempParam[StaticConfig.KEY_PASTE_PARAM_PRIVATE] = ""
tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = ""
tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = ""
tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = ""
for line in fileinput.input([fileName]):
temp = str(line)
temp2 = temp.strip()
if i == 0:
postValues = temp2.split("|||")
if int(postValues[(len(postValues) - 1)]) == 0 or int(postValues[(len(postValues) - 1)]) == 2:
tempParam[StaticConfig.KEY_PASTE_PARAM_NAME] = str(postValues[0])
if str(postValues[1]) == '':
tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = 'text'
else:
tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = postValues[1]
if str(postValues[2]) != "N":
tempParam[StaticConfig.KEY_PASTE_PARAM_EXPIREDATE] = str(postValues[2])
tempParam[StaticConfig.KEY_PASTE_PARAM_PRIVATE] = str(postValues[3])
tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = StaticConfig.API_USER_KEY
tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = StaticConfig.API_KEY
else:
tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = StaticConfig.API_USER_KEY
tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = StaticConfig.API_KEY
i = i+1
else:
if tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] != "" :
tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = str(tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE])+"\n"+temp2
else:
tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = temp2
return tempParam

You are likely returning the same dictionary with every call to MyClass.__parseFileAsText(), a couple of common ways this might be happening:
__parseFileAsText() accepts a mutable default argument (the dict that you eventually return)
You modify an attribute of the class or instance and return that instead of creating a new one each time
Making sure that you are creating a new dictionary on each call to __parseFileAsText() should fix this problem.
Edit: Based on your updated question with the code for __parseFileAsText(), your issue is that you are reusing the same dictionary on each call:
tempParam = StaticConfig.PASTE_PARAMS
...
return tempParam
On each call you are modifying StaticConfig.PASTE_PARAMS, and the end result is that all of the body dictionaries in your list are actually references to StaticConfig.PASTE_PARAMS. Depending on what StaticConfig.PASTE_PARAMS is, you should change that top line to one of the following:
# StaticConfig.PASTE_PARAMS is an empty dict
tempParam = {}
# All values in StaticConfig.PASTE_PARAMS are immutable
tempParam = dict(StaticConfig.PASTE_PARAMS)
If any values in StaticConfig.PASTE_PARAMS are mutable, you could use copy.deepcopy but it would be better to populate tempParam with those default values on your own.

What if __postBodies wasn't a class attribute, as it is defined now, but just an instance attribute?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

python object taking full list - python

Related

Reading data from a file in python, but taking specific data from the file throughout

Using Python Classes and Lists to print reports from a csv

How do I instantiate a group of objects from a text file?

Python: How to loop through blocks of lines and copy specific text within lines

append in a list causing value overwrite

Categories

Resources