Removing duplicates from an attribute of a class variable - python

I'm extremely new to Python and am having trouble removing duplicate values from an attribute of a class (I think this is the correct terminology).
Specifically, I want to drop every value whose year has already been seen. I should note that I'm printing only the first four characters and searching on the first four characters. The data in the attribute is in year-month-day format (for example, 19070101 is the first of January 1907).
Anyway, here is my code:
import csv
import os
class Datatype:
    'Data from the weather station'
    def __init__ (self, inputline):
        [ self.DATE,
          self.PRCP] = inputline.split(',')
filename ='LAWe.txt'
LAWd = open(filename, 'r')
LAWefile = LAWd.read()
LAWd.close()
'Recognize the line endings for MS-DOS, UNIX, and Mac and apply the .split() method to the string wholeFile'
if '\r\n' in LAWefile:
    filedat = LAWefile.split('\r\n') # the split method, applied to a string, produces a list
elif '\r' in LAWefile:
    filedat = LAWefile.split('\r')
else:
    filedat = LAWefile.split('\n')
collection = dict()
date= dict()
for thisline in filedat:
    thispcp = Datatype(thisline) # here is where the Datatype object is created (running the __init__ function)
    collection[thispcp.DATE] = thispcp # the dictionary will be keyed by the ID attribute
for thisID in collection.keys():
    studyPRP = collection[thisID]
    if studyPRP.DATE.isdigit():
        list(studyPRP.DATE)
        if len(date[studyPRP.DATE][0:4]):
            pass #if year is seen once, then skip and go to next value in attribute
        else:
            print studyPRP.DATE[0:4] #print value in this case the year)
            date[studyPRP.DATE]=studyPRP.DATE[0:4]
I get this error:
Traceback (most recent call last):
  File "project.py", line 61, in <module>
    if len(date[studyPRP.DATE][0:4]):
KeyError: '19770509'
A KeyError (which means a value isn't in a list? but it is in my data) can apparently be fixed by using a set (or so I've read), but I have 30,000 records to deal with, and it seems like you have to type the values in manually, so that's not an option for me.
Any help at all would be appreciated.
Sorry if this is confusing or nonsensical; I'm extremely new to Python.

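Replace this
if len(date[studyPRP.DATE][0:4]):
with this
if len(date[studyPRP.DATE[0:4]]):
Explanation:
In the first line you use the whole date ('19770509') as the key into date and only then slice the first four characters of the result, which is why the KeyError names the full date.
In the correction you slice first, so the first four characters of the date (the year) are used as the dictionary key. Note that the lookup still raises KeyError for a year that has not been stored yet, so test membership with in first and store the entry under the year rather than the full date.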

I'm not sure exactly what you want here, so I'll answer the parts I can help with.
Your error occurs because you look up the year in date before you have added it.
Also, what you are adding to your collection looks like
{
    <object>.DATE: <object>
}
so each key is already the date string. Your second for loop can be written as follows:
for thisID in collection:
    if thisID.isdigit():
        if thisID[0:4] in date and len(date[thisID[0:4]]):
            #if year is seen once, then skip and go to next
            # value in attribute
            pass
        else:
            print thisID[0:4] #print value in this case the year)
            date[thisID[0:4]]=thisID[0:4]
Note that your studyPRP.DATE is the same as thisID.
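The same idea can also be sketched with a set, which the question mentions. This is a minimal Python 3 sketch rather than the asker's code: the function name unique_years and the sample values are made up for illustration, and it assumes each date is an eight-digit YYYYMMDD string. Nothing has to be typed in by hand; the set is filled as the dates are read.

```python
def unique_years(dates):
    """Return each year (first four characters) once, in first-seen order."""
    seen = set()   # years already encountered
    years = []
    for d in dates:
        if not d.isdigit():   # skip header lines or blanks
            continue
        year = d[:4]
        if year not in seen:
            seen.add(year)
            years.append(year)
    return years


# Hypothetical sample data, including a non-numeric header value.
sample = ["19070101", "19070102", "19770509", "DATE", "19770510", "19080101"]
print(unique_years(sample))
```

Membership tests on a set take roughly constant time, so 30,000 records are not a problem.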

Related

How to fix the errors in my code for making a dictionary from a file

This is what I am supposed to do in my assignment:
This function is used to create a bank dictionary. The given argument
is the filename to load. Every line in the file will look like
key: value, where key is a user's name and value is an amount to update the user's
bank account with. The value should be a number; however, it is
possible that there is no value or that the value is an invalid
number.
What you will do:
Try to make a dictionary from the contents of the file.
If the key doesn't exist, create a new key:value pair.
If the key does exist, increment its value with the amount.
You should also handle cases when the value is invalid. If so, ignore that line and don't update the dictionary.
Finally, return the dictionary.
Note: All of the users in the bank file are in the user account file.
Example of the contents of 'filename' file:
Brandon: 115.5
James: 128.87
Sarah: 827.43
Patrick:'18.9
This is my code:
bank = {}
with open(filename) as f:
    for line in f:
        line1 = line
        list1 = line1.split(": ")
        if (len(list1) == 2):
            key = list1[0]
            value = list1[1]
            is_valid = value.isnumeric()
            if is_valid == True
                value1 = float(value)
                bank[(key)] = value1
return bank
My code returns a NoneType object which causes an error but I don't know where the code is wrong. Also, there are many other errors. How can I improve/fix the code?
Try this code. I'll explain each step, since how much explanation you need depends on how well you know Python's data structures:
Code
adict = {}
with open("text_data.txt") as data:
    """
    adict (dict) stores the result of iterating over the file and splitting
    each line into a key and a value.
    `line` is each line returned by `readlines()`. Slicing at the position
    of the ':' separates the name from the amount.
    Each line still ends with '\n', so `strip` is used to remove that
    character from the end of the value.
    """
    adict = {line[:line.index(':')]: line[line.index(':')+1: ].strip('\n') for line in data.readlines()}
print(adict)
Output
{' Brandon': '115.5', ' James': '128.87', ' Sarah': '827.43', ' Patrick': "'18.9"}
For value validation, a little searching shows how to check whether a value is a number.
According to "Detect whether a Python string is a number or a letter":
a = 5
def is_number(a):
    try:
        float (a)
    except ValueError:
        return False
    else:
        return True
Calling the function:
print(is_number(a))
print(is_number(1.4))
print(is_number('hello'))
OUTPUT
True
True
False
Now, back to our code.
All you need to do is add a condition to the dict comprehension:
adict = {line[:line.index(':')]: line[line.index(':')+1: ].strip(' \n') for line in data.readlines() if is_number(line[line.index(':')+1: ].strip('\n')) == True}
OUTPUT
{'Brandon': '115.5', 'James': '128.87', 'Sarah': '827.43'}
Note that the values are still strings. You can check a value of the dict by passing it to the function that we created:
Code
print(is_number(adict['Brandon']))
OUTPUT
True
You can extend the is_number() function further if you want.
You're likely hitting the return in the else statement, which doesn't return anything (hence None). So as soon as there is one line in your file that does not contain two whitespace-separated values, you're returning nothing.
Also note that your code only assigns a value to a key in the dictionary. It does not add to the value of a key that already exists, as the assignment requires.
This should effectively do the job:
bank = {}
with open(filename) as file:
    for line in file:
        # Trial and error: a missing ': ' separator or an invalid number raises ValueError
        try:
            key, val = line.rsplit(": ", 1) # This will split on the last ': ' avoiding ambiguity of colons in the middle
            bank[key] = bank.get(key, 0) + float(val) # add to the existing amount, if any
        except ValueError as e:
            print(e)
return bank
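The assignment steps can be sketched as one complete function. This is a minimal sketch under assumptions, not the reference solution: the name load_bank is made up, lines are assumed to have the shape name: amount, and the sample file contents below are invented for the demonstration.

```python
import os
import tempfile


def load_bank(filename):
    """Build {name: balance} from lines shaped like 'name: amount'."""
    bank = {}
    with open(filename) as f:
        for line in f:
            name, sep, amount = line.partition(":")
            if not sep:                # no colon at all: ignore the line
                continue
            try:
                value = float(amount)  # float() tolerates surrounding whitespace
            except ValueError:         # empty or malformed amount: ignore the line
                continue
            name = name.strip()
            # Create the key on first sight, otherwise increment it.
            bank[name] = bank.get(name, 0.0) + value
    return bank


# Demonstration with a temporary file (contents are made up).
sample = "Brandon: 115.5\nJames: 128.87\nSarah: 827.43\nPatrick:'18.9\nBrandon: 10\nJames:\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
result = load_bank(tmp.name)
os.remove(tmp.name)
print(result)
```

Using try/except around float() rather than str.isnumeric() matters here: isnumeric() returns False for strings such as '115.5' because of the decimal point.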

Is there a way to fix appending float values to dictionaries in Python?

I'm currently writing code that's supposed to read a file that has the dates and magnitudes of the major earthquakes in recent years and return a dictionary where the keys are the dates the earthquakes took place, and then the values are the magnitudes of the earthquakes that happened on that date.
My code currently looks like this:
def magnitudedictionary():
    earth = open("earthquakes.txt", "r")
    magdict = {}
    for line in earth:
        alist = line.split()
        magnitude= float(alist[0])
        date = alist[1]
        if date in magdict:
            magdict[date].append(magnitude)
        else:
            magdict[date] = magnitude
    earth.close()
    return magdict
But whenever I try to run the code, I always get a traceback that says:
Traceback (most recent call last):
  File "/Users/MargaretJagger/PycharmProjects/Homework 6/Q2.py", line 18, in <module>
    magnitudedictionary()
  File "/Users/MargaretJagger/PycharmProjects/Homework 6/Q2.py", line 10, in magnitudedictionary
    magdict[date].append(magnitude)
AttributeError: 'float' object has no attribute 'append'
Process finished with exit code 1
I'm not quite sure what the issue is exactly, but I know that it has something to do with the float and the dictionary values not matching up.
You probably want a defaultdict for this. Then you can avoid the test and just append to the values.
Here's a simple mockup:
from collections import defaultdict
earth = '''7.6 20190801
8.2 20180201
7.1 20190801
6.5 20190801
4.2 20180201'''
magdict = defaultdict(list) # values will default to new lists
for line in earth.split('\n'):
    alist = line.split(' ')
    magnitude= float(alist[0])
    date = alist[1]
    magdict[date].append(magnitude) #magdict[date] will default to a list if the key doesn't already exist
print(magdict['20190801'])
# prints [7.6, 7.1, 6.5]
"the values are the magnitudes of the earthquakes that happened on that date."
Since you are talking of “magnitudes”, plural, I assume that you want to be able to store multiple values per date. That means that you should also make sure that your dictionary values are actual lists that store multiple values, instead of just a single value.
Compare the following example dictionaries:
{
    "2019-04-17": 2.1,
    "2019-04-18": 3.5
}
{
    "2019-04-17": [1.7, 2.5],
    "2019-04-18": [3.2]
}
The first dictionary only maps the date to a single float. So for every date key, you get a single value. The second dictionary maps to a list of floats. Such a list can contain a single value or many (it could also contain none).
When you look at your code that sets the values in the dictionary, you can see that you actually built this with multiple values in mind:
if date in magdict:
    magdict[date].append(magnitude)
else:
    magdict[date] = magnitude
When the date is already in the dictionary, you want to append to it. Otherwise you set the date/value pair directly (which adds the key). It's just that the way you do it, you are setting a single float value (i.e. the first dictionary type above) instead of a list of floats.
So what you need to do instead is create a list of floats here:
if date in magdict:
    magdict[date].append(magnitude)
else:
    magdict[date] = [magnitude]
The [magnitude] creates a one-element list with magnitude as the first value. Since the value in your dictionary is now a list, calls to append() will succeed and correctly add another value to the list.
The error is in the else clause.
It should be magdict[date] = [magnitude] and not magdict[date] = magnitude.
The Python dictionary has a very nice method, setdefault, that should help here:
def magnitudedictionary():
    earth = open("earthquakes.txt", "r")
    magdict = {}
    for line in earth:
        alist = line.split()
        magnitude= float(alist[0])
        date = alist[1]
        magdict.setdefault(date, []).append(magnitude)
    earth.close()
    return magdict
Here is a small bit of documentation on the method in question: https://www.tutorialspoint.com/python/dictionary_setdefault.htm

Don't understand the cause of this AttributeError

What's wrong with this code? When I run it, it tells me this:
Traceback (most recent call last):
  line 24, in <module>
    people.append(Dict)
AttributeError: 'str' object has no attribute 'append'
My code:
live = 1
while live == 1:
    #reading Database
    dataRead = open ("db.txt","r")
    if dataRead.read() != " ":
        dataRead.close()
        people = open ('db.txt','r').read()
    do = input ('What Do You Want ? (Search , add) :\n')
    #add people
    if do == 'add':
        #Get The New Data
        n_Name = input ('enter the new name:\n')
        n_age = input ('enter the new age:\n')
        #new Dict
        Dict = {'Name:':n_Name,'age':n_age}
        people.append(Dict)
        #adding people to file
        dataWrite = open ("db.txt","w")
        dataWrite.write(str(people))
        dataWrite.close()
        live = 0
The problem is that on line 24 you try to append a dictionary to a string. When you read the db file, its contents come back as a string. The code is also messy and there are better ways to do this, but that's beside the point: append() is a list method, and the variable people is a string, according to your error output.
It says that people is a str, so it doesn't have an append method. You should concatenate strings to join them.
Do:
people += '<append string>'
Keep in mind that you are trying to append a dictionary to a string. That raises a TypeError because those types can't be concatenated directly. Convert first with str(Dict), then concatenate.
Your variable Dict also differs only by case from the built-in name dict (a built-in type, not a reserved word, but shadowing it causes confusing bugs). Change it to my_dict or another descriptive name.
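One of the "better ways" hinted at above is to keep people as a real list and store it as JSON, so that append() works and the file can be read back as a list. The following is a minimal sketch under assumptions: the helper names load_people and add_person are made up, the file is assumed to hold a JSON list (or be empty or missing), and the demonstration writes to a temporary directory instead of a real db.txt.

```python
import json
import os
import tempfile


def load_people(path):
    """Return the list stored in path, or [] if the file is missing or empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path) as f:
        return json.load(f)


def add_person(path, name, age):
    """Append one record to the list and write the whole list back."""
    people = load_people(path)
    people.append({"name": name, "age": age})   # people is a list, so append works
    with open(path, "w") as f:
        json.dump(people, f)
    return people


# Demonstration in a temporary directory so no real db.txt is touched.
with tempfile.TemporaryDirectory() as folder:
    db_path = os.path.join(folder, "db.txt")
    add_person(db_path, "Ann", "31")
    people = add_person(db_path, "Bob", "27")
print(people)
```

Writing str(people) to the file, as in the question, produces text that json.load cannot reliably read back, which is why json.dump is used here.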

Python 3 Read a json file with missing objects within lines

I'm reading a json file with the structure below:
[{"id":1,"gender":"Male","first_name":"Andrew","last_name":"Scott","email":"ascott0#shutterfly.com","ville":"Connecticut"},
{"id":3,"first_name":"Mary","last_name":"Richards","email":"mrichards2#japanpost.jp","ville":"Minnesota"}]
As you can see, in the second "line" the field "gender" is not present. I noticed this because my code for reading the file fails on that line.
My code:
import json
def jsonreader():
    ##Reader for json files
    ##Open files using json library
    with open('cust_data.json') as file:
        data = json.load(file)
    resultlist = list()
    for line in data:
        print(line["id"],line["gender"])
I got this error (program output and traceback are interleaved):
C:/xxxxx/x.py
1 Male
Traceback (most recent call last):
2 Female
  File "C:/xxxxx/x", line 67, in <module>
    jsonreader()
  File "C:/xxxxx/x", line 56, in jsonreader
    print(line["id"],line["gender"])
KeyError: 'gender'
Before you answer, you should know that I have a function that defines the default value for "gender". Here it is:
def definegender(x):
    if x is None:
        x = 'unknown'
        return x
    elif (x =='Male') or (x=='Female'):#not None:
        return {
            'Male':'M',
            'Female': 'F'
        }.get(x)
    else:
        return x
So, in this case, I could not use something like a default value when reading the values, because I need to pass some value to my function.
Does anyone know the best way to read this kind of file when objects are missing? Thanks.
Why not use a default value via dict.get?
print(line["id"],line.get("gender","unknown"))
And since you want to transform the input further, you could nest two dict.get calls: the inner one returns None when "gender" is missing, and the outer one looks that result up in a new table, like this:
gender_dict = {"Male":"M", "Female":"F", None : "unknown"}
print(line["id"],gender_dict.get(line.get("gender")))
(Note that you don't need your overly complex gender conversion function anymore.)
Although this already has a good answer, there are alternatives too. Here is one:
for line in data:
    try:
        print(line["id"],line["gender"])
    except KeyError:
        print(line["id"],"Error!!! no gender!")
This is called error handling. Read the docs here:
https://docs.python.org/3.6/tutorial/errors.html
Update: do you mean this? (Update 2: corrected a mistake.)
try:
    gender = definegender(line["gender"])
except KeyError:
    gender = definegender(None)
print(line["id"],gender)
Update 3 (for future readers): since .get() returns None by default, the simplest solution would be
gender = definegender(line.get("gender"))
print(line["id"],gender)
Why not simplify this with an if-statement?
for line in data:
    if "gender" in line:
        print(line)
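The suggestions above can be combined in one short, self-contained sketch. It is illustrative only: define_gender is a simplified stand-in for the asker's definegender, GENDER_CODES is a made-up name, and the inline JSON (including the third record) is invented sample data rather than the asker's file.

```python
import json

GENDER_CODES = {"Male": "M", "Female": "F"}


def define_gender(value):
    """Map 'Male'/'Female' to 'M'/'F', None to 'unknown', pass anything else through."""
    if value is None:
        return "unknown"
    return GENDER_CODES.get(value, value)


raw = '''[{"id": 1, "gender": "Male", "first_name": "Andrew"},
          {"id": 3, "first_name": "Mary"},
          {"id": 4, "gender": "Other", "first_name": "Sam"}]'''

# rec.get("gender") returns None when the key is missing, so no KeyError is raised.
rows = [(rec["id"], define_gender(rec.get("gender"))) for rec in json.loads(raw)]
print(rows)
```

Every record is kept here, including the one without a gender, whereas filtering with if "gender" in line skips such records entirely.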

dynamic variables from list

This is similar to "Python creating dynamic global variable from list", but I'm still confused.
I get lots of flo data in a semi-proprietary format. I've already used Python to strip the data down to what I need and save it to a JSON file called badactor.json, in the following format:
[saddr as an integer, daddr as an integer, port, date as Julian, time as decimal number]
An arbitrary example: [1053464536, 1232644361, 2222, 2014260, 15009]
I want to go through my weekly/monthly flo logs and save everything by Julian date. To start, I want to go through the logs and create a list named after the Julian date on which the event happened, e.g. 2014260, and then save it to a file of the same name, 2014260.json. I have the following, but it is giving me an error:
#!/usr/bin/python
import sys
import json
import time
from datetime import datetime
import calendar
#these are varibles I've had to use throughout, kinda a boiler plate for now
x=0
templist2 = []
templist3 = []
templist4 = []
templist5 = []
bad = {}
#this is my list of "bad actors", list is in the following format
#[saddr as a integer, daddr as a integer, port, date as Julian, time as decimal number]
#or an arbitrary example [1053464536, 1232644361, 2222, 2014260, 15009]
badactor = 'badactor.json'
with open(badactor, 'r') as f1:
    badact = json.load(f1)
    f1.close()
for i in badact:
    print i[3] #troubleshooting to verify my value is being read in
    tmp = str(i[3])
    print tmp#again just troubleshooting
    tl=[i[0],i[4],i[1],i[2]]
    bad[tmp]=bad[tmp]+tl
    print bad[tmp]
Trying to create the variable is giving me the following error:
Traceback (most recent call last):
  File "savetofiles.py", line 39, in <module>
    bad[tmp]=bad[tmp]+tl
KeyError: '2014260'
At the point that line executes, there is no key "2014260" in the bad dict.
Your problem is here:
bad[tmp]=bad[tmp]+tl
You're saying "add tl to something that doesn't exist."
Instead, you seem to want to do the following (note that it replaces any list already stored under that key):
bad[tmp]=tl
I suggest you initialize bad as an empty collections.defaultdict instead of a regular built-in dict, i.e.
import collections
...
bad = collections.defaultdict(list)
That way, an empty list is created for you automatically the first time a date key is encountered, and the error you're getting from the bad[tmp]=bad[tmp]+tl statement goes away: on the first occurrence of a date it effectively becomes bad[tmp]=list()+tl, where the list() call just creates and returns an empty list.
It's also not clear whether you really need the tmp = str(i[3]) conversion, because any hashable value (an int, for example) is a valid dictionary (or defaultdict) key, not just a string, assuming i[3] isn't a string already. Regardless, subsequent code would be more readable if you named the result something else, like julian_date = i[3] (or julian_date = str(i[3]) if the conversion really is required).
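The defaultdict approach can be sketched end to end, including the per-date files the question asks for. This is a Python 3 sketch under assumptions: the helper names group_by_date and save_groups are made up, every record is assumed to have exactly five fields in the order given in the question, and the second and third sample rows are invented. Unlike bad[tmp]+tl, which concatenates everything into one flat list, it stores each record as its own row, which is a design choice rather than something the question specifies.

```python
import json
import os
import tempfile
from collections import defaultdict


def group_by_date(records):
    """Group [saddr, daddr, port, date, time] rows under their Julian date."""
    grouped = defaultdict(list)   # missing keys start out as empty lists
    for saddr, daddr, port, date, time_of_day in records:
        grouped[str(date)].append([saddr, time_of_day, daddr, port])
    return grouped


def save_groups(grouped, folder):
    """Write one <date>.json file per key and return the file names written."""
    names = []
    for date, rows in sorted(grouped.items()):
        name = date + ".json"
        with open(os.path.join(folder, name), "w") as f:
            json.dump(rows, f)
        names.append(name)
    return names


records = [
    [1053464536, 1232644361, 2222, 2014260, 15009],
    [1053464537, 1232644362, 80, 2014260, 15010],
    [1053464538, 1232644363, 443, 2014261, 3],
]
grouped = group_by_date(records)
with tempfile.TemporaryDirectory() as folder:   # stand-in for the real output folder
    written = save_groups(grouped, folder)
print(written)
```

A dictionary keyed by date replaces the "dynamic variable" idea: the date string names both the dictionary entry and the output file, so no variables need to be created at run time.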
