Split a file into a list - python

This is what I have so far:
EX1 = open('ex1.txt')
EX1READ = EX1.read()
EX1READ.splitlines(0)
['jk43:23 Marfield Lane:Plainview:NY:10023',
'axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027',
'jab44:23 Rivington Street, Apt. 3R:New York:NY:10002',
'ap172:19 Boxer Rd.:New York:NY:10005',
'jb23:115 Karas Dr.:Jersey City:NJ:07127',
'jb29:119 Xylon Dr.:Jersey City:NJ:07127',
'ak9:234 Main Street:Philadelphia:PA:08990']
I'd like to be able to just grab the userId from this list and print it alphabetized. Any hints would be great.

userIds = []
EX1 = open('ex1.txt')
X1READ = EX1.readlines()
for line in X1READ:
    useridname = line.split(" ")[0].split(":")[0]
    userid = line.split(" ")[0].split(":")[1]
    userIds.append([useridname, userid])
I'm sure there are more Pythonic ways to do this, but my method returns a list of lists, where each child list in the parent list is formatted like this:
["jk43", "23"]
So to get the first user id and id number, you'd do this:
firstUserId = userIds[0][0] + ": " + userIds[0][1]
Which would output
"jk43: 23"
To sort the list of IDs, you'd do something like this:
userIds = sorted(userIds, key=lambda pair: pair[0])

Assuming the part before the first ":" is the user ID, you could do it in a more Pythonic way like this:
with open("ex1.txt") as f:
    lines = f.readlines()
userIDs = [l.split(":", 1)[0] for l in lines]
print("\n".join(sorted(userIDs)))

This does it:
IDs = []
with open('ex1.txt') as f:
    for line in f:
        IDs.append(line.split(':')[0])
print(sorted(IDs))
Prints:
['ak9', 'ap172', 'axe99', 'jab44', 'jb23', 'jb29', 'jk43']
If your user IDs look like jk43:23, use IDs.append(line.split(' ')[0]) instead, which prints:
['ak9:234', 'ap172:19', 'axe99:315', 'jab44:23', 'jb23:115', 'jb29:119', 'jk43:23']
If your user IDs are the number only, use IDs.append(int(line.split(' ')[0].split(':')[1])), which prints:
[19, 23, 23, 115, 119, 234, 315]
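
The approaches above can be wrapped in a small helper. This is only a minimal sketch, assuming the user ID is everything before the first ":" on each line; the function name sorted_user_ids and the inline sample lines are made up for illustration and are not part of the original answers.

```python
def sorted_user_ids(lines):
    """Return the user IDs (text before the first ':') in alphabetical order."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition splits once, at the first ':' only
        user_id, _, _ = line.partition(":")
        ids.append(user_id)
    return sorted(ids)


# A few lines in the same shape as ex1.txt (hypothetical in-memory sample)
sample = [
    "jk43:23 Marfield Lane:Plainview:NY:10023\n",
    "axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027\n",
    "\n",
    "ak9:234 Main Street:Philadelphia:PA:08990\n",
]
print("\n".join(sorted_user_ids(sample)))
```

Because the helper only iterates over its argument, an open file object can be passed in place of the list, for example inside a with open("ex1.txt") as f: block.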

Related

I am getting a KeyError and am not sure how to fix it

I have written out my code and when I run it, I get a KeyError:
Traceback (most recent call last):
  File "C:/Users/sagar/Desktop/Sagar CS131B Files/convert_to_fixed.py", line 21, in <module>
    birthdate = sample['Birthdate']
KeyError: 'Birthdate'
My code:
inputFile = 'raw.data.py'
data = list()
columns = ['First name','Last name','Telephone','Address','City','State','Birthdate']
for line in open(inputFile):
    # Assuming comments in the text file as '#'
    if line.startswith('#'): continue
    row = line.strip().split(':')
    data.append(dict(zip(columns, row)))
#print(data)
formatted_data = list()
for sample in data:
    birthdate = sample['Birthdate']
    mm,dd,yy = birthdate.split('/')
    if len(yy)==2:
        yy = '19' + yy
    birthdate = '/'.join([mm,dd,yy])
    sample['Birthdate'] = birthdate
    modified_row = ':'.join(
        [sample['Last name'], sample['First name'],
         sample['Telephone'], sample['Address'],
         sample['City'], sample['State'], sample['Birthdate']])
    formatted_data.append(modified_row + '\n')
with open('fixed.data','w') as f:
    f.writelines(formatted_data)
I have looked up how to fix it, but I am not sure how to write the try-except block. If someone could help me out with this, that would be amazing.
This is what is inside the file given:
'Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:6/23/1923',
'Ephram:Hardy:293-259-5395:235 Carlton Lane:Joliet:IL:8/12/1920',
'Fred:Fardbarkle:674-843-1385:20 Parak Lane:DeLuth:MN:4/12/23',
'Igor:Chevsky:385-375-8395:3567 Populus Place:Caldwell:NJ:6/18/68',
'James:Ikeda:834-938-8376:23445 Aster Ave.:Allentown:NJ:12/1/1938',
'Jennifer:Cowan:548-834-2348:408 Laurel Ave.:Kingsville:TX:10/1/35',
'Jesse:Neal:408-233-8971:45 Rose Terrace:San Francisco:CA:2/3/2001',
'Jon:DeLoach:408-253-3122:123 Park St.:San Jose:CA:7/25/53',
'Jose:Santiago:385-898-8357:38 Fife Way:Abilene:TX:1/5/58',
'Karen:Evich:284-758-2867:23 Edgecliff Place:Lincoln:NB:11/3/35',
'Lesley:Kirstin:408-456-1234:4 Harvard Square:Boston:MA:4/22/2001',
'Lori:Gortz:327-832-5728:3465 Mirlo Street:Peabody:MA:10/2/65',
'Norma:Corder:397-857-2735:74 Pine Street:Dearborn:MI:3/28/45',
'Paco:Gutierrez:835-365-1284:454 Easy Street:Decatur:IL:2/28/53',
'Popeye:Sailor:156-454-3322:945 Bluto Street:Anywhere:USA:3/19/35',
'Sir:Lancelot:837-835-8257:474 Camelot Boulevard:Bath:WY:5/13/69',
'Steve:Blenheim:238-923-7366:95 Latham Lane:Easton:PA:11/12/1956',
'Tommy:Savage:408-724-0140:1222 Oxbow Court:Sunnyvale:CA:5/19/66',
'Vinh:Tranh:438-910-7449:8235 Maple Street:Wilmington:VM:9/23/63',
'William:Kopf:846-836-2837:6937 Ware Road:Milton:PA:9/21/46',
'Yukio:Takeshida:387-827-1095:13 Uno Lane:Ashville:NC:7/1/29',
'Zippy:Pinhead:834-823-8319:2356 Bizarro Ave.:Farmount:IL:1/1/67',
'Andy:Warhol:212-321-7654:231 East 47th Street:New York City:NY:8/6/1928'

zip() only produces results up to the length of the shortest iterable:
print(list(zip([1,2],[1,2,3,4,5,6]))) # [(1, 1), (2, 2)]
Your source data apparently has at least one line with fewer elements in it, which is why one of your dicts does not have the 'Birthdate' key (the last one).
You can guard against it:
data = list()
columns = ['First name', 'Last name', 'Telephone',
           'Address', 'City', 'State', 'Birthdate']
# use a context manager for file open
with open(inputFile) as f:
    for line in f:
        # Assuming comments in the text file as '#'
        if line.startswith('#'):
            continue
        # ignore empty lines (you can combine with above)
        if not line.strip():
            continue
        row = line.strip().split(':')
        # raise exception if the number of fields is wrong
        if len(row) != len(columns):
            raise ValueError("Unexpected number of datapoints in line: " + line)
        data.append(dict(zip(columns, row)))
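
Since the question asks about try-except, here is a hedged sketch of an alternative that collects malformed lines instead of stopping at the first one. The function name parse_records and the in-memory sample lines are assumptions made for illustration; the column names come from the question.

```python
COLUMNS = ['First name', 'Last name', 'Telephone',
           'Address', 'City', 'State', 'Birthdate']


def parse_records(lines, columns=COLUMNS):
    """Split ':'-separated lines into dicts; collect malformed lines separately."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith('#'):
            continue  # skip blank lines and comments
        row = text.split(':')
        if len(row) != len(columns):
            bad.append((number, text))  # remember where the problem was
            continue
        good.append(dict(zip(columns, row)))
    return good, bad


# Hypothetical sample: one comment, one valid row, one blank line, one short row
lines = [
    "# comment\n",
    "Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:6/23/1923\n",
    "\n",
    "Fred:Fardbarkle:674-843-1385\n",
]
records, rejected = parse_records(lines)
for record in records:
    try:
        print(record['Birthdate'])
    except KeyError:
        # cannot happen for rows that passed the length check;
        # shown only to illustrate the try/except form
        print("no birthdate for", record['First name'])
print(rejected)
```

Checking the field count up front usually gives a clearer message than catching the KeyError later, because the offending line number is still known at that point.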

Search in List; Display names based on search input

I have looked through several articles here about searching data in a list, but nothing seems to work or fit what I am supposed to implement.
I have this pre-created module with over 500 records of names, cities, emails, etc. (they are stored as one string, but are treated as a list once split; see the code below). The following is just a chunk of it.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles#gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh#hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai#gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie#gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa#kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer#cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy#yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley#aol.com,http://www.affiliatedwithtravelodge.com
"""
a = empRecords.strip().split(";")
And I have the following code for searching:
import empData as x
def seecity():
    empCitylist = list()
    for ct in x.a:
        empCt = ct.strip().split(",")
        empCitylist.append(empCt)
    t = sorted(empCitylist, key=lambda x: x[3])
    for c in t:
        city = (c[3])
        print(city)
    live_city = input("Enter city: ")
    for cy in city:
        if live_city in cy:
            print(c[1])
            # print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
Forgive my idiotic approach, as I am new to Python. What I am trying to do is this: the user inputs a city, and the results should display the last name and first name of each employee living in that city (I don't know if that makes sense, lol).
By the way, the code above doesn't return any results. It just loops back to the input.
Thank you for helping. Lovelots. <3
PS: the format of the empData is: first name, last name, address, city, country, birthday, zip, phone, and email

You can use the csv module to easily read a file with comma-separated values:
import csv
with open('test.csv', newline='') as csvfile:
    records = list(csv.reader(csvfile))
def search(data, elem, index):
    out = list()
    for row in data:
        if row[index] == elem:
            out.append(row)
    return out
#test
print(search(records, 'Orlando', 3))
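
The data in the question lives in a string rather than a file, so the same idea can be sketched without test.csv. This assumes records are terminated by ";" as in the question; the helper name load_rows is hypothetical, and the two sample rows are shortened copies of rows from the question.

```python
import csv

# Shortened sample in the question's format (records end with ';')
emp_records = """Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804
"""


def load_rows(raw):
    """Parse ';'-terminated, comma-separated records held in one string."""
    chunks = (chunk.strip() for chunk in raw.split(";"))
    # csv.reader accepts any iterable of strings, not just file objects
    return list(csv.reader(chunk for chunk in chunks if chunk))


def search(data, elem, index):
    """Return every row whose field at `index` equals `elem`."""
    return [row for row in data if row[index] == elem]


rows = load_rows(emp_records)
for first, last, *_ in search(rows, "Orlando", 3):
    print(last + ", " + first)
```

Dropping empty chunks matters here: a trailing ";" would otherwise produce an empty row, and indexing it at position 3 would raise an IndexError.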

Based on your original code, you can do it like this:
# Make list of list records, sorted by city
t = sorted((ct.strip().split(",") for ct in x.a), key=lambda x: x[3])
# List cities
print("Cities in DB:")
for c in t:
    city = (c[3])
    print("-", city)
# Define search function
def seecity():
    live_city = input("Enter city: ")
    for c in t:
        if live_city == c[3]:
            print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
seecity()
Then, after you understand what's going on, do as @Hoxha Alban suggested, and use the csv module.

The beauty of Python lies in list comprehensions.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles#gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh#hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai#gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie#gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa#kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer#cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy#yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley#aol.com,http://www.affiliatedwithtravelodge.com
"""
rows = empRecords.strip().split(";")
data = [ r.strip().split(",") for r in rows ]
Then you can use any condition to filter the list, like:
print(["Name: " + emp[1] + "," + emp[0] + "| Current City: " + emp[3] for emp in data if emp[3] == "Washington"])
['Name: Hixenbaugh,Alesia| Current City: Washington']

Python - How to count specific section in a list

I'm brand new to Python and I'm struggling with how to add up certain sections of a CSV file in Python. I'm not allowed to use "import csv".
I'm importing the TipJoke CSV file from https://vincentarelbundock.github.io/Rdatasets/datasets.html
This is the only code I have so far that worked and I'm at a total loss on where to go from here.
if __name__ == '__main__':
    from pprint import pprint
    from string import punctuation
    f = open("TipJoke.csv", "r")
    tipList = []
    for line in f:
        #deletes the quotes
        line = line.replace('"', '')
        tipList.append(line)
    pprint(tipList)
Output:
[',Card,Tip,Ad,Joke,None\n',
'1,None,1,0,0,1\n',
'2,Joke,1,0,1,0\n',
'3,Ad,0,1,0,0\n',
'4,None,0,0,0,1\n',
'5,None,1,0,0,1\n',
'6,None,0,0,0,1\n',
'7,Ad,0,1,0,0\n',
'8,Ad,0,1,0,0\n',
'9,None,0,0,0,1\n',
'10,None,0,0,0,1\n',
'11,None,1,0,0,1\n',
'12,Ad,0,1,0,0\n',
'13,None,0,0,0,1\n',
'14,Ad,1,1,0,0\n',
'15,Joke,1,0,1,0\n',
'16,Joke,0,0,1,0\n',
'17,Joke,1,0,1,0\n',
'18,None,0,0,0,1\n',
'19,Joke,0,0,1,0\n',
'20,None,0,0,0,1\n',
'21,Ad,1,1,0,0\n',
'22,Ad,1,1,0,0\n',
'23,Ad,0,1,0,0\n',
'24,Joke,0,0,1,0\n',
'25,Joke,1,0,1,0\n',
'26,Joke,0,0,1,0\n',
'27,None,1,0,0,1\n',
'28,Joke,1,0,1,0\n',
'29,Joke,1,0,1,0\n',
'30,None,1,0,0,1\n',
'31,Joke,0,0,1,0\n',
'32,None,1,0,0,1\n',
'33,Joke,1,0,1,0\n',
'34,Ad,0,1,0,0\n',
'35,Joke,0,0,1,0\n',
'36,Ad,1,1,0,0\n',
'37,Joke,0,0,1,0\n',
'38,Ad,0,1,0,0\n',
'39,Joke,0,0,1,0\n',
'40,Joke,0,0,1,0\n',
'41,Joke,1,0,1,0\n',
'42,None,0,0,0,1\n',
'43,None,0,0,0,1\n',
'44,Ad,0,1,0,0\n',
'45,None,0,0,0,1\n',
'46,None,0,0,0,1\n',
'47,Ad,0,1,0,0\n',
'48,Joke,0,0,1,0\n',
'49,Joke,1,0,1,0\n',
'50,None,1,0,0,1\n',
'51,None,0,0,0,1\n',
'52,Joke,1,0,1,0\n',
'53,Joke,1,0,1,0\n',
'54,Joke,0,0,1,0\n',
'55,None,1,0,0,1\n',
'56,Ad,0,1,0,0\n',
'57,Joke,0,0,1,0\n',
'58,None,0,0,0,1\n',
'59,Ad,0,1,0,0\n',
'60,Joke,1,0,1,0\n',
'61,Ad,0,1,0,0\n',
'62,None,1,0,0,1\n',
'63,Joke,0,0,1,0\n',
'64,Ad,0,1,0,0\n',
'65,Joke,0,0,1,0\n',
'66,Ad,0,1,0,0\n',
'67,Ad,0,1,0,0\n',
'68,Ad,0,1,0,0\n',
'69,None,0,0,0,1\n',
'70,Joke,1,0,1,0\n',
'71,None,1,0,0,1\n',
'72,None,0,0,0,1\n',
'73,None,0,0,0,1\n',
'74,Joke,0,0,1,0\n',
'75,Ad,1,1,0,0\n',
'76,Ad,0,1,0,0\n',
'77,Ad,1,1,0,0\n',
'78,Joke,0,0,1,0\n',
'79,Joke,0,0,1,0\n',
'80,Ad,1,1,0,0\n',
'81,Ad,0,1,0,0\n',
'82,None,0,0,0,1\n',
'83,Ad,0,1,0,0\n',
'84,Joke,0,0,1,0\n',
'85,Joke,0,0,1,0\n',
'86,Ad,1,1,0,0\n',
'87,None,1,0,0,1\n',
'88,Joke,1,0,1,0\n',
'89,Ad,0,1,0,0\n',
'90,None,0,0,0,1\n',
'91,None,0,0,0,1\n',
'92,Joke,0,0,1,0\n',
'93,Joke,0,0,1,0\n',
'94,Ad,0,1,0,0\n',
'95,Ad,0,1,0,0\n',
'96,Ad,0,1,0,0\n',
'97,Joke,1,0,1,0\n',
'98,None,0,0,0,1\n',
'99,None,0,0,0,1\n',
'100,None,1,0,0,1\n',
'101,Joke,0,0,1,0\n',
'102,Joke,0,0,1,0\n',
'103,Ad,1,1,0,0\n',
'104,Ad,0,1,0,0\n',
'105,Ad,0,1,0,0\n',
'106,Ad,1,1,0,0\n',
'107,Ad,0,1,0,0\n',
'108,None,0,0,0,1\n',
'109,Ad,0,1,0,0\n',
'110,Joke,1,0,1,0\n',
'111,None,0,0,0,1\n',
'112,Ad,0,1,0,0\n',
'113,Ad,0,1,0,0\n',
'114,None,0,0,0,1\n',
'115,Ad,0,1,0,0\n',
'116,None,0,0,0,1\n',
'117,None,0,0,0,1\n',
'118,Ad,0,1,0,0\n',
'119,None,1,0,0,1\n',
'120,Ad,1,1,0,0\n',
'121,Ad,0,1,0,0\n',
'122,Ad,1,1,0,0\n',
'123,None,0,0,0,1\n',
'124,None,0,0,0,1\n',
'125,Joke,1,0,1,0\n',
'126,Joke,1,0,1,0\n',
'127,Ad,0,1,0,0\n',
'128,Joke,0,0,1,0\n',
'129,Joke,0,0,1,0\n',
'130,Ad,0,1,0,0\n',
'131,None,0,0,0,1\n',
'132,None,0,0,0,1\n',
'133,None,0,0,0,1\n',
'134,Joke,1,0,1,0\n',
'135,Ad,0,1,0,0\n',
'136,None,0,0,0,1\n',
'137,Joke,0,0,1,0\n',
'138,Ad,0,1,0,0\n',
'139,Ad,0,1,0,0\n',
'140,None,0,0,0,1\n',
'141,Joke,0,0,1,0\n',
'142,None,0,0,0,1\n',
'143,Ad,0,1,0,0\n',
'144,None,1,0,0,1\n',
'145,Joke,0,0,1,0\n',
'146,Ad,0,1,0,0\n',
'147,Ad,0,1,0,0\n',
'148,Ad,0,1,0,0\n',
'149,Joke,1,0,1,0\n',
'150,Ad,1,1,0,0\n',
'151,Joke,1,0,1,0\n',
'152,None,0,0,0,1\n',
'153,Ad,0,1,0,0\n',
'154,None,0,0,0,1\n',
'155,None,0,0,0,1\n',
'156,Ad,0,1,0,0\n',
'157,Ad,0,1,0,0\n',
'158,Joke,0,0,1,0\n',
'159,None,0,0,0,1\n',
'160,Joke,1,0,1,0\n',
'161,None,1,0,0,1\n',
'162,Ad,1,1,0,0\n',
'163,Joke,0,0,1,0\n',
'164,Joke,0,0,1,0\n',
'165,Ad,0,1,0,0\n',
'166,Joke,1,0,1,0\n',
'167,Joke,1,0,1,0\n',
'168,Ad,0,1,0,0\n',
'169,Joke,1,0,1,0\n',
'170,Joke,0,0,1,0\n',
'171,Ad,0,1,0,0\n',
'172,Joke,0,0,1,0\n',
'173,Joke,0,0,1,0\n',
'174,Ad,0,1,0,0\n',
'175,None,0,0,0,1\n',
'176,Joke,1,0,1,0\n',
'177,Ad,0,1,0,0\n',
'178,Joke,0,0,1,0\n',
'179,Joke,0,0,1,0\n',
'180,None,0,0,0,1\n',
'181,None,0,0,0,1\n',
'182,Ad,0,1,0,0\n',
'183,None,0,0,0,1\n',
'184,None,0,0,0,1\n',
'185,None,0,0,0,1\n',
'186,None,0,0,0,1\n',
'187,Ad,0,1,0,0\n',
'188,None,1,0,0,1\n',
'189,Ad,0,1,0,0\n',
'190,Ad,0,1,0,0\n',
'191,Ad,0,1,0,0\n',
'192,Joke,1,0,1,0\n',
'193,Joke,0,0,1,0\n',
'194,Ad,0,1,0,0\n',
'195,None,0,0,0,1\n',
'196,Joke,1,0,1,0\n',
'197,Joke,0,0,1,0\n',
'198,Joke,1,0,1,0\n',
'199,Ad,0,1,0,0\n',
'200,None,0,0,0,1\n',
'201,Joke,1,0,1,0\n',
'202,Joke,0,0,1,0\n',
'203,Joke,0,0,1,0\n',
'204,Ad,0,1,0,0\n',
'205,None,0,0,0,1\n',
'206,Ad,0,1,0,0\n',
'207,Ad,0,1,0,0\n',
'208,Joke,0,0,1,0\n',
'209,Ad,0,1,0,0\n',
'210,Joke,0,0,1,0\n',
'211,None,0,0,0,1\n']
I'm currently trying to find the total number of entries of the specified card type and the percentage of tips given for the specified card type, with two decimal places of precision. The tip column is the 0 or 1 right after the card type (None, Ad, Joke).

If you are allowed to use the pandas library, then:
import pandas as pd
df = pd.read_csv("TipJoke.csv")
df is a pandas DataFrame object on which you can perform filtering according to your needs.
For example, if you want to get the data for Joke, you can filter like this:
print(df[df["Card"] == "Joke"])
I'm only pointing you in a direction here, not providing the whole logic for your question.

This works; it sums the tips given for each card type:
counts = {"Joke": 0, "Ad": 0, "None": 0}
with open("TipJoke.csv", "r") as f:
    for line in f:
        line_clean = line.replace('"', "").replace("\n", "").split(",")
        try:
            # add the Tip column (0 or 1) to the running total for this card type
            counts[line_clean[1]] += int(line_clean[2])
        except (KeyError, IndexError, ValueError):
            # the header row and malformed lines are skipped
            pass
print(counts)
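
The question also asks for the total number of entries per card type and the tip percentage to two decimal places. A minimal sketch of that, without the csv module, could look like the following; the function name tip_stats is made up for illustration, and the sample is just the first few lines of the output shown in the question.

```python
def tip_stats(lines):
    """Return {card: (entries, percent_tipped)} from CSV text lines."""
    totals = {}  # card type -> number of rows
    tips = {}    # card type -> number of rows with Tip == 1
    for line in lines:
        fields = line.replace('"', '').strip().split(',')
        # skip the header and anything that is not a data row
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        card, tip = fields[1], int(fields[2])
        totals[card] = totals.get(card, 0) + 1
        tips[card] = tips.get(card, 0) + tip
    return {card: (totals[card], 100 * tips[card] / totals[card])
            for card in totals}


sample = [
    ',Card,Tip,Ad,Joke,None\n',
    '1,None,1,0,0,1\n',
    '2,Joke,1,0,1,0\n',
    '3,Ad,0,1,0,0\n',
    '4,None,0,0,0,1\n',
]
for card, (entries, percent) in tip_stats(sample).items():
    print(f"{card}: {entries} entries, {percent:.2f}% tipped")
```

An open file can be passed instead of the sample list, since the function only iterates over lines. The :.2f format specifier handles the two-decimal requirement at print time, so the stored percentage keeps full precision.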

Nested dictionary keeps overwriting data

I am trying to read in from a data file that has lines like:
2007 ANDREA 30 31.40 -71.90 05/13/18Z 25 1007 LOW
2007 ANDREA 31 31.80 -69.40 05/14/00Z 25 1007 LOW
I am trying to create a nested dictionary that has a key holding the year and then the nested dictionary will hold the name and a tuple containing statistics. I would like the return value to look like this:
{'2007': {'ANDREA': [(31.4, -71.9, '05/13/18Z', 25.0, 1007.0), (31.8, -69.4, '05/14/00Z', 25.0, 1007.0)]}}
However when I run the code it returns only one set of statistics. It seems to be overwriting itself because I am getting that last line of statistics in the txt file returned:
{'2007': {'ANDREA': [(31.8, -69.4, '05/14/00Z', 25.0, 1007.0)]}}
Here is the code:
def create_dictionary(fp):
    '''Remember to put a docstring here'''
    dict1 = {}
    f = []
    for line in fp:
        a = line.split()
        f.append(a)
    for item in f:
        a = (float(item[3]), float(item[4]), item[5], float(item[6]),
             float(item[7]))
        dict1 = update_dictionary(dict1, item[0], item[1], a)
    print(dict1)
def update_dictionary(dictionary, year, hurricane_name, data):
    if year not in dictionary:
        dictionary[year] = {}
        if hurricane_name not in dictionary:
            dictionary[year][hurricane_name] = [data]
        else:
            dictionary[year][hurricane_name].append(data)
    else:
        if hurricane_name not in dictionary:
            dictionary[year][hurricane_name] = [data]
        else:
            dictionary[year][hurricane_name].append(data)
    return dictionary

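These lines:
if hurricane_name not in dictionary:
...should be:
if hurricane_name not in dictionary[year]:
The keys of dictionary are years, so a hurricane name is never found at the top level, and the existing list is replaced with [data] on every call.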
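
As a hedged alternative to nested if/else checks, dict.setdefault can create each missing level on the fly. This sketch keeps the function signature from the question; the two sample tuples are taken from the expected output given there.

```python
def update_dictionary(dictionary, year, hurricane_name, data):
    """Append data to dictionary[year][hurricane_name], creating levels as needed."""
    storms = dictionary.setdefault(year, {})  # inner dict for this year
    storms.setdefault(hurricane_name, []).append(data)
    return dictionary


d = {}
update_dictionary(d, '2007', 'ANDREA', (31.4, -71.9, '05/13/18Z', 25.0, 1007.0))
update_dictionary(d, '2007', 'ANDREA', (31.8, -69.4, '05/14/00Z', 25.0, 1007.0))
print(d)
```

setdefault returns the existing value when the key is present and inserts the default otherwise, so the membership test is always made against the correct level.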

Since I was a little late here's a suggestion instead of an answer to your original question. You can simplify the logic a bit because when the year doesn't exist then the name also can't exist for that year. Everything can be put in a single function and using a "with" statement to open the file will ensure it is properly closed even if your program encounters an error.
def build_dict(file_path):
    result = {}
    with open(file_path, 'r') as f:
        for line in f:
            items = line.split()
            year, name, data = items[0], items[1], tuple(items[2:])
            if year in result:
                if name in result[year]:
                    result[year][name].append(data)
                else:
                    result[year][name] = [data]
            else:
                result[year] = {name: [data]}
    return result
print(build_dict(file_path))
Output:
{'2007': {'ANDREA': [('30', '31.40', '-71.90', '05/13/18Z', '25', '1007', 'LOW'), ('31', '31.80', '-69.40', '05/14/00Z', '25', '1007', 'LOW')]}}

read txt file into dictionary

I have the following type of document, where each person might have a couple of names and an associated description of features:
New person
name: ana
name: anna
name: ann
feature: A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.
New person
name: tom
name: thomas
name: thimoty
name: tommy
feature: A 32-year old male that is known to be deaf.
New person
.....
What I would like is to read this file in a python dictionary, where each new person is id-ed.
i.e. Person with ID 1 will have the names ['ann','anna','ana']
and will have the feature ['A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.' ]
Any suggestions?

Assuming your input file is lo.txt, you can build a list of dictionaries this way:
final_data = []
names = []
with open('lo.txt') as file:
    for line in file:
        line = line.strip()
        if line.startswith("feature:"):
            # a feature line closes the current person
            feature = line.split(":", 1)[1].strip()
            final_data.append({
                'names': names,
                'feature': feature
            })
            names = []
        elif line.startswith("name:"):
            names.append(line.split(":", 1)[1].strip())
print(final_data)

Something like this might work:
result = {}
with open("document.txt") as f:
    contents = f.read()
# each chunk after a "New person" marker describes one person
info = [chunk for chunk in contents.split('New person') if chunk.strip()]
for i in range(len(info)):
    lines = info[i].strip().split('\n')
    names = []
    features = []
    for j in range(len(lines)):
        key, _, value = lines[j].partition(':')
        if key.strip() == 'name':
            names.append(value.strip())
        elif key.strip() == 'feature':
            features.append(value.strip())
    result[i] = {'names': names, 'features': features}
print(result)
This should give you something like:
{0: {'names': ['ana', 'anna', 'ann'], 'features': ['...']}}
etc.

Here is code that may work for you:
with open("documents.txt") as fh:
    f = [i.strip('\n') for i in fh]
# the last line holds the feature description
final_condition = f.pop()
# keep only the "name: ..." lines; this skips the "New person" marker
names = [i.split(":")[1].strip() for i in f if i.startswith("name:")]
the_dict = {}
the_dict["names"] = names
the_dict["features"] = final_condition
print(the_dict)
All it does is split each name line at ":" and keep the last element of the resulting list (the name) in the list names. Note that it treats the whole file as a single person whose last line is the feature.
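
To get the ID-keyed dictionary the question describes, one option is to start a new entry at every "New person" line. This is a sketch under assumptions: IDs count up from 1, lines are exactly in the name:/feature: shape shown in the question, and the function name read_people is made up for illustration.

```python
def read_people(lines):
    """Build {person_id: {'names': [...], 'feature': [...]}} from the text lines."""
    people = {}
    current = None
    for line in lines:
        text = line.strip()
        if text == "New person":
            # start a new record; IDs count up from 1
            current = {"names": [], "feature": []}
            people[len(people) + 1] = current
        elif current is not None and ":" in text:
            key, _, value = text.partition(":")
            key = key.strip()
            if key == "name":
                current["names"].append(value.strip())
            elif key == "feature":
                current["feature"].append(value.strip())
    return people


# The sample document from the question, held in memory
sample = """New person
name: ana
name: anna
name: ann
feature: A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.
New person
name: tom
name: thomas
name: thimoty
name: tommy
feature: A 32-year old male that is known to be deaf.
""".splitlines()

people = read_people(sample)
print(people[1]["names"])
print(people[2]["feature"])
```

Using partition at the first ":" keeps any later colons inside the feature text intact, and lines that match neither key are ignored.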
